A Research Project on the U.S. Stock Market

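This project studies the financial-statement and stock-price data of all listed companies on the three major U.S. stock markets from 1987 to 2018, about 5 million U.S. company samples in total. The work has two stages. First, a neural network model is built to predict, from each company's financial-statement information (including a range of financial ratios constructed for each company and the F-Score, M-Score, and Z-Score financial models), whether that company's earnings will increase in the following year; prediction accuracy is about 60%. Second, based on the predicted earnings growth, the strategy buys the stocks of companies whose earnings are predicted to increase next year and sells the stocks of companies whose earnings are predicted to decline, forming a long-short portfolio. The portfolio has zero initial cost, yet its annual return is significantly positive (t=2.18), and it earns a significantly positive excess return under the Fama-French three-factor model (t=2.3).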
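The second stage can be illustrated with a minimal sketch. It assumes an equal-weighted portfolio (the weighting scheme is not stated above) and uses made-up numbers; the function names `long_short_return` and `t_statistic` are hypothetical and are not part of the project's code. It shows how a zero-cost long-short return might be formed from the predictions and how a t-statistic for the mean annual return could be computed. The Fama-French regression is not covered here.

```python
from math import sqrt
from statistics import mean, stdev


def long_short_return(predicted_up, next_year_returns):
    """Return of an equal-weighted, zero-cost long-short portfolio for one year.

    predicted_up: dict mapping a stock id to True (earnings predicted to rise)
                  or False (earnings predicted to fall).
    next_year_returns: dict mapping the same ids to next-year stock returns.
    Equal weighting is an assumption made for this sketch.
    """
    longs = [next_year_returns[s] for s, up in predicted_up.items() if up]
    shorts = [next_year_returns[s] for s, up in predicted_up.items() if not up]
    if not longs or not shorts:
        raise ValueError("need at least one stock on each side")
    # The long leg is financed by the short leg, so the net initial outlay is zero.
    return mean(longs) - mean(shorts)


def t_statistic(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / sqrt(n))


# Made-up numbers, for illustration only.
predictions = {"A": True, "B": True, "C": False, "D": False}
realized = {"A": 0.12, "B": 0.08, "C": 0.02, "D": -0.04}
one_year = long_short_return(predictions, realized)

yearly = [0.05, 0.02, -0.01, 0.04, 0.03]
t_stat = t_statistic(yearly)
```

With the toy inputs, the long leg averages 0.10 and the short leg averages -0.01, so the one-year long-short return is about 0.11.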

The project's main program is shown below:

from datetime import datetime


def main():

    print("\n*******************************************************************************************************")
    print("******Transforming raw fundamentals into feature data to be fed into the machine learning model.*******")
    print("*******************************************************************************************************\n")

    # Transform raw fundamentals into feature data to be fed into the machine learning model.
    featuresdf = gen_featuresdf()

    print("This program has taken {}.".format(datetime.now()-S.start_time))

    print("\n*******************************************************************************************************")
    print("********************************Generating portfolios' information*************************************")
    print("*******************************************************************************************************\n")

    # Timing caveat: if the training data spans 2000 to 2009, the sample to be
    # predicted is 2010, and the corresponding stock data must come from 2011.
    # A complete sample therefore spans two years: one year of fundamentals and
    # the following year for examining whether earnings grow.

    msf = database.get_msf(
        featuresdf['permnos'], year_span=S.year_span, local_data=S.local_data)

    fama = database.get_fama()

    msf['year'] = msf.date.apply(lambda x: x.year)

    results = to_portfolios(sic_span=S.sic_range,
                            fyear_span=S.MLyear_span,  # closed interval
                            msf=msf,
                            data=featuresdf)

    Portfolios = results[0]
    test_accuracies = results[1]
    print("This program has taken {}.".format(datetime.now()-S.start_time))

    print("\n*******************************************************************************************************")
    print("**************************Parsing the portfolio information into monthly returns***********************")
    print("*******************************************************************************************************\n")

    Mret = to_Mret(port=Portfolios)

    print("This program has taken {}.".format(datetime.now()-S.start_time))

    print("\n*******************************************************************************************************")
    print("**********************************Analyzing monthly returns********************************************")
    print("*******************************************************************************************************\n")

    ana = to_analysis(mport=Mret, fama=fama)
    print("\nThis program has taken {}.".format(datetime.now()-S.start_time))

    print("\n------------------------------------------------------------------------------------------------------")
    print("Done!")


if __name__ == "__main__":
    main()
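
The timing caveat in the comments of main() can be made concrete with a small sketch. It assumes a rolling training window of fixed length, which is only one possible reading (the project may use an expanding window or another scheme), and the helper name `rolling_windows` is hypothetical. For each window it lists the training fiscal years, the fiscal year to be predicted, and the year from which stock returns would be taken.

```python
def rolling_windows(first_year, last_year, train_len):
    """Return a list of (train_years, predict_year, return_year) triples.

    first_year..last_year is the closed interval of fiscal years that have
    fundamentals; train_len is the number of training years per window.
    A fixed-length rolling window is an assumption made for this sketch.
    """
    windows = []
    for predict_year in range(first_year + train_len, last_year + 1):
        train_years = list(range(predict_year - train_len, predict_year))
        # Stock returns come from the year after the predicted fiscal year.
        windows.append((train_years, predict_year, predict_year + 1))
    return windows


windows = rolling_windows(2000, 2011, 10)
for train_years, predict_year, return_year in windows:
    print(train_years[0], train_years[-1], predict_year, return_year)
```

For the interval 2000 to 2011 with ten training years, the first window trains on 2000 to 2009, predicts fiscal year 2010, and uses 2011 stock returns, matching the example in the comment.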