Commit da79d3d5 authored by iLampard

Add keras example in notebooks

parent 2727902e
{
"cells": [
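{
"cell_type": "markdown",
"metadata": {},
"source": [
"* This example shows how to use a deep learning model in alpha-mind.\n",
"  - For ease of comparison, the data and parameters are the same as in the [machine learning model example](https://github.com/alpha-miner/alpha-mind/blob/master/notebooks/Example%2012%20-%20Machine%20Learning%20Model%20Prediction.ipynb).\n",
"  - The deep learning model is implemented with Keras, so Keras must be installed first.\n",
"\n",
"* Set the environment variable `DB_URI` to point to the database."
]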
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import os\n",
"import datetime as dt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from alphamind.api import *\n",
"from PyFin.api import *\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense\n",
"from alphamind.model.modelbase import create_model_base"
]
},
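{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building a model with Keras (linear regression as an example)\n",
"\n",
"### Building the Keras interface model\n",
"\n",
"- In alpha-mind, every model algorithm is implemented through an underlying interface model. All interface models expose the same training and prediction methods, *fit* and *predict*.\n",
"- The code below creates such an interface class that implements linear regression with Keras; *fit* and *predict* perform fitting and prediction, respectively."
]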
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"class LinearRegressionImpl(object):\n",
"    def __init__(self, **kwargs):\n",
"        self.learning_rate = kwargs.get('learning_rate', 0.01)\n",
"        self.training_epochs = kwargs.get('training_epochs', 10)\n",
"        self.display_steps = kwargs.get('display_steps', None)\n",
"        self.model = None\n",
"\n",
"    def result(self):\n",
"        # Fitted parameters of the Dense layer: [kernel of shape (num_features, 1), bias of shape (1,)].\n",
"        return self.model.get_weights()\n",
"\n",
"    def fit(self, x, y):\n",
"        num_samples, num_features = x.shape\n",
"\n",
"        # A single Dense unit with linear activation, trained on mean squared error, is a linear regression.\n",
"        output_dim = 1\n",
"        input_dim = num_features\n",
"        model = Sequential()\n",
"        model.add(Dense(output_dim, input_dim=input_dim, kernel_initializer='normal', activation='linear'))\n",
"        # Note: learning_rate is not passed on; 'adam' uses the optimizer's default learning rate.\n",
"        model.compile(loss='mean_squared_error', optimizer='adam')\n",
"        model.fit(x, y, epochs=self.training_epochs, verbose=0)\n",
"\n",
"        print('Optimization finished ......')\n",
"        self.model = model\n",
"\n",
"    def predict(self, x):\n",
"        # Keras returns an array of shape (num_samples, 1); squeeze it to (num_samples,).\n",
"        ret = self.model.predict(x)\n",
"        return np.squeeze(ret)"
]
},
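{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a single `Dense(1, activation='linear')` unit trained on mean squared error optimizes the ordinary least squares objective, so with enough epochs its weights should approach the closed-form OLS solution. The cell below is a minimal NumPy sketch of an interface model with the same *fit* / *predict* methods as `LinearRegressionImpl`, solving OLS directly with `np.linalg.lstsq`. The class name `OLSImpl` is hypothetical (not part of alpha-mind) and the data are synthetic."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"class OLSImpl(object):\n",
"    # Hypothetical closed-form counterpart of LinearRegressionImpl; not an alpha-mind class.\n",
"    def fit(self, x, y):\n",
"        # Append a column of ones so the intercept is estimated together with the weights.\n",
"        x1 = np.column_stack([x, np.ones(len(x))])\n",
"        coef, _, _, _ = np.linalg.lstsq(x1, y, rcond=None)\n",
"        self.W, self.b = coef[:-1], coef[-1]\n",
"        return self\n",
"\n",
"    def predict(self, x):\n",
"        return x @ self.W + self.b\n",
"\n",
"\n",
"# Synthetic, noise-free data generated from known weights and intercept.\n",
"rng = np.random.RandomState(0)\n",
"x = rng.normal(size=(500, 4))\n",
"y = x @ np.array([0.5, -0.2, 0.1, 0.0]) + 0.3\n",
"\n",
"impl = OLSImpl().fit(x, y)\n",
"print(np.round(impl.W, 4), round(float(impl.b), 4))"
]
},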
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To plug this model into the alpha-mind framework, we also need the following wrapper, which must implement the *load* and *save* methods."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"class LinearRegressionKS(create_model_base()):\n",
"    def __init__(self, features, fit_target, **kwargs):\n",
"        super().__init__(features=features, fit_target=fit_target)\n",
"        # The interface model does the actual fitting and prediction.\n",
"        self.impl = LinearRegressionImpl(**kwargs)\n",
"\n",
"    @classmethod\n",
"    def load(cls, model_desc: dict):\n",
"        return super().load(model_desc)\n",
"\n",
"    def save(self):\n",
"        model_desc = super().save()\n",
"        # Store the fitted weights returned by the interface model alongside the base description.\n",
"        model_desc['weight'] = self.impl.result()\n",
"        return model_desc\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing the Keras model\n",
"\n",
"### Data configuration\n",
"------------"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"freq = '60b'\n",
"universe = Universe('zz800')\n",
"batch = 1\n",
"neutralized_risk = industry_styles\n",
"risk_model = 'short'\n",
"pre_process = [winsorize_normal, standardize]\n",
"post_process = [standardize]\n",
"warm_start = 3\n",
"data_source = os.environ['DB_URI']\n",
"horizon = map_freq(freq)\n",
"\n",
"engine = SqlEngine(data_source)"
]
},
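{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use the current-period `roe_q` factor to predict the value of `roe_q` one period later (with `freq = '60b'`, about 60 business days ahead).\n",
"\n",
"* The training universe is `zz800`;\n",
"* All factors are preprocessed, including neutralization and standardization;\n",
"* The prediction model is linear; each period spans 60 business days, and the data from the past 4 periods are used as training features."
]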
},
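{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pre_process = [winsorize_normal, standardize]` first clips outliers and then z-scores each factor across stocks. The cell below is a simplified NumPy sketch assuming winsorization at the mean plus or minus 3 standard deviations; the threshold and options used by alpha-mind's own `winsorize_normal` may differ, and the real functions operate date by date. The function names `winsorize_sketch` and `standardize_sketch` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def winsorize_sketch(x, num_stds=3.):\n",
"    # Clip values lying more than num_stds standard deviations away from the mean.\n",
"    mean, std = x.mean(), x.std()\n",
"    return np.clip(x, mean - num_stds * std, mean + num_stds * std)\n",
"\n",
"\n",
"def standardize_sketch(x):\n",
"    # Cross-sectional z-score: zero mean, unit standard deviation.\n",
"    return (x - x.mean()) / x.std()\n",
"\n",
"\n",
"# Synthetic factor values for 800 stocks, with one extreme outlier.\n",
"rng = np.random.RandomState(1)\n",
"factor = rng.normal(size=800)\n",
"factor[0] = 50.\n",
"\n",
"processed = standardize_sketch(winsorize_sketch(factor))\n",
"print(round(processed.mean(), 6), round(processed.std(), 6), processed.max())"
]
},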
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"\n",
"kernal_feature = 'roe_q'\n",
"regress_features = {kernal_feature: LAST(kernal_feature),\n",
" kernal_feature + '_l1': SHIFT(kernal_feature, 1),\n",
" kernal_feature + '_l2': SHIFT(kernal_feature, 2),\n",
" kernal_feature + '_l3': SHIFT(kernal_feature, 3)\n",
" }\n",
"fit_target = [kernal_feature]\n",
"\n",
"data_meta = DataMeta(freq=freq,\n",
" universe=universe,\n",
" batch=batch,\n",
" neutralized_risk=neutralized_risk,\n",
" risk_model=risk_model,\n",
" pre_process=pre_process,\n",
" post_process=post_process,\n",
" warm_start=warm_start,\n",
" data_source=data_source)\n",
"\n",
"regression_model_ks = LinearRegressionKS(features=regress_features, fit_target=fit_target, training_epochs=400)\n",
"regression_composer_ks = Composer(alpha_model=regression_model_ks, data_meta=data_meta)"
]
},
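{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four features defined above are the current value of `roe_q` and its values one, two and three periods earlier. Assuming `LAST(x)` returns the latest value and `SHIFT(x, k)` the value `k` periods back, the same feature layout can be sketched per stock with pandas `groupby` and `shift`. The toy frame below is made up, and this is only an illustration, not how PyFin evaluates these expressions internally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy panel: two stocks observed on five consecutive dates (made-up values).\n",
"df = pd.DataFrame({\n",
"    'code': [1] * 5 + [2] * 5,\n",
"    'trade_date': list(pd.date_range('2011-01-01', periods=5, freq='MS')) * 2,\n",
"    'roe_q': [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],\n",
"})\n",
"df = df.sort_values(['code', 'trade_date'])\n",
"\n",
"# Counterparts of LAST(roe_q) and SHIFT(roe_q, k) for k = 1, 2, 3, computed within each stock.\n",
"features = {'roe_q': df['roe_q']}\n",
"for k in (1, 2, 3):\n",
"    features['roe_q_l{0}'.format(k)] = df.groupby('code')['roe_q'].shift(k)\n",
"feat = pd.DataFrame(features)\n",
"\n",
"# The first three dates of each stock lack a full history and are dropped.\n",
"print(feat.dropna())"
]
},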
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model comparison (sklearn linear regression vs. Keras linear regression): IC\n",
"------------------"
]
},
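{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model training and prediction\n",
"- train: given `ref_date`, the model takes the factor data on all training dates before `ref_date`, together with the target (`fit_target`) data as of `ref_date`, and trains on them.\n",
"- predict: given `ref_date`, the model takes the factor data on `ref_date` and predicts the next-period target values.\n",
"- ic: given `ref_date`, the model's predictions are correlated with the realized next-period target values."
]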
},
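{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the *ic* step, an IC is the cross-sectional correlation between the predictions made on `ref_date` and the realized next-period values. The sketch below computes it on synthetic arrays with the Pearson correlation and, for comparison, a rank (Spearman) variant; which of the two `Composer.ic` uses is not stated here and is an assumption. The helper names `pearson_ic` and `rank_ic` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def pearson_ic(predicted, realized):\n",
"    # Pearson correlation across stocks on a single date (hypothetical helper).\n",
"    return float(np.corrcoef(predicted, realized)[0, 1])\n",
"\n",
"\n",
"def rank_ic(predicted, realized):\n",
"    # Spearman correlation: the Pearson correlation of cross-sectional ranks.\n",
"    return pearson_ic(pd.Series(predicted).rank().values,\n",
"                      pd.Series(realized).rank().values)\n",
"\n",
"\n",
"# Synthetic next-period values for an 800-stock universe and a noisy forecast of them.\n",
"rng = np.random.RandomState(42)\n",
"realized = rng.normal(size=800)\n",
"predicted = realized + rng.normal(scale=1.5, size=800)\n",
"\n",
"print('Pearson IC: {0:.4f}'.format(pearson_ic(predicted, realized)))\n",
"print('Rank IC: {0:.4f}'.format(rank_ic(predicted, realized)))"
]
},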
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"ref_date = '2017-01-31'\n",
"ref_date = adjustDateByCalendar('china.sse', ref_date).strftime('%Y-%m-%d')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"regression_model_sk = LinearRegression(features=regress_features, fit_target=fit_target)\n",
"regression_composer_sk = Composer(alpha_model=regression_model_sk, data_meta=data_meta)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"E:\\workarea\\software\\conda3\\lib\\site-packages\\alpha_mind-0.2.0-py3.6-win-amd64.egg\\alphamind\\data\\transformer.py:76: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.\n",
" dropna=False)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization finished ......\n",
"\n",
"Sklearn Regression Testing IC: 0.5464\n",
"Keras Regression Testing IC: 0.5462\n"
]
}
],
"source": [
"regression_composer_sk.train(ref_date)\n",
"regression_composer_ks.train(ref_date)\n",
"print(\"\\nSklearn Regression Testing IC: {0:.4f}\".format(regression_composer_sk.ic(ref_date=ref_date)))\n",
"print(\"Keras Regression Testing IC: {0:.4f}\".format(regression_composer_ks.ic(ref_date=ref_date)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Backtest (simple long-short strategy)\n",
"--------------------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Strategy initialization\n",
"\n",
"#### Loading data: fetch_data_package\n",
"- Factor data\n",
"- Industry data\n",
"- Risk model data\n",
"- Data preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2018-08-09 13:35:24,689 - ALPHA_MIND - INFO - Starting data package fetching ...\n",
"E:\\workarea\\software\\conda3\\lib\\site-packages\\alpha_mind-0.2.0-py3.6-win-amd64.egg\\alphamind\\data\\transformer.py:76: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.\n",
" dropna=False)\n",
"2018-08-09 13:35:25,457 - ALPHA_MIND - INFO - factor data loading finished\n",
"2018-08-09 13:36:15,526 - ALPHA_MIND - INFO - fit target data loading finished\n",
"2018-08-09 13:36:15,774 - ALPHA_MIND - INFO - industry data loading finished\n",
"2018-08-09 13:36:15,918 - ALPHA_MIND - INFO - benchmark data loading finished\n",
"2018-08-09 13:36:16,656 - ALPHA_MIND - INFO - data merging finished\n",
"2018-08-09 13:36:16,714 - ALPHA_MIND - INFO - Loading data is finished\n",
"2018-08-09 13:36:16,748 - ALPHA_MIND - INFO - Data processing is finished\n"
]
}
],
"source": [
"start_date = '2011-01-01'\n",
"end_date = '2012-01-01'\n",
"\n",
"data_package2 = fetch_data_package(engine,\n",
" alpha_factors=[kernal_feature],\n",
" start_date=start_date,\n",
" end_date=end_date,\n",
" frequency=freq,\n",
" universe=universe,\n",
" benchmark=906,\n",
" warm_start=warm_start,\n",
" batch=1,\n",
" neutralized_risk=neutralized_risk,\n",
" pre_process=pre_process,\n",
" post_process=post_process)\n",
"\n",
"model_dates = [d.strftime('%Y-%m-%d') for d in list(data_package2['predict']['x'].keys())]\n",
"\n",
"\n",
"industry_name = 'sw_adj'\n",
"industry_level = 1\n",
"\n",
"industry_names = industry_list(industry_name, industry_level)\n",
"industry_total = engine.fetch_industry_matrix_range(universe, dates=model_dates, category=industry_name, level=industry_level)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Running the strategy (sklearn linear regression vs. Keras linear regression)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2018-08-09 13:36:59,018 - ALPHA_MIND - INFO - 2011-01-04 full re-balance: 799\n",
"E:\\workarea\\software\\conda3\\lib\\site-packages\\alpha_mind-0.2.0-py3.6-win-amd64.egg\\alphamind\\data\\transformer.py:76: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.\n",
" dropna=False)\n",
"2018-08-09 13:37:03,020 - ALPHA_MIND - INFO - 2011-01-04 is finished\n",
"2018-08-09 13:37:03,028 - ALPHA_MIND - INFO - 2011-04-07 full re-balance: 798\n",
"2018-08-09 13:37:06,784 - ALPHA_MIND - INFO - 2011-04-07 is finished\n",
"2018-08-09 13:37:06,794 - ALPHA_MIND - INFO - 2011-07-04 full re-balance: 798\n",
"2018-08-09 13:37:10,646 - ALPHA_MIND - INFO - 2011-07-04 is finished\n",
"2018-08-09 13:37:10,655 - ALPHA_MIND - INFO - 2011-09-27 full re-balance: 797\n",
"2018-08-09 13:37:14,539 - ALPHA_MIND - INFO - 2011-09-27 is finished\n",
"2018-08-09 13:37:14,548 - ALPHA_MIND - INFO - 2011-12-27 full re-balance: 798\n",
"2018-08-09 13:37:18,448 - ALPHA_MIND - INFO - 2011-12-27 is finished\n"
]
}
],
"source": [
"rets1 = []\n",
"rets2 = []\n",
"\n",
"for i, ref_date in enumerate(model_dates):\n",
"    py_ref_date = dt.datetime.strptime(ref_date, '%Y-%m-%d')\n",
"    industry_matrix = industry_total[industry_total.trade_date == ref_date]\n",
"    # dx holds the log returns used to evaluate the positions taken on ref_date.\n",
"    dx_returns = pd.DataFrame({'dx': data_package2['predict']['y'][py_ref_date].flatten(),\n",
"                               'code': data_package2['predict']['code'][py_ref_date].flatten()})\n",
"\n",
"    res = pd.merge(dx_returns, industry_matrix, on=['code']).dropna()\n",
"    codes = res.code.values.tolist()\n",
"\n",
"    alpha_logger.info('{0} full re-balance: {1}'.format(ref_date, len(codes)))\n",
"\n",
"    ## sklearn regression model\n",
"\n",
"    raw_predict1 = regression_composer_sk.predict(ref_date).loc[codes]\n",
"    er1 = raw_predict1.fillna(raw_predict1.median()).values\n",
"\n",
"    target_pos1, _ = er_portfolio_analysis(er1,\n",
"                                           res.industry_name.values,\n",
"                                           None,\n",
"                                           None,\n",
"                                           False,\n",
"                                           None,\n",
"                                           method='ls')\n",
"\n",
"    target_pos1['code'] = codes\n",
"    result1 = pd.merge(target_pos1, dx_returns, on=['code'])\n",
"    # Convert log returns to simple returns, weight them, then store the portfolio log return.\n",
"    ret1 = result1.weight.values @ (np.exp(result1.dx.values) - 1.)\n",
"    rets1.append(np.log(1. + ret1))\n",
"\n",
"    ## keras regression model\n",
"\n",
"    raw_predict2 = regression_composer_ks.predict(ref_date).loc[codes]\n",
"    er2 = raw_predict2.fillna(raw_predict2.median()).values\n",
"\n",
"    target_pos2, _ = er_portfolio_analysis(er2,\n",
"                                           res.industry_name.values,\n",
"                                           None,\n",
"                                           None,\n",
"                                           False,\n",
"                                           None,\n",
"                                           method='ls')\n",
"\n",
"    target_pos2['code'] = codes\n",
"    result2 = pd.merge(target_pos2, dx_returns, on=['code'])\n",
"    ret2 = result2.weight.values @ (np.exp(result2.dx.values) - 1.)\n",
"    rets2.append(np.log(1. + ret2))\n",
"\n",
"    alpha_logger.info('{0} is finished'.format(ref_date))"
]
},
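{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the loop above, `dx` is treated as a log return: it is turned into a simple return with `exp(dx) - 1`, weighted by the target positions, and the portfolio's simple return is turned back into a log return with `log(1 + r)`. Log returns add across periods, which is why the comparison chart below can use `cumsum`. A minimal sketch of that arithmetic on made-up numbers; the dollar-neutral long-short weights are hypothetical and are not taken from `er_portfolio_analysis`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Made-up next-period log returns for four stocks.\n",
"dx = np.array([0.02, -0.01, 0.03, -0.04])\n",
"# Hypothetical dollar-neutral long-short weights: long stocks 1 and 3, short stocks 2 and 4.\n",
"weight = np.array([0.5, -0.5, 0.5, -0.5])\n",
"\n",
"simple = np.exp(dx) - 1.             # log return -> simple return, per stock\n",
"port_simple = weight @ simple        # portfolio simple return for the period\n",
"port_log = np.log(1. + port_simple)  # back to a log return, as appended to rets1 / rets2\n",
"\n",
"# Log returns add across periods, so a running sum tracks the log of cumulative growth.\n",
"period_logs = np.array([port_log, -0.01, 0.015])\n",
"print(np.cumsum(period_logs))\n",
"print(np.exp(period_logs.sum()) - 1.)  # compounded simple return over the three periods"
]
},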
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Cumulative return comparison"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x12194748>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x432 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ret_df = pd.DataFrame({'sklearn': rets1, 'keras': rets2}, index=model_dates)\n",
"ret_df.loc[advanceDateByCalendar('china.sse', model_dates[-1], freq).strftime('%Y-%m-%d')] = 0.\n",
"ret_df = ret_df.shift(1)\n",
"ret_df.iloc[0] = 0.\n",
"\n",
"ret_df[['sklearn', 'keras']].cumsum().plot(figsize=(12, 6),\n",
" title='Fixed freq rebalanced: {0}'.format(freq))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"varInspector": {
"cols": {
"lenName": 16.0,
"lenType": 16.0,
"lenVar": 40.0
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}