Applying RNNs to Stock Price Prediction

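Preface

RNNs and LSTMs perform particularly well on temporal data, which is why they are effective for speech recognition. We use the open, high, low, and close prices of the previous 25 days to predict the closing price at the next time step. Each time series is normalized to a Gaussian distribution and the data is trained with a 2-layer LSTM model. The article has two parts: forex price prediction and intraday price prediction (30-minute, 15-minute, 5-minute bars, and so on). The source code is available at the end of the article!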
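The Gaussian normalization step can be sketched as follows. This is a minimal sketch, assuming each input window is standardized with its own mean and standard deviation and that predictions are mapped back with the same two numbers. The names `norm_dist` and `denorm_dist` are hypothetical stand-ins for the article's omitted helpers, not its actual code, and the sample prices are made up.

```python
import numpy as np

def norm_dist(window):
    """Standardize one window to zero mean and unit variance (assumed scheme)."""
    window = np.asarray(window, dtype=float)
    mean = window.mean()
    std = window.std()
    if std == 0.0:  # a flat window would otherwise divide by zero
        std = 1.0
    return (window - mean) / std, mean, std

def denorm_dist(values, mean, std):
    """Map normalized values back to price units."""
    return np.asarray(values, dtype=float) * std + mean

# Hypothetical 3-step window of open/high/low/close prices
window = np.array([[1.10, 1.12, 1.09, 1.11],
                   [1.11, 1.13, 1.10, 1.12],
                   [1.12, 1.14, 1.11, 1.13]])
x_norm, mean, std = norm_dist(window)
restored = denorm_dist(x_norm, mean, std)
```

Normalizing per window keeps every input on a comparable scale even when the price level drifts over the years, at the cost of having to carry `mean` and `std` along to denormalize each prediction.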

Forex Prediction

a. Daily data is pulled from Yahoo with DataReader.

b. Only the training set is preprocessed here, because a separate test set is created later on.

c. “model_forex” is the model to build and train.

d. Create a separate daily test set by specifying dates that start after the training set ends.

e. “model_forex” is then plugged in here to run the prediction:

predicted_st = predict_standard(X_test_stock,y_test_stock, model_forex)

Intraday Prediction

a. Intraday data is pulled from Google’s API. The second argument is the bar interval in seconds (900 s = 15 min) and the third argument is the number of days to look back; I believe the maximum lookback for Google’s API is 15 days.

df = get_google_data(INTRA_DAY_TICKER, 900, 150)

b. Preprocess the full set of data and split it into training and test sets with “train_test_split_intra”.

c. “model_intra” is the model to build and train.

d. “model_intra” is then plugged in here to run the prediction:

predicted_intra = predict_intra(X_test_intra,y_test_intra, model_intra)

Code

SITE = "http://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

def scrape_list(site):
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)

    table = soup.find('table', {'class': 'wikitable sortable'})
    sector_tickers = dict()
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if len(col) > 0:
            sector = str(col[3].string.strip()).lower().replace(' ', '_')
            ticker = str(col[0].string.strip())
            if sector not in sector_tickers:
                sector_tickers[sector] = list()
            sector_tickers[sector].append(ticker)
    return sector_tickers

sector_tickers = scrape_list(SITE)

## Helper functions to normalize and denormalize values (omitted)
# Sequence length, or number of trading days
SEQ_LENGTH = 25

# Number of units in the two hidden (LSTM) layers
N_HIDDEN = 256

# Number of attributes used for each trading day
num_attr = 4

# Out of those attributes, how many are indicators
num_indicators = 0

# Variable to help define how far you want your y to reach
REWARD_LAG = 1

# How many days ahead you want to predict
LOOK_AHEAD = 5

# Window stride
STRIDE = 1

def _load_data(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    for i in range(len(data)-n_prev):
        x,y = norm(data.iloc[i:i+n_prev,:num_attr].as_matrix(),data.iloc[i+n_prev-1,num_attr:].as_matrix())
        docX.append(x)
        docY.append(y)
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

def _load_data_test(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    num_sequences = (len(data)-n_prev+1)/STRIDE
    for i in range(num_sequences):
        i = i*STRIDE
        x = (data.iloc[i:i+n_prev,:num_attr].as_matrix())
        y = (data.iloc[i+n_prev-1,num_attr:].as_matrix())
        #x,y = norm(data.iloc[i:i+n_prev,:num_attr].as_matrix(),data.iloc[i+n_prev-1,num_attr:].as_matrix())
        docX.append(x)
        docY.append(y)
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

def _load_data_norm(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    for i in range(len(data)-n_prev):
        x = np.array((data.iloc[i:i+n_prev,:num_attr].as_matrix()))
        y = np.array((data.iloc[i+n_prev-1,num_attr:].as_matrix()))
(omitted)
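The loaders above all slide a fixed-length window over the price table. A minimal sketch of that windowing step on its own, assuming a plain NumPy array in place of the DataFrame; `make_windows` is a hypothetical helper, not one of the article's functions, and its window count for strides above 1 is computed slightly differently from `_load_data_test`.

```python
import numpy as np

def make_windows(values, seq_length, stride=1):
    """Return an array of shape (num_windows, seq_length, num_features)."""
    values = np.asarray(values)
    # Number of windows that fit entirely inside the data
    num_windows = (len(values) - seq_length) // stride + 1
    return np.array([values[i * stride:i * stride + seq_length]
                     for i in range(num_windows)])

# Hypothetical table: 10 time steps, 4 attributes per step
prices = np.arange(40, dtype=float).reshape(10, 4)
windows = make_windows(prices, seq_length=3, stride=1)
strided = make_windows(prices, seq_length=3, stride=2)
```

With `SEQ_LENGTH = 25` and `num_attr = 4`, the same idea yields arrays shaped `(num_windows, 25, 4)`, which is the input shape the LSTM layers expect.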

Forex Data

## Dataset on just a single ticker to test performance
df = data.DataReader('EUR=X', 'yahoo', datetime(2010,8,1), datetime(2014,8,1))
# df['RSI'] = ta.RSI(df.Close.values,timeperiod=14)
# _,_, macdhist = ta.MACD(df.Close.values, fastperiod=12, slowperiod=26, signalperiod=9)
# df['MACDHist'] = macdhist

## Add the predicted column Y as the last column; it can be defined however you think is a good representation of a good decision
## Clean the rest of the DataFrame
y = []
for i in range(0,len(df)):
    if i >= (len(df)- STRIDE):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
        else:
            y.append(df['Close'][i+REWARD_LAG])

df['Y_Values'] = np.asarray(y)
df = df.dropna()
#print (df)
sliced_df = df.drop(['Adj Close','Volume'] ,axis=1)
#print (sliced_df)
#(X_train, y_train), (X_test, y_test) = train_test_split(sliced_df)
(X_train, y_train) = train_test_split(sliced_df)
print(X_train[0],y_train[0])
print (X_train.shape,y_train.shape)
(array([[-0.76244909, -0.75153814, -1.36800657, -1.28695383],
       [-1.28305706, -1.17005084, -1.66649887, -1.50673145],

(omitted)
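The loop that builds the `Y_Values` column can also be written with pandas shifts. The following is a minimal sketch, assuming the target is the mean of the next `REWARD_LAG` closing prices (which reduces to the next close when the lag is 1); `add_target` is a hypothetical helper and the sample frame is made up.

```python
import pandas as pd

def add_target(df, reward_lag=1):
    """Append Y_Values: the mean of the next `reward_lag` closes."""
    out = df.copy()
    # shift(-k) aligns the close k steps ahead with the current row
    future = [out['Close'].shift(-(n + 1)) for n in range(reward_lag)]
    out['Y_Values'] = sum(future) / float(reward_lag)
    # Rows at the end have no future close and are dropped
    return out.dropna()

frame = pd.DataFrame({'Close': [1.0, 2.0, 3.0, 4.0, 5.0]})
lag1 = add_target(frame, reward_lag=1)
lag2 = add_target(frame, reward_lag=2)
```

Because the trailing rows become NaN automatically, this variant drops exactly `reward_lag` rows and cannot index past the end of the frame.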

Intraday Data

def get_google_data(symbol, period, window):
    url_root = 'http://www.google.com/finance/getprices?i='
    url_root += str(period) + '&p=' + str(window)
    url_root += 'd&f=d,o,h,l,c,v&df=cpct&q=' + symbol
    print(url_root)
    response = urllib2.urlopen(url_root)
    data = response.read().split('\n')
    #actual data starts at index = 7
    #first line contains full timestamp,
    #every other line is offset of period from timestamp
    parsed_data = []
    anchor_stamp = ''
    end = len(data)
    for i in range(7, end):
        cdata = data[i].split(',')
        if 'a' in cdata[0]:
            #first one record anchor timestamp
            anchor_stamp = cdata[0].replace('a', '')
            cts = int(anchor_stamp)
        else:
            try:
                coffset = int(cdata[0])
                cts = int(anchor_stamp) + (coffset * period)
                parsed_data.append((dt.datetime.fromtimestamp(float(cts)), float(cdata[1]), float(cdata[2]), float(cdata[3]), float(cdata[4]), float(cdata[5])))
            except:
                pass # for time zone offsets thrown into data
    df = pd.DataFrame(parsed_data)
    df.columns = ['ts', 'Open', 'High', 'Low', 'Close', 'Volume']
    df.index = df.ts
    del df['ts']
    return df
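The anchor-plus-offset timestamp scheme described in the comments above can be exercised without any network access. This is a minimal sketch on made-up sample rows, assuming a row whose first field starts with `a` carries an absolute Unix timestamp and every later row carries an offset counted in units of `period` seconds. `parse_rows` is a hypothetical helper; unlike the listing it also keeps the anchor row itself and returns timezone-aware UTC timestamps.

```python
from datetime import datetime, timezone

def parse_rows(lines, period):
    """Turn anchor/offset rows into (timestamp, five numeric fields) tuples."""
    rows = []
    anchor = None
    for line in lines:
        fields = line.split(',')
        if fields[0].startswith('a'):
            anchor = int(fields[0][1:])            # absolute Unix timestamp
            ts = anchor
        elif anchor is not None and fields[0].isdigit():
            ts = anchor + int(fields[0]) * period  # offset from the anchor
        else:
            continue                               # header or time-zone line
        stamp = datetime.fromtimestamp(ts, tz=timezone.utc)
        rows.append((stamp,) + tuple(float(v) for v in fields[1:6]))
    return rows

# Hypothetical payload in the anchor/offset layout
sample = [
    'TIMEZONE_OFFSET=-240',
    'a1500000000,10.0,10.5,9.5,10.2,1000',
    '1,10.2,10.6,10.0,10.4,1200',
    '2,10.4,10.8,10.3,10.7,900',
]
rows = parse_rows(sample, period=900)
```

Checking the first field explicitly, rather than wrapping the conversion in a bare `except`, makes it easier to see which lines are being skipped.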

Creating a Separate Intraday Dataset

df = get_google_data('AAPL', 900, 150)
#print(df)
plt.plot(df['Close'].values[:])

y = []
for i in range(0,len(df)):
    if i >= (len(df)- REWARD_LAG):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
            print('here')
        else:
            y.append(df['Close'][i+REWARD_LAG])

df['Y_Values'] = np.asarray(y)
df = df.dropna()
sliced_df = df.drop(['Volume'] ,axis=1)
#print(sliced_df)
(X_train, y_train), (X_test, y_test) = train_test_split_intra(sliced_df)
#print(X_train[0],y_train[0])
print(len(X_train),len(X_test))
#print(X_test[0],y_test[0])
(1168, 108)

Building the Network Architecture

model_intra = Sequential()

model_intra.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_intra.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_intra.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))

model_intra.add(Dense(1,activation='linear'))
model_intra.compile(loss="mean_squared_error", optimizer='adam')

model_intra_full = Sequential()

model_intra_full.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_intra_full.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_intra_full.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))

model_intra_full.add(Dense(1,activation='linear'))
model_intra_full.compile(loss="mean_squared_error", optimizer='adam')

model_forex = Sequential()

model_forex.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_forex.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_forex.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))

model_forex.add(Dense(1,activation='linear'))
model_forex.compile(loss="mean_squared_error", optimizer
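As a rough check on the size of each of these networks, the number of trainable weights can be worked out by hand. The sketch below assumes a standard LSTM cell with four gates, each holding an input matrix, a recurrent matrix, and a bias vector, followed by a dense layer with bias; `lstm_params` and `dense_params` are hypothetical helper names.

```python
def lstm_params(input_dim, units):
    """Weights of one LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weights of a fully connected layer with bias."""
    return input_dim * units + units

N_HIDDEN, num_attr = 256, 4
layer1 = lstm_params(num_attr, N_HIDDEN)   # first LSTM sees 4 attributes per step
layer2 = lstm_params(N_HIDDEN, N_HIDDEN)   # second LSTM sees 256 hidden units
head = dense_params(N_HIDDEN, 1)           # single linear output
total = layer1 + layer2 + head
```

Under these assumptions the model has roughly 0.8 million weights, which is large relative to the one to two thousand training sequences reported in this article, so overfitting is worth watching through the validation loss.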

Fitting the Model

print(X_train.shape)
print(y_train.shape)
(1018, 25, 4)
(1018, 1)
model_intra.fit(X_train, y_train, batch_size=50, nb_epoch=
Train on 1156 samples, validate on 12 samples
Epoch 1/150
1156/1156 [==============================] - 1s - loss: 1.9575 - val_loss: 0.5494
Epoch 2/150
1156/1156 [==============================] - 1s - loss: 1.4731 - val_loss: 0.4006

(omitted)

Helper Functions for Performance Evaluation

#Function to normalize the test input then denormalize the result.
#Calculate the rmse of the predicted values on the test set
def predict(X_test,y_test, myModel):
    predicted = []
    for example in X_test:
        x = copy.copy(example)
        #print (x)
        x_norm, mn, mx = normalize(x)
        toPred = []
        toPred.append(x_norm)
        add = np.array(toPred)
        #Predict for the standard model
        predict_standard = myModel.predict(add)
        pred_st = copy.copy(predict_standard)
        y_real_st = deNormalizeY(pred_st,mn,mx)
        predicted.append(y_real_st[0])
        #Predict for the bidirectional model
        # predict_bidirectional = myModel.predict([add,add])
        # pred_bi = copy.copy(predict_bidirectional)
        # y_real_bi = deNormalizeY(pred_bi,mn,mx)
        # predicted.append(y_real_bi[0])
(omitted)

df_test = data.DataReader('EUR=X', 'yahoo', datetime(2014,8,1), datetime(2015,8,1))
# df_test['RSI'] = ta.RSI(df_test.Close.values,timeperiod=14)
# _,_, macdhist = ta.MACD(df_test.Close.values, fastperiod=12, slowperiod=26, signalperiod=9)
# df_test['MACDHist'] = macdhist
y = []
for i in range(0,len(df_test)):
    if i >= (len(df_test)- STRIDE):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df_test['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
        else:
            y.append(df_test['Close'][i+REWARD_LAG])

(omitted)

MAE for LSTM is: [0.0035823152701196983]
MAE for doing nothing is: [0.0045693478326778786]
RMSE for LSTM is: [0.0050684837061917686]
RMSE for doing nothing is: [0.0061416562709802761]
Net profit for 0.0 threshhold is 245.261025777 making 234 trades
Net profit for 0.001 threshhold is 242.673572498 making 201 trades
(omitted)

Intraday Trading Evaluation and Results

def predict_intra(X_test, y_test, myModel):
    print(len(X_test))
    predicted = []
    for example in X_test:
        #Transform the example into a Gaussian distribution
        x_norm, mean, std = normDist(np.array(example))
        #Add examples to array to predict
        toPred = []
        toPred.append(x_norm)
        add = np.array(toPred)
        #Predict these examples
        predict_standard = myModel.predict(add)
        pred = copy.copy(predict_standard)
        y_real = deNormDist(pred,mean,std)
        predicted.append(y_real[0])
    return predicted

predicted_intra = predict_intra(X_test,y_test, model_intra)
plt.figure(figsize=(20,20))
plt.plot(y_test)
plt.plot(predicted_intra)
plt.show()

MAE and RMSE Evaluation

sum_error = 0
sum_error_donothing = 0
for i in range(len(predicted_intra)):
    if i>0:
        sum_error = sum_error + abs(predicted_intra[i] - y_test[i])
        sum_error_donothing = sum_error_donothing + abs(predicted_intra[i] - y_test[i-1])
MAE_lstm = sum_error/len(predicted_intra)
MAE_donothing = sum_error_donothing/len(predicted_intra)
print("MAE for LSTM is: " + str(MAE_lstm))
print("MAE for doing nothing is: " + str(MAE_donothing))
MAE for LSTM is: [0.091961468484759237]
MAE for doing nothing is: [0.16699238882416201]
sum_error = 0
sum_error_donothing = 0
for i in range(len(predicted_intra)):
    if i>0:
        sum_error = sum_error + (predicted_intra[i] - y_test[i])**2
        sum_error_donothing = sum_error_donothing + (predicted_intra[i] - y_test[i-1])**2
RMSE_lstm = (sum_error/len(predicted_intra))**(1.0/2.0)
RMSE_donothing = (sum_error_donothing/len(predicted_intra))**(1.0/2.0)
print("RMSE for LSTM is: " + str(RMSE_lstm))
print("RMSE for doing nothing is: " + str(RMSE_dono
RMSE for LSTM is: [0.15719269057322682]
RMSE for doing nothing is: [0.23207816758496383]
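Note that the "doing nothing" figures in the listing above measure the distance between the prediction and the previous actual value. A persistence baseline is more commonly defined as the distance between the current actual value and the previous one. The following minimal sketch shows that variant with NumPy; `compare_with_persistence` is a hypothetical helper, the sample numbers are made up, and the errors are averaged over the compared points rather than over the full length, so its output is not directly comparable to the figures printed above.

```python
import numpy as np

def mae(pred, actual):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(actual))))

def rmse(pred, actual):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2)))

def compare_with_persistence(predicted, y_true):
    """Score the model against a 'next value equals current value' baseline."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    y_true = np.asarray(y_true, dtype=float).ravel()
    target = y_true[1:]    # values to be predicted
    naive = y_true[:-1]    # persistence forecast: repeat the previous value
    model = predicted[1:]  # model forecasts for the same points
    return {
        'mae_model': mae(model, target),
        'mae_naive': mae(naive, target),
        'rmse_model': rmse(model, target),
        'rmse_naive': rmse(naive, target),
    }

# Hypothetical series
scores = compare_with_persistence([10.0, 10.5, 12.5, 11.5],
                                  [10.0, 11.0, 12.0, 11.0])
```

A model is only useful for forecasting if it beats this baseline, since repeating the last observed price is already a strong predictor for slowly moving series.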

Evaluating the Trading Policy

net_profits = []
protits_per_trade = []
for i in range(50):
    THRESH = i/10000.0
    LOT_SIZE = 100
    net_profit = 0
    num_trades = 0
    for i in range(len(predicted_intra)):
        if i>1:
            predicted_change = ((predicted_intra[i] / y_test[i-1]) - 1)
            #print(predicted_change)
            actual_change = (predicted_intra[i] -  y_test[i])*LOT_SIZE
            if predicted_change >= THRESH:
                #print("Buy")
                net_profit = net_profit + actual_change
                num_trades = num_trades + 1
(omitted)
(array([327.67074597699519], dtype=object), 106)
(array([322.81673063817777], dtype=object), 103)
plt.plot(net_profits)
plt.show()

plt.plot(protits_per_trade)
plt.show()
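The threshold policy can be sketched as a small standalone function. This is a minimal long-only sketch with made-up prices, assuming a buy is triggered when the predicted relative change meets the threshold and that profit is the realized move from the previous actual price to the current one, times the lot size. That profit definition is an assumption and differs from the listing above, where `actual_change` is the gap between the prediction and the actual price. `backtest` is a hypothetical name.

```python
def backtest(predicted, actual, thresh, lot_size=100):
    """Buy one lot whenever the predicted relative change is at least `thresh`."""
    net_profit, num_trades = 0.0, 0
    for i in range(1, len(predicted)):
        predicted_change = predicted[i] / actual[i - 1] - 1.0
        if predicted_change >= thresh:
            # Realized move over the step that was traded
            net_profit += (actual[i] - actual[i - 1]) * lot_size
            num_trades += 1
    return net_profit, num_trades

# Hypothetical price series and forecasts
actual = [100.0, 102.0, 101.0, 103.0]
predicted = [100.0, 101.0, 103.0, 102.0]
loose = backtest(predicted, actual, thresh=0.005)
strict = backtest(predicted, actual, thresh=0.0099)
```

Sweeping `thresh` as the article does then shows the trade-off between the number of trades taken and the profit per trade. The sketch ignores transaction costs and slippage.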

Other

buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_stock[i][0][0]
    if predicted_st[i][1] > predicted_st[i][0]:
        predicted = 0 #Buy
    else:
        predicted = 1 #Sell

    if realAnswer == 0:
        ##This is where the actual answer is Buy:Up:[0,1]:0
        buyTotal = buyTotal + 1
        if predicted == realAnswer:
            buyCorrect = buyCorrect + 1
            correct = correct + 1
(omitted)
(349, 730, 0.4780821917808219)
(210, 382, 0.5497382198952879)
(139, 348, 0.3994252873563218)
0.523287671233
0.476712328767
MMM
AYI
ALK
ALLE
(omitted)

Creating a Baseline RMSE

totalCorrect = 0
total = 0
for stock in testing_dataframes[:50]:

    X_test_stock, y_test_stock = _load_data_test(stock[1])
    predicted_st = predict_standard(X_test_stock,y_test_stock, model)

    buyTotal = 0
    sellTotal = 0
    correct = 0
    sellCorrect = 0
    buyCorrect = 0
(omitted)
#Count the number of positive and the number of negative calls you got right
totalCorrect = 0
total = 0
buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_stock[i][0][0]
    if predicted_st[i][1] > predicted_st[i][0]:
        predicted = 0 #Buy
(omitted)
(104, 235, 0.4425531914893617)
(104, 104, 1.0)
(0, 131, 0.0)
0.442553191489
0.557446808511
from sklearn.metrics import f1_score
##Calculate F1 score
actual = []
result = []
for y in y_test_merged:
    if y[0] == 0:
        actual.append(0)
    else:
        actual.append(1)
for y in predicted_st:
    if y[1] > y[0]:
        result.append(0)
    else:
        result.append(1)
score = f1_score(actual,result,average='weighted',pos_label=1)
print(score)
0.498192044998
#Same percentage calculations but with a threshold
THRESH = 0.1
totalCorrect = 0
total = 0
noDecision = 0
buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_merged[i][0]
    if predicted_st[i][1] - THRESH > .5:
        predicted = 0 #Buy
    elif predicted_st[i][0] - THRESH > .5:
        predicted = 1 #Sell
    else:
        predicted = 2 #Pass, do not count towards percentages because you make no decision if .6>x>.4
(omitted)
(347, 750, 0.46266666666666667)
(190, 351, 0.5413105413105413)
(157, 399, 0.39348370927318294)
If you just predicted all Up 0.468
If you just predicted all Down 0.532
thresholds = []
totalAcc = []
positiveAcc = []
negativeAcc = []
##Graph the threshold vs accuracy
for i in range(10):
    thresh = i/100.0
    totalCorrect = 0
    total = 0
    noDecision = 0
    buyTotal = 0
    sellTotal = 0
    correct = 0
    sellCorrect = 0
    buyCorrect = 0
    for i in range(len(predicted_st)):
        realAnswer = y_test_merged[i][0]
        if predicted_st[i][1] - thresh > .5:
            predicted = 0 #Buy
        elif predicted_st[i][0] - thresh > .5:
            predicted = 1 #Sell
(omitted)
plt.plot(totalAcc)
plt.show()

plt.plot(positiveAcc)
plt.show()

plt.plot(negativeAcc)
plt.show()
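The thresholded accuracy calculation behind these plots can be condensed into one function. This is a minimal sketch, assuming each prediction is a pair of class probabilities in which index 1 means up (label 0, buy) and index 0 means down (label 1, sell), and that predictions inside the threshold band are passed over. `thresholded_accuracy` is a hypothetical name and the probabilities are made up.

```python
def thresholded_accuracy(probs, labels, thresh):
    """Accuracy over the calls that clear the confidence threshold.

    probs: sequence of (p_down, p_up); labels: 0 = up/buy, 1 = down/sell.
    Returns (accuracy, number_of_decisions).
    """
    correct = decided = 0
    for (p_down, p_up), label in zip(probs, labels):
        if p_up - thresh > 0.5:
            call = 0      # buy
        elif p_down - thresh > 0.5:
            call = 1      # sell
        else:
            continue      # pass: not confident enough either way
        decided += 1
        correct += int(call == label)
    accuracy = correct / float(decided) if decided else float('nan')
    return accuracy, decided

# Hypothetical probabilities and true labels
probs = [(0.2, 0.8), (0.45, 0.55), (0.7, 0.3), (0.35, 0.65)]
labels = [0, 0, 0, 1]
acc_all, n_all = thresholded_accuracy(probs, labels, thresh=0.0)
acc_strict, n_strict = thresholded_accuracy(probs, labels, thresh=0.1)
```

Reporting the number of decisions next to the accuracy matters, because a high threshold can look accurate while acting on very few examples.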

The tests show that, for daily price prediction, forex performs better than traditional stocks because it has less noise.

Time: 2024-10-25 08:00:12
