Building an indicator from short volume data

The price of an asset or an ETF is of course the best indicator there is, but unfortunately there is only so much information contained in it. Some people seem to think that the more indicators (RSI, MACD, moving average crossover, etc.), the better, but if all of them are based on the same underlying price series, they will all contain a subset of the same limited information contained in the price.
We need information beyond what is contained in the price to make a more informed guess about what is going to happen in the near future. An excellent example of combining all sorts of information into a clever analysis can be found on The Short Side of Long blog. Producing this kind of analysis requires a great amount of work, for which I simply don't have the time as I only trade part-time.
So I built my own 'market dashboard' that automatically collects information for me and presents it in an easily digestible form. In this post I'm going to show how to build an indicator based on short volume data, illustrating the process of data gathering and processing.

Step 1: Find a data source
The BATS exchange provides daily short volume data for free on its site.

Step 2: Get data manually & inspect
The short volume data of the BATS exchange comes as a zipped text file, one zip file per day. After downloading and unzipping the txt file, this is what's inside (first several lines):

Date|Symbol|Short Volume|Total Volume|Market Center
20111230|A|26209|71422|Z
20111230|AA|298405|487461|Z
20111230|AACC|300|3120|Z
20111230|AAN|3600|10100|Z
20111230|AAON|1875|6156|Z

....

In total, a file contains around 6000 symbols.
This data needs quite some work before it can be presented in a meaningful manner.
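
To make the file format concrete, here is a minimal sketch that parses the sample lines shown above with Python's standard csv module and computes the short-to-total volume ratio per symbol. The function name short_ratios and the choice to skip symbols with zero total volume are my own assumptions for illustration, not part of the BATS format.

```python
import csv
import io

# The first lines of the sample file shown above, pasted in as a string.
sample = """Date|Symbol|Short Volume|Total Volume|Market Center
20111230|A|26209|71422|Z
20111230|AA|298405|487461|Z
20111230|AACC|300|3120|Z
20111230|AAN|3600|10100|Z
20111230|AAON|1875|6156|Z
"""

def short_ratios(text):
    """Return {symbol: short volume / total volume} for pipe-delimited text."""
    reader = csv.DictReader(io.StringIO(text), delimiter='|')
    ratios = {}
    for row in reader:
        total = float(row['Total Volume'])
        if total > 0:  # assumption: skip symbols that did not trade
            ratios[row['Symbol']] = float(row['Short Volume']) / total
    return ratios

ratios = short_ratios(sample)
for symbol, ratio in ratios.items():
    print('%-5s %.3f' % (symbol, ratio))
```

This is only meant for eyeballing a handful of lines; for a full file of thousands of symbols a table library such as pandas is the more convenient tool.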

Step 3: Automatically get data
What I really want is not just the data for one day, but the ratio of short volume to total volume for the past several years, and I don't really feel like downloading 500+ zip files and copy-pasting them into Excel manually.
Luckily, full automation is only a couple of lines of code away.
First we need to dynamically create a URL from which a file will be downloaded:

from string import Template

def createUrl(date):
    # URL pattern of the daily short volume files on the BATS site
    s = Template('http://www.batstrading.com/market_data/shortsales/$year/$month/$fName-dl?mkt=bzx')
    fName = 'BATSshvol%s.txt.zip' % date.strftime('%Y%m%d')
    url = s.substitute(fName=fName, year=date.year, month='%02d' % date.month)
    return url, fName

Output:

http://www.batstrading.com/market_data/shortsales/2013/08/BATSshvol20130813.txt.zip-dl?mkt=bzx

Now we can download multiple files at once:

import os
import urllib.request

# dates (list of dates) and dataDir (download folder) are defined elsewhere
for i, date in enumerate(dates):
    source, fName = createUrl(date)  # create url and file name
    dest = os.path.join(dataDir, fName)
    if not os.path.exists(dest):  # don't download files that are already present
        print('Downloading [%i/%i]' % (i, len(dates)), source)
        urllib.request.urlretrieve(source, dest)
    else:
        print('x', end=' ')

Output:

Downloading [0/657] http://www.batstrading.com/market_data/shortsales/2011/01/BATSshvol20110103.txt.zip-dl?mkt=bzx
Downloading [1/657] http://www.batstrading.com/market_data/shortsales/2011/01/BATSshvol20110104.txt.zip-dl?mkt=bzx
Downloading [2/657] http://www.batstrading.com/market_data/shortsales/2011/01/BATSshvol20110105.txt.zip-dl?mkt=bzx
Downloading [3/657] http://www.batstrading.com/market_data/shortsales/2011/01/BATSshvol20110106.txt.zip-dl?mkt=bzx
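
The download loop needs a list of dates, which is not shown in the post. A minimal sketch of one way to build it, using only the standard library, is given below. The helper name weekdays and the date range are my own assumptions; it simply lists every Monday-to-Friday date and does not filter out exchange holidays, for which there will presumably be no file to download.

```python
import datetime as dt

def weekdays(start, end):
    """Return every Monday-to-Friday date from start to end, inclusive."""
    days = []
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(day)
        day += dt.timedelta(days=1)
    return days

# Assumed range, chosen to match the first and last file names seen in this post.
dates = weekdays(dt.date(2011, 1, 3), dt.date(2013, 8, 13))
print(len(dates), dates[0], dates[-1])
```

Because holidays are still in the list, the download step would have to skip or otherwise handle dates for which the exchange published nothing.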

Step 4: Parse downloaded files

We can use the zipfile and pandas libraries to parse a single file:

import datetime as dt
import io
import zipfile

import numpy as np
import pandas as pd

def readZip(fName):
    zipped = zipfile.ZipFile(fName)  # open zip file
    lines = zipped.read(zipped.namelist()[0])  # unzip and read first file (bytes)
    buf = io.BytesIO(lines)  # create in-memory buffer
    # parse to table, indexed by symbol; keep the date as a string
    df = pd.read_csv(buf, sep='|', index_col=1, parse_dates=False,
                     dtype={'Date': object, 'Short Volume': np.float32, 'Total Volume': np.float32})
    s = df['Short Volume'] / df['Total Volume']  # calculate ratio
    s.name = dt.datetime.strptime(df['Date'].iloc[-1], '%Y%m%d')  # label with the file's date
    return s

It returns a ratio of Short Volume/Total Volume for all symbols in the zip file:

Symbol
A         0.531976
AA        0.682770
AAIT      0.000000
AAME      0.000000
AAN       0.506451
AAON      0.633841
AAP       0.413083
AAPL      0.642275
AAT       0.263158
AAU       0.494845
AAV       0.407976
AAWW      0.259511
AAXJ      0.334937
AB        0.857143
ABAX      0.812500
...
ZLC       0.192725
ZLCS      0.018182
ZLTQ      0.540341
ZMH       0.413315
ZN        0.266667
ZNGA      0.636890
ZNH       0.125000
ZOLT      0.472636
ZOOM      0.000000
ZQK       0.583743
ZROZ      0.024390
ZSL       0.482461
ZTR       0.584526
ZTS       0.300384
ZUMZ      0.385345
Name: 2013-08-13 00:00:00, Length: 5859, dtype: float32



Step 5: Make a chart

Now the only thing left is to parse all downloaded files, combine them into a single table and plot the result:

In the figure above I have plotted the average short volume ratio for the past two years. I could also have used a subset of symbols if I wanted to take a look at a specific sector or stock. A quick look at the data gives me the impression that high short volume ratios usually correspond with market bottoms, and that low ratios seem to be good entry points for a long position.
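
The combining and averaging step is not shown in the post, so here is a minimal sketch of how it could look with pandas. The two small Series are made-up stand-ins for what readZip returns (index = symbol, name = date), and the function name combine is hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the Series that readZip returns:
# index = symbol, name = trading date, values = short/total ratio.
day1 = pd.Series({'A': 0.50, 'AA': 0.60, 'AAPL': 0.70}, name=pd.Timestamp('2013-08-12'))
day2 = pd.Series({'A': 0.40, 'AA': 0.80, 'ZTS': 0.30}, name=pd.Timestamp('2013-08-13'))

def combine(series_list):
    """Build a table with one row per date and one column per symbol."""
    table = pd.concat(series_list, axis=1).T  # each daily Series becomes a row
    return table.sort_index()

table = combine([day1, day2])
avg_ratio = table.mean(axis=1)            # average over all symbols, missing values skipped
subset = table[['A', 'AA']].mean(axis=1)  # average over a chosen subset of symbols
print(avg_ratio)
```

With the real data, the list passed to combine would hold one readZip result per downloaded file, and calling avg_ratio.plot() (which needs matplotlib installed) would draw the chart. Symbols missing on a given day show up as NaN and are ignored by the mean.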

Starting from here, this short volume ratio can be used as a basis for strategy development.

Time: 2024-10-13 07:50:48
