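A while back the stock market was on fire and plenty of my friends made good money. As a code ape I spend my days buried in code, but I also have a wife and kid to feed, so earning some milk-powder and diaper money from stocks would be very nice indeed. Sadly the moment has passed: the bull market is over, and anyone who got in during the last two months is mostly stuck so deep they can't find their way home.

No wonder. Supposedly there are four parties in the stock game: the state, the brokers, the professional investors, and the retail investors. The state collects stamp duty, 1‰ with no haggling, and took in 138 billion yuan in the first half of the year alone; when the market nearly collapsed in early July, the state put only 120 billion yuan into the rescue, and even that was the brokers' money, so the state clearly cannot lose. The brokers collect commission at 0.25‰ to 0.3‰, and the nastiest part is the 5 yuan minimum: for a single trade below roughly 16,667 yuan the two rates cost exactly the same, and even at 20,000 yuan they differ by only 1 yuan, so the brokers cannot lose either. Professional investors, whatever else you say, do better than us retail investors. Somebody has to lose money in this market, so it will be squeezed out of the retail crowd.

Then again, the Chinese stock market is largely a greater-fool game: we can't be the fastest sheep in the flock, but as long as we outrun the slowest one we have a chance to survive. Practice now, and there will be experience to draw on in the next bull market. So, with fearless conviction, I charged into the market. I'll skip the account-opening process; the web is full of guides.

When I opened the trading client, I was greeted by a colorful wall of stocks. As a textbook "three-nothings" person (no inside information, no finance knowledge, no trading knowledge), picking a stock was about the same as picking at random, and even random picking depends on your luck. Our wages are typed out one line of code at a time; we code apes may be well paid, quiet, and short-lived, but that is no reason to let the greedy take it all for nothing. Never trade stocks until you end up a stuck shareholder, never flip houses until you end up a landlord: stocks need some analysis. As higher apes that took 4 billion years of evolution to appear, we should first analyze the market and find out what it is made of, which shouldn't be hard for us. But using off-the-shelf trading software has no style, and its way of analyzing data doesn't fit how we apes think. The best approach is to grab the stock data directly. After some Baidu-ing on Google, I found data export tools, but they all charge money. Charging for data that is free everywhere is simply outrageous. So I rolled up my sleeves to scrape it myself. There is an open-source Python library, TuShare (link), but I'm not comfortable with Python and would have to learn a pile of Pandas APIs; rather than spend that effort, I might as well build my own.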
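To make the commission floor concrete, here is a minimal sketch of the arithmetic. The rates (0.25‰ and 0.3‰) and the 5 yuan minimum come from the text above; the class and method names are made up for illustration, and stamp duty and any other fees are ignored.

```csharp
using System;

static class CommissionDemo
{
    // Commission = trade amount x rate, but never less than the minimum fee.
    // The 5 yuan default is the floor mentioned in the text.
    public static decimal Commission(decimal amount, decimal rate, decimal minimumFee = 5m)
    {
        return Math.Max(amount * rate, minimumFee);
    }

    static void Main()
    {
        // Compare the two rates for a few trade sizes (in yuan).
        foreach (var amount in new[] { 10000m, 16000m, 20000m, 50000m })
        {
            Console.WriteLine("{0,8}: 0.25‰ -> {1:F2}, 0.3‰ -> {2:F2}",
                amount, Commission(amount, 0.00025m), Commission(amount, 0.0003m));
        }
    }
}
```

Below about 16,667 yuan (5 / 0.0003) both rates hit the floor, which is why the choice between them hardly matters for small trades.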
First, create a database; let's call it Stock. I'm using SQL Server 2008, and the tables are as follows:
CREATE TABLE [dbo].[DayData](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [日期] [smalldatetime] NULL,
    [股票代码] [nchar](10) NULL,
    [股票名称] [nchar](10) NULL,
    [收盘价] [numeric](6, 2) NULL,
    [最高价] [numeric](6, 2) NULL,
    [最低价] [numeric](6, 2) NULL,
    [开盘价] [numeric](6, 2) NULL,
    [前收盘] [numeric](6, 2) NULL,
    [涨跌额] [numeric](6, 2) NULL,
    [涨跌幅] [numeric](8, 4) NULL,
    [换手率] [numeric](8, 4) NULL,
    [成交量] [numeric](18, 0) NULL,
    [成交金额] [numeric](18, 2) NULL,
    [总市值] [numeric](18, 2) NULL,
    [流通市值] [numeric](18, 2) NULL,
    [复权因子] [numeric](8, 3) NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[StockIndex](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [日期] [smalldatetime] NULL,
    [股票代码] [nchar](10) NULL,
    [股票名称] [nchar](10) NULL,
    [收盘价] [numeric](8, 3) NULL,
    [最高价] [numeric](8, 3) NULL,
    [最低价] [numeric](8, 3) NULL,
    [开盘价] [numeric](8, 3) NULL,
    [前收盘] [numeric](8, 3) NULL,
    [涨跌额] [numeric](8, 3) NULL,
    [涨跌幅] [numeric](8, 4) NULL,
    [成交量] [numeric](18, 0) NULL,
    [成交金额] [numeric](18, 2) NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[StockList](
    [StockCode] [nchar](6) NOT NULL,
    [StockName] [nchar](10) NULL,
    [Url] [nvarchar](200) NULL,
    [SylBase] [numeric](14, 9) NULL,
    [Jzc] [numeric](8, 4) NULL,
    [PublicDate] [smalldatetime] NULL,
    CONSTRAINT [PK_StockList] PRIMARY KEY CLUSTERED ( [StockCode] ASC )
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
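StockList is the master list of all stocks: StockCode is the stock code, StockName is the stock name, Url is the address used to fetch that stock's daily records, SylBase is the base value for the price-to-earnings ratio, Jzc is the net assets, and PublicDate is the listing date.
StockIndex holds the index data, and DayData holds the daily record for each stock.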
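The program later in this post fills the stored Url with the current date through String.Format, which implies that the column holds a template with {0} and {1} placeholders. A minimal sketch of that step follows; the template string and the names BuildUrl and UrlTemplateDemo are hypothetical, and reading {0} as the start date and {1} as the end date is an assumption.

```csharp
using System;
using System.Globalization;

static class UrlTemplateDemo
{
    // Fills a URL template with two dates in yyyyMMdd form.
    // Assumption: {0} is the start date and {1} is the end date.
    public static string BuildUrl(string template, DateTime start, DateTime end)
    {
        return String.Format(template,
            start.ToString("yyyyMMdd", CultureInfo.InvariantCulture),
            end.ToString("yyyyMMdd", CultureInfo.InvariantCulture));
    }

    static void Main()
    {
        // Hypothetical template; the real value would come from StockList.Url.
        var template = "http://example.com/history.csv?code=600000&start={0}&end={1}";
        var day = new DateTime(2015, 8, 3);
        Console.WriteLine(BuildUrl(template, day, day));
    }
}
```

Passing the same date twice, as the program does, requests a single day of data.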
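Next we need to fetch the list of all stocks. That is a one-off job, so I won't go into it here; contact me if you need it. After that comes the daily data, which is fetched from NetEase (网易). This note is not about how to write a crawler, so I won't go into that either. Here is the code: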
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;
using System.Net;

namespace 网易历史数据 // "NetEase historical data"
{
    class Program
    {
        public static string connString = "Data Source=.;Initial Catalog=Stock;Integrated Security=True";

        static void Main(string[] args)
        {
            var dict = new Dictionary<string, string>();
            var date = DateTime.Now.ToString("yyyyMMdd");
            bool isClear = false;
            // Repeat until DataLost returns no rows. DataLost is not among the tables created
            // above; it presumably lists the stocks whose data for the day is still missing.
            while (!isClear)
            {
                dict.Clear();
                using (var connection = new SqlConnection(connString))
                {
                    connection.Open();
                    using (var cmd = new SqlCommand("SELECT StockCode,Url FROM DataLost", connection))
                    using (var reader = cmd.ExecuteReader())
                    {
                        if (!reader.HasRows) isClear = true;
                        while (reader.Read())
                            dict.Add(Convert.ToString(reader["StockCode"]), Convert.ToString(reader["Url"]));
                    }
                }
                GetDataAndInsertBd(dict, date);
            }
        }

        /// <summary>Downloads the CSV for each stock and inserts its rows into the database.</summary>
        /// <param name="dict">Stock code mapped to its URL template.</param>
        /// <param name="date">Date to fetch, formatted as yyyyMMdd.</param>
        private static void GetDataAndInsertBd(Dictionary<string, string> dict, string date)
        {
            using (var webClient = new WebClient())
            {
                foreach (var item in dict)
                {
                    // The stored Url has {0} and {1} placeholders; both receive the same date.
                    var csvString = webClient.DownloadString(String.Format(item.Value, date, date));
                    using (var conn = new SqlConnection(connString))
                    using (var sr = new StringReader(csvString))
                    {
                        conn.Open();
                        sr.ReadLine(); // skip the CSV header row
                        string line;
                        // ReadLine returns null at the end; the original Peek() > 0 test was fragile.
                        while ((line = sr.ReadLine()) != null)
                        {
                            if (line.Length >= 2) InsertRow(conn, line.Split(','));
                            Console.WriteLine(line);
                        }
                    }
                }
            }
        }

        // Returns DBNull when the raw field equals one of the placeholder values.
        private static object NullIf(string value, params string[] placeholders)
        {
            return Array.IndexOf(placeholders, value) >= 0 ? (object)DBNull.Value : value;
        }

        // Converts a text field to double; DBNull and unparsable text pass through unchanged.
        // (The original called Convert.ToDouble directly, which throws on DBNull.)
        private static object ToDouble(object value)
        {
            var s = value as string;
            double d;
            return s != null && double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out d)
                ? (object)d : value;
        }

        private static void InsertRow(SqlConnection conn, string[] array)
        {
            // The original nulls every price when the high is "0.0", presumably a day without trading.
            bool noTrade = array[4] == "0.0";
            object TCLOSE = noTrade ? DBNull.Value : NullIf(array[3], "0.0");
            object HIGH = NullIf(array[4], "0.0");
            object LOW = noTrade ? DBNull.Value : NullIf(array[5], "0.0");
            object TOPEN = noTrade ? DBNull.Value : NullIf(array[6], "0.0");
            object LCLOSE = array[7];
            object CHG = NullIf(array[8], "None");
            object PCHG = NullIf(array[9], "None");

            SqlCommand cmd;
            if (array.Length == 15)
            {
                // 15 fields: an individual stock, stored in DayData.
                cmd = new SqlCommand("INSERT INTO Stock.dbo.DayData "
                    + "(日期,股票代码,股票名称,收盘价,最高价,最低价,开盘价,前收盘,涨跌额,涨跌幅,换手率,成交量,成交金额,总市值,流通市值) "
                    + "VALUES (@DATE,@CODE,@NAME,@TCLOSE,@HIGH,@LOW,@TOPEN,@LCLOSE,@CHG,@PCHG,"
                    + "@TURNOVER,@VOTURNOVER,@VATURNOVER,@TCAP,@MCAP)", conn);
                cmd.Parameters.AddWithValue("@TCLOSE", TCLOSE);
                cmd.Parameters.AddWithValue("@HIGH", HIGH);
                cmd.Parameters.AddWithValue("@LOW", LOW);
                cmd.Parameters.AddWithValue("@TOPEN", TOPEN);
                cmd.Parameters.AddWithValue("@LCLOSE", LCLOSE);
                cmd.Parameters.AddWithValue("@CHG", CHG);
                cmd.Parameters.AddWithValue("@PCHG", PCHG);
                cmd.Parameters.AddWithValue("@TURNOVER", NullIf(array[10], "0", "None"));
                cmd.Parameters.AddWithValue("@VOTURNOVER", NullIf(array[11], "0", "None"));
                cmd.Parameters.AddWithValue("@VATURNOVER", NullIf(array[12], "0", "None"));
                cmd.Parameters.AddWithValue("@TCAP", ToDouble(array[13]));
                cmd.Parameters.AddWithValue("@MCAP", ToDouble(array[14]));
            }
            else
            {
                // Otherwise: an index row, which has no turnover rate or market value fields.
                cmd = new SqlCommand("INSERT INTO Stock.dbo.StockIndex "
                    + "(日期,股票代码,股票名称,收盘价,最高价,最低价,开盘价,前收盘,涨跌额,涨跌幅,成交量,成交金额) "
                    + "VALUES (@DATE,@CODE,@NAME,@TCLOSE,@HIGH,@LOW,@TOPEN,@LCLOSE,@CHG,@PCHG,"
                    + "@VOTURNOVER,@VATURNOVER)", conn);
                cmd.Parameters.AddWithValue("@TCLOSE", ToDouble(TCLOSE));
                cmd.Parameters.AddWithValue("@HIGH", ToDouble(HIGH));
                cmd.Parameters.AddWithValue("@LOW", ToDouble(LOW));
                cmd.Parameters.AddWithValue("@TOPEN", ToDouble(TOPEN));
                cmd.Parameters.AddWithValue("@LCLOSE", ToDouble(LCLOSE));
                cmd.Parameters.AddWithValue("@CHG", ToDouble(CHG));
                cmd.Parameters.AddWithValue("@PCHG", ToDouble(PCHG));
                cmd.Parameters.AddWithValue("@VOTURNOVER", ToDouble(NullIf(array[10], "None")));
                cmd.Parameters.AddWithValue("@VATURNOVER", ToDouble(NullIf(array[11], "None")));
            }

            cmd.Parameters.AddWithValue("@DATE", array[0].Trim());
            // The stock code carries a leading single quote in the CSV.
            cmd.Parameters.AddWithValue("@CODE", array[1].Replace("'", ""));
            // Strip spaces from the name. The second character in the original was lost in
            // extraction; a full-width space (U+3000) is assumed here.
            cmd.Parameters.AddWithValue("@NAME", array[2].Replace(" ", "").Replace("\u3000", ""));
            using (cmd) cmd.ExecuteNonQuery();
        }
    }
}
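The field handling in the code above can also be looked at in isolation. The sketch below shows one way to turn a raw CSV field into a nullable number and then into a value suitable for a SQL parameter. The placeholder values "None" and "0.0" come from the code above; the names ParsePrice and ToDbValue are hypothetical, and the sample row is made up rather than real market data.

```csharp
using System;
using System.Globalization;

static class CsvFieldDemo
{
    // Returns null for the placeholder values used in the CSV, otherwise the parsed number.
    public static decimal? ParsePrice(string field)
    {
        field = field.Trim();
        if (field == "None" || field == "0.0") return null;
        return decimal.Parse(field, CultureInfo.InvariantCulture);
    }

    // Maps a missing value to DBNull.Value so it is stored as SQL NULL.
    public static object ToDbValue(decimal? value)
    {
        return value.HasValue ? (object)value.Value : DBNull.Value;
    }

    static void Main()
    {
        // Made-up sample row in the field order the program expects:
        // date, code, name, close, high, low, open, previous close, change, change percent.
        var line = "2015-08-03,'600000,SampleStock,15.20,15.50,15.00,15.10,15.05,None,None";
        var fields = line.Split(',');

        Console.WriteLine("code:   " + fields[1].Replace("'", ""));
        Console.WriteLine("close:  " + ParsePrice(fields[3]));
        Console.WriteLine("change is NULL: " + (ToDbValue(ParsePrice(fields[8])) is DBNull));
    }
}
```

Parsing into a nullable type first keeps the "missing value" decision in one place, instead of repeating string comparisons before every parameter.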
That's it. From 5 p.m. each day, the day's data can be fetched.