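Several days of experiments in a row, and it feels like a new problem pops up just as the last one settles. Today was no different, so I figured I should write it all down.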
1.
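First, while running GPS (Graph Processing System), a pipeline that ran fine yesterday broke once the input file got bigger, with an error about heap size~~~~. So the culprit is the input size! In other words, the so-called scalability issue. Documentation for GPS is very scarce, practically nonexistent. I searched around a bit; some people said to increase the heap space, blah blah, but none of it worked (http://stackoverflow.com/questions/1596009/java-lang-outofmemoryerror-java-heap-space). Later I went to the GPS discussion group, where the father of GPS tells us what to do:
See: https://groups.google.com/forum/#!topic/stanfordgpsusers/62FeHpZijU0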
Hi, Semih.
I am using GPS to process twitter graph. I got a critical problem. Twitter has a very skewed graph. Some users may have more than 100k followers. Such vertex will trigger huge number of messages. GPS generates "java heap overflow" exceptions. I thought that was because the messages are buffered in the memory before sending out.

I don't think this is really due to the skew in the data. 100k is still a very small amount of messages. It just means that, that vertex will likely generate or receive 100k * 8 = 800k bytes = 0.8 MB more data. How much memory are you giving to your java virtual machine? How much memory do your machines have. I have about 4GB on each of my machines so I give the following flags to my java scripts: -Xmx3000M. You should change the script file here: https://www.assembla.com/code/phd-projects/subversion/nodes/gps/trunk/scripts/start_gps_node.sh?rev=95 There are two jvm flags XMX_SIZE and XMS_SIZE, which you should adjust.
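So off I went and edited the script, increasing both XMX_SIZE and XMS_SIZE. Then a new problem showed up: it could not connect to the port. So I changed the port number and uploaded everything to HDFS again.
Fixed!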
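The back-of-the-envelope estimate in the reply above is easy to double-check. This is only a sketch: the function name `extra_message_bytes` is made up for illustration, and the 8 bytes per message is simply the figure used in the reply, not something measured on GPS.

```python
def extra_message_bytes(num_messages, bytes_per_message=8):
    """Rough size of the extra messages for one high-degree vertex.

    bytes_per_message=8 is the assumption taken from the forum reply.
    """
    return num_messages * bytes_per_message


extra = extra_message_bytes(100_000)
# Compare against the 3000 MB heap from the -Xmx3000M flag in the reply
heap_bytes = 3000 * 1_000_000
print(extra, "bytes =", extra / 1_000_000, "MB")
print("share of a 3000 MB heap:", extra / heap_bytes)
```

So a single 100k-follower vertex is tiny compared with the heap, which is why the advice was to look at the JVM memory settings rather than at the skew.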
2.
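Second problem: this afternoon I was running RDFlib (I mainly use it to parse RDF files and turn them into graph data). At first I tried installing it with easy_install rdflib, but it kept reporting errors (maybe the network has been flaky these past few days and my proxy ladder is not long enough, hehe). Then I lost patience, downloaded the source directly, and went for a manual local install! But after searching for ages I somehow could not find out how to install it manually!! Facepalm~~~ It turns out that after entering the source directory, this is all it takes:
python setup.py install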
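A quick way to confirm the install actually took is to ask Python whether it can find the module. A minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and it only checks that the module can be located, not that it works.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


print("rdflib installed:", is_installed("rdflib"))
```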
With it installed, I tried running some data. Small inputs ran smooth and fast; then I went for the real data, around 300 MB, and it choked.
No handlers could be found for logger "rdflib.term"
I searched around afterwards, and this page has a solution:
http://stackoverflow.com/questions/17393664/no-handlers-could-be-found-for-logger-rdflib-term
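import logging
import rdflib

# Set up a default handler so rdflib's log messages are actually shown
logging.basicConfig()

# now load your graph (the original answer used g.load, which newer rdflib releases no longer provide)
g = rdflib.Graph()
g.parse("life_the_universe_everything.rdf")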
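To see what the fix is doing without needing rdflib or a 300 MB file: the "No handlers could be found" message just means a logger emitted something and nobody was listening. The sketch below attaches a handler to the "rdflib.term" logger by hand and sends it a made-up message; the helper name `capture_logger` and the message text are assumptions for illustration only.

```python
import io
import logging


def capture_logger(logger_name):
    """Attach an in-memory handler to a named logger and return both."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    return logger, stream


# "rdflib.term" is the logger name from the error message above
logger, stream = capture_logger("rdflib.term")
logger.warning("example warning")  # stand-in for a real rdflib warning
print(stream.getvalue(), end="")
```

Capturing into a stream like this is also handy on big inputs, since the warnings can be counted or written to a file instead of flooding the terminal.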
3.
While running rdflib, I hit this problem:
WARNING:rdflib.term:http://www.w3.org/1999/02/22-rdf-syntax-ns# first does not look like a valid URI, trying to serialize this will break.
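And that directly stops the program from running, so frustrating! Besides, how could my code possibly have a warning! So I went to fix it. After some searching, someone said to just replace the spaces in the URL with anything else, and I was instantly all smiles, haha. It really works!
Of course, I can only get away with this because I do not care what the URL actually is; I just treat it as a string of characters!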
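For the record, the space replacement can be sketched like this. The function name `replace_uri_whitespace` and the underscore replacement are my own choices for illustration, and as said above this changes the identifier, so it is only safe when the URI is treated as an opaque string.

```python
import re


def replace_uri_whitespace(uri, replacement="_"):
    """Replace every run of whitespace inside a URI string.

    Note: the result is a different identifier than the original URI.
    """
    return re.sub(r"\s+", replacement, uri.strip())


bad = "http://www.w3.org/1999/02/22-rdf-syntax-ns# first"
print(replace_uri_whitespace(bad))
# → http://www.w3.org/1999/02/22-rdf-syntax-ns#_first
```

The cleanup has to happen on the raw input before rdflib parses it, since the warning is raised when the URI term is created.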
Graduation Project 6-3