1. Create a new project
scrapy startproject shop
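For reference, this command generates the usual Scrapy project skeleton, roughly the layout below (shopspider.py is added by hand in step 3, and newer Scrapy versions also generate a middlewares.py):

shop/
    scrapy.cfg
    shop/
        __init__.py
        items.py          (edited in step 2)
        pipelines.py      (edited in step 4)
        settings.py
        spiders/
            __init__.py
            shopspider.py (created by hand in step 3)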
2. items.py code:
import scrapy

# The class name is omitted in the original listing; ShopItem is assumed,
# which is what the startproject template generates for a project named "shop".
class ShopItem(scrapy.Item):
    title = scrapy.Field()   # news title
    time = scrapy.Field()    # publish time
3. shopspider.py spider code
# -*- coding: UTF-8 -*-
import scrapy
from shop.items import ShopItem

class ShopSpider(scrapy.Spider):   # class name omitted in the original; ShopSpider assumed
    name = "shop"
    allowed_domains = ["news.xxxxx.xxx.cn"]   # value omitted in the original; inferred from start_urls
    start_urls = ["http://news.xxxxx.xxx.cn/hunan/"]

    def parse(self, response):
        item = ShopItem()
        # The actual selectors are omitted in the original listing;
        # each field is a list of strings extracted from the page.
        item['title'] = response.xpath('...').extract()
        item['time'] = response.xpath('...').extract()
        yield item
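The selector expressions themselves are not shown in the original. Purely as an illustration, and assuming the listing page puts each article title in a link and its date in a span (the element and class names below are made up, not taken from the real page), the two assignments might look like:

# Hypothetical XPath expressions -- element/class names are assumptions,
# not taken from the actual structure of news.xxxxx.xxx.cn.
item['title'] = response.xpath('//div[@class="news-list"]//a/text()').extract()
item['time'] = response.xpath('//div[@class="news-list"]//span[@class="time"]/text()').extract()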
4. pipelines.py code (prints the scraped content):
Note: if you print the item contents from shopspider.py, they show up as Unicode escape sequences, whereas printing them from pipelines.py displays the text normally.
class ShopPipeline(object):
    def process_item(self, item, spider):
        count = len(item['title'])
        print 'news count: ', count
        # Print each title together with its publish time
        for i in range(0, count):
            print 'biaoti: ' + item['title'][i]
            print 'shijian: ' + item['time'][i]
        return item
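For ShopPipeline to actually receive items, it also has to be enabled in the project's settings.py, a step this write-up does not show. A minimal sketch, assuming the default module path shop.pipelines created by startproject:

ITEM_PIPELINES = {
    'shop.pipelines.ShopPipeline': 300,   # 300 is an arbitrary priority (0-1000, lower runs first)
}

Very old Scrapy releases expect a plain list here instead of a dict, so check the version you are running.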
5. Crawl output:
root@xxxxxx:~/shop# scrapy crawl shop --nolog
news count:  40
biaoti: xxx建成国家食品安全示范城市
shijian:
biaoti: xxxx考试开始报名
......
......
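As for the note in section 4: the usual reason the spider shows escape sequences is that it prints the whole list (or the whole item), and in Python 2 a list is rendered via each element's repr, which escapes non-ASCII characters; the pipeline prints the strings one at a time, so the terminal can render them. A minimal sketch of the difference (assuming a UTF-8 terminal):

# -*- coding: UTF-8 -*-
titles = [u'食品安全', u'考试报名']

print titles       # list repr: [u'\u98df\u54c1\u5b89\u5168', u'\u8003\u8bd5\u62a5\u540d']
print titles[0]    # readable text: 食品安全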