1. First, import the tool:

    from scrapy.selector import Selector

2. Selector usage examples:

    response.selector.xpath('//span/text()').extract()

(1) Select the text content of the title tag:

    response.selector.xpath('//title/text()')

Scrapy provides two simpler shortcuts:

    response.xpath('//title/text()')
    response.css('title::text')

Examples:

    response.css('img').xpath('@src').extract()
    response.xpath('//div[@id="images"]/a/text()').extract_first()
    response.xpath('//div[@id="not-exists"]/text()').extract_first(default='not-found')

(2) Matching with regular expressions:

    response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
    response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')

(3) Working with relative XPaths

    divs = response.xpath('//div')
    # './/p' matches every <p> descendant of each div
    for p in divs.xpath('.//p'):
        print(p.extract())
    # 'p' matches only the direct <p> children of each div
    for p in divs.xpath('p'):
        print(p.extract())

(4)

(5) Official example:

    >>> links = response.xpath('//a[contains(@href, "image")]')
    >>> links.extract()
    ['<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
     '<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
     '<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>',
     '<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>',
     '<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']

    >>> for index, link in enumerate(links):
    ...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
    ...     print('Link number %d points to url %s and image %s' % args)
    Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
    Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
    Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
    Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
    Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']
Time: 2024-08-06 11:52:39