The code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

def get_page_within(pages):
    for page in range(1, pages + 1):
        wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
        soup = BeautifulSoup(wb.text, 'lxml')
        titles = soup.select('span.result_title')
        prices = soup.select('span.result_price > i')
        for title, price in zip(titles, prices):
            data = {
                'title': title.get_text(),
                'price': price.get_text()
            }
            print(data)

get_page_within(pages=1000)

Here is a line-by-line explanation of the code.
from bs4 import BeautifulSoup
import requests
These two lines import the BeautifulSoup and requests libraries.
def get_page_within(pages):
This defines a function that scrapes data from the given number of listing pages.
for page in range(1, pages+1):
This loops page from 1 up to and including pages (range's end point is exclusive, so pages+1 yields exactly pages iterations).
wb = requests.get('http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(page))
.format substitutes the current page number into the {} placeholder in the URL template, and requests.get fetches each resulting page over the loop.
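The URL-building step can be seen in isolation. This is a minimal sketch showing how str.format generates one listing URL per page number, using the same URL template as the scraper above:

```python
# The same URL template the scraper uses; {} is the slot for the page number.
url_template = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'

# range(1, 4) yields 1, 2, 3 -- one URL per page.
urls = [url_template.format(page) for page in range(1, 4)]
for u in urls:
    print(u)
```

Each iteration fills the placeholder with the current page number, so the loop in the scraper visits p1, p2, p3, and so on.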
soup = BeautifulSoup(wb.text, 'lxml')
This parses the downloaded HTML with BeautifulSoup, using the lxml parser.
titles = soup.select('span.result_title')
prices = soup.select('span.result_price > i')
These use CSS selectors to pick out the title and price elements from the parsed page.
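To see what those two selectors do without hitting the live site, here is a small self-contained sketch. The HTML fragment is invented to mimic the structure the selectors expect (the class names match the selectors; the text content is made up); it uses the built-in html.parser so no lxml install is needed:

```python
from bs4 import BeautifulSoup

# A hypothetical fragment shaped like one search result on the page:
# a span holding the title, and a price span whose <i> child holds the number.
html = '''
<div>
  <span class="result_title">Cozy room near subway</span>
  <span class="result_price">&#165;<i>328</i></span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
titles = soup.select('span.result_title')       # all spans with class result_title
prices = soup.select('span.result_price > i')   # <i> elements directly inside the price span
print(titles[0].get_text(), prices[0].get_text())
```

The `>` in the second selector is the CSS child combinator: it matches only `<i>` tags that are direct children of the price span, which is why get_text() returns just the number and not the currency symbol.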
for title, price in zip(titles, prices):
    data = {
        'title': title.get_text(),
        'price': price.get_text()
    }
    print(data)
This pairs each title with its corresponding price, packs each pair into a dictionary, and prints it.
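The pairing step can be sketched on plain lists. This is a minimal example (with made-up titles and prices) of how zip walks two lists in lockstep and how each pair becomes a dictionary:

```python
# Stand-in values for what the selectors would return after get_text().
titles = ['Room A', 'Room B']
prices = ['328', '459']

# zip pairs the i-th title with the i-th price; each pair becomes one dict.
records = [{'title': t, 'price': p} for t, p in zip(titles, prices)]
for record in records:
    print(record)
```

Note that zip stops at the shorter list, so if the page yields more titles than prices (or vice versa) the extras are silently dropped.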
get_page_within(pages=1000)
This calls the function with pages=1000, which runs the scrape over 1000 listing pages.
Date: 2024-10-24 19:53:19