Installation: pip install selenium
web
PhantomJS download: http://phantomjs.org/download.html
Browser driver downloads: http://www.seleniumhq.com/download
chrome: http://chromedriver.storage.googleapis.com/index.html?path=2.22/
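#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver

# Launch Chrome (chromedriver must be available on the PATH)
driver = webdriver.Chrome()
url = 'http://www.toutiao.com/news_fashion/'
driver.get(url)
# Print the page title to confirm that the page loaded
print(driver.title)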
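Example: scraping Toutiao (今日头条). The page is refreshed to bring up different articles; I have not yet worked out how to do this by simulating mouse scrolling.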
#!/usr/bin/env python
# encoding: utf-8
import itertools
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = 'http://www.toutiao.com/news_fashion/'
driver.get(url)

for x in range(2):
    # Refresh the page to load a different batch of articles
    driver.refresh()
    titles = driver.find_elements(By.CLASS_NAME, "title-box")
    contents = driver.find_elements(By.CLASS_NAME, "abstract")
    # Only the first .feedimg element is fetched; it is reused for every item below
    imgs = driver.find_element(By.CSS_SELECTOR, ".feedimg")
    for title, content, img in zip(titles, contents, itertools.repeat(imgs)):
        data = {
            'title': title.text,
            'content': content.text,
            'img': img.get_attribute('src')
        }
        print(data)
    # Wait before the next refresh
    time.sleep(10)

driver.close()
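The example above relies on refreshing because scrolling was not yet worked out. One possible approach, given here only as a sketch, is to scroll with JavaScript through the driver's execute_script method instead of moving the mouse. The helper name scroll_to_bottom and its max_rounds and pause parameters are made up for illustration, and whether the Toutiao page actually loads more articles on scroll is an assumption.

```python
import time


def scroll_to_bottom(driver, max_rounds=5, pause=2.0):
    """Scroll a Selenium-style driver down until the page stops growing.

    Hypothetical helper: `driver` only needs an execute_script(script) method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    for _ in range(max_rounds):
        # Jump to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the page time to load more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page did not grow, so nothing new was loaded
        last_height = new_height
    return scrolls
```

In the script above, a call such as scroll_to_bottom(driver) could stand in for driver.refresh() before the elements are collected. The pause value is a guess and would need tuning for the real page.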
Time: 2024-10-14 11:57:09