Use the get_text method of the BeautifulSoup library to extract the body text of a web page:
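#!/usr/bin/env python
# coding=utf-8
# Extract the body text from an HTML page
import requests
from bs4 import BeautifulSoup

url = 'http://www.baidu.com'
html = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html.text, 'html.parser')
print(soup.get_text())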
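On a real page, get_text can return a lot of blank lines, and depending on the bs4 version it may also include script or style content. The following is a minimal sketch of one way to tidy the result, not part of the original snippet: SAMPLE_HTML and extract_text are hypothetical names introduced for illustration, and an inline string is used instead of a live request so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used here instead of a network request.
SAMPLE_HTML = """
<html>
  <head>
    <title>Demo</title>
    <style>body { color: red; }</style>
    <script>var x = 1;</script>
  </head>
  <body>
    <h1>Heading</h1>
    <p>First <b>paragraph</b>.</p>
  </body>
</html>
"""


def extract_text(markup):
    """Return the visible text of an HTML string, one fragment per line."""
    soup = BeautifulSoup(markup, "html.parser")
    # Drop script and style elements so their contents never reach the output.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # separator puts each text fragment on its own line;
    # strip=True trims whitespace and skips whitespace-only fragments.
    return soup.get_text(separator="\n", strip=True)


if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML))
```

To apply the same cleanup to the page fetched above, pass html.text to extract_text instead of SAMPLE_HTML.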
Time: 2024-09-30 15:20:42