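Today on OSChina (开源中国) I saw that someone had written a small program to fetch proxy IP addresses. It used BeautifulSoup.
I rewrote it myself using regular expressions.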
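#!/usr/bin/python
import re

import requests

# Five digit groups, each separated by a single non-digit character:
# four IP octets followed by the port.
pattern = re.compile(r'(\d+)\D(\d+)\D(\d+)\D(\d+)\D(\d+)')

# Browser-like request headers.
headers = {
    'Host': "www.ip-adress.com",
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
    'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    'Accept-Language': "zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
    'Accept-Encoding': "gzip, deflate",
    'Referer': "http://www.ip-adress.com/Proxy_Checker/",
}

url = "http://www.ip-adress.com/proxy_list/"
req = requests.get(url, headers=headers)
# Use the decoded text (not the raw bytes) so the str pattern also works on Python 3.
content = req.text
proxy_ip = re.findall(pattern, content)
for ip in proxy_ip:
    # Assuming the page lists entries as ip:port, the first four groups
    # form the address and the fifth is the port.
    print('.'.join(ip[:4]) + ':' + ip[4])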
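The pattern in the script accepts any non-digit character between the number groups, so it can also pick up unrelated runs of numbers on the page. Below is a rough sketch of a stricter variant that can be tried offline. The function name `extract_proxies` and the sample HTML are made up for illustration, and it assumes each entry appears as `ip:port` in the page text.

```python
import re

# Stricter pattern: four dot-separated octets, a colon, then the port.
PROXY_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})\b")


def extract_proxies(html):
    """Return 'ip:port' strings found in html, skipping out-of-range values."""
    proxies = []
    for *octets, port in PROXY_RE.findall(html):
        # Keep only octets up to 255 and ports in 1..65535.
        if all(int(o) <= 255 for o in octets) and 0 < int(port) <= 65535:
            proxies.append("{}:{}".format(".".join(octets), port))
    return proxies


# Hypothetical sample input, not taken from the real site.
sample_html = (
    "<td>192.168.1.10:8080</td><td>999.1.1.1:80</td>"
    "<td>Firefox/34.0</td><td>10.0.0.5:3128</td>"
)
print(extract_proxies(sample_html))
```

In the real script the same function could be fed `req.text` instead of the sample string. Note that the star-unpacking in the loop requires Python 3.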
Time: 2024-10-23 22:53:57