General approach
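Use Redis as a queue. I bought a Mogu Proxy (蘑菇代理) plan whose API allows only one extraction request every 5 seconds. We fetch IPs from it and push them onto the left end of a Redis list; when we need one, we pop it from the right end. If a request through that IP succeeds, the IP is usable and is pushed back onto the left end. If three requests in a row fail, the proxy IP is considered dead and is discarded.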
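To make the rotation concrete, here is a minimal sketch that models the Redis list with Python's `collections.deque` (a stand-in for illustration only; the real code talks to Redis). `appendleft()` plays the role of LPUSH and `pop()` the role of RPOP. The IP addresses are made up.

```python
from collections import deque

# Stand-in for the Redis list 'the_ip': appendleft() ~ LPUSH, pop() ~ RPOP,
# so the right end always holds the IP that has waited longest.
pool = deque()

def push_ip(pool, ip):
    """LPUSH: add a fresh or just-used IP at the left end."""
    pool.appendleft(ip)

def take_ip(pool):
    """RPOP: take the oldest IP from the right end."""
    return pool.pop()

for ip in ["1.1.1.1:80", "2.2.2.2:80", "3.3.3.3:80"]:
    push_ip(pool, ip)

used = []
for _ in range(6):
    ip = take_ip(pool)
    used.append(ip)
    push_ip(pool, ip)  # the request succeeded, so the IP goes back on the left

print(used)
# ['1.1.1.1:80', '2.2.2.2:80', '3.3.3.3:80', '1.1.1.1:80', '2.2.2.2:80', '3.3.3.3:80']
```

Because a used IP re-enters at the opposite end from where IPs are taken, every IP in the pool is used once before any IP is used twice, which is what spreads the load across proxies.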
The full code is as follows:
```python
import json
import time

import redis
import requests

r = redis.Redis(host='127.0.0.1', port=6379, db=3)

while True:
    # Re-read the pool size each time, since the consumer script pops IPs too
    num = r.llen('the_ip')
    print(num)
    if num < 5:
        # Ask the Mogu Proxy API for 2 new IPs
        text = requests.get('http://piping.mogumiao.com/proxy/api/get_ip_al?appKey=b9bfb84c7ca34fec9f51b3a9dca147e5&count=2&expiryDate=0&format=1').text
        print(text)
        result = json.loads(text)
        code = str(result['code'])
        if code == '0':
            for i in result['msg']:
                ip = i['ip'] + ':' + str(i['port'])
                print(ip)
                r.lpush('the_ip', ip)  # new IPs go in on the left
        elif code == '3001':
            print('Extraction too frequent: only one request every 5 seconds!')
        else:
            print('Proxy API error, error code ' + code)
        time.sleep(5)  # the API allows one extraction every 5 seconds
    else:
        print('IP pool is full')
        time.sleep(3)
```

The code above keeps roughly 5 proxy IPs in the Redis pool at all times.
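The refill step can be tested without Redis or network access by separating the parsing from the I/O. In this sketch, the reply shape `{"code": "0", "msg": [{"ip": ..., "port": ...}]}` is inferred from the script above rather than from Mogu Proxy documentation, and `RateLimited`, `parse_proxy_response`, `refill`, and the sample reply are hypothetical names and data made up for illustration. A plain list stands in for the Redis list, with `insert(0, ...)` playing the role of LPUSH.

```python
import json

class RateLimited(Exception):
    """Hypothetical exception for the API's '3001' (too frequent) code."""

def parse_proxy_response(text):
    """Turn the proxy API's JSON reply into a list of 'ip:port' strings.

    Assumes the reply shape used by the script above:
    {"code": "0", "msg": [{"ip": "...", "port": "..."}, ...]}
    """
    data = json.loads(text)
    code = str(data["code"])
    if code == "0":
        return [f"{item['ip']}:{item['port']}" for item in data["msg"]]
    if code == "3001":
        raise RateLimited("extraction too frequent; wait 5 seconds")
    raise ValueError("proxy API error, code " + code)

def refill(pool, target, fetch_text):
    """If the pool is below `target`, make one API call; return how many IPs were added."""
    if len(pool) >= target:
        return 0
    ips = parse_proxy_response(fetch_text())
    for ip in ips:
        pool.insert(0, ip)  # same effect as LPUSH: newest IPs go on the left
    return len(ips)

# Usage with a made-up reply instead of a real HTTP call
sample = '{"code": "0", "msg": [{"ip": "10.0.0.1", "port": "8080"}, {"ip": "10.0.0.2", "port": "3128"}]}'
pool = []
added = refill(pool, 5, lambda: sample)
print(added, pool)  # 2 ['10.0.0.2:3128', '10.0.0.1:8080']
```

Passing the fetch as a callable keeps the rate-limit handling (sleeping 5 seconds between calls) in the outer loop, where it lives in the original script.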
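```python
import redis
import requests
from lxml import etree

r = redis.Redis(host='127.0.0.1', port=6379, db=3)

def get_source(url, header, data=None):
    # BRPOP blocks until an IP is available instead of crashing on an empty pool
    ip = r.brpop('the_ip')[1].decode('utf8')
    print('Took IP', ip)
    proxies = {'http': 'http://' + ip, 'https': 'http://' + ip}
    n = 0
    while True:
        try:
            if data is None:
                source = requests.get(url, headers=header, proxies=proxies, timeout=5).content
            else:
                # Form data is sent with POST
                source = requests.post(url, headers=header, proxies=proxies, data=data, timeout=5).content
            r.lpush('the_ip', ip)  # success: put the IP back on the left
            print('Request succeeded, returned IP', ip)
            return source
        except requests.RequestException:
            n += 1
            print('Request failed ' + str(n) + ' time(s)')
            if n == 3:
                # The IP is considered dead: drop it and retry with the next one
                return get_source(url, header, data)

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'}

while True:
    source = get_source('http://www.ip111.cn/', header).decode('utf8')
    show = etree.HTML(source).xpath('//tr[2]/td[2]/text()')
    print(show)
```

The code above repeatedly requests a site that displays the caller's current IP, so you can watch the proxy IP change. Each request rotates to the next proxy in the pool, so the proxies last longer and you don't have to worry about a single IP being banned from overuse.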
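The retry-then-discard rule can also be written as a loop instead of a recursive call, which avoids deep recursion when many proxies are dead and fails clearly when the pool runs dry. This is a sketch, not the author's code: `fetch_with_rotation` and `do_request` are hypothetical names, and a `deque` again stands in for the Redis list. It catches `OSError`, which also covers `requests.RequestException` (a subclass of `IOError`).

```python
from collections import deque

def fetch_with_rotation(pool, do_request, max_failures=3):
    """Take the oldest proxy, try it up to `max_failures` times, then rotate or drop it.

    `pool` is a deque standing in for the Redis list (appendleft ~ LPUSH,
    pop ~ RPOP). `do_request(proxy)` is a hypothetical callable that returns
    the response body or raises OSError (e.g. ConnectionError) on failure.
    """
    while pool:
        ip = pool.pop()                  # RPOP: oldest proxy
        for _ in range(max_failures):
            try:
                body = do_request(ip)
            except OSError:
                continue                 # count the failure and retry the same proxy
            pool.appendleft(ip)          # LPUSH: success, return the proxy to the pool
            return ip, body
        # max_failures in a row: the proxy is not put back, i.e. it is discarded
    raise RuntimeError("proxy pool is empty")

# Usage with a fake request function: "bad:2" sits at the right end, so it is tried first
pool = deque(["good:1", "bad:2"])
calls = []

def fake_request(ip):
    calls.append(ip)
    if ip.startswith("bad"):
        raise ConnectionError("timeout")
    return "ok via " + ip

result = fetch_with_rotation(pool, fake_request)
print(result)  # ('good:1', 'ok via good:1')
```

After the call, `bad:2` has been tried three times and dropped, while `good:1` is back in the pool for the next request.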
Original post: https://www.cnblogs.com/mypath/p/9024674.html
Time: 2024-11-05 13:33:08