This chapter describes the wrapper for Protocol, starting with the Protocol interface itself:
```java
package com.digitalpebble.storm.crawler.fetcher;

import com.digitalpebble.storm.crawler.util.Configuration;

public interface Protocol {

    public ProtocolResponse getProtocolOutput(String url) throws Exception;

    public void configure(Configuration conf);
}
```
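A Protocol has two jobs: accept a configuration, and turn a URL into a ProtocolResponse. The sketch below shows one way to implement that contract. It is not the project's AHProtocol; it uses only the JDK's `java.net.URLConnection`. To keep it self-contained, `Configuration` is replaced by a hypothetical map-based stand-in, and ProtocolResponse is copied from later in this chapter. The class name `URLConnectionProtocol` and the `"http.timeout"` key are illustrative assumptions, not part of storm-crawler.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for com.digitalpebble.storm.crawler.util.Configuration:
// a plain string map, just enough for this sketch to compile on its own.
class Configuration extends HashMap<String, String> {
}

// Copy of the ProtocolResponse class shown later in this chapter.
class ProtocolResponse {
    final byte[] content;
    final int statusCode;
    final HashMap<String, String[]> metadata;

    public ProtocolResponse(byte[] c, int s, HashMap<String, String[]> md) {
        content = c;
        statusCode = s;
        metadata = md;
    }

    public byte[] getContent() { return content; }
    public int getStatusCode() { return statusCode; }
    public HashMap<String, String[]> getMetadata() { return metadata; }
}

// Same contract as the Protocol interface above.
interface Protocol {
    ProtocolResponse getProtocolOutput(String url) throws Exception;
    void configure(Configuration conf);
}

// Hypothetical Protocol implementation built only on java.net.URLConnection.
class URLConnectionProtocol implements Protocol {
    private int timeoutMillis = 10_000;

    @Override
    public void configure(Configuration conf) {
        // "http.timeout" is an illustrative key, not a documented storm-crawler setting
        String t = conf.get("http.timeout");
        if (t != null) {
            timeoutMillis = Integer.parseInt(t);
        }
    }

    @Override
    public ProtocolResponse getProtocolOutput(String url) throws Exception {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);

        // Only HTTP(S) has a status line; other schemes (e.g. file:) report 200 here by convention
        int status = 200;
        if (conn instanceof HttpURLConnection) {
            status = ((HttpURLConnection) conn).getResponseCode();
        }

        // For HTTP errors the body comes from getErrorStream(), which may be null
        InputStream in = status >= 400
                ? ((HttpURLConnection) conn).getErrorStream()
                : conn.getInputStream();
        byte[] content = new byte[0];
        if (in != null) {
            try (InputStream body = in) {
                content = body.readAllBytes();
            }
        }

        // Copy headers into the String[]-valued metadata map; HttpURLConnection
        // reports the status line under a null key, which is skipped
        HashMap<String, String[]> metadata = new HashMap<>();
        for (Map.Entry<String, List<String>> e : conn.getHeaderFields().entrySet()) {
            if (e.getKey() != null) {
                metadata.put(e.getKey(), e.getValue().toArray(new String[0]));
            }
        }
        return new ProtocolResponse(content, status, metadata);
    }
}

class URLConnectionProtocolDemo {
    public static void main(String[] args) throws Exception {
        Protocol protocol = new URLConnectionProtocol();
        protocol.configure(new Configuration());
        String url = args.length > 0 ? args[0] : "https://example.com/";
        ProtocolResponse r = protocol.getProtocolOutput(url);
        System.out.println(r.getStatusCode() + ", " + r.getContent().length + " bytes");
    }
}
```

Passing a local `file:` URL is a simple way to exercise the class without network access. Reporting status 200 for non-HTTP schemes is a choice made for this sketch only.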
The wrapper for ProtocolFactory
```java
package com.digitalpebble.storm.crawler.fetcher;

import java.net.URL;
import java.util.WeakHashMap;

import com.digitalpebble.storm.crawler.fetcher.asynchttpclient.AHProtocol;
import com.digitalpebble.storm.crawler.util.Configuration;

/**
 * @author Yin Shuai
 *
 */
public class ProtocolFactory {

    private final Configuration config;

    // Caches one Protocol instance per URL scheme (e.g. "http")
    private final WeakHashMap<String, Protocol> cache = new WeakHashMap<String, Protocol>();

    public ProtocolFactory(Configuration conf) {
        config = conf;
    }

    /** Returns an instance of the protocol to use for a given URL **/
    public synchronized Protocol getProtocol(URL url) {
        // get the protocol
        String protocol = url.getProtocol();
        Protocol pp = cache.get(protocol);
        if (pp != null)
            return pp;

        // yuk! hardcoded for now: every scheme gets an AHProtocol
        pp = new AHProtocol();
        pp.configure(config);
        cache.put(protocol, pp);
        return pp;
    }
}
```
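The "yuk! hardcoded for now" comment marks the factory's main limitation. Whatever the URL scheme, the factory builds an AHProtocol. The sketch below shows one way the factory could be generalised. It assumes a registry that maps each scheme to a constructor and still caches one configured instance per scheme. `SchemeProtocolFactory`, `HttpProtocol` and `FileProtocol` are hypothetical names. For brevity, `Protocol` is reduced to its `configure` method, and a plain `Map` stands in for `Configuration`.

```java
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Simplified stand-in for Protocol: only configure() matters for the factory,
// and the configuration is a plain map (hypothetical).
interface Protocol {
    void configure(Map<String, String> conf);
}

// Hypothetical implementations for two schemes.
class HttpProtocol implements Protocol {
    public void configure(Map<String, String> conf) { }
}

class FileProtocol implements Protocol {
    public void configure(Map<String, String> conf) { }
}

// Variant of ProtocolFactory that chooses the implementation by URL scheme
// instead of always creating AHProtocol; one configured instance per scheme.
class SchemeProtocolFactory {
    private final Map<String, String> config;
    private final Map<String, Supplier<Protocol>> registry = new HashMap<>();
    // Strong references: a cached Protocol lives as long as the factory does
    private final Map<String, Protocol> cache = new HashMap<>();

    SchemeProtocolFactory(Map<String, String> conf) {
        this.config = conf;
    }

    synchronized void register(String scheme, Supplier<Protocol> supplier) {
        registry.put(scheme.toLowerCase(Locale.ROOT), supplier);
    }

    synchronized Protocol getProtocol(URL url) {
        String scheme = url.getProtocol().toLowerCase(Locale.ROOT);
        Protocol pp = cache.get(scheme);
        if (pp != null) {
            return pp;
        }
        Supplier<Protocol> supplier = registry.get(scheme);
        if (supplier == null) {
            throw new IllegalArgumentException("No Protocol registered for scheme: " + scheme);
        }
        pp = supplier.get();
        pp.configure(config);
        cache.put(scheme, pp);
        return pp;
    }
}

class SchemeProtocolFactoryDemo {
    public static void main(String[] args) throws Exception {
        SchemeProtocolFactory factory = new SchemeProtocolFactory(new HashMap<>());
        factory.register("http", HttpProtocol::new);
        factory.register("https", HttpProtocol::new);
        factory.register("file", FileProtocol::new);

        Protocol a = factory.getProtocol(URI.create("http://example.com/a").toURL());
        Protocol b = factory.getProtocol(URI.create("http://example.com/b").toURL());
        System.out.println(a == b); // true: same scheme, same cached instance
    }
}
```

The plain `HashMap` cache is a deliberate difference from the original. Once nothing else strongly references a key, the garbage collector can remove that `WeakHashMap` entry. The next call would then create and configure a fresh instance.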
The wrapper for ProtocolResponse
```java
package com.digitalpebble.storm.crawler.fetcher;

import java.util.HashMap;

public class ProtocolResponse {

    final byte[] content;
    final int statusCode;
    final HashMap<String, String[]> metadata;

    public ProtocolResponse(byte[] c, int s, HashMap<String, String[]> md) {
        content = c;
        statusCode = s;
        metadata = md;
    }

    public byte[] getContent() {
        return content;
    }

    public int getStatusCode() {
        return statusCode;
    }

    public HashMap<String, String[]> getMetadata() {
        return metadata;
    }
}
```
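Because metadata maps each name to a `String[]`, a header that can appear several times, such as `Set-Cookie`, keeps all of its values. However, the `HashMap` is case-sensitive, while HTTP header names are not. The sketch below shows hypothetical helpers that read a response with both points in mind. `ResponseUtils`, `isSuccess` and `firstValue` are not part of storm-crawler, and treating any 2xx code as success is an assumption of this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Copy of the ProtocolResponse class above, so the sketch compiles on its own.
class ProtocolResponse {
    final byte[] content;
    final int statusCode;
    final HashMap<String, String[]> metadata;

    public ProtocolResponse(byte[] c, int s, HashMap<String, String[]> md) {
        content = c;
        statusCode = s;
        metadata = md;
    }

    public byte[] getContent() { return content; }
    public int getStatusCode() { return statusCode; }
    public HashMap<String, String[]> getMetadata() { return metadata; }
}

// Hypothetical helpers, not part of storm-crawler.
final class ResponseUtils {
    private ResponseUtils() {
    }

    // Treat any 2xx status code as a successful fetch
    static boolean isSuccess(ProtocolResponse r) {
        int s = r.getStatusCode();
        return s >= 200 && s < 300;
    }

    // Header names are case-insensitive in HTTP, but HashMap keys are not,
    // so compare with equalsIgnoreCase. Returns the first value, or null if absent.
    static String firstValue(ProtocolResponse r, String name) {
        if (r.getMetadata() == null) {
            return null;
        }
        for (Map.Entry<String, String[]> e : r.getMetadata().entrySet()) {
            String[] values = e.getValue();
            if (name.equalsIgnoreCase(e.getKey()) && values != null && values.length > 0) {
                return values[0];
            }
        }
        return null;
    }
}

class ResponseUtilsDemo {
    public static void main(String[] args) {
        HashMap<String, String[]> md = new HashMap<>();
        md.put("Content-Type", new String[] {"text/html; charset=UTF-8"});
        // A header that may legitimately repeat keeps every value
        md.put("Set-Cookie", new String[] {"a=1", "b=2"});

        ProtocolResponse r = new ProtocolResponse(
                "<html></html>".getBytes(StandardCharsets.UTF_8), 200, md);
        System.out.println(ResponseUtils.isSuccess(r));                  // true
        System.out.println(ResponseUtils.firstValue(r, "content-type")); // text/html; charset=UTF-8
        System.out.println(r.getMetadata().get("Set-Cookie").length);    // 2
    }
}
```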
Storm [Practice Series - How to Write a Crawler - Wrapping Protocol]
Time: 2024-10-02 04:43:15