Getting a web page's encoding with java.net.URL

All of the code below lives in a single class.

It needs the following imports:

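import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

import org.junit.Test;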

    @Test
    public void e() throws MalformedURLException, IOException {
        System.out.println(testgetCharset());
    }

    /**
     * Gets the encoding declared by a web page.
     *
     * @return the charset name, or an empty string if none is declared
     */
    public String testgetCharset() throws MalformedURLException, IOException {
        // Send the request through the HTTP proxy.
        String host = "proxy3.bj.petrochina";
        String port = "8080";
        setProxy(host, port);

        String url = "http://www.songtaste.com";
        URLConnection uc = new URL(url).openConnection();
        uc.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows XP; DigExt)");

        // Read only the head of the page: stop at <body> or after about 50 lines.
        // ISO-8859-1 maps every byte to one char, so the ASCII markup is readable
        // whatever the page's real encoding is.
        InputStream is = uc.getInputStream();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.ISO_8859_1))) {
            String temp;
            int count = 0;
            while ((temp = br.readLine()) != null) {
                sb.append(temp);
                if (temp.trim().startsWith("<body")) {
                    break;
                }
                if (++count > 50) {
                    break;
                }
            }
        }
        String str = sb.toString();

        // Look for "charset" (case-insensitive); its value follows.
        int i = 0;
        while (i <= str.length() - 7 && !str.regionMatches(true, i, "charset", 0, 7)) {
            i++;
        }
        if (i > str.length() - 7) {
            return ""; // no charset declaration found
        }
        i += 7;

        // Skip spaces, '=' and an opening quote in front of the value.
        while (i < str.length() && " ='\"".indexOf(str.charAt(i)) >= 0) {
            i++;
        }
        int begin = i;
        // The value ends at the next quote, space, ';' or '>'.
        while (i < str.length() && "\"' ;>".indexOf(str.charAt(i)) < 0) {
            i++;
        }
        return str.substring(begin, i);
    }

    public static void setProxy(String host, String port) {
        // http.proxyHost and http.proxyPort are the standard JVM properties for an HTTP proxy.
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", port);
    }
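
The character-by-character scan for the charset value can also be written as a regular expression. The sketch below is not from the original post: the class name CharsetSniffer and the method name find are hypothetical, and the pattern only covers the common `charset=value` form, with or without quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original code.
final class CharsetSniffer {
    // "charset", optional spaces around '=', an optional opening quote,
    // then the charset name itself (letters, digits, '_', '.', ':' or '-').
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    private CharsetSniffer() {
    }

    /** Returns the first charset named in the text, or "" if there is none. */
    static String find(String text) {
        if (text == null) {
            return "";
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : "";
    }
}
```

Because the Content-Type response header uses the same `charset=value` form, one possible use is to try `CharsetSniffer.find(uc.getContentType())` first and fall back to the page head only when that returns an empty string. `getContentType()` may return null, which the helper treats as "not found".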

Time: 2024-08-15 21:12:26
