Enter a URL and save the entire HTML content to a specified file

package parser;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

/**
 * Basic web page fetching; the URL has to be entered by hand.
 * Saves the entire HTML content to a specified file.
 *
 * @author chenguoyong
 */
public class ScrubSelectedWeb {

    // Platform line separator, appended after every line that is read
    private final static String CRLF = System.getProperty("line.separator");

    /**
     * @param args not used
     */
    public static void main(String[] args) {
        try {
            // The page to fetch is hard-coded here
            URL ur = new URL("http://10.249.187.199:8083/injs100/");
            InputStream instr = ur.openStream();
            String s, str;
            // Decodes the response with the platform default charset
            BufferedReader in = new BufferedReader(new InputStreamReader(instr));
            StringBuffer sb = new StringBuffer();
            BufferedWriter out = new BufferedWriter(new FileWriter(
                    "D:/outPut.txt"));
            // Collect the whole page in memory, line by line
            while ((s = in.readLine()) != null) {
                sb.append(s + CRLF);
            }
            System.out.println(sb);
            str = new String(sb);
            // Write the collected HTML to the output file
            out.write(str);
            out.close();
            in.close();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

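This is enough for basic page fetching, but the URL has to be entered by hand in the source, and the code has not been refactored. It is only a simple starting idea.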
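One possible refactoring, as a rough sketch rather than a finished tool: take the URL and the output path from the command line, and let try-with-resources close the streams even when an exception is thrown. The class name PageSaver and the helper methods copy and save are hypothetical names introduced for this example, and UTF-8 is an assumption about the page encoding (the listing above uses the platform default).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PageSaver {

    // Copies every character from in to out and returns how many were copied.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buf = new char[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Decodes source as UTF-8 (an assumption) and writes it to target.
    // try-with-resources closes both streams, also on failure.
    static long save(InputStream source, Path target) throws IOException {
        try (Reader in = new BufferedReader(
                     new InputStreamReader(source, StandardCharsets.UTF_8));
             Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            return copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageSaver <url> <output-file>");
            return;
        }
        // The URL now comes from the command line instead of the source code
        try (InputStream page = new URL(args[0]).openStream()) {
            long chars = save(page, Paths.get(args[1]));
            System.out.println("Saved " + chars + " characters to " + args[1]);
        }
    }
}
```

It could be run as `java PageSaver http://10.249.187.199:8083/injs100/ D:/outPut.txt`. Copying in fixed-size chunks, instead of collecting lines in a StringBuffer, keeps memory use flat and preserves the page's original line endings.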

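1. Using htmlparser

htmlparser is an HTML parsing library written in pure Java. It does not depend on any other Java library and is mainly used to transform or extract HTML. In my experience it parses HTML very quickly and reliably, and I consider it one of the best tools available for parsing and analyzing HTML. Whether you want to scrape data from web pages or rewrite HTML content, it is a pleasure to use. Because htmlparser is well structured, it is also easy to extend.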


Time: 2024-08-01 16:12:01
