Learn Web.Crawling of Perl

#####
#Overview of Web.Crawling related modules.
#Note: the code below is for overview purposes only and is not meant to be run as-is.
#####

#!/usr/bin/perl

#####
#HTTP::Thin
#####
use 5.12.1;
use HTTP::Request::Common;
use HTTP::Thin;

say HTTP::Thin->new()->request(GET 'http://example.com')->as_string;

#####
#HTTP::Tiny
#####
use HTTP::Tiny;

my $response = HTTP::Tiny->new->get('http://example.com/');
die "Failed! \n" unless $response->{success};
print "$response->{status} $response->{reason} \n";

while (my ($k, $v) = each %{$response->{headers}}) {
  for (ref $v eq 'ARRAY' ? @$v : $v) {
    print "$k: $_ \n";
  }
}

print $response->{content} if length $response->{content};

#new
my $http = HTTP::Tiny->new( %attributes );

#valid attributes include:
#-agent
#-cookie_jar
#-default_headers
#-local_address
#-keep_alive
#-max_redirect
#-max_size
#-https_proxy
#-proxy
#-no_proxy
#-timeout
#-verify_SSL
#-SSL_options

#get / head / put / post / delete
$response = $http->get($url);
$response = $http->get($url, \%options);
$response = $http->head($url);

#post_form
$response = $http->post_form($url, $form_data);
$response = $http->post_form($url, $form_data, \%options);

#request
$response = $http->request($method, $url);
$response = $http->request($method, $url, \%options);

#basic authentication via userinfo in the URL
$http->request('GET', 'http://user:[email protected]/');
#or, with a reserved character (here "@") percent-escaped in the username
$http->request('GET', 'http://mars%40host:[email protected]/');

#www_form_urlencode
$params = $http->www_form_urlencode( $data );
$response = $http->get("http://example.com/query?$params");
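A concrete round trip through www_form_urlencode (the query data is made up; when given a hash reference, HTTP::Tiny emits the pairs in sorted order):

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new;

# Encode a hashref into an application/x-www-form-urlencoded string.
my $params = $http->www_form_urlencode({ q => 'perl', page => 2 });
print "$params\n";   # page=2&q=perl
```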

#SSL support
SSL_options => {
  SSL_ca_file => $file_path,
}
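Putting that fragment in context, certificate verification can be enabled at construction time (the CA bundle path below is a placeholder; SSL_options is passed through to IO::Socket::SSL):

```perl
use strict;
use warnings;
use HTTP::Tiny;

# verify_SSL turns on certificate and hostname checks;
# the options are only used when an https request is made.
my $http = HTTP::Tiny->new(
    verify_SSL  => 1,
    SSL_options => {
        SSL_ca_file => '/etc/ssl/certs/ca-certificates.crt',  # placeholder path
    },
);
```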

#proxy support
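A sketch of the proxy attributes from the list above (the proxy URL is a placeholder; note that by default HTTP::Tiny also picks up the http_proxy / https_proxy / all_proxy environment variables):

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new(
    proxy    => 'http://proxy.example.com:8080',   # placeholder generic proxy URL
    no_proxy => 'localhost,.internal.example',     # hosts/domains that bypass the proxy
);
print $http->proxy, "\n";
```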

#####
#WWW::Mechanize
#
#Stateful programmatic web browsing, used for automating interaction with websites.
#####

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

$mech->get( $url );

$mech->follow_link( n => 3 );
$mech->follow_link( text_regex => qr/download this/i );
$mech->follow_link( url => 'http://host.com/index.html' );

$mech->submit_form(
  form_number => 3,
  fields => {
    username => 'banana',
    password => 'lost-and-alone',
  }
);

$mech->submit_form(
  form_name => 'search',
  fields => { query => 'pot of gold', },
  button => 'search now'
);

#testing web applications
use Test::More;

like( $mech->content(), qr/$expected/, "Got expected content" );

#page traverse
$mech->back();

#finer control over page
$mech->find_link( n => $number );
$mech->form_number( $number );
$mech->form_name( $name );
$mech->field( $name, $value );
$mech->set_fields( $field_values );
$mech->set_visible( @criteria );
$mech->click( $button );

#subclass of LWP::UserAgent, eg:
$mech->add_header( $name => $value );

#page-fetching methods

#status methods

#content-handling methods

#link methods

#image methods

#form methods

#field methods

#miscellaneous methods

#overridden LWP::UserAgent methods
#inherited unchanged LWP::UserAgent methods

#With the modules above, it's easy to implement a spider project for future integration use.
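As a sketch of that spider idea, here is a tiny link extractor in core Perl only. The regex is a naive illustration, not a robust HTML parser; real crawler code would fetch pages with HTTP::Tiny->get (or WWW::Mechanize) and extract links with HTML::LinkExtor or the Mechanize link methods instead:

```perl
use strict;
use warnings;

# Naive href extractor: fine for a demo, not for production HTML.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ( $html =~ /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi ) {
        push @links, $1;
    }
    return @links;
}

my $html = <<'HTML';
<html><body>
<a href="http://example.com/a">A</a>
<a href="/relative/b">B</a>
</body></html>
HTML

my @links = extract_links($html);
print "$_\n" for @links;
```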

Mars

Date: 2024-10-05 20:18:55
