Googlebot (Google Web search)

My guess: while the domain name was being resolved, the first of the Google crawlers to show up was Googlebot (Google Web search).

+-----+----------------+---------------------+-------------------------+------------------+
|  23 | 111.251.93.170 | 2017-01-24 17:48:19 | Unidentified User Agent |                  |
|  24 | 111.251.93.170 | 2017-01-24 17:49:19 | Unidentified User Agent |                  |
|  51 | 119.147.32.253 | 2017-01-24 17:59:32 | Unidentified User Agent |                  |
|  53 | 183.57.53.197  | 2017-01-24 18:11:56 | Mozilla 5.0             | iOS              |
|  54 | 123.56.233.103 | 2017-01-24 18:14:39 | Unidentified User Agent |                  |
|  56 | 112.90.142.207 | 2017-01-24 18:18:05 | Firefox 3.0             | Windows XP       |
|  57 | 183.232.120.37 | 2017-01-24 18:18:05 | Firefox 3.0             | Windows XP       |
|  59 | 117.136.40.218 | 2017-01-24 18:18:47 | ZTE                     | Android          |
|  60 | 117.136.40.218 | 2017-01-24 18:18:50 | ZTE                     | Android          |
|  61 | 117.136.40.218 | 2017-01-24 18:18:51 | ZTE                     | Android          |
|  62 | 117.136.40.218 | 2017-01-24 18:18:53 | ZTE                     | Android          |
|  63 | 117.136.40.218 | 2017-01-24 18:19:00 | Safari 534.30           | Android          |
|  64 | 117.136.40.218 | 2017-01-24 18:19:13 | Safari 534.30           | Android          |
|  65 | 117.136.40.218 | 2017-01-24 18:29:31 | Chrome 37.0.0.0         | Android          |
|  66 | 117.136.40.218 | 2017-01-24 18:29:41 | Chrome 37.0.0.0         | Android          |
|  67 | 117.136.40.218 | 2017-01-24 18:30:02 | Chrome 37.0.0.0         | Android          |
|  68 | 117.136.40.218 | 2017-01-24 18:30:15 | Chrome 37.0.0.0         | Android          |
|  69 | 117.136.40.218 | 2017-01-24 18:40:37 | Chrome 55.0.2883.87     | Windows 7        |
|  70 | 177.193.53.212 | 2017-01-24 18:47:00 | Googlebot               | Unknown Platform |
|  71 | 111.251.93.170 | 2017-01-24 18:49:26 | Unidentified User Agent |                  |
|  72 | 139.162.108.53 | 2017-01-24 19:05:15 | Chrome 50.0.2661.102    | Windows 10       |
|  73 | 111.251.93.170 | 2017-01-24 19:08:52 | Unidentified User Agent |                  |
|  74 | 111.251.93.170 | 2017-01-24 19:09:40 | Unidentified User Agent |                  |
|  75 | 111.251.93.170 | 2017-01-24 19:29:51 | Unidentified User Agent |                  |
|  76 | 61.142.176.19  | 2017-01-24 19:46:40 | Firefox 3.6.3           | Windows 7        |
|  77 | 111.251.93.170 | 2017-01-24 19:49:40 | Unidentified User Agent |                  |
|  78 | 111.251.93.170 | 2017-01-24 19:50:49 | Unidentified User Agent |                  |
|  79 | 111.251.93.170 | 2017-01-24 20:09:52 | Unidentified User Agent |                  |
|  80 | 111.251.93.170 | 2017-01-24 20:30:06 | Unidentified User Agent |                  |
|  81 | 23.251.63.45   | 2017-01-24 20:37:14 | Unidentified User Agent |                  |
|  82 | 111.251.93.170 | 2017-01-24 20:49:53 | Unidentified User Agent |                  |
|  83 | 111.251.93.170 | 2017-01-24 21:10:04 | Unidentified User Agent |                  |
|  84 | 111.251.93.170 | 2017-01-24 21:30:32 | Unidentified User Agent |                  |
|  85 | 111.251.93.170 | 2017-01-24 21:50:46 | Unidentified User Agent |                  |
|  86 | 111.251.93.170 | 2017-01-24 21:51:33 | Unidentified User Agent |                  |
|  87 | 61.142.176.20  | 2017-01-24 21:58:34 | Unidentified User Agent | Unknown Platform |
|  88 | 111.251.93.170 | 2017-01-24 22:11:24 | Unidentified User Agent |                  |
|  89 | 111.251.93.170 | 2017-01-24 22:30:22 | Unidentified User Agent |                  |
|  90 | 111.251.93.170 | 2017-01-24 22:31:24 | Unidentified User Agent |                  |
|  91 | 23.251.63.45   | 2017-01-24 22:41:58 | Unidentified User Agent |                  |
|  92 | 111.251.93.170 | 2017-01-24 22:50:40 | Unidentified User Agent |                  |
|  93 | 111.251.93.170 | 2017-01-24 23:31:12 | Unidentified User Agent |                  |
|  94 | 111.251.93.170 | 2017-01-24 23:32:00 | Unidentified User Agent |                  |
|  95 | 111.251.93.170 | 2017-01-24 23:32:40 | Unidentified User Agent |                  |
|  96 | 111.251.93.170 | 2017-01-24 23:51:21 | Unidentified User Agent |                  |
|  97 | 111.251.93.170 | 2017-01-25 00:11:27 | Unidentified User Agent |                  |
|  98 | 111.251.93.170 | 2017-01-25 00:12:45 | Unidentified User Agent |                  |
|  99 | 111.251.93.170 | 2017-01-25 00:13:50 | Unidentified User Agent |                  |
| 100 | 111.251.93.170 | 2017-01-25 00:14:47 | Unidentified User Agent |                  |
| 101 | 111.251.93.170 | 2017-01-25 00:16:26 | Unidentified User Agent |                  |
| 102 | 111.251.93.170 | 2017-01-25 00:31:19 | Unidentified User Agent |                  |
| 103 | 111.251.93.170 | 2017-01-25 01:11:45 | Unidentified User Agent |                  |
| 104 | 111.251.93.170 | 2017-01-25 01:31:54 | Unidentified User Agent |                  |
| 105 | 23.251.63.45   | 2017-01-25 01:48:22 | Unidentified User Agent |                  |
| 106 | 111.251.93.170 | 2017-01-25 02:12:40 | Unidentified User Agent |                  |
| 107 | 111.251.93.170 | 2017-01-25 02:33:18 | Unidentified User Agent |                  |
| 108 | 111.251.93.170 | 2017-01-25 02:34:48 | Unidentified User Agent |                  |
| 109 | 111.251.93.170 | 2017-01-25 02:35:53 | Unidentified User Agent |                  |
| 110 | 111.251.93.170 | 2017-01-25 02:37:17 | Unidentified User Agent |                  |
| 111 | 111.251.93.170 | 2017-01-25 02:43:16 | Unidentified User Agent |                  |
| 112 | 111.251.93.170 | 2017-01-25 02:46:22 | Unidentified User Agent |                  |
| 113 | 111.251.93.170 | 2017-01-25 02:48:32 | Unidentified User Agent |                  |
| 114 | 111.251.93.170 | 2017-01-25 02:51:58 | Unidentified User Agent |                  |
| 115 | 111.251.93.170 | 2017-01-25 03:01:26 | Unidentified User Agent |                  |
| 116 | 111.251.93.170 | 2017-01-25 03:16:49 | Unidentified User Agent |                  |
| 117 | 111.251.93.170 | 2017-01-25 03:22:45 | Unidentified User Agent |                  |
| 118 | 111.251.93.170 | 2017-01-25 03:26:47 | Unidentified User Agent |                  |
| 119 | 111.251.93.170 | 2017-01-25 03:33:23 | Unidentified User Agent |                  |
| 120 | 111.251.93.170 | 2017-01-25 03:43:50 | Unidentified User Agent |                  |
| 121 | 111.251.93.170 | 2017-01-25 03:49:33 | Unidentified User Agent |                  |
| 122 | 111.251.93.170 | 2017-01-25 03:53:22 | Unidentified User Agent |                  |
| 123 | 111.251.93.170 | 2017-01-25 03:58:46 | Unidentified User Agent |                  |
| 124 | 111.251.93.170 | 2017-01-25 04:06:35 | Unidentified User Agent |                  |
| 125 | 111.251.93.170 | 2017-01-25 04:08:54 | Unidentified User Agent |                  |
| 126 | 111.251.93.170 | 2017-01-25 04:17:26 | Unidentified User Agent |                  |
| 127 | 111.251.93.170 | 2017-01-25 04:21:49 | Unidentified User Agent |                  |
| 128 | 111.251.93.170 | 2017-01-25 04:25:36 | Unidentified User Agent |                  |
| 129 | 111.251.93.170 | 2017-01-25 04:31:20 | Unidentified User Agent |                  |
| 130 | 111.251.93.170 | 2017-01-25 04:39:50 | Unidentified User Agent |                  |
| 131 | 111.251.93.170 | 2017-01-25 04:46:16 | Unidentified User Agent |                  |
| 132 | 111.251.93.170 | 2017-01-25 05:00:27 | Unidentified User Agent |                  |
| 133 | 111.251.93.170 | 2017-01-25 05:05:55 | Unidentified User Agent |                  |
| 134 | 111.251.93.170 | 2017-01-25 05:20:32 | Unidentified User Agent |                  |
| 135 | 111.251.93.170 | 2017-01-25 05:23:52 | Unidentified User Agent |                  |
| 136 | 111.251.93.170 | 2017-01-25 05:30:00 | Unidentified User Agent |                  |
| 137 | 111.251.93.170 | 2017-01-25 05:44:46 | Unidentified User Agent |                  |
| 138 | 111.251.93.170 | 2017-01-25 05:50:59 | Unidentified User Agent |                  |
| 139 | 111.251.93.170 | 2017-01-25 05:54:41 | Unidentified User Agent |                  |
| 140 | 23.251.63.45   | 2017-01-25 05:58:54 | Unidentified User Agent |                  |
| 141 | 111.251.93.170 | 2017-01-25 06:14:16 | Unidentified User Agent |                  |
| 142 | 111.251.93.170 | 2017-01-25 06:26:27 | Unidentified User Agent |                  |
| 143 | 111.251.93.170 | 2017-01-25 06:32:40 | Unidentified User Agent |                  |
| 144 | 111.251.93.170 | 2017-01-25 06:40:17 | Unidentified User Agent |                  |
| 145 | 111.251.93.170 | 2017-01-25 06:53:45 | Unidentified User Agent |                  |
| 146 | 111.251.93.170 | 2017-01-25 06:58:59 | Unidentified User Agent |                  |
| 147 | 125.39.207.33  | 2017-01-25 07:05:01 | Unidentified User Agent | Unknown Platform |
| 148 | 111.251.93.170 | 2017-01-25 07:11:58 | Unidentified User Agent |                  |
| 149 | 111.251.93.170 | 2017-01-25 07:19:30 | Unidentified User Agent |                  |
| 150 | 183.60.48.110  | 2017-01-25 07:24:55 | Unidentified User Agent | Unknown Platform |
| 151 | 111.251.93.170 | 2017-01-25 07:25:34 | Unidentified User Agent |                  |
| 152 | 111.251.93.170 | 2017-01-25 07:28:56 | Unidentified User Agent |                  |
| 153 | 111.251.93.170 | 2017-01-25 07:35:52 | Unidentified User Agent |                  |
| 154 | 111.251.93.170 | 2017-01-25 07:43:21 | Unidentified User Agent |                  |
| 155 | 111.251.93.170 | 2017-01-25 07:48:11 | Unidentified User Agent |                  |
| 156 | 101.226.51.229 | 2017-01-25 07:57:36 | Chrome 45.0.2454.101    | Windows XP       |
| 157 | 111.251.93.170 | 2017-01-25 08:02:04 | Unidentified User Agent |                  |
| 158 | 111.251.93.170 | 2017-01-25 08:08:18 | Unidentified User Agent |                  |
| 159 | 111.251.93.170 | 2017-01-25 08:16:22 | Unidentified User Agent |                  |
| 160 | 111.251.93.170 | 2017-01-25 08:22:15 | Unidentified User Agent |                  |
| 161 | 111.251.93.170 | 2017-01-25 08:31:19 | Unidentified User Agent |                  |
| 162 | 111.251.93.170 | 2017-01-25 08:36:05 | Unidentified User Agent |                  |
| 163 | 111.251.93.170 | 2017-01-25 08:43:38 | Unidentified User Agent |                  |
| 164 | 111.251.93.170 | 2017-01-25 08:59:11 | Unidentified User Agent |                  |
| 165 | 111.251.93.170 | 2017-01-25 09:07:05 | Unidentified User Agent |                  |
| 166 | 111.251.93.170 | 2017-01-25 09:11:57 | Unidentified User Agent |                  |
+-----+----------------+---------------------+-------------------------+------------------+
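
To check a log like the one above for crawler visits, one option is a small Python script. This is only a sketch: the log has no header row, so the field names (`row_id`, `ip`, `seen_at`, `agent`, `platform`) are assumptions, and `SAMPLE` holds just three rows copied from the table.

```python
from collections import Counter

# Three rows copied from the log above; the real table is much longer.
SAMPLE = """\
|  69 | 117.136.40.218 | 2017-01-24 18:40:37 | Chrome 55.0.2883.87     | Windows 7        |
|  70 | 177.193.53.212 | 2017-01-24 18:47:00 | Googlebot               | Unknown Platform |
|  71 | 111.251.93.170 | 2017-01-24 18:49:26 | Unidentified User Agent |                  |
"""


def parse_rows(text):
    """Turn the ASCII table into a list of dicts (field names are assumed)."""
    rows = []
    for line in text.splitlines():
        if "|" not in line:
            continue  # border lines such as +-----+ contain no pipes
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 7:
            continue
        # Take the five data cells counted from the right, so a listing
        # number in front of the first pipe does not shift the columns.
        row_id, ip, seen_at, agent, platform = cells[-6:-1]
        rows.append({"row_id": int(row_id), "ip": ip, "seen_at": seen_at,
                     "agent": agent, "platform": platform})
    return rows


def googlebot_hits(rows):
    """Rows whose user agent column mentions Googlebot."""
    return [row for row in rows if "googlebot" in row["agent"].lower()]


rows = parse_rows(SAMPLE)
hits_per_ip = Counter(row["ip"] for row in rows)
```

`googlebot_hits(rows)` then lists the requests labelled Googlebot, and `hits_per_ip.most_common()` shows which addresses account for most of the traffic.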

https://support.google.com/webmasters/answer/1061943?hl=en

Google crawlers

See which robots Google uses to crawl the web

"Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how they should be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.

Each entry lists the crawler, its user agent token, and the full user agent string (as seen in website log files).

Crawler: Googlebot (Google Web search)
User agent token: Googlebot
Full user agent string: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  or (rarely used): Googlebot/2.1 (+http://www.google.com/bot.html)

Crawler: Googlebot News
User agent token: Googlebot-News (Googlebot)
Full user agent string: Googlebot-News

Crawler: Googlebot Images
User agent token: Googlebot-Image (Googlebot)
Full user agent string: Googlebot-Image/1.0

Crawler: Googlebot Video
User agent token: Googlebot-Video (Googlebot)
Full user agent string: Googlebot-Video/1.0

Crawler: Google Smartphone
User agent token: Googlebot
Full user agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Crawler: Google Mobile AdSense
User agent token: Mediapartners-Google or Mediapartners (Googlebot)
Full user agent string: [various mobile device types] (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)

Crawler: Google AdSense
User agent token: Mediapartners-Google, Mediapartners (Googlebot)
Full user agent string: Mediapartners-Google

Crawler: Google AdsBot landing page quality check
User agent token: AdsBot-Google
Full user agent string: AdsBot-Google (+http://www.google.com/adsbot.html)

Crawler: Google app crawler (used to fetch resources for mobile apps; obeys AdsBot-Google robots rules)
User agent token: AdsBot-Google-Mobile-Apps
Full user agent string: AdsBot-Google-Mobile-Apps
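
The tokens listed above can be used to label raw user agent strings from a log. The following is a minimal sketch, not an official Google tool: `CRAWLER_TOKENS` and `classify_user_agent` are names made up for illustration, matching is a plain case-insensitive substring test, and a matching string does not prove the request really came from Google, since user agent strings can be spoofed.

```python
# Tokens from the crawler list, ordered so that more specific tokens
# are tested before the shorter tokens they contain.
CRAWLER_TOKENS = [
    ("AdsBot-Google-Mobile-Apps", "Google app crawler"),
    ("AdsBot-Google", "Google AdsBot landing page quality check"),
    ("Mediapartners-Google", "Google AdSense / Google Mobile AdSense"),
    ("Googlebot-News", "Googlebot News"),
    ("Googlebot-Image", "Googlebot Images"),
    ("Googlebot-Video", "Googlebot Video"),
    ("Googlebot", "Googlebot (Google Web search) / Google Smartphone"),
]


def classify_user_agent(user_agent):
    """Return a crawler label for a user agent string, or None if no token matches."""
    lowered = user_agent.lower()
    for token, label in CRAWLER_TOKENS:
        if token.lower() in lowered:
            return label
    return None


web_search = classify_user_agent(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
images = classify_user_agent("Googlebot-Image/1.0")
browser = classify_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
```

The ordering matters: testing `Googlebot` first would label every `Googlebot-Image` request as plain web search.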

robots.txt

Where several user-agents are recognized in the robots.txt file, Google will follow the most specific. If you want all of Google to be able to crawl your pages, you don't need a robots.txt file at all. If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user-agent. For example, if you want all your pages to appear in Google search, and if you want AdSense ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the user-agent Googlebot will also block all of Google's other user-agents.

But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory to be crawled. In this case, use robots.txt to disallow the user-agent Googlebot-Image from crawling the files in your /personal directory (while allowing Googlebot to crawl all files), like this:

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal
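
The "most specific user-agent wins" rule can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not Google's actual matching code: `pick_group` and `is_allowed` are hypothetical helpers, groups are given as a dict of Disallow prefixes, and "most specific" is approximated as the longest group token that the crawler's token starts with. It does not model the fallback of non-Googlebot crawlers (such as Mediapartners-Google) to a Googlebot group.

```python
# The robots.txt above, written as {user-agent token: [Disallow prefixes]}.
# An empty string stands for an empty "Disallow:" line (nothing is blocked).
GROUPS = {
    "Googlebot": [""],
    "Googlebot-Image": ["/personal"],
}


def pick_group(groups, crawler):
    """Pick the longest group token that the crawler token starts with."""
    crawler = crawler.lower()
    matches = [token for token in groups
               if token != "*" and crawler.startswith(token.lower())]
    if matches:
        return max(matches, key=len)
    return "*" if "*" in groups else None


def is_allowed(groups, crawler, path):
    """True if no Disallow prefix in the chosen group covers the path."""
    token = pick_group(groups, crawler)
    if token is None:
        return True  # no group applies, so nothing is blocked
    return not any(rule and path.startswith(rule) for rule in groups[token])


image_blocked = not is_allowed(GROUPS, "Googlebot-Image", "/personal/cat.jpg")
web_allowed = is_allowed(GROUPS, "Googlebot", "/personal/cat.jpg")
```

Both groups match Googlebot-Image, but the longer token is chosen, so only the image crawler is kept out of /personal.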

To take another example, say that you want ads on all your pages, but you don't want those pages to appear in Google Search. Here, you'd block Googlebot, but allow Mediapartners-Google, like this:

User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
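
Rules like these can be sanity-checked locally with Python's standard `urllib.robotparser` module. A hedged sketch: `can_crawl` is a helper name chosen here, and Python's parser applies the first group that matches rather than Google's "most specific" rule, so it only approximates Google's behaviour. For this particular file the two agree, because the two tokens do not overlap.

```python
from urllib.robotparser import RobotFileParser

# The second robots.txt example: block web search, allow AdSense.
RULES = """\
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""


def can_crawl(user_agent, path, rules=RULES):
    """Parse the rules text and ask whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


search_allowed = can_crawl("Googlebot", "/article.html")
ads_allowed = can_crawl("Mediapartners-Google", "/article.html")
```

Here `search_allowed` comes out false and `ads_allowed` true, which matches the intent described above.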

robots meta tag

Some pages use multiple robots meta tags to specify directives for different crawlers, like this:

<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">

In this case, Google will use the sum of the negative directives, and Googlebot will follow both the noindex and nofollow directives.
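
The "sum of the negative directives" behaviour can be illustrated with Python's built-in `html.parser`. This is a sketch of the idea only: `RobotsMetaCollector` and `effective_directives` are illustrative names, and the code simply takes the union of the generic `robots` tag and the crawler-specific tag without modelling any other part of how Google reads a page.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="<crawler>">."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.names:
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def effective_directives(html, crawler="googlebot"):
    """Union of the directives that apply to the given crawler."""
    collector = RobotsMetaCollector(crawler)
    collector.feed(html)
    return collector.directives


PAGE = '<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">'
combined = effective_directives(PAGE)
```

For the example page, `combined` holds both `nofollow` and `noindex`, while a crawler with a different name would only pick up `nofollow` from the generic tag.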

Time: 2024-08-25 01:55:27
