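A record of what my own site blocks in its robots.txt, plus the rules and some examples:
Admin directory: for security, use a two-level admin path /a/xxxx/ and block only /a/ for spiders. This keeps spiders out of the admin area without revealing the full admin path.
Cache: stop spiders from crawling static cache files.
Downloads: stop spiders from crawling the download directory; if it is not used, delete it.
Editor: stop spiders from crawling the editor directory, which also makes it less likely to be found and become a security risk.
Email: stop spiders from crawling static email templates.
Other pages: block pages that are not worth indexing.
Images: stop spiders from crawling any image type other than JPG/jpg.
Core file directory: stop spiders from directly crawling /includes/ and its subdirectories (functions, class libraries, models, templates, etc.).
Media directory: stop spiders from crawling the directory of playable media files; if it is not used, delete it.
Pages with parameters: stop spiders from crawling URLs that carry query parameters.
Archive files: RAR, ZIP, GZ and TAR file types.
Useless and malicious spiders: block them. robots.txt is only advisory, so this stops only bots that honor it; a truly malicious crawler can simply ignore the file.
Sitemap: specify the location of sitemap.xml.
Directory blocking: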
User-agent: *
Disallow: /a/
Disallow: /cache/
Disallow: /download/
Disallow: /editors/
Disallow: /email/
Disallow: /extras/
Disallow: /images/
Disallow: /includes/
Disallow: /media/
Disallow: /pub/
Disallow: /nddbc.html
Disallow: /page_not_found.php
Disallow: /login.html
Disallow: /privacy.html
Disallow: /conditions.html
Disallow: /contact_us.html
Disallow: /gv_faq.html
Disallow: /discount_coupon.html
Disallow: /unsubscribe.html
Disallow: /shopping_cart.html
Disallow: /ask_a_question.html
Disallow: /popup_image_additional.html
Disallow: /product_reviews_write.html
Disallow: /tell_a_friend.html
Disallow: /pages-popup_image.html
Disallow: /*?
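The directory and page rules above are plain path prefixes, so Python's standard urllib.robotparser can spot-check them. A minimal sketch, assuming a trimmed copy of the rules and a made-up crawler name SomeBot:

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the prefix rules above; extend it with the full list as needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /a/
Disallow: /cache/
Disallow: /includes/
Disallow: /login.html
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for path in ["/a/xxxx/index.php", "/includes/functions/general.php",
             "/login.html", "/index.html"]:
    # "SomeBot" is a hypothetical crawler with no group of its own, so the * group applies.
    print(path, "allowed" if parser.can_fetch("SomeBot", path) else "blocked")
```

Only /index.html should come back as allowed. Note that Python's parser does simple prefix matching and does not understand the * and $ wildcards used in the rules further down.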
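Block spiders from crawling non-JPG images (product images are limited to JPG). A crawler follows only the most specific group that matches its name, so Googlebot, which gets its own group here, ignores every User-agent: * rule in this file; repeat those Disallow lines in this group if they should apply to Googlebot as well: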
User-agent: Googlebot
Allow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
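Because urllib.robotparser cannot evaluate wildcard patterns, the sketch below hand-rolls Google-style matching to show what image rules written as /*.png$ and similar would do. It is an illustration, not Google's actual parser: * matches any run of characters, a trailing $ anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The helper names pattern_to_regex and is_allowed are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[bool, str]  # (is_allow, path pattern)


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' where '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest matching pattern wins; on equal length, Allow beats Disallow."""
    best: Optional[Tuple[int, bool]] = None
    for is_allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            candidate = (len(pattern), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


GOOGLEBOT_RULES: List[Rule] = [
    (True, "/*.jpg$"),
    (False, "/*.jpeg$"),
    (False, "/*.gif$"),
    (False, "/*.png$"),
    (False, "/*.bmp$"),
]

for path in ["/images/p1.jpg", "/images/p1.png", "/images/p1.png?v=2", "/index.html"]:
    print(path, is_allowed(GOOGLEBOT_RULES, path))
```

Because of the $ anchor, a URL such as /images/p1.png?v=2 is not matched by /*.png$ and stays crawlable.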
Block spiders from crawling archive files
User-agent: *
Disallow: /*.zip$
Disallow: /*.rar$
Disallow: /*.gz$
Disallow: /*.tar$
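Allow and Disallow values should start with / (or *) and contain no spaces; a typo such as a space before $ or a missing leading slash silently changes what a rule matches. A small checker sketch, assuming the file is read as plain text; lint_robots is a hypothetical helper, not part of any library:

```python
from typing import List


def lint_robots(text: str) -> List[str]:
    """Report Allow/Disallow values that are likely to be typos."""
    problems: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if ":" not in line:
            continue
        field, value = [part.strip() for part in line.split(":", 1)]
        if field.lower() not in ("allow", "disallow") or not value:
            continue  # an empty Disallow means "allow everything", which is valid
        if not value.startswith(("/", "*")):
            problems.append(f"line {lineno}: {value!r} should start with '/'")
        if " " in value:
            problems.append(f"line {lineno}: {value!r} contains a space")
    return problems


SAMPLE = """\
User-agent: *
Disallow: .zip$
Disallow: /*.rar$
Disallow: /*.tar $
Sitemap: http://www.xxx.jp/sitemap.xml
"""

for problem in lint_robots(SAMPLE):
    print(problem)
```

For this sample it flags line 2 (no leading slash) and line 4 (space inside the pattern); the Sitemap line is ignored because only Allow and Disallow are checked.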
Specify the sitemap location
Sitemap: http://www.xxx.jp/sitemap.xml
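The Sitemap line is independent of any User-agent group. Python's urllib.robotparser (3.8 or later) exposes it through site_maps(); a short sketch using the placeholder URL above:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /a/

Sitemap: http://www.xxx.jp/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# site_maps() returns the list of Sitemap URLs, or None if the file lists none.
print(parser.site_maps())
```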
Block other useless and malicious spiders:
User-Agent: almaden
Disallow: /
User-Agent: ASPSeek
Disallow: /
User-Agent: Axmo
Disallow: /
User-Agent: BaiduSpider
Disallow: /
User-Agent: booch
Disallow: /
User-Agent: DTS Agent
Disallow: /
User-Agent: Downloader
Disallow: /
User-Agent: EmailCollector
Disallow: /
User-Agent: EmailSiphon
Disallow: /
User-Agent: EmailWolf
Disallow: /
User-Agent: Expired Domain Sleuth
Disallow: /
User-Agent: Franklin Locator
Disallow: /
User-Agent: Gaisbot
Disallow: /
User-Agent: grub
Disallow: /
User-Agent: HughCrawler
Disallow: /
User-Agent: iaea.org
Disallow: /
User-Agent: lcabotAccept
Disallow: /
User-Agent: IconSurf
Disallow: /
User-Agent: Iltrovatore-Setaccio
Disallow: /
User-Agent: Indy Library
Disallow: /
User-Agent: IUPUI
Disallow: /
User-Agent: Kittiecentral
Disallow: /
User-Agent: larbin
Disallow: /
User-Agent: lwp-trivial
Disallow: /
User-Agent: MetaTagRobot
Disallow: /
User-Agent: Missigua Locator
Disallow: /
User-Agent: NetResearchServer
Disallow: /
User-Agent: NextGenSearch
Disallow: /
User-Agent: NPbot
Disallow: /
User-Agent: Nutch
Disallow: /
User-Agent: ObjectsSearch
Disallow: /
User-Agent: Oracle Ultra Search
Disallow: /
User-Agent: PEERbot
Disallow: /
User-Agent: PictureOfInternet
Disallow: /
User-Agent: PlantyNet
Disallow: /
User-Agent: QuepasaCreep
Disallow: /
User-Agent: ScSpider
Disallow: /
User-Agent: SOFT411
Disallow: /
User-Agent: spider.acont.de
Disallow: /
User-Agent: Sqworm
Disallow: /
User-Agent: SSM Agent
Disallow: /
User-Agent: TAMU
Disallow: /
User-Agent: TheUsefulbot
Disallow: /
User-Agent: TurnitinBot
Disallow: /
User-Agent: Tutorial Crawler
Disallow: /
User-Agent: TutorGig
Disallow: /
User-Agent: WebCopier
Disallow: /
User-Agent: WebZIP
Disallow: /
User-Agent: ZipppBot
Disallow: /
User-Agent: Xenu
Disallow: /
User-Agent: Wotbox
Disallow: /
User-Agent: Wget
Disallow: /
User-Agent: NaverBot
Disallow: /
User-Agent: mozDex
Disallow: /
User-Agent: Sosospider
Disallow: /
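Each bot in the list above gets its own group, and a crawler obeys only the group that matches its name, falling back to User-agent: * when none does. Python's urllib.robotparser follows the same selection rule (its name matching is a simpler case-insensitive substring check than Google's), so a trimmed, hypothetical file can show the effect. The Googlebot group here uses a plain prefix rule as a stand-in, since Python cannot evaluate the wildcard image rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, trimmed robots.txt: a catch-all group, a named Googlebot group,
# and two bots blocked from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /a/

User-agent: Googlebot
Disallow: /images/

User-Agent: BaiduSpider
Disallow: /

User-Agent: Wget
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Wget", "/index.html"),         # its own group says "Disallow: /"
    ("Baiduspider", "/index.html"),  # names are matched case-insensitively
    ("SomeBot", "/a/xxxx/"),         # unlisted bot falls back to the * group
    ("Googlebot", "/a/xxxx/"),       # its own group replaces the * group
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
```

Only the last check prints True: Googlebot is not bound by Disallow: /a/ once it has a group of its own.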