Backup and restore of FAST Search for SharePoint 2010

A colleague asked me a question: if FAST Search for SharePoint 2010 is fully restored to an earlier point in time, what happens when FAST Search starts its next incremental crawl? Will FAST Search inspect the content databases, find the record of the last crawl, and index new or changed items? Will it notice that the index is inconsistent with the current content? Or will it simply run a full crawl again?

 

Some Basics

===================

FAST Search for SharePoint 2010 contains several indexing connectors. They can be divided into three types:

· The Microsoft SharePoint Server 2010 indexing connectors and crawling framework (Content SSA)

· Federated search connectors

o Federated search connectors enable you to pass a query to a target system and display results returned from that system without actually crawling that content.

· The FAST Search Server 2010 for SharePoint specific indexing connectors

o FAST Search Web Crawler

o FAST Search JDBC Connector

o FAST Search Lotus Notes Connector

Based on this breakdown, we can see that only the Content SSA and the FAST-specific indexing connectors actually crawl items.

 

How does FAST Search for SharePoint crawl items?

====================

For the FAST-specific indexing connectors, incremental crawls mostly use checksum-based change detection. This means that even if you restore FAST Search to a previous recovery point, the checksum comparison still tells the connector whether an item has changed since the last crawl it knows about: the restored checksums simply reflect the older state, so the next incremental crawl re-detects and re-feeds everything that changed after the backup. One incremental crawl after the FAST restore, your users' queries will again run against a correct index. So there is no lasting impact for this connector type.
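To make that concrete, here is a minimal Python sketch of checksum-based change detection (a conceptual simulation only, not the actual connector code; all names are illustrative). The key point is that the stored checksums are part of the state a restore rolls back, so anything changed after the backup is simply detected as changed again:

```python
import hashlib

def checksum(content: bytes) -> str:
    """Checksum of an item's content; the connector compares it across crawls."""
    return hashlib.md5(content).hexdigest()

def incremental_crawl(items, stored_checksums):
    """Re-feed only items whose checksum differs from the stored one.

    items:            item URL -> current content bytes
    stored_checksums: item URL -> checksum recorded at the last known crawl
                      (this state is part of what a FAST restore rolls back)
    """
    changed = []
    for url, content in items.items():
        digest = checksum(content)
        if stored_checksums.get(url) != digest:
            changed.append(url)             # new or modified since last crawl
            stored_checksums[url] = digest  # remember for the next crawl
    return changed

# After a restore, the *restored* checksums are older, so anything changed
# since the backup is detected as changed and re-indexed on the next pass.
restored = {"itemA": checksum(b"v1"), "itemB": checksum(b"v1")}
current = {"itemA": b"v1", "itemB": b"v2"}
print(incremental_crawl(current, restored))  # ['itemB']
```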

For the Content SSA, we need to dig a little deeper to explain.

For this type of connector, a crawl can be divided into two steps:

1. Gathering

2. Feeding items to the ‘filter’ component

SharePoint Server 2010 search and FAST Search for SharePoint 2010 use the same process for gathering SharePoint internal content. The difference is which component processes an item after the search engine has gathered it:

· For SharePoint Search, IFilters are used.

· For FAST Search for SharePoint 2010, the FAST Content Plug-in feeds each batch of gathered items to the FAST Search pipeline via the FAST Content Distributor, where the items are filtered and processed into the index (see the sketch below).
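As a rough illustration of that feeding path, here is a short Python sketch (purely conceptual; the function names stand in for the real FAST components and do not reflect their actual interfaces):

```python
# Conceptual model: gathered items go out in batches through a distributor,
# and each item is filtered/processed by a pipeline before it is indexed.
def pipeline(item):
    """Stand-in for the FAST document-processing pipeline."""
    body = item["raw"].decode("utf-8", errors="ignore")  # the 'filtering' step
    return {"url": item["url"], "body": body}            # processed document

def content_distributor(batch):
    """Stand-in for the Content Distributor: hands a batch to a pipeline."""
    return [pipeline(item) for item in batch]

index = []
gathered = [{"url": f"http://site/item{i}", "raw": b"hello"} for i in range(5)]
BATCH_SIZE = 2
for start in range(0, len(gathered), BATCH_SIZE):
    index.extend(content_distributor(gathered[start:start + BATCH_SIZE]))
print(len(index))  # 5 processed documents ready for indexing
```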

 

Now we will focus on the gathering part.

During an incremental crawl, the crawler passes a change log cookie (which it received from the WFE on the previous crawl) back to the WFE. This change log cookie contains the GUID of the applicable content database and a row ID from the EventCache table.

With this row ID, the WFE looks up the EventCache table, determines which items have changed since the last crawl, and responds to the crawler with the items that need to be crawled.
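Here is a toy Python simulation of that handshake (the class, function, and table layout are illustrative assumptions, not the real schema or API):

```python
from dataclasses import dataclass

@dataclass
class ChangeLogCookie:
    content_db_guid: str  # which content database this cookie applies to
    last_row_id: int      # last EventCache row the crawler has processed

# Simulated EventCache rows: (row_id, change event), ordered by row_id.
EVENT_CACHE = [
    (101, "ItemA changed"),
    (102, "ItemB added"),
    (103, "ItemC deleted"),
]

def changes_since(cookie: ChangeLogCookie):
    """WFE side: return events newer than the cookie's row ID, plus a
    fresh cookie for the crawler to present at the next incremental crawl."""
    new_events = [(rid, ev) for rid, ev in EVENT_CACHE
                  if rid > cookie.last_row_id]
    new_last = new_events[-1][0] if new_events else cookie.last_row_id
    return new_events, ChangeLogCookie(cookie.content_db_guid, new_last)

cookie = ChangeLogCookie("content-db-guid", last_row_id=101)
events, cookie = changes_since(cookie)
print(events)  # [(102, 'ItemB added'), (103, 'ItemC deleted')]
```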

 

Imagine we have the following event sequence:

Incremental crawl 1 -> FAST Search full backup -> ItemA changed -> Incremental crawl 2 -> FAST Search full restore -> Incremental crawl 3

Incremental crawl 3 will not crawl ItemA: the change log cookie belongs to the Content SSA on the SharePoint side and is not rolled back by the FAST restore, so it still points past ItemA's change event, while the restored index predates that change. The result is an index that is inconsistent with the actual content.
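To make the failure mode concrete, here is a self-contained toy simulation of that event sequence (illustrative only; the real crawl state lives in the SSA databases):

```python
# Key assumption, true to the architecture: the change log cookie lives with
# the Content SSA in SharePoint, so a FAST full restore rolls back the index
# but NOT the cookie.
event_cache = []   # simulated EventCache rows: (row_id, item, version)
index = {}         # item -> version currently in the FAST index
cookie_row = 0     # crawler's last processed EventCache row ID

def incremental_crawl():
    global cookie_row
    for rid, item, version in event_cache:
        if rid > cookie_row:
            index[item] = version   # re-index the changed item
            cookie_row = rid        # advance the cookie

event_cache.append((1, "ItemA", "v1"))
incremental_crawl()                 # crawl 1: index == {'ItemA': 'v1'}
backup = dict(index)                # FAST full backup captures the index

event_cache.append((2, "ItemA", "v2"))
incremental_crawl()                 # crawl 2: index == {'ItemA': 'v2'}

index = dict(backup)                # FAST full restore: ItemA back to 'v1',
                                    # but cookie_row is still 2
incremental_crawl()                 # crawl 3: no rows > 2, ItemA is skipped
print(index["ItemA"])               # 'v1' -- stale until a full crawl
```

Only a full crawl, which does not rely on the cookie, brings the restored index back in sync.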

Another thing to consider: the EventCache table is cleaned up by a SharePoint timer job (the "Change Log" timer job). If the recovery point is from long ago, the change events needed to catch up may already have been purged, which is another factor that can introduce inconsistency.

 

After explaining all of this, the conclusion is as follows:

The index of SharePoint site content may be inconsistent with the actual content; indexes built by the other connector types should be fine.

The way to avoid the inconsistency is to run a full crawl after the full restore. The system remains usable in the meantime, and once the full crawl completes, the problem is gone entirely.

 

Reference

==================

Full backup and restore (FAST Search Server 2010 for SharePoint)

http://technet.microsoft.com/en-us/library/ff460221(v=office.14).aspx#BKMK_FullRestore

SP2010 Search *Explained: Crawling

http://blogs.msdn.com/b/sharepoint_strategery/archive/2012/10/30/sp2010-search-explained-crawling.aspx

SharePoint 2010/2013: "Change Log" Timer Job is not cleaning up Expired entries in EventCache Table

http://blogs.msdn.com/b/spses/archive/2013/05/02/sharepoint-2010-2013-change-log-timer-job-is-not-cleaning-up-expired-entries-in-eventcache-table.aspx

Plan for crawling and federation (FAST Search Server 2010 for SharePoint)

http://technet.microsoft.com/en-us/library/ff383278.aspx
