Reposted from an article on 慧都控件网 (Huidu Controls Network)

dtSearch Tutorial: Full-Text Database Indexing

A simple index

Start a new C# project and make sure you have added a reference to the dtSearch library and have added:


using dtSearch.Engine;

at the top of the source file.

Creating an index under program control with dtSearch is exceptionally simple. All you need is an IndexJob object:


IndexJob indexJob = new IndexJob();

You simply set the properties of the IndexJob object to specify the index you want to create and call one of the Execute methods to build or update the index.

So what do you have to specify to create an index?

First you have to say where you want the index to be created:


indexJob.IndexPath = @"C:\Users\name\AppData\Local\dtSearch\test2";

There is no particular reason to use this location; it is just the default used by the dtSearch Desktop utility for the indexes it creates. Notice that you specify the directory that the files for the index are created in.

Next you have to specify the folders and files that you would like to index. This is achieved using the FoldersToIndex string collection. You can add as many folder paths to this collection as you need. For the example we will add just one:


indexJob.FoldersToIndex.Add(@"C:\Users\name\Documents");

You can add <+> to the end of the path to signify that all of the subfolders should be indexed. If you don't add <+> then just the content of the specified folder is indexed. You can also add include and exclude filters to specify which types of file are to be indexed. For simplicity we will ignore filters.
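As a rough sketch of the two options just mentioned, the fragment below indexes subfolders with the <+> suffix and restricts the file types. It assumes the IndexJob class exposes IncludeFilters and ExcludeFilters string collections that accept wildcard patterns; the patterns themselves are made up for illustration, so check the property names and pattern syntax against the API reference for your dtSearch version.

```csharp
using dtSearch.Engine;

class FilterSketch
{
    static void Main()
    {
        IndexJob indexJob = new IndexJob();
        indexJob.IndexPath = @"C:\Users\name\AppData\Local\dtSearch\test2";

        // The <+> suffix asks for all subfolders to be indexed as well.
        indexJob.FoldersToIndex.Add(@"C:\Users\name\Documents<+>");

        // Assumption: IncludeFilters/ExcludeFilters take wildcard patterns.
        // The patterns below are illustrative only.
        indexJob.IncludeFilters.Add("*.docx");
        indexJob.IncludeFilters.Add("*.pdf");
        indexJob.ExcludeFilters.Add("*.tmp");
    }
}
```

The rest of the walkthrough leaves the filter collections empty, so every file in the folder is a candidate for indexing.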

Finally, we have to set some "Action" properties that indicate how the indexing operation should be performed. The ActionCreate property has to be set to true for the indexing operation to create a new index. If the index already exists then it is overwritten. The ActionAdd property allows new documents to be added to the index. To create a new empty index and add files to it you have to set both:


indexJob.ActionCreate = true;

indexJob.ActionAdd = true;

The IndexJob is now set up with a minimal configuration and we can start it. The simplest way to do this is to use the Execute method. This starts the indexing and returns only when the index is complete, with a Boolean to indicate success or failure. So, to complete the program, we have to add:


bool result = indexJob.Execute();

The complete program is:


IndexJob indexJob = new IndexJob();
indexJob.FoldersToIndex.Add(@"C:\Users\name\Documents");
indexJob.IndexPath = @"C:\Users\name\AppData\Local\dtSearch\test2";
indexJob.ActionCreate = true;
indexJob.ActionAdd = true;
bool result = indexJob.Execute();

Execute may be simple but it isn't really of much use.

Do you really want your indexing program to wait unresponsively while the index is constructed?

No, probably not.

In most cases the construction of an index takes more time than you can afford to have the UI blocked for. The standard solution is to run the long blocking process on another thread, and dtSearch makes this very easy for you.

Instead of calling Execute, all you have to do is call ExecuteInThread. The call returns immediately and the indexing proceeds on another thread. You can monitor and control the indexing job using IsThreadDone, AbortThread and so on.

Implementing a full indexing application using these facilities is fairly easy - everything works as you would expect - but for simplicity the example avoids the slight complication of making the indexing asynchronous. In this case it doesn't matter too much because the index is small and is completed in a few minutes or less.
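For readers who do want the asynchronous version, a minimal console sketch might look like the following. It assumes the parameterless IsThreadDone overload and simply polls it; a real UI would more likely poll from a timer and offer a cancel button that calls AbortThread. The half-second polling interval is an arbitrary choice.

```csharp
using System;
using System.Threading;
using dtSearch.Engine;

class AsyncIndexSketch
{
    static void Main()
    {
        IndexJob indexJob = new IndexJob();
        indexJob.IndexPath = @"C:\Users\name\AppData\Local\dtSearch\test2";
        indexJob.FoldersToIndex.Add(@"C:\Users\name\Documents");
        indexJob.ActionCreate = true;
        indexJob.ActionAdd = true;

        // Returns immediately; the index is built on a worker thread.
        indexJob.ExecuteInThread();

        // Poll until the worker thread reports that it has finished.
        // To cancel early, call indexJob.AbortThread() instead of waiting.
        while (!indexJob.IsThreadDone())
        {
            Console.Write(".");
            Thread.Sleep(500);
        }
        Console.WriteLine(" indexing finished");
    }
}
```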

Other data sources

One of the nice things about dtSearch is that it tends to implement facilities in ways that are simple, direct and probably the way you would choose to do it as well. Of course this means that you don't get the chance to use a lot of new jargon, but you also get the program completed more quickly.

Rather than implementing lots of different interfaces to work with standard data exchange protocols, dtSearch simply provides a DataSource class. Your subclass uses any protocol you care to name internally to retrieve the data and then presents it to the indexing engine in a simple and uniform way.

Now in all probability you are already an expert on ADO, LINQ or RSS and so I'm not going to go over any of these technologies. What I am going to concentrate on is how the DataSource class is used to feed the data to the indexing engine.

Let's get started.

Creating a custom DataSource

The basic idea is very simple - you have to create a class that inherits from DataSource. You have to override a few of the DataSource methods to provide the data to the search engine.

You can provide the data to the search engine either via DocText, DocStream, DocBytes or DocIsFile. The difference is that DocText is a simple string and the other three provide binary data that is treated as if it was a file of a specified format.

There are only two methods you have to implement - GetNextDoc and Rewind.

The GetNextDoc method has to get the next "document", be it a row in a database table or a file downloaded by any means you want to use, and present it to the indexing engine via one of the properties listed above. It returns true if a document was supplied and false if not, for example when there are no more documents.

The Rewind method simply resets the document sequence so that the next GetNextDoc returns the first document in the sequence. It too returns true or false to indicate success or failure.

There are some other properties that you have to set to make everything work well but these are the basic core set. Let's see how it all works.

Rather than write an example that uses ADO, LINQ or some other data protocol it is simpler to read some files from disk. It shows how everything works and you can modify it to work with any other protocol. In fact the index to be constructed is the same as the first example.

First we need to define our own DataSource class:


public class myDataSource : DataSource

{

Usually this would be in another file in the project but when experimenting you can include it within the form's source file. Also, to keep things simple, let's not bother writing a constructor and dispense with error checking. This is not the way you would do it in anything other than an example that has been stripped down to the bare minimum.

We need to override two methods: GetNextDoc and Rewind. The Rewind method has to reset the data import, so this is also the place to write the initialization code:


public override bool Rewind()
{
    files = Directory.GetFiles(@"C:\Users\name\Documents");
    currentFile = 0;
    return true;
}

We are using standard .NET I/O classes to work with the file system. You need to add:


using System.IO;

and declare the two private variables:


private string[] files;

private int currentFile;

We now have a list of file names in the string array files. Notice that we really do need to check that this operation worked and return false if it didn't. In a more realistic application the Rewind might well only reset the position in the data and you would probably need to write a separate initialization method to be used internally by the DataSource class.
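One way the missing check could be written is to wrap the directory listing in a small helper that reports failure instead of throwing. The helper below uses only standard .NET I/O; its name, TryListFiles, is hypothetical and not part of dtSearch. A Rewind override could call it and return its result.

```csharp
using System;
using System.IO;

static class FileListing
{
    // Hypothetical helper: lists the files in a folder and reports
    // failure through the return value instead of an exception.
    public static bool TryListFiles(string folder, out string[] files)
    {
        try
        {
            files = Directory.GetFiles(folder);
            return true;
        }
        catch (IOException)
        {
            // Covers a missing folder (DirectoryNotFoundException) and
            // other I/O failures.
            files = new string[0];
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // The caller lacks permission to read the folder.
            files = new string[0];
            return false;
        }
    }
}
```

With this in place, the body of Rewind could reduce to resetting currentFile and returning the value of FileListing.TryListFiles(folder, out files).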

The GetNextDoc method could return the next file in the list in a number of different ways - as a file, as a stream or as an array of bytes. We could even read the file in, extract any text it might contain and present this as a string. In this case let's read the file into a byte array and present this to the indexing engine:


public override bool GetNextDoc()

{

First we should check that we haven't reached the end of the list of files:


if (currentFile >= files.Length)
    return false;

As long as there is a file to process we can process it. First we set DocName to the name of the file. Notice that DocName is one of the inherited properties:


DocName = files[currentFile];

currentFile++;

Next we set the inherited date and time stamp properties. Note that the modified date should come from the file's last write time, not its last access time:

DocCreatedDate = File.GetCreationTime(DocName);
DocModifiedDate = File.GetLastWriteTime(DocName);

We also have to set DocIsFile to false to stop the indexing engine reading the file in from disk on its own - yes, we could get it to do all of the work but this wouldn't illustrate how to get raw data to it.


DocIsFile = false;

As we have decided to handle the data input ourselves we next have to read the data into a byte array. We also have to check that the file actually has some data to read:


FileStream reader = File.OpenRead(DocName);

if (reader.Length > 0)

{

 byte[] fileData = new byte[reader.Length];

 reader.Read(fileData, 0, (int)reader.Length);

At this point we have the entire content of the file stored in the fileData array. However the file data has to be presented in DocBytes and we also have to set HaveDocBytes to true to indicate to the indexing engine that it has to read and process DocBytes:


DocBytes = fileData;

 HaveDocBytes = true;

}

We can now close the file and finish the method:

reader.Close();
return true;
}

The entire class is surprisingly short:

public class myDataSource : DataSource
{
    private string[] files;
    private int currentFile;

    public override bool GetNextDoc()
    {
        // No more files: tell the indexing engine we are done.
        if (currentFile >= files.Length) return false;
        DocName = files[currentFile];
        currentFile++;
        DocCreatedDate = File.GetCreationTime(DocName);
        // Use the last write time, not the last access time.
        DocModifiedDate = File.GetLastWriteTime(DocName);
        // We supply the bytes ourselves rather than letting the engine read the file.
        DocIsFile = false;
        FileStream reader = File.OpenRead(DocName);
        if (reader.Length > 0)
        {
            byte[] fileData = new byte[reader.Length];
            reader.Read(fileData, 0, (int)reader.Length);
            DocBytes = fileData;
            HaveDocBytes = true;
        }
        // Release the file handle once the data has been copied.
        reader.Close();
        return true;
    }

    public override bool Rewind()
    {
        files = Directory.GetFiles(@"C:\Users\name\Documents");
        currentFile = 0;
        return true;
    }
}

Using the custom DataSource

Now we have the custom DataSource we can make use of it. Setting up the index creation is much the same as before - create IndexJob, set index path and action properties:


IndexJob indexJob = new IndexJob();
indexJob.IndexPath = @"C:\Users\name\AppData\Local\dtSearch\test2";
indexJob.ActionCreate = true;
indexJob.ActionAdd = true;

Next we create an instance of the custom DataSource:


myDataSource dataSource1 = new myDataSource();

Finally we tell the IndexJob to use the data source and execute the job:


indexJob.DataSourceToIndex = dataSource1;

bool result = indexJob.Execute();

The indexing engine performs a rewind to make sure everything is initialized before it begins.

If you try this out you will discover that the contents of the index are the same as before. The program achieves the same result but in a very different way. Now you can take the same DataSource class and customize it to provide documents or raw text from any source you care to use - ODBC, ADO.NET, LINQ, raw SQL, XML, RSS or any of the many web APIs.
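To give a feel for the raw-text route, here is a hedged sketch of a data source that feeds strings through DocText instead of bytes. The rows are hard-coded stand-ins for whatever a database query or feed reader would return, the class name and the "row/..." document names are invented for the example, and it assumes DocText can be set in the same way as the inherited properties used earlier.

```csharp
using System;
using System.Collections.Generic;
using dtSearch.Engine;

// Hypothetical example class: supplies in-memory text "rows" as documents.
public class TextRowDataSource : DataSource
{
    // Stand-in data; a real source would load this from a query or feed.
    private readonly List<KeyValuePair<string, string>> rows =
        new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("row/1", "First record text"),
            new KeyValuePair<string, string>("row/2", "Second record text")
        };

    private int current;

    public override bool Rewind()
    {
        // Restart the sequence at the first row.
        current = 0;
        return true;
    }

    public override bool GetNextDoc()
    {
        if (current >= rows.Count) return false;

        KeyValuePair<string, string> row = rows[current];
        current++;

        DocName = row.Key;          // invented identifier for the row
        DocIsFile = false;          // nothing on disk for the engine to read
        DocModifiedDate = DateTime.Now;
        DocText = row.Value;        // assumption: text is passed via DocText
        return true;
    }
}
```

It is used exactly like myDataSource: assign an instance to indexJob.DataSourceToIndex and call Execute.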

Original article: http://www.i-programmer.info/programming/database/3408-full-text-database-indexing-with-dtsearch.html

Time: 2024-10-13 00:44:55
