Lucene-全文索引

最近接触了lucene,我想也有很多人曾经听过，于是带着好奇心，我开始对lucene进行了解，给我影响最深的是它非常多的应用了索引表，这个工具之所以快是就是因为大量引用到了索引表。今天只说下我刚开始做的校历例子，创建索引。

下面对lucene从概念上做个介绍，Lucene是一个信息检索的函数库(Library),利用它你可以为你的应用加上索引和搜索的功能.Lucene的使用者不需要深入了解有关全文检索的知识,仅仅学会使用库中的一个类,你就为你的应用实现全文检索的功能.不过千万别以为Lucene是一个象google那样的搜索引擎,Lucene甚至不是一个应用程序,它仅仅是一个工具,一个Library.你也可以把它理解为一个将索引,搜索功能封装的很好的一套简单易用的API.利用这套API你可以做很多有关搜索的事情,而且很方便.

那么lucene可以做什么呢？Lucene可以对任何的数据做索引和搜索. Lucene不管数据源是什么格式,只要它能被转化为文字的形式,就可以被Lucene所分析利用.也就是说不管是MS word,
Html ,pdf还是其他什么形式的文件只要你可以从中抽取出文字形式的内容就可以被Lucene所用.你就可以用Lucene对它们进行索引以及搜索. 下面是我做的一个小例子，就是一个查询生成索引的例子：

<span style="font-size:14px;">package com.jikexueyuan.study;
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
public class IndexCreate {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		// TODO Auto-generated method stub
		Analyzer analyzer=new StandardAnalyzer(Version.LUCENE_46);//StandardAnalyzer是将英文按照空格、标点符号等进行分词，将中文按照单个字进行分词，一个汉字算一个词
		IndexWriterConfig indexWriterConfig=new IndexWriterConfig(Version.LUCENE_46,analyzer);//把写入的文件用指定的分词器将文章分词（这样检索的时候才能查的快），然后将词放入索引文件。
		indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
		Directory directory=null;
		IndexWriter indexWriter=null;
		try {
			directory=FSDirectory.open(new File("E://index/test"));// //索引库存放在这个文件夹里  ,Directory表示索引文件保存的地方，是抽象类，两个子类FSDirectory表示文件中，RAMDirectory 表示存储在内存中
			if(indexWriter.isLocked(directory)){
				indexWriter.unlock(directory);
			}
			indexWriter=new IndexWriter(directory,indexWriterConfig);
		} catch (Exception e) {
			e.printStackTrace();
		}

		//Document document=new Document();
		Document doc = new Document();
		doc.add(new StringField("id","abcde", Store.YES));
		doc.add(new org.apache.lucene.document.TextField("content","极客学院",Store.YES));
		doc.add(new IntField("num",1,Store.YES));

		try {
			indexWriter.addDocument(doc);//向索引中添加文档（Insert）
		} catch (Exception e) {

			e.printStackTrace();

		}

		Document doc1 = new Document();
		doc1.add(new StringField("id","sdfsd", Store.YES));
		doc1.add(new org.apache.lucene.document.TextField("content","Lucene案例",Store.YES));
		doc1.add(new IntField("num",1,Store.YES));

		try {
			indexWriter.addDocument(doc1);
		} catch (Exception e) {

			e.printStackTrace();

		}
		try {
			indexWriter.commit();
			indexWriter.close();
			directory.close();
		} catch (Exception e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
}
</span>

结果会生成一系列的有关索引的文件，如下图：

从上面的例子我们可以看出创建索引需要的三个要素分别是：

1、indexWriter

2、Directory

3、Anayzer

4、Document

5、Field

对于lucene的分享还要继续，希望有越来越多的人可以共同努力！

时间： 2024-08-25 11:10:16

Lucene-全文索引

Lucene-全文索引的相关文章

全文索引-lucene，solr，nutch，hadoop之nutch与hadoop

Lucene：基于Java的全文检索引擎简介 (zhuan)

全文索引之nutch与hadoop（转）

Lucene：基于Java的全文检索引擎简介

做一个合格的前端，gulp自动化构建工具入门教程

使用Lucene.Net做一个简单的搜索引擎-全文索引

Lucene 基础理论

初识Lucene

整合Lucene 4.10.1 与IK Analyzer

Hadoop与Lucene和Nutch的关系