Learning Lucene: Creating an Index

This article is based on Lucene 3.5.

When building a search engine with Lucene, the two core tasks are creating the index and searching it. This section covers creating the index.

The structure of the resulting index is shown in the figure.
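At its core, a Lucene index is an inverted index: for each field, a sorted dictionary of terms in which every term points to the documents that contain it. The toy sketch below, in plain Java with no Lucene dependency, builds such a term → document-id map for the four sample contents strings used in the example program later in this article. ToyInvertedIndex is a hypothetical class, not part of Lucene; its tokenizer is only a crude approximation of an Analyzer, and a real Lucene index also records term frequencies and positions, norms and stored field values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Toy inverted index (NOT Lucene's implementation): each term maps to the
// ascending list of ids of the documents that contain it. TreeMap keeps the
// term dictionary sorted, loosely mirroring Lucene's sorted term dictionary.
class ToyInvertedIndex {
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Crude stand-in for an Analyzer: lowercase, then split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Documents must be added with increasing ids so each postings list stays sorted.
    void addDocument(int docId, String text) {
        for (String term : tokenize(text)) {
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // A term repeated within one document is recorded only once.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // Returns the ids of the documents containing the term (empty if none).
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }
}
```

With the four sample strings added as documents 1 to 4, search("spring") returns [3, 4] and search("export") returns [1, 2]: looking up a term is a single dictionary lookup rather than a scan over every document.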

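Creating an index involves the following steps:

1. Create the index writer, an IndexWriter.
2. Create a Document object.
3. Create Field objects that hold the data.
4. Add the Field objects to the Document.
5. Add the Document to the IndexWriter; once all documents have been added, commit and close the writer.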

The core objects are briefly introduced below.

(1) Create the IndexWriter object.

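IndexWriter writer = new IndexWriter(directory, iwc);

Here iwc is an IndexWriterConfig, which specifies the Lucene version and the Analyzer, e.g. new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)).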

directory is the location where the index is saved. To save the index on disk, create the directory with Directory directory = FSDirectory.open(new File(path));

To keep the index in memory instead, use RAMDirectory directory = new RAMDirectory(); an in-memory index is lost when the program exits.

(2) Create the Document object.

Document doc = new Document(); creates an empty Document that contains no Fields. To add a Field to the Document, call its add(Field) method:

doc.add(field);

(3) Create Field objects.

Field field = new Field(fieldName, fieldValue, storeOption, indexOption);

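Lucene 3.5 has two storage options: Field.Store.YES stores the original value so it can be read back from search results, and Field.Store.NO does not store it, so the field can still be searched (if it is indexed) but its value cannot be retrieved. A third option, Field.Store.COMPRESS, existed in older versions but was removed in Lucene 3.0; to store compressed data, compress the value yourself, for example with CompressionTools.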

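Lucene 3.5 has five indexing options:
1. Field.Index.NO: the field is not indexed and cannot be searched; this is only useful for stored values.
2. Field.Index.ANALYZED: the value is tokenized by the Analyzer and each token is indexed; suitable for full-text fields such as contents.
3. Field.Index.NOT_ANALYZED: the whole value is indexed as a single term without analysis; suitable for IDs, names, dates and other exact-match values.
4. Field.Index.NOT_ANALYZED_NO_NORMS: like NOT_ANALYZED, but no norms are stored, which saves memory at the cost of index-time boosts and field-length normalization.
5. Field.Index.ANALYZED_NO_NORMS: like ANALYZED, but without norms.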
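To make the storage and indexing options concrete, here is a toy model in plain Java with no Lucene dependency. ToyField and ToyIndexMode are hypothetical names, not Lucene classes; the ANALYZED branch only approximates StandardAnalyzer (lowercasing and splitting on non-alphanumeric characters, with no stop-word removal), and because norms are not modeled, the two *_NO_NORMS options are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy stand-in for Field.Index (hypothetical, not a Lucene type).
enum ToyIndexMode { NO, ANALYZED, NOT_ANALYZED }

// Toy stand-in for a Lucene Field: shows what the store and index options
// decide, not how Lucene implements them.
class ToyField {
    final String name;
    final String value;
    final boolean stored;     // true ~ Field.Store.YES, false ~ Field.Store.NO
    final ToyIndexMode mode;  // ~ Field.Index.*

    ToyField(String name, String value, boolean stored, ToyIndexMode mode) {
        // Lucene 3.x's Field constructor likewise rejects a field that is
        // neither stored nor indexed.
        if (!stored && mode == ToyIndexMode.NO) {
            throw new IllegalArgumentException("field is neither stored nor indexed: " + name);
        }
        this.name = name;
        this.value = value;
        this.stored = stored;
        this.mode = mode;
    }

    // Terms this field would contribute to the inverted index.
    List<String> indexedTerms() {
        switch (mode) {
            case NO:
                return Collections.emptyList();           // not searchable
            case NOT_ANALYZED:
                return Collections.singletonList(value);  // whole value, case kept
            default: {
                // ANALYZED: crude Analyzer stand-in (lowercase + split, no stop words)
                List<String> terms = new ArrayList<>();
                for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                    if (!t.isEmpty()) {
                        terms.add(t);
                    }
                }
                return terms;
            }
        }
    }

    // Value a search hit could return for this field; null when not stored.
    String storedValue() {
        return stored ? value : null;
    }
}
```

The date case shows the difference: as NOT_ANALYZED, "2012-03-01" stays a single term that can be matched exactly, while ANALYZED would break it into "2012", "03" and "01". This is presumably why the example program indexes id, name and date without analysis and only contents with the StandardAnalyzer.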

The code for a simple indexing program is shown below:

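import java.io.File;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Wrapper class so the snippet compiles; the class name and constructor are illustrative.
public class IndexUtil {

    // Directory in which the index is written (not defined in the original snippet).
    private final String path;

    public IndexUtil(String path) {
        this.path = path;
    }

    public void index() {
        String[] ids = { "1", "2", "3", "4" };
        String[] names = { "aa", "bb", "cc", "dd" };
        String[] contents = {
            "Using AbstractJExcelView to export data to Excel file via JExcelAPI library",
            "Using AbstractPdfView to export data to Pdf file via Bruno Lowagie’s iText library. ",
            "Example to integrate Log4j into the Spring MVC application. ",
            "Using Hibernate validator (JSR303 implementation) to validate bean in Spring MVC. "
        };

        IndexWriter writer = null;
        try {
            // Index on disk; for an in-memory index use instead
            // (with import org.apache.lucene.store.RAMDirectory):
            // Directory directory = new RAMDirectory();
            Directory directory = FSDirectory.open(new File(path));

            // The config fixes the Lucene version and the Analyzer used for ANALYZED fields
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35,
                    new StandardAnalyzer(Version.LUCENE_35));
            writer = new IndexWriter(directory, iwc);

            // One formatter is enough for the whole loop
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

            for (int i = 0; i < ids.length; i++) {
                Document doc = new Document();
                // Exact-match fields: indexed as a single term, without norms
                doc.add(new Field("id", ids[i], Field.Store.YES,
                        Field.Index.NOT_ANALYZED_NO_NORMS));
                doc.add(new Field("name", names[i], Field.Store.YES,
                        Field.Index.NOT_ANALYZED_NO_NORMS));
                // Full-text field: tokenized by the StandardAnalyzer
                doc.add(new Field("contents", contents[i], Field.Store.YES,
                        Field.Index.ANALYZED));
                // Indexing date as a yyyy-MM-dd string, indexed as one term
                doc.add(new Field("date", sdf.format(new Date()), Field.Store.YES,
                        Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
            // Commit once after all documents; committing after every document is much slower
            writer.commit();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException e) { // also covers CorruptIndexException
                    e.printStackTrace();
                }
            }
        }
    }
}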


Time: 2024-11-03 05:23:10
