Lucene教程(一) 创建索引初步 / 憋错料

简述：

本教程为系列教程，最近在看Lucene的一个视频教程，感觉不错，可惜的是所讲的版本为3.5，由于Lucene不同的版本差距较大，因此当即决定写一个新版本的系列教程5.0版本，但是又怕从3.5跨到5.0跨度太大，毕竟自己也没有使用过Lucene，因此再插入一个中间版本4.5，所以，此系列教程打算把3.5版本，4.5版本，5.0版本都给出个例子，方便大家学习，也方便自己复习。

注：3.5版本并非原创，如下步骤的思路并非原创，都是我在视频教程上学到的，其他均为原创，以后跟进的教程也是如此

注：由于此教程为系列教程，pom文件会越来越大，不再单独摘出所需要的jar包，需要添加新的jar包我会在pom文件上方列出

注：由于Lucene5.0版本是基于JDK1.7开发的，所以想学习的同学请配置1.7及以上的版本

创建索引可分为主要的几步，我自己试验过，不同的版本间会有些不同，但是跟着如下的几大步骤一步一步写，问题不会太大。

创建Directory
创建IndexWriter
创建Document对象
为Document添加Field
通过IndexWriter添加文档到索引中

下面为例子代码：

3.5版本：

3.5版本比较简单，只需要Lucene核心包lucene-core即可，pom文件如下所示：

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.darren.lucene35.helloworld</groupId>
    <artifactId>lucene35_helloworld</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>lucene35_helloworld</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <lucene.version>3.5.0</lucene.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
        </dependency>
    </dependencies>
</project>

例子代码如下：

package com.darren.lucene35;

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {

    /**
     * 建立索引
     */
    public void index() {
        IndexWriter indexWriter = null;
        try {
            // 1、创建Directory
            // 建立在内存中
            // Directory directory = new RAMDirectory();
            // 建立在硬盘式
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));

            // 2、创建IndexWriter
            // 使用默认的标准分词器
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);

            indexWriter = new IndexWriter(directory, indexWriterConfig);

            File dFile = new File("F:/test/lucene/document");
            File[] files = dFile.listFiles();
            for (File file : files) {
                // 3、创建Document对象
                Document document = new Document();

                // 4、为Document添加Field
                document.add(new Field("content", new FileReader(file)));
                document.add(new Field("filename", file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                document.add(new Field("filepath", file.getAbsolutePath(), Field.Store.YES, Field.Index.NOT_ANALYZED));

                // 5、通过IndexWriter添加文档到索引中
                indexWriter.addDocument(document);
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }
    }

}

4.5版本：

4.5版本需要Lucene核心包lucene-core和分词包lucene-analyzers-common，从4.0版本之后分词包从核心包分离，pom文件如下所示：

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.darren.lucene45.helloworld</groupId>
    <artifactId>lucene45_helloworld</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>lucene45_helloworld</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <lucene.version>4.5.1</lucene.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
        </dependency>
    </dependencies>
</project>

例子代码如下：

package com.darren.lucene45;

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {
    /**
     * 建立索引
     */
    public void index() {
        IndexWriter indexWriter = null;
        try {
            // 1、创建Directory
            Directory directory = FSDirectory.open(new File("F:/test/lucene/index"));

            // 2、创建IndexWriter
            /**
             * 注意StandardAnalyzer与3.5版本的不同：
             *
             * StandardAnalyzer不在lucene-core包中而在lucene-analyzers-common包中 从4.0版本以后分离
             */
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_45);
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_45, analyzer);
            indexWriter = new IndexWriter(directory, indexWriterConfig);

            File dFile = new File("F:/test/lucene/document");
            File[] files = dFile.listFiles();
            for (File file : files) {
                // 3、创建Document对象
                Document document = new Document();

                // 4、为Document添加Field
                /**
                 * 注意Field与3.5版本的不同：两个参数的构造器已过时，使用如下构造器
                 */
                // 第三个参数是FieldType 但是定义在TextField中作为静态变量，看API也不好知道怎么写
                document.add(new Field("content", new FileReader(file), TextField.TYPE_NOT_STORED));
                document.add(new Field("filename", file.getName(), TextField.TYPE_STORED));
                document.add(new Field("filepath", file.getAbsolutePath(), TextField.TYPE_STORED));

                // 5、通过IndexWriter添加文档到索引中
                indexWriter.addDocument(document);
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

    }
}

5.0版本：

5.0版本和4.5版本一样，pom文件如下所示：

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.darren.lucene50.helloworld</groupId>
    <artifactId>lucene50_helloworld</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>lucene50_helloworld</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <lucene.version>5.0.0</lucene.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
        </dependency>
                <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
        </dependency>
    </dependencies>
</project>

例子代码如下：

package com.darren.lucene50;

import java.io.File;
import java.io.FileReader;
import java.nio.file.FileSystems;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class HelloLucene {
    /**
     * 建立索引
     */
    public void index() {
        IndexWriter indexWriter = null;
        try {
            // 1、创建Directory
            /**
             * 注意open方法与3.5版本和4.5版本的不同：
             *
             * 这里不再接受一个File而是Path，使用的是JDK1.7的新特性，也就是说5.0版本是基于JDK1.7开发的
             *
             * 如何获取Path，请参照 Java7新特性--Path http://blog.csdn.net/zpf336/article/details/45074445
             *
             */
            Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("F:/test/lucene/index"));
            // 2、创建IndexWriter
            /**
             * 注意StandardAnalyzer与3.5版本的不同：
             *
             * StandardAnalyzer不在lucene-core包中而在lucene-analyzers-common包中 从4.0版本以后分离
             *
             * 并且不需要提供版本号
             */
            Analyzer analyzer = new StandardAnalyzer();

            /**
             * 注意IndexWriterConfig与3.5版本和4.5版本的不同：
             *
             * 不需要提供版本号
             */
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
            indexWriter = new IndexWriter(directory, indexWriterConfig);
            File dFile = new File("F:/test/lucene/document");
            File[] files = dFile.listFiles();
            for (File file : files) {
                // 3、创建Document对象
                Document document = new Document();
                // 4、为Document添加Field
                /**
                 * 注意Field与3.5版本的不同：两个参数的构造器已过时，使用如下构造器
                 *
                 * 但是和4.5版本是相同的
                 */
                // 第三个参数是FieldType 但是定义在TextField中作为静态变量，看API也不好知道怎么写
                document.add(new Field("content", new FileReader(file), TextField.TYPE_NOT_STORED));
                document.add(new Field("filename", file.getName(), TextField.TYPE_STORED));
                document.add(new Field("filepath", file.getAbsolutePath(), TextField.TYPE_STORED));

                // 5、通过IndexWriter添加文档到索引中
                indexWriter.addDocument(document);
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

    }
}

测试代码：

由于测试代码都一样，仅贴出5.0版本的此时代码：

package com.darren.lucene50;

import org.junit.Test;

public class HelloLuceneTest {
    @Test
    public void testIndex() {
        HelloLucene helloLucene = new HelloLucene();
        helloLucene.index();
    }
}

注：不同版本间会有一些不同，我都已在注释中提到

时间： 2024-10-18 20:13:51

Lucene教程(一) 创建索引初步

Lucene教程(一) 创建索引初步的相关文章

Lucene 4.7 --创建索引

Lucene.net 从创建索引到搜索的代码范例

Lucene学习：创建索引

全文检索之lucene的优化篇--创建索引库

Lucene教程具体解释

搜索引擎系列 ---lucene简介创建索引和搜索初步

搜索引擎系列 -lucene简介创建索引和搜索初步步骤

使用Lucene对预处理后的文档进行创建索引（可执行）

lucene创建索引以及索引文件合并