Learning and Experimenting with Lucene

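A quick experiment to see how Lucene is used.

References: http://www.importnew.com/12715.html (a fairly simple example)

http://www.yiibai.com/lucene/lucene_first_application.html (a more complex example)

There is another example here: http://www.tuicool.com/articles/aqIZNnE

I am using a fairly recent version, 6.2.1. Its documentation: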

http://lucene.apache.org/core/6_2_1/core/index.html

First, create a Maven project named lucene-demo in IntelliJ IDEA. (This mainly follows http://www.importnew.com/12715.html )

The pom.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.myapp</groupId>
    <artifactId>lucene-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>6.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>6.2.1</version>
        </dependency>
    </dependencies>

</project>

Next I created a package, com.myapp.lucene, containing a class LuceneDemo with the following content:

package com.myapp.lucene;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;

/**
 * Created by baidu on 16/10/20.
 */
public class LuceneDemo {
    // 0. Specify the analyzer for tokenizing text.
    // The same analyzer should be used for indexing and searching
    static StandardAnalyzer analyzer;
    static Directory index;

    static void prepareDoc() throws IOException{
        // 0. init analyzer
        analyzer = new StandardAnalyzer();

        // 1. create index
        index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        // try-with-resources closes the writer even if addDoc throws
        try (IndexWriter w = new IndexWriter(index, config)) {
            addDoc(w, "lucence tutorial", "123456");
            addDoc(w, "hi hi hi", "222");
            addDoc(w, "ok LUCENCE", "123");
        }
    }

    static void addDoc(IndexWriter w, String text, String more) throws IOException{
        Document doc = new Document();
        doc.add(new TextField("text", text, Field.Store.YES));
        doc.add(new StringField("more", more, Field.Store.YES));
        w.addDocument(doc);
    }

    static void search(String str) throws ParseException, IOException {
        // 2. query
        Query q = new QueryParser("text", analyzer).parse(str);

        // 3. search
        int listNum = 10;
        // try-with-resources closes the reader even if searching or printing throws
        try (IndexReader reader = DirectoryReader.open(index)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopScoreDocCollector collector = TopScoreDocCollector.create(listNum);
            searcher.search(q, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;

            // 4. display
            System.out.printf("Found %d docs.\n", hits.length);
            for (int i = 0; i < hits.length; i++) {
                int docId = hits[i].doc;
                Document doc = searcher.doc(docId);
                System.out.printf("Doc %d: text: %s, more: %s\n", i + 1, doc.get("text"), doc.get("more"));
            }
        }

    }

    public static void main(String[] args) {
        try {
            prepareDoc();
            search("Lucence");
        } catch (IOException | ParseException e) {
            e.printStackTrace();
        }

    }
}

Then run it. It succeeds:

Found 2 docs.
Doc 1: text: lucence tutorial, more: 123456
Doc 2: text: ok LUCENCE, more: 123

Process finished with exit code 0

Because RAMDirectory is used, the index lives in memory, so no actual directories or files should have been created on disk.

A few points about the code and its logic are worth noting:

Use TextField for content that needs to be tokenized, and StringField for content such as an id that should not be tokenized.
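To make the difference concrete, here is a small plain-Java sketch of how matching behaves for a tokenized field versus an untokenized one. It is a toy model and not Lucene code: the class FieldMatchSketch and its methods are hypothetical names, and the tokenizer (split on non-alphanumeric characters, then lowercase) only roughly approximates what StandardAnalyzer does. It does show why the query "Lucence" finds "ok LUCENCE" in the text field, while a field like more would only match on its exact, complete value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FieldMatchSketch {
    // Toy stand-in for an analyzer: split on non-alphanumeric characters
    // and lowercase each piece. (Assumption: a rough approximation only.)
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Tokenized field: the lowercased query term matches if it equals any token.
    static boolean matchesTokenized(String fieldValue, String queryTerm) {
        return tokenize(fieldValue).contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    // Untokenized field: the whole value is a single term, so only an exact match hits.
    static boolean matchesUntokenized(String fieldValue, String queryTerm) {
        return fieldValue.equals(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(matchesTokenized("ok LUCENCE", "Lucence"));  // true
        System.out.println(matchesTokenized("hi hi hi", "Lucence"));    // false
        System.out.println(matchesUntokenized("123456", "123456"));     // true
        System.out.println(matchesUntokenized("123456", "123"));        // false
    }
}
```

In this model a partial value such as "123" never matches the untokenized "123456", which is the behavior you usually want for identifiers.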

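While coding I hit several compile errors about checked exceptions that had to be either caught (wrapped in try/catch) or declared with throws.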
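The usual ways to satisfy the compiler can be sketched in plain Java. Everything here is illustrative: FakeResource is a hypothetical stand-in for a closeable object such as an index writer or reader, and the method names are made up for this example. The sketch shows declaring the exception with throws, wrapping it in an unchecked exception, and using try-with-resources so the resource is closed even when an exception is thrown.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CheckedExceptionSketch {
    // Hypothetical resource; write() fails on empty input to simulate an I/O error.
    static class FakeResource implements Closeable {
        boolean closed = false;

        void write(String text) throws IOException {
            if (text.isEmpty()) {
                throw new IOException("nothing to write");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Option 1: declare the checked exception and let the caller handle it.
    static void writeOrThrow(FakeResource r, String text) throws IOException {
        r.write(text);
    }

    // Option 2: catch it and rethrow wrapped in an unchecked exception.
    static void writeOrWrap(FakeResource r, String text) {
        try {
            r.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // try-with-resources closes the resource whether or not write() throws.
    static boolean closedAfterFailure() {
        FakeResource r = new FakeResource();
        try (FakeResource res = r) {
            res.write("");
        } catch (IOException e) {
            // Expected in this sketch: the empty write fails.
        }
        return r.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterFailure()); // true
    }
}
```

Which option fits depends on the caller: small demo programs often just declare throws up to main, while library-style code tends to catch or wrap closer to the source.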

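Some APIs changed in newer versions and take different parameters than in the older tutorials. I adjusted the code to match what version 6.2.1 requires; in most cases the calls became simpler.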

Time: 2024-07-30 14:22:29
