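Indexing txt files with Lucene 4.9

For now this only runs; I have not checked whether the results are correct. I was sleepy and went to bed, so I will come back to it another day. The search part is paginated, and I have not worked through that carefully either.

Based on http://blog.csdn.net/kingskyleader/article/details/8444739

I put three txt files under data: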

S:\lucene\data\永生.txt

S:\lucene\data\1.txt

S:\lucene\data\2.txt

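永生.txt is a novel; it is in Chinese and should contain no English.

1.txt content: hello

2.txt content: hi hello 哈哈

Console output after running the program (the three adding lines show [Ljava.io.File;@3f611531 because this run printed the File[] array instead of each file's path):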

adding [Ljava.io.File;@3f611531
adding [Ljava.io.File;@3f611531
adding [Ljava.io.File;@3f611531
S:\lucene\data\1.txt
1407857427736
S:\lucene\data\2.txt
1407857444245
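A small JDK-only sketch of why the adding lines in the output above are identical and unreadable. ArrayPrintDemo and its two methods are made-up names for illustration and are not part of the original program.

```java
import java.io.File;

class ArrayPrintDemo {

    // Concatenating an array falls back to Object.toString():
    // the JVM type name, an '@', and a hash code.
    static String labelForArray(File[] files) {
        return "adding " + files;
    }

    // Concatenating a single File uses File.toString(), which returns its path.
    static String labelForFile(File f) {
        return "adding " + f;
    }

    public static void main(String[] args) {
        File[] files = { new File("1.txt"), new File("2.txt") };
        for (File f : files) {
            // Same text on every iteration, because the array never changes.
            System.out.println(labelForArray(files));
            // The path of the file currently being processed.
            System.out.println(labelForFile(f));
        }
    }
}
```

Printing the loop variable rather than the array is what gives one distinct path per indexed file.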

I will look into the details another day.

The code follows:

pom:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>LuceneTest</groupId>
  <artifactId>lucene</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>lucene</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <!-- lucene -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>4.9.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-queryparser</artifactId>
      <version>4.9.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>4.9.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-highlighter</artifactId>
      <version>4.9.0</version>
    </dependency>

  </dependencies>
</project>

Building the index:

package lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class InitIndex {

    public void creatIndex() throws IOException {

        boolean create = true;

        File data = new File("S:\\lucene\\data");
        File index = new File("S:\\lucene\\index");

        Directory dir = FSDirectory.open(index);

        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_9);

        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_9,
                analyzer);

        if (create) {
            // Create a new index in the directory, removing any
            // previously indexed documents:
            iwc.setOpenMode(OpenMode.CREATE);
        } else {
            // Add new documents to an existing index:
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
        }

        IndexWriter iw = new IndexWriter(dir, iwc);

        File[] files = data.listFiles();
        if (files == null) {
            // listFiles() returns null if data is not a readable directory.
            iw.close();
            throw new IOException("Cannot list files in " + data);
        }
        for (File f : files) {
            if (!f.isFile()) {
                continue; // skip subdirectories
            }

            // try-with-resources closes every stream, not only the last one.
            try (FileInputStream fis = new FileInputStream(f)) {

                Document doc = new Document();

                // Stored and not tokenized, so the search can print the path.
                Field pathField = new StringField("path", f.getPath(),
                        Field.Store.YES);
                doc.add(pathField);

                doc.add(new LongField("modified", f.lastModified(), Field.Store.YES));

                // Tokenized but not stored; the files are read as GBK.
                doc.add(new TextField("contents", new BufferedReader(
                        new InputStreamReader(fis, "GBK"))));

                if (iw.getConfig().getOpenMode() == OpenMode.CREATE) {
                    // New index, so we just add the document (no old document
                    // can be there):
                    System.out.println("adding " + f);
                    iw.addDocument(doc);
                } else {
                    // Existing index (an old copy of this document may have
                    // been indexed), so we use updateDocument to replace the
                    // old one matching the exact path, if present:
                    System.out.println("updating " + f);
                    iw.updateDocument(new Term("path", f.getPath()), doc);
                }
            }
        }
        iw.close();

    }
}
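The indexing code decodes every file as GBK. Below is a minimal JDK-only sketch of why the charset name passed to InputStreamReader matters. CharsetDemo and its methods are hypothetical names, and the sketch assumes the running JDK provides the GBK charset.

```java
import java.nio.charset.Charset;

class CharsetDemo {

    // Encode text the way a file saved in the named charset would store it.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Decode raw bytes with the named charset, as
    // new InputStreamReader(stream, charsetName) would.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The two characters from 2.txt (U+54C8 twice), written as escapes so
        // the example does not depend on the source file encoding.
        String text = "\u54c8\u54c8";
        byte[] gbkBytes = encode(text, "GBK");

        // GBK uses two bytes for each of these characters.
        System.out.println(gbkBytes.length);
        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(gbkBytes, "GBK").equals(text));
        // Decoding the same bytes as UTF-8 does not.
        System.out.println(decode(gbkBytes, "UTF-8").equals(text));
    }
}
```

If the txt files were actually saved as UTF-8, the hard-coded "GBK" in the indexer would garble Chinese text in the same way, so the charset should match however the files were written.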

Searching:

package lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Search {

    public void query() throws IOException, ParseException {

        String queries = "hello";

        int hitsPerPage = 10; 

        File index = new File("S:\\lucene\\index");

        IndexReader reader = DirectoryReader.open(FSDirectory.open(index));
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_9);

            QueryParser parser = new QueryParser(Version.LUCENE_4_9, "contents", analyzer);

            Query query = parser.parse(queries);

            // Collect enough hits for up to five pages; only the first page is printed.
            TopDocs results = searcher.search(query, 5 * hitsPerPage);
            ScoreDoc[] hits = results.scoreDocs;
            int numTotalHits = results.totalHits;

            int start = 0;
            int end = Math.min(numTotalHits, hitsPerPage);

            for (int i = start; i < end; i++) {
                // Load the stored fields (path, modified) of the matching document.
                Document doc = searcher.doc(hits[i].doc);
                System.out.println(doc.get("path"));
                System.out.println(doc.get("modified"));
            }
        } finally {
            // Release the index files held by the reader.
            reader.close();
        }

    }
}
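The search code only ever prints the first page (start is fixed at 0). Here is a hedged sketch of how the bounds for any page could be computed. PageBounds, start and end are hypothetical helpers that are not in the original program, and pages are assumed to be numbered from zero.

```java
class PageBounds {

    // Index of the first hit (inclusive) on a zero-based page.
    static int start(int page, int hitsPerPage) {
        return page * hitsPerPage;
    }

    // Index one past the last hit on the page, clamped to the number of hits
    // actually collected (the length of the scoreDocs array).
    static int end(int page, int hitsPerPage, int collected) {
        return Math.min(collected, (page + 1) * hitsPerPage);
    }

    public static void main(String[] args) {
        int hitsPerPage = 10;
        int collected = 23; // pretend the search returned 23 hits

        for (int page = 0; page < 3; page++) {
            int from = start(page, hitsPerPage);
            int to = end(page, hitsPerPage, collected);
            System.out.println("page " + page + ": hits [" + from + ", " + to + ")");
        }
    }
}
```

In the search code, collected would be hits.length, which is capped by the limit passed to searcher.search (5 * hitsPerPage in this post). When start is not smaller than end, the requested page is past the collected hits and the print loop simply does nothing.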

Main function:

package lucene;

import java.io.IOException;

import org.apache.lucene.queryparser.classic.ParseException;

public class Main {

    public static void main(String[] args) throws IOException, ParseException {
        // Build the index first, then run the sample query against it.
        InitIndex id = new InitIndex();
        id.creatIndex();
        Search se = new Search();
        se.query();
    }
}
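The modified values in the console output (for example 1407857427736) are what File.lastModified() returns: milliseconds since 1970-01-01 00:00:00 UTC, which the stored field hands back as a string. This is a minimal sketch of turning such a value into a readable time. ModifiedDemo and format are hypothetical names, and the choice of UTC is an assumption made only to keep the result independent of the machine's time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class ModifiedDemo {

    // Parse the stored string back into a long and format it in the given zone.
    static String format(String storedMillis, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(Long.parseLong(storedMillis)));
    }

    public static void main(String[] args) {
        // The two values printed by the search in this post.
        System.out.println(format("1407857427736", "UTC"));
        System.out.println(format("1407857444245", "UTC"));
    }
}
```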


Time: 2024-08-09 06:33:28
