Learn Hadoop with Me, Step by Step (7) ---- Connecting Hadoop to a MySQL Database to Read and Write Data

To let MapReduce access relational databases (MySQL, Oracle) directly, Hadoop provides two classes: DBInputFormat and DBOutputFormat. DBInputFormat reads database table data into HDFS, and DBOutputFormat writes the result set produced by MapReduce into a database table.

If running the MapReduce job fails with java.io.IOException: com.mysql.jdbc.Driver, the program usually cannot find the MySQL driver jar. The fix is to make sure every tasktracker can locate the driver jar when it runs the MapReduce program.

There are two ways to add the jar:

(1) Put the jar into ${HADOOP_HOME}/lib on every node and restart the cluster. This is the more primitive approach.

(2) a) Upload the jar to the cluster: hadoop fs -put mysql-connector-java-5.1.0-bin.jar /hdfsPath/

b) Before the MR program submits the job, add: DistributedCache.addFileToClassPath(new Path("/hdfsPath/mysql-connector-java-5.1.0-bin.jar"), conf); (a fuller sketch follows below).
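A minimal sketch of option (2), assuming the driver jar has already been uploaded to the placeholder HDFS path /hdfsPath/; the helper class name is mine, not part of the original post:

import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class DriverJarSetup {
    // Call this on the JobConf before JobClient.runJob(conf) so every task
    // gets the MySQL driver jar on its classpath via the distributed cache.
    public static void addMysqlDriver(JobConf conf) throws IOException {
        DistributedCache.addFileToClassPath(
                new Path("/hdfsPath/mysql-connector-java-5.1.0-bin.jar"), conf);
    }
}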

Storing MySQL data in HDFS

Creating the MySQL table and initializing the data

DROP TABLE IF EXISTS `wu_testhadoop`;
CREATE TABLE `wu_testhadoop` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) DEFAULT NULL,
  `content` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of wu_testhadoop
-- ----------------------------
INSERT INTO `wu_testhadoop` VALUES ('1', '123', '122312');
INSERT INTO `wu_testhadoop` VALUES ('2', '123', '123456');

Defining the Hadoop data access

Once the MySQL table has been created, we need to define the rules by which Hadoop accesses MySQL.

Hadoop provides the org.apache.hadoop.io.Writable interface for a simple, efficient serialization protocol; it is implemented in terms of DataInput and DataOutput.

For database access Hadoop also provides the org.apache.hadoop.mapred.lib.db.DBWritable interface, whose write method sets values on a PreparedStatement and whose readFields method binds column values from a row read out of the database (a ResultSet).

The two interfaces are used as follows (the examples come from the Hadoop source code):

Writable

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Convenience factory: deserialize a new instance from the stream.
    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}
 

DBWritable

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class MyWritable implements Writable, DBWritable {
    // Some data
    private int counter;
    private long timestamp;

    // Writable#write() implementation
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Writable#readFields() implementation
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // DBWritable#write(): bind the fields to the INSERT statement's parameters
    public void write(PreparedStatement statement) throws SQLException {
        statement.setInt(1, counter);
        statement.setLong(2, timestamp);
    }

    // DBWritable#readFields(): read the fields back from a query result row
    public void readFields(ResultSet resultSet) throws SQLException {
        counter = resultSet.getInt(1);
        timestamp = resultSet.getLong(2);
    }
}
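One detail worth keeping in mind (my note, not from the original post): the parameter indexes used in write(PreparedStatement) follow the order of the field names passed to DBOutputFormat.setOutput, and readFields(ResultSet) follows the column order of the SELECT that DBInputFormat generates from its field list. A minimal sketch with an assumed table and columns:

// Assumed table and column names, for illustration only.
String[] fields = {"counter", "timestamp"};
// DBOutputFormat builds roughly: INSERT INTO my_table (counter, timestamp) VALUES (?, ?),
// so statement.setInt(1, counter) and statement.setLong(2, timestamp) line up with it.
DBOutputFormat.setOutput(conf, "my_table", fields);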

The class corresponding to our database table

package com.wyg.hadoop.mysql.bean;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DBRecord implements Writable, DBWritable{
	private int id;
	private String title;
	private String content;
	public int getId() {
		return id;
	}

	public void setId(int id) {
		this.id = id;
	}

	public String getTitle() {
		return title;
	}

	public void setTitle(String title) {
		this.title = title;
	}

	public String getContent() {
		return content;
	}

	public void setContent(String content) {
		this.content = content;
	}

	@Override
	public void readFields(ResultSet set) throws SQLException {
		this.id = set.getInt("id");
		this.title = set.getString("title");
		this.content = set.getString("content");
	}

	@Override
	public void write(PreparedStatement pst) throws SQLException {
		pst.setInt(1, id);
		pst.setString(2, title);
		pst.setString(3, content);
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.id = in.readInt();
		this.title = Text.readString(in);
		this.content = Text.readString(in);
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(this.id);
		Text.writeString(out, this.title);
		Text.writeString(out, this.content);
	}

	@Override
	public String toString() {
		 return this.id + " " + this.title + " " + this.content;
	}
}

Implementing the Mapper

package com.wyg.hadoop.mysql.mapper;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

import com.wyg.hadoop.mysql.bean.DBRecord;

@SuppressWarnings("deprecation")
public class DBRecordMapper extends MapReduceBase implements Mapper<LongWritable, DBRecord, LongWritable, Text>{

	@Override
	public void map(LongWritable key, DBRecord value,
			OutputCollector<LongWritable, Text> collector, Reporter reporter)
			throws IOException {
		collector.collect(new LongWritable(value.getId()), new Text(value.toString()));
	}

}

Testing the Hadoop-to-MySQL connection and storing the data in HDFS

package com.wyg.hadoop.mysql.db;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

import com.wyg.hadoop.mysql.bean.DBRecord;
import com.wyg.hadoop.mysql.mapper.DBRecordMapper;

public class DBAccess {
      public static void main(String[] args) throws IOException {
             JobConf conf = new JobConf(DBAccess.class);
             conf.setOutputKeyClass(LongWritable.class);
             conf.setOutputValueClass(Text.class);
             conf.setInputFormat(DBInputFormat.class);
             Path path = new Path("hdfs://192.168.44.129:9000/user/root/dbout");
             FileOutputFormat.setOutputPath(conf, path);
             DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://your-ip:3306/your-database", "username", "password");
             String [] fields = {"id", "title", "content"};
             DBInputFormat.setInput(conf, DBRecord.class, "wu_testhadoop",
                        null, "id", fields);
             conf.setMapperClass(DBRecordMapper.class);
             conf.setReducerClass(IdentityReducer.class);
             JobClient.runJob(conf);
      }
}
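As a side note (mine, not the original author's), the old mapred-API DBInputFormat also offers a query-based overload of setInput; a minimal sketch that should be equivalent to the table/field version above, with the SQL spelled out explicitly:

// Replaces the DBInputFormat.setInput(...) call in DBAccess.main above.
// The two query strings are assumptions matching the wu_testhadoop table.
DBInputFormat.setInput(conf, DBRecord.class,
        "SELECT id, title, content FROM wu_testhadoop ORDER BY id",
        "SELECT COUNT(id) FROM wu_testhadoop");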

Run the program; the output is as follows:

15/08/11 16:46:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/08/11 16:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/08/11 16:46:18 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/08/11 16:46:19 INFO mapred.JobClient: Running job: job_local_0001
15/08/11 16:46:19 INFO mapred.MapTask: numReduceTasks: 1
15/08/11 16:46:19 INFO mapred.MapTask: io.sort.mb = 100
15/08/11 16:46:19 INFO mapred.MapTask: data buffer = 79691776/99614720
15/08/11 16:46:19 INFO mapred.MapTask: record buffer = 262144/327680
15/08/11 16:46:19 INFO mapred.MapTask: Starting flush of map output
15/08/11 16:46:19 INFO mapred.MapTask: Finished spill 0
15/08/11 16:46:19 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.Merger: Merging 1 sorted segments
15/08/11 16:46:19 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 48 bytes
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
15/08/11 16:46:19 INFO mapred.LocalJobRunner:
15/08/11 16:46:19 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
15/08/11 16:46:19 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.44.129:9000/user/root/dbout
15/08/11 16:46:19 INFO mapred.LocalJobRunner: reduce > reduce
15/08/11 16:46:19 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
15/08/11 16:46:20 INFO mapred.JobClient:  map 100% reduce 100%
15/08/11 16:46:20 INFO mapred.JobClient: Job complete: job_local_0001
15/08/11 16:46:20 INFO mapred.JobClient: Counters: 14
15/08/11 16:46:20 INFO mapred.JobClient:   FileSystemCounters
15/08/11 16:46:20 INFO mapred.JobClient:     FILE_BYTES_READ=34606
15/08/11 16:46:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=69844
15/08/11 16:46:20 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=30
15/08/11 16:46:20 INFO mapred.JobClient:   Map-Reduce Framework
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce input groups=2
15/08/11 16:46:20 INFO mapred.JobClient:     Combine output records=0
15/08/11 16:46:20 INFO mapred.JobClient:     Map input records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce output records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Spilled Records=4
15/08/11 16:46:20 INFO mapred.JobClient:     Map output bytes=42
15/08/11 16:46:20 INFO mapred.JobClient:     Map input bytes=2
15/08/11 16:46:20 INFO mapred.JobClient:     Combine input records=0
15/08/11 16:46:20 INFO mapred.JobClient:     Map output records=2
15/08/11 16:46:20 INFO mapred.JobClient:     Reduce input records=2

At the same time, a dbout directory appears in HDFS; the file inside holds the table's data. Each line is the map output key (the record id), a tab, and the DBRecord.toString() value, so the content looks like this:

1	1 123 122312
2	2 123 123456

Importing HDFS data into MySQL

Writing HDFS files back to MySQL also relies on the DBRecord class above, because the database side of the job is handled through DBInputFormat and DBOutputFormat.

First we define the map and reduce implementations (map parses the files in HDFS; reduce takes the map output and writes it to the database):

package com.wyg.hadoop.mysql.mapper;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

import com.wyg.hadoop.mysql.bean.DBRecord;

public class WriteDB {
    // Map phase: parse each line of the HDFS output produced by the previous job
    public static class Map extends MapReduceBase implements
            Mapper<Object, Text, Text, DBRecord> {
        private final static DBRecord one = new DBRecord();
        private Text word = new Text();

        @Override
        public void map(Object key, Text value,
                OutputCollector<Text, DBRecord> output, Reporter reporter)
                throws IOException {
            // Each input line is the previous job's output: the key, a tab,
            // then "id title content" separated by spaces.
            String line = value.toString();
            String[] infos = line.split(" ");
            String id = infos[0].split("\t")[1];
            one.setId(new Integer(id));
            one.setTitle(infos[1]);
            one.setContent(infos[2]);
            word.set(id);
            output.collect(word, one);
        }
    }

    // Reduce phase: emit each record as the key so DBOutputFormat writes it to MySQL
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, DBRecord, DBRecord, Text> {
        @Override
        public void reduce(Text key, Iterator<DBRecord> values,
                OutputCollector<DBRecord, Text> collector, Reporter reporter)
                throws IOException {
            DBRecord record = values.next();
            collector.collect(record, new Text());
        }
    }
}
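One caveat (my note, not the original author's): the Reduce above only writes values.next(), i.e. the first record for each key. That is fine here because every id is unique, but a more defensive variant drains the iterator; a minimal drop-in sketch:

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, DBRecord, DBRecord, Text> {
        @Override
        public void reduce(Text key, Iterator<DBRecord> values,
                OutputCollector<DBRecord, Text> collector, Reporter reporter)
                throws IOException {
            // Write every record that arrived for this key, not just the first one.
            while (values.hasNext()) {
                collector.collect(values.next(), new Text());
            }
        }
    }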

Testing the import of HDFS data into the database

package com.wyg.hadoop.mysql.db;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

import com.wyg.hadoop.mysql.bean.DBRecord;
import com.wyg.hadoop.mysql.mapper.WriteDB;

public class DBInsert {
	public static void main(String[] args) throws Exception {

        JobConf conf = new JobConf(WriteDB.class);
        // Set the input and output formats

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(DBOutputFormat.class);

        // Without these lines the job does not pass (type mismatch),
        // even though the examples found online omit them.
        //Text, DBRecord
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(DBRecord.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(DBRecord.class);
        // Set the Mapper and Reducer classes
        conf.setMapperClass(WriteDB.Map.class);
        conf.setReducerClass(WriteDB.Reduce.class);
        // Set the input directory
        FileInputFormat.setInputPaths(conf, new Path("hdfs://192.168.44.129:9000/user/root/dbout"));
        // Configure the database connection
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://your-ip:3306/your-database", "username", "password");
        String[] fields = {"id","title","content" };
        DBOutputFormat.setOutput(conf, "wu_testhadoop", fields);
        JobClient.runJob(conf);
    }

}

The test output is as follows:

15/08/11 18:10:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/08/11 18:10:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/08/11 18:10:15 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/08/11 18:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/11 18:10:15 INFO mapred.JobClient: Running job: job_local_0001
15/08/11 18:10:15 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/11 18:10:15 INFO mapred.MapTask: numReduceTasks: 1
15/08/11 18:10:15 INFO mapred.MapTask: io.sort.mb = 100
15/08/11 18:10:15 INFO mapred.MapTask: data buffer = 79691776/99614720
15/08/11 18:10:15 INFO mapred.MapTask: record buffer = 262144/327680
15/08/11 18:10:15 INFO mapred.MapTask: Starting flush of map output
15/08/11 18:10:16 INFO mapred.MapTask: Finished spill 0
15/08/11 18:10:16 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
15/08/11 18:10:16 INFO mapred.LocalJobRunner: hdfs://192.168.44.129:9000/user/root/dbout/part-00000:0+30
15/08/11 18:10:16 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
15/08/11 18:10:16 INFO mapred.LocalJobRunner:
15/08/11 18:10:16 INFO mapred.Merger: Merging 1 sorted segments
15/08/11 18:10:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 40 bytes
15/08/11 18:10:16 INFO mapred.LocalJobRunner:
15/08/11 18:10:16 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
15/08/11 18:10:16 INFO mapred.LocalJobRunner: reduce > reduce
15/08/11 18:10:16 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
15/08/11 18:10:16 INFO mapred.JobClient:  map 100% reduce 100%
15/08/11 18:10:16 INFO mapred.JobClient: Job complete: job_local_0001
15/08/11 18:10:16 INFO mapred.JobClient: Counters: 14
15/08/11 18:10:16 INFO mapred.JobClient:   FileSystemCounters
15/08/11 18:10:16 INFO mapred.JobClient:     FILE_BYTES_READ=34932
15/08/11 18:10:16 INFO mapred.JobClient:     HDFS_BYTES_READ=60
15/08/11 18:10:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70694
15/08/11 18:10:16 INFO mapred.JobClient:   Map-Reduce Framework
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce input groups=2
15/08/11 18:10:16 INFO mapred.JobClient:     Combine output records=0
15/08/11 18:10:16 INFO mapred.JobClient:     Map input records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce output records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Spilled Records=4
15/08/11 18:10:16 INFO mapred.JobClient:     Map output bytes=34
15/08/11 18:10:16 INFO mapred.JobClient:     Map input bytes=30
15/08/11 18:10:16 INFO mapred.JobClient:     Combine input records=0
15/08/11 18:10:16 INFO mapred.JobClient:     Map output records=2
15/08/11 18:10:16 INFO mapred.JobClient:     Reduce input records=2

Before this test I truncated the table; after the run you can see that two rows were added to the database.

Running the job a second time will fail, which is expected: we assign the id explicitly when importing, so the primary key collides. If the id is left out (letting MySQL's AUTO_INCREMENT generate it), rows can be inserted repeatedly, as sketched below.
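A minimal sketch of that variant (my assumption, not code from the original post): list only title and content in the output fields and let AUTO_INCREMENT fill in the id; the DBWritable's write(PreparedStatement) must then bind exactly those two columns, in the same order:

// In the job setup, list only the non-key columns:
String[] fields = {"title", "content"};
DBOutputFormat.setOutput(conf, "wu_testhadoop", fields);

// And in DBRecord (or a copy of it used for writing), bind only those columns:
public void write(PreparedStatement pst) throws SQLException {
    pst.setString(1, title);
    pst.setString(2, content);
}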

Source code download

The source code has been uploaded; the download address is download.csdn.net/detail/wuyinggui10000/8974585
