lucene包结构

lucene 2.2包结构:

analysis不做详细介绍,因为在实际开发中会使用对中文支持的庖丁分词来做为分词器。

document:是写索引的时候的非常重要的一个工具,要把原始数据转为一个个document,然后进行write.

index:写索引的核心包

queryParser:搜索时候的解析器。

search:搜索的核心包

store:在写索引的时候,可以控制哪些field是要存储的。。。

util:工具包。

index:

index的头注释:

An <code>IndexWriter</code> creates and maintains an index.

<p>The <code>create</code> argument to the
  <a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer, boolean)"><b>constructor</b></a>
  determines whether a new index is created, or whether an existing index is
  opened.  Note that you
  can open an index with <code>create=true</code> even while readers are
  using the index.  The old readers will continue to search
  the "point in time" snapshot they had opened, and won‘t
  see the newly created index until they re-open.  There are
  also <a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer)"><b>constructors</b></a>
  with no <code>create</code> argument which
  will create a new index if there is not already an index at the
  provided path and otherwise open the existing index.</p>

<p>In either case, documents are added with <a
  href="#addDocument(org.apache.lucene.document.Document)"><b>addDocument</b></a>
  and removed with <a
  href="#deleteDocuments(org.apache.lucene.index.Term)"><b>deleteDocuments</b></a>.
  A document can be updated with <a href="#updateDocument(org.apache.lucene.index.Term, org.apache.lucene.document.Document)"><b>updateDocument</b></a>
  (which just deletes and then adds the entire document).
  When finished adding, deleting and updating documents, <a href="#close()"><b>close</b></a> should be called.</p>

<p>These changes are buffered in memory and periodically
  flushed to the {@link Directory} (during the above method
  calls).  A flush is triggered when there are enough
  buffered deletes (see {@link #setMaxBufferedDeleteTerms})
  or enough added documents since the last flush, whichever
  is sooner.  For the added documents, flushing is triggered
  either by RAM usage of the documents (see {@link
  #setRAMBufferSizeMB}) or the number of added documents.
  The default is to flush when RAM usage hits 16 MB.  For
  best indexing speed you should flush by RAM usage with a
  large RAM buffer.  You can also force a flush by calling
  {@link #flush}.  When a flush occurs, both pending deletes
  and added documents are flushed to the index.  A flush may
  also trigger one or more segment merges which by default
  run with a background thread so as not to block the
  addDocument calls (see <a href="#mergePolicy">below</a>
  for changing the {@link MergeScheduler}).</p>

<a name="autoCommit"></a>
  <p>The optional <code>autoCommit</code> argument to the
  <a href="#IndexWriter(org.apache.lucene.store.Directory, boolean, org.apache.lucene.analysis.Analyzer)"><b>constructors</b></a>
  controls visibility of the changes to {@link IndexReader} instances reading the same index.
  When this is <code>false</code>, changes are not
  visible until {@link #close()} is called.
  Note that changes will still be flushed to the
  {@link org.apache.lucene.store.Directory} as new files,
  but are not committed (no new <code>segments_N</code> file
  is written referencing the new files) until {@link #close} is
  called.  If something goes terribly wrong (for example the
  JVM crashes) before {@link #close()}, then
  the index will reflect none of the changes made (it will
  remain in its starting state).
  You can also call {@link #abort()}, which closes the writer without committing any
  changes, and removes any index
  files that had been flushed but are now unreferenced.
  This mode is useful for preventing readers from refreshing
  at a bad time (for example after you‘ve done all your
  deletes but before you‘ve done your adds).
  It can also be used to implement simple single-writer
  transactional semantics ("all or none").</p>

<p>When <code>autoCommit</code> is <code>true</code> then
  every flush is also a commit ({@link IndexReader}
  instances will see each flush as changes to the index).
  This is the default, to match the behavior before 2.2.
  When running in this mode, be careful not to refresh your
  readers while optimize or segment merges are taking place
  as this can tie up substantial disk space.</p>
 
  <p>Regardless of <code>autoCommit</code>, an {@link
  IndexReader} or {@link org.apache.lucene.search.IndexSearcher} will only see the
  index as of the "point in time" that it was opened.  Any
  changes committed to the index after the reader was opened
  are not visible until the reader is re-opened.</p>

<p>If an index will not have more documents added for a while and optimal search
  performance is desired, then the <a href="#optimize()"><b>optimize</b></a>
  method should be called before the index is closed.</p>

<p>Opening an <code>IndexWriter</code> creates a lock file for the directory in use. Trying to open
  another <code>IndexWriter</code> on the same directory will lead to a
  {@link LockObtainFailedException}. The {@link LockObtainFailedException}
  is also thrown if an IndexReader on the same directory is used to delete documents
  from the index.</p>
 
  <a name="deletionPolicy"></a>
  <p>Expert: <code>IndexWriter</code> allows an optional
  {@link IndexDeletionPolicy} implementation to be
  specified.  You can use this to control when prior commits
  are deleted from the index.  The default policy is {@link
  KeepOnlyLastCommitDeletionPolicy} which removes all prior
  commits as soon as a new commit is done (this matches
  behavior before 2.2).  Creating your own policy can allow
  you to explicitly keep previous "point in time" commits
  alive in the index for some time, to allow readers to
  refresh to the new commit without having the old commit
  deleted out from under them.  This is necessary on
  filesystems like NFS that do not support "delete on last
  close" semantics, which Lucene‘s "point in time" search
  normally relies on. </p>

<a name="mergePolicy"></a> <p>Expert:
  <code>IndexWriter</code> allows you to separately change
  the {@link MergePolicy} and the {@link MergeScheduler}.
  The {@link MergePolicy} is invoked whenever there are
  changes to the segments in the index.  Its role is to
  select which merges to do, if any, and return a {@link
  MergePolicy.MergeSpecification} describing the merges.  It
  also selects merges to do for optimize().  (The default is
  {@link LogByteSizeMergePolicy}.  Then, the {@link
  MergeScheduler} is invoked with the requested merges and
  it decides when and how to run the merges.  The default is
  {@link ConcurrentMergeScheduler}. </p>

时间: 2024-10-16 15:56:47

lucene包结构的相关文章

TCP/IP数据包结构具体解释

[关键词] TCP IP 数据包 结构 具体解释 网络 协议 一般来说,网络编程我们仅仅须要调用一些封装好的函数或者组件就能完毕大部分的工作,可是一些特殊的情况下,就须要深入的理解 网络数据包的结构,以及协议分析.如:网络监控,故障排查等-- IP包是不安全的,可是它是互联网的基础,在各方面都有广泛的应用.由IP协议衍生的协议族有10数种(据我所知),以后还会出现 很多其它的基于IP的协议- 先从实际出发吧! 一般我们在谈上网速度的时候,专业上用带宽来描写叙述,事实上不管说网速或者带宽都是不准确

Node.js入门:包结构

JavaScript缺少包结构.CommonJS致力于改变这种现状,于是定义了包的结构规范(http://wiki.commonjs.org/wiki/Packages/1.0 ).而NPM的出现则是为了在CommonJS规范的基础上,实现解决包的安装卸载,依赖管理,版本管理等问题.require的查找机制明了之后,我们来看一下包的细节. 一个符合CommonJS规范的包应该是如下这种结构: 一个package.json文件应该存在于包顶级目录下 二进制文件应该包含在bin目录下. JavaSc

USB的包结构及包分类

USB的传输总是低位在前,高位在后. USB的传输方向:从设备到主机的数据为输入:从主机到设备的数据叫做输出. 1. 包结构 以同步域开始,紧跟着一个包标识符PID(Packet Identifier),最终以包结束符EOP(End of Packet)结束这个包. 同步域 作用:① 通知USB串行接口引擎数据要开始传输:② 同步主机和设备之间的时钟. 格式:① 全速/低速设备的同步域为00000001:② 高速设备的同步域为31个0,后面跟1个1:注意:这是对发送端的要求,接收端在解码时,0的

全文检索概念,Lucene大致结构

1.1 常见的全文检索 1) 在window系统中,可以指定磁盘中的某一个位置来搜索你想要得到的东西. 2) 在myeclipse中,点击Help->Help Contents,可以利用搜索功能找到你要查询的帮助文档. 3) 在百度和google 中,可以搜索互联网中的信息,有:网页.pdf.word音频.视频等内容. 4) 在bbs系统中,有搜索文章的功能. 以上的查询功能都相似,都是查询的文本内容,查询方法也相似即找出含有指定字符串的资源.只不过是查询的范围不一样.(硬盘.帮助文件.互联网)

JDK源码包结构分类

最近查看JDK源码时,无意间发现几个类在陌生包里:com.sun.*.sun.*.org.*,google了一把总结了下以备他人搜索,如内容有误欢迎指正! Jre库包含的jar文件(jdk1.6):resources.jar.rt.jar.jsse.jar.jce.jar.charsets.jar.dnsns.jar.localedata.jar等共10个jar文件,其中resource.jar为资源包(图片.properties文件):rt.jar为运行时包,子包结构如下图: Java.*.j

文档:网络通讯包结构(crc校验,加解密)

包结构: 包 对(datacrc+protoID+dataSize)组成的byte[] 进行crc计算而得到 对(数据内容)进行crc计算而得到 协议号 数据内容的字节长度 数据内容 字段 headcrc datacrc protoID dataSize data 类型 uint uint ushort ushort byte[] 字节数 4 4 2 2 dataSize crc校验 问:TCP协议中,底层做了校验,那通信时我们还有必要再进行有CRC或者其他校验吗? 答:tcp 是可靠的, 就是

【SpringBoot】【1】解决不同包结构下,同样的类名冲突的问题

前言: 问题来源:参考博客的博主的情况是合并项目导致的冲突,我是不同的人写代码,命名重复了,改名又不好改(采取的改名方案是一方的代码加个前缀单词,看着挺累赘的) 报错原因:spring提供两种beanName生成策略,基于注解的sprong-boot默认使用的是AnnotationBeanNameGenerator,它生成beanName的策略就是,取当前类名(不是全限定类名)作为beanName.由此,如果出现不同包结构下同样的类名称,肯定会出现冲突. 正文: 1,自己写一个类实现 org.s

vue证明题三,vue项目的包结构和配置

用vue-cli创建的项目带有自动配置好的包结构,包结构都是固定的. 关于详细的解释,网上多得是,只说下最重要的内容 1.vue项目包结构和端口号配置 这里笔者下了个HBuilderX来写代码. 2.vue开发写什么? vue中编写的主要是.vue文件,如App.vue文件.大概结构如下图说明: 该vue文件的加载写在了main.js文件中,简要解析如下图: 仔细看来和当前访问的localhost:8080的页面的内容对不上,那么继续看下路由文件写了什么: 这里,路由文件指明了根路径下要加载一个

BLE的通道以及数据包结构

1. 通道(channel) 对于无线通信,数据是在某一频率上传输的,BLE采用频率是2.4GHz,频段范围是2.4000 GHz - 2.4835GHz,在这个范围内,又为40个通道,其中37,38,39通道是广播通道.其余37个通道是数据通道,如下图所示: 从图中可以看到,40个通道并不是线性递增分布的,3个广播通道是分散的,BLE数据传输过程中采用跳频技术,而跳频的计算要就要考虑跳过广播通道,3个广播分散的另一个好处就是有利于避开干扰. 如下图所示,广播通道避开了部分802.11WIFI信