A Close Look at the Lucene Source Code (1): The Index File Lock Mechanism

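As everyone knows, access to a shared resource in a multi-threaded or multi-process environment needs special care, especially when writing: without a lock, the consequences can be serious. Lucene's index is no exception. Lucene splits index access between IndexReader and IndexWriter; as the names suggest, one reads and the other writes. Lucene lets you open many IndexReader objects on the same index, but only one IndexWriter. How is that enforced? With a lock, which guarantees that only one IndexWriter can be open on an index at any time. The rest of this article walks through Lucene's index file lock mechanism.

What happens if we open several different IndexWriter instances on the same index?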

IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
IndexWriter indexWriter = new IndexWriter(dir, indexWriterConfig);

IndexWriterConfig indexWriterConfig2 = new IndexWriterConfig(analyzer);
IndexWriter indexWriter2 = new IndexWriter(dir, indexWriterConfig2);

Running this prints the following to the console:

Exception in thread "main" org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: [email protected]:\Users\new\Desktop\Lucene\write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:89)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:755)
    at test.Index.index(Index.java:51)
    at test.Index.main(Index.java:78)

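Clearly, you cannot open more than one IndexWriter on the same index.

As the simplified class diagram of the lock classes shows, Lucene uses the factory method pattern here, which makes it easy to plug in other implementations. This article uses only SimpleFSLock to explain the lock mechanism; the other implementations are in the Lucene source for anyone interested.

Lock is the base class for all locks. It is abstract, and its source is as follows: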

public abstract class Lock implements Closeable {

  /** How long {@link #obtain(long)} waits, in milliseconds,
   *  in between attempts to acquire the lock. */
  public static long LOCK_POLL_INTERVAL = 1000;

  /** Pass this value to {@link #obtain(long)} to try
   *  forever to obtain the lock. */
  public static final long LOCK_OBTAIN_WAIT_FOREVER = -1;

  /** Attempts to obtain exclusive access and immediately return
   *  upon success or failure.  Use {@link #close} to
   *  release the lock.
   * @return true iff exclusive access is obtained
   */
  public abstract boolean obtain() throws IOException;

  /**
   * If a lock obtain called, this failureReason may be set
   * with the "root cause" Exception as to why the lock was
   * not obtained.
   */
  protected Throwable failureReason;

  /** Attempts to obtain an exclusive lock within amount of
   *  time given. Polls once per {@link #LOCK_POLL_INTERVAL}
   *  (currently 1000) milliseconds until lockWaitTimeout is
   *  passed.
   * @param lockWaitTimeout length of time to wait in
   *        milliseconds or {@link
   *        #LOCK_OBTAIN_WAIT_FOREVER} to retry forever
   * @return true if lock was obtained
   * @throws LockObtainFailedException if lock wait times out
   * @throws IllegalArgumentException if lockWaitTimeout is
   *         out of bounds
   * @throws IOException if obtain() throws IOException
   */
  public final boolean obtain(long lockWaitTimeout) throws IOException {
    failureReason = null;
    boolean locked = obtain();
    if (lockWaitTimeout < 0 && lockWaitTimeout != LOCK_OBTAIN_WAIT_FOREVER)
      throw new IllegalArgumentException("lockWaitTimeout should be LOCK_OBTAIN_WAIT_FOREVER or a non-negative number (got " + lockWaitTimeout + ")");

    long maxSleepCount = lockWaitTimeout / LOCK_POLL_INTERVAL;
    long sleepCount = 0;
    while (!locked) {
      if (lockWaitTimeout != LOCK_OBTAIN_WAIT_FOREVER && sleepCount++ >= maxSleepCount) {
        String reason = "Lock obtain timed out: " + this.toString();
        if (failureReason != null) {
          reason += ": " + failureReason;
        }
        throw new LockObtainFailedException(reason, failureReason);
      }
      try {
        Thread.sleep(LOCK_POLL_INTERVAL);
      } catch (InterruptedException ie) {
        throw new ThreadInterruptedException(ie);
      }
      locked = obtain();
    }
    return locked;
  }

  /** Releases exclusive access. */
  public abstract void close() throws IOException;

  /** Returns true if the resource is currently locked.  Note that one must
   * still call {@link #obtain()} before using the resource. */
  public abstract boolean isLocked() throws IOException;

  /** Utility class for executing code with exclusive access. */
  public abstract static class With {
    private Lock lock;
    private long lockWaitTimeout;

    /** Constructs an executor that will grab the named lock. */
    public With(Lock lock, long lockWaitTimeout) {
      this.lock = lock;
      this.lockWaitTimeout = lockWaitTimeout;
    }

    /** Code to execute with exclusive access. */
    protected abstract Object doBody() throws IOException;

    /** Calls {@link #doBody} while <i>lock</i> is obtained.  Blocks if lock
     * cannot be obtained immediately.  Retries to obtain lock once per second
     * until it is obtained, or until it has tried ten times. Lock is released when
     * {@link #doBody} exits.
     * @throws LockObtainFailedException if lock could not
     * be obtained
     * @throws IOException if {@link Lock#obtain} throws IOException
     */
    public Object run() throws IOException {
      boolean locked = false;
      try {
         locked = lock.obtain(lockWaitTimeout);
         return doBody();
      } finally {
        if (locked) {
          lock.close();
        }
      }
    }
  }

}

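The most important method here is obtain(), which acquires the lock. The no-argument version makes a single attempt and returns immediately. obtain(long lockWaitTimeout) keeps retrying: if the first attempt fails, it sleeps for LOCK_POLL_INTERVAL milliseconds between attempts until the lock is acquired or lockWaitTimeout has elapsed, at which point it throws LockObtainFailedException. That is the exception shown in the console output earlier. Passing LOCK_OBTAIN_WAIT_FOREVER (-1) as lockWaitTimeout makes it retry indefinitely. Once acquired, the lock is held until close() is called.

The abstract base class LockFactory defines a single abstract method, makeLock, which returns a Lock instance.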
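The retry loop in obtain(long) can be tried in isolation. The following is a minimal sketch that mirrors its control flow without any Lucene types; the class name PollingObtainSketch, the method obtainWithTimeout, and the use of IllegalStateException in place of LockObtainFailedException are all assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

/** Illustrative only: mirrors the retry loop of Lock.obtain(long) without Lucene types. */
final class PollingObtainSketch {

  /**
   * Hypothetical helper. Calls tryOnce until it returns true, sleeping
   * pollIntervalMs between attempts. A timeoutMs of -1 retries forever.
   * Returns the number of attempts that were made.
   */
  static int obtainWithTimeout(BooleanSupplier tryOnce, long timeoutMs, long pollIntervalMs)
      throws InterruptedException {
    int attempts = 1;
    boolean locked = tryOnce.getAsBoolean();          // first attempt, no waiting
    long maxSleepCount = timeoutMs / pollIntervalMs;  // how many sleeps fit in the timeout
    long sleepCount = 0;
    while (!locked) {
      if (timeoutMs != -1 && sleepCount++ >= maxSleepCount) {
        // Stand-in for LockObtainFailedException in the real class.
        throw new IllegalStateException("Lock obtain timed out after " + attempts + " attempts");
      }
      Thread.sleep(pollIntervalMs);
      locked = tryOnce.getAsBoolean();                // retry after each sleep
      attempts++;
    }
    return attempts;
  }
}
```

With this loop, the total number of attempts before a timeout is 1 + timeoutMs / pollIntervalMs. For a timeout of 1000 ms and a poll interval of 1000 ms, that is two attempts roughly one second apart.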

public abstract class LockFactory {

  /**
   * Return a new Lock instance identified by lockName.
   * @param lockName name of the lock to be created.
   */
  public abstract Lock makeLock(Directory dir, String lockName);

}

The abstract class FSLockFactory extends LockFactory:

public abstract class FSLockFactory extends LockFactory {

  /** Returns the default locking implementation for this platform.
   * This method currently returns always {@link NativeFSLockFactory}.
   */
  public static final FSLockFactory getDefault() {
    return NativeFSLockFactory.INSTANCE;
  }

  @Override
  public final Lock makeLock(Directory dir, String lockName) {
    if (!(dir instanceof FSDirectory)) {
      throw new UnsupportedOperationException(getClass().getSimpleName() + " can only be used with FSDirectory subclasses, got: " + dir);
    }
    return makeFSLock((FSDirectory) dir, lockName);
  }

  /** Implement this method to create a lock for a FSDirectory instance. */
  protected abstract Lock makeFSLock(FSDirectory dir, String lockName);

}

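Note that

public static final FSLockFactory getDefault() {
  return NativeFSLockFactory.INSTANCE;
}

returns NativeFSLockFactory by default. Like SimpleFSLockFactory, it is a concrete implementation. NativeFSLockFactory relies on the nio method FileChannel.tryLock, which this article does not cover in detail; interested readers can look at the JDK nio source (it seems Oracle no longer ships the source of the FileChannel implementation class, so you need to look in the JVM sources).

Now for the main topic of this article, SimpleFSLockFactory: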
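For readers who have not used FileChannel.tryLock, here is a minimal sketch of OS-level file locking with plain JDK calls. It is not NativeFSLockFactory's actual code; the class NativeLockSketch and its tryObtain method are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: holds an OS-level lock on a file until close() is called. */
final class NativeLockSketch implements AutoCloseable {
  private final FileChannel channel;
  private final FileLock lock;

  private NativeLockSketch(FileChannel channel, FileLock lock) {
    this.channel = channel;
    this.lock = lock;
  }

  /** Returns a held lock, or null if the file is already locked. */
  static NativeLockSketch tryObtain(Path lockFile) throws IOException {
    FileChannel channel =
        FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    FileLock lock = null;
    try {
      lock = channel.tryLock(); // null if another process holds the lock
    } catch (OverlappingFileLockException e) {
      // This JVM already holds a lock on the file; treat it as "not obtained".
    } finally {
      if (lock == null) {
        channel.close();
      }
    }
    return lock == null ? null : new NativeLockSketch(channel, lock);
  }

  /** Releases the lock. The lock file itself is left on disk. */
  @Override
  public void close() throws IOException {
    try {
      lock.release();
    } finally {
      channel.close();
    }
  }
}
```

In this sketch the lock is a property of the open file, not of the file's existence, so the file can stay on disk after the lock has been released.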

public final class SimpleFSLockFactory extends FSLockFactory {

  /**
   * Singleton instance
   */
  public static final SimpleFSLockFactory INSTANCE = new SimpleFSLockFactory();

  private SimpleFSLockFactory() {}

  @Override
  protected Lock makeFSLock(FSDirectory dir, String lockName) {
    return new SimpleFSLock(dir.getDirectory(), lockName);
  }

  static class SimpleFSLock extends Lock {

    Path lockFile;
    Path lockDir;

    public SimpleFSLock(Path lockDir, String lockFileName) {
      this.lockDir = lockDir;
      lockFile = lockDir.resolve(lockFileName);
    }

    @Override
    public boolean obtain() throws IOException {
      try {
        Files.createDirectories(lockDir);
        Files.createFile(lockFile);
        return true;
      } catch (IOException ioe) {
        // On Windows, on concurrent createNewFile, the 2nd process gets "access denied".
        // In that case, the lock was not aquired successfully, so return false.
        // We record the failure reason here; the obtain with timeout (usually the
        // one calling us) will use this as "root cause" if it fails to get the lock.
        failureReason = ioe;
        return false;
      }
    }

    @Override
    public void close() throws LockReleaseFailedException {
      // TODO: wierd that clearLock() throws the raw IOException...
      try {
        Files.deleteIfExists(lockFile);
      } catch (Throwable cause) {
        throw new LockReleaseFailedException("failed to delete " + lockFile, cause);
      }
    }

    @Override
    public boolean isLocked() {
      return Files.exists(lockFile);
    }

    @Override
    public String toString() {
      return "SimpleFSLock@" + lockFile;
    }
  }

}

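SimpleFSLockFactory defines a nested class, SimpleFSLock, which extends Lock. Again we focus on the obtain method of SimpleFSLock, since this is where the file lock is actually implemented.

Files.createDirectories(lockDir);
Files.createFile(lockFile);

Look at these two lines. createDirectories creates the directory that will contain write.lock, along with any missing parent directories (the file name can be something else; write.lock is Lucene's default). createFile then creates the write.lock file itself. This is the neat part: if write.lock already exists, createFile throws an exception. An exception therefore means that SimpleFSLock failed to obtain the lock, which in turn means some other process is currently writing to the index.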
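The create-a-file-as-a-lock idea can be reproduced with the JDK alone. The sketch below is a simplified version under stated assumptions: CreateFileLockSketch and its two methods are hypothetical names, and unlike SimpleFSLock, which treats any IOException as a failed attempt, it only treats FileAlreadyExistsException that way.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: uses the existence of a file as an exclusive lock. */
final class CreateFileLockSketch {

  /** Returns true if this call created the lock file, false if it already existed. */
  static boolean obtain(Path lockFile) throws IOException {
    // Make sure the directory that will hold the lock file exists.
    Files.createDirectories(lockFile.toAbsolutePath().getParent());
    try {
      // createFile checks for existence and creates the file as one atomic step,
      // so two competing callers cannot both succeed.
      Files.createFile(lockFile);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false; // someone else holds the lock
    }
  }

  /** Releases the lock by deleting the lock file. */
  static void release(Path lockFile) throws IOException {
    Files.deleteIfExists(lockFile);
  }
}
```

One consequence of this design is that a process that crashes without calling release leaves the lock file behind, and later calls to obtain keep failing until the file is removed.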

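The close() method calls Files.deleteIfExists(lockFile), so every time the IndexWriter is closed, the write.lock file is deleted.

To summarize, the locking scheme of SimpleFSLockFactory can be understood as follows. It creates a write.lock file in the index directory. If the directory already contains write.lock, some other process is writing to this index, so the lock attempt fails and nothing can be added to the index. When the writer is closed after adding its data, write.lock is deleted and the lock is released. In other words, if write.lock exists, a process is already writing to the index; if it does not exist, creating it takes the lock and keeps other processes from writing.

This is a neat way to implement an exclusive write lock. Some readers may wonder why, in their own demo, the write.lock file is still there after they create an index and call close(). The reason is that Lucene's default implementation is now NativeFSLockFactory, the one mentioned above that implements the lock by calling native methods through nio. It locks the file at the operating system level rather than relying on whether the file exists, so the file can remain on disk after the lock is released.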

Time: 2024-12-16 09:10:14
