Learning leveldb: sstable (2)

Writing a block: block_builder

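block.h and block.cc define the entry storage format and the restart points of a block, and provide entry lookup and an iterator. So how are entries written into a block? Following its object-oriented design, leveldb provides that interface in the BlockBuilder class (block_builder).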

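BlockBuilder functions:

Add(): appends an entry to the end of the current block; sorting is the job of the calling layers.
Finish(): called when the block is full; appends the restart-point array and the number of restart points.
Reset(): resets the builder so it can be reused for the next block.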
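
To make the three calls concrete, here is a minimal sketch of a block builder. It is not leveldb's code: ToyBlockBuilder and its members are hypothetical names, and the layout assumed here is the usual one (each entry stores shared-prefix length, unshared length and value length as varints, followed by the key suffix and the value, with a fixed32 restart array and restart count appended by Finish()).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a block builder; names and details are assumptions.
class ToyBlockBuilder {
 public:
  explicit ToyBlockBuilder(int restart_interval)
      : restart_interval_(restart_interval) { Reset(); }

  // Clears all state so the builder can be reused for the next block.
  void Reset() {
    buffer_.clear();
    restarts_.assign(1, 0);  // the first restart point is at offset 0
    counter_ = 0;
    last_key_.clear();
  }

  // Appends one entry. The caller must supply keys in increasing order.
  void Add(const std::string& key, const std::string& value) {
    assert(buffer_.empty() || key > last_key_);
    size_t shared = 0;
    if (counter_ < restart_interval_) {
      // Share as long a prefix as possible with the previous key.
      const size_t limit = std::min(last_key_.size(), key.size());
      while (shared < limit && last_key_[shared] == key[shared]) ++shared;
    } else {
      // New restart point: the key is stored in full.
      restarts_.push_back(static_cast<uint32_t>(buffer_.size()));
      counter_ = 0;
    }
    const size_t non_shared = key.size() - shared;
    PutVarint32(static_cast<uint32_t>(shared));
    PutVarint32(static_cast<uint32_t>(non_shared));
    PutVarint32(static_cast<uint32_t>(value.size()));
    buffer_.append(key, shared, non_shared);
    buffer_.append(value);
    last_key_ = key;
    ++counter_;
  }

  // Appends the restart array and its length, then returns the block bytes.
  const std::string& Finish() {
    for (uint32_t r : restarts_) PutFixed32(r);
    PutFixed32(static_cast<uint32_t>(restarts_.size()));
    return buffer_;
  }

 private:
  void PutVarint32(uint32_t v) {
    while (v >= 0x80) {
      buffer_.push_back(static_cast<char>((v & 0x7f) | 0x80));
      v >>= 7;
    }
    buffer_.push_back(static_cast<char>(v));
  }
  void PutFixed32(uint32_t v) {  // little-endian
    for (int i = 0; i < 4; ++i)
      buffer_.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
  }

  int restart_interval_;
  int counter_ = 0;                 // entries since the last restart point
  std::string buffer_;              // block contents built so far
  std::vector<uint32_t> restarts_;  // offsets of restart points
  std::string last_key_;
};
```

With a restart interval of 2, adding "apple", "apply" and "banana" stores "apply" as a 1-byte suffix (4 bytes shared with "apple") and starts a new restart point at "banana".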

sstable

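As covered earlier, sstable is the file format leveldb uses to persist data. Viewed as a whole, an sstable consists of data and meta information (meta/index). Both are stored in units of blocks and read back through the same logic. The overall layout is: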

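data_block: the actual key/value data
meta_block: unimplemented in early versions; in the version read here it is tied to the filter (see FilterBlockReader below), which these notes skip
index_block: for each data_block, stores its last_key (or a shortened separator that is still >= it) together with the block's location in the sstable file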
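
The role of the index_block can be sketched as a sorted map. This is only an illustration: ToyBlockLocation, ToyIndex and FindDataBlock are hypothetical names, and the sketch assumes each index key is >= every key in its data block and smaller than every key in the next one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory view of an index_block: index key -> block location.
struct ToyBlockLocation {
  uint64_t offset;  // where the data block starts in the file
  uint64_t size;    // block size, excluding the trailer
};
using ToyIndex = std::map<std::string, ToyBlockLocation>;

// Returns the location of the only data block that may contain `key`,
// or nullptr if `key` is greater than every index key.
inline const ToyBlockLocation* FindDataBlock(const ToyIndex& index,
                                             const std::string& key) {
  auto it = index.lower_bound(key);  // first index key >= target key
  return it == index.end() ? nullptr : &it->second;
}
```

A lookup therefore costs one search in the index plus one search inside a single data block; keys past the last index entry cannot be in the table at all.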

Reading an sstable: table

table/table.cc contains the code for reading an sstable. The Table class keeps its state behind a private pointer:

 private:
  struct Rep;
  Rep* rep_;

This declares a struct Rep and gives the Table class a pointer member to it. Table::Open() is where rep_ gets instantiated:

Rep* rep = new Table::Rep;

The Rep struct:

struct Table::Rep {
  ~Rep() {
    delete filter;
    delete [] filter_data;
    delete index_block;
  }

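  Options options;           // user-supplied options
  Status status;             // current status
  RandomAccessFile* file;    // file object used for reads at a given offset
  uint64_t cache_id;
  FilterBlockReader* filter; // related to meta_block; can be ignored here
  const char* filter_data;   // raw filter data, freed in ~Rep()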

  BlockHandle metaindex_handle;  // Handle to metaindex_block: saved from footer
  Block* index_block;
};

BlockHandle is a "pointer" to a block's position in the file (what it records is a file offset); see format.h.

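footer: a fixed-length record at the end of the file. It holds the location (BlockHandle) of the metaindex_block and of the index_block, followed by an 8-byte magic number used as a sanity check. Reading the footer is clearly the first step to making sense of the whole table.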
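
As an illustration of the footer layout, here is a minimal encode/decode sketch. The Toy* names are hypothetical. It assumes the format I believe leveldb uses: each BlockHandle is two varint64 values (offset, size), the two handles are zero-padded to 40 bytes, and the 8-byte little-endian magic number 0xdb4775248b80fb57 follows, giving 48 bytes in total.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr size_t kToyMaxHandleLength = 10 + 10;  // two varint64s, worst case
constexpr size_t kToyFooterLength = 2 * kToyMaxHandleLength + 8;
constexpr uint64_t kToyMagic = 0xdb4775248b80fb57ull;

struct ToyHandle { uint64_t offset = 0; uint64_t size = 0; };
struct ToyFooter { ToyHandle metaindex; ToyHandle index; };

inline void PutVarint64(std::string* dst, uint64_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Reads one varint64 starting at *pos and advances *pos past it.
inline bool GetVarint64(const std::string& src, size_t* pos, uint64_t* v) {
  *v = 0;
  for (int shift = 0; shift <= 63 && *pos < src.size(); shift += 7) {
    const uint64_t byte = static_cast<unsigned char>(src[(*pos)++]);
    *v |= (byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

inline std::string EncodeFooter(const ToyFooter& f) {
  std::string dst;
  PutVarint64(&dst, f.metaindex.offset);
  PutVarint64(&dst, f.metaindex.size);
  PutVarint64(&dst, f.index.offset);
  PutVarint64(&dst, f.index.size);
  dst.resize(2 * kToyMaxHandleLength, '\0');  // pad to a fixed length
  for (int i = 0; i < 8; ++i)                 // little-endian magic
    dst.push_back(static_cast<char>((kToyMagic >> (8 * i)) & 0xff));
  return dst;
}

inline bool DecodeFooter(const std::string& src, ToyFooter* f) {
  if (src.size() != kToyFooterLength) return false;
  uint64_t magic = 0;
  for (int i = 0; i < 8; ++i) {
    const uint64_t byte =
        static_cast<unsigned char>(src[2 * kToyMaxHandleLength + i]);
    magic |= byte << (8 * i);
  }
  if (magic != kToyMagic) return false;  // not an sstable, or corrupted
  size_t pos = 0;
  return GetVarint64(src, &pos, &f->metaindex.offset) &&
         GetVarint64(src, &pos, &f->metaindex.size) &&
         GetVarint64(src, &pos, &f->index.offset) &&
         GetVarint64(src, &pos, &f->index.size);
}
```

Because the footer has a fixed length, a reader can locate it from the file size alone, which is what makes it a usable entry point into the rest of the file.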

Table::Open() reads the footer from the end of the file:

......
  Slice footer_input;
  Status s = file->Read(size - Footer::kEncodedLength, Footer::kEncodedLength,
                        &footer_input, footer_space);
  if (!s.ok()) return s;

  Footer footer;
  s = footer.DecodeFrom(&footer_input);
  if (!s.ok()) return s;

......
  Block* index_block = NULL;
  if (s.ok()) {
    s = ReadBlock(file, opt, footer.index_handle(), &contents);
    if (s.ok()) {
      index_block = new Block(contents);
    }
  }

  ......
    rep->file = file;
    rep->metaindex_handle = footer.metaindex_handle();
    rep->index_block = index_block;

ReadBlock is the function that reads the block identified by a BlockHandle from the file; it is defined in format.cc:

Status ReadBlock(RandomAccessFile* file,
                 const ReadOptions& options,
                 const BlockHandle& handle,
                 BlockContents* result) {
  result->data = Slice();
  result->cachable = false;
  result->heap_allocated = false;
  // initialize the BlockContents result

  // Read the block contents as well as the type/crc footer.
  // See table_builder.cc for the code that built this structure.
  size_t n = static_cast<size_t>(handle.size());
  char* buf = new char[n + kBlockTrailerSize];
  Slice contents;
  Status s = file->Read(handle.offset(), n + kBlockTrailerSize, &contents, buf);
  if (!s.ok()) {
    delete[] buf;
    return s;
  }
  if (contents.size() != n + kBlockTrailerSize) {
    delete[] buf;
    return Status::Corruption("truncated block read");
  }

  // do something
  return Status::OK();
}

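kBlockTrailerSize is the 5-byte trailer at the end of every block: a 1-byte compression type followed by a 4-byte CRC. The "do something" part analyzes what was read: it checks the CRC when checksum verification is requested, looks at the compression type and decompresses if necessary, returns an error status on failure, and fills in result.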
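
The trailer check can be sketched as follows. This is an illustration, not leveldb's code: the function names are hypothetical, the CRC32C here is a slow bitwise version, and the sketch assumes what I understand leveldb to do, namely that the checksum covers the block contents plus the type byte and is stored in "masked" form (rotate right by 15 bits, then add 0xa282ead8).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr size_t kToyTrailerSize = 5;  // 1-byte type + 4-byte CRC
constexpr uint32_t kToyMaskDelta = 0xa282ead8u;

// Bitwise CRC32C (Castagnoli polynomial); slow but dependency-free.
inline uint32_t Crc32c(const std::string& data) {
  uint32_t crc = 0xffffffffu;
  for (unsigned char c : data) {
    crc ^= c;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
  }
  return crc ^ 0xffffffffu;
}

// Masking avoids storing a raw CRC of data that itself embeds CRCs.
inline uint32_t MaskCrc(uint32_t crc) {
  return ((crc >> 15) | (crc << 17)) + kToyMaskDelta;
}

// Returns contents + type byte + masked CRC (little-endian).
inline std::string AppendTrailer(std::string contents, char type) {
  contents.push_back(type);
  const uint32_t masked = MaskCrc(Crc32c(contents));
  for (int i = 0; i < 4; ++i)
    contents.push_back(static_cast<char>((masked >> (8 * i)) & 0xff));
  return contents;
}

// Recomputes the checksum over contents + type and compares with the trailer.
inline bool VerifyTrailer(const std::string& block) {
  if (block.size() < kToyTrailerSize) return false;
  const size_t n = block.size() - kToyTrailerSize;  // contents length
  uint32_t stored = 0;
  for (int i = 0; i < 4; ++i) {
    const uint32_t byte = static_cast<unsigned char>(block[n + 1 + i]);
    stored |= byte << (8 * i);
  }
  return stored == MaskCrc(Crc32c(block.substr(0, n + 1)));
}
```

A reader would call something like VerifyTrailer on the n + kBlockTrailerSize bytes it just read and report corruption on a mismatch, before looking at the type byte to decide whether to decompress.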

Writing an sstable: table_builder

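Writing an sstable does not involve sorting: an sstable is produced either by dumping a memtable or by the merge step of a compaction, so the upper layers already guarantee the key order.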

The Rep struct:

struct TableBuilder::Rep {
  Options options;
  Options index_block_options;
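  WritableFile* file;       // file object that wraps the write operations
  uint64_t offset;          // offset in the file where the next write goes
  Status status;
  BlockBuilder data_block;  // writes entries into the current data_block
  BlockBuilder index_block; // adds the index entry of each data_block to the index_block
  std::string last_key;     // key of the last entry in the table; a new key must be greater, otherwise the caller did not supply sorted entries
  int64_t num_entries;      // total number of entries
  bool closed;              // set once Finish() or Abandon() has been called
  FilterBlockBuilder* filter_block;

  bool pending_index_entry; // true only while data_block is empty: a block was just flushed and its index entry has not been added yet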
  BlockHandle pending_handle;  // Handle to add to index block

  std::string compressed_output;

  Rep();
};

void TableBuilder::Add(const Slice& key, const Slice& value)
{
  Rep* r = rep_;
  assert(!r->closed);
  if (!ok()) return;
  if (r->num_entries > 0) {
    assert(r->options.comparator->Compare(key, Slice(r->last_key)) > 0);
  }
    ......
  r->last_key.assign(key.data(), key.size());
  r->num_entries++;
  r->data_block.Add(key, value);

  const size_t estimated_block_size = r->data_block.CurrentSizeEstimate();
  if (estimated_block_size >= r->options.block_size) {
    Flush();
  }
}

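While dumping data to disk, leveldb keeps only one data block in memory. Once that block is full (its estimated size reaches options.block_size), it is written to disk automatically by Flush().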

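Call hierarchy of a write:

Add(): writes into the in-memory block, checks the block size, and decides whether to write the block to disk
Flush()
WriteBlock(): uses the block's compression setting to decide whether to compress, and determines the compression type
WriteRawBlock(): writes the block followed by its trailer (compression type and CRC) to disk through the file object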
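
The call chain above can be sketched with a toy builder. Everything here is a simplification under stated assumptions: ToyTableBuilder is a hypothetical class, the "file" is an in-memory string, entries are stored as plain key + value bytes, compression is never applied, and the 4 CRC bytes of the trailer are left as zeros.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

class ToyTableBuilder {
 public:
  explicit ToyTableBuilder(size_t block_size) : block_size_(block_size) {}

  void Add(const std::string& key, const std::string& value) {
    assert(num_entries_ == 0 || key > last_key_);  // caller guarantees order
    if (pending_index_entry_) {
      // The previous block was flushed; record its index entry now.
      index_.emplace_back(last_key_, pending_offset_);
      pending_index_entry_ = false;
    }
    last_key_ = key;
    ++num_entries_;
    data_block_ += key;  // toy entry format: raw key bytes + value bytes
    data_block_ += value;
    if (data_block_.size() >= block_size_) Flush();
  }

  void Flush() {
    if (data_block_.empty()) return;
    pending_offset_ = file_.size();
    WriteBlock();
    pending_index_entry_ = true;
  }

  // Flushes the last block and its pending index entry.
  void Finish() {
    Flush();
    if (pending_index_entry_) {
      index_.emplace_back(last_key_, pending_offset_);
      pending_index_entry_ = false;
    }
  }

  const std::string& file() const { return file_; }
  const std::vector<std::pair<std::string, uint64_t>>& index() const {
    return index_;
  }

 private:
  void WriteBlock() {
    const char type = 0;  // assumption: 0 means "not compressed"
    WriteRawBlock(data_block_, type);
    data_block_.clear();
  }
  void WriteRawBlock(const std::string& contents, char type) {
    file_ += contents;
    file_.push_back(type);
    file_.append(4, '\0');  // placeholder where the CRC would go
  }

  size_t block_size_;
  std::string file_;        // stands in for the file on disk
  std::string data_block_;  // the single in-memory block
  std::string last_key_;
  uint64_t num_entries_ = 0;
  bool pending_index_entry_ = false;
  uint64_t pending_offset_ = 0;
  std::vector<std::pair<std::string, uint64_t>> index_;  // (key, block offset)
};
```

Note how the index entry of a block is only added when the next key arrives (or at Finish()), which is the purpose of pending_index_entry in the Rep struct.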

Time: 2024-08-09 22:01:55
