MongoDB GridFS Specification

The default GridFS chunk size is being changed for 2.4.10 and 2.6.0-rc3. Tyler Brock's explanation:

Now that the server uses power of 2 by default, if the default chunk size for gridfs is 256k we will almost always be throwing away some storage space. This is because if the bindata field of a chunk will occupy 256k (an exact power of 2), then _id and foreign key reference to the files collection, etc will take up additional space that will cause the document's allocated storage to be rounded up to 512k (the next power of 2). This would be a huge waste.

Instead, if we make the default chunk size 255k then we have an extra 1k to store the _id and other metadata so that when the document is persisted we round up to 256k and not 512k upon persisting the document.

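Starting with 2.4.10, MongoDB changed the default chunkSize to 255 KB; before that it was 256 KB. The passage above explains why. By default the MongoDB server allocates space for each document in sizes of 2^n bytes. With the default chunkSize set to 256K, the binary data alone consumes 256K, and the other fields (_id, files_id, and n) take up a few dozen additional bytes. The total therefore exceeds 256K, so the server allocates 512K for every chunk, which wastes a great deal of space.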
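The rounding argument above can be checked with a little arithmetic. The following is a minimal sketch, not the server's actual allocator: the helper names and the 64-byte overhead figure are assumptions chosen for illustration, standing in for the _id, files_id, n, and BSON framing.

```python
KIB = 1024
ASSUMED_OVERHEAD = 64  # hypothetical bytes for _id, files_id, n and BSON framing


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def allocated_size(chunk_size: int, overhead: int = ASSUMED_OVERHEAD) -> int:
    """Space reserved for one chunk document under power-of-2 allocation."""
    return next_power_of_two(chunk_size + overhead)


for size in (256 * KIB, 255 * KIB):
    print(f"{size // KIB}K chunk -> {allocated_size(size) // KIB}K allocated")
```

Under these assumptions a 256K chunk is rounded up to 512K, while a 255K chunk fits in 256K. Any overhead between 1 byte and 1K gives the same result.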

The `chunks` Collection

Each document in the chunks collection represents a distinct chunk of a file as represented in the GridFS store. The following is a prototype document from the chunks collection:

{
  "_id" : <ObjectId>,
  "files_id" : <ObjectId>,
  "n" : <num>,
  "data" : <binary>
}

A document from the chunks collection contains the following fields:

chunks._id: The unique ObjectId of the chunk.

chunks.files_id: The _id of the “parent” document, as specified in the files collection.

chunks.n: The sequence number of the chunk. GridFS numbers all chunks, starting with 0.

chunks.data: The chunk’s payload as a BSON binary type.

The chunks collection uses a compound index on files_id and n, as described in GridFS Index.
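To make the role of files_id and n concrete, here is a rough sketch of how a payload could be split into chunk documents and put back together. It uses plain dictionaries and no database; the function names are hypothetical, and the _id field is omitted on the assumption that the driver or server would assign it.

```python
from typing import Iterable, Iterator

DEFAULT_CHUNK_SIZE = 255 * 1024  # default chunk size stated in this document


def make_chunks(files_id, payload: bytes,
                chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[dict]:
    """Yield chunk-shaped dicts; n starts at 0 and increases by 1 per chunk."""
    for n, offset in enumerate(range(0, len(payload), chunk_size)):
        yield {
            "files_id": files_id,  # points back to the parent files document
            "n": n,                # sequence number of this chunk
            "data": payload[offset:offset + chunk_size],
        }


def reassemble(chunks: Iterable[dict]) -> bytes:
    """Rebuild the payload by ordering chunks on n, as the (files_id, n) index allows."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Sorting on n is what lets the chunks be stored or fetched in any order and still be joined correctly.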

The `files` Collection

Each document in the files collection represents a file in the GridFS store. Consider the following prototype of a document in the files collection:

{
  "_id" : <ObjectId>,
  "length" : <num>,
  "chunkSize" : <num>,
  "uploadDate" : <timestamp>,
  "md5" : <hash>,

  "filename" : <string>,
  "contentType" : <string>,
  "aliases" : <string array>,
  "metadata" : <dataObject>
}

Documents in the files collection contain some or all of the following fields. Applications may create additional arbitrary fields:

files._id: The unique ID for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectId.

files.length: The size of the document in bytes.

files.chunkSize: The size of each chunk, in bytes. GridFS divides the document into chunks of the size specified here. The default size is 255 kilobytes. Changed in version 2.4.10: The default chunk size changed from 256k to 255k.
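Given length and chunkSize, the number of chunks follows from a ceiling division. The sketch below is illustrative only; chunk_count and last_chunk_size are hypothetical helper names, not part of any driver.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261120 bytes


def chunk_count(length: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of chunks needed for `length` bytes (ceiling division)."""
    return -(-length // chunk_size)


def last_chunk_size(length: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Size in bytes of the final chunk; 0 when there is no data at all."""
    if length == 0:
        return 0
    remainder = length % chunk_size
    return remainder if remainder else chunk_size
```

Every chunk except possibly the last is exactly chunkSize bytes, so a file one byte over the default size needs a second chunk that holds a single byte.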

files.uploadDate: The date the document was first stored by GridFS. This value has the Date type.

files.md5: An MD5 hash returned by the filemd5 command. This value has the String type.

files.filename: Optional. A human-readable name for the document.

files.contentType: Optional. A valid MIME type for the document.

files.aliases: Optional. An array of alias strings.

files.metadata: Optional. Any additional information you want to store.
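As a rough illustration of the field list above, the following sketch builds a files-shaped dictionary for an in-memory payload. The function name make_files_doc is hypothetical, and the MD5 is computed locally with hashlib as a stand-in for the filemd5 command mentioned above, on the assumption that both hash the same bytes.

```python
import hashlib
from datetime import datetime, timezone


def make_files_doc(file_id, payload: bytes, filename=None, content_type=None,
                   aliases=None, metadata=None, chunk_size=255 * 1024) -> dict:
    """Build a dict with the required files fields plus any optional ones given."""
    doc = {
        "_id": file_id,
        "length": len(payload),                        # size in bytes
        "chunkSize": chunk_size,                       # size of each chunk
        "uploadDate": datetime.now(timezone.utc),      # Date type
        "md5": hashlib.md5(payload).hexdigest(),       # String type (local stand-in)
    }
    # Optional fields are included only when the caller supplies them.
    optional = {"filename": filename, "contentType": content_type,
                "aliases": aliases, "metadata": metadata}
    doc.update({k: v for k, v in optional.items() if v is not None})
    return doc
```

For example, make_files_doc(1, b"hello", filename="hello.txt") produces a document with length 5 and no contentType, aliases, or metadata keys.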

Time: 2024-09-26 23:23:38
