MongoDB聚合管道

通过上一篇文章中，认识了MongoDB中四个聚合操作，提供基本功能的count、distinct和group，还有可以提供强大功能的mapReduce。

在MongoDB的2.2版本以后，聚合框架中多了一个新的成员，聚合管道，数据进入管道后就会经过一级级的处理，直到输出。

对于数据量不是特别大，逻辑也不是特别复杂的聚合操作，聚合管道还是比mapReduce有很多优势的：

相比mapReduce，聚合管道比较容易理解和使用
可以直接使用管道表达式操作符，省掉了很多自定义js function，一定程度上提高执行效率
和mapReduce一样，它也可以作用于分片集合

但是，对于数据量大，逻辑复杂的聚合操作，还是要使用mapReduce实现。

聚合管道

在聚合管道中，每一步操作（管道操作符）都是一个工作阶段（stage），所有的stage存放在一个array中。MongoDB文档中的描述如下：

db.collection.aggregate( [ { <stage> }, ... ] )

在聚合管道中，每一个stage都对应一个管道操作符，根据MongoDB文档，聚合管道可以支持以下管道操作符：

Name	Description
$geoNear	Returns an ordered stream of documents based on the proximity to a geospatial point. Incorporates the functionality of $match, $sort, and $limit for geospatial data. The output documents include an additional distance field and can include a location identifier field.
$group	Groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. Consumes all input documents and outputs one document per each distinct group. The output documents only contain the identifier field and, if specified, accumulated fields.
$limit	Passes the first n documents unmodified to the pipeline where n is the specified limit. For each input document, outputs either one document (for the first n documents) or zero documents (after the first n documents).
$match	Filters the document stream to allow only matching documents to pass unmodified into the next pipeline stage. $match uses standard MongoDB queries. For each input document, outputs either one document (a match) or zero documents (no match).
$out	Writes the resulting documents of the aggregation pipeline to a collection. To use the $out stage, it must be the last stage in the pipeline.
$project	Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.
$redact	Reshapes each document in the stream by restricting the content for each document based on information stored in the documents themselves. Incorporates the functionality of $project and $match. Can be used to implement field level redaction. For each input document, outputs either one or zero document.
$skip	Skips the first n documents where n is the specified skip number and passes the remaining documents unmodified to the pipeline. For each input document, outputs either zero documents (for the first n documents) or one document (if after the first n documents).
$sort	Reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, outputs one document.
$unwind	Deconstructs an array field from the input documents to output a document for each element. Each output document replaces the array with an element value. For each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.

下面通过具体的例子看看聚合管道的基本用法：

首先，通过以下代码准备测试数据：

 1 var dataList = [
 2     { "name" : "Will0", "gender" : "Female", "age" : 22 , "classes": ["MongoDB", "C#", "C++"]},
 3     { "name" : "Will1", "gender" : "Female", "age" : 20 , "classes": ["Node", "JavaScript"]},
 4     { "name" : "Will2", "gender" : "Male", "age" : 24 , "classes": ["Java", "WPF", "C#"]},
 5     { "name" : "Will3", "gender" : "Male", "age" : 23 , "classes": ["WPF", "C",]},
 6     { "name" : "Will4", "gender" : "Male", "age" : 21 , "classes": ["SQL", "HTML"]},
 7     { "name" : "Will5", "gender" : "Male", "age" : 20 , "classes": ["DOM", "CSS", "HTML5"]},
 8     { "name" : "Will6", "gender" : "Female", "age" : 20 , "classes": ["PPT", "Word", "Excel"]},
 9     { "name" : "Will7", "gender" : "Female", "age" : 24 , "classes": ["HTML5", "C#"]},
10     { "name" : "Will8", "gender" : "Male", "age" : 21 , "classes": ["Java", "VB", "BASH"]},
11     { "name" : "Will9", "gender" : "Female", "age" : 24 , "classes": ["CSS"]}
12 ]
13
14 for(var i = 0; i < dataList.length; i++){
15     db.school.students.insert(dataList[i]);
16 }

$project

$project主要用于数据投影，实现字段的重命名、增加和删除。下面例子中，重命名了"name"字段，增加了"birthYear"字段，删除了"_id"字段。

 1 > db.runCommand({
 2 ...     "aggregate": "school.students",
 3 ...     "pipeline": [
 4 ...                         {"$match": {"age": {"$lte": 22}}},
 5 ...                         {"$project": {"_id": 0, "studentName": "$name", "gender": 1, "birthYear": {"$subtract": [2014, "$age"]}}},
 6 ...                         {"$sort": {"birthYear":1}}
 7 ...                     ]
 8 ... })
 9 {
10         "result" : [
11                 {
12                         "gender" : "Female",
13                         "studentName" : "Will0",
14                         "birthYear" : 1992
15                 },
16                 {
17                         "gender" : "Male",
18                         "studentName" : "Will4",
19                         "birthYear" : 1993
20                 },
21                 {
22                         "gender" : "Male",
23                         "studentName" : "Will8",
24                         "birthYear" : 1993
25                 },
26                 {
27                         "gender" : "Female",
28                         "studentName" : "Will1",
29                         "birthYear" : 1994
30                 },
31                 {
32                         "gender" : "Male",
33                         "studentName" : "Will5",
34                         "birthYear" : 1994
35                 },
36                 {
37                         "gender" : "Female",
38                         "studentName" : "Will6",
39                         "birthYear" : 1994
40                 }
41         ],
42         "ok" : 1
43 }
44 >

$unwind

$unwind用来将数组拆分为独立字段。

 1 > db.runCommand({
 2 ...     "aggregate": "school.students",
 3 ...     "pipeline": [
 4 ...                         {"$match": {"age": 23}},
 5 ...                         {"$project": {"name": 1, "gender": 1, "age": 1, "classes": 1}},
 6 ...                         {"$unwind": "$classes"}
 7 ...                     ]
 8 ... })
 9 {
10         "result" : [
11                 {
12                         "_id" : ObjectId("54805220e31c9e1578ed0ccc"),
13                         "name" : "Will3",
14                         "gender" : "Male",
15                         "age" : 23,
16                         "classes" : "WPF"
17                 },
18                 {
19                         "_id" : ObjectId("54805220e31c9e1578ed0ccc"),
20                         "name" : "Will3",
21                         "gender" : "Male",
22                         "age" : 23,
23                         "classes" : "C"
24                 }
25         ],
26         "ok" : 1
27 }
28 >

$group

跟单个的group用法类似，用作分组操作。

使用$group时，必须要指定一个_id域，同时也可以包含一些算术类型的表达式操作符。

下面的例子就是用聚合管道实现得到男生和女生的平均年龄。

 1 > db.runCommand({
 2 ...     "aggregate": "school.students",
 3 ...     "pipeline": [
 4 ...                     {"$group":{"_id":"$gender", "avg": { "$avg": "$age" } }}
 5 ...                  ]
 6 ... })
 7 {
 8         "result" : [
 9                 {
10                         "_id" : "Male",
11                         "avg" : 21.8
12                 },
13                 {
14                         "_id" : "Female",
15                         "avg" : 22
16                 }
17         ],
18         "ok" : 1
19 }
20 >

查询男生和女生的最大年龄

 1 > db.runCommand({
 2 ...     "aggregate": "school.students",
 3 ...     "pipeline": [
 4 ...                         {"$group": {"_id": "$gender", "max": {"$max": "$age"}}}
 5 ...                     ]
 6 ... })
 7 {
 8         "result" : [
 9                 {
10                         "_id" : "Male",
11                         "max" : 24
12                 },
13                 {
14                         "_id" : "Female",
15                         "max" : 24
16                 }
17         ],
18         "ok" : 1
19 }
20 >

管道操作符、管道表达式和表达式操作符

上面的例子中，已经接触了这几个概念，下面进行进一步介绍。

管道操作符作为"键"，所对应的"值"叫做管道表达式，例如"{"$group": {"_id": "$gender", "max": {"$max": "$age"}}}"

$group是管道操作符
{"_id": "$gender", "max": {"$max": "$age"}}是管道表达式
$gender在文档中叫做"field path"，通过"$"符号来获得特定字段，是管道表达式的一部分
$max是表达式操作符，正是因为聚合管道中提供了很多表达式操作符，我们可以省去很多自定义js函数

下面列出了常用的表达式操作符，更详细的信息，请参阅MongoDB文档

Boolean Expressions
- $and, $or, $ not
Comparison Expressions
- $cmp, $eq, $gt, $gte, $lt, $lte, $ne
Arithmetic Expressions
- $add, $divide, $mod, $multiply, $subtract
String Expressions
- $concat, $strcasecmp, $substr, $toLower, $toUpper

总结

聚合管道提供了一种mapReduce 的替代方案，mapReduce使用相对来说比较复杂，而聚合管道的拥有固定的接口，一系列可选择的表达式操作符，对于大多数的聚合任务，聚合管道一般来说是首选方法。

Ps: 文章中使用的例子可以通过以下链接查看

http://files.cnblogs.com/wilber2013/pipeline.js

时间： 2024-12-25 03:00:39

MongoDB聚合管道的相关文章

MongoDB 聚合管道（Aggregation Pipeline）

MongoDB 聚合管道(Aggregation Pipeline) - 张善友时间 2013-12-27 22:40:00 博客园_张善友相似文章 (0)原文 http://www.cnblogs.com/shanyou/p/3494854.html添加到推刊收藏到推刊创建推刊收藏取消已收藏到推刊! 创建推刊 × Modal header --> 请填写推刊名描述不能大于100个字符! 权限设置: 公开仅自己可见

MongoDB基础教程系列--第七篇 MongoDB 聚合管道

在讲解聚合管道(Aggregation Pipeline)之前,我们先介绍一下 MongoDB 的聚合功能,聚合操作主要用于对数据的批量处理,往往将记录按条件分组以后,然后再进行一系列操作,例如,求最大值.最小值.平均值,求和等操作.聚合操作还能够对记录进行复杂的操作,主要用于数理统计和数据挖掘.在 MongoDB 中,聚合操作的输入是集合中的文档,输出可以是一个文档,也可以是多条文档. MongoDB 提供了非常强大的聚合操作,有三种方式: 聚合管道(Aggregation Pipeline)

MongoDB 聚合管道（aggregate）

1.聚合函数查询总数 .count() > db.userinfo.count() 3 > db.userinfo.find() { "_id" : 1, "name" : "郭大爷", "sex" : "男", "age" : "80" } { "_id" : 2, "name" : "郭老师"

【翻译】MongoDB指南/聚合——聚合管道

[原文地址]https://docs.mongodb.com/manual/ 聚合聚合操作处理数据记录并返回计算后的结果.聚合操作将多个文档分组,并能对已分组的数据执行一系列操作而返回单一结果.MongoDB提供了三种执行聚合的方式:聚合管道,map-reduce方法和单一目的聚合操作. 聚合管道 MongoDB的聚合框架模型建立在数据处理管道这一概念的基础之上.文档进入多阶段管道中,管道将文档转换为聚合结果.最基本的管道阶段类似于查询过滤器和修改输出文档形式的文档转换器. 其他的管道为分组和

Mongodb中数据聚合之聚合管道aggregate

在之前的两篇文章<Mongodb中数据聚合之基本聚合函数count.distinct.group>和<Mongodb中数据聚合之MapReduce>中,我们已经对数据聚合提供了两种实现方式,今天,在这篇文章中,我们讲讲在Mongodb中的另外一种数据聚合实现方式--聚合管道aggregate. 面对着广大用户对数据统计的需求,Mongodb从2.2版本之后便引入了新的功能聚合框架(aggregation framework),它是数据聚合的新框架,这个概念类似于数据处理中的管道.每

mongodb聚合查询

MongoDB中聚合(aggregate)主要用于处理数据(诸如统计平均值,求和等),并返回计算后的数据结果.有点类似sql语句中的 count(*). $sum 计算总和. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}]) $avg 计算平均值 db.mycol.aggregate([{$group : {_id : "$by_use

mongodb MongoDB 聚合 group

MongoDB 聚合 MongoDB中聚合(aggregate)主要用于处理数据(诸如统计平均值,求和等),并返回计算后的数据结果.有点类似sql语句中的 count(*). 基本语法为:db.collection.aggregate( [ <stage1>, <stage2>, ... ] ) 现在在mycol集合中有以下数据: { "_id" : 1, "name" : "tom", "sex" :

mongodb MongoDB 聚合 group（转）

【mongoDB查询进阶】聚合管道(二) -- 阶段操作符

https://segmentfault.com/a/1190000010826809 什么是管道操作符(Aggregation Pipeline Operators) mongoDB有4类操作符用于文档的操作,例如find查询里面会用到的$gte,$in等.操作符以$开头,分为查询操作符,更新操作符,管道操作符,查询修饰符4大类.其中管道操作符是用于聚合管道中的操作符. 管道操作符的分类管道操作符可以分为三类: 阶段操作符(Stage Operators) 表达式操作符(Expression