MongoDB Aggregation with C#

MongoDBMongoDB is a very praticle and simple NoSQL database. It provides drivers in a lot of popular languages such as C, C++, C#, Java, Perl, PHP, Python, Scala and so on. MongoDB is a document-base database and well suited for Python or Javascrip applications. However, it is also of praticle interest for more typed languages and C# is a good example. A database is fine, but what we do of the data is even more interesting. Manipulating data, computing some analytics, doing reports is a great trend and I'll introduce you on how to do that with MongoBD. I've already introduce the Map-Reduce concept (http://www.codingtricks.biz/use-mongodb-net-driver-with-matlab-part-2/). However, Map-Reduce is limited with MongoDB probably because of the single-thread aspect of Javascript. Fortunetly, we are provided a very efficient aggregation framework with MongoDB and we can already do a lot of things with it. Here are several introducing simple examples.

Disclaimer
All example are for MongDB C# driver of version < 2.0 and MongoDB version < 3.0. Major improvement/modifications have been done on the core of MongoDB and its drivers so that driver version above 2.0 do not support .NET framework 3.5 or 4.0. Indeed, the new driver makes use of new functionalities such as async/await asynchron call. My examples are based on .NET 4.0 and C# driver version 1.9.2.235 as I have not migrate to the latest version yet.

The data

[{"User":"John", PageName":"Page1", "PageView":5},
{"User":"John", PageName":"Page2", "PageView":2},
{"User":"John", PageName":"Page3", "PageView":6},
{"User":"Louise", PageName":"Page1", "PageView":4},
{"User":"Louise", PageName":"Page2", "PageView":1}]

Run an aggregation

To run a pipeline, we need first a collection to work on ūüėČ So from the opened database, simply get the collection with

Note that I like to use strongly typed data and I am used to tell MongoDB driver that I work with Customer objects.

Then, we must construct the arguments of the aggregation. A pipeline is an array of BsonDocuments, each BsonDocument standing for an operation. Imagine the pipeline we work on has a matching step, then groups some documents to finally project the results, we would right

Then we feed the arguments of the aggregation

And run the aggregation

Read the result

Result of an aggregation can be exploited various ways. The simplest one is when the result are directly stored into another collection

A second option is to iterate through the result as they are of type IEnumerable<BsonDocument> . Another option is to cast them to known type the same way we would do with LINQ operators. Finally, we can access to members of a BsonDocument simply by the name of the child  document["name of property"];

So lets get in the framework now. We have several base operators and we'll cover some.

$match

Matching is very similar to a query on a collection. As the same says, the matching operator keeps only documents that match the condition in the pipeline. The syntax is quite easy

 

$group

Grouping is in my mind one of the most important operator. This operator groups similar documents under the same "_id", and can perform interesting operations on the resulting array of documents. A basic example of grouping is to count the number of documents satisfying our conditions. For example, we could count the number of page viewed by each user

An even more interesting application is to sum over a field of our document. For example, instead of counting the number of page viewed by each user, we can compute the number of visit for each user

Another interesting aspect of grouping is building arrays. For example, we could list all the pages viewed by each user

A variation of the later would not to use the $push operator but the $addToSet  operator which ensure uniqueness of each value in the array.

$project

Projection works the same way in MongoDB as in SQL databases. Its functionality is to reshape a document by renaming, add/remove fields etc. Sometimes, after lots of operations, the resulting documents have huge "_id" and few interesting fields, and the _id could be reshaped. Imagine that our data is slightly improved with a timestamp (a simple integer here for simplycity as it could be tricky to work with Dates).

[{"User":"John", PageName":"Page1", "PageView":5, "TimeStamp":1},
{"User":"John", PageName":"Page2", "PageView":2, "TimeStamp":1},
{"User":"John", PageName":"Page3", "PageView":6, "TimeStamp":1},
{"User":"Louise", PageName":"Page1", "PageView":4, "TimeStamp":1},
{"User":"Louise", PageName":"Page2", "PageView":1, "TimeStamp":1},
{"User":"John", PageName":"Page2", "PageView":2, "TimeStamp":2},
{"User":"John", PageName":"Page3", "PageView":7, "TimeStamp":2},
{"User":"Louise", PageName":"Page3", "PageView":4, "TimeStamp":2}
]

We would now want to know the number of pages each user viewed each day. We first need to group by

The resulting documents are not very sexy,

but we surely can improve lisibility with projection

 

$unwind

Last but not least, one of the hardest operator to understand or to manipultate is the $unwind operator. It could be understood as the reverse operation of grouping. Indeed, unwinding a document based on a particular array element result in several documents, one for each array member, all the other fields remaining the same for each document (event the "_id" !)

 

An now if we unwind the result

 

Leave a Reply

%d bloggers like this: