Use MongoDB .NET driver with Matlab - Part 2

In a previous post, I presented the basics of connecting to a MongoDB database server and populating a collection with simple documents. We're now going a step further by showing how to retrieve data and work with it. Let's say we populated a database collection with documents that all follow the same schema.
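For illustration only, a document in such a collection might look like the object below. The field names (year, sport, discipline, athlete, score) and the values are assumptions made up for this sketch, not the actual schema used in the database.

```javascript
// Hypothetical example document: field names and values are made up
// for illustration and may differ from the real collection.
const sampleDocument = {
  year: 2012,               // edition of the Olympic Games
  sport: "Athletics",
  discipline: "Long jump",
  athlete: "Athlete A",     // placeholder name
  score: 8.0,               // numeric result of the competitor
};

console.log(Object.keys(sampleDocument).join(", "));
```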

This collection contains the results (or scores) of each competitor in each discipline of each sport of the Olympic Games. We want to compute the average, the standard deviation and other statistics of the scores of each discipline by year, to see how they evolve over time (OK, that's really useless, but it's just for the demo). To perform such an operation, we must aggregate the data into a more useful, condensed result. MongoDB provides map-reduce operations and lets us customize them. The map-reduce sequence is a two-step (or three-step, as presented here) process.

 

 


The first step is to map the input data (our documents), which consists of grouping the data into sets of key/value pairs. The mapping function takes no argument and returns no data: it reads the current document through this and outputs each pair by calling emit(key, value).

The reduction step combines all the values that the mapping function emitted for a given key and generates the wanted result. The function takes a key and the array of values emitted for that key as input, and returns the aggregated result. In our example, the reduce function takes the scores of each year/discipline pair and computes the average, the standard deviation, the min/max scores with the associated athlete names, the number of competitors, and the list of scores so that they can be displayed as a histogram.

Finally, an optional step finalizes the result by applying any further manipulation to the reduced output. In our example, we'll use it to finish the computation of the average and standard deviation. The map, reduce and finalize functions must be provided to the MongoDB driver as strings containing JavaScript functions. Those functions can be complicated, but they are very straightforward in our case.
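To make the three steps concrete, here is a minimal sketch in plain JavaScript that imitates what the server does with the three functions. It is a simulation, not MongoDB or driver code: the emit stub, the runMapReduce helper, the field names (year, discipline, score) and the sample values are all assumptions made up for illustration.

```javascript
// Plain JavaScript simulation of map -> reduce -> finalize.
// Nothing here talks to MongoDB; names and data are hypothetical.

let emitted = [];                     // pairs produced by the map step
function emit(key, value) {           // stand-in for MongoDB's built-in emit()
  emitted.push({ key, value });
}

function map() {                      // called with `this` bound to one document
  emit(this.year + "/" + this.discipline, { count: 1, sum: this.score });
}

function reduce(key, values) {        // values: everything emitted for this key
  const out = { count: 0, sum: 0 };
  for (const v of values) {
    out.count += v.count;
    out.sum += v.sum;
  }
  return out;                         // same shape as the emitted values
}

function finalize(key, reduced) {     // runs once per key, after the reduction
  return { count: reduced.count, average: reduced.sum / reduced.count };
}

function runMapReduce(docs) {
  emitted = [];
  for (const doc of docs) map.call(doc);          // 1. map every document
  const groups = new Map();
  for (const { key, value } of emitted) {         // group the values by key
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const results = {};
  for (const [key, values] of groups) {
    const reduced = reduce(key, values);          // 2. reduce each group
    results[key] = finalize(key, reduced);        // 3. finalize each result
  }
  return results;
}

// Made-up scores, chosen only to keep the arithmetic easy to follow
const demo = runMapReduce([
  { year: 2008, discipline: "100m", score: 10 },
  { year: 2008, discipline: "100m", score: 12 },
  { year: 2012, discipline: "100m", score: 11 },
]);
console.log(demo);
```

One detail worth knowing: MongoDB does not call reduce for a key that received a single value, and it may call reduce again on its own partial results. That is why the value emitted by map and the value returned by reduce need to share the same shape.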

The mapping function

The mapping function generates sets of key/value pairs that follow a fixed schema.
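As a minimal sketch, and assuming documents with the hypothetical fields year, discipline, athlete and score, a map function could emit one pair per document, keyed by year and discipline. The shape of the emitted value (count, sum, sumSquares, min, max, scores) is also an assumption, chosen so that the later steps can aggregate it. The emit stub only exists so that the sketch runs outside MongoDB, where emit is provided by the server.

```javascript
// Stub for MongoDB's built-in emit(), so the sketch can run in Node.js
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Hypothetical map function: `this` is the document being processed.
// Field names and the value layout are assumptions for illustration.
function map() {
  emit(
    { year: this.year, discipline: this.discipline },   // grouping key
    {
      count: 1,                                          // one competitor
      sum: this.score,                                   // for the average
      sumSquares: this.score * this.score,               // for the std
      min: { athlete: this.athlete, score: this.score }, // worst so far
      max: { athlete: this.athlete, score: this.score }, // best so far
      scores: [this.score],                              // for the histogram
    }
  );
}

// Apply the function to one made-up document
map.call({
  year: 2012,
  sport: "Athletics",
  discipline: "Long jump",
  athlete: "Athlete A",
  score: 8,
});
console.log(JSON.stringify(emitted[0], null, 2));
```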

The reduction function

The reduction function generates results with the same schema as the values emitted by the mapping function, but now aggregated for each key. As the result of our reduction function only contains intermediate calculations, we must then complete them with the finalize function.
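A possible pair of reduce and finalize functions is sketched below. It assumes that each mapped value carries the hypothetical fields count, sum, sumSquares, min, max and scores, and that the standard deviation is the population one (divided by N, whereas Matlab's std divides by N-1 by default). The mapped helper and the sample values are made up so that the sketch can run on its own.

```javascript
// Sketch only: the value layout is an assumption, not the original schema.
function reduce(key, values) {
  const out = { count: 0, sum: 0, sumSquares: 0, min: null, max: null, scores: [] };
  for (const v of values) {
    out.count += v.count;
    out.sum += v.sum;
    out.sumSquares += v.sumSquares;
    // "<=" and ">=" mean that, among tied athletes, the last one is kept
    if (out.min === null || v.min.score <= out.min.score) out.min = v.min;
    if (out.max === null || v.max.score >= out.max.score) out.max = v.max;
    out.scores = out.scores.concat(v.scores);
  }
  return out; // same shape as the input values, so it can be reduced again
}

function finalize(key, reduced) {
  const average = reduced.sum / reduced.count;
  // Population variance E[x^2] - E[x]^2, clamped at 0 against rounding errors
  const variance = Math.max(reduced.sumSquares / reduced.count - average * average, 0);
  return {
    count: reduced.count,
    average: average,
    std: Math.sqrt(variance),
    min: reduced.min,
    max: reduced.max,
    scores: reduced.scores,
  };
}

// Hypothetical helper: builds the value a map function would emit
function mapped(athlete, score) {
  return {
    count: 1, sum: score, sumSquares: score * score,
    min: { athlete, score }, max: { athlete, score }, scores: [score],
  };
}

const key = { year: 2012, discipline: "Long jump" };
const a = mapped("Athlete A", 10);
const b = mapped("Athlete B", 20);
const c = mapped("Athlete C", 30);

const all = reduce(key, [a, b, c]);                       // one pass
const stepwise = reduce(key, [reduce(key, [a, b]), c]);   // re-reducing a partial result
const stats = finalize(key, all);
console.log(stats.average, stats.std);
```

Reducing in one pass or in several passes gives the same aggregate here, which is the property MongoDB relies on when it feeds partial results back into reduce.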

The fun part is now complete, and we have to implement all this in Matlab. The whole code is given below:

Going through this code step by step, we first have to load the JavaScript code of the map, reduce and finalize functions as strings. This is easily done with Matlab's fileread function, but we also need to convert each string to the BsonJavaScript type, as documented in the .NET MongoDB driver. Next, we customize the operation by passing options to the MapReduce function. The options specify that we want the results to be stored in a new collection named "results".

The content of this collection is replaced each time the MapReduce function is called. Finally, we must specify which collection of the database we want to run the map-reduce operation on.

And that's all! Our database now contains a collection named "results" that holds the output of the map-reduce operation, i.e. the average and standard deviation of the scores in each discipline of each Olympic Games, with the names of the best and worst athletes for each discipline (note that in case of ex aequo results, only the last one is kept).

In the next article, I'll show how to serialize and deserialize data between the database and .NET classes, which are much easier to work with in Matlab than BsonDocuments.
