Merge MongoDB Documents in esProc

Problem source: https://groups.google.com/forum/#!topic/mongodb-user/BpgEaRqrKsA

Below are two sample documents from collection c1:

{
  "_id" : ObjectId("55014006e4b0333c9531043e"),
  "acls" : {
    "append" : {
      "users" : [ ObjectId("54f5bfb0336a15084785c393") ],
      "groups" : [ ]
    },
    "edit" : {
      "groups" : [ ],
      "users" : [ ObjectId("54f5bfb0336a15084785c392") ]
    },
    "fullControl" : {
      "users" : [ ],
      "groups" : [ ]
    },
    "read" : {
      "users" : [ ObjectId("54f5bfb0336a15084785c392"), ObjectId("54f5bfb0336a15084785c398") ],
      "groups" : [ ]
    }
  },
  "name" : "ABC"
}

{
  "_id" : ObjectId("55014006e4b0333c9531043f"),
  "acls" : {
    "append" : {
      "users" : [ ObjectId("54f5bfb0336a15084785c365") ],
      "groups" : [ ]
    },
    "edit" : {
      "groups" : [ ],
      "users" : [ ObjectId("54f5bfb0336a15084785c392") ]
    },
    "fullControl" : {
      "users" : [ ],
      "groups" : [ ]
    },
    "read" : {
      "users" : [ ObjectId("54f5bfb0336a15084785c392"), ObjectId("54f5bfb0336a15084785c370") ],
      "groups" : [ ]
    }
  },
  "name" : "ABC"
}

The task is to group the collection by name. For each name, the users arrays of every ACL entry (append, edit, fullControl and read) in all documents with that name are merged into one list, with duplicate members removed. The expected result may look like this:

{
  result : [
    {
      _id : "ABC",
      readUsers : [
        ObjectId("54f5bfb0336a15084785c393"),
        ObjectId("54f5bfb0336a15084785c392"),
        ObjectId("54f5bfb0336a15084785c398"),
        ObjectId("54f5bfb0336a15084785c365"),
        ObjectId("54f5bfb0336a15084785c370")
      ]
    }
  ]
}

esProc code:

[Image: the esProc script]

A1: Connect to MongoDB. The connection string format is mongo://ip:port/db?arg=value&…
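The connection string is URL-shaped, so a standard URL parser can split it into its parts. A minimal Python sketch to illustrate the format; `parse_mongo_url` is a hypothetical helper, not how esProc parses the string internally, and the host, port, database and argument in the example are placeholders.

```python
from urllib.parse import urlsplit, parse_qs


def parse_mongo_url(url):
    """Split a mongo://ip:port/db?arg=value string into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "mongo":
        raise ValueError("not a mongo:// URL: " + url)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "db": parts.path.lstrip("/"),
        # parse_qs maps each argument to a list of values; keep the last one
        "args": {k: v[-1] for k, v in parse_qs(parts.query).items()},
    }


# Placeholder values that mirror the format, not a real server
print(parse_mongo_url("mongo://127.0.0.1:27017/mydb?arg=value"))
# -> {'host': '127.0.0.1', 'port': 27017, 'db': 'mydb', 'args': {'arg': 'value'}}
```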

A2: Use the find function to query MongoDB and return the data as a cursor sorted by name. c1 is the collection name; no filter criterion is specified; all fields except _id are retrieved. The esProc find function is roughly equivalent to MongoDB's find, sort and limit methods combined, and its filter criterion follows MongoDB syntax.
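For comparison, the same query in PyMongo would be `db.c1.find({}, {"_id": 0}).sort("name", 1)`. To show what A2 hands to the next step without a running server, here is a small in-memory Python sketch; `find_sorted` is a hypothetical helper, not an esProc or PyMongo function, and the "XYZ" document is invented sample data.

```python
def find_sorted(docs, sort_key, exclude=("_id",)):
    """Mimic find-then-sort on an in-memory list: drop excluded fields, sort by sort_key."""
    projected = ({k: v for k, v in doc.items() if k not in exclude} for doc in docs)
    # sorted() loads everything into memory; a real server-side cursor would not
    for doc in sorted(projected, key=lambda d: d[sort_key]):
        yield doc  # hand out one document at a time, like iterating a cursor


# Hypothetical sample: "XYZ" does not appear in the original data
sample = [
    {"_id": 3, "name": "XYZ"},
    {"_id": 1, "name": "ABC"},
    {"_id": 2, "name": "ABC"},
]
for doc in find_sorted(sample, "name"):
    print(doc)
```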

A3: Loop over the cursor, fetching one group of documents that share the same name on each iteration. Because A2 sorted the data by name, documents with the same name arrive together. A3's loop body is the indented block B3 to B5, in which A3 refers to the current loop variable.
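
Grouping a sorted stream into runs of equal keys is what Python's `itertools.groupby` does, so it makes a convenient sketch of A3's grouped loop. This is an analogy, not esProc's implementation; `groups_by_name` and the sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def groups_by_name(cursor):
    """Yield (name, documents) for each run of consecutive documents with the same name.

    groupby starts a new group whenever the key changes, so this gives one
    group per name only when the input is already sorted by name.
    """
    for name, run in groupby(cursor, key=itemgetter("name")):
        yield name, list(run)


sorted_docs = [{"name": "ABC", "n": 1}, {"name": "ABC", "n": 2}, {"name": "XYZ", "n": 3}]
for name, docs in groups_by_name(sorted_docs):
    print(name, len(docs))
```

The sort in A2 matters: on unsorted input the same name would show up in several separate groups.
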
B3: Retrieve all users fields from the current group of documents, as shown below:

[Image: B3's result, the users arrays of the current group]
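A Python sketch of B3's extraction, using the two sample documents. It assumes ObjectIds are represented as plain hex strings; `users_fields` is a hypothetical helper. Because Python dicts keep insertion order, the arrays come out in the documents' field order (append, edit, fullControl, read).

```python
P = "54f5bfb0336a15084785c"  # shared ObjectId prefix in the sample data


def users_fields(group):
    """Collect the users array of every ACL entry (append, edit, ...) in each document."""
    return [acl.get("users", [])
            for doc in group
            for acl in doc.get("acls", {}).values()]


# The two sample documents from c1, with ObjectIds written as hex strings
group = [
    {"name": "ABC", "acls": {
        "append": {"users": [P + "393"], "groups": []},
        "edit": {"groups": [], "users": [P + "392"]},
        "fullControl": {"users": [], "groups": []},
        "read": {"users": [P + "392", P + "398"], "groups": []}}},
    {"name": "ABC", "acls": {
        "append": {"users": [P + "365"], "groups": []},
        "edit": {"groups": [], "users": [P + "392"]},
        "fullControl": {"users": [], "groups": []},
        "read": {"users": [P + "392", P + "370"], "groups": []}}},
]
print(users_fields(group))
```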

B4: Merge users fields from all documents of the current group and remove duplicate members.
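A minimal Python sketch of B4's merge-and-deduplicate step, again assuming ObjectIds as hex strings; `merge_distinct` is hypothetical. Keeping the first occurrence of each user reproduces the order shown in the expected result; the original does not say whether esProc's own merge guarantees that order.

```python
from itertools import chain

P = "54f5bfb0336a15084785c"  # shared ObjectId prefix in the sample data


def merge_distinct(users_lists):
    """Concatenate the users arrays and drop repeats, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(chain.from_iterable(users_lists)))


# users arrays of the two sample documents, in field order append, edit, fullControl, read
users_lists = [
    [P + "393"], [P + "392"], [], [P + "392", P + "398"],
    [P + "365"], [P + "392"], [], [P + "392", P + "370"],
]
print(merge_distinct(users_lists))
```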

B5: Append B4's result for the current group to B2. When the loop finishes, B2 contains:

[Image: the final contents of B2]


B2 is the final result. If the result is too large to fit in memory, you can instead use the export@j function in B5 to convert each B4 result to a JSON string and append it to a text file, one group at a time.
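The streaming idea behind that alternative can be sketched in Python as writing one JSON object per line as each group finishes. This is an analogue, not a description of export@j's exact output format; `write_json_lines` and the sample values are hypothetical, and the field names `_id` and `readUsers` follow the expected result.

```python
import io
import json


def write_json_lines(group_results, fp):
    """Write each group's result as one JSON line as soon as it is produced."""
    for name, users in group_results:
        fp.write(json.dumps({"_id": name, "readUsers": users}) + "\n")


# Hypothetical results; in practice these would come one group at a time from the loop
results = iter([("ABC", ["u1", "u2"]), ("XYZ", ["u3"])])
buf = io.StringIO()  # stands in for a text file opened in append mode
write_json_lines(results, buf)
print(buf.getvalue(), end="")
```

Because the input is consumed lazily, only the current group's result needs to be in memory at any time.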

A6: Disconnect from MongoDB.


About datathinker

A technical consultant on database performance optimization, database storage expansion and off-database computation. Personal blog: datakeywrod; website: raqsoft
This entry was posted in MongoDB, SQL-related Puzzle.
