esProc Exports Unstructured MongoDB Data as CSV Files

Problem source:https://plus.google.com/+VicNgrail/posts/ebS9JUtFopw .

MongoDB allows storing unstructured data in it. But it is somewhat difficult to export the data as standard structured data. esProc, however, makes it an easy job, with MongoDB’s cooperation. Let’s look at the steps for doing this.

Below is some data from Collection test:

/* 0 */

{

“_id” : ObjectId(“5518f6f8a82a704fe4216a43”),

“id” : “No1”,

“cars” : {

“name” : “Putin”,

“car” : [“porche”, “bmw”]

}

}

/* 1 */

{

“_id” : ObjectId(“5518f745a82a704fe4216a44”),

“id” : “No2”,

“cars” : {

“name” : “jack”,

“car” : [“Toyota”, “Jetta”, “Audi”]

}

}

You need to export it as a CSV file with the following layout:

esProc_NoSQL_mongodb_csv_1

esProc code

esProc_NoSQL_mongodb_csv_2

A1: Connect to MongoDB. Connection string format is mongo://ip:port/db?arg=value&…

A2: Retrieve data from MongoDB using find function and generate a cursor with the retrieved data. The collection name istest. There are no filtering criteria and all fields except _id are desired. find functions in esProc and MongoDB are alike. The esProc version follows MongoDB for syntax of filtering criteria.

A3: Retrieve desired fields to create a structured two-dimensional table, which is in the form of cursor. In the code, ~ represents every document in A2; conj function concatenates data together.

A4: Export data from A3 as a comma separated text file. @t means exporting with column names. esProc engine manages buffers automatically, fetching a batch of data each time from the cursor into the memory for computation.

A4: Close MongoDB connection.

For users who want independent management of each batch of data, esProc provides the following approach:

esProc_NoSQL_mongodb_csv_3

A3: Run a loop to fetch data from the cursor into memory, 1,000 rows each time. A3’s working range is the indented B3 and B4, in which A3 is used to reference the loop variable. A3’s data is as follows:

esProc_NoSQL_mongodb_csv_4

B3Convert the current batch of data to structured two-dimensional table, as shown below:

esProc_NoSQL_mongodb_csv_5

 

B4Append the result of processing the current batch to the file. @a means data appending.

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in MongoDB, SQL-related Puzzle and tagged , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s