Parallelism of esProc enhances Oracle Data Import Speed

Recently, we skillfully handled an industry project to import a great amount of data from file to Oracle in comparatively short time.
At the beginning, we tried to import the data with the sqlldr of Oracle, only to find it is surprisingly time-consuming to load a great amount of data: Too slow if 2.5 hours have to be spent to import a table containing 80 million records.
Later, by adopting the parallel import of sqlldr, the time is shortened to 0.8 hour. Here are the full details:

Train of thoughts

Split the data file to import into 10 shares. Then, with the multi-task parallelism, execute the sqlldr command for the corresponding shares. Needless to say, prepare the control files of the same amount. Then, multiple clients will start to import data to database all at the same time.

Please note these two things: 1. The way to generate multiple sqlldr commands and corresponding number of control files – a bit tried if writing them one by one; 2, The way to perform parallelly – ever more tired if performing one by one.

In this case, we use the tool, esProc by name, to generate the commands and control files automatically and then run parallelly.

Implementation steps

Main program

esProc_oracel_parallel_1

Responsible for task control, task distribution, and calling sub-program.

Subprogram

Generate the specific control file and sqlldr command, and execute the import command to complete the data loading.

esProc_oracel_parallel_2

Note: In this case, the parallelism feature of esProc is used to execute multiple sqlldr commands; The function system is used to call the system commands.

Actual result

Because of the programmable-controlled parallel tasks, the number of parallel tasks can be set as necessary to tap into the machine’s full performance potential.

The below figure illustrates the sqlldr import speed for different degree of parallelism – linear increasing on the whole – the more parallel tasks, the faster the import would be.

esProc_oracel_parallel_3

 

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Data Analytics and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s