Functions of esProc/R/Python/Perl in Structured Data Process by Comparison: Chapter 1.Retrieving Table from Database

Structured data processing is the most common computation mode in applications. Such computations are generally performed in databases with SQL, but we often get into such situations where SQL fails to work or is unworkable, for example, it is not suitable to use SQL in the case of complex computation logic and enormous process steps.  And SQL cannot be used directly where the original data are stored in the files other than in the database, or multiple heterogeneous databases are involved, it is high time that we adopt script languages to achieve these computations.

There are four common script languages: esProc,Perl,Python,R, with their own characteristics, each of which can be used to complete such operations. In this paper, the writer, as a beginner, will make a test on the 4 script languages for 20 common structured data operations. In the end, their performances will be presented hereby comparison in a simple way.

What is described in this paper covers only the structured data computation capability, the full functions of these 4 script languages are not mentioned here. Most of parts only involve some information about 2D structured data table. As I got to study some products only for a short time, the related information can be available only from the official website or documents. It would be possible that some results could be flawed.

 Part (I):Retrieving Table from Database

Connect the database, load 2D table from the database into memory, and store the datain the objects provided by the product.


>db=connect(“oracle”);                      //esProc configures detail definition for data connection in the environment information; only the name is written here

=db.query(“select * from tbl”);        //A sql statement is executed to return the table sequence (object of 2D structured data table in esProc) automatically, and then generate column name automatically



$dbh = DBI->connect(“dbi:Oracle:host=;sid=ORCL”,$oracleid,”,{AutoCommit=>0})||

die “Don’t connect database ! “;     #Create a database connection

$sql=”select * from tbl”;


$sth->execute();                         # Execute sql

while(@result=$sth->fetchrow){     # Fetch data row-by-row

       push @e,[@result];                   # The row array pointer is stored in another array, thus to form a 2D table; [] represents array reference (pointer)



$dbh->disconnect;            #Close the connection



1. The syntax of [@result]makes the beginner confused.

2. As there is no column name, the column number needs to be noted manually.

3. The array itself is @array, while the array element value uses $array[] with different prefix. Beginners tend to make mistakes.

4. If an associative array or structure is used to store the column name, this array will be twice as long as it is used before and take up too much memory.


It is a little more advanced than Perl in terms of the interface. When executing sql, it returns a 2D array directly.


import cx_Oracle

import os


def connectDB(dbname=’ORCL’):                       #Define a function to connect the database

        if dbname==’ORCL’:





defsqlSelect(sql,db):                  # Define a function to execute select statement



        rs=cr.fetchall()          #Return a 2D array directly, which does not include metadata



if __name__==’__main__’:               #Main program, to connect the database, and execute sql


        sql=’select * from dave’


        for x in rs:

                print (x)


The object cursor of Python also provides the property of column name, which can be obtained bycursor.description, for example:

def sqlSelect(sql,db):




        names=[f[0] for f in cr.description]

        for row in cr.fetchall():

                 for pair in zip(names,row):

                        print (‘%s:%s’ %pair)

Output by calling this function is as follows:









Some beginners will be also puzzled over program blocks of Python, as it does not use {} to define the blocks, no semicolon at the end of each statement. It relies on indentation to define the blocks, so that indentation becomes more significant. When I first wrote the Python program, the Tab  is used to indent the same spaces as the previous line(whose code is copied).As a result, an error occurred. Then I changed to use the space for indentation, but the error still repeated. In the end, this issue was removed by duplicating indentation from previous line. Later I found out that all the indentations in a program block must be kept consistent, whether you use Tab, or space.

Although esProc also defines the code blocks by using natural indentation instead of symbols such as {}, etc., the code has been written into the cell to make sure the scope of the code block will be clear at a glance, so that it is much easier for us to avoid malposition due to misused characters.


As convenient as esProc, R can load data into the object of data frame directly, while data frame is similar to the table sequence in esProc.





originalData<-sqlQuery(conn,”select * from Customers”)  #Directly generate a data frame






About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Program Language, Structured Data Process and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s