Functions of esProc/ R/Python/Perl in Structured Data Process by Comparison:Chapter 4. 2D Table Structure

esProc

esProc has existing table sequence object, innate column name and number as well as row number, and also provides a wealth of access methods, aggregate functions and loop functions.

The syntaxes are all identical, regardless of table sequence, record object itself or pointer reference. The system will perform auto-process function in the case of a fact that the user does not understand the concept of the pointer completely.

Perl

Only an array can be used to store 2D table. In array member, the memory pointer will point to another array, thus to form a 2D array.

Access method is not simple with an unintuitive syntax, which is not easy for the beginners to understand.

The array has different syntax from that of array pointer. Users need to know when it is an array, and when it is an array pointer.

Example 1:

  @AoA=(

  [“fred”,”barney”], 

  [“george”,”jane”,”elroy”], 

  [“homer”,”marge”,”bart”], 

  ); 

  print$AoA[2][2];  #The symbol -> can be omitted here

The brackets at the outside level are parentheses, it is because we will assign a value to that array. This is true if you do not want it to be @AoA here, but a reference pointing to it, see below:

Example 2:

#A reference which points to the “Array with array reference”

  $ref_to_AoA=[

        [“fred”,”barney”,”pebbles”,”bambam”,”dino”,], 

        [“homer”,”bart”,”marge”,”maggie”,], 

        [“george”,”jane”,”elroy”,”judy”,], 

  ]; 

  print$ref_to_AoA->[2][2];  #The symbol -> cannot be omitted here

Note that the brackets at the outside level change into square brackets here, and the syntax for access also varies a little. Unlike C/C++, the array and reference cannot be freely exchanged in Perl (but in C/C++, the array and pointer can be substituted each other in many places). $ref_to_AoA is an array reference, while @AoA is an array. Likewise, $AoA[2] is also not an array but an array reference, therefore, the two rows as shown below:

$AoA[2][2]
  $ref_to_AoA->[2][2]

Which can also be replaced with the two rows as below:

$AoA[2]->[2]
  $ref_to_AoA->[2]->[2]

The syntaxes of Perl’s pointer and array are different, where the symbol -> can be or cannot be ignored in some cases; when assigning a value for them, the different brackets will be used, i.e. the array’s value is in parentheses, while the pointer’s in square brackets; the prefixes of variable names are also not identical, i.e., @is used for the array, while $is used for the pointer. Therefore, the programmers are required to know well its concept during operation. Otherwise it will be easy for the beginners to write the syntax in a wrong way. For a beginner, this is indeed a challenge. 

Memory Expands due to Associative array

With associative array or structure, you can access to the element values by their names. However, for the associative array in Perl, it is achieved through the array, for example:

 %fruit = (“apples”,17,”bananas”,9,”oranges”,”none”);

Odd member is key, whereas even memory is value. So you can access to value with a key, for example:

  print $fruit{“apples”};      # Printout 17

Originally, an array [17,9,none] was 3 in length, but, in order to achieve access to value by key, its length doubled. If an associative array is used to store the records, for the fact that a 2D array composed by a number of associative arrays is used to store 2D table, this will cause a sharp memory  increasing. It doesn’t matter if there are not too many records in 2D array, after all, which makes it easy to use in the process. However, in the case of enormous records, this will consume an unwieldy amount of memory with poor performance.

Python

Like Perl, the defect that 2D array is used to store 2D table lies in that there is no concept of the column, neither does the column name. You need to build your own classes in person since there is no column-based operation directly by Column number and Column name.

What is deemed much easier than Perl is that there is no concept of the pointer, and it features the simple syntax for access. However, some new issues remain here: the reference still exists with a poor syntax, so that the programmer tends to make the following errors:

Create an array, whose width is 3, the height 4

    #[[0,0,0],

    # [0,0,0],

    # [0,0,0],

    # [0,0,0]]

    myList = [[0] * 3] * 4

When operating myList[0][1] = 1, the whole Column 2 is found assigned with the values, which changes to:

    [[0,1,0],

    [0,1,0],

    [0,1,0],

    [0,1,0]] 

The right syntax should be:

myList = [[0 for col in range(3)] for row in range(4)]

It seems that Python eliminates the concept of the pointer, but it handles the pointer in an unnatural way, even leading you to get stuck in trouble. For a programmer who poorly understands the relevant concepts tends to make some errors.

R

R language uses the data frame to store 2D table, which, as an object, is very much like what the table sequence of esProc looks. It provides not only column name but also row name, you can perform some operations on the data by Row Name and Row Number, Column Name and Column Number.

R language provides read.table() function, which can be used to directly load the data from a DB or a file into a data frame.This is also like what the table sequence of esProc does.

As the data frame in R language is stored by columns, R is quite fit for the case where the number of columns is greater than that of rows. But for esProc, Perl and Python, the data frame is stored by rows, so that the case against R is more applicable here.

From the perspective of actual business, the latter case, i.e. the number of rows is greater than that of columns, exists far more universally. Therefore, the operation efficiency in R language will be lower if there are too many records.

Since R language provides more plentiful mathematical and statistical functions, it is more often used for matrix operation where disparity between rows and columns is lesser. For its application scenarios, maybe it is nothing wrong for R to store the data frame by columns. However, for the big data scenarios, R language is not as useful as it does in the above case.

 Conlusions:

Image,

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Structured Data Process and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s