Functions of esProc/R/Python/Perl in Structured Data Process by Comparison :Chapter 9.Querying Records

esProc

  =tbl.select(col1>5)            //Get the set of row records where col1>5 is true, it supports the binary search for ordered table, and can return the first record or all the records

  =tbl.pselect(col1>5)         //Get the row number of record where col1>5 is true, it supports the binary search for ordered table and can returns the first record or all the records

  =tbl.find(v)                          //Return the object of row record whose primary key is v, the index is available

  =tbl.pfind(v)                        //Return the row number of the record whose primary key is v, the index is available

Perl

Perl’s records are regarded as an array on the whole, where no concept of primary key is involved, much less the concept of index. For this reason, it is impossible to locate the object of row record according to the primary key.

  @resultlist = grep{$_[2]>40} @result;      #Find the set of pointers for the record where the value of Column 3 is greater than 40 from the array @result, and store it in the array @resultlist.

  $rowcount=grep{$_[2]>40} @result;         #Count the number of records where the value of Column 3 is greater than 40 from the array @result, and store it in the variable $rowcount

The following code shows the sequence number of the record where the value of Column 3 is greater than 40 from a 2D array.

    $rownum=0;

    my @rownums;

    foreach $row(@result){

         if($row[2]>40){

                   push @rownums,$rownum;             #Append the member to the array

    }

    $rownum++;

    }

    return @rownums;

  1. Whether the grep function returns searched set of the elements or the number of the elements, this depends on whether the variable on the left side of Equal sign (=) is an array or individual value. It is possible for the beginner to get confused over this.
  2. The 2D table is just a 2D array, where a record just is an 1D array, there is no primary key and index , so that the record cannot be searched by primary key.
  3. You need to write a number of codes if you want the sequence number of the searched records.

Python

Python’s array does not directly provide the filter function, and you have to write your own program to implement it. The code is shown as below:

    # coding=gbk

    data = [                               # A 2D array

    [121.1, 0.02, 0.02],

    [121.1, 0.05, 0.05],

    [122.1, 0.56, 0.6],

    [122.1, 3.79, 4.04],

    [123.1, 93.75, 100.0],

    [123.1, 0.01, 0.01],

    [124.1, 0.01, 0.01],

    [124.1, 1.01, 1.08],

    [124.1, 0.11, 0.11],

    [124.1, 0.05, 0.06],

    [125.1, 0.39, 0.41],

    ]

    result=[]            # Used to store selected records

    num=[]               # Used to store the sequence number of selected record

    i=0

    for row in data:                  #Select the record where the value of Column 2 is greater than 1 and its sequence number

         if(row[1]>1):

                   result.append(row)

                   num.append(i)

         i++

  1. The filter, select out and other functions are not provided, and locate function is available only for 1D array.
  2. The 2D table is just a 2D array, where a record is just an array; there is no index and primary key, so that the record cannot be searched by primary key.

R

tbl[2,]                 # Return Row 2

tbl[c(1,2,4),]     # Return Rows 1, 2 and 4 

tbl<-read.table(“D:/T4.txt”,col.names=c(“c1″,”c2″,”c3”))

newtbl<-tbl [tbl $c1>50,]                   #Filter out the record where the value of Column 1 is greater than 50 according to specific criterion to generate new data frame

subset(tbl,c1>90,select=c(c2,c3))   #Filter out the record where c1 is greater than 90, and Columns c2,c3 are selected to generate a new data frame

This syntax is so simple that you get a bit confused, while for such syntax as tbl[n,], n is an integer only, and means to select the specific rows from tbl. What the expression tbl$c1>50 returns is Boolean value True/False, but the tbl[tbl$c1>50,] can be used to get the record which matches with the specific criterion, that’s so wired!

2014-06-26_113353

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Structured Data Process and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s