Functions of esProc/R/Python/Perl in Structured Data Process by Comparison :Chapter 15.Join Tables

esProc

esProc provides a variety of functions to join a number of tables, by which, two table sequences can be aligned and joined with each other according to certain rules.

    join@f(tbl1:col1,exp1,exp2;tbl2:col2,exp1,exp2;…) 

A number of Table sequences tbl1,tbl2,…. will be joined with each other by the expressions exp1,exp2,…., in order to generate the tables with the columns col1,col2,…. and then an aggregation will be made for each of them.

Option Description:

    @f    for full join

    @1   for left join of A1

    @m Assume all the table sequences are sorted for associated fields, use a Merge Algorithm

Perl

In Perl, the table join operationis a very troublesome thing, which can be handled by your own. While coding, carelessness will result in a low performance, in some cases, the sample codes are given as below:

  1. One-to-one join

After two 2D arrays conduct one-to-one join by Column 1, the computation is performed.

    #!perl

   @tbl1=([173746,”bbb”,13],

                [173744,”aaa”,14],

                [173745,”ccc”,18]);

   @tbl2=([173744,”fgdf”,44],

               [173746,”jkl”,88],

              [173745,”ertr”,66]); 

    my %mapped = map { ($_->[0] => $_ ) } @tbl2;

    my @joined = map { [ @{$_}, exists $mapped{$_->[0]} ? @{$mapped{$_->[0]}} : () ] } @tbl1;

    foreach(@joined){

         print $_->[0],” “,$_->[1],” “,$_->[2],” “,$_->[3],” “,$_->[4],” “,$_->[5],” “,$_->[6],”\n”;

    }

As we can see from the above example, the join algorithm is difficult to understand and requires the programmers to encapsulate by themselves. Meanwhile, many prefixes, like %,@,$ andvarious signs of aggregation, like [], (), {} and ->appear in the code. . However, these signs are not easy for beginners to have a full understanding and easy to write them wrong. 

  1. One-to-many join

    @tbl1=([“Pass”,”?>=60″],

                 [“Fail”,”?<60″]);

    @tbl2=([173744,”Peter”,44],

                [173746,”Mike”,88],

                [173745,”Jane”,66]);

     my %mapped;

     push {$mapped{$_->[0]}}, $_ foreach @tbl2;

    subfn {

    my $entries = shift;

    my $t = 0;

    $t += $_->[2] foreach @{$entries};

    return ( scalar @{$entries}, $t, $t/@{$entries} );

    }

    my @joined = map { exists $mapped{$_->[0]} ? [ @{$_}, fn($mapped{$_->[0]}) ] : () } @tbl1;

As we can see from the above example, the code is difficult to understand, and the merging of many loop code blocks into a single line reduces the readability. Also, perl’s various prefixes areapt to confuse the beginners.

Python

  1. One-to-one join

    tbl1=[[173746,”bbb”,13],                  #First define two 2D arrays

            [173744,”aaa”,14],

            [173743,”ddd”,14],

            [173745,”ccc”,18]]

    tbl2=[[173744,”fgdf”,44],

             [173746,”jkl”,88],

             [173745,”ertr”,66]]

    tbl1.sort() #Separately sort them by Column 1

    tbl2.sort() 

    j=0

    for i in range(0,len(tbl1)):         #Two arrays are then joined by a way that the records from the second array is referenced as the elements and appended to the end of the first array.

         value1=tbl1[i][0]

         value2=0

         for j in range(j,len(tbl2)):

                   value2=tbl2[j][0]

                   if(value2==value1):

                            tbl1[i].append(tbl2[j])      

                            break

                   elif(value2>value1):

                            break

    print(tbl1)

  1. One-to-many join

    tbl1=[[“Pass”,”?>=60″],

             [“Fail”,”?<60″]]

    tbl2=[[173744,”Peter”,44],

             [173746,”Mike”,88],

             [173745,”Jane”,66]]

    tbl1.sort()

    tbl2.reverse()

    j=0

    for    exp in tbl1:

         for i in range(j,len(tbl2)):

                   exp1=exp[1]    

                   exp1=exp1.replace(“?”,str(tbl2[i][2]))

                   if(eval(exp1)):

                            if(len(exp)==2):

                                exp.append([tbl2[i]])

                            else:

                                exp[2].append(tbl2[i])

                    else:

                            j=i

                            break

    print(tbl1) 

Notes:

  1. As the concept of the pointer has been eliminated in Python, Python is very easy to program when multi-table join involves multi-level data reference.
  2. Python, however, still requires a lot of loop codes for achieving join between the tables. Unlike esProc which provides a type of join function, it shows a lot more complex process than esProc.

R

dd<- data.frame(name=c(1:3), group=rep(1, 3), score=c(69, 71, 92))

dd1 <- data.frame(name=c(1:2), group=c(2, 2), score=c(93, 99))

merge(dd,dd1,by.x=”group”,by.y=”name”,all=TRUE) 

This indicates that dd and dd1 will be joined; the join field of dd is group, dd1’s is name; all=TRUE indicates the full join, while false means the inner join.

Merge function can only be joined by the columns other than expressions, for example, I want to use the name of the first data frame to join with name+1 of the second data frame, which can be achieved after the other computed column is added. It is not an easy way.

esProc_r_perl_python_15

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in esProc/R/Python/Perl, Structured Data Process and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s