esProc Helps Process Structured Texts in Java – Alignment Join

The join statements of the database can be used conveniently to perform the operation of alignment join. But sometimes the data is stored in the text files, and to compute it in Java alone we need to write a large number of loop statements. This makes the code cumbersome. Using esProc to help with programming in Java can solve the problem easily and quickly. Let’s look at how this works through an example.

The text file emp.txt contains employee information, except that in which EId is 1. Another text file sOrder.txt contains information of sales orders in which field SellerId corresponds to field EId inemp and from which the information whose SellerId is 2 is excluded. Part of the original data is listed below:

emp.txt

esProc_java_structured_alignjoin_1

sOrder.txt

esProc_java_structured_alignjoin_2

It is required to join the three fields: Name, Dept and Gender, inemp to sOrderin alignmentand output the computed result to a new file. The expected result is as follows:

esProc_java_structured_alignjoin_3

Code written in esProc:

esProc_java_structured_alignjoin_4

In cells A1 and A2 respectively, data is imported from the two text files and stored in two variables: emp and sOrder. Here import function uses tab as the column separator by default. Option @t represents the first row will be imported as the field names. Because only some of the fields in emp.txt are needed, the code in A1 uses the names of these desired fields as parameters. After execution, values of emp and sOrder are as follows:

esProc_java_structured_alignjoin_5

In the code in A3: =join@1(sOrder:s,SellerId;emp:e,EId), join function performs the operation of alignment join and changes the names of the two tables to s and e respectively. Option @1 represents the left join which is in line with the requirement of the example: joinempto sOrderin alignment. The computed result is as follows:

esProc_java_structured_alignjoin_6

Click the numbers in blue and we can see the detailed information, as shown below:

esProc_java_structured_alignjoin_7

esProc can also be used to realize the right join which only requires exchanging positions of data in alignment. For example, to align sOrder according to emp, we just need to exchange their positions in the code, that is, =join@1(emp:e,EId;sOrder:s,SellerId). The computed result is as follows:

esProc_java_structured_alignjoin_8

It is also easy to realize the full join using option @f. The code is join@f(sOrder:s,SellerId;emp:e,EId). The computed result is as follows:

esProc_java_structured_alignjoin_9

There are altogether four operations of alignment join: left join, right join, full join and inner join. By default, join function is used to execute the inner join, the code is =join(sOrder:s,SellerId;emp:e,EId). The computed result is as follows:

esProc_java_structured_alignjoin_10

Let’s get back to the example. The code in A4: =A3.new(s.OrderID, s.Client, s.SellerId, s.Amount, s.OrderDate,e.Name, e.Dept, e.Gender), is for getting the desired fields from table eand creating a new structured two-dimensional table. The computed result is as follows:

esProc_java_structured_alignjoin_11

Now the alignment is done and data needs to be exported to a new file. The code for this is =file(“E: \\result.txt”).export@t(A4). In export function, tab is by default the column separator and option @t represents the field names are exported to the first row. Open result.txt and we can see information as follows:

esProc_java_structured_alignjoin_12

The script in the above has finished exporting all aligned data to the new file, what we will do next is to call the script in Java.

//create a connection using esProcjdbc

Class.forName(“com.esproc.jdbc.InternalDriver”);

con= DriverManager.getConnection(“jdbc:esproc:local://”);

//call esProc script; the name of the script file is test

st =(com.esproc.jdbc.InternalCStatement)con.prepareCall(“call test()”);

// execute esProc stored procedure

st.execute();

By executing the above Java code, emp will be joined to sOrder in alignment and the result will be output to file result.txt.

If the requirement is changed to this: query data in sOrder according to dynamic periods of time, execute the same operation of alignment join and return the result directly to Java. To complete the task esProc needs to define two parameters: begin and end, to represent starting time and ending time respectively. The esProc code is as follows:

esProc_java_structured_alignjoin_13

The code in red has been modified.

A2:Filter sOrder again using select function according to the starting and ending time passed from Java, that is, @begin and @end.
A5:Output the computed result in A4 to JDBC interface.
And Java code should be modified too to pass parameters to esProc code and get thefinal result. The modified code is as follows:
Class.forName(“com.esproc.jdbc.InternalDriver”);
con= DriverManager.getConnection(“jdbc:esproc:local://”);
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall(“call test(?,?)”);
st.setObject(1,startTime);
st.setObject(2,endTime);
st.execute();
ResultSet set = st.getResultSet();

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in esProc/R/Python/Perl, Structured Data Process and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s