esPro Helps Process Structured Texts in Java – Data Import

Java provides only the most basic data import functions like specifying separators. But other common functions, such as importing specified columns by column names, specifying the order of columns and data types and that no separatorshould be used, need to be realized manually. Though they are not difficult to realize, the code is complicated and easy to get wrong.

But if esProc is used to help Java in developing the program, the problems will be dealt with without writing code manually. The following example will teach you how esProc works.

Here is a text file data.txt with tab being the separator and 30 columns. The first row of the file is the column names of business significance. It is required to import the following columns by column names: ID, x1Shift, x2Shift and ratio, and compute a new column – value- using the business formula “((x1Shift+x2Shift)/2)*ratio”. The first rows and columns of the file are as follows:

esProc_java_structured_dataimport_1

To solve the problem in Java, we must sort out all the 30 columns, quote specified columns with subscripts and then perform the computation. If too many formulas are involved, the computation will become rather complicated and easy to get wrong. In order to reduce the error probability, we have to store each piece of data inan object and give each field a business name according to which the formula is computed.

But with the help of esProc, Java can avoid all these troubles. The code is written as follows:

esProc_java_structured_dataimport_2

A1:import function is used to import the file, not all the 30 columns though. It will import the specified columns by column names instead. Parameter option @t means importing the first row of the file as the column names. The computed result of this step is as follows:

esProc_java_structured_dataimport_3

A2:Perform computation directly according to the business names. The result is as follows:

esProc_java_structured_dataimport_4

In practice, sometimes the above result needs to be exported to a file. The code for this is=file(“E:\\result.txt”).export@t(A2.new(ID,value)). It means writing the two columns: ID and value, to the file result.txt. The content of the file is as follows:

esProc_java_structured_dataimport_5

If the computed result needs to be returned to Java for further use, we only need to write code in esProc like this, result A2.new(ID,value)), which means returning the columns ID and result to Java through JDBC interface, with a data type of resultSet. Then the result can be obtained by calling esProc script in Java through JDBC. The code is as follows:

//create a connection using esProcjdbc

Class.forName(“com.esproc.jdbc.InternalDriver”);

con= DriverManager.getConnection(“jdbc:esproc:local://”);

//call the esProc script file whose name is test

st =(com.esproc.jdbc.InternalCStatement)con.prepareCall(“call test()”);

st.execute();//execute esProc stored procedure

ResultSet set = st.getResultSet();   // get the result set

Sometimes the order of columns needs to be specified in order to manipulate data visually while data is imported. For example, import data from the same file data.txt according to a new order of the specified fields: x1Shift, x2Shift, radio and ID. esProc can specify the order automatically, and the code can be written as this:

=file(“E:\\data.txt”).import@t(xShift,yShift,ratio,ID)

The computed result is as follows:

esProc_java_structured_dataimport_6

In the above code, esProc can set suitable data type automatically. For example, xShift and yShift will be set as float. But sometimes we need to specify the data types. For example, the data in ID is similar to integer but actually it is the string. If the first four characters of ID are to be imported separately, esProc will execute the following code:

esProc_java_structured_dataimport_7

A1:Convert the data type of column ID by specially importing it as strings. The result is as follows:

esProc_java_structured_dataimport_8

Note: It is a rule that strings are left justified and numbers are right justified in esProc’s IDE, as shown in the above figure.

A2:Select the first four characters and the result is as follows:

esProc_java_structured_dataimport_9

During data import, sometimes there is not a separator. For example, data2.txt has 20 columns and part of the data is as follows:

esProc_java_structured_dataimport_10

It can be seen that there is no column separator in data2.txt and some data is just useless blank row. But esProc can import data correctly through executing the following code:

esProc_java_structured_dataimport_11

A1:Import the data as a single-column table sequence with the column name being “_1” by default. The function option @s means the field will be imported directly without being split. The result is as follows:

esProc_java_structured_dataimport_12

A2:A1.select(trim(_1)!=””)is to select the non-blank rows. select function can make query by field name or row numbers. The result is as follows:

esProc_java_structured_dataimport_13

A3:=A2.new(mid(_1,1,1),mid(_1,2,1),mid(_1,3,1),mid(_1,4,1),mid(_1,5,1),mid(_1,6,1),mid(_1,7,1),mid(_1,8,1),mid(_1,9,1),mid(_1,10,1),mid(_1,11,1),mid(_1,12,1),mid(_1,13,1),mid(_1,14,1),mid(_1,15,1),mid(_1,16,1),mid(_1,17,1),mid(_1,18,1),mid(_1,19,1),mid(_1,20,1))

This long line of code is to split each row of data into 20 fields. mid function has three parameters: the name of field to be split, starting and ending positions and the length of each split field. The result after splitting is as follows:

esProc_java_structured_dataimport_14

A3 shows the needed result.

But the long code in A3 is inconvenient for checking and maintenance. We can simplify it using esProc’s dynamic code as follows:

A4:=20.loops(~~+”mid(_1,” + string(~) + “,1),”)

A5:=exp=left(A4,len(A4)-1)

A6:=eval(“A2.new(“+ A5+”)”)

loops function in A4 is used to perform loop computation and generate a regular string, that is, “mid(_1,1,1),mid(_1,2,1),mid(_1,3,1),mid(_1,4,1),mid(_1,5,1),mid(_1,6,1),mid(_1,7,1),mid(_1,8,1),mid(_1,9,1),mid(_1,10,1),mid(_1,11,1),mid(_1,12,1),mid(_1,13,1),mid(_1,14,1),mid(_1,15,1),mid(_1,16,1),mid(_1,17,1),mid(_1,18,1),mid(_1,19,1),mid(_1,20,1),”

Notice that there is a redundant comma at the end of the string, which can be removed using the code in A5.

A6:Execute the dynamic script. eval function can parse the string dynamically to an expression. For example, eval(“2+3”) equals expression 2+3, whose value is 5. So actually the role of expression in A5 equals that of the code in A3 and their computed results are identical:

esProc_java_structured_dataimport_16

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Data Analytics, esProc/R/Python/Perl, Program Language, Structured Data Process. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s