How to Use esProc as the Class Library of Structured File Computing for Java

In some cases, data must be stored in the file system, rather than in a database. That requires handling file-based data computing manually. Since Java lacks the related class library, you have to hardcode the structured file computing, which produces complicated and unreadable code.

Yet, you can make up for this deficiency with esProc (free edition is available). esProc encapsulates plenty of functions for processing structured files and provides the JDBC interface. A Java application will treat an esProc script as a database stored procedure, execute it after passing parameter and get the result set via JDBC. You can learn more details from How to Use esProc as the Class Library for Java.

Now let’s look at some familiar scenarios involving structured computing in Java and their esProc solutions.

Querying text file

sOrder.txt is a text file separated by tabs. You need to query orders in a specified time period in Java.

Below is the source data:

esProc_java_class_library_2

esProc code:

esProc_java_class_library_3

Import the file, using tab as the default separator. @t means importing the first row as column names. Then perform the conditional filtering. startDate and endDate are input parameters, like a period from 2010-01-01 to 2010-12-31. Here’s the result:

esProc_java_class_library_4

Sorting text file

Based on the above sOrder.txt, you need to sort records in Java by client numbers in descending order and by year and month in ascending order.

esProc code: =A1.sort(-Client,year(OrderDate),month(OrderDate))

Explanation: Use “-” to sort data in descending order. You need to do some calculation to get the years and the months.

Related information: To perform sort after the querying, you can use =A2.sort(…), or =A1.select(…).sort(…)

Here’s the result:

esProc_java_class_library_5

Group and aggregate operations

You need to calculate the sales amount and number of orders per seller per year.

esProc code: =A1.groups(SellerId,year(OrderDate);sum(Amount),count(~))

Explanation: group function can perform aggregate while grouping data. ~ represents each group or the current group. count(~) is equivalent to count(OrderID).

Here’s the result:

esProc_java_class_library_6

Getting distinct value

You need to make a client list according to sOrder.

esProc code: =A1.id(Client)

Here’s the result:

esProc_java_class_library_7

Removing duplicate values

You need to get the first record for each client and for each seller

esPrco code: =A1.group@1(Client,SellerId)

Explanation: group function is used to group records (without having to aggregate them). @1 means getting the first record from each group.

Here’s the result:

esProc_java_class_library_8

Finding topN

You need to get the 3 orders with the greatest sales amount for each seller.

esProc code: =A1.group(SellerId;~.top(3,-Amount):t).conj(t)

Explanation: top function filters records to get TopN. The sign “-” means a reversed order. conj function is used for concatenation.

Related information: To get the order with the greatest sales amount, use maxp function.

Here’s the result:

esProc_java_class_library_9

Related computing

emp.txt is a text file, separated by tabs. Its EId field corresponds to sOrder’s SellerId field. Now with Java, you need to align Name, Dept and Gender fields in emp.txt with sOrder.txt.

Below is the source data:

esProc_java_class_library_11

esProc code:

esProc_java_class_library_10

The join function performs a join and changes names of the two tables into s and e respectively; @1 means a left join. Then you retrieve the desired fields from the joined table to create a new structured two-dimensional table. Here’s the result:

esProc_java_class_library_12

Related information: @1 means a left join; @f means a full join. No option is needed for an inner join.

All the above examples assume that the size of the file is relatively small. If the file is too big to be entirely loaded into memory, you can use the esProc cursor to handle it. See related documents for detailed information.

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Java Assistant and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s