esProc Simplifies SQL-style Computations – Multi-layered Data Grouping with Specified Criteria

During database application development, we are often faced with complicated SQL-style computations, to which the multi-layered data grouping with specified criteria belong. In SQL, the key method for realizing the operation is to group the source data according to specified criteria using left join statement. The problem is that this method usually involves handling data grouping and summarizing, inter-row computations, completing data, and, moreover, multi-layered data. So we need to write rather complicated SQL statements to express it.

In esProc, the operation can be realized with simple and easy code. Its ability will be shown through the following example.

Here is a table – stocklog – in which all the warehouse-in and -out records of various products every day are stored. Now we are asked to produce a stock report of all the products for every day of a specified time period. Some of the records in stocklog are as follows:

esProc_sql_multilayer_group_specified_1

In the table, if the INDICATOR value of a record is null, it is a warehouse-in record; if the INDICATOR value is ISSUE, it is a warehouse-out record. Note that though some dates are missing, which means there are no corresponding records in these days, the stock report must include all the dates continuously.

The stock report includes the following categories for each product each day: the opening stock (Open), warehouse-in quantity (Enter), stock in its highest level (Total), warehouse-out quantity (Issued) and the closing stock (Close). The “Open” of the current day is the “Close” of the day before; “Enter” and “Issued” come from stocklog; “Total” is equal to “Open+Enter”; “Close” is equal to “Open+Enter-Issued” or “Total-Issued”.

esProc script is shown below:

esProc_sql_multilayer_group_specified_2

A1: Query the database and compute the total Enter and toal Issued of each product each day based on stocklog. As only data grouping and summarizing is needed in this step and the computation is simple, a SQL statement can be used to perform it. Notice that the two parameters – start and end – correspond respectively to the two quotation marks in the SQL statement and represent the time periods passed from the external, which may be a Java program or a reporting tool. Suppose values of start and end are 2014-04-01 and 2014-04-10 respectively, result of A1 will be as follows:

esProc_sql_multilayer_group_specified_3

A2=A1.group(Lname)

This line of code groups the result of A1 by Lname, with each group being all the records of the Enter and Issued of each product each day of the specified time period. Please note it is not necessary to summarize each group of data. Result of A2 is shown in the left part of the following figure and detail data of each group are listed to the right.

esProc_sql_multilayer_group_specified_4

esProc provides two functions for grouping data – groups and group. Similar to SQL’s group by statement, groups groups and summarizes data. While group only groups data without summarizing them, which is a function SQL hasn’t.

The final result should include the stock statistics of all days during the time period specified by start and end. But, in the source data, not all days have the warehouse-in and -out records, thus the result of A2 should be aligned with the continuous dates. The following code is to generate the time sequence first.

B2=periods(start,end,1)

periods function can be used to create a time sequence, which requires three parameters: start, end and interval. By default, a sequence of dates will be generated. By using other options, a time sequence of years, seasons, months and ten-day periods can also be created. Result of A3 is as follows:

esProc_sql_multilayer_group_specified_5

A3=for A2. This is a loop statement, which performs loop on the result of A2, with each loop aiming at a product.

B3-B6 is a loop body that aligns each product’s warehouse-in and -out records with the time sequence in B2 and then computes each product’s stock statistics each day and finally append the result to B6. Note that a loop body in esProc is represented visually by an indentation instead of the braces or identifiers like begin/end.

B3=A3.align(A3,Date)

This line of code aligns the current product’s warehouse-in and -out records with the time sequence in B2. Note that A3 wears two hats; it is both a loop statement and a loop variable, that is, the current product’s warehouse-in and -out records. Take item3 as an example, the left part of the following figure shows the records before alignment and the right part shows the records after it:

esProc_sql_multilayer_group_specified_6

B4>c=0

It assigns an initial value – zero – to the variable c, which represents the Open field in each record of the current product. The Open field value of the initial date is zero and will be modified continuously in B5.

B5=B3.new(A3.Lname:Lname,B2(#):LDate, c:Opening, Enter,(b=c+Enter):Total,Issue,(c=b-Issue):Close)

This line of code computes the stock statistics. B3.new(…) means creating a new table sequence, that is, the stock statistics of the current product, based on the result of B3. The new table sequence has 7 fields:

A3.Lname:Lname —- Fetch Lname field from A3 – the warehouse-in and -out records of the current product. The new field is named Lname.

B2 (#):LDate —- Insert the time sequence in B2 into the new table sequence in order and make it a new field with the name LDate. Note that # represents the record numbers in A3 and B2(N) represents the Nth record in B2. So B2(#) means inserting B2 into the new table sequence according to the record numbers in A3.

c:Open —- Make variable c the value of Open field. In the first record, c is zero.

Enter —- Take the Enter field in B3 directly as a new field. Because the new table sequence is created based on the result of B3, it is unnecessary to rename the new field as Lname field was named.

(b=c+Enter):Total —- Compute Total field according to the formula Open+Enter. The expression here is surrounded by parentheses to make it clearer.

Issue — Take Issue field in B3 directly as a new field

(c=b-Issue):Close — Compute Close field according to the formula Total-Issued. Note that variable c has been modified so that it will be qualified for computing the next record as the value of Open field, which is got according to the business rule that “Open” of the current day is equal to “Close” of the day before.

Take item 3 as example, result of B5 is as follows:

esProc_sql_multilayer_group_specified_7

B6=@|B5

Continuously, this line of code appends the result of B5 to the current cell B6, which is represented by @. The final result is as follows:

esProc_sql_multilayer_group_specified_8

B6 is the final result of this example.

In addition, the esProc script can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result in the form of ResultSet to the Java main program. Please refer to related documents for details.

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in SQL-related Puzzle and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s