esProc Simplifies SQL-style Computations – Relational Computation on Groups

During developing database applications, we are often faced with complicated SQL-style computations like relational computation on multilayered groups. As SQL requires summarizing during data grouping and it doesn’t support object-style relational access, it is difficult to deal with these computations. We have to resort to an advanced technique of window functions nested with multilayered subqueries. esProc, however, can handle them more easily by supporting real grouping and visual object-style relational access.

In practice, there are a lot of cases in which relational computation on groups is needed, such as the one presented on the website http://forums.bit-tech.net/showthread.php?t=207052. Based on this practical example, we have designed a more common one for illustrating in detail how esProc works to realize the relational computation on groups. 

Computing target: Query branches of a DVD store where there are less than four categories of DVD copies. 

Data Structure: Branch Table stores information of the branch stores. DVD Table stores titles and categories of DVDs, in which DVDs, like Transformers IV, are virtual data items instead of physical disks. DVDCopy Table stores physical DVD copies branch stores have. Note: DVDCopy Table is related to Branch Table through BranchID field, and to DVD Table through DVDID field. The following is part of the data:

Branch Table:

esProc_sql_relationalgroup_1

DVD Table:

esProc_sql_relationalgroup_2

DVDCopy Table:

esProc_sql_relationalgroup_3

 

Description:

  1. Computed results should be certain records in Branch Table.
  2. That the Status field of a record in DVDCopy shows Miss means the DVD is missing, and that a record’s LastDateReturned field is empty means the DVD has been rented out but not returned. Obviously the DVDs that are missing and haven’t been returned are outside of our computing objects and should be filtered away.
  3. We should consider the situation that there may be certain branch stores whose information isn’t included in DVDCopy Table, though it is rarely seen.

Solution:

  1. Select valid, existing DVD copies the branch stores have from DVDCopy Table.
  2. Group DVDCopy Table by BID. Each group will contain all DVD copies a branch store has.
  3. Select the DVDs corresponding to the DVD copies each branch store has, and compute the number of categories to which these DVDs belong.
  4. Select branch stores where the number of categories of existing DVDs is less than four. These branch stores are eligible.
  5. Select branch stores which DVDCopy Table hasn’t. They are also eligible.
  6. Combine the two kinds of eligible branch stores.

Code written in esProc:

 

 

esProc_sql_relationalgroup_4

A1-A3: Query data from three tables in the database. The three tables are made variables which are named respectively as Branch, DVD and DVDCopy. Computed results are as follows:

esProc_sql_relationalgroup_5

A4: Switch the DVDID field and BID field in DVDCopy Table to corresponding records in DVD Table and Branch Table respectively. Note: This step is the basis of object-style relational access, which requires the use of switch function. After computing, DVDCopy becomes as follows:

esProc_sql_relationalgroup_6

Fields in blue have corresponding records. Click one and you can see the details, as shown in the following figure:

esProc_sql_relationalgroup_7

Now we can perform object-style relational access only with the operator “.”. For instance, DVDCopy.(DVDID). (Category) represents the category of each DVD copy, and DVDCopy.(BID) gets the detailed information (complete record) about the branch store corresponding to each DVD copy.

A5: =DVDCopy.select(Statues!=”Miss” && LastDateReturend!=null). This line of code aims to filter away the DVD copies that are missing and that haven’t been returned. After the code is executed, values of A5 are shown as follows:

esProc_sql_relationalgroup_8

A6: =A5.group(BID) is to group data in A5 by BID, with each row representing all DVD copies a branch store has. The result is as follows:

esProc_sql_relationalgroup_9

Click the data in blue and you’ll see members of each group:

esProc_sql_relationalgroup_10

It can be seen that group function only groups data, but doesn’t summarize the data at the same time. In this point, it is different from the function for grouping in SQL. Sometimes, we need to further process the grouped data, rather than simply summarizing them. To do this, esProc’s group function is more convenient to use, as shown in the code in A7:

A7: =A6.new( ~.BID:BonList, ~.(DVDID).id(Category).count():CatCount )

The above line of code computes the number of categories of DVD copies to which each branch store corresponds. new function can generate a new object A7 based on the data in A6. A7 has two columns: BonList and CatCount. BonList originates directly from column BID of the grouped data in A6, and CatCount originates from column DVDID of the grouped data. There are three steps to compute CatCount: ~.(DVDID) finds the DVD records corresponding to all DVD copies each branch store has; id(Category) removes repeated records of Category from these DVD records; and count() computes the number of categories. The result is as follows:

esProc_sql_relationalgroup_11

That is, branch store B002 has three categories of DVD copies, B003 also has three categories and B001 has four categories.

A8: A7.select(CatCount<4).This line of code makes query to select branch stores whose CatCount is less than 4. The result is as follows:

esProc_sql_relationalgroup_12

The above branch stores that are in short supply are computed according to DVDCopy Table. But maybe some branch stores with serious supply shortage are not in the DVDCopy, such as the cases that all the DVD copies in the branch store has been rented out, or that the branch store hasn’t any DVD copies. So these branch stores should also be counted. The code for this step is as follows:

A9: =A8.(BonList) | (Branch \ A7.(BonList)) 

In the above code, operator “|” represents the union operation of two data sets (which can be replaced by union function); operator “\” represents the complement operation (which can be replaced by diff function). A8.(BonList), Branch and A7.(BonList) represent respectively branch stores with supply shortage in DVDCopy Table, all branch stores and branch stores which are included in DVDCopy Table. Their respective values are:

esProc_sql_relationalgroup_13

A9 computes the final result of this example. Its values are:

esProc_sql_relationalgroup_14

 

Or we can realize the computation indirectly. For instance, first compute “branch stores that are not in short supply”, and then compute the complement of the result and Branch Table. The final result should be the same as that of A9.

Please note variables like A8 or Branch cannot be used to represent a data set in SQL because it doesn’t support explicit set. Thus the simple code in the above have to be replaced by lines of complicated SQL statements.

Besides, esProc can be called by a Java program. The method is similar to that with which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result of the form of ResultSet to Java main program. For more details, please refer to the related documents.

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in SQL-related Puzzle and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s