Implement Basic Functions of SQL through R Language & esProc

The example data is from 2 tables of the classical Northwind database:

Orders table with the main fields: OrderID, EmployeeID, OrderDate, Freight, CustomerID

Customer table with the main fields: CustomerID, CompanyName

Retrieve data of the entire table

SQL solution: select * from Orders

R solution: A1<-sqlQuery(conn,’select * from Orders’)

esProc solution: $select * from Orders

Comments: The word count of esProc and SQL codes only differs by one word. The R language has a strong flavor of programming.

Where: Search for the order which has a freightage higher than 100 and is placed before the year of 1998

SQL solution: SELECT * FROM Orders WHERE Freight > 100 AND OrderDate < ‘1998-01-01’

R solution: subset(A1,Freight > 100 & OrderDate < as.POSIXlt(‘1998-01-01’))

esProc solution: =A1.select(Freight > 100 && OrderDate < date(“1998-01-01”))

Comments: Both SQL and R solutions are close to SQL to some extent. R is of the typical function style, and esProc is of the typical object style. The programmers prefer the former, while the business experts prefer the latter, esProc, for esProc is easier for them to understand.

Order: Sort by employees correctly, and then sort by freightage in reverse order

SQL solution: SELECT * FROM Orders ORDER BY EmployeeID ,Freight DESC

R solution: A1[order(A1$ EmployeeID,-A1$Freight),]

esProc solution: =A1.sort(EmployeeID,Freight:-1)

Comments:

The R solution is to retrieve 2 vectors from A1 at first, pass them to the function order to group them together for sorting, and then export the serial numbers. At last, data will be rearranged according to the serial number. The computation process of R language is rather winding, not as straightforward as SQL. It is because that R is good at handling vector, the access to structured data will take column as the basic unit, and the parameter usually takes up a whole column. By comparison, SQL takes the record (column) as the basic unit, with parameter as the column name.

esProc solution resembles that of SQL because esProc takes the record (row) as the basic unit .

Group & Sum: Summarize by employee, sum up the freightage, and count the orders:

SQL solution: SELECT EmployeeID, COUNT(OrderID), SUM(Freight) FROM Orders GROUP BY EmployeeID

esProc solution: =A1.group(EmployeeID; EmployeeID, ~.count(), ~.sum(Freight))

R solution:

A4<-aggregate(A1$Freight,list(A1$EmployeeID),sum)

A4$count<-tapply(A1$Freight,A1$EmployeeID,length)

Comments: In this case, it is obvious that R and SQL differ greatly. The algorithms available in R may be clearer for mathematicians, and means more learning efforts for users accustomed to SQL.

Join:  Perform left join on Orders table and Customers table by CustomerID.

SQL Solution:

Select * from Orders left join Customers on Orders.CustomerID =Customers.CustomerID

esProc solution:

=join@1(A1:CustomerID:Orders, B1:CustomerID:Customers)

R solution:

merge(A1,B1,by.x=”CustomerID”,by.y=”CustomerID”,all.x=TRUE)

Comments: The join of SQL equals to join of esProc or the merge of R. Similarly, the left join of SQL equals to join@1 of esProc, or merge(…all.x=TRUE) of R. Obviously, esProc is more alike SQL in the respects of both the syntax conventions and the literal meanings.

Distinct: Remove the duplicate CustomerID

SQL solution: select distinct CustomerID from Orders

R solution: unique(B2$CustomerID)

esProc solution: =B2.id(CustomerID)

Comments: The keywords of the two solutions respectively differ to that of SQL. However, their usages are basically the same to that of SQL. In which, R is the typical function style, and esProc is the typical object style.

Like: Search for the record with Island in ShipName

SQL solution: select * from Orders where ShipName like ‘%Island%’

R solution: subset(A1,grepl(“Island”,ShipName,ignore.case = TRUE))

esProc solution: =A1.select(like@c(ShipName ,”*Island*”))

Comments: R supports several means to match, including the regular expressions, and is more powerful than esProc in this respect. The usages of esProc are more close to that of SQL, and fit for those who are familiar with SQL.

As can be seen from the above comparison, esProc has a coding style more close to that of SQL since esProc supports the data structure of “Record” by the infrastructure, making it more suitable for users that are familiar with SQL. In addition, compared with R language, esProc provides the representation style of “object + functions” that is much easier for business experts to accept.

R is more resourceful in details, ideal for the programmers and mathematicians. In addition, supporting the regular expressions and other functions makes R more open as a preferred analysis tool for programmers.

For more examples @ http://www.raqsoft.com/category/esproc/knowledge-base/esproc-vs-sql

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in FAQ, Program Language and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s