R language VS. esProc: Top 3 Salespersons Ranking by Monthly Sales Amount

Both R language and esProc have the outstanding ability to perform the stepwise computation. However, in the particulars they differ from each other. To demonstrate the difference, a case is designed as follows:

A company’s Sales department wants to select out the outstanding salespersons through statistics, that is, the salespersons whose sales amounts are always among the top 3 in each month from the January this year to the previous year. The data is mainly from the order table of MSSQL database: salesOrder, and the main fields include the ID of order: ordered, name of salesperson: name, sales amount: sales, and date of order: salesDate.

The solution is like this substantially:

  1. Compute the beginning dates of this year and this month, and filter the data by date.
  2. Group by month and salesperson, and compute the sales amount of each salesperson in each month.
  3. Group by month, and compute the rankings of sales amount in each group.
  4. Filter out the top 3 salespersons from each group.
  5. Compute the set of intersections of each group, that is, salespersons always among the top 3 in each month.


The solution of R language is as shown below:

01   library(RODBC)

02   odbcDataSources()

03   conn<-odbcConnect(“sqlsvr”)

04   originalData<-sqlQuery(conn,’select * from salesOrder’)

05   odbcClose(conn)

06   starTime<-as.POSIXlt(paste(format(Sys.Date(),’%Y’),’-01-01′,sep=”))

07   endTime<-as.POSIXlt(paste(format(Sys.Date(),’%Y’),format(Sys.Date(),’%m’),’01’,sep=’-‘))

08   fTimeData<-subset(originalData,salesDate>=starTime & salesDate<endTime)

09   gNameMonth<-aggregate(fTimeData$sales,list(fTimeData$name,format(fTimeData$salesDate,’%m’)),sum)

10   names(gNameMonth)<-c(‘name’,’month’,’monthSales’)

11   gNameMonth$rank <- do.call(c, tapply(gNameMonth$monthSales, gNameMonth$month,function(x) rank(-x)) )

12   rData<-subset(gNameMonth,rank<=3)

13   nameList <- split(rData$name, rData$month)

14   Reduce(intersect, nameList)


The solution of esProc is as shown below:



Then, let’s compare the two solutions by checking the database access firstly:

R language solution implements the data access from Line01 to 05 through relatively a few more steps, and this is acceptable considering it as the normal operations.

esProc solution allows for directly inputting SQL statements in the cell A1, which is quite convenient.

In respect of database access, R language and esProc differ to each other slightly. Both solutions are convenient.

Secondly, compare the time function:

R language solution computes the beginning dates of this year and this month through line 06-07. Judging from this point, R language is abundant in the basic functions.

esProc solution completes the same computation in A2 and B2, in which pdate function can be used to compute the beginning date of this month directly, which is very convenient.

In respect of date function, it seems that esProc is slightly better, while R language has a huge amount of 3rd-party-function library, and maybe there is any date function that is easier to use. Therefore, on the whole, esProc plays R language to a draw in this respect.

The focal point is stepwise computation:

Firstly, filter by date, group by month and sales person and then summarize by sales amount. The above functionalities are implemented respectively in line 8-9 for R language and cell A3-A4 for esProc. The difference is not great.

Proceed with the computation. According to the a bit straightforward thought of analysis, the steps followed should be: 1 Group by month; 2 Add the field of ranking in the group, and compute the rankings; 3 Filter by ranking, and only keep the salespersons that achieved the sales amounts ranking the top 3 in each group; 4. Finally, compute the set of intersection on the basis of the data in each group.

The corresponding codes of R language are from line 10 – 14 in the order of 2->3->1->4. In order words, rank the data in each group throughout the whole table, and then group. Have you noticed anything awkward? Although it is the ranking within the group, users of R language have to sort first and then group! This is because R language is weak in the ability to group first and then process. To barely compose the statements following the train of thought of 1->2->3->4, users of R language must have a strong technical background to handle the complex iteration statement expressions. The style of reverse thinking on this condition will greatly simplify the codes.

esProc solution completes the similar computation in the cell A5 – A8, not requiring any reverse thought. esProc users can simply follow their intuitive thinking of 1->2->3->4. This is because that esProc provides the ingenious representing style of ~. The ~ represents the current member that takes part in the computation. For this case, the ~ is each 2-dimension table in the group (corresponds to the data.frame of R language or the resutSet of SQL). In this way, ~.monthSales can be used to represent a certain column of the current 2-dimension table. By compassion, users of R language can only resort to some rather complicated means like loops to access the current member, which is more troublesome for sure.

Both esProc and R language solutions have excellent performances in respect of interaction.


About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Data Analytics, FAQ, Program Language and tagged , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s