A Handy Method of Grouping and Summarizing Text File Data in Java

We often need to process text file data while programming. Here is an example for illustrating how to group and summarize text file data in Java: load employee information from text file employee.txt, group according to DEPT and seek COUNT, the number of employee, and total amount of SALARY of each group.

Text file employee.txt is in a format as follows:

EID   NAME       SURNAME        GENDER  STATE        BIRTHDAY        HIREDATE         DEPT         SALARY

1       Rebecca   Moore      F       California 1974-11-20       2005-03-11       R&D          7000

2       Ashley      Wilson      F       New York 1980-07-19       2008-03-16       Finance    11000

3       Rachel      Johnson   F       New Mexico     1970-12-17       2010-12-01       Sales         9000

4       Emily         Smith        F       Texas        1985-03-07       2006-08-15       HR    7000

5       Ashley      Smith        F       Texas        1975-05-13       2004-07-30       R&D          16000

6       Matthew Johnson   M     California 1984-07-07       2005-07-07       Sales         11000

7       Alexis        Smith        F       Illinois       1972-08-16       2002-08-16       Sales         9000

8       Megan     Wilson      F       California 1979-04-19       1984-04-19       Marketing        11000

9       Victoria    Davis        F       Texas        1983-12-07       2009-12-07       HR    3000

10     Ryan         Johnson   M     Pennsylvania    1976-03-12       2006-03-12       R&D          13000

11     Jacob        Moore      M     Texas        1974-12-16       2004-12-16       Sales         12000

12     Jessica     Davis        F       New York 1980-09-11       2008-09-11       Sales         7000

13     Daniel       Davis        M     Florida      1982-05-14       2010-05-14       Finance    10000

Java’s way of writing code for this task is:

  1. Import data from the file by rows and save them in emp, the multiple Map objects of sourceList, the List
  2. Traverse the object of sourceList, perform grouping according to DEPT and save the result in list, which contains different List objects of group, the Map
  3. Traverse group and then traverse each DEPT’s list object, and sum up SALARY.
  4. While traversing group, save the values of DEPT, COUNT and SALARY in result, the Map object, and the results of different departments in resultList, the List object.
  5. Print out the data of resultList.

The code is as follows:

public static void myGroup() throws Exception{

           File file = new File(“D:\\esProc\\employee.txt”);

           FileInputStream fis = null;

           fis = new FileInputStream(file);

           InputStreamReader input = new InputStreamReader(fis);

           BufferedReader br = new BufferedReader(input);

           String line = null;

           String info[] = null;

           List<Map<String,String>> sourceList= new ArrayList<Map<String,String>>();

           List<Map<String,Object>> resultList= new ArrayList<Map<String,Object>>();

           if ((line = br.readLine())== null) return;//skip the first row

          while((line = br.readLine())!= null){

                    info = line.split(“\t”);

                    Map<String,String> emp=new HashMap<String,String>();

                    emp.put(“EID”,info[0]);

                    emp.put(“NAME”,info[1]);

                    emp.put(“SURNAME”,info[2]);

                    emp.put(“GENDER”,info[3]);

                    emp.put(“STATE”,info[4]);

                    emp.put(“BIRTHDAY”,info[5]);

                    emp.put(“HIREDATE”,info[6]);

                    emp.put(“DEPT”,info[7]);

                    emp.put(“SALARY”,info[8]);

                    sourceList.add(emp);

           }

           Map<String,List<Map<String,String>>> group = new      HashMap<String,List<Map<String,String>>>();

           //grouping object

           for (int i = 0, len = sourceList.size(); i < len; i++) {//group data from different DEPT

                    Map<String,String> emp =(Map) sourceList.get(i); 

            if (group.containsKey(emp.get(“DEPT”))) {

                group.get(emp.get(“DEPT”)).add(emp) ;

            } else {

                List<Map<String,String>> list = new ArrayList<Map<String,String>>() ;

                list.add(emp) ;

                group.put(emp.get(“DEPT”),list) ;

            }

           }

           Set<String> key = group.keySet();

           for (Iterator it = key.iterator(); it.hasNext();) {//summarize the grouped data                     

String dept = (String) it.next();

                    List<Map<String,String>> list = group.get(dept);

                    double salary =0;

                    for (int i = 0, len = list.size(); i < len; i++) {

                             salary += Float.parseFloat(list.get(i).get(“SALARY”));

                    }

                    Map<String,Object> result=new HashMap<String,Object>();

                    result.put(“DEPT”,dept);

                    result.put(“SALARY”,salary);

                    result.put(“COUNT”,list.size());

                    resultList.add(result);

           }

           for (int i = 0, len = resultList.size(); i < len; i++) {//print out the resulting data

                    System.out.println(“dept=”+resultList.get(i).get(“DEPT”)+

                                       “||salary=”+resultList.get(i).get(“SALARY”)+

                                       “||count=”+resultList.get(i).get(“COUNT”));

           }

}

The results after the code is executed are as follows:

dept=Sales||salary=1362500.0||count=187

dept=Finance||salary=177500.0||count=24

dept=Administration||salary=40000.0||count=4

dept=Production||salary=663000.0||count=91

dept=Marketing||salary=733500.0||count=99

Here myGroup function has only one grouping field. If it has multiple grouping fields, nested multi-layer collections class is needed and the code will become more complicated. As myGroup function has fixed grouping fields and summarizing fields, if there is any change about the fields, we have no choice but to modify the program. This robs the function of the ability to deal with situations of flexible and dynamic grouping and summarizing. In order to enable it to handle these situations as well as SQL statement does, we need to develop additional program for analyzing and evaluating dynamic expressions, which is a rather difficult job.

As a programming language specially designed for processing structured (semi-structured) data and able to perform dynamic grouping and summarizing easily, esProc can rise to the occasion at this time as a better assistive tool. It can integrate with Java seamlessly, enabling Java to access and process text file data as dynamically as SQL does.

For example, group according to DEPT and seek COUNT, the number of employees, and the total amount of SALARY of each group. To do this, esProc can import from external an input parameter “groupBy” as the condition of dynamic grouping and summarizing. See the following chart:

java_textfile_group_1

The value of “groupBy” is DEPT:dept;count(~):count,sum(SALARY):salary. esProc needs only three lines of code as follows to do this job:

java_textfile_group_2

A1: Define a file object and import the data to it. The first row is the headline which uses tab as the default field separator. esProc’s IDE can display the imported data visually, like the right part of the above chart.

A2: Group and summarize according to the specified field. Here macro is used to dynamically analyze the expression in which groupBy is an input parameter. esProc will first compute the expression enclosed by ${…}, then replace ${…} with the computed result acting as the macro string value and interpret and execute the result. In this example, the code we finally execute is =A1.groups(DEPT:dept;count(~):count,sum(SALARY):salary).

A3: Return the eligible result set to the external program.

When the grouping field is changed, it is no need to change the program. We just need to change the parameter groupBy. For example, group according to the two fields DEPT and GENDER and seek COUNT, the number of employees, and the total amount of SALARY of each group. The value of parameter groupBy can be expressed like this: DEPT:dept,GENDER:gender;count(~):count,sum(SALARY):salary. After execution, the result set in A2 is as follows:

java_textfile_group_3

Finally, call this piece of esProc code in Java to get the grouping and summarizing result by using JDBC provided by esProc. The code called by Java for saving the above esProc code as test.dfx file is as follows:

         // create esProc jdbc connection

Class.forName(“com.esproc.jdbc.InternalDriver”);

con= DriverManager.getConnection(“jdbc:esproc:local://”);

//call esProc code (the stored procedure) in which test is the file name of dfx

com.esproc.jdbc.InternalCStatement st;

st =(com.esproc.jdbc.InternalCStatement)con.prepareCall(“call test(?)”);

// set parameters

st.setObject(1,”DEPT:dept,GENDER:gender;count(~):count,sum(SALARY):salary”);//the parameters are the dynamic grouping and summarizing fields

// execute the esProc stored procedure

st.execute();

// get the result set

ResultSet set = st.getResultSet();

Here the relatively simple esProc code can be called directly by Java, so it is unnecessary to write esProc script file (like the above test.dfx). The code is as follows:

st=(com. esproc.jdbc.InternalCStatement)con.createStatement();

          ResultSet set=st.executeQuery(“=file(\”D:\\\\esProc\\\\employee.txt\”).import@t().groups(DEPT:dept,GENDER:gender;count(~):count,sum(SALARY):salary)”);

The above Java code directly calls a line of code from esProc script: get data from the text file, group and summarize them according to the specified fields and return the result to set, Java’s ResultSet object.

Advertisements

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Data Analytics, Program Language and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s