A Handy Method of Conditioned Filtering in Text Files with Java

         We often need to process data stored in text files. Here we’ll look at how to perform conditional filtering on a text file with Java through an example: read employee information from the text file employee.txt and select the female employees who were born on or after January 1, 1981.

         The text file employee.txt is formatted as follows (fields are separated by tabs):

EID  NAME      SURNAME  GENDER  STATE         BIRTHDAY    HIREDATE    DEPT       SALARY
1    Rebecca   Moore    F       California    1974-11-20  2005-03-11  R&D        7000
2    Ashley    Wilson   F       New York      1980-07-19  2008-03-16  Finance    11000
3    Rachel    Johnson  F       New Mexico    1970-12-17  2010-12-01  Sales      9000
4    Emily     Smith    F       Texas         1985-03-07  2006-08-15  HR         7000
5    Ashley    Smith    F       Texas         1975-05-13  2004-07-30  R&D        16000
6    Matthew   Johnson  M       California    1984-07-07  2005-07-07  Sales      11000
7    Alexis    Smith    F       Illinois      1972-08-16  2002-08-16  Sales      9000
8    Megan     Wilson   F       California    1979-04-19  1984-04-19  Marketing  11000
9    Victoria  Davis    F       Texas         1983-12-07  2009-12-07  HR         3000
10   Ryan      Johnson  M       Pennsylvania  1976-03-12  2006-03-12  R&D        13000
11   Jacob     Moore    M       Texas         1974-12-16  2004-12-16  Sales      12000
12   Jessica   Davis    F       New York      1980-09-11  2008-09-11  Sales      7000
13   Daniel    Davis    M       Florida       1982-05-14  2010-05-14  Finance    10000

         In Java, the approach is to read the file line by line, save the rows in a List, traverse the List, and save the eligible records in a result List. Finally, print the number of eligible employees. The detailed code is as follows:

// imports required at the top of the enclosing class file
import java.io.*;
import java.text.SimpleDateFormat;
import java.util.*;

public static void myFilter() throws Exception {
    File file = new File("D:\\employee.txt");
    List<Map<String, String>> sourceList = new ArrayList<Map<String, String>>();
    List<Map<String, String>> resultList = new ArrayList<Map<String, String>>();
    // try-with-resources closes the reader and the underlying stream on exit
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
        if (br.readLine() == null) return; // skip the header line; exit if the file is empty
        String line;
        while ((line = br.readLine()) != null) { // load the file into memory
            String[] info = line.split("\t");
            Map<String, String> emp = new HashMap<String, String>();
            emp.put("EID", info[0]);
            emp.put("NAME", info[1]);
            emp.put("SURNAME", info[2]);
            emp.put("GENDER", info[3]);
            emp.put("STATE", info[4]);
            emp.put("BIRTHDAY", info[5]);
            sourceList.add(emp);
        }
    }
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    Date cutoff = sdf.parse("1981-01-01");
    for (int i = 0, len = sourceList.size(); i < len; i++) { // process the data row by row
        Map<String, String> emp = sourceList.get(i);
        // keep female employees born on or after the cutoff date
        if ("F".equals(emp.get("GENDER")) && !sdf.parse(emp.get("BIRTHDAY")).before(cutoff)) {
            resultList.add(emp);
        }
    }
    System.out.println("count=" + resultList.size()); // print the number of eligible employees
}
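To sanity-check the condition, the same fixed filter can be run on a few rows held in memory. The sketch below is an illustration rather than the original program: the class FixedFilterDemo and the method countMatches are names made up here, and java.time.LocalDate (Java 8+) is swapped in for SimpleDateFormat.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original program.
class FixedFilterDemo {

    // Counts female employees born on or after 1981-01-01.
    // Each element of rows is one tab-separated data line (no header line).
    static int countMatches(List<String> rows) {
        LocalDate cutoff = LocalDate.of(1981, 1, 1);
        int count = 0;
        for (String row : rows) {
            String[] info = row.split("\t");
            String gender = info[3];
            LocalDate birthday = LocalDate.parse(info[5]); // ISO format yyyy-MM-dd
            // "on or after" is the same as "not before"
            if ("F".equals(gender) && !birthday.isBefore(cutoff)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1\tRebecca\tMoore\tF\tCalifornia\t1974-11-20",
            "4\tEmily\tSmith\tF\tTexas\t1985-03-07",
            "6\tMatthew\tJohnson\tM\tCalifornia\t1984-07-07",
            "9\tVictoria\tDavis\tF\tTexas\t1983-12-07");
        System.out.println("count=" + countMatches(rows)); // prints count=2
    }
}
```

Only Emily and Victoria pass both tests here: Rebecca was born too early and Matthew is male.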

    The filtering condition of this function is fixed. If the condition changes, the conditional statement in the program must be modified accordingly. Multiple pieces of code are needed if there are multiple conditions, and the program cannot handle ad hoc, dynamic conditions. Now we’ll rewrite the code to make it somewhat more general by slightly changing the loop that traverses sourceList:

SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
for (int i = 0, len = sourceList.size(); i < len; i++) {
    Map<String, String> emp = (Map<String, String>) sourceList.get(i);
    Date birthday = sdf.parse(emp.get("BIRTHDAY"));
    boolean isRight = true;
    if (gender != null && !gender.equals(emp.get("GENDER"))) { // process the GENDER condition
        isRight = false;
    }
    if (start != null && birthday.before(start)) { // process the start condition of BIRTHDAY
        isRight = false;
    }
    if (end != null && birthday.after(end)) { // process the end condition of BIRTHDAY
        isRight = false;
    }
    if (isRight) resultList.add(emp); // save the eligible records in the result list
}

In the rewritten code, gender, start and end are input parameters of the function myFilter. The program handles the cases where the GENDER field equals the input value gender, and the BIRTHDAY field is greater than or equal to the input value start and less than or equal to the input value end. If any input value is null, its condition is ignored. The conditions are joined by AND.
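Put together as a complete method, the parameterized version might look like the sketch below. The signature filter(rows, gender, start, end), the helper emp and the class name are assumptions, since only the loop body is shown above; LocalDate stands in for Date and SimpleDateFormat.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper around the loop shown above.
class ParamFilterDemo {

    // Returns the rows that satisfy every non-null condition (conditions are ANDed).
    static List<Map<String, String>> filter(List<Map<String, String>> rows,
                                            String gender, LocalDate start, LocalDate end) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> emp : rows) {
            LocalDate birthday = LocalDate.parse(emp.get("BIRTHDAY"));
            boolean isRight = true;
            if (gender != null && !gender.equals(emp.get("GENDER"))) isRight = false;
            if (start != null && birthday.isBefore(start)) isRight = false;
            if (end != null && birthday.isAfter(end)) isRight = false;
            if (isRight) result.add(emp);
        }
        return result;
    }

    // Builds one sample record with only the fields the filter reads.
    static Map<String, String> emp(String name, String gender, String birthday) {
        Map<String, String> m = new HashMap<>();
        m.put("NAME", name);
        m.put("GENDER", gender);
        m.put("BIRTHDAY", birthday);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(emp("Rebecca", "F", "1974-11-20"));
        rows.add(emp("Emily", "F", "1985-03-07"));
        rows.add(emp("Matthew", "M", "1984-07-07"));
        LocalDate start = LocalDate.of(1981, 1, 1);
        // gender and start given, end ignored
        System.out.println(filter(rows, "F", start, null).size());  // prints 1
        // only start given, so the gender condition is ignored
        System.out.println(filter(rows, null, start, null).size()); // prints 2
    }
}
```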

If we want to make myFilter a more general function, for example one that joins conditions with OR or allows computation between fields, the code becomes much more complicated, because it requires a program that parses and evaluates dynamic expressions. Such a program can be as flexible and general as SQL in a database, but it is really difficult to develop.
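Short of writing an expression parser, java.util.function.Predicate (Java 8+) lets conditions be combined with and/or in code. The sketch below is only a halfway measure: the conditions are still written in Java rather than supplied as text at run time, so it does not remove the need for the dynamic evaluation discussed above. The class and helper names are illustrative.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical demo of composing filter conditions with Predicate.
class PredicateFilterDemo {

    // Condition: the given field equals the given value.
    static Predicate<Map<String, String>> fieldEquals(String field, String value) {
        return emp -> value.equals(emp.get(field));
    }

    // Condition: BIRTHDAY is on or after the given date.
    static Predicate<Map<String, String>> bornOnOrAfter(LocalDate start) {
        return emp -> !LocalDate.parse(emp.get("BIRTHDAY")).isBefore(start);
    }

    // Returns the rows for which the condition holds.
    static List<Map<String, String>> select(List<Map<String, String>> rows,
                                            Predicate<Map<String, String>> cond) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> emp : rows) {
            if (cond.test(emp)) result.add(emp);
        }
        return result;
    }

    static Map<String, String> emp(String name, String surname, String gender, String birthday) {
        Map<String, String> m = new HashMap<>();
        m.put("NAME", name);
        m.put("SURNAME", surname);
        m.put("GENDER", gender);
        m.put("BIRTHDAY", birthday);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(emp("Rebecca", "Moore", "F", "1974-11-20"));
        rows.add(emp("Emily", "Smith", "F", "1985-03-07"));
        rows.add(emp("Matthew", "Johnson", "M", "1984-07-07"));

        // (BIRTHDAY >= 1981-01-01 AND GENDER == "F") OR NAME+SURNAME == "RebeccaMoore"
        Predicate<Map<String, String>> cond =
            bornOnOrAfter(LocalDate.of(1981, 1, 1))
                .and(fieldEquals("GENDER", "F"))
                .or(emp -> "RebeccaMoore".equals(emp.get("NAME") + emp.get("SURNAME")));

        System.out.println(select(rows, cond).size()); // prints 2
    }
}
```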

 

In view of this, we can turn to esProc to assist with this task. esProc is a programming language designed for processing structured and semi-structured data. It handles the general query task above easily and integrates with Java seamlessly, so that Java can access and process text file data as flexibly as SQL does.

For example, to query female employees who were born on or after January 1, 1981, esProc can receive an external input parameter “where” as the dynamic condition, as shown in the following chart:

[Figure: java_file_datasource_1]

The value of “where” is: BIRTHDAY>=date(1981,1,1) && GENDER=="F". esProc needs only three lines of code, as follows:

[Figure: java_file_datasource_2]

A1: Define a file object and import its data. The first row is treated as the header row, and tab is the default field separator. esProc’s IDE can display the imported data visually, as shown on the right of the above chart.

A2: Filter according to the condition. A macro is used here to parse the expression dynamically, and “where” is the input parameter. esProc first computes the expression enclosed in ${…}, substitutes the computed result for ${…} as a macro string, and then interprets and executes the resulting code. In this example, the code finally executed is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").

A3: Return the eligible result set to the external program.
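The macro step in A2 is essentially text substitution before execution. The Java sketch below imitates only that textual step and says nothing about how esProc implements it. The cell text =A1.select(${where}) is inferred from the description of A2, because the script itself appears only in the figure; the class and method names are made up.

```java
// Hypothetical illustration of macro-style substitution; not esProc code.
class MacroDemo {

    // Replaces the placeholder ${name} in the template with the given text.
    static String expand(String template, String name, String value) {
        return template.replace("${" + name + "}", value);
    }

    public static void main(String[] args) {
        String where = "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\"";
        // Assumed cell text for A2, reconstructed from the prose above
        String cell = "=A1.select(${where})";
        System.out.println(expand(cell, "where", where));
        // prints =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F")
    }
}
```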

When the filtering condition changes, we just need to change the parameter “where” without rewriting the code. For example, suppose the condition is changed to query female employees who were born on or after January 1, 1981, or employees whose NAME+SURNAME equals “RebeccaMoore”. The value of the parameter “where” can be written like this: BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After execution, the result set in A2 is shown in the following chart:

[Figure: java_file_datasource_3]

Finally, call this piece of esProc code from Java through the JDBC driver provided by esProc to get the filtering result. With the above esProc code saved as the file test.dfx, the Java code that calls it is as follows:

       // create the esProc JDBC connection
       Class.forName("com.esproc.jdbc.InternalDriver");
       Connection con = DriverManager.getConnection("jdbc:esproc:local://");
       // call the esProc program (as a stored procedure); test is the name of the dfx file
       com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
       // set the parameter: the dynamic filtering condition
       st.setObject(1, "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\"");
       // execute the esProc stored procedure
       st.execute();
       // get the result set: the eligible employees
       ResultSet set = st.getResultSet();
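Once the ResultSet is available, it can be read like any other JDBC result. The helper below is a sketch that uses only the standard java.sql interfaces; it assumes the esProc driver supports ResultSetMetaData, which the article does not confirm, and the class name ResultSetRows is made up.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; relies only on the standard JDBC interfaces.
class ResultSetRows {

    // Copies every remaining row into a list of "column label -> value" maps.
    static List<Map<String, Object>> toRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columnCount = meta.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>(); // keeps the column order
            for (int i = 1; i <= columnCount; i++) {         // JDBC columns are 1-based
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With the variable from the snippet above, ResultSetRows.toRows(set) would return one map per eligible employee, which can then be printed or processed further.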

When the script is relatively simple, we can write the esProc code directly into the Java code that calls esProc JDBC. This saves us from having to write the esProc script file (test.dfx):

st = (com.esproc.jdbc.InternalCStatement) con.createStatement();
ResultSet set = st.executeQuery("=file(\"D:\\\\esProc\\\\employee.txt\").import@t().select(BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\")");

This piece of Java code directly executes a one-line esProc script: it reads data from the text file, filters it according to the specified condition, and returns the result to set, the ResultSet object.
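The nested escaping in that one-liner is easy to get wrong, so it may help to build the expression string from its parts. The sketch below assumes that esProc string literals use backslash as the escape character, which is inferred from the doubled backslashes in the call above; the class and helper names are hypothetical.

```java
// Hypothetical helper for assembling the inline esProc expression.
class InlineExpressionDemo {

    // Wraps s in double quotes, escaping backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds "=file(<path>).import@t().select(<where>)" as used above.
    static String buildExpression(String path, String where) {
        return "=file(" + quote(path) + ").import@t().select(" + where + ")";
    }

    public static void main(String[] args) {
        String expr = buildExpression("D:\\esProc\\employee.txt",
                "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\"");
        System.out.println(expr);
        // prints =file("D:\\esProc\\employee.txt").import@t().select(BIRTHDAY>=date(1981,1,1) && GENDER=="F")
    }
}
```

The resulting string can be passed to st.executeQuery in place of the hand-escaped literal. Because the condition is inserted as code, it should come only from a trusted source.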

About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft