Five methods of data consolidation for single source reporting tool

As the name indicates, the single source reporting tools refer to tools only support the report based on a single data set, for example, iReport, Birt, and Crystal. However, we often need to use the single source reporting tool to present the data from multiple sources in reporting tool, for example, the data from MySQL and Oracle or MSSQL and Excel. In that case, the multiple source data must be converted to a single source data before use. Then, how to convert? Based on my years of practices and experiences, I’ve figured out five methods of data consolidation for single source reporting tool.

Sub report: Most mainstream reporting tools support the sub report by having the sub report built in the subheading or detail area of a parent report, and allowing sub report to receive the parameters from the parent report.

This method is clear in principles, easy to grasp, and quite suitable for a few particular simple scenarios, but is not suitable for most scenarios for the below reasons: first, the nested sub report is designed to act as some computational functions by utilizing the position relation between reports. Programmers can compute arbitrarily between any two data sources. In facts, this method is only fit for the simple connections in the one-to-one or one-to-many relation, but not fit for the more general computations. Second, merging two reports will inevitably involve the data alignment problems. Because the data are in different reports, it is always hard for programmers to keep the data from the sub report in a stable relation of its position to the parent report. Plus, the performance is also a weak point of nested sub report. As you can imagine, as a result of using sub report in the report with multi-level groupings or multi-times of expansions, the number of reports and database connections will experience the data explosion. Moreover, its incapability of developing report in a same report will cause the inconvenience in maintaining the nested sub report.

Report of advanced edition: The iReport, Birt, Crystal of the lower edition do not support the computations between data sources. However, almost all reporting tools of the advanced edition can consolidate multiple data sources into a single data source for future use. They are Virtual Data Sources of iReport, Data Sources Join of Birt, Universal of Crystal, and other tools alike.

As the best practice recommended by most vendors, this method can solve the normal computations between various data sources. However, you may notice the drawbacks within. First, this method does not work for some reports. Since each report requires its unique solution, it is impossible to substitute or migrate, and the maintenance cost is quite high. For the report developers with a concern on coupling and scalability, they should be careful about this method. Second, this method is an awkward way to solve the complex multi-data-source computation. Though its wizard is viewable, this method lacks the flexible computational scripts and the powerful enough computational capability. Last, this method only offers the insufficient support for the non-database data source. For example, it is a quite cumbersome process to consolidate MSSQL and Excel into a single data source in this method.

Data source computation layer: The data source computation layer is the layer in-between the data source and the reporting tool, responsible for the uniform computation on the data from the data source and returning the computed result to the layer of reporting tools. esProc and R language are the typical examples.

This method can solve the data source computation problems of reports all-roundly, as well as a range of problems in data source preparation. It features the great universality in the respect of multi-data-source consolidation. In addition, it provides an interface independent of a specific reporting tool. For example, esProc provides the JDBC interface for direct use by the reporting tools; R provides the 3rd party API for indirect calling by Java. Such universality can also be proved in the respect of computational capability. The computation layer for data source generally provides the more professional computing scripts and IDE for structural data. For example, esProc has implemented the true debugging function, offering a more convenient developing procedure than that of SQL. This method can be adopted to handle not only the conventional computations, but also the complex computation problems easily. R Language is another example regarding its support for multiple precast models. In addition, the universality of this method can also be proven in its support for any data source – not only the data from database, but also the data from non-database Txt or Excel sheet.

The data source computation layer is commonly used to remove the restriction on the single-source tool. Though such methods are complete, their performances are low generally. The data source computation layer performs all computations in memory, and the processing data volume is limited to the size of memory. Despite the above weak point, both esProc and R Language support the Hadoop, which enable them to support the big data in distributed system pretty well.

In addition to the above 3 major methods, there are 2 auxiliary methods:

Customized data set: This is the common name of API interfaces provided by reporting tools. For example, the JasperReport provides the way to use the Virtual Data Source and Bean Data Source methods in combination. The former tool can connect two data sources visibly, and the latter tool can handle the complex computation or computation between the non-database data sources. To put it concretely, we can use the Java developing tool to compute the underlying data directly and ultimately return via the interface recognizable for the reporting tools.

The method requires a complex developing process, and only offers an unprofessional computation features, uncommon usability, and poor performance. So, I regard it as a supplementary method.

Data warehouse. ETL and data warehouse can be classified to this category. The common tools include the DB2 data warehouse, DataStage, SSIS, Informatica, and Talend. The data warehouse has the advantage of high performance and great universality, and the disadvantage of the great cost, long duration, and endless maintenances. At present, the successful data warehouses are actually few in number. Seldom are they applied in the normal reports. So, I also regard it as a supplementary method.

In conclusion, I believe the above 3 primary methods and 2 supplementary methods can help you remove the restrictions on most single-source reporting tools.


About datathinker

a technical consultant on Database performance optimization, Database storage expansion, Off-database computation. personal blog at: datakeywrod, website: raqsoft
This entry was posted in Reporting tool and tagged , , , , , , , , , , , , , , . Bookmark the permalink.

One Response to Five methods of data consolidation for single source reporting tool

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s