In a typical data warehouse setup, we would first extract, cleanse and bring the required base data into the data warehouse. We would then transform or aggregate the base data into additional derived tables for easier reporting; these are processes that are optional but are done to get the data ‘report friendly’.
The decision to have calculation logic embedded in the data integration (DI) layer or the reporting layer can be primarily driven by two factors
Report execution performance and simplifying report development
When the DI layer pre calculates and makes the values directly available for the reporting process, then there is a performance improvement in the report delivery. But this does not provide the flexibility to change any parameters in the calculation formula in run time during analysis. Cube based reporting layer also falls under this case.
Flexibility for users to manage the calculations during analysis
When all calculation logic is embedded within the reporting layer then there is a lapse in the report delivery since the report server spends time in performing these calculations during runtime.
In many the aggregation or transformation logic required to get the data ‘report friendly’ involves different levels of aggregation functions which could be counts, sum, percentage etc across multiple dimensions with different set of records from the same fact table. In summary we are looking at good amount of business logic to be built into the data integration process to prepare the data for the reporting. This directly impacts the complexity of the data integration process and the amount of time to be spent in testing the data integration process, especially during the first release of the data warehouse, when the knowledge about the data is an evolving process.
Recently observed that during the data warehouse development process we can get the ‘report friendly’ requirements started at the reporting layer level and then gradually move these logics into the DI layer based on need.
We had 100+ key calculations to be developed in a 4 weeks timeframe, getting them calculated and tested was a challenge. We decided to build all the logic of calculations from the base data in the reporting layer, we went through multiple iterations to finalize the calculation logic with the users, but the process was much easier to handle.
Reporting layer provided a much easier flexibility of debugging and verifying each calculation, we were able to add a calculation, run the result, pass it to business the sample outputs and validate…all of this was done within the reporting layer team…with very less co-ordination with the DI team. Data Integration process debugging is time consuming, we will need to fall back and forth between the DI platform and the database platform.
Reporting layer provided the flexibility to fix any issues identified by the users and get back to them immediately; it would have not been easier through data integration layer.
Was easy enough to generate the SQL query from the reporting interface and add that logic to the DI process for pre-computing the calculations for report better performance
In summary reporting layer provided
- Flexibility in debugging the calculation logic
- Easier to add calculations one by one and validate the output with the users
- Quicker to fix any issues identified by the users
- Lesser dependency on the DI and database team
- Finally helped in deciding what calculations to move to the DI layer, also generated the SQL Query for the DI team to embed in the DI layer
Thanks for reading, let me know your thoughts on building DI logic specific for the reporting layer… ‘report friendly’.