The fact transactions that come in earlier than the dimension (master) records are not bad data, such fact records needs to be handled in our ETL process as a special case. Such situations of facts coming in before dimensions can occur quite commonly like in case of a customer opening a bank account and his transactions starting to flow into the data warehouse immediately. But the customer id creation process from the Customer Reconciliation System can get delayed and hence the customer data would reach the data warehouse after few days.

How do we handle this scenario differs based on the business process being addressed, there could be two different requirements

  • Make the fact available and report under “In Process” category; commonly followed in financial reporting systems to enable reconciliation
  • Make the fact available only when the dimension is present,; commonly followed in status reporting systems

Requirement 1: Make the fact available and report under “In Process” category

For this requirement follow the below steps

  1. Insert into the dimension table a record that represents a default or ‘In Process’ status like in case of the banking example the Customer Dimension would have a ‘default record’ inserted that represents the information that the customer detail has not yet arrived
  2. In the ETL process while populating the Fact table, for the transactions that do not have a corresponding entry in the Dimension table, assign a default Dimension key and insert into the Fact. In the same process insert the Dimensions Lookup values into a ‘temporary’ or ‘error’ table
  3. Build an ETL process that checks the new records inserted into the Dimension table, queries the temporary table and identifies the records in facts for which the dimension key has to be updated and updates the respective fact’s dimension key

Requirement 2: Make the fact available only when the dimension is present

For this requirement follow the below steps

  1. Build an ETL process that populates the fact into a staging table
  2. Build an ETL process that pushes only the records that have a dimension value to the data warehouse tables
  3. At the end of ETL process delete all the processed records from the staging table making the other unprocessed records available to be pulled next time
    Posted by Muneeswara C Pandian
    Comments (0)
    June 29th, 2007

    Comments (0)