When we receive data from source systems, the data file will not carry a flag indicating whether a record is new or has changed. We need to build a process to determine the changes and then push them to the target table.

There are two steps to this:

  1. Pull the incremental data from the source file or table
  2. Process the pulled incremental data and determine its impact on the target table as an Insert, Update or Delete

Step 1: Pull the incremental data from the source file or table
If the source system has audit columns such as a date, we can identify the new records; otherwise we cannot isolate the new records and have to process the complete data set.
For a source system’s file or table that has audit columns, we would follow the steps below:

  1. While reading the source records for a day (session), find the maximum value of the date (audit field) and store it in a persistent variable or a temporary table
  2. Use this persistent variable’s value as a filter the next day to pull the incremental data from the source table
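The two steps above can be sketched in Python. This is a minimal sketch using sqlite3 as a stand-in source; the table src_orders, its updated_at audit column, and the last_pull.txt state file are illustrative assumptions, not names from the post:

```python
import sqlite3
from pathlib import Path

# Hypothetical state store: in practice this would be a persistent
# variable in the ETL tool or a control table, not a flat file.
STATE_FILE = Path("last_pull.txt")

def pull_incremental(conn):
    """Pull only rows changed since the last session, using the audit column."""
    # Read the maximum audit value persisted by the previous session.
    last_max = STATE_FILE.read_text().strip() if STATE_FILE.exists() else "1900-01-01"
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM src_orders WHERE updated_at > ?",
        (last_max,),
    ).fetchall()
    if rows:
        # Persist the new maximum audit value as the next session's filter.
        STATE_FILE.write_text(max(r[2] for r in rows))
    return rows
```

On the first run the filter defaults to a very old date, so the full data set comes through; every later run returns only rows newer than the stored maximum.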

Step 2: Determine the impact of the record on the target table as Insert/Update/Delete
Following are the scenarios we would face and the suggested approach for each:

  1. Data file has only incremental data from Step 1, or the source itself provides only incremental data
    • do a lookup on the target table to determine whether it’s a new record or an existing record
    • if it is an existing record, compare the required fields to determine whether it’s an updated record
    • have a process to find the aged records in the target table and do a clean up for ‘deletes’
  2. Data file has the full, complete data because no audit columns are present
    • The data is of higher volume
      • keep a backup of the previously received file
      • compare the current file with the prior file and create a ‘change file’ capturing the inserts, updates and deletes; ensure both the ‘current’ and ‘prior’ files are sorted by key fields
      • have a process that reads the ‘change file’ and loads the data into the target table
      • based on the ‘change file’ volume, decide whether to do a ‘truncate & load’ instead
    • The data is of lower volume
      • do a lookup on the target table to determine whether it’s a new record or an existing record
      • if it is an existing record, compare the required fields to determine whether it’s an updated record
      • have a process to find the aged records in the target table and do a clean up or delete
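The sorted file comparison described for the higher-volume case can be sketched as a two-pointer merge over the prior and current snapshots. The snapshot layout here (lists of (key, fields) pairs already sorted by key) is an illustrative assumption:

```python
def diff_snapshots(prior, current):
    """Compare two snapshots sorted by key and classify each row as an
    insert, update or delete for the 'change file'.
    Each snapshot is a list of (key, fields) tuples sorted by key."""
    changes = []
    i = j = 0
    while i < len(prior) and j < len(current):
        pk, pv = prior[i]
        ck, cv = current[j]
        if pk == ck:
            if pv != cv:                 # key in both files, fields differ
                changes.append(("update", ck, cv))
            i += 1
            j += 1
        elif pk < ck:                    # key only in the prior file
            changes.append(("delete", pk, pv))
            i += 1
        else:                            # key only in the current file
            changes.append(("insert", ck, cv))
            j += 1
    # Whatever remains in either file is a pure delete or insert.
    changes += [("delete", k, v) for k, v in prior[i:]]
    changes += [("insert", k, v) for k, v in current[j:]]
    return changes
```

Because both inputs are sorted by key, a single pass produces the change file without a lookup per record, which is what makes this approach suit higher volumes.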
Posted by Muneeswara C Pandian
Comments (3)
July 13th, 2007


Kumaran Karthikeyan - October 24th, 2009

Even though the redo log file tells us which data changed, we still need to identify the inserts, updates and deletes by comparing against the target dataset, as mentioned by Munees, based on the alternate/natural key(s). The point that I am trying to make here is that there are times where there would be absolutely no changes to the data for what the redo log file reported as changed data. The reason being, the target data structure may have only a subset of the source data elements and the change happened in source data element(s) which are not in the target, or the data in the source was simply touched and not truly changed. In these scenarios we need to drop this record and not update the target. Thanks, Kumaran

Muneeswara C Pandian - July 25th, 2007

Yes Illiyaz, the CDC products from Oracle would help in determining the changes. But we need to take note of deciding whether the change is an Update or Insert in accordance with the target DW system. The state of the target table can be different: an update in the source system can become an insert in the target system. Informatica's PowerExchange also provides capabilities for CDC against Mainframe data sources like DB2 and IMS, and other databases like Oracle and SQL Server.

Illiyaz Mohammad - July 19th, 2007

Oracle provides something called CDC (Change Data Capture) in its 10g versions. This basically captures all the data that has been changed from the REDO log files and executes the same DML statements onto the target DB. Also Oracle 9i Release 2 (9.2) offers STREAMS, which works in a similar way. In Streams, rules can be implemented to capture only DML of a particular type, or other rules can be set.