ETL (extract, transform, load) consists of three basic steps:

  1. Extraction of data from a source system
  2. Transformation of the extracted data
  3. Loading of the transformed data into a target environment
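The three steps can be sketched in a few lines of Python. This is only a minimal illustration, not any vendor's implementation: the CSV content, the `customers` table, and the function names are assumptions made up for the example, and an in-memory SQLite database stands in for the target environment.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a file or a source system.
SOURCE_CSV = """id,name,amount
1,alice,10.50
2,bob,3.25
"""

def extract(text):
    """Step 1: read raw rows from the source (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Step 2: convert types and capitalise names."""
    return [(int(r["id"]), r["name"].title(), float(r["amount"])) for r in rows]

def load(records, conn):
    """Step 3: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()

# In-memory SQLite stands in for the target environment (an assumption).
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
```

Each step is a separate function, so the source or the target can be swapped without touching the transformation logic.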

Traditionally, ‘ETL’ referred mostly to batch processing that gathered data from flat files or relational structures. As ETL systems began supporting a wider range of sources, such as XML, industry-standard formats like SWIFT, unstructured data, and real-time feeds like message queues, ‘ETL’ evolved into ‘Data Integration’. That is why ETL product vendors are now called Data Integrators.

Now let us see how Data Integration (DI), or ETL, has evolved over time. There are three ways of performing DI:

  • Write Code
  • Generate Code
  • Configure Engine

Write Code: Write a program in a programming language, then compile and execute it.

Generate Code: Use a graphical user interface to enter the data movement requirements, generate the code in a programming language, then compile and execute it.
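As a rough sketch of the code generation approach, the example below turns a small requirement specification into Python source, then compiles and executes it. The `spec` dictionary stands in for what a GUI would capture; its layout, the `generate_code` helper, and the `move_customers` function name are all hypothetical.

```python
# Hypothetical requirement spec, standing in for what a GUI would capture.
spec = {
    "function": "move_customers",
    "columns": {"id": "int", "name": "str.upper", "amount": "float"},
}

def generate_code(spec):
    """Emit Python source for a row-transforming function from the spec."""
    lines = [
        f"def {spec['function']}(rows):",
        "    out = []",
        "    for row in rows:",
        "        out.append({",
    ]
    for column, converter in spec["columns"].items():
        lines.append(f"            {column!r}: {converter}(row[{column!r}]),")
    lines += ["        })", "    return out"]
    return "\n".join(lines) + "\n"

# Generate the source, then compile and execute it.
source = generate_code(spec)
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)

generated_result = namespace["move_customers"](
    [{"id": "1", "name": "alice", "amount": "10.5"}]
)
```

Note that the generated `source` is a separate artifact from `spec`: if someone edits the generated code by hand, the two no longer agree.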

Configure Engine: Use a graphical user interface to enter the requirements and save the inputs (metadata) in a data store (the repository). A generic, precompiled engine then interprets the metadata from the repository and executes it.
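The engine approach might look like the following sketch, in which no job-specific code is generated at all. The `REPOSITORY` dictionary, the metadata layout, the `OPERATIONS` table, and `run_job` are assumptions invented for illustration; real engines keep their metadata in a database-backed repository and support far more operations.

```python
import json

# Hypothetical repository: job metadata saved as JSON, keyed by job name.
REPOSITORY = {
    "load_customers": json.dumps({
        "mappings": [
            {"source": "cust_id", "target": "id", "op": "to_int"},
            {"source": "cust_name", "target": "name", "op": "upper"},
        ]
    })
}

# Operations the generic engine knows how to perform (an assumed, tiny set).
OPERATIONS = {"to_int": int, "upper": str.upper, "copy": lambda value: value}

def run_job(job_name, rows):
    """Generic engine: read the job's metadata and apply it to each row."""
    metadata = json.loads(REPOSITORY[job_name])
    output = []
    for row in rows:
        output.append({
            m["target"]: OPERATIONS[m["op"]](row[m["source"]])
            for m in metadata["mappings"]
        })
    return output

engine_result = run_job("load_customers", [{"cust_id": "7", "cust_name": "carol"}])
```

Changing the job means changing only the stored metadata; the engine code stays the same for every job.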

Pros and Cons of each approach

Pros

Write Code:
  • Easy to get started for smaller tasks
  • Complex data handling requirements can be met

Generate Code:
  • Developer friendly to design the requirements
  • Metadata of requirements captured

Configure Engine:
  • Developer friendly to design the requirements
  • Metadata of requirements captured
  • Easier code maintenance
  • Flexibility to access any type of data source
  • Scalable for huge data volumes; supports architectures like SMP, MPP, NUMA-Q, GRID etc.

Cons

Write Code:
  • Large effort in maintenance of the code
  • Labor-intensive development, error prone and time consuming

Generate Code:
  • Large effort in maintenance of the code
  • Metadata and deployed code can be out of sync

Configure Engine:
  • Certain data handling requirements might require adding hand-written code
  • Requires a dedicated environment, servers and an initial configuration process

Posted by Muneeswara C Pandian
June 1st, 2007
