One of the basic ETL steps that we would use in most of the ETL jobs during development is ‘Lookup’. We shall discuss further on what lookup is? when to use? how it works ? and some points to be considered while using a lookup process.
What is lookup process?
During the process of reading records from a source system and loading into a target table if we query another table or file (called ‘lookup table’ or ‘lookup file’) for retrieving additional data then its called a ‘lookup process’. The ‘lookup table or file’ can reside on the target or the source system. Usually we pass one or more column values that has been read from the source system to the lookup process in order to filter and get the required data.
How ETL products implement lookup process?
There are three ways ETL products perform ‘lookup process’
- Direct Query: Run the required query against the table or file whenever the ‘lookup process’ is called up
- Join Query: Run a query joining the source and the lookup table/file before starting to read the records from the source.
- Cached Query: Run a query to cache the data from the lookup table/file local to the ETL server as a cache file. When the data flow from source then run the required query against the cache file whenever the ‘lookup process’ is called up
Most of the leading products like Informatica, DataStage support all the three ways in their product architecture. We shall see the pros and cons of this process and how these work in part II.