We recently upgraded from Datastage 7.5 to IBM InfoSphere Information Server Datastage 8.5. We have a PeopleSoft EPM 8.9 warehouse setup, and Datastage is the ETL tool that loads data into every layer of the warehouse (Stage, Enriched and Multi-Dimension). Daily and monthly ETL jobs are scheduled through Control-M, and the jobs run 24/7 because the client's business spans several regions (NA, APAC and EMEA).
After we upgraded our Datastage environment from 7.5 to 8.5, job performance dropped by at least 30%, which had a high impact on the deliverables. A delay in one region's stream affected the other regions' streams, and because EPM feeds the FGL system, there was additional downstream impact as well.
We analyzed the performance and traced the slowdown to two causes:
1. Architectural change from Datastage 7.5 to 8.5
2. ETL jobs designed as “Server” jobs rather than “parallel” jobs.
The Datastage architecture changed drastically between 7.5 and 8.5: it moved from a 2-tier to an n-tier architecture, and the metadata repository moved from file-based storage in 7.5 to database-based storage in 8.5.
Pic 1: Datastage 7.5 Architecture
Pic 2: Datastage 8.5 Architecture
Another important observation: during the installation of Datastage 8.5 we installed the NLS feature, which had not been activated in Datastage 7.5 because the client was not licensed for NLS (multilingual) support in 7.5. We found the following issues with this setup in 8.5:
1. The EPM database is a non-Unicode database, while the Datastage 8.5 engine is an NLS-capable engine using the “UTF-8” character set.
2. For every row written to the database, the Datastage 8.5 engine tried to convert the character set from UTF-8 to the ISO standard, which caused the performance issue.
3. IBM has stated that the performance of Server jobs can take a hit if the NLS capability is installed during the installation process, as per the screenshot below.
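To illustrate point 2 above, here is a small sketch of what a UTF-8 to ISO-8859-1 conversion involves at the byte level. It is only an illustration using the standard iconv and od utilities, not how the Datastage engine performs the conversion internally; the function names `octal_bytes` and `utf8_to_latin1` are made up for this example.

```shell
#!/bin/sh
# Illustration only: the same character has different byte encodings in
# UTF-8 and ISO-8859-1, so non-ASCII data has to be transcoded.

# Print the bytes read from standard input as octal values, e.g. "303 251".
octal_bytes() {
    od -An -to1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Transcode standard input from UTF-8 to ISO-8859-1 using iconv.
utf8_to_latin1() {
    iconv -f UTF-8 -t ISO-8859-1
}
```

For example, `printf '\303\251' | octal_bytes` prints the two UTF-8 bytes of "é" (303 251), while `printf '\303\251' | utf8_to_latin1 | octal_bytes` prints the single ISO-8859-1 byte (351).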
Once the NLS feature is installed on the Datastage engine, there is no way to uninstall it. To fix this issue, we came up with a little trick to turn off the NLS feature:
Steps to Disable NLS settings in Datastage 8.5:
Step 1: Log in to the Datastage engine server as the Datastage Administrator user and cd to $DSHOME
Step 2: Open the “uvconfig” file (the configuration file the Datastage engine reads at startup), go to the “NLSMODE” section and change the value of NLSMODE to 0
# NLSMODE – Set to 1 if NLS mode is ON for the
# system as a whole. 0 means that NLS mode is OFF.
Save the uvconfig
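If you prefer to script this edit rather than do it by hand, the following is a minimal sketch. It assumes the setting appears in uvconfig as a line of the form `NLSMODE 1` (parameter name, whitespace, value); the function name `set_nlsmode_off` is hypothetical. It keeps a backup copy next to the file before rewriting it.

```shell
#!/bin/sh
# Hypothetical helper: set NLSMODE to 0 in a uvconfig-style file.
# Assumption: the setting is a line of the form "NLSMODE <value>".
set_nlsmode_off() {
    cfg="$1"
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    # Keep a backup so the change can be reverted.
    cp -p "$cfg" "$cfg.bak" || return 1
    # Rewrite only the NLSMODE line; comment lines start with "#" and are left alone.
    awk '$1 == "NLSMODE" { print "NLSMODE 0"; next } { print }' "$cfg.bak" > "$cfg"
}
```

It would be called as `set_nlsmode_off "$DSHOME/uvconfig"`; check the result against the backup before regenerating the configuration.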
Step 3: Regenerate the “uvconfig” file by running uvregen
uvregen: reconfiguration complete, disk segment size is 17946748
Step 4: Stop the Datastage engine
$DSHOME$ uv -admin -stop
JobMonApp has been shut down.
DataStage Engine 22.214.171.124 instance “adb” has been brought down.
Step 5: Change the NLS character set value in dsenv file of Datastage Engine to ISO standard.
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1; export NLS_LANG
#NLS_LANG=AMERICAN_AMERICA.UTF8; export NLS_LANG
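A quick way to confirm which character set is actually active in dsenv is to read back the last uncommented NLS_LANG assignment. This is a minimal sketch; the function name `active_nls_charset` is hypothetical, and it assumes the assignment is written as `NLS_LANG=<language>_<territory>.<charset>` at the start of a line, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: print the character set of the active NLS_LANG setting.
# Lines starting with "#" are commented out and therefore ignored.
active_nls_charset() {
    # Keep the value of each uncommented NLS_LANG assignment, take the last one,
    # and strip everything up to the final dot to leave only the character set.
    sed -n 's/^[[:space:]]*NLS_LANG=\([^;[:space:]]*\).*/\1/p' "$1" \
        | tail -n 1 \
        | sed 's/.*\.//'
}
```

Running `active_nls_charset "$DSHOME/dsenv"` after the change should print WE8ISO8859P1 if the edit was made as described.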
Step 6: Invoke the dsenv and boot the engine
$DSHOME$ . ./dsenv
$DSHOME$ uv -admin -start
DataStage Engine 126.96.36.199 instance “adb” has been brought up.
JobMonApp has been started.
With the above steps the NLS language support functionality is disabled in Datastage 8.5.
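The command-line part of the procedure (Steps 3, 4 and 6) can be collected into one small wrapper. This is a sketch under assumptions, not the exact script we ran: it assumes uvregen and uv live under $DSHOME/bin, that uvconfig and dsenv have already been edited as described in Steps 2 and 5, and it adds a hypothetical DRY_RUN switch so the sequence can be reviewed before anything is executed.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
# DRY_RUN is a hypothetical switch for this sketch, not a Datastage setting.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Regenerate the configuration, stop the engine, reload dsenv, start the engine.
# The body runs in a subshell so the caller's directory and environment are untouched.
restart_engine_after_nls_change() (
    cd "${DSHOME:?DSHOME must be set}" || exit 1
    run bin/uvregen || exit 1
    run bin/uv -admin -stop || exit 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: . ./dsenv"
    else
        . ./dsenv || exit 1
    fi
    run bin/uv -admin -start
)
```

Calling it first with DRY_RUN=1 prints the four commands in order without touching the engine; run it for real only as the Datastage Administrator user and when no jobs are running.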
Step 7: Rebuild the indexes for all the Datastage projects using uv DS.TOOLS (option 2 in the menu)
DataStage Tools Menu
1. Report on project licenses
2. Rebuild Repository indices
3. Set up server-side tracing >>
4. Administer processes/locks >>
5. Adjust job tunable properties
Which would you like? ( 1 – 5 ) ?2
Once we disabled the NLS feature in Datastage 8.5, we gained up to 30% over the existing performance levels. We also made the following recommendations to the development team to improve performance further.
1. Use Oracle OCI Stage while designing the Datastage jobs instead of the DRS Connector stage.
2. Redesign the longest-running jobs as “parallel” jobs instead of “server” jobs. We ran a POC for this, which showed a performance gain of 50% to 60%.
Even though these steps did not completely uninstall the NLS feature, they disabled the NLS capability, which resulted in better performance.
If you are planning a Datastage 8.5 installation, please keep these points in mind and plan your installation accordingly.