Blog Network Home > Informatica Way > Pattern Matching in Informatica Power Center

Pattern Matching in Informatica Power Center

by Karthik Raja R on December 20, 2013 in Informatica Way

We are having several functions in Informatica, and each function will be used depends upon the requirements. One of the most useful and less used functions is Regular Expressions. That is, ‘REG_’ functions. We can find these functions under Data Cleansing in Functions. There are 3 regular expressions available in Informatica: REG_EXTRACT, REG_MATCH and REG_REPLACE. All these regular expressions are pattern based functions rather than format based (example, date format) or data based. Means that, when the records are dynamic we cannot check the same char or string to all the incoming records. Those records can be handled by Regular Expressions.

REG_MATCH

This function will simply return TRUE if the data matches the pattern else FALSE.

Syntax

 
REG_MATCH (subject, pattern)
 

Scenario-1:

One of the client requirements is to filter out the records based upon the particular pattern

MM-YYYY’ from source files. Since the source files are having the manually entered data, the records are not consistent. If a normal scenario like, Sal >10000 or empid between 1000 and 5000 means, we can use the conditions directly in the filter transformation and filter out the records. But when the source records are dynamic and to use some specific pattern to filter out the records, we don’t have many options here. For the above requirement we have used REG_MATCH function.

REG_MATCH (Value,‘(\w+\d\d-\d\d\d\d)’)

\w+ à  is any value that follows an alphanumeric character.

\d       à  matches one instance of any digit from 0-9.

The records whichever is not meeting the above pattern
will be filtered out.
Please find the below screenshots,
for the source and target records details.
Source





Target




REG_REPLACE
It replaces characters in a string with another character pattern.
By default, REG_REPLACE searches the input string for the character
pattern you specify and replaces all occurrences with the replacement
pattern.
You can also indicate the number of occurrences of the pattern
you want to replace in the string.

Syntax

REG_REPLACE (subject, pattern, replace, numReplacements)
 

Scenario-2:

One of the client requirements is to normalize the data by replacing the

incoming data from inconsistent date format to normalized date format based upon the first occurrence.

If the requirement is to replace a particular char or string,

generally we will use Repalcechr or Replacestr functions.

But, Repalcechr or Replacestr functions will replace all the occurrences of the particular char or string.

Since the requirement is to replace a particular pattern with the first occurrence,

we have used REG_REPLACE function to address the above requirement.

o-by � fn��?� normal'>Scenario-2:

One of the client requirements is to normalize the data by

replacing the incoming data from inconsistent date format to normalized date format based upon the first occurrence.

If the requirement is to replace a particular char or string,

generally we will use Repalcechr or Replacestr functions.

But, Repalcechr or Replacestr functions will replace all the occurrences of the particular char or string.

Since the requirement is to replace a particular pattern with the first occurrence,

we have used REG_REPLACE function to address the above requirement.

REG_REPLACE (Value, ‘/’,’-’, 1)

1        à Specifies the number of occurrences you want to replace. If you omit this option,

REG_REPLACE will replace all occurrences of the character string r

Replacestr functions will replace all the occurrences of the particular char or string.

Since the requirement is to replace a particular pattern with the first occurrence,

we have used REG_REPLACE function to address the above requirement.

o-by � fn��?� normal'>Scenario-2:

One of the client requirements is to normalize the data by replacing the incoming data from inconsistent

date format to normalized date format based upon the first occurrence.

If the requirement is to replace a particular char or string,

generally we will use Repalcechr or Replacestr functions.

But, Repalcechr or Replacestr functions will replace all the occurrences of the particular char or string.

Since the requirement is to replace a particular pattern with the first occurrence,

we have used REG_REPLACE function to address the above requirement.

REG_REPLACE (Value, ‘/’,’-’, 1)

1        à Specifies the number of occurrences you want to replace. If you omit this option,

REG_REPLACE will replace all occurrences of the character string.
Source

Target

Here the first occurrence of ‘/’ only has been replaced by ‘-‘.

The remaining occurrences were not replaced with ‘-‘.

The similar kind of implementation is not possible through Repalcechr or Replacestr functions.

REG_EXTRACT

Extracts sub patterns of a regular expression within an input value.

Syntax

REG_EXTRACT (subject, 'pattern', subPatternNum, match_from_start)

Scenario-3:

One of the client requirements is to extract the date in each record and to store the value in the target.

Since the requirement is to extract the particular string based upon the pattern ‘MM-YYYY’,

we can’t use the substr () function here.

Because, we can’t determine the exact start and end positions of the substring.

So, we have used REG_EXTRACT function to address the requirement.

Both subPatternNum, match_from_start mentioned in the syntax are optional.

REG_EXTRACT (Value,’ (\w+) (\d\d-\d\d\d\d)’, 2)

\w+ à  is any value that follows an alphanumeric character.

\d à  matches one instance of any digit from 0-9.

2 à Sub pattern number of the regular expression you want to match. Use the following

Guidelines to determine the subpattern number:

No value or 1. Extracts the first regular expression subpattern.
2. Extracts the second regular expression subpattern.
n. Extracts the nth regular expression subpattern.

Default is 1.

Source

Target


You might want to read these awesome related posts


Leave a Comment