The Requirement: Converting rows to columns

Source:

Customer  Product  Cost
Cust1     P1       10
Cust1     P2       20
Cust1     P3       30
Cust2     ABC      10
Cust2     P2       25
Cust2     Def      10

Target (flattened):

Customer  Product1  Cost1  Product2  Cost2  Product3  Cost3
Cust1     P1        10     P2        20     P3        30
Cust2     ABC       10     P2        25     Def       10

The illustration above captures the requirement: multiple records had to be merged into one record based on certain criteria. The design had to be reusable, since each dimension within the data mart required this flattening logic.

1. Approach:

An Aggregator transformation would group the records by a key, but retrieving the values of a particular column as individual columns is a challenge. Hence we designed a component, 'Flattener', based on an Expression transformation.

Flattener is a reusable component: a mapplet that performs the function of flattening records.

Flattener consists of an Expression and a Filter transformation. The Expression is used to club each incoming record with its group based on certain logic; the decision to write the record to the target is taken by the Filter transformation.
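The idea can be sketched outside Informatica as a simple group-and-accumulate pass. This is only an illustration in Python, using the sample data from the tables above, not the mapplet itself:

```python
from itertools import groupby

# Sample rows matching the illustration above (already ordered by customer).
rows = [
    ("Cust1", "P1", 10), ("Cust1", "P2", 20), ("Cust1", "P3", 30),
    ("Cust2", "ABC", 10), ("Cust2", "P2", 25), ("Cust2", "Def", 10),
]

def flatten(records):
    """Merge all rows sharing a customer key into one wide record."""
    out = []
    for customer, group in groupby(records, key=lambda r: r[0]):
        flat = [customer]
        for _, product, cost in group:
            flat.extend([product, cost])  # one (Product, Cost) set per row
        out.append(tuple(flat))
    return out

print(flatten(rows))
# [('Cust1', 'P1', 10, 'P2', 20, 'P3', 30),
#  ('Cust2', 'ABC', 10, 'P2', 25, 'Def', 10)]
```

The mapplet achieves the same effect row by row, without buffering whole groups, as described in the Design section below.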

2. Design:

The mapplet can receive up to five inputs, of the following data types:

i_Col1 (string): Customer

i_Col2 (string): Product

i_Col3 (decimal): Cost

i_Col4 (decimal)

i_Col5 (date/time)

The names are kept generic and the ports cover different data types, so that the mapplet can be used in any scenario where records need to be flattened.

The mapplet gives out 15 sets of five output ports each, in the following manner:

o_F1_1 (string): Customer

o_F2_1 (string): Product1

o_F3_1 (decimal): Cost1

o_F4_1 (decimal)

o_F5_1 (date/time)

o_F1_2 (string): Customer

o_F2_2 (string): Product2

o_F3_2 (decimal): Cost2

o_F4_2 (decimal)

o_F5_2 (date/time)

… and so on

The output record contains repeating sets of five columns each; each set corresponds to one incoming row. Based on the requirement, the number of occurrences of these sets can be increased, and only the required fields need to be mapped. For the example above we map just two strings and one decimal, for Customer, Product and Cost.

The mapplet receives records from its parent mapping. The Expression first saves each incoming value to a variable port and compares it with its counterpart from the earlier row, which it holds in its cache for as long as the condition to flatten is satisfied.

Syntax to store current and previous values (port name, data type, port type, expression):

i_Col2     string  i

prv_Col2   string  v  curr_Col2

curr_Col2  string  v  i_Col2
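Variable ports are evaluated top to bottom and retain their values across rows, so prv_Col2 reads curr_Col2 before curr_Col2 is overwritten with the new input. The same two-variable trick in plain Python, as an illustration only:

```python
def track(values):
    """Return (previous, current) pairs exactly as the two variable ports see them."""
    pairs = []
    prv, curr = None, None
    for v in values:
        prv = curr    # prv_Col2 string v curr_Col2  (reads the old curr)
        curr = v      # curr_Col2 string v i_Col2    (then curr is updated)
        pairs.append((prv, curr))
    return pairs

print(track(["P1", "P2", "P3"]))
# [(None, 'P1'), ('P1', 'P2'), ('P2', 'P3')]
```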

The condition/logic to flatten records is parameterized and decided before the mapping is called, which increases the code's scalability. The parameterized logic is passed to the Expression transformation via a mapplet parameter; the value is evaluated as an expression, and the result is a flag value of either 1 or 2.

Syntax for port – flag

Flag integer v $$Expr_compare

An example for parameterized expression

$$Expr_compare = iif(curr_Col1 = prv_Col1 AND curr_Col2 != prv_Col2, 1, iif(curr_Col1 != prv_Col1, 2))

A variable port named “rec_count” is incremented, based on the flag.

Syntax for port – rec_count

rec_count integer v iif (flag=2,0, iif (flag=1,rec_count + 1,rec_count))
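Tracing the two ports together makes their interaction clearer. The Python below mirrors the flag and rec_count expressions for (Customer, Product) rows; it is a hypothetical helper for illustration, with None standing in for the NULL that a two-argument iif() returns when both tests fail:

```python
def flag_and_count(rows):
    """Mirror the 'flag' and 'rec_count' variable ports for (Customer, Product) rows."""
    prv_c1 = prv_c2 = None
    rec_count = 0
    trace = []
    for curr_c1, curr_c2 in rows:
        # $$Expr_compare: 1 = same customer, new product; 2 = new customer
        if prv_c1 == curr_c1 and prv_c2 != curr_c2:
            flag = 1
        elif prv_c1 != curr_c1:
            flag = 2
        else:
            flag = None  # two-argument iif() yields NULL when both tests fail
        # rec_count: iif(flag=2, 0, iif(flag=1, rec_count + 1, rec_count))
        rec_count = 0 if flag == 2 else (rec_count + 1 if flag == 1 else rec_count)
        trace.append((curr_c1, curr_c2, flag, rec_count))
        prv_c1, prv_c2 = curr_c1, curr_c2
    return trace

print(flag_and_count([("Cust1", "P1"), ("Cust1", "P2"),
                      ("Cust1", "P3"), ("Cust2", "ABC")]))
# [('Cust1', 'P1', 2, 0), ('Cust1', 'P2', 1, 1),
#  ('Cust1', 'P3', 1, 2), ('Cust2', 'ABC', 2, 0)]
```

So rec_count ends up numbering the rows within each group (0, 1, 2, …), which is exactly what the placeholder expressions need.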

The Expression transformation now uses the values in ports "flag" and "rec_count" to decide the placeholder for each incoming input value, i.e. the column in the target table that the data will ultimately move into. This process is iterative and goes on as long as the comparison logic ($$Expr_compare) holds good, i.e. until all records are flattened per the logic. An example of the placeholder expression is shown below:

v_Field1 data type v iif(flag=2 AND rec_count=0,curr_Col1, v_Field1)

Port "write_flag_1" is set to 1 when the comparison logic fails, meaning flattening of the current group is complete. The Filter transformation then lets the record through once it is completely transposed.

Filter condition:

write_flag_1 integer v iif (flag=2 AND write_flag > 1, 1, 0)
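Putting the pieces together, one way to read the Expression + Filter pair is sketched below. This is a rough Python simulation under stated assumptions, not the exact port set: it assumes the same customer never repeats the same product on consecutive rows, and it flushes the last group at end-of-data, which the real mapping would handle with a final flush record or session-level logic:

```python
def flattener(rows, max_sets=3):
    """Rough simulation of the Flattener for (Customer, Product, Cost) rows.

    The placeholder slot for each incoming row is chosen by rec_count;
    the record is released (the Filter passes it) when a new customer
    arrives, i.e. when the comparison logic fails."""
    out = []
    fields = None          # stands in for the v_Field* placeholder ports
    prv_cust = None
    rec_count = 0
    for cust, prod, cost in rows:
        if cust != prv_cust:                  # flag = 2: comparison fails
            if fields is not None:
                out.append(tuple(fields))     # write_flag fires: group done
            fields = [cust] + [None] * (2 * max_sets)
            rec_count = 0
        else:                                 # flag = 1: same group, next set
            rec_count += 1
        fields[1 + 2 * rec_count] = prod      # this set's Product slot
        fields[2 + 2 * rec_count] = cost      # this set's Cost slot
        prv_cust = cust
    if fields is not None:
        out.append(tuple(fields))             # flush the last group
    return out

sample = [
    ("Cust1", "P1", 10), ("Cust1", "P2", 20), ("Cust1", "P3", 30),
    ("Cust2", "ABC", 10), ("Cust2", "P2", 25), ("Cust2", "Def", 10),
]
print(flattener(sample))
# [('Cust1', 'P1', 10, 'P2', 20, 'P3', 30),
#  ('Cust2', 'ABC', 10, 'P2', 25, 'Def', 10)]
```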

3. Outcome:

After developing and implementing the code we found it to be a useful utility, so we thought of sharing it. We would like to hear suggestions from readers on other ways to perform the same functionality. Please do share your views.

Posted by Ganesan Thiyagarajan
September 14th, 2009

Comments (7)

nikhil - January 14th, 2011

Raj, it's very simple: take one Normalizer transformation, create two ports in it, one for the month and another for the numeric value, set occurs = 4 on the month port, and save the mapping. Try it; you may get your desired output.
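What that pipeline produces can be checked against raj's numbers from the comment below. Python here only to illustrate the Normalizer-then-Aggregator result, not the transformations themselves:

```python
# raj's sample from the thread: four month columns, three rows.
data = [
    {"JAN": 100, "FEB": 200, "MAR": 300, "APR": 400},
    {"JAN": 500, "FEB": 600, "MAR": 700, "APR": 800},
    {"JAN": 900, "FEB": 100, "MAR": 200, "APR": 300},
]

# Normalizer step: pivot each row into (MONTH, value) pairs;
# Aggregator step: SUM grouped by MONTH.
totals = {}
for row in data:
    for month, value in row.items():
        totals[month] = totals.get(month, 0) + value

print(totals)  # {'JAN': 1500, 'FEB': 900, 'MAR': 1200, 'APR': 1500}
```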

raj - October 24th, 2010

Converting rows to columns: I have a relational source like this:

JAN FEB MAR APR
100 200 300 400
500 600 700 800
900 100 200 300

I need to convert these rows into columns in the target:

MONTH TOTAL
JAN 1500
FEB 900
MAR 1200
APR 1500

Please, experts, help me.

sachin - March 23rd, 2010

Srinivas, could you please let me know about the target structure, since I am not able to flatten the records.

D.Fial - December 3rd, 2009

Can you tell me how to transpose rows to columns? It's very simple:

1 2 3 4
2 4 5 6
5 4 3 1

into

1 2 5
2 4 4
3 5 3
4 6 1

Thank you very much.
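A pure matrix transpose like this one is a one-liner in most languages. A Python illustration with the same numbers:

```python
matrix = [[1, 2, 3, 4],
          [2, 4, 5, 6],
          [5, 4, 3, 1]]

# zip(*rows) pairs up the i-th element of every row, i.e. each column.
transposed = [list(col) for col in zip(*matrix)]
print(transposed)  # [[1, 2, 5], [2, 4, 4], [3, 5, 3], [4, 6, 1]]
```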

srinivas o - November 17th, 2009

You could also concatenate all values for a group, comma separated, pump them into a flat file, and read that flat file in the next step to load into the target database. This makes the logic easy. You do read the data twice, but as you are loading into a flat file and then loading 1-1, it shouldn't be a big bottleneck.
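The first pass of that two-step idea, collapsing each group into one delimited string, can be sketched as follows. Python for illustration only; in PowerCenter this would be an Aggregator with a concatenation expression writing to the flat file:

```python
def concat_by_group(rows):
    """First pass: collapse each customer's (product, cost) pairs into one CSV string."""
    groups = {}
    for cust, prod, cost in rows:
        groups.setdefault(cust, []).append(f"{prod},{cost}")
    return {cust: ",".join(parts) for cust, parts in groups.items()}

print(concat_by_group([("Cust1", "P1", 10), ("Cust1", "P2", 20),
                       ("Cust2", "ABC", 10)]))
# {'Cust1': 'P1,10,P2,20', 'Cust2': 'ABC,10'}
```

The second pass would then split each string back into columns while loading the target.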

srinivas o - November 17th, 2009

Agree with Dilip. In fact, the Informatica Knowledge Base has an example as well.

Dilip - October 27th, 2009

I think this objective could also be achieved using the FIRST function in the Aggregator.

Comments are closed.