Introduction

The table-level logic in your data warehouse is implemented through a series of stage table transformations.

The types of transformation supported by Ajilius are:

  • Select / Join
  • Group By
  • Deduplicate
  • Unpivot
  • Flatten Hierarchy
  • Union
  • Union All
  • Persist Append
  • Persist Update
  • Persist Change
  • Persist Static

Select / Join

Select / Join transformations are the basic building block of the data warehouse. One or more tables may be joined together, and one or more columns from the joined data set selected for output.

Always define your business key (BK) columns on tables to be joined, or they may not show up in the join selection list.

Refer to the Joining Tables tutorial for more details.

Group By

A table may have selected columns aggregated using the Group By transformation.

Mark your business key (BK) columns on the destination table, and apply an aggregate transform (sum, min, avg, etc.) to the remaining columns.

From:               To:
C1  C2  C3          C1  C2  sum(c3)
----------          ---------------
A   B   1           A   B   3
A   B   2           B   C   7
B   C   3
B   C   4

Deduplicate

Duplicate rows may be removed from a table using the transformation. The query applies the “distinct” keyword to the select clause, ensuring that only unique rows are retained.

From:               To:
C1  C2  C3          C1  C2  C3
----------          ----------
A   B   C           A   B   C
B   C   D           B   C   D
B   C   D           D   E   F
D   E   F

Unpivot

An unpivot transform rotates a table of values from a repeating group of columns, to a normalised set of row values.

For example, an unpivot may transform data like this:

Budget (before):             Budget (after):
Product  Q1 Q2 Q3 Q4         Product  Period   Value
--------------------         -----------------------
P1       10 20 30 40         P1       Q1       10
                             P1       Q2       20
                             P1       Q3       30
                             P1       Q4       40

This transform requires that you use the Unpivot Key (UK), Unpivot Column (UC) and Unpivot Value (UV) column roles.

When you create the transform, new calculated columns will be added to your destination table to hold the Unpivot Key (Period, in this case) and the Unpivot Value (Value, in this example). These are new columns that will replace the columns to be unpivotted. Set their data types as you require.

Then, mark each of the columns to be transformed using the Unpivot Column role. In this example, columns Q1, Q2, Q3 and Q4 will be the columns to be unpivotted.

Flatten Hierarchy

The most complex transformation in Ajilius creates a flattened representation of a self-referencing hierarchy.

Employee:                    Employee (flattened 2 levels):
ID  Name  ManagerID          ID  L1Name  L2Name
-------------------          ------------------
01  Mary  (null)             01  Mary    Mary
02  Fred  01                 02  Fred    Mary
03  Lee   01                 03  Lee     Mary

When you create the new stage table, assign a maximum depth to the hierarchy. All columns from the source table, other than the business key/s, will be repeated for this number of levels in the flattened table.

Delete the columns that you don’t want in the flattened hierarchy.

In the column editor, mark one column as the Business Key (BK) and one column as the Parent Key (PAK). The PAK column is that which references the Business Key (BK) to identify the parent row. For example, in a table of employees, the column employee_id might be the business key, and manager_id (referencing employee_id) the Parent Key.

The generated transform will create a fully populated set of (depth) columns, based on the parent/child relationships.

Union / Union All

Multiple tables may be concatenated using the Union and Union All transformations.

A Union transform results in a set of unique rows, a Union All includes duplicate rows if present in the concatenated set.

Refer to the Union tutorial for more information.

Persist Append, Persist Update, Persist Change, Persist Static

Persistent staging tables maintain a persistent set of data across batch runs.

The difference between the types of persistent stage are:

Persist Append All rows in the source (FROM) table are appended to the persistent-staging table.
Persist Update New rows in the source table are appended to the persistent staging table, existing rows are over-written with the new values
Persist Change New rows in the source table are appended to the persistent staging table, existing rows have a new version appended to the table if they have changed.

Note: The hash value which drives persistent change is calculated on the LOAD table driving this stage. A persistent change table is best driven directly from a LOAD table. If you do introduce joins, unions, transformations, etc. between the load and persistent tables, think carefully about the effect of these transformations on the load hash.

Persist Static

Keep a static copy of the source table, which is only reloaded as directed by the user. More details:

http://ajilius.com/2017/09/new-transform-persist-static/

 

Business Key (BK) columns must be defined for Persist Update and Persist Change transformations.

A hash (HASH) column must have been loaded for a Persist Change table.

A current row (CUR) column must be defined for a Persist Change table.