Persistent Staging is typically used to keep data from source systems in its original state.
One common reason for doing this is to have a full set of source system data available to support future changes that you may wish to make to star schemas. Having the data already in the data warehouse makes it simple to recreate or modify the star schema at will.
There are two types of persistent staging supported by Ajilius:
Latest persistent staging maintains the last known state of every row in the table. Rows which are added to the source system will be added to the persistent staging table. Rows which are modified in the source system will be modified in the persistent staging table. Rows which are deleted in the source system will be retained, in their last known state, in the persistent staging table.
Historic persistent staging maintains the history of the changes to every row in the source system table. Like latest persistent staging, new rows are added as they are received from the source system. Unlike latest persistent staging, when a source system row is modified, the previous version is retained and the new version is added.
In this tutorial, we will walk through the creation of both Latest and Historic persistent staging tables. We will use the Genre table from the ChinookOLTP sample database as our source.
Latest Persistent Staging
To create a Latest Persistent Staging table from a Load table, you must have defined a business key on the Load table. This is because business keys are used to match rows when deciding to update or insert incoming data.
Here, we have loaded the ChinookOLTP.Genre table metadata into our warehouse repository, and we have defined the business key to be the genre_id column.
Next, choose the Create Stage option from the context menu dropdown for the load_genre table:
Enter the details for the new staging table, making sure to select the Persistent Staging Transform Type, and turn off the truncation of the table before loading:
After pressing the ADD button to save the new table definition, select the Scripts option from the context menu for the stage_genre_persist table:
Scroll down your script window, and notice the code that has been generated to maintain this table. We’re using SQL Server as our target in this tutorial, and the script looks like this:
If we create the table, then run our Transform twice in succession, we get the following results:
This is because the first Transform found no rows in the table, and all rows were inserted. The second run found that every row had a match in the table (we’re reloading the same data), and every row was updated.
Historic Persistent Staging
Slightly more complex is the creation of an Historic Persistent Staging table. This is because we need to add some additional metadata to the data we load, to manage change tracking and version dates.
Let’s start again, with our initial metadata for the load_genre table. This time, press the Change link to show the column editor:
Press the Add link in the action bar, displaying the following form to add a column. Firstly, we’re going to add a Load Date/Time column to the table, showing when we loaded each row. All you have to do is select the DateTime option from the Column Role dropdown, and Add the new column.
Repeat this step, but this time choose the Hash value for Column Role. A hash is a special value that uniquely represents the contents of the row. Change any value in the row, and we’ll get a new hash value. Ajilius dynamically calculates this hash as rows are loaded, and we’ll use it later for change detection.
After adding the Hash column, your column list should look like this:
Now, run your scripts to create and load the load_genre table. When these scripts have run successfully, you should see the following contents for this table. You can see that we’ve added two columns, named ajilius_datetime and ajilius_hash.
Now let’s transform the load_genre table into a persistent staging table using the same method we used in the previous section.
- Select the Create Stage option from the context menu for the load_genre table
- Set the following details for the new staging table:
- Table Name = stage_genre_persist
- Transform Type = Persistent Staging
- Truncate on Load = Off
- Add the table
When you have completed these actions, Choose the Scripts option from the context menu for the stage_genre_persist table. Scroll down, and see the changed code that has been generated because we added the hash and load time to the table:
Create the table, then run the Transform twice in succession. Here are the results you should see:
In contrast to the previous example, this time there were no rows updated in the second pass. This is because the contents of each row were the same, no new data had been received.
Let’s test this by manually modifying one of the source rows. Using a database editor, we’ll add the word “TEST” to the end of the last row description.
Now, when we reload the Load table, and run the Transform process for the Stage table, we see the following result when we view the data. We have turned off the current row flag (ajilius_current), and added a new row that reflects the latest state of the table.
Persistent Staging tables are often used to maintain a history of source system data, so that changes to a dimension model can be implemented with historic data in context.
In this tutorial we have covered two types of persistent staging table supported by Ajilius:
- Latest, which maintains only the latest version of the row associated with each business key.
- Historic, which maintains a copy of each row for every update.
If you have any questions, please contact us for support.