Enhancement: Reset Incremental Loads

We like to say that Ajilius is driven by our users, and here is another example.

Release 2.2.16, due Sunday, will include a feature to reset incremental loads, allowing a load to be rerun even after it succeeds. One driver for this feature is detection of a source system error that does not affect the running of a load, but does affect the data content of the load.

The “ajilius” schema of your data warehouse contains a table named “ajilius_control”. This table holds one row for each column for which incremental values are tracked. We have modified this table to include a “previous” and “latest” value.

On successfully processing a load, we update the “previous” value with the contents of the “latest” value (the one that was used to process the load), and overwrite the “latest” value with the maximum column value from the load.

Now, you can select a menu option Schedule | Reset Controls:

reset1

On selecting this option, you will be shown a screen listing all tables for which incremental controls have been defined:

reset2

When you press the Reset button, the control values of any checked table will be reset to their previous state. This enables the incremental load to be repeated.
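
The rotation and reset of control values described above can be sketched in Python. This is an illustration only: the `ControlRow` class and both function names are hypothetical, and the sketch assumes that a reset simply copies “previous” back into “latest”. The real `ajilius_control` table may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlRow:
    """Hypothetical in-memory stand-in for one row of ajilius_control."""
    table: str
    column: str
    previous: Optional[int] = None  # value the last successful load started from
    latest: Optional[int] = None    # value the next load will start from


def record_successful_load(row: ControlRow, max_loaded_value: int) -> None:
    """After a successful load, move latest into previous, then store the new maximum."""
    row.previous = row.latest
    row.latest = max_loaded_value


def reset_control(row: ControlRow) -> None:
    """Assumed reset behaviour: roll latest back so the last load can be repeated."""
    row.latest = row.previous


row = ControlRow("load_orders", "order_id", previous=100, latest=200)
record_successful_load(row, 350)  # previous is now 200, latest is 350
reset_control(row)                # latest is back to 200
print(row.previous, row.latest)
```

After the reset, the next incremental load would start from the same control value as the load that has just been undone.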

Ajilius. Tipping the hat to Steve 😉

Data Quality Reject Limits

Ajilius 2.2.16, due for release later this week, will include a feature to enable a reject threshold to be set for data quality screens.

As shown in the following screen, you can set a limit on the number of rejects; if the number of rejects exceeds this limit, the job will be cancelled:

reject_limit1

If the limit is exceeded, an error message and exception will be generated. This is an example of how that appears during interactive testing of load scripts:

reject_limit2

When rejects occur without exceeding the reject limit, the job will succeed, but a warning message will be written to the processing log. Here is an example from a test batch:

reject_limit3
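
The three outcomes described above (clean load, warning, cancelled job) can be sketched as a small Python function. The names `check_reject_limit` and `RejectLimitExceeded` are hypothetical and are not part of Ajilius. The sketch also assumes that a reject count exactly equal to the limit still counts as within the limit.

```python
import logging

logger = logging.getLogger("load")


class RejectLimitExceeded(Exception):
    """Raised when the number of rejected rows goes over the configured limit."""


def check_reject_limit(reject_count: int, reject_limit: int) -> str:
    """Return 'ok' or 'warning', or raise if the limit is exceeded."""
    if reject_count > reject_limit:
        # Over the limit: stop the job with an error.
        raise RejectLimitExceeded(
            f"{reject_count} rejects exceed the limit of {reject_limit}"
        )
    if reject_count > 0:
        # Some rejects, but within the limit: succeed and log a warning.
        logger.warning("%d rows rejected (limit %d)", reject_count, reject_limit)
        return "warning"
    return "ok"
```

A load script could call such a check once after all rows have been screened, or after each batch if the job should stop as early as possible.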

Ajilius. Fine tuning data quality.

Data Quality with Regular Expressions

Ajilius currently supports three types of data quality screens on data being loaded to the warehouse:

  • Data Type
  • Range/s
  • Regex

We’ve previously posted about type and range validation, but we recently had an enquiry about the use of Regular Expressions (regex) for data validation. Let’s build an example based on Postal Code validation.

The Person.Address table in AdventureWorks2014 contains a Postal Code column. Addresses are international, with many different formats of Postal Code. For the purposes of this demonstration, we are going to validate the column against Canadian Postal Codes. We’ll use a regular expression taken from the Regular Expressions Cookbook, 2nd Edition:

    ^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$
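
To get a feel for what this pattern accepts before wiring it into a load, it can be tried outside Ajilius. Here is a minimal sketch using Python’s `re` module; the helper name `is_canadian_postal_code` and the sample values are made up for illustration, and other regex engines may differ in small details.

```python
import re

# The pattern from the text: no D, F, I, O, Q or U anywhere, the first
# letter is never W or Z, and the space in the middle is optional.
CANADIAN_POSTAL_CODE = re.compile(
    r"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$"
)


def is_canadian_postal_code(value: str) -> bool:
    """Return True if the value matches the Canadian postal code pattern."""
    return CANADIAN_POSTAL_CODE.match(value) is not None


samples = ["K1A 0B1", "V5K0A1", "98011", "D1A 1A1", "W1A 1A1", "k1a 0b1"]
for code in samples:
    verdict = "pass" if is_canadian_postal_code(code) else "reject"
    print(f"{code!r}: {verdict}")
```

The first two samples pass and the other four are rejected. Note that the pattern is case-sensitive in Python, so a lower-case code such as `k1a 0b1` is rejected as well.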

We’ll start from the point where we have imported the Person.Address metadata into our repository:

regex1

Click Change in the right-side panel, then modify the postal_code column. Scroll down to the Data Quality section, and make the following changes:

regex2

Save your changes, go back to the Load List, and select the Scripts option for the load_person_address table.

Notice the new section of script which has been generated. This is the validator that will be applied to this column at load time.

    rs = new AjiliusResultSet (rs)
    rs.setValidator('postal_code','text','regex','^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$','1')

Now run the script, and watch the results at the bottom left of the screen:

regex3

As you can see, only 1090 rows passed the validation. And when we view the contents of the table, we see that these rows do match the Canadian Postal Code format defined in our regular expression:

regex4

A word of caution: regex validation is slower than type and range checking. Without regex validation, that same table loaded in 0.749 seconds, and the difference was due entirely to the regex algorithm. If you have a choice, use range checking instead.

Ajilius. Better data screens.

Enhancement: Cached Metadata

Ajilius enables browsing of source system metadata, to identify and load data into the warehouse. Here is a typical display:

metadata01

A click on a table, on the left side, shows the columns for that table on the right side.

Until now, each update of this screen represented a round trip to the database. That was fine for a data source located close to the Ajilius server, but users in hybrid environments reported slow screen updates when pulling cloud metadata to on-premise Ajilius. This was particularly apparent with large Oracle systems, which add thousands of system table entries to a metadata set.

We’ve now added a feature to cache metadata within Ajilius. This means the delay of updating metadata happens only once, and subsequent interactions are as fast as an on-premise solution.

The context menu for a data source now contains an option to Refresh Metadata:

metadata02

When you select this option, you will be prompted with a screen warning that a metadata update might be slow. In this case, “slow” means a few seconds.

metadata03

On running this refresh, an internal metadata cache is added to your data warehouse metadata repository, and subsequent calls to browse metadata or load metadata into the warehouse will be drawn from this cache.
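
The idea behind the cache can be sketched in a few lines of Python. This is not the Ajilius implementation: the `MetadataCache` class, its methods and the fake fetch function are all hypothetical, and the real cache is stored in the metadata repository rather than in memory.

```python
from typing import Callable, Dict, List

# Maps table name -> list of column names.
Metadata = Dict[str, List[str]]


class MetadataCache:
    """Hypothetical cache: one slow fetch per refresh, then local reads."""

    def __init__(self, fetch_from_source: Callable[[], Metadata]) -> None:
        self._fetch = fetch_from_source
        self._tables: Metadata = {}

    def refresh(self) -> None:
        """The only step that needs a connection to the source."""
        self._tables = self._fetch()

    def tables(self) -> List[str]:
        """List cached table names without touching the source."""
        return sorted(self._tables)

    def columns(self, table: str) -> List[str]:
        """Return cached column names for one table."""
        return self._tables[table]


fetch_calls = []  # records each round trip to the (fake) source


def fake_source_fetch() -> Metadata:
    fetch_calls.append("fetch")
    return {"person_address": ["address_id", "city", "postal_code"]}


cache = MetadataCache(fake_source_fetch)
cache.refresh()                    # one round trip to the source
cache.columns("person_address")    # served from the cache
cache.columns("person_address")    # served from the cache
print(len(fetch_calls))
```

However many times the table is browsed after the refresh, the source is contacted only once.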

You may refresh the cache at any time.

An added bonus is that once your data source metadata is cached, you no longer need to be connected to the source whilst working with source metadata. That’s a great feature for companies with high-risk data, and for people who like to take their work home with them 🙂

Ajilius. More flexible metadata.

Multi-generational upgrades

Lately we’ve found that our pace of delivery has outstripped the ability of some users to keep up with upgrades.

We have been expecting users to apply each upgrade as it is issued. In practice, that hasn’t always been possible. Emails might have been missed, or user priorities might have been elsewhere, and the result has sometimes been that support was needed to work through the issues of upgrading multiple generations at a time.

We also had a problem this week with a customer who restored a metadata repository from a backup that was several months old. This meant that their metadata was out of step with their current application and repositories.

We’re fixing that problem this week. Release 2.2.14 will bring a new method of versioning upgrades. The version of the repository will be recorded, and the upgrade patches will include multiple generations of upgrade history.

On applying a patch to Ajilius, any repository that is not at the current level will be upgraded, even if every repository is at a different release.

Only one patch will be required to bring your metadata up to the latest version, even if it is several generations old. Further, any metadata that has been restored from a backup will also be brought up to date, simply by rerunning the upgrade process.
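
The general technique of versioned, multi-generation upgrades can be sketched as follows. This is an assumption about the approach and not the actual Ajilius patch code: the `Repository` class, the `STEPS` list and the step descriptions are all placeholders, and the starting versions are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Version = Tuple[int, int, int]


@dataclass
class Repository:
    """Hypothetical metadata repository that records its own version."""
    name: str
    version: Version
    history: List[str] = field(default_factory=list)


# Each entry: (version the step upgrades to, function applying that step).
UpgradeStep = Tuple[Version, Callable[[Repository], None]]

# Placeholder steps; a real patch would carry the actual schema changes.
STEPS: List[UpgradeStep] = [
    ((2, 2, 14), lambda repo: repo.history.append("step for 2.2.14")),
    ((2, 2, 15), lambda repo: repo.history.append("step for 2.2.15")),
    ((2, 2, 16), lambda repo: repo.history.append("step for 2.2.16")),
]


def upgrade(repo: Repository, steps: List[UpgradeStep] = STEPS) -> List[Version]:
    """Apply, in order, every step newer than the repository's recorded version."""
    applied: List[Version] = []
    for version, apply_step in sorted(steps, key=lambda s: s[0]):
        if version > repo.version:
            apply_step(repo)
            repo.version = version
            applied.append(version)
    return applied


restored = Repository("restored_from_backup", (2, 2, 13))
current = Repository("already_current", (2, 2, 16))
print(upgrade(restored))  # all three steps are applied
print(upgrade(current))   # nothing to do
```

Because each repository carries its own version, one patch can bring several repositories at different levels up to date, and rerunning the upgrade on a current repository changes nothing.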

Ajilius. Flexible Upgrades.