Ajilius makes it easy to protect the quality of data being loaded to the data warehouse. We do this by the implementation of data quality screens on all load tables.

When extracting and loading data, Ajilius will validate that the data conforms to screen rules you define, and reject rows which do not conform, optionally terminating the batch when a pre-defined reject limit has been exceeded.

Data quality screens are implemented using the Data Quality Control block of the Column Edit screen:

Three types of data quality screen are supported, by making a selection from the DQ Validation dropdown:

  • Data Type
  • Range/s
  • Regex

Data Type

Data Type validation checks the incoming data value to be sure it can be converted to the data type that has been assigned to the column. For example, if you have assigned the “integer” data type to a column, a value of “ABC” loaded from a delimited file for that column would cause validation to fail.

Data Type validation is automatically performed as a precursor to Range and Regex validation.

Range/s

Range validation checks the incoming data value to ensure that it is between a minimum and maximum value.

A range is expressed as:

from:to

Leading and trailing spaces around the from and to values will be trimmed before comparison.

Multiple ranges may be separated by semi-colons:

from1 : to1 ; from2 : to2 ; from3 : to3

Validation will pass if the incoming value is within any of the specified ranges.

Ranges may also operate as lists. Single values may be entered, and mixed with ranges:

VIC; NSW; QLD; SA; NT; WA; TAS; ACT

1; 2; 11-19; 100-1000

Regular Expressions

Regex validation tests a regular expression against the string representation of the incoming data value.

A regular expression may be entered in the DQ Expression text box.

The regular expression must return a true/false value. True means that the incoming value will be accepted, false means it will be rejected.

A worked example of regular expression validation may be found in the following blog post:

Data Quality with Regular Expressions

 

On saving a column, the presence of validation rules will be highlighted in the column list:

The handling of rejects can be set through the Change Table screen for a load table:

The job will trigger an error if more than the limit number of rows have failed validation, and been rejected.

Rejected rows will be written to a file named <table>_rejects.csv. This file will be created in the Extract Directory folder from the Data Warehouse screen:

The reject file will contain the columns of the incoming data file, plus one additional column holding the reason for which the row was rejected.