Running Totals

A customer asked if it was possible to compute a table of cumulative transaction values by day. There were business reasons why this could not be left to a BI tool, so we worked out the following method.

  1. Create a persistent stage table T1 to hold the values to be accumulated. Completely anonymised, these values were TX_DATE, DIM1, DIM2, QUANTITY.
  2. From that persistent table, create a GROUP BY stage table T2 with the columns DIM1, DIM2 and SUM(QUANTITY). Discard the TX_DATE column.
  3. Add a derived column, a date named AT_DATE, with the transform CURRENT DATE.

The last step – a derived column – was the key to the solution.

We made one change to validation rules to enable this to work, and we are now able to meet the customer’s requirement.

Version 2.1 – Data Quality Automation

The hero feature of Ajilius 2.1 is Data Quality Automation.

This is yet another unique feature brought to data warehouse automation by Ajilius.

In the 2.1 release, Ajilius adds three types of data quality screens to the extract process:

  • Data type validation, where values are tested for conformance to the column data type.
  • Range validation, where values are tested for set and range boundaries.
  • Regex validation, where values are tested against regex regular expressions.

In Version 2.3 (due September 2016) we will be adding Lookup validation to data quality rules, to check the existence of values in data warehouse tables.

Rows breaking validation are logged to an error file, along with the reason/s for row rejection.

A new return code from the extract job signals that validation errors have occurred, enabling the scheduler to choose to continue or suspend the batch pending user remediation of the errors.

And once again, we’re adding this as a standard feature of the Ajilius platform. If you’re licensed for Ajilius, upgrade to the latest version and you can immediately identify and screen data quality problems before they hit your data warehouse.

Ajilius. Committed to innovation in data warehouse automation.

Version 2.1 – Pivotal Greenplum

greenplumAjilius is pleased to announce full support for Pivotal Greenplum in Version 2.1, available now.

Greenplum is an open source MPP data warehouse, available on-premise and cloud. Based on an earlier version of PostgreSQL, Greenplum will shortly be upgraded to the latest PostgreSQL code base for even faster loads and transformations.

The advantage of Greenplum is that, being open source, it gives anyone the opportunity to use a real MPP data warehouse platform. All you need is hardware, or cloud, with realisable savings of hundreds of thousands of dollars over commercial offerings.

You now have a great, free, scalability path when your workload or data grows to exceed PostgreSQL capabilities. With the unique “3 click migration” feature of Ajilius, you can move your entire data warehouse at any time with just a few clicks of the mouse.

Ajilius. Keeping the value in data warehouse automation.

Data quality performance

inspectionWe’re introducing data quality screens in V2.1, to be released at the pgDayAsia conference in March. In this release, data quality screens implement data type, range and expression testing on selected tables and columns.

The last week has been spent running performance tests, to identify the overhead added by these screens.

On average, we currently process 500,000 values per second.

For example, if your daily fact extract has 10 million rows, and each row comprises 20 columns, you have a total of 200 million values to screen. At 500,000 values per second, a full data quality screen of every column would add around six minutes to your batch.

To put that number in scale, 10 million is roughly the number of bets placed at the TAB on Melbourne Cup day. Or around the number of sales transactions done by a major department store chain in one week. Or the total number of motor vehicles sold in one year, world-wide, by Toyota Motor Company. In other words, 10 million rows is a LOT of data, and we’re going to completely validate it with full data screening, guaranteeing that no bad data is loaded from every row and column in your extract, in less than 10 minutes.

Wow!

And even though I wrote “Wow!” about that validation, we are working to make it even faster by release day.

Ajilius. Putting the quality in data warehouse automation.

MySQL, MariaDB and Aurora

This afternoon I signed off the enhanced data source adapters for MySQL, MariaDB and Amazon’s new Aurora database.

These adapters are compatible with both on-premise and cloud-hosted databases, with full Unicode support.

We’re fully tested against the Employee, Sakila and Classic Models sample databases, so now it is time to see some customer databases being loaded.

That’s another big step on the path to Version 1.4, only the enhanced DB2 adapter remains in the queue.

Is your metadata secure?

Are your databases at risk from your data warehouse automation product?

We’ve seen DWA where user credentials for production systems are held in clear text!

If you’re using a data warehouse automation product other than Ajilius, use a query tool to open your metadata database. For example, if your product uses SQL Server, use Management Studio to open the metadata. Now look at the metadata tables which store data warehouse and data source credentials. Can you read them? If you can, so can anyone who reads that database.

All Ajilius user passwords – that is, the passwords that you use to access Ajilius – are one-way hashed. That means you can’t read the password in our database, and we can’t reverse or recover your password, even if we have your database. Our hashing is based on the SHA256 algorithm for world-class data protection.

All credentials, to your data warehouses and your data sources, are fully encrypted. The user-id and password for each connection are encrypted using the AES256 algorithm, an incredibly strong algorithm used by many government security organisations.

Database query tools with authorised access to the metadata database only see a long string of characters like “4e3d3cc78416…” and not your real credentials.

Even if someone gets a physical copy of your metadata, and browses the disk file, they can’t read your database credentials.

Ajilius. We won’t risk your database security.

Delimited file formats

Delimited files, often referred to as “CSV” or text files, represent tables of data in a simple text format.

Rows are separated by newline characters, or by newline and carriage-return pairs.

Columns within rows are separated by delimiters. The delimiters supported by Ajilius are comma, tab, semi-colon and vertical bar (‘|’). All rows must contain all columns, although empty columns may be represented as consecutive delimiters.

Text fields containing delimiters must be enquoted using single (‘) or double (“) quote characters.

Where a text field contains a character used as a delimiter, that character must be preceded by an escape character, typically backslash (\). If the text field contains the escape character, it must be preceded by another escape character (ie, ‘\\’).

Delimited files must have column names contained in the first row of the file.

The file may be encoded in any format supported by Ajilius (link).

Detection of encoding, delimiters, quotation and escape characters is automatically performed on file access. You don’t need to specify anything other than the file name.

Ajilius. Flexible delimited file formats.

File Transcoding

Because Ajilius is a multi-lingual and cross-platform product, we sometimes get asked about our ability to handle specific file encodings.

XML, JSON and Delimited files may be any of the following:

  • ASCII
  • windows-1252
  • UTF-8 (with or without a BOM)
  • UTF-16 BE or LE (with a BOM)
  • UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
  • Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
  • EUC-KR and ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
  • ISO-8859-2 and windows-1250 (Hungarian)
  • ISO-8859-5 and windows-1251 (Bulgarian)
  • ISO-8859-7 and windows-1253 (Greek)
  • ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)

Ajilius will transcode these file types to the encoding of the target DBMS, assuming that the target encoding is capable of supporting the characters in the input file.

Ajilius. We can read your writing.

New feature: Separate extract / load

We’ve now separated Extract and Load processes in Ajilius.

Previous versions coupled the extract and load processes for a table. To process the LOAD_PRODUCT table, for example, we would connect to the source database, extract the required rows, and load those rows to the warehouse, all within the same process.

We faced a situation recently where the largest extract for a warehouse came from a system that finished its end-of-day processing four hours ahead of the window allocated for the data warehouse load. The extract from this system was the longest-running task in the ELT process.

By separating the extract and load processes, we are now able to schedule the extract from this system (and others) to complete as early as possible, with the load to the warehouse occurring at a later time.

The data warehouse load window is made significantly smaller, giving the operations team more headroom if any errors occur during end-of-day processing in upstream systems.

You can, of course, continue to run extract and load processes at the same time if you prefer.

Ajilius. Making ELT fast and flexible.

New feature: Warehouse Script All

We’ve just released a new feature for Ajilius, a one-click generation of all scripts for a warehouse.

In previous releases you ran each script manually. This was a little tedious if you wanted to run an entire load process from within the Ajilius application.

Now, you can select the Script All option from the Warehouse menu, and a full set of ELT scripts will be generated from the session warehouse metadata.

You will be shown a script screen with the full warehouse DDL in the left pane, and a script to run all ELT processes in the right pane. Use the Create and Run buttons to perform an end-to-end build of your data warehouse.

Ajilius. Eliminating repetition in Data Warehouse Automation.