New feature: Merge Transform

We’ve just finished a new feature to combine multiple streams of data within the data warehouse. Known as a “Merge Append”, this feature merges (appends) the contents of one table into a target table, with user-defined mappings between columns.

We recently delivered this feature for customers with multiple, similar, sources of data; and now we’re making it publicly available. Examples of use cases have been collectives of independent schools, financial aggregators, and re-insurance organisations.

Our content editor has done a great job on a tutorial for this feature, setting a standard we’ll be applying to all features from now on.

http://172.104.56.212/merge-table-tutorial/

Let us know what you think.

Ajilius. Committed to delivering value.

Integrated Upgrades

One of our principles in making enhancements to Ajilius is drawn from a quote attributed to Albert Einstein:

Everything should be made as simple as possible, but no simpler.

Our new Java version made the installation process much simpler, and the patch process as well. Now, the full release of 2.2 simplifies the process of upgrades to our metadata databases.

When we need to update the metadata databases, we’ll send you a file to be placed in the Ajilius program folder. Simply restart Ajilius, the update will be applied, and your application will continue as normal.

Ajilius. As simple as possible.

Note:

Unlike some competitor products, Ajilius does not “phone home”. We do not automatically download any files, and send no information back from your installation. Your metadata and credentials are secure within your site at all times.

Running Totals

A customer asked if it was possible to compute a table of cumulative transaction values by day. There were business reasons why this could not be left to a BI tool, so we worked out the following method.

  1. Create a persistent stage table T1 to hold the values to be accumulated. Completely anonymised, these values were TX_DATE, DIM1, DIM2, QUANTITY.
  2. From that persistent table, create a GROUP BY stage table T2 with the columns DIM1, DIM2 and SUM(QUANTITY). Discard the TX_DATE column.
  3. Add a derived column, a date named AT_DATE, with the transform CURRENT DATE.

The last step – a derived column – was the key to the solution.

We made one change to validation rules to enable this to work, and we are now able to meet the customer’s requirement.

Version 2.1 – Data Quality Automation

The hero feature of Ajilius 2.1 is Data Quality Automation.

This is yet another unique feature brought to data warehouse automation by Ajilius.

In the 2.1 release, Ajilius adds three types of data quality screens to the extract process:

  • Data type validation, where values are tested for conformance to the column data type.
  • Range validation, where values are tested for set and range boundaries.
  • Regex validation, where values are tested against regex regular expressions.

In Version 2.3 (due September 2016) we will be adding Lookup validation to data quality rules, to check the existence of values in data warehouse tables.

Rows breaking validation are logged to an error file, along with the reason/s for row rejection.

A new return code from the extract job signals that validation errors have occurred, enabling the scheduler to choose to continue or suspend the batch pending user remediation of the errors.

And once again, we’re adding this as a standard feature of the Ajilius platform. If you’re licensed for Ajilius, upgrade to the latest version and you can immediately identify and screen data quality problems before they hit your data warehouse.

Ajilius. Committed to innovation in data warehouse automation.

Version 2.1 – Pivotal Greenplum

greenplumAjilius is pleased to announce full support for Pivotal Greenplum in Version 2.1, available now.

Greenplum is an open source MPP data warehouse, available on-premise and cloud. Based on an earlier version of PostgreSQL, Greenplum will shortly be upgraded to the latest PostgreSQL code base for even faster loads and transformations.

The advantage of Greenplum is that, being open source, it gives anyone the opportunity to use a real MPP data warehouse platform. All you need is hardware, or cloud, with realisable savings of hundreds of thousands of dollars over commercial offerings.

You now have a great, free, scalability path when your workload or data grows to exceed PostgreSQL capabilities. With the unique “3 click migration” feature of Ajilius, you can move your entire data warehouse at any time with just a few clicks of the mouse.

Ajilius. Keeping the value in data warehouse automation.

Data quality performance

inspectionWe’re introducing data quality screens in V2.1, to be released at the pgDayAsia conference in March. In this release, data quality screens implement data type, range and expression testing on selected tables and columns.

The last week has been spent running performance tests, to identify the overhead added by these screens.

On average, we currently process 500,000 values per second.

For example, if your daily fact extract has 10 million rows, and each row comprises 20 columns, you have a total of 200 million values to screen. At 500,000 values per second, a full data quality screen of every column would add around six minutes to your batch.

To put that number in scale, 10 million is roughly the number of bets placed at the TAB on Melbourne Cup day. Or around the number of sales transactions done by a major department store chain in one week. Or the total number of motor vehicles sold in one year, world-wide, by Toyota Motor Company. In other words, 10 million rows is a LOT of data, and we’re going to completely validate it with full data screening, guaranteeing that no bad data is loaded from every row and column in your extract, in less than 10 minutes.

Wow!

And even though I wrote “Wow!” about that validation, we are working to make it even faster by release day.

Ajilius. Putting the quality in data warehouse automation.

MySQL, MariaDB and Aurora

This afternoon I signed off the enhanced data source adapters for MySQL, MariaDB and Amazon’s new Aurora database.

These adapters are compatible with both on-premise and cloud-hosted databases, with full Unicode support.

We’re fully tested against the Employee, Sakila and Classic Models sample databases, so now it is time to see some customer databases being loaded.

That’s another big step on the path to Version 1.4, only the enhanced DB2 adapter remains in the queue.

Is your metadata secure?

Are your databases at risk from your data warehouse automation product?

We’ve seen DWA where user credentials for production systems are held in clear text!

If you’re using a data warehouse automation product other than Ajilius, use a query tool to open your metadata database. For example, if your product uses SQL Server, use Management Studio to open the metadata. Now look at the metadata tables which store data warehouse and data source credentials. Can you read them? If you can, so can anyone who reads that database.

All Ajilius user passwords – that is, the passwords that you use to access Ajilius – are one-way hashed. That means you can’t read the password in our database, and we can’t reverse or recover your password, even if we have your database. Our hashing is based on the SHA256 algorithm for world-class data protection.

All credentials, to your data warehouses and your data sources, are fully encrypted. The user-id and password for each connection are encrypted using the AES256 algorithm, an incredibly strong algorithm used by many government security organisations.

Database query tools with authorised access to the metadata database only see a long string of characters like “4e3d3cc78416…” and not your real credentials.

Even if someone gets a physical copy of your metadata, and browses the disk file, they can’t read your database credentials.

Ajilius. We won’t risk your database security.

Delimited file formats

Delimited files, often referred to as “CSV” or text files, represent tables of data in a simple text format.

Rows are separated by newline characters, or by newline and carriage-return pairs.

Columns within rows are separated by delimiters. The delimiters supported by Ajilius are comma, tab, semi-colon and vertical bar (‘|’). All rows must contain all columns, although empty columns may be represented as consecutive delimiters.

Text fields containing delimiters must be enquoted using single (‘) or double (“) quote characters.

Where a text field contains a character used as a delimiter, that character must be preceded by an escape character, typically backslash (\). If the text field contains the escape character, it must be preceded by another escape character (ie, ‘\\’).

Delimited files must have column names contained in the first row of the file.

The file may be encoded in any format supported by Ajilius (link).

Detection of encoding, delimiters, quotation and escape characters is automatically performed on file access. You don’t need to specify anything other than the file name.

Ajilius. Flexible delimited file formats.

File Transcoding

Because Ajilius is a multi-lingual and cross-platform product, we sometimes get asked about our ability to handle specific file encodings.

XML, JSON and Delimited files may be any of the following:

  • ASCII
  • windows-1252
  • UTF-8 (with or without a BOM)
  • UTF-16 BE or LE (with a BOM)
  • UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
  • Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
  • EUC-KR and ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
  • ISO-8859-2 and windows-1250 (Hungarian)
  • ISO-8859-5 and windows-1251 (Bulgarian)
  • ISO-8859-7 and windows-1253 (Greek)
  • ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)

Ajilius will transcode these file types to the encoding of the target DBMS, assuming that the target encoding is capable of supporting the characters in the input file.

Ajilius. We can read your writing.