Dependency diagrams

With Version 1.4 due by the end of December, we’ve just made our final implementation decision regarding dependency diagrams.

Over the past month, the team has produced literally hundreds of diagrams, testing fourteen different approaches to diagram generation.

The short list of implementation technologies came down to three Javascript libraries – Mermaid,VisJS and Cytoscape.

Mermaid’s Dagre layout algorithm produced the best technical appearance in diagrams, but its layout engine experienced severe difficulties when diagrams became more complex. Development is lagging, and is dependent on an unsupported core library.

VisJS was good, but it suffered from hierarchy layout problems, as well as labelling conflicts in diagrams. I think VisJS is more generally powerful, but fell back in our use case.

The best compromise solution was Cytoscape. It is in active (and enthusiastic) development, has a growing selection of layout algorithms, does the best job of laying out our diagrams, and supports features we need such as custom colours and tooltips.

Here’s a sample of what you can expect with the coming release of V1.4.

DependencyDiagram

Release 1.3.11

This release fixes a couple of minor errors, and makes some productivity improvements to existing functions.

Stage Column Documentation
If a column was manually added to a stage table after initial creation, the documentation entered at this point was not updated. This has been fixed.

First Column Calculation / Literal
If a calculated column, or literal, was moved to the first column position in a table, Ajilius would generate invalid transform SQL. This has been fixed.

We’ve made the following changes to existing functions.

Enable Transformations on New Columns
We made an early design decision that transformations were applied by changing a column. When adding a column, we did not know if it was capable of being transformed until after the addition. This means that if you wanted to re-add a previously deleted column, and upper-case it at the same time, that you had to go through two steps – add the column, then change it to add the transformation.
We have now modified the Add Column function to enable you to enter your choice of source-column, role, transform and calculation. Roles and transforms can only be applied to selected source-columns, not to calculations. If both a source-column and a calculation have been entered, the source-column selection will take precedence.

Enable Column Selection on Join Tables
You might recall a post, many months ago, titled Our Join Editor Sucks. Well, Join Editor V2 was a big improvement, but still had some workflow issues. When you joined a table, all columns from that table were copied into the join set. You could delete columns from the column management functions, but users (self included) did not like this interruption to the thought process that typically happens when joining tables.
The new Join Editor enables column selection at the point of join definition. A toggle supports select-all functionality on the list, or you may select individual columns. This is a small improvement when you want all columns from a joined table, but a huge leap forward in productivity when you only want one or two columns from a larger table.
You can still used the column management functions to add, change and delete individual columns from the join set, as well as sort the columns into your preferred sequence in the table.

Registered users will be notified by email of the download location. If you’re reading this and have not received your email, please contact support@ajilius.com.

PostgreSQL message encoding

We ran into an interesting error this week, where a job was ending with an encoding error on the completion message.

Our first thought was that the database encoding did not match the value in the metadata, but no, both were set to UTF-8.

This is an Italian customer, and after much head-scratching, we discovered that the configuration value for lc_messages was set to ‘Italian_Italy.1252’. Ajilius expects messages returned by the server to be in the same encoding as the data warehouse.

In this case, the DBMS had been installed with settings from the Italian locale on the server, but the data warehouse database had been created with UTF-8 encoding.

We will now recommend that servers are initialised with initdb using the preferred encoding.

In this case, the customer changed the value of lc_messages in postgresql.conf, and the problem was resolved.

http://www.postgresql.org/docs/9.4/static/locale.html

http://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-LC-MESSAGES

Release 1.3.10

This release is a quick maintenance update, with a minor enhancement to built-in transforms.

WHERE Test Error
The Test button on the WHERE clause editor may have triggered a fatal error screen. The clause was correctly written, but could not be tested. ELT jobs worked correctly. This error has been corrected.

We’ve made the following changes to existing functions.

New Transforms
We’ve added the following transforms to the Change Column screen:
– Cast to Integer
– Cast to Numeric

Registered users will be notified by email of the download location. If you’re reading this and have not received your email, please contact support@ajilius.com.

Release 1.3.09

The devs have been busy!

This release brings a bunch of new features to make it easier to re-run whole sections of the data warehouse during development and testing, as well as some functions to keep your development environment neat and tidy.

Load | Create All
Create, or drop and re-create, all Load tables.

Load | Process All
Execute the extract, push (for cloud DW), and load processes for all Load tables.

Stage | Create All
Create, or drop and re-create, all Stage tables.

Stage | Process All
Execute the transform processes for all Stage tables.

Dimension | Create All
Create, or drop and re-create, all Dimension tables.

Dimension | Process All
Execute the transform processes for all Dimension tables.

Fact | Create All
Create, or drop and re-create, all Fact tables.

Fact | Process All
Execute the transform processes for all Fact tables.

Warehouse | Clear Scripts
Erases the contents of the SCRIPTS directory.

Warehouse | Clear Extracts
Erases the contents of the EXTRACTS directory.

Warehouse | Drop Tables
This function lists all the tables in your data warehouse. You may check one or more tables, then press the DROP button to remove those tables from your DW database. Use this function with care, once dropped, tables can only be restored from backup.

We’ve also made one small change to the Load Tables list, to bring it into line with the other table displays.

Load | List Load Tables
The column role has been added to the table columns list at the right side of the screen.

There are no metadata changes in this release, and no known errors to be fixed.

Registered users will be notified by email of the download location. If you’re reading this and have not received your email, please contact support@ajilius.com.

Release 1.3.06

We’re now growing fast enough that we need to formalise the release update process.

Each time we publish an update, we’ll post the details of the update in this topic.

This release brings the following user changes:

  • Amazon RDS PostgreSQL as DW.
    We always supported RDS as a data source, now we support it as a first-class target platform as well. This means that you can use PostgreSQL on premise, in the cloud on IAAS, or in the cloud as PAAS. Three-click migration means that you quickly and easily migrate between these platforms, as well as Snowflake, Redshift and Azure SQL Data Warehouse.
  • New Oracle driver offering more simple setup.
    The previous Oracle driver was extremely complex to install, and caused more support issues than any other aspect of installation and configuration. We have now switched to a driver that does not depend on the Oracle client software, with much better results.
  • Optional command line credentials.
    ELT scripts use environment variables to pass credentials and paths. Some users reported difficulty setting environment variables in complex server environments, so we have added the option of command-line credentials to all jobs.
  • Deleting a table or column no longer deletes descendants.
    If a table or column is deleted, we now set the descendant columns to NULL. This leaves downstream tables intact, while enabling you to restructure earlier tables in the ELT flow.
  • A new Fast Map feature to quickly adjust table and column sourcing.
    If you have restructured precedent tables or columns, or changed your mind about the sourcing of a table, you can now quickly update the column sources through this function.
  • A new Fast Delete feature to rapidly delete large numbers of columns.
    You might only require a handful of columns from a set of joined tables. This feature makes it quick and easy to delete all the unwanted columns in one pass, rather than separately deleting each column.
  • Documentation enhancements.
    We have added more obvious references to column roles, transformations and calculations in the data warehouse documentation generated by Ajilius.

All users will receive an email with the new code and instructions. There are no database changes in this release, just copy the executable to your Ajilius directory

Is your metadata secure?

Are your databases at risk from your data warehouse automation product?

We’ve seen DWA where user credentials for production systems are held in clear text!

If you’re using a data warehouse automation product other than Ajilius, use a query tool to open your metadata database. For example, if your product uses SQL Server, use Management Studio to open the metadata. Now look at the metadata tables which store data warehouse and data source credentials. Can you read them? If you can, so can anyone who reads that database.

All Ajilius user passwords – that is, the passwords that you use to access Ajilius – are one-way hashed. That means you can’t read the password in our database, and we can’t reverse or recover your password, even if we have your database. Our hashing is based on the SHA256 algorithm for world-class data protection.

All credentials, to your data warehouses and your data sources, are fully encrypted. The user-id and password for each connection are encrypted using the AES256 algorithm, an incredibly strong algorithm used by many government security organisations.

Database query tools with authorised access to the metadata database only see a long string of characters like “4e3d3cc78416…” and not your real credentials.

Even if someone gets a physical copy of your metadata, and browses the disk file, they can’t read your database credentials.

Ajilius. We won’t risk your database security.

Roadmap update

We’ve just updated the Ajilius Roadmap to reflect the imminent release of Version 1.3.

The major features of this release are:

  • Cloud to Cloud Data Warehouse, with interoperable support for Snowflake Elastic DW, Amazon Redshift and Microsoft Azure SQL DW.
  • Upgraded documentation generation, with interactive lineage navigation on all elements.
  • Enhanced support for JSON files and REST services

You might note that the Roadmap shows Azure SQL DW coming in Version 1.4, due at the end of Q4. The code is working, but Microsoft currently has the platform in preview only. We’ll update our level of support once it is released to production.

We’ve also added two new datasheets:

You’ll find all this content on our Bookshelf page.

Delimited file formats

Delimited files, often referred to as “CSV” or text files, represent tables of data in a simple text format.

Rows are separated by newline characters, or by newline and carriage-return pairs.

Columns within rows are separated by delimiters. The delimiters supported by Ajilius are comma, tab, semi-colon and vertical bar (‘|’). All rows must contain all columns, although empty columns may be represented as consecutive delimiters.

Text fields containing delimiters must be enquoted using single (‘) or double (“) quote characters.

Where a text field contains a character used as a delimiter, that character must be preceded by an escape character, typically backslash (\). If the text field contains the escape character, it must be preceded by another escape character (ie, ‘\\’).

Delimited files must have column names contained in the first row of the file.

The file may be encoded in any format supported by Ajilius (link).

Detection of encoding, delimiters, quotation and escape characters is automatically performed on file access. You don’t need to specify anything other than the file name.

Ajilius. Flexible delimited file formats.

File Transcoding

Because Ajilius is a multi-lingual and cross-platform product, we sometimes get asked about our ability to handle specific file encodings.

XML, JSON and Delimited files may be any of the following:

  • ASCII
  • windows-1252
  • UTF-8 (with or without a BOM)
  • UTF-16 BE or LE (with a BOM)
  • UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
  • Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
  • EUC-KR and ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
  • ISO-8859-2 and windows-1250 (Hungarian)
  • ISO-8859-5 and windows-1251 (Bulgarian)
  • ISO-8859-7 and windows-1253 (Greek)
  • ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)

Ajilius will transcode these file types to the encoding of the target DBMS, assuming that the target encoding is capable of supporting the characters in the input file.

Ajilius. We can read your writing.