Ajilius Release Numbers

We recently finished an RFP submission with a question that asked:

“Describe your version numbering system, and how well your product keeps to it.”

I thought the answer might be useful to all customers.

Our version numbering system

The Ajilius release number comprises three elements – major version, minor version, and point version. For example, in version 3.2.1, the major version is 3, the minor version is 2, and the point version is 1. But what do the various types of version mean?

A major version represents an architectural change. For example, the migration from Python to Java was a major version, as was the introduction of the CTAS engine.

A minor version represents a metadata change, or a module-level change. Metadata changes are where we alter the structure of the metadata repositories used by Ajilius. Module changes are where we introduce major blocks of functionality. Incremental deployments were a minor version; SSAS integration was a minor version.

Point versions are where we make relatively minor changes to the application code. These may be error corrections, or enhancements that build on existing functions or metadata.
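The three-part scheme above can be sketched in code. This is an illustrative example only (the class and method names are ours, not part of Ajilius): it shows why a point upgrade is the "safe" case, because major and minor must match.

```java
// Sketch of the major.minor.point scheme described above.
// Class and method names are illustrative, not Ajilius internals.
public class Version implements Comparable<Version> {
    final int major, minor, point;

    Version(String text) {
        String[] parts = text.split("\\.");
        major = Integer.parseInt(parts[0]);
        minor = Integer.parseInt(parts[1]);
        point = Integer.parseInt(parts[2]);
    }

    // A point upgrade touches only application code, so it is safe
    // to apply whenever major and minor versions match.
    boolean isPointUpgradeFrom(Version other) {
        return major == other.major && minor == other.minor && point > other.point;
    }

    @Override
    public int compareTo(Version o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(point, o.point);
    }
}
```

Under this scheme, 3.3.2 is a point upgrade from 3.3.0, but 3.3.0 is not a point upgrade from 3.2.12, because the minor version differs.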

How well do we keep it?

When I look back over the Mercurial logs for Ajilius, I’d have to say that we’ve only been 60-70% compliant with our numbering scheme. We’re always correct with major versions and minor versions, but we’ve often let small metadata changes slip through in point versions.

From now on, though, we’re going to be 100% compliant with this structure.

Why bother? Simpler upgrades.

Better adherence to our release structure will help us to simplify our upgrades. We’ll make it faster and easier for you to download and apply new code.

  • We’ll always offer a full installer. This will create a completely new installation.
  • We’ll always offer an upgrade installer. This will upgrade any earlier release to the current release, including metadata structure changes and library dependency updates.
  • We’re also simplifying point releases: they will comprise only the ajilius.jar program, currently around 400k in size. This makes them faster to download, and simpler to copy into the Ajilius folder.

So when you get release 3.3.2 later this week, you’ll know that it is a code-only change to Release 3.3, and makes no changes to metadata or libraries.

Ajilius. The best development cycle in data warehouse automation.

Ajilius 3.3.0

Our blog has been a little quiet, lately, as we’ve been concentrating our efforts on new development.

That work has culminated in the release of Ajilius 3.3.0.

Besides enhancements, here’s a list of all the new features we’ve delivered since our last blog post:

3.3.0

  • Dimension Hierarchies
  • Measure Aggregates
  • SSAS Tabular Model Generation
  • Cross-database Custom Query Loads

3.2.12

  • DB2 data source driver
  • Per-table lineage analytics

3.2.10

  • Fast, royalty-free Excel data source driver

3.2.8

  • Fast, royalty-free XML data source driver

Ajilius. No other Data Warehouse Automation platform delivers as much new value every month.

New Transform: Persist Static

Ajilius 3.2.7 includes a new type of transform, labelled Persist Static.

The intent of this transform is to persist a table that will not be automatically reloaded.

Ajilius has a dependency-based scheduler. When we process a table, we recursively check and process its dependencies if required. Until now, that meant that ALL tables could potentially be reloaded.

We found this wasteful in cases where the table was relatively static. A business calendar, for example, might be updated once per year by the Finance department. A translation table aligning common data from two systems could be updated only when new products are added. In both examples, under the old process, these tables would be reloaded every time a batch that referenced them was run.

Now, however, tables with the type Persist Static will trim the dependency tree when included in a batch. The latest version of the table will be used, and its dependencies ignored.

A Persist Static table can still be reloaded from the command line. Assume that we have the following chain of tables, and that stage_source is Persist Static.

source.csv -> load_source -> stage_source

Any job that references stage_source as a dependency will use the latest contents of the table.

To reload this table, the following command might be used:

java -jar ajilius.jar -w MyWarehouse -t stage_source

That command will cause Ajilius to start processing with stage_source. It will be found to have a dependency of load_source. A load job will then be triggered to import data from source.csv. On completion of this job, the staging of data from load_source to stage_source will be performed. The table is then completely updated, and the job will terminate.
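The trimming behaviour described above can be sketched as a recursive dependency resolver. This is a simplified illustration, not Ajilius source: the key point is that a Persist Static table reached *as a dependency* is used as-is, while the same table named directly on the command line is still fully resolved and reloaded.

```java
import java.util.*;

// Sketch of dependency-tree trimming for Persist Static tables.
// Structures and names are illustrative, not Ajilius internals.
public class Scheduler {
    // table -> its direct dependencies
    final Map<String, List<String>> deps = new HashMap<>();
    final Set<String> persistStatic = new HashSet<>();

    // Returns the jobs to run, dependencies first.
    List<String> plan(String target, boolean namedOnCommandLine) {
        List<String> jobs = new ArrayList<>();
        visit(target, namedOnCommandLine, jobs, new HashSet<>());
        return jobs;
    }

    private void visit(String table, boolean isRoot, List<String> jobs, Set<String> seen) {
        if (!seen.add(table)) return;
        // A Persist Static table reached as a dependency trims the tree:
        // its latest contents are used, and it is not reloaded.
        if (persistStatic.contains(table) && !isRoot) return;
        for (String dep : deps.getOrDefault(table, List.of())) {
            visit(dep, false, jobs, seen);
        }
        jobs.add(table); // process dependencies, then the table itself
    }
}
```

With the chain from the example (load_source -> stage_source, where stage_source is Persist Static), a batch that depends on stage_source runs no load jobs for it, while planning stage_source directly reproduces the load_source-then-stage_source sequence described above.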

Ajilius. More effective loads and transforms.

 

Snowflake-Snowflake Data Loads

When we built the Snowflake adapter for Ajilius, around two years ago, we saw Snowflake as a Data Warehouse target.

We’re now seeing many Snowflake customers using its capabilities as a Data Lake as well as a Data Warehouse. Customers are landing their data in one Snowflake database, then migrating and structuring it for analysis and reporting.

We have responded to this shift by implementing cross-database queries for data ingestion by Ajilius.

Previously, customers could extract data from one Snowflake database, then reload it to another. If your Ajilius instance was running on-premise, this carried a significant overhead as the data was extracted, transferred to your Ajilius instance, transferred back to the Snowflake host, and reloaded.

Now, a new load type of “Cross-Database” can be selected for a table. When the table load job is executed, we create a three-part name query instead of an extract/load query.

Here’s how it works.

Create a data source in the normal way. Let’s use the SNOWFLAKE_SAMPLE_DATA that comes with a new account:

After refreshing metadata, we can see the tables and columns from the demonstration database. Now we can load the CALL_CENTER table:

Set the characteristics of the table. The Load Method will default to Bulk Import, which uses the Snowflake PUT/COPY method. Change this to Cross-Database Query:

Now, when you generate scripts for this table, you’ll see that a cross database query is used instead of a PUT/COPY. Note that some columns have been dropped in order to show the full query.

Of course, incremental loads are also supported for cross-database loads.
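To make the mechanism concrete, here is a sketch of what generating such a load might look like. The SQL shape (an INSERT … SELECT over a database.schema.table three-part name) follows the description above, but the method and identifier names are illustrative, not the actual Ajilius generator.

```java
import java.util.List;

// Sketch of generating a cross-database load using Snowflake
// three-part names (database.schema.table) instead of an
// extract + PUT/COPY round trip. Identifiers are illustrative.
public class CrossDbLoad {
    static String generate(String srcDb, String srcSchema, String srcTable,
                           String targetTable, List<String> columns) {
        String cols = String.join(", ", columns);
        return "INSERT INTO " + targetTable + " (" + cols + ")\n"
             + "SELECT " + cols + "\n"
             + "FROM " + srcDb + "." + srcSchema + "." + srcTable + ";";
    }
}
```

Because the statement runs entirely inside Snowflake, the data never leaves the service, which is where the saving over the extract/transfer/reload round trip comes from.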

Ajilius. Faster loads for Snowflake-Snowflake.

 

SQL Server Integrated Authentication

From time to time we get a customer reporting an error like this:

The key to this error is the text “Failed to load the sqljdbc_auth.dll”

This file is required by the Microsoft SQL Server JDBC driver, in order to handle integrated authentication.

The file must be placed in the Java bin folder. Upgrades to Java can change the location of the Java bin folder, by changing the version number higher up the path. This means that the file can no longer be found.
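You can also ask the running Java installation where that folder is. This small sketch (ours, not an Ajilius feature) prints the path where sqljdbc_auth.dll needs to sit for the current runtime; after a Java upgrade, re-running it shows the new location.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Print where sqljdbc_auth.dll must live for the current Java
// runtime. The java.home system property tracks Java upgrades,
// so this always reflects the active installation.
public class FindJavaBin {
    public static void main(String[] args) {
        Path dll = Paths.get(System.getProperty("java.home"), "bin")
                        .resolve("sqljdbc_auth.dll");
        System.out.println(dll);
    }
}
```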

Here are some instructions to resolve this error if it occurs on your server.

Instructions

Download the SQL Server JDBC driver from https://www.microsoft.com/en-au/download/details.aspx?id=11774. Make sure you get the 64-bit version.

Install the driver to any location. It doesn’t matter where, it basically just unzips a file structure and you’ll only be taking one file from its directory.

Go to the folder to which you installed the driver, and find the following path:

\sqljdbc_6.0\enu\auth\x64

Copy the file sqljdbc_auth.dll from this location.

Find the location of your Java Runtime Environment using the WHERE command:

where java

It should be something like this:

C:\Program Files\Java\jdk1.8.0_121\bin\java.exe

Copy sqljdbc_auth.dll into the Java bin folder. Given the result I’ve just shown, on my machine I would copy it to:

C:\Program Files\Java\jdk1.8.0_121\bin\sqljdbc_auth.dll

Now, when you restart the Ajilius application, integrated authentication should work on your machine.

Custom Code Injection

Ajilius has always supported a rich set of transformations and custom calculations to support complex ELT. Occasionally, however, we’ve had requests for custom code injection before or after ELT transformations. That request is now satisfied.

All Load and Transform jobs now support blocks of user-defined code at the beginning and end of the job.

Code injected into Load jobs is in the Groovy scripting language, while code injected into Transform jobs is pure SQL.

To add custom code to a job, select the Change menu item for the table being processed:

The table form now contains two new fields, named Custom Code – Pre-SQL and Custom Code – Post-SQL. These fields may contain any legitimate SQL statements.

When you run the transform script for this table, the custom code will be injected into the appropriate place in the script:

Please note that custom code may not be supported across target data warehouse platforms, and may need to be reviewed and edited after migration.
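The injection step can be pictured as a simple script assembly. This is a sketch under our own assumptions (the comment markers and method are illustrative, not Ajilius source): the pre-SQL block is emitted before the generated transform, and the post-SQL block after it, with empty fields skipped.

```java
// Sketch of injecting pre- and post-SQL blocks around a generated
// transform script. The field names mirror the UI labels above;
// the assembly logic itself is illustrative.
public class TransformScript {
    static String assemble(String preSql, String transformSql, String postSql) {
        StringBuilder script = new StringBuilder();
        if (preSql != null && !preSql.isBlank()) {
            script.append("-- Custom Code - Pre-SQL\n").append(preSql).append("\n");
        }
        script.append(transformSql).append("\n");
        if (postSql != null && !postSql.isBlank()) {
            script.append("-- Custom Code - Post-SQL\n").append(postSql).append("\n");
        }
        return script.toString();
    }
}
```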

Ajilius. Round-trip customisation.

 

Continuous Deployment with Ajilius

Many data warehouse automation products are good at single database development, but struggle when asked to cope with multiple environments such as Dev, Test, SIT, UAT and Production. When you’re striving to be Agile, you need the ability to handle CI / CD in any environment, in any state.

Release 3.2 of Ajilius brings full CI / CD to data warehouse automation.

Within the browser UI, we now support immediate generation of alignment scripts that preserve data across multiple generations of deployment. The following picture shows the new UI menu options for deployments, as well as an alignment script in the left-hand pane.

We’ve also extended the Ajilius command line to support scripted deployments across multiple environments. The following screen shot shows the initial deployment of a data warehouse to a new server.

We’ve then modified the structure of one table, and added a new table, through the browser UI. The second command line invocation alters the data warehouse to align with the new table structure.

But what happens when you have multiple environments, in multiple stages of deployment? Ajilius persists the current deployment state for each environment, and has IMPORT and EXPORT functions to reload that state before alignment. This means, for example, that your DEV, UAT and PROD environments can be progressively deployed in different states.

Because these features are implemented on the command line, as well as in the UI, Ajilius supports full automation of the CI / CD process. For example, our own internal CI / CD test server uses CRON to implement an hourly deployment, based on any changes to the metadata in the last hour. Of course, we fully integrate with products like GoCD, Control-M, etc. for enterprise-level automation.

The metadata changes that can trigger an alignment include:

  • Add table or column
  • Rename table or column
  • Change column data type
  • Change column role (e.g. SCD2 to SCD0)
  • Change table distribution method
  • Change column sequence

And remember, every change preserves your existing data. We even fully re-key fact tables if an SCD2 dimension changes to any other type.

Ajilius. Enterprise maturity in data warehouse CI / CD.

DWA: Data Warehouse Abandonware?

Frequency of updates is a good indication of the on-going viability of a data warehouse automation platform.

From what we see and hear, too many competitors are good at getting their products released, but not so good at error correction and enhancement.

This came to mind while I was packaging our latest update – huge changes in incremental deployment – and looked at the following:

Those are the releases we’ve made during just the past four months.

  • Every update is cumulative; you don’t have to apply them all
  • Every update contains one or more new features, usually driven directly by customer requests
  • Most updates contain performance enhancements
  • Many updates contain code enhancements and corrections to errors reported by users

Here’s what a couple of weeks looks like in Mercurial:

All our release planning is done through a Trello board. At present it has a backlog of around 18 months for the current development team. New target platforms. New data sources. New transformations. New BI interfaces. New methodology support. New features we’re not yet ready to discuss. We’re going to continue, or even accelerate, the rate of new feature deployment over the next year.

The one thing that won’t change is our licence policy. We’ll continue to be a site-licensed product, with no lock-ins, including the guarantee that your data warehouse will continue to function even if you don’t renew your Ajilius licence, and we’ll never increase your annual renewal for as long as you use Ajilius.

Ajilius. USD 15k / site. The best value in data warehouse automation.

 

Documentation belongs …

… IN THE CODE.

I recently had to review some old ETL code where the developer was a firm believer in documentation. Unfortunately, his belief was that documentation belonged in Confluence.

Not that I’ve got anything against Confluence (it is a good product), but I got very tired and very grumpy about having to look up wiki page after wiki page to figure out what was in his twisted little brain when the code was written.

Ajilius saves developers from having to search in unreasonable places for documentation. Not only do we capture both user and technical documentation for all artifacts, but we put technical documentation right where it belongs … IN THE CODE.

Here’s a screen shot of the technical documentation panel for a table:

… and here’s how it surfaces in code, neatly translated from HTML to text:

Ajilius. Better data warehouses … and better documentation.

Ajilius 3: Adaptive ELT Performance

The new Adaptive ELT feature in Ajilius 3 uses intelligent algorithm selection at runtime to optimise ELT performance.

Combined with the new SQL+Scripts architecture for code generation and execution, we’re seeing orders of magnitude improvement in some use cases.

Here’s a sample of batch output from the standard Ajilius demonstration:

The first two jobs – stage_location and dim_location – used to take around 0.06 and 0.08 seconds to run. Now they’re taking less than 0.001 seconds.

They’re trivially small tables, but we’re using them as an example because you can replicate the experience in an Ajilius demonstration.

Not every job will benefit to this extent; it depends on factors like the algorithm and the number of rows in the source and target tables. But every job that processes different data on each run will be faster.

Ajilius. Faster.