New Transform: Persist Static

Ajilius 2.3.7 includes a new type of transform, labelled Persist Static.

The intent of this transform is to persist a table that will not be automatically reloaded.

Ajilius has a dependency-based scheduler. When we process a table, we recursively check and process its dependencies if required. Until now, that meant that ALL tables could potentially be reloaded.

We found this wasteful in cases where the table was relatively static. A business calendar, for example, might be updated once per year by the Finance department. A translation table aligning common data from two systems might be updated only when new products are added. In both examples, under the old process, these tables would be reloaded every time a batch that referenced them was run.

Now, however, tables with the type Persist Static will trim the dependency tree when included in a batch. The latest version of the table will be used, and its dependencies ignored.

A Persist Static table can still be reloaded from the command line. Assume that we have the following chain of tables, and that stage_source is Persist Static.

source.csv -> load_source -> stage_source

Any job that references stage_source as a dependency will use the latest contents of the table.
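The trimming behaviour can be pictured as a recursive dependency walk that stops at Persist Static tables. Here is a minimal sketch of the idea; the dependency map, the `PERSIST_STATIC` set and the `force` flag are illustrative assumptions, not Ajilius internals:

```python
# Sketch of dependency-tree trimming for Persist Static tables.
# The graph, table names and persist_static flag are illustrative only.

DEPENDENCIES = {
    "stage_source": ["load_source"],
    "load_source": [],            # loads from source.csv
    "dim_product": ["stage_source"],
}

PERSIST_STATIC = {"stage_source"}

def build_plan(table, force=False, plan=None):
    """Recursively collect jobs to run, dependencies first.

    A Persist Static table is used as-is unless it is the table the
    batch was started with (force=True), mirroring a command-line reload.
    """
    if plan is None:
        plan = []
    if table in PERSIST_STATIC and not force:
        return plan                    # trim: use latest contents, skip deps
    for dep in DEPENDENCIES.get(table, []):
        build_plan(dep, plan=plan)
    if table not in plan:
        plan.append(table)
    return plan

# A batch referencing dim_product leaves stage_source untouched:
print(build_plan("dim_product"))               # ['dim_product']
# An explicit reload of stage_source rebuilds its chain:
print(build_plan("stage_source", force=True))  # ['load_source', 'stage_source']
```

The second call corresponds to the command-line reload described below: starting the batch at the Persist Static table itself forces its chain to run.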

To reload this table, the following command might be used:

java -jar ajilius.jar -w MyWarehouse -t stage_source

That command will cause Ajilius to start processing with stage_source. It will be found to have a dependency of load_source. A load job will then be triggered to import data from source.csv. On completion of this job, the staging of data from load_source to stage_source will be performed. The table is then completely updated, and the job will terminate.

Ajilius. More effective loads and transforms.

 

Snowflake-Snowflake Data Loads

When we built the Snowflake adapter for Ajilius, around two years ago, we saw Snowflake as a Data Warehouse target.

We’re now seeing many Snowflake customers using its capabilities as a Data Lake as well as a Data Warehouse. Customers are landing their data in one Snowflake database, then migrating and structuring it for analysis and reporting.

We have responded to this shift by implementing cross-database queries for data ingestion by Ajilius.

Previously, customers could extract data from one Snowflake database, then reload it to another. If your Ajilius instance was running on-premise, this carried a significant overhead as the data was extracted, transferred to your Ajilius instance, transferred back to the Snowflake host, and reloaded.

Now, a new load type of “Cross-Database” can be selected for a table. When the table load job is executed, we create a three-part name query instead of an extract/load query.

Here’s how it works.

Create a data source in the normal way. Let’s use the SNOWFLAKE_SAMPLE_DATA that comes with a new account:

After refreshing metadata, we can see the tables and columns from the demonstration database. Now we can load the CALL_CENTER table:

Set the characteristics of the table. The Load Method will default to Bulk Import, which uses the Snowflake PUT/COPY method. Change this to Cross-Database Query:

Now, when you generate scripts for this table, you’ll see that a cross-database query is used instead of a PUT/COPY. Note that some columns have been dropped in order to show the full query.
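In essence, the generated statement is an INSERT ... SELECT over a Snowflake three-part name. The following sketch shows the shape of such a statement; the target schema, column subset and schema names are illustrative assumptions, not the exact script Ajilius emits:

```python
# Sketch: build a cross-database load statement using a three-part name,
# instead of an extract/PUT/COPY round trip. Names are illustrative only.

def cross_database_load(src_db, src_schema, table, tgt_schema, columns):
    """Return an INSERT ... SELECT that reads directly from the source
    database via Snowflake's database.schema.table naming."""
    col_list = ", ".join(columns)
    return (
        f"INSERT INTO {tgt_schema}.{table} ({col_list})\n"
        f"SELECT {col_list}\n"
        f"FROM {src_db}.{src_schema}.{table};"
    )

sql = cross_database_load(
    "SNOWFLAKE_SAMPLE_DATA", "TPCDS_SF100TCL", "CALL_CENTER",
    "STAGE", ["CC_CALL_CENTER_SK", "CC_NAME"],
)
print(sql)
```

Because the data never leaves Snowflake, there is no extract, transfer or reload step at all.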

Of course, incremental loads are also supported for cross-database loads.

Ajilius. Faster loads for Snowflake-Snowflake.

 

SQL Server Integrated Authentication

From time to time we get a customer reporting an error like this:

The key to this error is the text “Failed to load the sqljdbc_auth.dll”.

This file is required by the Microsoft SQL Server JDBC driver, in order to handle integrated authentication.

The file must be placed in the Java bin folder. Upgrades to Java can change the location of the Java bin folder, by changing the version number higher up the path. When that happens, the file can no longer be found.

Here are some instructions to resolve this error if it occurs on your server.

Instructions

Download the SQL Server JDBC driver from https://www.microsoft.com/en-au/download/details.aspx?id=11774. Make sure you get the 64-bit version.

Install the driver to any location. It doesn’t matter where; the installer simply unzips a file structure, and you’ll only be taking one file from its directory.

Go to the folder to which you installed the driver, and find the following path:

\sqljdbc_6.0\enu\auth\x64

Copy the file sqljdbc_auth.dll from this location.

Find the location of your Java Runtime Environment using the WHERE command:

where java

It should be something like this:

C:\Program Files\Java\jdk1.8.0_121\bin\java.exe

Copy sqljdbc_auth.dll into that bin folder. Given the result I’ve just shown, on my machine I would copy it to:

C:\Program Files\Java\jdk1.8.0_121\bin\sqljdbc_auth.dll
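Because the bin folder moves with every Java upgrade, the destination is best derived from wherever `java` currently resolves. This small sketch computes the destination path from a `where java` result; the paths are the examples above, and the actual copy is left to you:

```python
# Sketch: derive the destination path for sqljdbc_auth.dll from the
# location of java.exe (the result of the WHERE command shown above).
# ntpath is used so the Windows-style path handling works anywhere.
import ntpath

def auth_dll_destination(java_exe):
    """Given the full path to java.exe, return the path the DLL must occupy."""
    return ntpath.join(ntpath.dirname(java_exe), "sqljdbc_auth.dll")

dest = auth_dll_destination(r"C:\Program Files\Java\jdk1.8.0_121\bin\java.exe")
print(dest)  # C:\Program Files\Java\jdk1.8.0_121\bin\sqljdbc_auth.dll
```

Re-running a script like this after each Java upgrade keeps the DLL where the driver expects it.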

Now, when you restart the Ajilius application, integrated authentication should work on your machine.

Custom Code Injection

Ajilius has always supported a rich set of transformations and custom calculations to support complex ELT. Occasionally, however, we’ve had requests for custom code injection before or after ELT transformations. That request is now satisfied.

All Load and Transform jobs now support blocks of user-defined code at the beginning and end of the job.

Code injected into Load jobs is in the Groovy scripting language, while code injected into Transform jobs is pure SQL.

To add custom code to a job, select the Change menu item for the table being processed:

The table form now contains two new fields, named Custom Code – Pre-SQL and Custom Code – Post-SQL. These fields may contain any legitimate SQL statements.

When you run the transform script for this table, the custom code will be injected into the appropriate place in the script:
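The effect is easy to picture as simple script assembly: the Pre-SQL block runs before the generated transform, and the Post-SQL block after it. A minimal sketch follows; the generated body and the audit-log statements are placeholders, not the actual Ajilius generator:

```python
# Sketch: wrap a generated transform in user-defined pre/post SQL blocks.
# The generated body and the example statements are placeholders.

def assemble_transform(generated_sql, pre_sql=None, post_sql=None):
    """Return the final script with custom code injected at each end."""
    parts = []
    if pre_sql:
        parts.append("-- Custom Code - Pre-SQL\n" + pre_sql)
    parts.append(generated_sql)
    if post_sql:
        parts.append("-- Custom Code - Post-SQL\n" + post_sql)
    return "\n\n".join(parts)

script = assemble_transform(
    "INSERT INTO stage_source SELECT * FROM load_source;",
    pre_sql="DELETE FROM audit_log WHERE table_name = 'stage_source';",
    post_sql="INSERT INTO audit_log (table_name) VALUES ('stage_source');",
)
print(script)
```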

Please note that custom code may not be supported across target data warehouse platforms, and may need to be reviewed and edited after migration.

Ajilius. Round-trip customisation.

 

Continuous Deployment with Ajilius

Many data warehouse automation products are good at single database development, but struggle when asked to cope with multiple environments such as Dev, Test, SIT, UAT and Production. When you’re striving to be Agile, you need the ability to handle CI / CD in any environment, in any state.

Release 3.2 of Ajilius brings full CI / CD to data warehouse automation.

Within the browser UI, we now support immediate generation of alignment scripts that preserve data across multiple generations of deployment. The following picture shows the new UI menu options for deployments, as well as an alignment script in the left-hand pane.

We’ve also extended the Ajilius command line to support scripted deployments across multiple environments. The following screen shot shows the initial deployment of a data warehouse to a new server.

We’ve then modified the structure of one table, and added a new table, through the browser UI. The second command line invocation alters the data warehouse to align with the new table structure.

But what happens when you have multiple environments, in multiple stages of deployment? Ajilius persists the current deployment state for each environment, and has IMPORT and EXPORT functions to reload that state before alignment. This means, for example, that your DEV, UAT and PROD environments can be progressively deployed in different states.

Because these features are implemented on the command line, as well as in the UI, Ajilius supports full automation of the CI / CD process. For example, our own internal CI / CD test server uses CRON to implement an hourly deployment, based on any changes to the metadata in the last hour. Of course, we fully integrate with products like GoCD, Control-M, etc. for enterprise-level automation.

The metadata changes that can trigger an alignment include:

  • Add table or column
  • Rename table or column
  • Change column data type
  • Change column role (e.g. SCD2 to SCD0)
  • Change table distribution method
  • Change column sequence

And remember, every change preserves your existing data. We even fully re-key fact tables if an SCD2 dimension changes to any other type.

Ajilius. Enterprise maturity in data warehouse CI / CD.

DWA: Data Warehouse Abandonware?

Frequency of updates is a good indication of the on-going viability of a data warehouse automation platform.

From what we see and hear, too many competitors are good at getting their products released, but not so good at error correction and enhancement.

This came to mind while I was packaging our latest update – huge changes in incremental deployment – and looked at the following:

Those are the releases we’ve made during just the past four months.

  • Every update is cumulative; you don’t have to apply them all
  • Every update contains one or more new features, usually driven directly by customer requests
  • Most updates contain performance enhancements
  • Many updates contain code enhancements and corrections to errors reported by users

Here’s what a couple of weeks looks like in Mercurial:

All our release planning is done through a Trello board. At present it has a backlog of around 18 months for the current development team. New target platforms. New data sources. New transformations. New BI interfaces. New methodology support. New features we’re not yet ready to discuss. We’re going to continue, or even accelerate, the rate of new feature deployment over the next year.

The one thing that won’t change is our licence policy. We’ll continue to be a site-licensed product with no lock-ins, including the guarantee that your data warehouse will continue to function even if you don’t renew your Ajilius licence. And we’ll never increase your annual renewal for as long as you use Ajilius.

Ajilius. USD 15k / site. The best value in data warehouse automation.

 

Documentation belongs …

… IN THE CODE.

I recently had to review some old ETL code where the developer was a firm believer in documentation. Unfortunately, his belief was that documentation belonged in Confluence.

Not that I’ve got anything against Confluence; it is a good product. But I got very tired and very grumpy about having to look up wiki page after wiki page to figure out what was in his twisted little brain when the code was written.

Ajilius saves developers from having to search in unreasonable places for documentation. Not only do we capture both user and technical documentation for all artifacts, but we put technical documentation right where it belongs … IN THE CODE.

Here’s a screen shot of the technical documentation panel for a table:

… and here’s how it surfaces in code, neatly translated from HTML to text:

Ajilius. Better data warehouses … and better documentation.

Ajilius 3: Adaptive ELT Performance

The new Adaptive ELT feature in Ajilius 3 uses intelligent algorithm selection at runtime to optimise ELT performance.

Combined with the new SQL+Scripts architecture for code generation and execution, we’re seeing orders of magnitude improvement in some use cases.

Here’s a sample of batch output from the standard Ajilius demonstration:

The first two jobs – stage_location and dim_location – used to take around 0.06 and 0.08 seconds to run. Now they’re taking less than 0.001 seconds.

They’re trivially small tables, but we’re using them as an example because you can replicate the experience in an Ajilius demonstration.

Not every job will benefit to this extent; it depends on factors like the algorithm and the number of rows in the source and target tables. But every job that processes different data on each run will be faster.

Ajilius. Faster.

 

Ajilius 3: SQL + Scripts

In Ajilius 1, we used Python as a scripting language. Ajilius 2 switched to Java and Groovy. We generated scripts that mixed the scripting language and SQL to emulate stored procedures.

We got pushback on this approach from a number of prospects, probably because we didn’t always do a good job of explaining the relationship between the scripting language and the SQL. Prospects saw the Groovy syntax, for example, and thought that they would have to learn a new language to build their data warehouse.

Looking back, we can see why.

Here is an Ajilius 2 script:

Here is the Ajilius 3 script for the same table:

As you can see, the SQL script in our new version is immediately familiar to a data warehouse developer.

This isn’t to say that we’re abandoning scripting, because Ajilius 3 introduces this feature as SQL + Scripts.

You’ll still have the option to export and package the SQL for execution as Groovy scripts, Stored Procedures (where supported), PowerShell scripts and more. The difference to earlier versions is that this is an option, not the standard.

An added bonus from this approach is that scripts run through the Ajilius scheduler no longer have to go through the overhead of instantiating the scripting engine and loading/compiling the script before execution. As a result, you get much faster execution of your transformation processes.

Ajilius. Listening to customer feedback to make us better.

 

Take the SAS language challenge

Can your data warehouse automation solution correctly handle international character sets?

I was reading an interesting blog post from SAS at the weekend. It shows how they visualise data containing character sets from many languages around the world.

http://blogs.sas.com/content/sastraining/2017/05/12/map-of-idioms-from-around-the-world/

The blog discusses loading and visualising data that looks like this:

Given that the data is contained in a spreadsheet, I thought it would be fun to see how that data was handled by Ajilius.

Create a data source that points to the spreadsheet:

Ingest the metadata from the spreadsheet:

Run the load script, and here is the result:

There it is. Fast, full-fidelity representation of the SAS data. If you’re building a data warehouse in Korea, or Armenia, or Israel, or any other country with a non-ASCII language, then Ajilius is your best solution for data warehouse automation.
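You can run a similar full-fidelity check yourself by round-tripping some non-ASCII rows through a UTF-8 encoded CSV, which is the property any load into the warehouse must preserve. This stand-alone sketch is not part of Ajilius, and the sample strings are just illustrative non-ASCII text:

```python
# Sketch: verify that multilingual text survives a UTF-8 CSV round trip.
# The sample strings are illustrative non-ASCII text only.
import csv
import io

rows = [
    ["ko", "식은 죽 먹기"],    # sample Korean text
    ["he", "חבל על הזמן"],    # sample Hebrew text
    ["hy", "Աչքի լույս"],     # sample Armenian text
]

raw = io.BytesIO()
writer_stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
csv.writer(writer_stream).writerows(rows)
writer_stream.flush()
raw.seek(0)
loaded = list(csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline="")))
assert loaded == rows             # no characters lost or mangled
print("round trip preserved", len(loaded), "rows")
```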

Ajilius. Data warehouse automation for the global community.