Ajilius 3.3.0

Our blog has been a little quiet, lately, as we’ve been concentrating our efforts on new development.

That work has culminated in the release of Ajilius 3.3.0.

Besides enhancements, here’s a list of all the new features we’ve delivered since our last blog post:

3.3.0

  • Dimension Hierarchies
  • Measure Aggregates
  • SSAS Tabular Model Generation
  • Cross-database Custom Query Loads

3.2.12

  • DB2 data source driver
  • Per-table lineage analytics

3.2.10

  • Fast, royalty-free Excel data source driver

3.2.8

  • Fast, royalty-free XML data source driver

Ajilius. No other Data Warehouse Automation platform delivers as much new value every month.

New Transform: Persist Static

Ajilius 3.2.7 includes a new type of transform, labelled Persist Static.

The intent of this transform is to persist a table that will not be automatically reloaded.

Ajilius has a dependency-based scheduler. When we process a table, we recursively check and process its dependencies if required. Until now, that meant that ALL tables could potentially be reloaded.

We found this wasteful in cases where the table was relatively static. A business calendar, for example, might be updated once per year by the Finance department. A translation table aligning common data from two systems could be updated only when new products are added. In both examples, under the old process, these table would be reloaded every time a batch that referenced them was run.

Now, however, tables can with the type Persist Static will trim the dependency tree when included in a batch. The latest version of the table will be used, and its dependencies ignored.

A Persist Static table can still be reloaded from the command line. Assume that we have the following chain of tables, and that stage_source is Persist Static.

source.csv -> load_source -> stage_source

Any job that references stage_source as a dependency will use the latest contents of the table.

To reload this table, the following command might be used:

java -jar ajilius.jar -w MyWarehouse -t stage_source

That command will cause Ajilius to start processing with stage_source. It will be found to have a dependency of load_source. A load job will then be triggered to import data from source.csv. On completion of this job, the staging of data from load_source to stage_source will be performed. The table is then completely updated, and the job will terminate.

Ajilius. More effective loads and transforms.

 

Snowflake-Snowflake Data Loads

When we built the Snowflake adapter for Ajilius, around two years ago, we saw Snowflake as a Data Warehouse target.

We’re now seeing many Snowflake customers using its capabilities as a Data Lake as well as a Data Warehouse. Customers are landing their data in one Snowflake database, then migrating and structuring it for analysis and reporting.

We have responded to this shift by implementing cross-database queries for data ingestion by Ajilius.

Previously, customers could extract data from one Snowflake database, then reload it to another. If your Ajilius instance was running on-premise, this carried a significant overhead as the data was extracted, transferred to your Ajilius instance, transferred back to the Snowflake host, and reloaded.

Now, a new load type of “Cross-Database” can be selected for a table. When the table load job is executed, we create a three-part name query instead of an extract/load query.

Here’s how it works.

Create a data source in the normal way. Let’s use the SNOWFLAKE_SAMPLE_DATA that comes with a new account:

After refreshing metadata, we can see the tables and columns from the demonstration database. Now we can load the CALL_CENTER table:

Set the characteristics of the table. The Load Method will default to Bulk Import, which uses the Snowflake PUT/COPY method. Change this to Cross-Database Query:

Now, when you generate scripts for this table, you’ll see that a cross database query is used instead of a PUT/COPY. Note that some columns have been dropped in order to show the full query.

Of course, incremental loads are also supported for cross-database loads.

Ajilius. Faster loads for Snowflake-Snowflake.

 

Custom Code Injection

Ajilius has always supported a rich set of transformations and custom calculations to support complex ELT. Occasionally, however, we’ve had requests for custom code injection before or after ELT transformations. That request is now satisfied.

All Load and Transform jobs now support blocks of user-defined code at the beginning and end of the job.

Code injected into Load jobs is in the Groovy scripting language, while code injected into Transform jobs is pure SQL.

To add custom code to a job, select the Change menu item for the table being processed:

The table form now contains two new fields, named Custom Code – Pre-SQL and Custom Code – Post-SQL. These fields contain any legitimate SQL statements.

When you run the transform script for this table, the custom code will be injected into the appropriate place in the script:

Please note that custom code may not be supported across target data warehouse platforms, and may need to be reviewed and edited after migration.

Ajilius. Round-trip customisation.

 

Documentation belongs …

… IN THE CODE.

I recently had to review some old ETL code where the developer was a firm believer in documentation. Unfortunately, his belief was that documentation belonged in Confluence.

Not that I’ve got anything against Confluence, it is a good product, but I got very tired and very grumpy about having to look up page after wiki page to figure out what was in his twisted little brain when the code was written.

Ajilius saves developers from having to search in unreasonable places for documentation. Not only do we capture both user and technical documentation for all artifacts, but we put technical document right were it belongs … IN THE CODE.

Here’s a screen shot of the technical documentation panel for a table:

… and here’s how it surfaces in code, neatly translated from HTML to text:

Ajilius. Better data warehouses … and better documentation.

Ajilius 3: SQL + Scripts

In Ajilius 1, we used Python as a scripting language. Ajilius 2 switched to Java and Groovy. We generated scripts that mixed the scripting language and SQL to emulate stored procedures.

We got pushback on this approach from a number of prospects, probably because we didn’t always do a good job of explaining the relationship between the scripting language and the SQL. Prospects saw the Groovy syntax, for example, and thought that they would have to learn a new language to build their data warehouse.

Looking back, we can see why.

Here is an Ajilius 2 script:

Here is the Ajilius 3 script for the same table:

As you can see, the SQL script in our new version is immediately familiar to a data warehouse developer.

This isn’t to say that we’re abandoning scripting, because Ajilius 3 introduces this feature as SQL + Scripts.

You’ll still have the option to export and package the SQL for execution as Groovy scripts, Stored Procedures (where supported), PowerShell scripts and more. The difference to earlier versions is that this is an option, not the standard.

An added bonus from this approach is that scripts run through the Ajilius scheduler no longer have to go through the overhead of instantiating the scripting engine and loading/compiling the script before execution. As a result, you get much faster execution of your transformation processes.

Ajilius. Listening to customer feedback to make us better.

 

Take the SAS language challenge

Can your data warehouse automation solution correctly handle international character sets?

I was reading an interesting blog post from SAS at the weekend. It shows how they visualise data containing character sets from many languages around the world.

http://blogs.sas.com/content/sastraining/2017/05/12/map-of-idioms-from-around-the-world/

The blog discuss loading and visualising data that looks like this:

Given that the data is contained in a spreadsheet, I thought it would be fun to see how that data was handled by Ajilius.

Create a data source that points to the spreadsheet:

Ingest the metadata from the spreadsheet:

Run the load script, and here is the result:

There it is. Fast, full-fidelity representation of the SAS data. If you’re building a data warehouse in Korea, or Armenia, or Israel, or any other country with a non-ASCII language, then Ajilius is your best solution for data warehouse automation.

Ajilius. Data warehouse automation for the global community.

 

What’s In Ajilius 3

We’ve been working on the next major release for Ajilius for several months now, and it is time to let you know what we’ll deliver starting next month.

Adaptive ELT

Q: “What is the fastest algorithm to load a dimension?”

A: “It depends.”

That’s right, the best algorithm for a load process depends on your platform and your data. MPP platforms prefer CTAS algorithms, SMP may do better with UPDATE algorithms for low volatility tables.

What we often see, though, is that design decisions made during development are no longer valid when the DW has been in production for several months. Assuming good design choices in the first place, the most common reason is that the volume and patterns of data in production are not matching those that drove the initial design decision.

Ajilius already supports a choice of CTAS or UPDATE algorithms depending on the platform, but in Ajilius 3 we’re taking an industry-leading leap forward and introducing a dynamic choice of algorithm at run time. We’re calling it Adaptive ELT.

Based on the combination of platform and data, we’ll make an intelligent choice about the best algorithm to use for each and every batch run. Does the data suggest you’ll be updating the majority of rows in a billion row table? We’ll generate CTAS for that batch. Are you only updating a few  rows on the next run? We’ll use an UPDATE statement.

No other ETL or ELT platform gives you this flexibility. It is like having your developers rewrite and tune their code, each night, with the data for that night’s batch driving their decisions. You’ll get an expert ELT system that dynamically monitors and adjusts your processing for maximum performance, on every single run.

SQL+Script

Until now, Ajilius has generated scripts for ELT processing. Ajilius 1 generated Python, Ajilius 2 generates Groovy, Ajilius 3 is throwing away the single script constraints.

In Ajilius 3 we’ve separated generation of SQL from generation of the wrapping script. We’re calling this SQL+Script, because we’ll be generating SQL for every transform, supported by a wrapping Script of a type requested by the user.

A new template option will enable the generated SQL code that effects transformations to be wrapped in multiple scripting languages. We’ll continue to support Groovy as our primary scripting option, but here are just some of the options that will be available in Ajilius 3:

  • Plain SQL. Every script will be able to be generated as just SQL statements in the target platform dialect. This will delight customers who want to use Ajilius in a hybrid environment with existing data warehouses.
  • Stored Procedures. SQL Scripts will be wrapped in stored procedure creation scripts for customers who prefer to use tools like SQL Agent and SSIS for job execution.
  • PowerShell. The rise of PowerShell as an automation language, and its growing popularity in CI/CD environments, makes it a great option for an Ajilius target. We’ll generate wrappers that emulate our current Groovy scripts, but using PowerShell as the scripting language.

BEAM Support

Being great fans of Lawrence Corr’s Agile Data Warehouse Design methodology, we’re adding BEAM support to Ajilius.

Our new “Model” module will implement the BEAM methodology. Here’s what you’ll see in the first release:

  1. Capture and presentation of 7-Ws
  2. Generation of template fact/s and dimensions from Ws
  3. Mapping of staging tables to fact/s and dimensions
  4. BEAM-compatible short codes for all object types.
  5. Incorporation of BEAM artefacts into Ajilius-generated documentation

Ajilius currently works “left-to-right”, in that we start from data sources and eventually transform to facts and dimension. We’ll keep that capability in Ajilius 3, but we’ll also provide “right-to-left” model driven implementations, where you’ll have the option to create facts and dimensions first, then link them back to transformations. In other words, we’ll support model-driven, data-driven and hybrid approaches to building a dimensional data warehouse.

All of this is an extension to the current Ajilius. You’ll be able to retrofit BEAM to an existing Ajilius DW, as well as integrate new BEAM-designed stars in to a conformed set of existing dimensions.

Once again, Ajilius is demonstrating innovation and leadership in data warehouse automation, being the first to directly integrate BEAM into the automation process.

The Fine Print

We’ll be rolling out the first release of Ajilius 3 some time before June 30.

The Ajilius 3 upgrade will preserve all your existing metadata, but we’ll be making some adjustments to the metadata structure that will break the incremental update flow. In other words, we’ll be expecting all users to upgrade to 3, then restarting incremental upgrades from that point.

We will continue to support Ajilius 2 for six months from the date of release of Ajilius 3, in order to give users time to evaluate and test the migration process.

There is, of course, no additional charge to users for Ajilius 3, all upgrades are covered for the life of your license agreement.

We hope you’re as excited as we are by these changes!

Ajilius. Committed to innovation in data warehouse automation.

 

Three-Click Migration

A technical sales manager at Microsoft recently argued that our claim of three-click migration between supported platforms was impossible. “It can’t be done,” he argued, “they’re completely different syntax and architectures.”

So we demonstrated it.

We took a data warehouse we’d built on AWS Redshift.

We ported it to Azure SQL Data Warehouse in three clicks.

We demonstrated the differences in the code that was generated for each platform, and the best practices it entailed.

We then ported the Azure SQL Data Warehouse version to SQL Server on Linux, in three more clicks, as a low-cost development and test platform where the results could be automatically deployed back to Azure SQL Data Warehouse.

Again, we demonstrated how it was done in code.

Finally, we added tables from Dynamics365 and SharePoint Online to the data warehouse on SQL Server, and deployed it back to Azure SQL Data Warehouse.

“OK,” he said, “I’m convinced.”

Ajilius. We walk what we talk.

Faster CRM Reporting

One thing we’ve noticed from our CRM integrations with Salesforce, Dynamics and NetSuite is that their APIs are glacially slow. If your reporting solution gets its data from an application interface rather than a database, your reports could take minutes to complete rather than seconds.

That isn’t surprising, an API gets in the way between your report and the database, but just how slow has been a real eye opener. We’ve seen data volumes that would move in less than a second on a direct database read take 5-6 minutes to retrieve from an application interface.

You might think that one report taking 6 minutes instead of a second or two isn’t too bad, but what if there are 100 reports to be run? Or you need to surface that report on an interactive dashboard?

Not only are there performance issues, but some APIs also have concurrency and volume limits. You might not be able to run more than two reports at once, and if you have a large group of users, their report queries may exceed the extract limits imposed by your CRM provider.

The solution is a data warehouse, where you extract the data from your application API just once, then all your reports interact with a faster database. You can integrate your CRM data with other systems in your organisation; and optimise the data structure into facts and dimensions, the best solution for reporting and analytics.

Here’s what it might look like from a time perspective:

Keep in mind, you also get the benefit of fast, interactive analytics and visualisation, for the entire day. Interactive dashboards are a reality, not a dream.

With our pre-built interfaces to Salesforce, Dynamics365 and NetSuite (and more coming soon), Ajilius makes it fast and simple to integrate your CRM into a data warehouse.

Ajilius profiles your data sources, screens your data to ensure its quality, ingests it into your data warehouse, transforms it into facts and dimensions, and creates direct integrations in business terminology with great products like Yellowfin, Tableau, Qlik, PowerBI and Excel. We’re the best value solution in data warehouse automation.

Ajilius. Faster CRM reporting.