Enhancement: Surrogate Key Names

Ajilius V1 used a fixed naming convention for surrogate keys. The dimension surrogate was named “id”, and the fact foreign key was named “<dimension>_id”.

While this might have been correct from a data modelling perspective, it didn’t work well with modern BI tools which expect the surrogate and foreign keys to have the same name.

In V2 we modified the convention to use the same name, which we constructed as “<dimension>_key” in each case. This worked well with BI tools, but Kimball modellers were more used to seeing column names like “store_key” instead of “dim_store_key”.

Now we’re releasing an enhancement that enables you to set the surrogate key name when you create a dimension, and to modify it like any other column name.

Update to version 2.2.7, and you’ll be able to use this feature.

Ajilius. Responding to user feedback.

 

Enhancement: Batch Processor

When we designed Ajilius, we expected that customers would have an existing batch scheduling solution in place. Scripts generated by Ajilius would be deployed to production servers, and scheduled for execution by the operations team.

It hasn’t worked that way in practice. To our surprise, most customers to date have had no existing scheduler in place. Our most common enhancement request has been to provide this functionality.

Well, we’ve done it. Ajilius now incorporates a batch scheduler for dependency-based execution of data warehouse scripts.

Ajilius now incorporates a command line processor that supports the following arguments.

-w / –warehouse
Generate scripts from the warehouse_name metadata

-b / –batch
Execute batch-level operations

-s / –source
Load all tables from named source/s

-l / –load
Load all tables, or named tables

-t / –transform
Transform all tables, or named tables

-d / –dimension
Process all dimensions, or named dimensions

-f / –fact
Process all facts, or named facts

The -t, -d and -f options process all dependencies for the selected tables. For example, the arguments:

-w chinook -b reset -f all

will conduct a full end-to-end load of the “chinook” warehouse.

You might choose to only load two specific fact tables, in which case the arguments might look like:

-w chinook -b reset -f fact_sale fact_budget

Multiple commands can be included in the one command line, or jobs may be separately scheduled. Steps that have been completed will be retained between jobs, until they are reset by the “-b reset” option. For example:

-w chinook -b reset
-w chinook -l all
-w chinook -t all
-w chinook -d all
-w chinook -f all

If executed as a series of jobs, this batch would reset the internal dependency log, load all load tables, transform all stage tables, process all dimensions, then process all facts.

It might, alternatively, been written as:

-w chinook -b reset -l all -t all -d all -f all

But given that all dependencies are processed for each table, it could be simplified to:

-w chinook -b reset -f all

The batch scheduler will be in Release 2.2.6, available from Monday, July 25.

Ajilius. Now with enhanced scheduling.

 

Enhancement: Custom Query Incremental Loads

Ajilius uses change data columns to govern incremental extracts from source tables. You flag these columns on the Column data entry screen.

custom_query_control_column

When we implemented Custom Query Loads, this feature was not supported.

Now, thanks to some great work from the dev team, we’re pleased to announce that incremental loads are now fully supported on Custom Query Loads. Same method, same result.

Include parameter markers in your custom queries, flag the columns for change data detection, and we’ll automatically generate the rest.

Ajilius. Every day, in every way, we’re getting better and better.

New feature: Custom Query Loads

The normal workflow to define an extract/load job in Ajilius is to browse the source system for the required table or view, then load the metadata for that table, and finally execute the ELT job created from the metadata.

Complex load requirements can occasionally be represented as views in the source system, but often it is not possible to create a custom view in a source system. It might be a hosted application where you don’t have the ability to create new objects, or you might simply lack the authorisation to create a view.

In this circumstance, you might like to use our newest feature, Custom Query Loads.

http://ajilius.com/custom-query-loads/

Ajilius. The features keep coming.

 

New feature: Merge Transform

We’ve just finished a new feature to combine multiple streams of data within the data warehouse. Known as a “Merge Append”, this feature merges (appends) the contents of one table into a target table, with user-defined mappings between columns.

We recently delivered this feature for customers with multiple, similar, sources of data; and now we’re making it publicly available. Examples of use cases have been collectives of independent schools, financial aggregators, and re-insurance organisations.

Our content editor has done a great job on a tutorial for this feature, setting a standard we’ll be applying to all features from now on.

http://ajilius.com/merge-table-tutorial/

Let us know what you think.

Ajilius. Committed to delivering value.

Integrated Upgrades

One of our principles in making enhancements to Ajilius is drawn from a quote attributed to Albert Einstein:

Everything should be made as simple as possible, but no simpler.

Our new Java version made the installation process much simpler, and the patch process as well. Now, the full release of 2.2 simplifies the process of upgrades to our metadata databases.

When we need to update the metadata databases, we’ll send you a file to be placed in the Ajilius program folder. Simply restart Ajilius, the update will be applied, and your application will continue as normal.

Ajilius. As simple as possible.

Note:

Unlike some competitor products, Ajilius does not “phone home”. We do not automatically download any files, and send no information back from your installation. Your metadata and credentials are secure within your site at all times.

Release 2.2.b4

All users should have received an email regarding the availability of 2.2.b4.

This release aligns the dimension surrogate key default value across platforms.

In earlier versions, some databases implemented the default row as ID=0, some as ID=1. The migration process took care of differences, but it was still confusing for some users.

Now, all default rows carry ID=1.

We’ve also rolled in a couple of patches around SQL Server data sources. In particular, you no longer need to specify an extract folder if you’re using the direct load (default) option.

We expect that this will be the second last beta, and we’ll move to a full release next week.

Let us know if you didn’t receive the upgrade email, and we’ll respond with the download location.

Ajilius & Java

We’ve been a little quiet, lately.

The reason is that, for the last few months, we’ve had a team of people working across three countries to complete something exciting, a new version of Ajilius that shifts us from a Python platform to Java. For the last month in particular, we’ve been on an incredible sprint to deliver the Java version by the end of May.

This wasn’t an easy decision to make, as Python was critical to our first year of operation. We couldn’t have built Ajilius as quickly as we did on another product, and we grew to like Python as a development language.

Sadly, though, we found two problems with Python – distribution and database drivers.

Distribution management for Python is handled by a utility called PIP. Our installation scripts used PIP to acquire (and update) our library dependencies. Unfortunately, PIP struggled behind firewalls, and a number of corporate and government customers were unwilling to make the changes necessary for it to work.

That meant we had to bundle dependencies ourselves, which became a nightmare of dependencies-on-dependencies, and a maintenance and distribution headache for our team. If you’ve heard of DLL-hell in the Windows world, Python’s site-packages hell is at least three layers closer to Lucifer himself.

The second problem was database drivers. While Snowflake released a great quality Python driver for their DBMS, most other databases are supported by the Python community, which means that features can lag well behind current vendor releases. A great example is SQL Server, where Python’s interfaces are completely variable in their support. One doesn’t support Unicode. One doesn’t support Integrated Authentication. None support AlwaysOn intents. None support the Bulk Copy API. It wasn’t just a SQL Server problem, we found that as we supported more and more databases, the issues of finding current drivers became a support nightmare.

So both of these problems meant we needed an alternative. After much searching and testing, we settled on the combination of Java and JDBC, and after several man years of effort we finally have a Java version ready for beta testing.

Java solves the packaging and distribution problem because we can distribute Ajilius with the library JAR files already set up. Once you have installed Java, our installation process is as simple as unzipping a folder into a known location. Maintaining current versions of library dependencies is as simple as copying a file. Further, we no longer have to go through a platform-dependent compilation and packaging step as we did with our Python version, the code and packaging is identical across our supported platforms of Windows, Linux and OSX.

Java and JDBC have also enabled us to solve our database driver problem. Every mainstream DBMS has a vendor-supported JDBC driver available, we no longer have to test multiple community drivers then make compromises on the availability of features vs. the driver quality. Even better, we now have drivers available for popular application platforms and web services including CRM, Marketing, Accounting, Collaboration, Social Networking and more. We’ll be talking a lot more about this in coming posts.

So what is the benefit for you, the customer and partner?

Firstly, you get a much faster installation experience, that is more familiar to your operations team. Install Java, copy the Ajilius folder, and you’re done. It can even be fully automated for internal software packaging.

Secondly, updates are now much easier to deploy. You’ll get more features, faster.

Finally, the incredible breadth of JDBC drivers means that we’ll have connectors for more data sources than you can possibly imagine, both cloud and on-premise.

Our demonstration servers have now been switched to Java, and we’ll be ready for on-premise evaluations by the end of the week. Migration packages for existing customers will follow by the end of the month.

Now, we move on with the development of the huge list of features that will be coming to Ajilius in the next twelve months. Hang on, it is going to be one heck of a ride!

 

Running Totals

A customer asked if it was possible to compute a table of cumulative transaction values by day. There were business reasons why this could not be left to a BI tool, so we worked out the following method.

  1. Create a persistent stage table T1 to hold the values to be accumulated. Completely anonymised, these values were TX_DATE, DIM1, DIM2, QUANTITY.
  2. From that persistent table, create a GROUP BY stage table T2 with the columns DIM1, DIM2 and SUM(QUANTITY). Discard the TX_DATE column.
  3. Add a derived column, a date named AT_DATE, with the transform CURRENT DATE.

The last step – a derived column – was the key to the solution.

We made one change to validation rules to enable this to work, and we are now able to meet the customer’s requirement.

Version 2.1c – Snowflake TIME Data Type

On Wednesday night we got notification of the latest changes in Snowflake Elastic Data Warehouse. Of particular interest was the implementation of a TIME data type. Here we are on Friday night, and we’re releasing support for this latest feature.

We were one of the customers to request this data type early in the Snowflake development process, so we’re delighted to see it implemented.

Having a TIME data type makes it easy to implement dimensions that incorporate concepts like business/after hours, and morning/lunch/afternoon periods. Want to know the blocks of time in which most of your transactions occur across the week? You need a time dimension.

Without a TIME data type, Ajilius represents time values using a character string in the format HH:MM:SS. This works for building the star schema, but it doesn’t support analytic queries involving time arithmetic. Now we can store and calculate time values in Snowflake data warehouses.

The installer package for V2.1c also updates the Snowflake connector to the latest version.

Ajilius. Keeping up with Snowflake.