New Transform: Persist Static

Ajilius 2.3.7 includes a new type of transform, labelled Persist Static.

The intent of this transform is to persist a table that will not be automatically reloaded.

Ajilius has a dependency-based scheduler. When we process a table, we recursively check and process its dependencies if required. Until now, that meant that ALL tables could potentially be reloaded.

We found this wasteful in cases where the table was relatively static. A business calendar, for example, might be updated once per year by the Finance department. A translation table aligning common data from two systems could be updated only when new products are added. In both examples, under the old process, these table would be reloaded every time a batch that referenced them was run.

Now, however, tables can with the type Persist Static will trim the dependency tree when included in a batch. The latest version of the table will be used, and its dependencies ignored.

A Persist Static table can still be reloaded from the command line. Assume that we have the following chain of tables, and that stage_source is Persist Static.

source.csv -> load_source -> stage_source

Any job that references stage_source as a dependency will use the latest contents of the table.

To reload this table, the following command might be used:

java -jar ajilius.jar -w MyWarehouse -t stage_source

That command will cause Ajilius to start processing with stage_source. It will be found to have a dependency of load_source. A load job will then be triggered to import data from source.csv. On completion of this job, the staging of data from load_source to stage_source will be performed. The table is then completely updated, and the job will terminate.

Ajilius. More effective loads and transforms.

 

Snowflake-Snowflake Data Loads

When we built the Snowflake adapter for Ajilius, around two years ago, we saw Snowflake as a Data Warehouse target.

We’re now seeing many Snowflake customers using its capabilities as a Data Lake as well as a Data Warehouse. Customers are landing their data in one Snowflake database, then migrating and structuring it for analysis and reporting.

We have responded to this shift by implementing cross-database queries for data ingestion by Ajilius.

Previously, customers could extract data from one Snowflake database, then reload it to another. If your Ajilius instance was running on-premise, this carried a significant overhead as the data was extracted, transferred to your Ajilius instance, transferred back to the Snowflake host, and reloaded.

Now, a new load type of “Cross-Database” can be selected for a table. When the table load job is executed, we create a three-part name query instead of an extract/load query.

Here’s how it works.

Create a data source in the normal way. Let’s use the SNOWFLAKE_SAMPLE_DATA that comes with a new account:

After refreshing metadata, we can see the tables and columns from the demonstration database. Now we can load the CALL_CENTER table:

Set the characteristics of the table. The Load Method will default to Bulk Import, which uses the Snowflake PUT/COPY method. Change this to Cross-Database Query:

Now, when you generate scripts for this table, you’ll see that a cross database query is used instead of a PUT/COPY. Note that some columns have been dropped in order to show the full query.

Of course, incremental loads are also supported for cross-database loads.

Ajilius. Faster loads for Snowflake-Snowflake.

 

SQL Server Integrated Authentication

From time to time we get a customer reporting an error like this:

The key to this error is the text “Faled to load the sqljdbc_auth.dll”

This file is required by the Microsoft SQL Server JDBC driver, in order to handle integrated authentication.

The file must be placed in the Java bin folder. Upgrades to Java can change the location of the Java in folder, by changing the version number higher up the path. This means that the file can no longer be found.

Here are some instructions to resolve this error if it occurs on your server.

Instructions

Download the SQL Server JDBC driver from https://www.microsoft.com/en-au/download/details.aspx?id=11774. Make sure you get the 64-bit version.

Install the driver to any location. It doesn’t matter where, it basically just unzips a file structure and you’ll only be taking one file from its directory.

Go to the folder to which you installed the driver, and find the following path:

\sqljdbc_6.0\enu\auth\x64

Copy the file sqljdbc_auth.dll from this location

Find the location of your Java Runtime Environment using the WHERE command:

where java

It should be something like this:

C:\Program Files\Java\jdk1.8.0_121\bin\java.exe

Copy the sqljdbc_auth.dll into the java folder. Given the result I’ve just shown, on my machine I would copy it to:

C:\Program Files\Java\jdk1.8.0_121\bin\sqljdbc_auth.dll

Now, when you restart the Ajilius application, integrated authentication should work on your machine.

Custom Code Injection

Ajilius has always supported a rich set of transformations and custom calculations to support complex ELT. Occasionally, however, we’ve had requests for custom code injection before or after ELT transformations. That request is now satisfied.

All Load and Transform jobs now support blocks of user-defined code at the beginning and end of the job.

Code injected into Load jobs is in the Groovy scripting language, while code injected into Transform jobs is pure SQL.

To add custom code to a job, select the Change menu item for the table being processed:

The table form now contains two new fields, named Custom Code – Pre-SQL and Custom Code – Post-SQL. These fields contain any legitimate SQL statements.

When you run the transform script for this table, the custom code will be injected into the appropriate place in the script:

Please note that custom code may not be supported across target data warehouse platforms, and may need to be reviewed and edited after migration.

Ajilius. Round-trip customisation.

 

Take the SAS language challenge

Can your data warehouse automation solution correctly handle international character sets?

I was reading an interesting blog post from SAS at the weekend. It shows how they visualise data containing character sets from many languages around the world.

http://blogs.sas.com/content/sastraining/2017/05/12/map-of-idioms-from-around-the-world/

The blog discuss loading and visualising data that looks like this:

Given that the data is contained in a spreadsheet, I thought it would be fun to see how that data was handled by Ajilius.

Create a data source that points to the spreadsheet:

Ingest the metadata from the spreadsheet:

Run the load script, and here is the result:

There it is. Fast, full-fidelity representation of the SAS data. If you’re building a data warehouse in Korea, or Armenia, or Israel, or any other country with a non-ASCII language, then Ajilius is your best solution for data warehouse automation.

Ajilius. Data warehouse automation for the global community.

 

Terminal Emulators

When your software lives in the cloud, you get used to the world of terminal emulators. We run over 50 cloud servers on 5 different cloud platforms, and the right terminal emulator is a real boost to productivity.

When I was using OSX as my development platform, I used iterm2 as my terminal. Far better than the standard Mac terminal program, it offers features like paned windows, search, autocomplete, hotkeys, and more.

On switching back to Windows I was stumped. Putty was the most often recommended terminal, but it just felt ancient after iterm2, and I started looking for alternatives. After much searching, I found the wonder that is ZOC Terminal.

Tabbed windows! 3270! 5250! Windows and OSX! This product isn’t just a Putty replacement, but Attachmate as well.

The file transfer feature means that I never have to scratch my head trying to remember SCP syntax. There’s a full scripting language built-in, that we’re using to automate some of our automated deployment processes. Think of something you want to do in a terminal session, and it is probably built into the extensive feature list.

ZOC Terminal is a paid, commercial product. Its productivity and usability gains are worth every cent.

Strongly recommended, if you’re deploying Ajilius on a cloud / Linux environment.

Display Settings

Something often overlooked by new users is the ability to adjust the size of tables and editors to match their screen size and browser zoom level.

Not everyone likes small, squinty fonts … especially those of us on the wrong side of 50 … and Ajilius makes it easy to adjust.

Click on your user name (top right of navigation bar) and select the User Preferences option:

Then, use the Table Rows and Documentation Lines options to adjust the display. Table Rows adjusts the number of rows displayed in tables, and Documentation Lines adjusts the number of lines displayed in the User and Tech Notes fields.

Ajilius. Readable data warehouse automation.

Custom Driver Settings

This week we’re adding custom driver settings to Ajilius.

While our driver settings are chosen for optimal performance, there have been times when customers have asked for the ability to tailor connections to specific requirements.

We’re now supporting that request through two settings:

  • Connection Parameters. These are extensions to the connection string used by JDBC drivers. Specify connection string parameters in a single line, including delimiters after every parameter.
  • Driver Properties. These are JDBC properties which modify the behaviour of drivers through an API call instead of the connection string. Specify properties as a combination of property-value pairs, with one property per row.

 

The above example shows a modification to the connection string for the Salesforce driver, which adjusts the Salesforce API to version 36, and reduces the timeout setting to 30 seconds.

Please use custom driver settings with extreme care, as invalid settings could cause your ELT jobs to fail. We recommend you check with us before making changes regarding the specific format requirements of each driver.

Ajilius. Customisable automation.

 

SharePoint Data Warehouse

Reference data – the common lookup tables that drive many data warehouses – often has no clear home in an organisation. It usually gets dumped in spreadsheets, XML files or text files, and ends up in a mess. Recently we’ve found that SharePoint Online makes a great repository for DW reference data. It can be managed and maintained in a clean, multi-user environment, then neatly integrated into the data warehouse.

This was supposed to be a post about the relative merits of SharePoint and Google Sheets for the maintenance of reference data. Unfortunately, Google’s approach to browser-based OAUTH authentication doesn’t play well on servers, so we’re temporarily shelving Google Sheets as an Ajilius data source until we figure out a workable solution.

On with SharePoint!

Adding a SharePoint Online data source to Ajilius is a breeze. Simply enter your site, user name and password:

sharepoint01

Refresh your SharePoint metadata, and you’ll see the libraries, lists and other data sources that can be loaded into Ajilius. Here we’ve created a simple list in which we can maintain details of public holidays.

sharepoint02

You’ll notice that there are many metadata columns added by SharePoint. In fact, we’re only interested in three columns, and we’ll clean that up once we’ve imported the metadata.

Import your metadata just like any other Ajilius data source, by selecting the Load Metadata option from the context menu for the SharePoint.PublicHolidays table. Here is the screen to complete the metadata:

sharepoint03

When added, you’ll see a screen like this, showing the columns for which metadata has been imported:

sharepoint04

Let’s clean up those extra columns we don’t need. Click the Change link at the top of the right-hand panel, and you’ll be taken to the column list.

sharepoint05

We’re going to use the Delete link, highlighted in the previous picture, to quickly delete a large number of columns from the table. When you click it, you will see a simplified form of the column mapper. You can choose to delete a column by changing the drop-down to Delete, either by selection, or most browsers support simply tabbing into the drop-down and pressing the ‘D’ key to set the value.

sharepoint06

When you’ve deleted the columns you don’t need, you should be left with a Date, Store and Title column. Clean up their metadata with some better descriptions, set the Date and Store as business keys to the table, and it should look like this:

sharepoint07

Next, use the Scripts option from the load_public_holiday context menu, and Create and Load your new table. Here is how your screen should look after loading the data.

sharepoint08

And when we view the data, we can see that it has been successfully loaded from our source in SharePoint Online.

sharepoint09

SharePoint Online makes a great multi-user tool for managing the small reference data sets that often get overlooked in the data warehouse governance process. Ajilius now includes a SharePoint connector as a standard feature, at no additional cost.

Ajilius. Now with SharePoint.

 

 

 

Persistent Staging Tutorial

It almost qualifies as a Frequently Asked Question in demonstrations:

How do you handle persistent staging?

Persistent Staging is typically used to keep data from source systems in its original state.

One common reason for doing this is to have a full set of source system data available to support future changes that you may wish to make to star schemas. Having the data already in the data warehouse makes it simple to recreate or modify the star schema at will.

There are two types of persistent staging supported by Ajilius:

  • Latest
  • Historic

Read the tutorial to discover exactly how they work.

Persistent Staging Tutorial