New Transform: Persist Static

Ajilius 2.3.7 includes a new type of transform, labelled Persist Static.

The intent of this transform is to persist a table that will not be automatically reloaded.

Ajilius has a dependency-based scheduler. When we process a table, we recursively check and process its dependencies if required. Until now, that meant that ALL tables could potentially be reloaded.

We found this wasteful in cases where the table was relatively static. A business calendar, for example, might be updated once per year by the Finance department. A translation table aligning common data from two systems might be updated only when new products are added. In both examples, under the old process, these tables would be reloaded every time a batch that referenced them was run.

Now, however, tables with the type Persist Static will trim the dependency tree when included in a batch. The latest version of the table will be used, and its dependencies ignored.
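The trimming behaviour can be sketched in a few lines of Python. This is a hypothetical illustration, not Ajilius's implementation; the Table class, the table names, and the force flag (standing in for a command-line reload) are all assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    depends_on: list = field(default_factory=list)
    persist_static: bool = False

def build_plan(table, plan=None, force=False):
    """Recursively add a table's dependencies to the load plan.
    A Persist Static table (and everything beneath it) is skipped,
    unless force=True for the root, mirroring a command-line reload."""
    if plan is None:
        plan = []
    if table.persist_static and not force:
        return plan  # use the table's current contents; do not reload
    for dep in table.depends_on:
        build_plan(dep, plan)
    if table.name not in plan:
        plan.append(table.name)
    return plan

# Example: stage_calendar is Persist Static, so a batch that loads
# fact_sales uses its current contents without reloading it.
load_calendar = Table("load_calendar")
calendar = Table("stage_calendar", depends_on=[load_calendar], persist_static=True)
sales = Table("fact_sales", depends_on=[calendar])

print(build_plan(sales))                 # ['fact_sales']
print(build_plan(calendar, force=True))  # ['load_calendar', 'stage_calendar']
```

Note that forcing the reload of the Persist Static table itself still processes its own dependencies, which is exactly the command-line behaviour described below.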

A Persist Static table can still be reloaded from the command line. Assume that we have the following chain of tables, and that stage_source is Persist Static.

source.csv -> load_source -> stage_source

Any job that references stage_source as a dependency will use the latest contents of the table.

To reload this table, the following command might be used:

java -jar ajilius.jar -w MyWarehouse -t stage_source

That command will cause Ajilius to start processing with stage_source. Ajilius will find that it depends on load_source, and will trigger a load job to import data from source.csv. On completion of that job, data will be staged from load_source to stage_source. The table is then completely updated, and the job will terminate.

Ajilius. More effective loads and transforms.


Snowflake-Snowflake Data Loads

When we built the Snowflake adapter for Ajilius, around two years ago, we saw Snowflake as a Data Warehouse target.

We’re now seeing many Snowflake customers using its capabilities as a Data Lake as well as a Data Warehouse. Customers are landing their data in one Snowflake database, then migrating and structuring it for analysis and reporting.

We have responded to this shift by implementing cross-database queries for data ingestion by Ajilius.

Previously, customers could extract data from one Snowflake database, then reload it to another. If your Ajilius instance was running on-premise, this carried a significant overhead as the data was extracted, transferred to your Ajilius instance, transferred back to the Snowflake host, and reloaded.

Now, a new load type of “Cross-Database” can be selected for a table. When the table load job is executed, we create a three-part name query instead of an extract/load query.
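The difference between the two load styles can be illustrated with a short sketch. This is hypothetical, not the actual generated script; the database, schema, and table names, and the exact PUT/COPY statements, are assumptions for illustration:

```python
# Hypothetical illustration of the two load styles. All names are examples.
source = "SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CALL_CENTER"
target = "MY_WAREHOUSE.STAGE.LOAD_CALL_CENTER"

# Bulk Import: extract the data to a file, PUT it to a stage, COPY it in.
bulk_import = f"""
PUT file://call_center.csv.gz @%LOAD_CALL_CENTER;
COPY INTO {target} FROM @%LOAD_CALL_CENTER;
"""

# Cross-Database Query: a single INSERT ... SELECT using a three-part
# name (database.schema.table); the data never leaves Snowflake.
cross_database = f"INSERT INTO {target} SELECT * FROM {source};"

print(cross_database)
```

Because the cross-database form is a single statement executed inside Snowflake, the extract, transfer, and reload steps of the bulk path disappear entirely.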

Here’s how it works.

Create a data source in the normal way. Let's use the SNOWFLAKE_SAMPLE_DATA database that comes with a new account.

After refreshing metadata, we can see the tables and columns from the demonstration database. Now we can load the CALL_CENTER table.

Set the characteristics of the table. The Load Method will default to Bulk Import, which uses the Snowflake PUT/COPY method. Change this to Cross-Database Query.

Now, when you generate scripts for this table, you'll see that a cross-database query is used instead of a PUT/COPY.

Of course, incremental loads are also supported for cross-database loads.
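An incremental cross-database load can be expressed the same way, with a predicate restricting the SELECT to rows newer than the target's high-water mark. A hypothetical sketch, in which the watermark column, names, and default date are illustrative assumptions:

```python
def incremental_query(target, source, watermark_col):
    """Build a hypothetical incremental cross-database load: only rows
    newer than the target's current high-water mark are inserted."""
    return (
        f"INSERT INTO {target} "
        f"SELECT * FROM {source} s "
        f"WHERE s.{watermark_col} > "
        f"(SELECT COALESCE(MAX({watermark_col}), '1900-01-01') FROM {target});"
    )

sql = incremental_query(
    "MY_WAREHOUSE.STAGE.LOAD_CALL_CENTER",
    "SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CALL_CENTER",
    "CC_REC_START_DATE",
)
print(sql)
```

On each run, only rows beyond the stored maximum of the watermark column are copied, so the incremental load stays a single in-database statement as well.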

Ajilius. Faster loads for Snowflake-Snowflake.