Yellowfin Integration

One of the challenges in self-service BI is that you can quickly end up in your BI tool’s version of Excel Hell. That is, every user has their own definition of data, sources and aggregations, leading to the pre-DW situation where no two reports ever agreed.

Ajilius now brings a new level of governance to self-service, by directly integrating the data warehouse metadata with the BI tool. Over the 2.4 development cycle we’re bringing integration to a number of popular BI platforms, beginning today with Yellowfin.

[Screenshot: yellowfin1]

Select a fact table, select a destination, and a full Yellowfin model for your star schema will be written to that destination.

Use the Import function in Yellowfin, and your model is imported as a Yellowfin connection and view.

Not only do we generate the metadata, but Yellowfin makes round-trip integration a breeze. In our demonstrations we show how models can be enriched in line with changes to the data warehouse, preserving your existing reports and dashboards.

Now your data warehouse and your BI solution can be closer than ever before.

Ajilius. Self-service and governance.

DW DBMS Changes

[Image: DW Databases]

In line with DW market developments, we have made some changes to the databases supported by Ajilius as target data warehouse platforms.

The revised list of targets is:

  • SQL Server (On-Premise & IAAS)
  • PostgreSQL / EnterpriseDB
  • MariaDB ColumnStore
  • Exasol
  • Azure SQL
  • Azure SQL Data Warehouse
  • Snowflake Elastic Data Warehouse
  • AWS Redshift

With the release of SQL Server 2016 SP1, and the CTP of SQL Server on Linux, we’re excited by the potential for major cost savings for mid-market customers. Features like database compression and columnstore indexes make it feasible to run a small data warehouse / data mart on even SQL Server Express Edition, stepping up through Standard and Enterprise Editions as storage and performance expectations grow.

We have been working with Microsoft on SQL Server on Linux for some time now, and see a bright future for this platform. Installation is easy, performance is great (although very slightly behind Windows at this time), and it has been rock-solid for over 2,000 cycles of DW deployments using Ajilius.

We’ve also been impressed by MariaDB’s new ColumnStore engine. We’re seeing great performance and scale from this platform, making it a natural target for any business running MariaDB / MySQL as its standard DBMS.

Exasol recently announced V6 of its in-memory DBMS, and we’re busy upgrading our Exasol adapter to take advantage of its new features.

With all these new and upgraded platforms comes a harder decision: dropping one of our supported databases. Although we introduced Greenplum support earlier this year, we have had just one enquiry in that time, and no customers. It’s time to retire it and put the support effort to better use.

We’ve still got one platform that we want to add as a target, but I’ll save that announcement until development is further advanced. A hint: it will be a BIG announcement.

Ajilius. Supporting tomorrow’s DW platforms, today.

SharePoint Data Warehouse

Reference data – the common lookup tables that drive many data warehouses – often has no clear home in an organisation. It usually gets dumped in spreadsheets, XML files or text files, and ends up in a mess. Recently we’ve found that SharePoint Online makes a great repository for DW reference data. It can be managed and maintained in a clean, multi-user environment, then neatly integrated into the data warehouse.

This was supposed to be a post about the relative merits of SharePoint and Google Sheets for the maintenance of reference data. Unfortunately, Google’s approach to browser-based OAuth authentication doesn’t play well on servers, so we’re temporarily shelving Google Sheets as an Ajilius data source until we figure out a workable solution.

On with SharePoint!

Adding a SharePoint Online data source to Ajilius is a breeze. Simply enter your site, user name and password:

[Screenshot: sharepoint01]
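
If you’d like to check your site and credentials outside Ajilius first, here’s a minimal sketch using the third-party Office365-REST-Python-Client package (not part of Ajilius; the site URL, list name and login details below are placeholders):

    # Sketch only: read a SharePoint Online list with a third-party package.
    # Site URL, list name and credentials are placeholders.
    from office365.runtime.auth.user_credential import UserCredential
    from office365.sharepoint.client_context import ClientContext

    ctx = ClientContext("https://example.sharepoint.com/sites/dw").with_credentials(
        UserCredential("user@example.com", "password"))
    items = ctx.web.lists.get_by_title("PublicHolidays").items.get().execute_query()
    for item in items:
        print(item.properties.get("Title"))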

Refresh your SharePoint metadata, and you’ll see the libraries, lists and other data sources that can be loaded into Ajilius. Here we’ve created a simple list in which we can maintain details of public holidays.

[Screenshot: sharepoint02]

You’ll notice that there are many metadata columns added by SharePoint. In fact, we’re only interested in three columns, and we’ll clean that up once we’ve imported the metadata.

Import your metadata just like any other Ajilius data source, by selecting the Load Metadata option from the context menu for the SharePoint.PublicHolidays table. Here is the screen to complete the metadata:

[Screenshot: sharepoint03]

When added, you’ll see a screen like this, showing the columns for which metadata has been imported:

[Screenshot: sharepoint04]

Let’s clean up those extra columns we don’t need. Click the Change link at the top of the right-hand panel, and you’ll be taken to the column list.

[Screenshot: sharepoint05]

We’re going to use the Delete link, highlighted in the previous picture, to quickly delete a large number of columns from the table. When you click it, you will see a simplified form of the column mapper. You can mark a column for deletion by changing its drop-down to Delete, either by selecting the value with the mouse or, in most browsers, by tabbing into the drop-down and pressing the ‘D’ key.

[Screenshot: sharepoint06]

When you’ve deleted the columns you don’t need, you should be left with the Date, Store and Title columns. Clean up their metadata with some better descriptions, set Date and Store as business keys to the table, and it should look like this:

[Screenshot: sharepoint07]

Next, use the Scripts option from the load_public_holiday context menu, and Create and Load your new table. Here is how your screen should look after loading the data.

[Screenshot: sharepoint08]

And when we view the data, we can see that it has been successfully loaded from our source in SharePoint Online.

[Screenshot: sharepoint09]

SharePoint Online makes a great multi-user tool for managing the small reference data sets that often get overlooked in the data warehouse governance process. Ajilius now includes a SharePoint connector as a standard feature, at no additional cost.

Ajilius. Now with SharePoint.

Persistent Staging Tutorial

It almost qualifies as a Frequently Asked Question in demonstrations:

How do you handle persistent staging?

Persistent Staging is typically used to keep data from source systems in its original state.

One common reason for doing this is to have a full set of source system data available to support future changes that you may wish to make to star schemas. Having the data already in the data warehouse makes it simple to recreate or modify the star schema at will.

There are two types of persistent staging supported by Ajilius (the sketch after the list illustrates the difference):

  • Latest
  • Historic
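
Here’s a minimal sketch of the difference, using a plain Python dictionary and list to stand in for the staging tables (an illustration of the concept, not the Ajilius implementation; the business key column is a placeholder):

    from datetime import datetime, timezone

    latest = {}    # Latest: one row per business key, overwritten on each load
    historic = []  # Historic: every version of every row, stamped at load time

    def stage(rows, business_key="customer_id"):  # hypothetical key column
        load_ts = datetime.now(timezone.utc)
        for row in rows:
            latest[row[business_key]] = row               # current state only
            historic.append({**row, "load_ts": load_ts})  # full history

Latest answers “what does the source look like right now?”, while Historic lets you rebuild a star schema as it would have appeared at any point in time.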

Read the tutorial to discover exactly how they work.

Persistent Staging Tutorial

Data Profiling Patterns

Patterns in data are a great way to summarise the type and content of a column.

Our data profiling feature creates patterns in the following manner:

  • Values are converted to a string representation.
  • Alphabetic characters are converted to an ‘x’.
  • Characters in the range 0-9 are converted to a ‘9’.
  • All other characters are left as-is.

Here are some examples of pattern conversions (reproduced in the sketch below):

  • 2010-09-23 to ‘9999-99-99’
  • 09/07/2016 to ‘99/99/9999’
  • Ajilius to ‘xxxxxxx’
  • (03) 5432 9876 to ‘(99) 9999 9999’
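
A minimal Python sketch of those rules (not the Ajilius implementation itself, but it reproduces the examples above):

    def profile_pattern(value):
        """Letters become 'x', digits 0-9 become '9', all else is unchanged."""
        out = []
        for ch in str(value):  # values are converted to strings first
            if ch.isalpha():
                out.append("x")
            elif "0" <= ch <= "9":
                out.append("9")
            else:
                out.append(ch)
        return "".join(out)

    assert profile_pattern("2010-09-23") == "9999-99-99"
    assert profile_pattern("Ajilius") == "xxxxxxx"
    assert profile_pattern("(03) 5432 9876") == "(99) 9999 9999"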

We find that the frequency of pattern occurrence is more valuable than the frequency of value occurrence. For example, it is more useful to know that a column contains only values with a pattern ‘9999-99-99’ than to see a long list of dates, each occurring only a handful of times.

Ajilius. Better profiling for better data warehouses.

Announcement: Ajilius 2.3

I’m proud to announce that, today, we achieved our goals for Ajilius 2.3.

The hero feature of this release is data profiling. We now profile data sources faster than Trifacta, with more valuable information than Pentaho, and with more variety than SSIS. We give you the real data you need to make quality decisions about the content of your data sources.

To deliver this feature, we first added persistent metadata caching to Ajilius, as discussed in this earlier post.

Now, we’ve completed the feature by implementing the profiling and presentation features.

You profile a source table from the Extract Source Tables screen. The following screen shot shows that we are about to profile the Chinook Customer table.

[Screenshot: profile1]

We see any previous profile that is cached for this table, and we can refresh the profile at any time by pressing the Profile button.

[Screenshot: profile2]

We profile any number of rows, at a rate of around 4 million rows per minute.

Every column is profiled in every row.

For columns of less than 64 characters in length, we profile up to 1,000,000 discrete values per column.

For columns of up to 256 characters in length, we profile up to 1,000,000 discrete patterns per column.

For columns of up to 4,000 characters in length, we profile the minimum and maximum values in a column.

Not only do we profile values and patterns, we examine your data for characters that might cause problems in your data warehouse. Null values? Got it. Control characters? Check. Extended ASCII characters? That too. Unicode characters? Again, check.
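
As a conceptual sketch of what those character checks look for (not our implementation):

    def screen_characters(value):
        """Flag character classes that can cause trouble downstream."""
        flags = set()
        for ch in value:
            code = ord(ch)
            if code == 0:
                flags.add("NUL")             # embedded null character
            elif code < 32 or code == 127:
                flags.add("control")         # tabs, newlines, etc.
            elif 128 <= code <= 255:
                flags.add("extended ASCII")  # e.g. accented Latin letters
            elif code > 255:
                flags.add("Unicode")         # beyond the Latin-1 range
        return flags

    print(screen_characters("caf\u00e9\t"))  # {'extended ASCII', 'control'}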

This is real, valuable profiling data for data warehouse professionals. And it is now included in your Ajilius licence.

So, once again, Ajilius provides real value through the addition of the features you need.

Ajilius. The real innovators in data warehouse automation.

Data Quality with Regular Expressions

Ajilius currently supports three types of data quality screens on data being loaded to the warehouse:

  • Data Type
  • Ranges
  • Regex

We’ve previously posted about type and range validation, but we recently had an enquiry about the use of Regular Expressions (regex) for data validation. Let’s build an example based on Postal Code validation.

The Person.Address table in AdventureWorks2014 contains a Postal Code column. Addresses are international, with many different formats of Postal Code. For the purposes of this demonstration, we are going to validate the column against Canadian Postal Codes, using a regular expression taken from the Regular Expressions Cookbook, 2nd Edition:

    ^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$
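
You can experiment with the expression before wiring it into a load; here’s a quick check using Python’s standard re module:

    import re

    CA_POSTCODE = re.compile(r"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$")

    print(bool(CA_POSTCODE.match("K1A 0B1")))  # True  - valid Canadian format
    print(bool(CA_POSTCODE.match("98052")))    # False - a US ZIP code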

We’ll start from the point where we have imported the Person.Address metadata into our repository:

[Screenshot: regex1]

Click Change in the right-side panel, then modify the postal_code column. Scroll down to the Data Quality section, and make the following changes:

[Screenshot: regex2]

Save your changes, go back to the Load List, and select the Scripts option for the load_person_address table.

Notice the new section of script which has been generated. This is the validator that will be applied to this column at load time.

    rs = new AjiliusResultSet(rs)
    rs.setValidator('postal_code','text','regex','^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$','1')

Now run the script, and watch the results at the bottom left of the screen:

[Screenshot: regex3]

As you can see, only 1090 rows passed the validation. And when we view the contents of the table, we see that these rows do match the Canadian Postal Code format defined in our regular expression:

[Screenshot: regex4]

A word of caution: regex validation is slower than type and range checking. Without regex validation, the same table loaded in 0.749 seconds; the difference was due entirely to the regex algorithm. If you have a choice, use range checking instead.

Ajilius. Better data screens.

Enhancement: Cached Metadata

Ajilius enables browsing of source system metadata, to identify and load data into the warehouse. Here is a typical display:

[Screenshot: metadata01]

A click on a table, on the left side, shows the columns for that table on the right side.

Until now, each update of this screen represented a round trip to the database. That was fine for a data source located close to the Ajilius server, but users in hybrid environments reported slow screen updates when pulling cloud metadata to on-premise Ajilius. This was particularly apparent with large Oracle systems, which add thousands of system table entries to a metadata set.

We’ve now added a feature to cache metadata within Ajilius. This means the delay of updating metadata happens only once, and subsequent interactions are as fast as an on-premise solution.

The context menu for a data source now contains an option to Refresh Metadata:

[Screenshot: metadata02]

When you select this option, you will be prompted with a screen warning that a metadata update might be slow. In this case, “slow” means a few seconds.

[Screenshot: metadata03]

On running this refresh, an internal metadata cache is added to your data warehouse metadata repository, and subsequent calls to browse metadata or load metadata into the warehouse will be drawn from this cache.
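
Conceptually, the cache behaves like this sketch (illustrative only; fetch_from_source stands in for the round trip to the source database):

    _cache = {}

    def browse_metadata(source_id, fetch_from_source):
        """Serve metadata from the cache, hitting the source only on a miss."""
        if source_id not in _cache:
            _cache[source_id] = fetch_from_source(source_id)  # slow, happens once
        return _cache[source_id]

    def refresh_metadata(source_id, fetch_from_source):
        """The equivalent of the Refresh Metadata menu option."""
        _cache[source_id] = fetch_from_source(source_id)
        return _cache[source_id]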

You may refresh the cache at any time.

An added bonus is that once your data source metadata is cached, you no longer need to be connected to the source whilst working with source metadata. That’s a great feature for companies with high-risk data, and for people who like to take their work home with them 🙂

Ajilius. More flexible metadata.

Enhancement: MongoDB Data Source

Release 2.2.12 includes support for MongoDB as a data source.

MongoDB is a popular NoSQL DBMS, and Ajilius is the first data warehouse automation platform to support MongoDB as a first-class citizen.

Want proof? Here’s a shot of Ajilius displaying data from a MongoDB-hosted TPC-H database:

[Screenshot: mongodb2]

More? Here’s Ajilius showing data from the Restaurants sample database:

[Screenshot: mongodb3]

Ajilius makes working with MongoDB as simple as working with any other data source we support.
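
For comparison, here’s roughly what inspecting the Restaurants sample looks like in raw pymongo (connection string, database and collection names are placeholders):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    restaurants = client["sample"]["restaurants"]

    # Documents are nested JSON; a relational load needs them flattened.
    for doc in restaurants.find().limit(3):
        print(doc["name"], doc.get("borough"), doc.get("cuisine"))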

Ajilius. MongoDB to your data warehouse.

Enhancement: Log to Data Warehouse

This week’s Release 2.2.12 includes a feature to write ELT log entries to the data warehouse.

Previous releases logged ELT performance to a log file. This file could be named through the Warehouse Edit screen, and defaulted to ajilius.log.

Now, in addition to the log file, a table AJILIUS_LOG is created in the AJILIUS schema of the data warehouse. On job completion, as long as a data warehouse connection is available, the results of the job will be inserted into this table.

The logged columns are listed below; an illustrative table definition follows the list:

  • log_stamp / Timestamp
  • log_script / Script name
  • log_elapsed / Elapsed time in seconds
  • log_status / Job status, 0=success
  • log_files / Number of files processed during job
  • log_inserts / Number of rows inserted during job
  • log_updates / Number of rows updated during job
  • log_deletes / Number of rows deleted during job
  • log_message / Descriptive error message if log_status<>0
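
For illustration, a table matching that column list might be declared as follows (a hypothetical sketch; the column types are assumptions, and the exact DDL Ajilius generates will vary by target platform):

    AJILIUS_LOG_DDL = """
    CREATE TABLE AJILIUS.AJILIUS_LOG (
        log_stamp    TIMESTAMP,      -- when the job completed
        log_script   VARCHAR(256),   -- script name
        log_elapsed  NUMERIC(12,3),  -- elapsed time in seconds
        log_status   INTEGER,        -- 0 = success
        log_files    INTEGER,        -- files processed during the job
        log_inserts  INTEGER,        -- rows inserted
        log_updates  INTEGER,        -- rows updated
        log_deletes  INTEGER,        -- rows deleted
        log_message  VARCHAR(4000)   -- error message when log_status <> 0
    )
    """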

Ajilius. Keeping track of performance.