Is your metadata secure?

Are your databases at risk from your data warehouse automation product?

We’ve seen DWA where user credentials for production systems are held in clear text!

If you’re using a data warehouse automation product other than Ajilius, use a query tool to open your metadata database. For example, if your product uses SQL Server, use Management Studio to open the metadata. Now look at the metadata tables which store data warehouse and data source credentials. Can you read them? If you can, so can anyone who reads that database.

All Ajilius user passwords – that is, the passwords that you use to access Ajilius – are one-way hashed. That means you can’t read the password in our database, and we can’t reverse or recover your password, even if we have your database. Our hashing is based on the SHA256 algorithm for world-class data protection.

All credentials, to your data warehouses and your data sources, are fully encrypted. The user-id and password for each connection are encrypted using the AES256 algorithm, an incredibly strong algorithm used by many government security organisations.

Database query tools with authorised access to the metadata database only see a long string of characters like “4e3d3cc78416…” and not your real credentials.

Even if someone gets a physical copy of your metadata, and browses the disk file, they can’t read your database credentials.

Ajilius. We won’t risk your database security.

Roadmap update

We’ve just updated the Ajilius Roadmap to reflect the imminent release of Version 1.3.

The major features of this release are:

  • Cloud to Cloud Data Warehouse, with interoperable support for Snowflake Elastic DW, Amazon Redshift and Microsoft Azure SQL DW.
  • Upgraded documentation generation, with interactive lineage navigation on all elements.
  • Enhanced support for JSON files and REST services

You might note that the Roadmap shows Azure SQL DW coming in Version 1.4, due at the end of Q4. The code is working, but Microsoft currently has the platform in preview only. We’ll update our level of support once it is released to production.

We’ve also added two new datasheets:

You’ll find all this content on our Bookshelf page.

Delimited file formats

Delimited files, often referred to as “CSV” or text files, represent tables of data in a simple text format.

Rows are separated by newline characters, or by newline and carriage-return pairs.

Columns within rows are separated by delimiters. The delimiters supported by Ajilius are comma, tab, semi-colon and vertical bar (‘|’). All rows must contain all columns, although empty columns may be represented as consecutive delimiters.

Text fields containing delimiters must be enquoted using single (‘) or double (“) quote characters.

Where a text field contains a character used as a delimiter, that character must be preceded by an escape character, typically backslash (\). If the text field contains the escape character, it must be preceded by another escape character (ie, ‘\\’).

Delimited files must have column names contained in the first row of the file.

The file may be encoded in any format supported by Ajilius (link).

Detection of encoding, delimiters, quotation and escape characters is automatically performed on file access. You don’t need to specify anything other than the file name.

Ajilius. Flexible delimited file formats.

File Transcoding

Because Ajilius is a multi-lingual and cross-platform product, we sometimes get asked about our ability to handle specific file encodings.

XML, JSON and Delimited files may be any of the following:

  • ASCII
  • windows-1252
  • UTF-8 (with or without a BOM)
  • UTF-16 BE or LE (with a BOM)
  • UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
  • Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
  • EUC-KR and ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
  • ISO-8859-2 and windows-1250 (Hungarian)
  • ISO-8859-5 and windows-1251 (Bulgarian)
  • ISO-8859-7 and windows-1253 (Greek)
  • ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)

Ajilius will transcode these file types to the encoding of the target DBMS, assuming that the target encoding is capable of supporting the characters in the input file.

Ajilius. We can read your writing.

New feature: Separate extract / load

We’ve now separated Extract and Load processes in Ajilius.

Previous versions coupled the extract and load processes for a table. To process the LOAD_PRODUCT table, for example, we would connect to the source database, extract the required rows, and load those rows to the warehouse, all within the same process.

We faced a situation recently where the largest extract for a warehouse came from a system that finished its end-of-day processing four hours ahead of the window allocated for the data warehouse load. The extract from this system was the longest-running task in the ELT process.

By separating the extract and load processes, we are now able to schedule the extract from this system (and others) to complete as early as possible, with the load to the warehouse occurring at a later time.

The data warehouse load window is made significantly smaller, giving the operations team more headroom if any errors occur during end-of-day processing in upstream systems.

You can, of course, continue to run extract and load processes at the same time if you prefer.

Ajilius. Making ELT fast and flexible.

New feature: Warehouse Script All

We’ve just released a new feature for Ajilius, a one-click generation of all scripts for a warehouse.

In previous releases you ran each script manually. This was a little tedious if you wanted to run an entire load process from within the Ajilius application.

Now, you can select the Script All option from the Warehouse menu, and a full set of ELT scripts will be generated from the session warehouse metadata.

You will be shown a script screen with the full warehouse DDL in the left pane, and a script to run all ELT processes in the right pane. Use the Create and Run buttons to perform an end-to-end build of your data warehouse.

Ajilius. Eliminating repetition in Data Warehouse Automation.

New competitor: BI Builders

A new week, and a new competitor. Norwegian company BI builders has popped into view with a very pretty looking dimensional warehouse solution for SQL Server users.

It is a desktop solution, generating SSIS, which puts it in the same category as Dimodelo, TimeXtender and (perhaps) WhereScape. LeapfrogBI is a little different, being a web-based solution.

This is becoming a very competitive section of the market!

New feature: load streams

Before today, loads to an Ajilius data warehouse were single streamed. That is, one process extracted and loaded one table. You could load hundreds of thousands of rows per second, you could load different tables in parallel, but if you had one very large table there was nothing you could do to make it faster.

We’ve just added a feature called “Load Streams”, that enables you to parallel load large tables. This is specifically designed for MPP target databases, but may also speed up SMP operations.

The Load Table screen now has an option to select a number of streams that will be loaded in parallel. This is a number between 1, and a maximum set for the target platform. On MPP platforms, this number is the number of nodes, while SMP platforms typically benefit from a number between 4 and 8.

On extract, data will be split into the number of streams you defined. Depending on the platform, the extract may be split in a round-robin fashion or by the column/s you have defined as a distribution key on selected MPP platforms.

Because there is overhead in splitting the data, and in handling concurrent writes in the DBMS, the improvement is not linear. For SMP platforms, large tables on SSD storage can be orders of magnitude faster than single threaded loads, while MPP platforms can approach the fraction represented by the number of streams.

Ajilius. We work harder to make your loads faster.

Analytics tools: Too clever?

I’ve recently been trialling a mix of analytics tools, including PowerBI, Qlik, Microstrategy, Tableau and others. One “feature” that has been annoying me is their attempt to automatically form relationships between tables, usually by matching column names. Bad idea.

I’ve had to spend far too long correcting models that have automatically matched the Product, Customer, Location, dimensions, simply because they had ID as the name of their surrogate key.

When I have a table named Customer, a suitable name for a surrogate key is ID. Not CustomerID. It is the Identity of the Customer, not the CustomerIdentity of the Customer.

When I reference that table from another, I might use role names such as PurchasingCustomerID or ReferringCustomerID. Role, table, column. Only in the most trivial implementations might I refer to it as simply CustomerID.

Further, there is no requirement that columns be uniquely named in a schema, nor that they follow any specific pattern. Especially when dealing with data models from legacy business systems, the practice of automatic matching is counter-productive.

Automatic matching is particularly wrong in dimensional modelling, where the use of surrogate keys is standard practice. One of the properties of a surrogate key is that it is a meaningless identifier. From a business perspective, my view of the CustomerID might be “BIGSTORE001”, and the surrogate value “3498” has no value at all.

I suggest that if you can’t determine a relationship from the existence of a foreign key, then you should probably stop trying. Build a flexible relationship editor and skip the automatic matching, as I’m going to have to spend more time finding and fixing tool errors than building it properly in the first place.

It causes me to wonder how many complex analytical models have errors introduced by this practice?

Automatic relationship management may be a feature which demonstrates well, based on carefully selected sample data, but which fails in the real world. I’d like to see it disappear.

New competitor: Optimal BI

It is great to see new entrants, bringing new approaches, to the data warehouse automation market. The days when one or two players had the market to themselves are drawing to a close …

This time it is Optimal BI, from New Zealand, with a product named Optimal Data Engine (ODE). It is a data vault product, evolving from consulting assignments, but not much more detail available at this time.

Follow their blog for new developments. http://optimalbi.com/blog/2015/07/03/ode-the-start-of-a-journey/

The old dinosaurs better get started on evolution!