Why star schema?

There are currently three main schools of thought around building a data warehouse:

  • 3NF
  • Data Vault
  • Star Schema

We’ll leave it to Wikipedia to describe the main principles of each method.

We designed Ajilius to deliver star schema data warehouses for one reason: deliver value faster.

Over the years there have been many studies of the rates of failure in data warehouse projects. Gartner once reported that more than 50% of DW projects fail. Cutter put the figure at 41%. Bill Inmon wrote that it might go as high as 70-80%. Conning, studying the US Insurance industry, speculated that the rate of failure might be as high as 90%.

This terrible rate of failure is due to just one issue: the business owners of the data warehouse did not get timely value. Projects took years before delivering data. New data took so long to be loaded that the need for it had passed. The effort to acquire new data, and restructure existing data, were was too long and too expensive.

So, which data warehouse method best addresses this problem?

3NF might deliver quickly, but it has a track record of building ‘enterprise’ data warehouses. When we see the word ‘enterprise’ on a software project, we know three things – 1) it is expensive, 2) it will take a long time, and 3) it will probably never achieve its goals. The problem is largely one of scope and architecture. In attempting to build the one, true, repository of everything, the project quickly gets bogged down in analysis and design. It is not uncommon to see 3NF projects staffed by teams of architects and data modellers for months and years, often behind closed doors, with the only users of the warehouse being its overworked ETL coders.

We’ve never worked on a Data Vault project, so what we’ll give are our impressions from reading about it online. Firstly, there’s too much of a smell of snake oil around it. Any time we see experienced DW practitioners being told “Of course you don’t understand, you haven’t done the training”, it looks more like Scientology than Data Warehousing. That prejudices us against it. Secondly, we understand that end users are not supposed to directly access the Data Vault warehouse, but use a separate 3NF or star schema layer instead. We’d like to study this one day, because the Data Vault data structures don’t look like they make it easy to actually build 3NF or star schemas on the front end, we suspect some very complex joins and filters are required in the transformation.

Star schemas have, to our minds, three main advantages over 3NF and Data Vault warehouses:

  1. There is a clearly documented and widely available body of knowledge around designing and building star schema databases. There are books (see our Bookshelf for some favourites), tutorials and courses available from a wide range of sources; and skilled practitioners for most data warehouse platforms.
  2. Star schema data structures are a great fit for modern analytics and visualisation products. That means the design, construction and use of the data warehouse, by both developer and user, fit within the same conceptual framework. Also, star schemas have proven methods for dealing with issues like grain, history and aggregation; and for exposing the solutions in an end-user friendly manner
  3. Star schemas are a great fit for agile data warehouse methodologies. We see this as the key to overcoming the terrible failure rates described above, as agile projects engage the end-users in the process, and deliver usable iterations faster. With methodologies like BEAM, supported by tools like Ajilius, projects can been delivering usable data to end-users in days, not the months and years of other approaches.

We chose to base Ajilius on star schemas because we want to deliver business value in the fastest way possible. We want end users to be analysing and visualising our data in hours and days, not months and years.

Delivering value, faster, is the key to data warehouse success.