Be a DW hero with PostgreSQL

PostgreSQL and Ajilius can make you a DW Hero.

Lots of organisations don’t know about PostgreSQL, or are afraid that it might not perform. Here are eight simple steps to prove the value of PostgreSQL for data warehousing.

(1) Develop on SQL Server

Your IT management has probably asked you to deliver your data warehouse on SQL Server. There is nothing technically wrong with that decision, but it is going to cost a lot of money if you take it into production. Never mind, let’s humour them, go ahead and implement your development environment. We emphasise development, because using MSDN or SQL Server Developer Edition licences will have a negligible cost.

(2) Develop using Ajilius

Design and build your data warehouse using Ajilius. Use the power of data warehouse automation to generate a fully scripted, high performance data warehouse. Get all your user and technical documentation, and start testing user queries and reports. You’ll save hundreds of hours of development time using Ajilius, and eliminate the risks of bad ETL.

(3) Set up a PostgreSQL server

Download PostgreSQL and set up a separate development server, running on your choice of operating system. Create an empty database, and record the server name, database name, user-id and password.

(4) Clone your Metadata Repository

Go to the Warehouse List screen, and select the Clone Warehouse option. An independent copy of the metadata repository will be created under your chosen name.

(5) Change your Warehouse target

Go to your Warehouse List screen, and select the Change Warehouse option. Now select PostgreSQL as your target type, and enter the server details that you recorded in step (3).

(6) Generate Scripts

Go to your Warehouse List screen, and select the Generate Scripts option. Check each of the Create, Update, Schedule and Migrate script options, and enter the directory where scripts are to be written.

(7) Deploy

You now have a 100% compatible version of your original data warehouse, running on PostgreSQL. Because you used Ajilius to build your data warehouse, you are guaranteed portability across data warehouse platforms. All your extracts and loads have been fully replicated, your data has been migrated, and you can repeat your test queries and reports.

(8) Profit!

Here is where you become a DW Hero.

Demonstrate to your project stakeholders their data warehouse running on PostgreSQL. Discuss the cost savings that will come from running on this platform. Show that you’ve already done the migration, in just minutes of work and a few hours of processing time. Imagine proving that you can save hundreds of thousands of dollars in production licensing, for no cost to your organisation.

That’s HEROIC!

We’re the only data warehouse automation company that fully supports PostgreSQL data warehouses as a first-class citizen, and we’ve done so from the very first lines of code we wrote.

Read more about Ajilius, and the power of the PostgreSQL Data Warehouse.

Virtual loads and dimensions

Some data warehouse automation products do a good job on green-fields projects, but make it very difficult to integrate into an existing data warehouse architecture. Some don’t do it at all, while others require that you reverse engineer and rebuild the ETL / ELT processing before you can integrate new tables. We’re different.

Ajilius makes it easy to integrate with an existing data warehouse through Virtual Tables.

The purpose of a Virtual Table is to provide a mechanism to create Ajilius metadata over an existing table, without having to re-load or re-process its data. There are two types of Virtual Tables used in Ajilius – Virtual Loads and Virtual Dimensions.

Virtual Loads exist to define data that has already been loaded into the warehouse by other processes. This scenario is used where the table is to be processed in conjunction with data completely controlled by Ajilius, and may also be used in drip-feeding scenarios.

Virtual Dimensions enable you to integrate complete dimensions that have been created and maintained by external processes. If your existing warehouse has already built the perfect SCD2 dimension, there is no need to re-design and re-write that table just to fit it into our metadata structures.

Once defined, Virtual Tables work exactly the same as any other Ajilius load or dimension table.

Virtual Tables are a powerful concept when you are adding the power of Ajilius to an existing data warehouse environment.

PostgreSQL data sources

PostgreSQL can be a data source for any data warehouse built using Ajilius.

If you are using PostgreSQL as your DBMS for ERP (ie, xTuple or Odoo), HR (Advantec, perhaps) or CRM (such as OroCRM, Bitnami or Tryton), then Ajilius can easily bring data from these systems into your data warehouse.

We offer two sourcing mechanisms for PostgreSQL data, optimised depending on the target platform.

Low latency applications, such as near real-time retail transaction warehouses, are best supported by direct connections from the target database to the source. We use Foreign Data Wrappers from a PostgreSQL warehouse, or the PGNP OLEDB connection for Linked Servers from Microsoft SQL Server.

Where latency is not an issue, it can sometimes be faster to use bulk copy processes to extract and load via disk files. In this case we use the COPY bulk export utility.

Extracts from PostgreSQL are also required for cloud-hosted warehouses, such as Redshift, drawing data from on-premise systems. Ajilius is perfectly able to handle this requirement due to its scripted nature.

Building data warehouses from PostgreSQL data sources just got a lot easier.

PostgreSQL and Ajilius

PostgreSQL is a great option for dimensional data warehouses. Using Ajilius to build and deploy your PostgreSQL data warehouse helps you to get the most out of this exciting platform.

The biggest advantage of PostgreSQL is that it is free. At a time when competing databases cost tens of thousands of dollars per core, and typically require 8 or more cores in production, PostgreSQL could cut more than $100,000 from your software budget.

Over the next week, we will publish a series of posts that describe how you can use Ajilius and PostgreSQL to maximum effect, wrapping up with some advice that could make you a DW Hero in your organisation.

Stay tuned!

Column compromises

Sometimes you need to trade off one feature against another. A good example is that of column names, where we trade off size against portability.

Column names, more generally described as identifiers, have their permitted length and format vary by DBMS. Lengths range from Oracle’s 30 characters, to 63 characters in PostgreSQL, and 128 characters in SQL Server, Impala and Vertica.

An early design decision for Ajilius was that Oracle was not an important target platform for us. In fact, across the data warehouse automation market, we are not aware of any vendor that prioritises Oracle, nor has a significant volume of customers on that platform.

PostgreSQL, however, is a very important platform for Ajilius, and the 63 character limit in column names became our maximum. Should a source system column name exceed 63 characters, it will be truncated to this limit.

When truncating column names, or when merging tables during the staging process, it is possible that duplicate column names may exist in the source metadata. For example, if you were building a combined table from ‘order’ and ‘order_line’ tables, it is probably that an attribute named ‘order_id’ would exist in both of them. Our strategy for dealing with duplicate names is to add a unique suffix of the format ‘_nn’, where ‘nn’ is an incrementing number, to make the column name unique. We would probably then edit the metadata for the table to delete the column flagged as a duplicate.

When considering identifier names, we also had to take into account the issue of reserved words. When we import metadata from a data source it may have column names that are reserved words in the target DBMS. There are two strategies for dealing with reserved words in Ajilius: prefixing and delimiting.

Ajilius maintains a combined dictionary of reserved words for all target databases. When importing metadata from a source, we prefix the string ‘reserved_’ to any metadata column name which matches a reserved word. This makes it easy to spot conflicting column names, and choose better alternatives when editing the metadata. There are over 640 reserved words across our target databases. The longest reserved word is ‘current_transform_group_for_type’, which is 32 characters, so prefixed names still fit comfortably within our limit.

Most data warehouse platforms support delimited identifiers. These typically use quotes, or character pairs like [], to mark the start and end of the identifier name. While this method works, we’re not great fans of its cluttering of the DDL and DML, nor forcing database users to write queries that include delimiters. That said, we do offer a user option to use delimited identifiers in the warehouse.

PostgreSQL had more influence on identifier names through its quirk of converting non-delimited identifiers to lower case. If you keep the default option to use non-delimited column names, we will force all identifiers to lower case. Source column names will have embedded spaces (and special characters) converted to an underscore. Camel case names such as ‘OrderShipDate’ will be automatically translated to ‘order_ship_date’.

By understanding the constraints of our target data warehouse platforms, we are able to keep our promise of data warehouse portability.

External metadata

The term ‘metadata’ describes the information that is used by Ajilius to create a data warehouse and its ETL / ELT code. Metadata is the descriptive information about the tables, columns, data sources and other features of our warehouse. When Ajilius creates a table, or a warehouse, or a batch update procedure, it reads the metadata and generates code suitable for the target platform.

Ajilius stores its metadata separate from the data warehouse. Because Ajilius supports multiple data warehouses, each warehouse has its own repository. We store that metadata in a database for each warehouse. A common repository stores information that Ajilius uses for all warehouses.

The main benefit of external metadata is that it makes portability simple. The data warehouse market is increasingly fluid, and the ability to take advantage of new platform offerings could be a big advantage. Hadoop, for example, is offering increasing capability at very low prices, and the ability to quickly switch between an RDBMS and Hadoop using Ajilius could save you hundreds of thousands of dollars.

If you’re interested in migration as a feature, make sure that you ask your DW Automation provider to demonstrate this capability against an existing warehouse. No-one comes close to our “three-click migration”.

Another benefit is that we don’t need to migrate metadata into production. The concept of separation between development and production isn’t seen as important by some of our competitors, but we’ve seen scenarios where production data warehouses are hosted on managed platforms to which the DW developers do not have deployment rights. Ajilius makes it easy to generate scripts which can be tested and scheduled by a production operations team, independent of the development platform.

Being a metadata-driven product isn’t enough in today’s data warehouse market. Unless that metadata helps you to migrate across platforms and integrate with your operations framework, you’ve bought just another ETL tool, not a modern data warehouse automation platform.

Publishing our roadmap

Some companies treat their product development roadmap like it was a state secret. Others don’t even have a roadmap, sometimes not even a vision beyond their first release.

We’re different. We believe you should be fully aware of our production direction and development objectives, so that you can make informed decisions about whether Ajilius is going to satisfy your future needs.

There are two things to keep in mind when reading our roadmap:

  • The roadmap represents our best thinking at a point in time. It is subject to change. The further out it gets in the release cycle, the more likely it is to change.
  • The roadmap is not a guarantee of delivery. If you are interested in a specific feature, please keep in touch with us to make sure it is on track.

We’ll update the roadmap from time to time, but we’ll keep it published on our Bookshelf page.

Respecting your privacy

We don’t use Yesware, MailChimp, or any other “tracking” software.

That means we don’t embed any hidden bugs in our email that spy on when you open our email, where, and what you did as a result. No hidden images, no virtual links, no identifiers.

We don’t put hidden content in web pages that will track your activity.

We don’t use snoop-ware like Marketo, or any other automated marketing product. Any communication you have with us will be as a result of a personal contact, and written by a real person, not an automated pretender.

We believe that if our product is of interest, you will make a further enquiry. If not, we’ll never pester you about that choice.

We treat you with respect.

Portable, scalable technology

The technology stack that we’ve chosen for Ajilius had to meet the following requirements:

  1. Same code base on Windows, Linux and OSX
  2. Cloud and on-premise deployment
  3. Easy to deploy and upgrade
  4. Unicode for international languages
  5. Wide range of supporting libraries
  6. Modern, browser-based user interfaces
  7. Agile development

C++ Windows Forms applications were definitely off the table.

Our first decision was a programming language. Based on what we knew, and the advice of people we trusted, it came down to a choice between C# and Python. We prefer the C# language to Python. Mono, and Microsoft’s opening of .NET, would have let C# meet most of our criteria, except for #3. We still don’t know how heavy the load of upgrades and patches to Linux and OS X platforms will be with .Net, but if it is like Windows we didn’t want to find out.

Python absolutely won on criteria like same code base (#1), cloud and on-premise (#2), easy to deploy and upgrade (#3), and wide range of supporting libraries (#5). Python was our choice, and we’re using Python 3.4, the current version.

The next decision was the user interface approach. We looked at a wide range of UI approaches, such as Dojo, jQuery and others, but they all suffered from errors during testing or poor performance at different times. Our decision was to avoid a Javascript front-end, and generate HTML. We chose to base our user interface on the Twitter Bootstrap model, as it gave us good looking, responsive user interfaces that were based on standard HTML5.

The final decision was the DBMS to use for our metadata repository. Keeping deployment and upgrading (#3) in mind, we didn’t want a heavy-weight DBMS to be installed and configured. Our concurrent user numbers are low, our transaction rates are low, and we made the decision to hold our metadata in SQLite3.

Since making these decisions we’ve encountered two common objections – Python is slow, and SQLite3 can’t support multiple users. Both statements are wrong.

If we were inverting very large matrices, or time-stretching minutes of audio in memory using FFTs, Python might not be our first choice of language. But in Ajilius, we do very little CPU processing, we’re mostly pushing data to and from the network and the database. We’re IO bound, not CPU bound. Python is more than adequate for this job.

We agree that SQLite3 is not designed for high volume concurrency, but it is definitely capable of supporting concurrent users with just one constraint – only one user may write to the database at a time. That doesn’t mean only one user is connected, just that the transactions of other users will wait until the writer has finished, and potentially time out in long wait scenarios. We’re keeping transactions short, we use optimistic locking strategies, and we just don’t have long waits. In our experience, typical implementations of data warehouse automation software have 2-10 users. We’ve tested Ajilius with up to 20 simulated users of the one repository, and it has worked just fine, and we’ve got higher volume tests scheduled for later in the development cycle.

So there it is. A modern software stack that lets us deliver data warehouse automation any where, with minimal dependencies and operational overheads for our customers.

The benefits of browser

Someone I used to respect once told me that you couldn’t build a data warehouse automation tool in a web browser. That person was living in a 20 year old bubble.

My epiphany on the capabilities of browser-based software came around 10 years ago, when I first saw an early version of Microsoft’s Outlook Web Access. It was a browser-based application that looked and worked almost exactly the same as Outlook. And now, look at the email, contact and calendar capabilities of Office365 – truly amazing user interfaces, delivered to any mainstream web browser.

Some people think that browsers can’t handle interaction with complex graphics. Have you seen Canva lately? Or Gliffy? Or the online versions of PowerPoint and Keynote? They’re doing graphics online that are as good (or better) as I was doing with desktop-based Visio, SmartDraw and Publisher just a few years ago.

Browsers have more benefits for a product like Ajilius:

Any device
Today’s web browsers run on desktops, notebooks, tablets and phones. On Windows, Linux and OS X, Android and iOS. The browser has become a universal user interface for all types of software. Responsive user interface design means that Ajilius adapts its layout to the constraints of the device, making it usable from large desktop display to pocket sized phone.

No client installs
Packaging, installing and maintaining desktop software is a major cost for most IT departments. Browser-based software, especially when designed to work with all major browsers, eliminates that cost and effort. Assuming the presence of a modern browser, deploying Ajilius in a corporate environment requires nothing installed or changed on end user devices.

Modern user interfaces
Using a browser makes it easy to support modern developments in user interface technology. We’re fully touch-driven, for example, which means you can use Ajilius just as effectively on an iPad as you can on a Windows desktop. We’re accessible, meaning that people with low vision or motor skill issues can still use our software. And we’re multi-lingual, because the browser lets us quickly deal with the user interface issues that come from supporting the world’s written languages.

The days when software was automatically desktop are over. The desktop as a device may not be dying, but software designed for the desktop is last century’s product.

Ajilius, being browser based, is a modern data warehouse automation product.