Enhancement: Log to Data Warehouse

This week’s Release 2.2.12 includes a feature to write ELT log entries to the data warehouse.

Previous releases logged ELT performance to a log file. This file could be named through the Warehouse Edit screen, and defaulted to ajilius.log.

Now, in addition to the log file, a table AJILIUS_LOG is created in the AJILIUS schema of the data warehouse. On job completion, as long as a data warehouse connection is available, the results of the job will be inserted into this table.

The logged columns are:

  • log_stamp / Timestamp
  • log_script / Script name
  • log_elapsed / Elapsed time in seconds
  • log_status / Job status, 0=success
  • log_files / Number of files processed during job
  • log_inserts / Number of rows inserted during job
  • log_updates / Number of rows updated during job
  • log_deletes / Number of rows deleted during job
  • log_message / Descriptive error message if log_status<>0
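
Because it is a regular table, the log can be queried like any other warehouse object. Here is a minimal sketch, using psycopg2 against a PostgreSQL-family warehouse, that lists the failed jobs from the last day; the connection details are placeholders, and the table and column names follow the list above.

    # Minimal sketch: list recent failed ELT runs from AJILIUS_LOG.
    # Assumes a PostgreSQL-family warehouse and the psycopg2 driver;
    # host, database and credentials are placeholders.
    import psycopg2

    conn = psycopg2.connect(host="dwhost", dbname="dw", user="dw_user", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT log_stamp, log_script, log_elapsed, log_message
            FROM ajilius.ajilius_log
            WHERE log_status <> 0                       -- failed jobs only
              AND log_stamp >= CURRENT_DATE - INTERVAL '1 day'
            ORDER BY log_stamp
        """)
        for stamp, script, elapsed, message in cur.fetchall():
            print(f"{stamp}  {script}  {elapsed}s  {message}")
    conn.close()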

Ajilius. Keeping track of performance.

 

New feature: Custom Query Loads

The normal workflow to define an extract/load job in Ajilius is to browse the source system for the required table or view, then load the metadata for that table, and finally execute the ELT job created from the metadata.

Complex load requirements can occasionally be represented as views in the source system, but creating a custom view is often not possible. It might be a hosted application where you don’t have the ability to create new objects, or you might simply lack the authorisation to create a view.

In this circumstance, you might like to use our newest feature, Custom Query Loads.

http://ajilius.com/custom-query-loads/

Ajilius. The features keep coming.

 

Ajilius loves Snowflake

Along with Amazon Redshift and Azure SQL Data Warehouse, Ajilius does cloud data warehouse automation for Snowflake Elastic Data Warehouse.

We don’t just support Snowflake; in a short space of time it has become our favourite cloud data warehouse platform. We had a great time working with the Snowflake team during the development of their Python adapter.

Here are some of the things that we love about Snowflake.

Price

Our Snowflake development and demonstration platform costs around AUD 300 per month, even on a lousy exchange rate. We pay a monthly storage charge, then pay only for the processing we need, when we need it.

For the smallest development machines Redshift may be slightly cheaper, but once you scale to multi-terabyte production workloads the advantage shifts to Snowflake unless you are prepared to commit to three-year reserved instances.

Given the rate of change in the cloud data warehouse market, we believe that long-term commitments are not in the interests of most customers, which leaves Snowflake with the price/performance advantage.

Microsoft Azure SQL Data Warehouse is still in preview, and we can’t comment on comparative pricing at this time.

Scale

Scaling our Snowflake platform takes just seconds. In comparison, we’ve seen cluster resizing on Amazon take many minutes, and we’ve seen it take even longer at customer sites.

Snowflake instances can scale from 1 to 128 8-core nodes. That is a huge amount of compute power, making Snowflake suitable for workloads of any size. At the lower end, we see Snowflake as an ideal platform for mid-market customers, as its entry point and pricing model are so flexible.

We do a lot of Ajilius work without incurring any processing costs. This is because DDL operations are performed on the database, not the warehouse (see Features), and we don’t need to start a warehouse until we start actually loading, selecting or modifying data. The majority of our development and test work is done on a single node, with occasional scaling for performance tests.
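
As an illustration of that pattern, here is a minimal sketch using the Snowflake Python connector: the DDL runs without any warehouse, and compute is only needed once we touch data. The account, credentials and object names are placeholders, not Ajilius output.

    # Sketch only: DDL needs no running warehouse, so no compute is billed;
    # a warehouse is only used once we start working with data.
    # Account, credentials and object names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(account="xy12345", user="dev", password="secret",
                                        database="AJILIUS_DEV", schema="STAGE")
    cur = conn.cursor()

    # Metadata-only operation: handled by the database, not a warehouse.
    cur.execute("CREATE TABLE IF NOT EXISTS stage_customer (id INTEGER, name VARCHAR)")

    # Now we need compute: set the session warehouse; it resumes automatically
    # when the INSERT runs (assuming AUTO_RESUME is enabled).
    cur.execute("USE WAREHOUSE DEV_WH")
    cur.execute("INSERT INTO stage_customer VALUES (1, 'Acme')")

    cur.close()
    conn.close()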

Support

I’ve never had better support from a data warehouse company, especially given that we were unknown to the vendor and not spending huge amounts of money. From sales, to pre-sales, to support, and even right into engineering, we’ve had amazing engagement at every level of the company.

Snowflake people answer emails, pick up the phone, and respond to support requests with speed. We’ve never waited more than a couple of hours for a response to an issue, and that response has always been highly relevant, never of the “have you unplugged and plugged in the keyboard” variety.

The Snowflake team is knowledgeable, enthusiastic, and committed to success.

Features

One intriguing feature of Snowflake is its avoidance of distribution keys, partitions, etc., in the database. This avoids one of the big design challenges present in both Redshift and Azure, where the wrong distribution method can really damage your performance. One day I’ll have a beer with Snowflake’s designers and figure out how this works, but for now, all I know is that it works well.

Better described as “quirky” is Snowflake’s terminology of database and warehouse. A “database” is a collection of schemas and data. A “warehouse” is the compute configuration that works on databases; to me, a name like “server” would have been clearer. A powerful feature is the ability for multiple “warehouses” to act on one “database”, each with different configuration settings. For example, an ETL warehouse might run at very high scale to compress the ELT window, while a Browse warehouse might run for a long time at low scale, refreshing data used by BI tools like Tableau and Qlik Sense.
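
In Snowflake SQL, that ETL/Browse split might be set up like the following sketch, driven from Python with the Snowflake connector. The warehouse names, sizes and credentials are illustrative only.

    # Sketch: two differently sized warehouses serving the same database.
    # Names, sizes and credentials are illustrative; the DDL is standard Snowflake syntax.
    import snowflake.connector

    conn = snowflake.connector.connect(account="xy12345", user="admin", password="secret")
    cur = conn.cursor()

    # Big, short-lived compute to compress the nightly ELT window.
    cur.execute("CREATE WAREHOUSE IF NOT EXISTS ETL_WH WITH "
                "WAREHOUSE_SIZE = 'XLARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE")

    # Small, long-running compute for BI tools such as Tableau or Qlik Sense.
    cur.execute("CREATE WAREHOUSE IF NOT EXISTS BROWSE_WH WITH "
                "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 600 AUTO_RESUME = TRUE")

    cur.close()
    conn.close()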

Another feature we love is the Snowflake administrative console, where we can not only administer databases, warehouses and users, but also review performance history and execute ad-hoc queries. The console’s user interface is a work of art; it is the first cloud-based data warehouse where I’ve not felt the need to find another administration tool.

What’s Missing?

Not much.

All the basic data types are there, all the basic SQL statements are there, you get JDBC, ODBC and Python interfaces, and the documentation is excellent. There could be a few more examples in the documentation for some of the product’s more obscure features, but it is updated frequently.

Regarding data types, I’ve always been puzzled why data warehouse vendors avoid geospatial data. After all, map-based data is a major feature of the current generation of visualisation tools, yet it is missing from most cloud data warehouse platforms. I’d like to think Snowflake will get around to this feature soon.

If I were being picky, I’d also call out the absence of a TIME data type. We work around it by using date/time functions to extract the time portion of timestamp fields into text fields, but a native TIME type would be helpful.
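
The workaround looks something like this sketch, again through the Snowflake Python connector; the table and column names are made up, and TO_CHAR is standard Snowflake SQL.

    # Sketch of the TIME workaround: extract the time-of-day portion of a
    # timestamp into a text column. Connection details, table and column
    # names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(account="xy12345", user="dev", password="secret",
                                        database="AJILIUS_DEV", warehouse="BROWSE_WH")
    cur = conn.cursor()
    cur.execute("""
        SELECT order_id,
               order_ts,
               TO_CHAR(order_ts, 'HH24:MI:SS') AS order_time_text
        FROM   sales.fact_order
    """)
    for order_id, order_ts, order_time_text in cur.fetchall():
        print(order_id, order_ts, order_time_text)
    conn.close()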

The only real pain we experience is that Snowflake is currently restricted to Amazon’s US data centres. That has no impact on warehouse performance, but our connection times and data transfer rates are a little slower than I’d like. We can co-locate Ajilius instances in the same AWS data centre for fast Snowflake connections, but if your data is on-premise in Australia, you’re going to incur a penalty when moving terabytes into Snowflake. I’m assured that data centres in other parts of the world are on the way.

Recommendation

Try it. You’ll like it.

Ajilius makes it easy to build Snowflake data warehouses from your on-premise and cloud data. Let us know if you’d like a deeper discussion of Snowflake and Ajilius.

One year of Ajilius

Ajilius is now one year old.

Just over 12 months ago, we announced a new data warehouse automation platform, designed for a modern data warehouse workload.

We delivered on all our objectives: complete Kimball support, on-premise and cloud deployment, three-click migration between databases, and full cross-platform portability.

We published a four-release roadmap for V1, and we’ve met its quarterly delivery schedules.

We’re not fully profitable, but we now have enough customers that we’re sustainable and growing.

This year we’re embarking on an ambitious roadmap for V2, with a focus on Data Quality, Data Profiling and Data Discovery. Again, we have defined a quarterly release schedule, and V2.1 will be delivered in March (we’re in beta already).

We’re also stepping up our marketing this year, starting with our first conference appearance at pgDayAsia in Singapore, March 17-19. We’ll be speaking on Data Warehousing and PostgreSQL, as well as using the occasion to showcase Ajilius 2.1.

Here’s to a great 2016 in data warehousing!

IBM DB2: Back to DSN

The devs told me that getting DB2 and Informix drivers to work took a bit of fiddling. That was the understatement of 2015. The driver setup experience is so bad that we can’t include it in the Ajilius installer.

On every platform we needed to manually copy files around, adjust environment variables, sometimes patch libraries, and often it just didn’t work. We tried the Python ibm_db adapter, we tried IBM’s ODBC / CLI adapter, and experienced nothing but pain.

As a result, we’re dumping the work we’ve done on native adapters, and reverting to the use of ODBC DSNs for data sourcing against DB2 and Informix.

To use DB2 as an Ajilius data source, have your IT department install the appropriate ODBC driver on your Ajilius server, then create a DSN pointing to your data source. Select “ODBC DSN” when creating your data source in Ajilius, then enter the DSN name in the Database field.
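
If you want to confirm the DSN works before pointing Ajilius at it, a quick test from Python with pyodbc looks like this; the DSN name and credentials are placeholders.

    # Sanity check that the DSN reaches DB2, using the pyodbc module.
    # DSN name and credentials are placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=DB2PROD;UID=etl_user;PWD=secret")
    cur = conn.cursor()
    cur.execute("SELECT CURRENT TIMESTAMP FROM SYSIBM.SYSDUMMY1")   # DB2's one-row dummy table
    print(cur.fetchone()[0])
    conn.close()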

It is sad that IBM’s quality has sunk so far. I started my IT career on IBM System/34 computers well over 30 years ago, and at various times worked on System/38 and AS/400. I used one of the first DB2 mainframe installations in Australia, followed by the first OS2/EE implementation of DB2, before using DB2 on Windows as the core of a successful ISV product. Later, I ran a DBA team that included DB2 on mainframe, Linux and Windows in its portfolio.

You couldn’t call me a DB2 hater with that background, but the current connectivity options are rubbish.

DB2 is not a bad DBMS, but what good is a DBMS without great connectivity? I’d struggle to recommend it to anyone based on my most recent exposure, and I’d definitely not recommend it to any Python developer.

IBM Bluemix

During the past week we have been testing the new data source adapters for DB2 and Informix. This time we’re using IBM’s Bluemix cloud to host our test databases.

The initial experience with Bluemix is awful: a bizarre labyrinth of errors about missing spaces and empty containers, all solved when you finally realise that the service you want to provision is only available in some regions, and your default is not one of them.

Depending on the region you have chosen, there are many supported databases including variants of Informix, DB2 and Netezza, as well as a variety of open source, big data and NoSQL products.

Once you’re up and running, the actual database experience is quite good. I like the data load feature, which quickly helps you to move test data into the database. The help around connectivity – CLI, ODBC and JDBC – is also good, with all the connection information clearly presented for each of the options.

The free database allowance for SQL DB (the old DB2 LUW) is quite generous, enough for us to complete all our testing. If you want more than the free tier supports, though, the next step jumps from zero to $500 per month. That is expensive compared to Azure SQL and AWS RDS databases.

dashDB (Netezza) and Time Series (Informix) start at around $55 per month, which is reasonable value compared to other vendors.

Our testing is focussed on connectivity and extracts, not in-database performance, so we can’t comment on how well the platform scales.

Well, time to get the testing finished, as this is the last task between us and Version 1.4.

Adwords puzzle

For 10 months we’ve been testing combinations of Google Adwords keywords to drive traffic to Ajilius. Over that time we’ve got a pretty good idea of which keywords work.

As we planned a new campaign for the New Year, we decided to turn off Google advertising for a couple of weeks, to reset our no-advertising baseline.

The expected result was that traffic would go down, and it has, though not by as much as we expected. That suggests organic search results, rather than advertising, may be driving more of our traffic than we thought.

What was unexpected was the change to Google search results.

When running an Adwords campaign, if we searched for one of our terms, such as “data warehouse automation”, we were usually on result page 2 or 3, with many competitor ads above us.

Now that we’re not running a campaign we’re in roughly the same search position, but there are far fewer competitor advertisements showing above us.

Could it be that Google artificially inserts advertisements above yours, pressuring you to increase your budget to move you up the page?

That theory smells of tin-foil hats, but it is something for us to watch over the next few weeks.

Price changes

Working with our first customers and partners has exposed some issues with our initial pricing strategy.

  1. Member Edition (our free version) customers required more support hours than customers of our paid editions.
  2. The difference between Member and Subscriber Editions was ambiguous when considering licensed, supported versions of Open Source databases.
  3. Subscriber Edition had insufficient margin to make it attractive to resellers.
  4. Both Subscriber and Sponsor Editions were perceived as “too cheap” by their target customers.

Accordingly, we’ve revised the pricing for Ajilius to take these issues into account.

From May 1, we will remove the Member Edition from our price list. Existing Member Edition customers will retain their right to support, and access to all new versions, free of charge.

A new Evaluation Edition will be introduced. This will be a full-featured version of Ajilius, but with a 30-day time limit.

The annual licence for Subscriber Edition will increase to USD 5,000, and for Sponsor Edition to USD 50,000.

Remember, Ajilius is site licensed. That means you have the right to use Ajilius on any number of servers, by any number of developers, creating any number of data warehouses, on any number of data warehouse platforms.

Even with our price increase, you’ll still save tens of thousands of dollars over competing platforms, and still get to 100% ROI in a matter of days.

Ajilius. Committed to business value.

Be a DW hero with PostgreSQL

PostgreSQL and Ajilius can make you a DW Hero.

Lots of organisations don’t know about PostgreSQL, or are afraid that it might not perform. Here are eight simple steps to prove the value of PostgreSQL for data warehousing.

(1) Develop on SQL Server

Your IT management has probably asked you to deliver your data warehouse on SQL Server. There is nothing technically wrong with that decision, but it is going to cost a lot of money if you take it into production. Never mind; humour them, go ahead and implement your development environment. We emphasise development because MSDN or SQL Server Developer Edition licences cost next to nothing.

(2) Develop using Ajilius

Design and build your data warehouse using Ajilius. Use the power of data warehouse automation to generate a fully scripted, high performance data warehouse. Get all your user and technical documentation, and start testing user queries and reports. You’ll save hundreds of hours of development time using Ajilius, and eliminate the risks of bad ETL.

(3) Set up a PostgreSQL server

Download PostgreSQL and set up a separate development server, running on your choice of operating system. Create an empty database, and record the server name, database name, user-id and password.
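
If you want to confirm those details before moving on, a quick connectivity check from Python with psycopg2 might look like this; the host, database and credentials shown are placeholders for the values you recorded.

    # Optional check that the connection details recorded in step (3) work.
    # Host, database, user and password are placeholders.
    import psycopg2

    conn = psycopg2.connect(host="pgdev01", dbname="ajilius_dw", user="dw_admin", password="secret")
    cur = conn.cursor()
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])   # e.g. "PostgreSQL 9.5 ..."
    conn.close()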

(4) Clone your Metadata Repository

Go to the Warehouse List screen, and select the Clone Warehouse option. An independent copy of the metadata repository will be created under your chosen name.

(5) Change your Warehouse target

Go to your Warehouse List screen, and select the Change Warehouse option. Now select PostgreSQL as your target type, and enter the server details that you recorded in step (3).

(6) Generate Scripts

Go to your Warehouse List screen, and select the Generate Scripts option. Check each of the Create, Update, Schedule and Migrate script options, and enter the directory where scripts are to be written.

(7) Deploy

You now have a 100% compatible version of your original data warehouse, running on PostgreSQL. Because you used Ajilius to build your data warehouse, you are guaranteed portability across data warehouse platforms. All your extracts and loads have been fully replicated, your data has been migrated, and you can repeat your test queries and reports.

(8) Profit!

Here is where you become a DW Hero.

Demonstrate to your project stakeholders their data warehouse running on PostgreSQL. Discuss the cost savings that will come from running on this platform. Show that you’ve already done the migration, with just minutes of work and a few hours of processing time. Imagine proving that you can save hundreds of thousands of dollars in production licensing, at no cost to your organisation.

That’s HEROIC!

We’re the only data warehouse automation company that fully supports PostgreSQL as a first-class data warehouse platform, and we’ve done so from the very first lines of code we wrote.

Read more about Ajilius, and the power of the PostgreSQL Data Warehouse.

PostgreSQL data sources

PostgreSQL can be a data source for any data warehouse built using Ajilius.

If you are using PostgreSQL as your DBMS for ERP (e.g. xTuple or Odoo), HR (Advantec, perhaps) or CRM (such as OroCRM, Bitnami or Tryton), then Ajilius can easily bring data from these systems into your data warehouse.

We offer two sourcing mechanisms for PostgreSQL data, chosen according to the target platform and latency requirements.

Low latency applications, such as near real-time retail transaction warehouses, are best supported by direct connections from the target database to the source. We use Foreign Data Wrappers from a PostgreSQL warehouse, or the PGNP OLEDB connection for Linked Servers from Microsoft SQL Server.
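
On a PostgreSQL warehouse, the Foreign Data Wrapper plumbing for that direct path looks roughly like this hand-written sketch (not Ajilius output), driven from Python with psycopg2; the server, schema and table names are illustrative.

    # Sketch of the direct, low-latency path: expose a source table in the
    # warehouse through postgres_fdw, with no intermediate files.
    # Connection details and object names are illustrative.
    import psycopg2

    conn = psycopg2.connect(host="dwhost", dbname="ajilius_dw", user="dw_admin", password="secret")
    conn.autocommit = True
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
    cur.execute("CREATE SERVER erp_src FOREIGN DATA WRAPPER postgres_fdw "
                "OPTIONS (host 'erp01', dbname 'erp', port '5432')")
    cur.execute("CREATE USER MAPPING FOR CURRENT_USER SERVER erp_src "
                "OPTIONS (user 'erp_reader', password 'secret')")
    cur.execute("CREATE SCHEMA IF NOT EXISTS stage")
    # The source table now appears as stage.customer in the warehouse.
    cur.execute("IMPORT FOREIGN SCHEMA public LIMIT TO (customer) FROM SERVER erp_src INTO stage")
    conn.close()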

Where latency is not an issue, it can sometimes be faster to use bulk copy processes to extract and load via disk files. In this case we use the COPY bulk export utility.
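
The file-based path can be sketched with psycopg2’s copy_expert, which wraps COPY on both ends; again, the connection details, table names and file path are placeholders rather than Ajilius-generated code.

    # Sketch of the bulk, file-based path: COPY out of the source system,
    # then COPY into the warehouse. All names and paths are placeholders.
    import psycopg2

    src = psycopg2.connect(host="erp01", dbname="erp", user="erp_reader", password="secret")
    dw = psycopg2.connect(host="dwhost", dbname="ajilius_dw", user="dw_admin", password="secret")

    # Bulk export the source table to a local CSV file.
    with open("/tmp/customer.csv", "w") as f:
        src.cursor().copy_expert("COPY (SELECT * FROM customer) TO STDOUT WITH CSV", f)

    # Bulk load the file into the warehouse staging table.
    with open("/tmp/customer.csv", "r") as f:
        dw.cursor().copy_expert("COPY stage.customer FROM STDIN WITH CSV", f)
    dw.commit()

    src.close()
    dw.close()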

Extracts from PostgreSQL are also required for cloud-hosted warehouses, such as Redshift, drawing data from on-premise systems. Ajilius is perfectly able to handle this requirement due to its scripted nature.

Building data warehouses from PostgreSQL data sources just got a lot easier.