Before today, loads to an Ajilius data warehouse were single streamed. That is, one process extracted and loaded one table. You could load hundreds of thousands of rows per second, you could load different tables in parallel, but if you had one very large table there was nothing you could do to make it faster.

We’ve just added a feature called “Load Streams”, that enables you to parallel load large tables. This is specifically designed for MPP target databases, but may also speed up SMP operations.

The Load Table screen now has an option to select a number of streams that will be loaded in parallel. This is a number between 1, and a maximum set for the target platform. On MPP platforms, this number is the number of nodes, while SMP platforms typically benefit from a number between 4 and 8.

On extract, data will be split into the number of streams you defined. Depending on the platform, the extract may be split in a round-robin fashion or by the column/s you have defined as a distribution key on selected MPP platforms.

Because there is overhead in splitting the data, and in handling concurrent writes in the DBMS, the improvement is not linear. For SMP platforms, large tables on SSD storage can be orders of magnitude faster than single threaded loads, while MPP platforms can approach the fraction represented by the number of streams.

Ajilius. We work harder to make your loads faster.

Leave a Reply