We recently had a situation where our load speeds were reported as much slower than a competitor's. This surprised me, because I knew that our loader could saturate the network from the source server, and I wondered how our competitor could be faster.
Luckily, the evaluator liked Ajilius, and did a little digging on our behalf. It turned out that the culprit was not our performance, but the competitor’s measurement technique.
When we load data into the warehouse from a source database, there are basically four steps that we need to perform:

1. Query
2. Extract
3. Load
4. Commit
The Query step is where we execute a query on the remote data source, such as “select c1,c2,c3 from t1 where c1 > 9467843”. The Extract step is where we transfer the results of that query to the loader. The Load step moves those rows into the warehouse. Finally, the Commit step commits the load transaction(s). Depending on the source and warehouse, Ajilius may overlap one or more of these steps.
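The four steps can be sketched as a simple pipeline. This is a minimal illustration, not the Ajilius implementation; the function names and the toy row data are mine.

```python
# Hypothetical stand-ins for the four load steps (names are illustrative,
# not an Ajilius API).
def query(source_sql):
    # Execute the query on the remote data source; return a "cursor".
    return f"cursor for: {source_sql}"

def extract(cursor):
    # Transfer the query results to the loader.
    return [("row", i) for i in range(3)]

def load(rows, warehouse):
    # Move the extracted rows into the warehouse.
    warehouse.extend(rows)

def commit(warehouse):
    # Commit the load transaction(s); return the loaded row count.
    return len(warehouse)

warehouse = []
rows = extract(query("select c1,c2,c3 from t1 where c1 > 9467843"))
load(rows, warehouse)
loaded = commit(warehouse)
print(loaded)  # 3
```

In a real loader these steps can be pipelined, which is why, as noted above, one or more of them may overlap.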
When we measure load performance we put a timer call before the Query, and again after the Commit. The elapsed time is the total time taken to extract and load the required data from the source system to the warehouse. This represents real-world performance, the type you need to measure if batch windows are important to you.
Our competitor had a different view of the world. They measured performance by taking the time immediately before the Load step and immediately after it. They claimed that this was a measurement of “load” performance. I guess they’re technically correct, but knowing just that one part of the job doesn’t help you assess performance against real-world requirements.
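The difference between the two measurement techniques is easy to demonstrate. In this sketch the four steps are simulated with sleeps whose durations are invented for illustration; the point is where the timers go, not the numbers.

```python
import time

# Hypothetical stand-ins for the four steps; the sleeps simulate relative
# cost, with the network-bound Extract step dominating.
def query():   time.sleep(0.02)
def extract(): time.sleep(0.05)
def load():    time.sleep(0.01)
def commit():  time.sleep(0.005)

t_start = time.perf_counter()        # timer before Query: whole-job view
query()
extract()
t_before_load = time.perf_counter()  # the competitor's timer starts here
load()
t_after_load = time.perf_counter()   # ...and stops here
commit()
t_end = time.perf_counter()

whole_job = t_end - t_start
load_only = t_after_load - t_before_load
print(f"whole job {whole_job:.3f}s vs load step only {load_only:.3f}s")
```

Timing only the Load step ignores the query, the network transfer, and the commit, so it will always report a much smaller number than an end-to-end measurement of the same job.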
When the customer repeated the tests, this time measuring the elapsed time for the whole job, the results were virtually neck and neck. I’m not surprised because, as I said earlier, I knew we were capable of saturating the relatively slow network in the customer’s development lab.
Ajilius performance tests? Always welcome.