The Effective Use of Partition Pruning for the Optimisation of Retrieval Speed (Part 2)

The Effective Use of Partition Pruning for the Optimisation of Retrieval Speed (Part 2)

After outlining the conventional methods for storing historical data in the first post of this blog series, I would like to introduce a more effective approach to partitioning a historical table in this second part.

Table of Contents

The key here is to avoid a significant increase in table size. Let’s call this approach a partitioned interval table. The algorithm is very simple:

  1. Logical table structure stays the same as for interval solution
  2. The table is divided into monthly partitions by VALID_FROM
  3. Values from the last day of the months are copied each time a next month will have begun and with VALID_FROM being equal to the first day of the new month.

Let's see how it works by example. For the first month, there is no difference, as compared to an interval table:

partition-1

But as soon as next month begins, we should create a new record and "close" the previous one for ech account, no matter if the attributes werre updated (33333333333) or not (1111111111) and (22222222222):

partition-2

Further updates in the Partition II are performed again like in normal interval table, until next partition is created.

partition-3

Obviously, this solution requires redundant rows and it's usually not the thing you do when you are optimizing performance; but there is an enormous advantage to be had as well: we can be sure that all rows needed to select balances on the given date have VALID_FROM date in the same month as given date. Accordingly, in order to select balances we can use query like this:

abfrage-1

Transitivity rule allows Oracle to realize that the VALID_FROM date should lie between 01.06.2016 and 15.06.2016, which means that only Partition II should be read to satisfy the query. In general, just one partition instead of the whole table should be scanned, therefore our metrics could be calculated with following formulas:

TRC = CA * (FC + 1 - FC/30) * HD * 12

RSR = CA * (FC + 1 - FC/30)

And in figures:

partition-intervals

Indeed, the effectiveness depends on the frequency of changes of historized attributes. The following table compares aforementioned measures based on the frequency of changes:
Conclusively, the partitioned interval table combines the advantages of previous two methods: it requires a slightly bigger table than a normal interval one, but offers selection's performance comparable with a snapshot table. Additionally, it is relatively easy to adjust filters in reports in order to use advantages of partitioning.

tabelle

As you can see, Partitioning Interval Tables are the most effective for frequency of changes from once per two month till 5 times per month. In case of less frequent changes, you can try to use bigger partitions (the same logic for quarterly or yearly partitions in addition to monthly ones). If number of changes is greater than 5, you can try weekly partitions; and if it nears 20, then the snapshot will be the most effective solutions.

It is essentially about how partitioned interval tables could work in general, but "the devil is in the details." Therefore, important nuances must be considered, which will be presented in the next blog post of this three-part series.

Want To Learn More? Contact Us!

Helene Fuchs

Your contact person

Helene Fuchs

Domain Lead Data Platform & Data Management

Pia Ehrnlechner

Your contact person

Pia Ehrnlechner

Domain Lead Data Platform & Data Management

Related Posts

chevron left icon
Previous post
Next post
chevron right icon

No previous post

No next post