After outlining the conventional methods for storing historical data in the first post of this blog series, I would like to introduce a more effective approach to partitioning a historical table in this second part.
The key here is to avoid a significant increase in table size. Let’s call this approach a partitioned interval table. The algorithm is very simple:
The logical table structure stays the same as for the interval solution
The table is divided into monthly partitions by VALID_FROM
Each time a new month begins, the values from the last day of the previous month are copied forward, with VALID_FROM set to the first day of the new month (a sketch of such a table follows this list)
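For illustration, such a table could be declared with Oracle interval partitioning roughly like this (the table and column names are assumptions made for this sketch, not taken from the original example):

-- Sketch of a monthly-partitioned interval table (names are assumed).
CREATE TABLE account_balance_hist (
    account_no  VARCHAR2(20)  NOT NULL,
    balance     NUMBER(18,2)  NOT NULL,
    valid_from  DATE          NOT NULL,
    valid_to    DATE          NOT NULL
)
PARTITION BY RANGE (valid_from)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))      -- one partition per month
(
    -- starting partition; Oracle creates further monthly partitions automatically
    PARTITION p_first VALUES LESS THAN (TO_DATE('2016-06-01', 'YYYY-MM-DD'))
);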
Let's see how it works with an example. For the first month, there is no difference compared to an interval table:
But as soon as the next month begins, we should create a new record and "close" the previous one for each account, no matter whether the attributes were updated (33333333333) or not (1111111111 and 22222222222):
Further updates in Partition II are then performed just as in a normal interval table, until the next partition is created.
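As a sketch, the monthly rollover could be implemented like this, using the assumed table from above and the (likewise assumed) convention that the currently valid record carries VALID_TO = 31.12.9999:

-- Month rollover at the start of June 2016 (names and interval convention are assumptions).
-- Step 1: copy every record that is still open at the end of May into the new month.
INSERT INTO account_balance_hist (account_no, balance, valid_from, valid_to)
SELECT account_no,
       balance,
       DATE '2016-06-01',           -- first day of the new month
       DATE '9999-12-31'            -- still open
  FROM account_balance_hist
 WHERE valid_to = DATE '9999-12-31';

-- Step 2: "close" the originals at the end of the previous month.
UPDATE account_balance_hist
   SET valid_to = DATE '2016-05-31'
 WHERE valid_to = DATE '9999-12-31'
   AND valid_from < DATE '2016-06-01';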
Obviously, this solution requires redundant rows, which is usually not what you want when optimizing performance; but there is an enormous advantage to be had as well: we can be sure that all rows needed to select the balances on a given date have a VALID_FROM date in the same month as that date. Accordingly, to select the balances we can use a query like this:
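For example, for the report date 15.06.2016 and the assumed table from above, it could look like this:

-- Balances as of 15.06.2016; the extra lower bound on VALID_FROM
-- restricts the scan to the partition of the report date's month.
SELECT account_no,
       balance
  FROM account_balance_hist
 WHERE DATE '2016-06-15' BETWEEN valid_from AND valid_to   -- report date inside the validity interval
   AND valid_from >= DATE '2016-06-01';                    -- first day of the report month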
The transitivity rule allows Oracle to infer that the VALID_FROM date must lie between 01.06.2016 and 15.06.2016, which means that only Partition II needs to be read to satisfy the query. In general, just one partition instead of the whole table has to be scanned, so our metrics can be calculated with the following formulas:
TRC = CA * (FC + 1 - FC/30) * HD * 12
RSR = CA * (FC + 1 - FC/30)
And in figures:
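For a rough sense of scale, the formulas can also be evaluated with purely illustrative values, e.g. CA = 1,000,000 accounts, FC = 2 changes per month and HD = 5 years of history (these numbers are assumptions for illustration, not the figures from the original example):

-- Illustrative only: CA, FC and HD are assumed values.
SELECT 1000000 * (2 + 1 - 2/30) * 5 * 12 AS trc,   -- total row count: ~176 million rows
       1000000 * (2 + 1 - 2/30)          AS rsr    -- rows scanned per report: ~2.9 million rows
  FROM dual;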
Indeed, the effectiveness depends on how frequently the historized attributes change. The following table compares the aforementioned measures based on the frequency of changes:

To conclude, the partitioned interval table combines the advantages of the previous two methods: it requires a slightly bigger table than a normal interval table, but offers selection performance comparable to that of a snapshot table. Additionally, it is relatively easy to adjust the filters in reports to take advantage of the partitioning.
As you can see, partitioned interval tables are most effective for change frequencies between once every two months and five times per month. For less frequent changes, you can try bigger partitions (the same logic with quarterly or yearly partitions instead of monthly ones). If the number of changes per month is greater than five, you can try weekly partitions; and if it approaches 20, a snapshot table will be the most effective solution.
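As a sketch, the partition size of the assumed table from above could be changed simply by adjusting the interval:

-- Quarterly partitions for rarely changing attributes (assumed table from above):
ALTER TABLE account_balance_hist SET INTERVAL (NUMTOYMINTERVAL(3, 'MONTH'));

-- Weekly partitions for frequently changing attributes:
ALTER TABLE account_balance_hist SET INTERVAL (NUMTODSINTERVAL(7, 'DAY'));

Keep in mind that the rollover step must then copy the values at the corresponding boundaries (quarter, year or week) instead of at month boundaries.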
This is essentially how partitioned interval tables work in general, but "the devil is in the details." The important nuances that must be considered will be presented in the next post of this three-part series.