Persisting multiple fact-tables for faster Processing
When one of our clients decided to get a PDW, the newly gained power we got for providing our cubes with data was one of the main reasons. Our Business Warehouse is based on a regular star schema with about 110 dimension and about 70 fact tables, whereby about 80% of the dimensions and tables are used in 2 cubes with several measure groups.
In comparison two probably most other Warehouses and Cubes, our approach is, to provide all the data integrated in one cube with as many analytical options possible. Since most of our fact information is based on a „Contract ID“ pretty much all of the fact tables can be joined with each other. And this is exactly what we do. Instead of having only a couple of huge fact tables we need to update, the facts a strictly separated by the business requirement. For example contract information, client information, usage information, gross margin etc. Our largest measure group (288 partitions) requires 12 fact tables to be joined together, to provide all the necessary data. To get all the data for 6 years, stored in monthly partitions, we need around 6hours. The whole cube needs around 30h to be fully processed, which doesn’t leave much time for other cubes and regular ETL Jobs. So a solution was needed.
After several thoughts we decided to use the PDW power with CTAS and persist all the necessary data, so only straight select statements are needed. After several tests with various combination and query hints the scenario was perfect. So lets compare:
- Processing just data for the partitions takes about 6h.
- Creating a CTAS from the processing query with Workload Management Group XL and monthly partitions, takes about 4h 15m.
- Updating this table with another CTAS and renaming it to fit the old requirement takes about 2h 30m.
- But the greatest part is, that processing the data got dropped to around 1h and 45m
For our daily business we implemented a process where only the newest partition is switched with the data, taking only a few minutes. <
- Creating a temp table with the same structure as the persisted table
- Loading the actual month with the processing query and CTAS in a temp table
- Switch out the actual month from the persisted fact table into the created temp table
- Switch the newly created data back into the persisted table
All of the tasks can be automated so there is nothing left to be done.
- Processing query with the necessary parameter is stored in a control table
- In the depending job controllers, an additional procedure was implemented to replace the actual partition with the newly generated data
- Monthly rebuilding is also automated based on a control table
Even though you store quite a lot of data multiple times, the benefit of using APS power for prebuilding the measure groups and just selecting the data is just great and well worth a try.