Practical table partitioning on the Parallel Data Warehouse
This post is about table partitioning on the Parallel Data Warehouse (PDW). The topic itself is actually quite simple but there are some differences between the SMP SQL Server compared to the Parallel Data Warehouse.
On the SMP SQL Server table partitioning was important with large tables for two reasons:
- Query performance
- Workload management
For the SMP SQL Server, table partitioning allows queries and other table operations (like index rebuild) to be performed on multiple cores. Therefore table partitioning was done to improve query performance. However, the PDW architecture already stores larger tables (so called distributed tables) on each compute node by distributing it to multiple tables (so called distributions) on separate files in an optimal way for the available cores (currently 8 distributions per compute node). Therefore, when working on a PDW query performance usually isn’t the main reason for us to use table partitioning. But the second reason, workload management, still applies on the PDW. For example, when loading data it’s often useful to first load into a stage table, merge new and old data into a new partition and then switch that partition to the final table. So partitioning is still important on the PDW.
Creating a partitioned table on the PDW is a little bit easier compared to the SMP SQL Server as you don’t need (and don’t see) the partition schema or partition function. The following statement is an example of creating a partitioned table:
- CREATE TABLE [dbo].[MyTable1] (
- id int
- WITH (DISTRIBUTION = replicate, PARTITION (id range right FOR VALUES (10,20,30,40)));
In this query and throughout the remaining blog post, I’m only referring to a range right partition function. This is my preferred option as I think it’s more intuitive although both ways are almost identical and all partitioning is usually handled by automated tasks. So it isn’t really important. However, range right means that the partition boundary is in the same partition as the data to the right of the boundary (excluding the next boundary). So for a range right partition function, the left boundary is included while the right boundary is not, for example a partition with boundaries 10 and 20 contains data with values greater or equal to 10 and less than 20 (for integer values: 10, 11, 12, … 18, 19).
By specifying four boundaries in the create table statement from above, we have actually created five partitions as shown in the following table:
|Partition Number||Range From||Range To||Formula for id|
|1||10||id < 10|
|2||10||20||10 ≤ id < 20|
|3||20||30||20 ≤ id < 30|
|4||30||40||30 ≤ id < 40|
|5||40||40 ≤ id|
On the PDW, the partition number is important for switch operations as well as for index rebuild operations. For example, in order to perform an index rebuild on partition 3 you write run this code:
- ALTER INDEX «index name | ALL» ON MyTable1 REBUILD PARTITION = 3
The product documentation (aps.chm) contains a query to return “…the partition numbers, boundary values, boundary value types, and rows per boundary for a partitioned table”:
- SELECT sp.partition_number, prv.value AS boundary_value,
- lower(sty.name) AS boundary_value_type, sp.rows
- FROM sys.tables AS st
- JOIN sys.indexes AS si
- ON st.object_id = si.object_id AND si.index_id <2
- JOIN sys.partitions AS sp
- ON sp.object_id = st.object_id AND sp.index_id = si.index_id
- JOIN sys.partition_schemes AS ps
- ON ps.data_space_id = si.data_space_id
- JOIN sys.partition_range_values AS prv
- ON prv.function_id = ps.function_id
- JOIN sys.partition_parameters AS pp
- ON pp.function_id = ps.function_id
- JOIN sys.types AS sty
- ON sty.user_type_id = pp.user_type_id
- AND prv.boundary_id = sp.partition_number
- WHERE st.object_id =
- (SELECT object_id
- FROM sys.objects
- WHERE name = 'MyTable1')
- ORDER BY sp.partition_number
Let’s try the query with our table from above. Here is the output:
Some observations may be confusing here. The first thing is that each partition is reported to contain 200 rows although we have just created the table and therefore expect the table to be empty. However, the reported rows are taken from the sys.partitions system view. In the documentation for the sys.partitions view you find the following remark about the number of rows:
Approximate average number of rows in each table partition. To calculate this value, SQL Server PDW divides the number of rows in the table by the number of partitions in the table.
SQL Server PDW uses statistics, which might be out-of-date, to determine the total number of rows. The statistics are from the most recent run of UPDATE STATISTICS on the table. If UPDATE STATISTICS has not been run on the table, the statistics won’t exist, and SQL Server PDW will use 1000 as the default total number of rows. To display the number of rows in each partition within each distribution, use DBCC PDW_SHOWPARTITIONSTATS (SQL Server PDW) .
So, the number of rows is just estimated here, and since we don’t have created statistics for the table, PDW assumes the table to contain 1000 rows. But wait, 1000 rows divided by 4 partitions gives 250, not 200, right? Well, remember that we actually have 5 partitions although the meta data query from above only lists 4. I’m getting back on this soon.
Statistics are easy to create, so let’s do this first:
- create statistics stat_id on MyTable1(id)
Here is the result when running the meta data query again:
So, now the number of rows seems to be correct. But be carefull: This still is only an approximation and you cannot expect the approximation to be accurate.
The other thing to be puzzled about with the output of the meta data query may be, that it only reports 4 partitions although we first figured out, that there should be 5 partitions. And also the boundary value may be confusing. For partition number one, we found out that it contains all data rows with id less than 10 (not equal). So the boundary value from the output is the excluded right boundary of our range right partitioning – confusing.
Stephan Köppen already posted some useful queries for the PDW (see his post here ). Using his partition query gives a much better result. I made some minor changes to the query and for this blog post, here’s the version I’m using here:
- create table #Partitions
- with (LOCATION = USER_DB, distribution=replicate)
- p.partition_number PartitionNr
- , cast(coalesce(lag(r.value,1) over (order by p.partition_number),-2147483648) as int) RangeFromIncluding
- , cast(coalesce(r.value,2147483647) as int) AS [RangeToExcluding]
- FROM sys.tables AS t
- JOIN sys.indexes AS i ON t.object_id = i.object_id
- JOIN sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id
- JOIN sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
- JOIN sys.partition_functions AS f ON s.function_id = f.function_id
- LEFT JOIN sys.partition_range_values AS r ON f.function_id = r.function_id and r.boundary_id = p.partition_number
- WHERE i.type <= 1
- and t.name='MyTable1'
- create table #PartitionData
- with (LOCATION = USER_DB, distribution=replicate)
- select id, count(*) [rows] from MyTable1 group by id
- -- show partitions and number of rows
- select PS.PartitionNr, PS.RangeFromIncluding, PS.RangeToExcluding, coalesce(Sum([rows]),0) [rows]
- from #Partitions PS left join #PartitionData GT on PS.RangeFromIncluding<= GT.id and PS.RangeToExcluding>GT.id
- group by PS.PartitionNr, PS.RangeFromIncluding, PS.RangeToExcluding
- drop table #Partitions
- drop table #PartitionData
If you’re only interested in the partitions, the blue part of the query is enough. The query uses the lag windows function to retrieve the lower boundary. The remaining query is used to obtain the exact number of rows for each partition. Please note, that the boundary information resulting from my modifications is only valid for a range right partition function. Here is the result:
As you see, this corresponds exactly to the five partitions from the first table above. The upper boundary of partition 5 should be increased by one to be 100% correct but this would conflict with the maximum integer value. If you like, just return null for the lower boundary of partition 1 and the upper boundary of partition 5 and observe this in the comparison with the existing data.
Also when reading the above query part that is printed in black you should adopt the method for counting the rows per partition to your needs. The method I’m using here proved to work fine for discrete values (integer ids). Since we usually partition by an integer column (for example a data written as yyyymmdd, 20140501 for May 1, 2014) this approach works fine for most of our workloads.
Next thing of interest is the partition number. As I wrote earlier in this post, the partition number is used for partition switch operations or for example for index rebuilds. It’s important to understand that the partition number is always a consecutive range of numbers starting with the number one. Even if you merge two partitions into one, the number is still consecutive.
For example, let’s merge partitions 3 and 4. In the merge partition statement we only need to specify the boundary. In a certain sense, this boundary is removed to form the new partition. In our case, partitions 3 and 4 share the boundary value 30, so the merge statement looks like this:
- ALTER TABLE MyTable1 MERGE RANGE (30);
Here is the result using the modified meta data table from above:
As you can see, the partition number is still consecutive and the partition ranging from 40 to infinity now has the number 4 instead of 5.
If you specify a boundary, that doesn’t exist you’ll get an error message:
- ALTER TABLE MyTable1 MERGE RANGE (35);
- A distributed query failed: Node(s):[201001-201002]The specified partition range value could not be found.
Splitting a partition works very similar to a merge. Again, you can think of a split as of inserting a new boundary. For example, let’s split at the value 35 (which is in partition 3):
- ALTER TABLE MyTable1 SPLIT RANGE (35);
Here’s the result:
Again, the partition numbering is still consecutive and the former partition 4 now becomes partition 5 because we split partition 3.
Now let’s validate our boundaries by inserting some lines of data:
- insert into MyTable1 values(0)
As expected, the value 0 is written to partition 1 as –infinity ≤ 0 < 10.
- truncate table MyTable1
- insert into MyTable1 values(20)
The value 20 goes to partition 3 as 20 ≤ 20 < 35.
Now we’re going to insert 5 values which should fit the constraints for partition 4:
- truncate table MyTable1
- insert into MyTable1 values(35)
- insert into MyTable1 values(36)
- insert into MyTable1 values(37)
- insert into MyTable1 values(38)
- insert into MyTable1 values(39)
All of these values satisfy the constraint 35 ≤ x < 40 and therefore all the values are written to partition 4.
Ok, these were just some examples to see how data is written to the different partitions of our table.
To complete this post, I finally like to show a partition switch. Therefore we need to create a table of the same structure:
- same columns, same data types, same nullable settings (take care when creating computed columns in a CTAS statement)
- same table geometry (heap, clustered index, clustered columnstore index)
- same distribution method (both tables replicated or distributed by the same key)
- same indexes and constraints
- partitioned by the same column (but the partitions itself may differ)
Generating the script for our table after the merge/split operation gives this result:
- CREATE TABLE [dbo].[MyTable1] (
- [id] int NULL
- WITH (DISTRIBUTION = REPLICATE, PARTITION ([id] RANGE RIGHT FOR VALUES (10, 20, 35, 40)));
Now, replacing MyTable1 with myTable2 we can create a table of exactly the same structure:
- CREATE TABLE [dbo].[MyTable2] (
- id int
- WITH (DISTRIBUTION = replicate, PARTITION (id range right FOR VALUES (10, 20, 35, 40)));
We can now switch the 5 rows of data from above. Since they are all stored in partition 4 we can switch them using this statement:
- alter table MyTable1 switch partition 4 to MyTable2 partition 4
This is where we finally needed the partition number. We can now check the rows in table MyTable2:
- select * from MyTable2
As you can see, all 5 rows are moved (switched) over to table MyTable2.
A common scenario for loading data into the appliance is to first load new data into a stage table of the same layout as the final fact table. Then our meta data query from above helps, by running it against both tables, using the boundaries as the join conditions. This results in the source partition and matching destination partition together with the number of rows in each of them. For example, if your workload contains only new or updated data you can now load the data as follows:
- If the source partition contains no rows at all, quit
- If the destination partition is empty switch the source partition directly into the destination partition and quit
- Otherwise blend/merge the source and destination partition data into a new table with corresponding boundary values (this requires three partitions), then move the existing data out of the destination partition and finally move the merged data into the destination partition.
Summary: Partitioning on the PDW is still useful for workload management but usually not to increase query performance. With the query presented here, it’s quite easy to find the partitions together with their boundaries and number of contained rows. This information can be used to decide about a good partition switching strategy.