Using “cursors” in PDW
Did I say cursor? Isn’t this an evil word? Shouldn’t we try as hard as possible to avoid them in database design and especially in a data warehouse?
Yes, sure. But there might be some patterns which make it useful to loop over a table, for example a configuration table, and do something with each line of the table. Since PDW currently doesn’t support cursors (why should it?), what can we do?
One option is to use the foreach-container in SSIS. It’s a good, reliable and easy way to implement loops.
However, if you need to, you can also do this using SQL. The following example shows a loop over all partitions of the FactSalesHeader table:
- create table #FactSalesHeaderPartitions
- WITH ( DISTRIBUTION = REPLICATE )
- SELECT sp.partition_number, prv.value AS boundary_value, lower(sty.name) AS boundary_value_type, sp.rows,
- row_number() over (order by sp.partition_number) AS RowNr
- FROM sys.tables st JOIN sys.indexes si ON st.object_id = si.object_id AND si.index_id <2
- join sys.partitions sp ON sp.object_id = st.object_id AND sp.index_id = si.index_id
- join sys.partition_schemes ps ON ps.data_space_id = si.data_space_id
- join sys.partition_range_values prv ON prv.function_id = ps.function_id
- join sys.partition_parameters pp ON pp.function_id = ps.function_id
- join sys.types sty ON sty.user_type_id = pp.user_type_id AND prv.boundary_id = sp.partition_number
- where st.object_id = (select object_id from sys.objects WHERE name = 'FactSalesHeader')
- declare @part_count int = (select count(*) from #FactSalesHeaderPartitions)
- declare @i int = 1
- while @i <= @part_count begin
- declare @partition int
- declare @boundary int
- select @partition=partition_numnber, @boundary=boundary_value
- from #FactSalesHeaderPartitions where RowNr=@i
- -- do something with the partition, for example take it out, merge (ctas)
- -- it with new data and then switch the resulting partition into the
- -- fact table.
- set @i+=1
- drop table #FactSalesHeaderPartitions
The idea is to put all the rows that you want to create a loop for in a temporary table, including a column with a row number. You can then create a simple while-loop starting with 1 and going up the the number of rows in the temporary table. Since the row number is also included in the temporary table, you can then select the correct values into your variables within the while loop.
Please note that the above SQL code for reading partitions uses PDW v1 AU 3.5 system tables. For other versions the table names may be different.