17.12.2014

High Performance String Aggregation on Microsoft APS (PDW)

Technical Value

On a regular SMP SQL Server there are several ways to perform string aggregation, e.g. via XML, PIVOT, or dynamic SQL, with varying efficiency.

On APS/PDW you have to do something different. We want a fast solution, so we like "JOIN" and simple additions, and we avoid anything like "CASE" or "ISNULL".

Let’s start with a simple table:

We have several items with their row numbers in one table. To get them side by side, we create a matrix table that has an ID, an empty string on its diagonal, and NULL in all other fields. Then we perform a join and get a table with the items on its diagonal, because adding anything to NULL yields NULL, while adding an item to the empty string yields the item.

One MAX aggregation and one CONCAT operation later we have the result: one row with all items concatenated.
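The mechanics can be sketched outside SQL as well. Here is a minimal Python sketch of the idea (the data, the `add` helper, and all names are illustrative, not part of the original query): `None` stands in for SQL NULL, `add` mimics string addition where NULL + anything = NULL, and a per-column MAX recovers the diagonal values.

```python
# Sketch of the matrix-join trick: None stands in for SQL NULL, and
# add() mimics SQL string addition, where NULL + anything = NULL.
def add(a, b):
    return None if a is None or b is None else a + b

# Items with their row numbers (rowid 1..3).
items = [(1, "A"), (2, "B"), (3, "C")]

# Matrix: empty string on the diagonal, None (NULL) everywhere else.
n = len(items)
matrix = {rid: ["" if col == rid - 1 else None for col in range(n)]
          for rid in range(1, n + 1)}

# JOIN on rowid: each joined row carries its item only in "its" column.
joined = [[add(cell, item) for cell in matrix[rid]] for rid, item in items]

# MAX per column ignores NULLs, so it picks the single surviving value;
# CONCAT treats NULL as '', so here we simply join what is left.
cols = [max((row[c] for row in joined if row[c] is not None), default=None)
        for c in range(n)]
result = "".join(c for c in cols if c is not None)
print(result)  # -> ABC
```

Note that this relies on the SQL Server default CONCAT_NULL_YIELDS_NULL behavior for the `+` operator.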

Now we modify this approach to aggregate items into multiple lines.

From row 2 on, the matrix gets a space character (or any other delimiter) instead of the empty string. Next we add a column with a partitioned row number and join the matrix on it. By keeping the line information we can group by this line number. One CONCAT later, we have our little Christmas song.
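The grouped variant can be sketched the same way (again a Python illustration with made-up data and names, not the original query): the matrix carries the delimiter from row 2 on, and the rows keep a group key so the MAX runs per line.

```python
# Grouped variant of the matrix-join trick: the matrix holds a delimiter
# (here a space) instead of '' from row 2 on, and each input row carries a
# group key, so the per-column MAX is computed per group.
def add(a, b):
    return None if a is None or b is None else a + b  # NULL-propagating +

# (group, row number within group, item) - row numbers are partitioned.
rows = [("line1", 1, "Oh"), ("line1", 2, "Christmas"), ("line1", 3, "tree"),
        ("line2", 1, "how"), ("line2", 2, "lovely")]

n = 3  # matrix width = maximum number of items per group
matrix = {rid: [("" if rid == 1 else " ") if col == rid - 1 else None
                for col in range(n)]
          for rid in range(1, n + 1)}

# JOIN on the partitioned row number, collecting rows per group.
groups = {}
for grp, rid, item in rows:
    joined = [add(cell, item) for cell in matrix[rid]]
    groups.setdefault(grp, []).append(joined)

# GROUP BY group key, MAX per column, then concatenate.
result = {}
for grp, joined_rows in groups.items():
    cols = [max((r[c] for r in joined_rows if r[c] is not None), default=None)
            for c in range(n)]
    result[grp] = "".join(c for c in cols if c is not None)

print(result["line1"])  # -> Oh Christmas tree
```

Groups with fewer items than the matrix width simply contribute NULL in the unused columns, which the concatenation skips.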

Thanks to Miriam Funke, who implemented and tested this on the PDW.

We used this concept on a 4-node PDW to aggregate receipt items for side-by-side analysis. The matrix allows up to 200 items per receipt, and we reduced about 60 million rows to 5 million rows in about 2 minutes, even though we had to apply a DENSE_RANK and a ROW_NUMBER at once to eliminate duplicate items of a receipt in a single step.

Merry Christmas!

Here is a query to test the concept on a standard SMP SQL Server, including the creation of sample data (remove the "--" in the last line to execute the generated query):

DECLARE @i INT
DECLARE @i_max INT = 100 --max aggregate items
DECLARE @query NVARCHAR(max)

/***** 1 create matrix for join *****/
SET @query = '
CREATE TABLE #T1
(
rowid INT
'
SET @i = 1
WHILE @i <= @i_max
BEGIN
SET @query += '
,T' + CAST(@i AS NVARCHAR(10)) + ' NVARCHAR(1)'
SET @i += 1
END
SET @query += '
)
'
SET @i = 1
WHILE @i <= @i_max
BEGIN
SET @query += '
INSERT INTO #T1 (rowid, T' + CAST(@i AS NVARCHAR(10)) + ') SELECT ' + CAST(@i AS NVARCHAR(10)) + ','''' '
SET @i += 1
END

/***** CREATE SAMPLE DATA *****/
SET @query += '
CREATE TABLE #T2
(
ID INT IDENTITY(1,1),
VBELN INT,
POSNR NVARCHAR(10)
)

DECLARE @i INT = 1
DECLARE @imax INT = 1000000

WHILE @i <= @imax
BEGIN
INSERT INTO #T2
SELECT ROUND(RAND(@i) * 30000, 0), NULL
SET @i += 1
END

UPDATE T2 SET POSNR = POSNR_SOLL
FROM #T2 AS T2
JOIN
(
SELECT ID, POSNR_SOLL = CAST(ROW_NUMBER() OVER (PARTITION BY VBELN ORDER BY VBELN) AS NVARCHAR(10))
FROM #T2 AS T2
) AS SOLL
ON
SOLL.ID = T2.ID
'

/*************** MAIN TASK ***********/
SET @query += '
SELECT
SINGULAR.VBELN,
C_POSNR = CONCAT(
'
SET @i = 1
WHILE @i <= @i_max
BEGIN
IF @i = 1 SET @query += 'MAX(T' + CAST(@i AS NVARCHAR(10)) + '+POSNR)'
ELSE SET @query += ',MAX('','' + T' + CAST(@i AS NVARCHAR(10)) + '+POSNR)'
SET @i += 1
END
SET @query += '
)
FROM
(
SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY VBELN ORDER BY POSNR) FROM #T2
) AS SINGULAR
JOIN #T1 AS PIV
ON
PIV.rowid = SINGULAR.RN
GROUP BY
SINGULAR.VBELN
ORDER BY
SINGULAR.VBELN
'
SET @query += '
DROP TABLE #T1
DROP TABLE #T2
'
SELECT @query
--EXEC (@query)