17.12.2014

High Performance String Aggregation on Microsoft APS (PDW)

Technical Value

On normal SMP-SQL Servers there are several ways to perform string aggregations via XML or Pivot and dynamic SQL, some more or less efficient.

On APS/PDW you have to do something different. We want to have a rapid solution so we like “JOIN” and simple additions and we hate anything like “CASE”, or “ISNULL”.

Let’s start with a simple table:

We have several items with their rownumbers in one table. In order to get them side-by-side we create a matrix/table which has an ID, an empty string at its diagonal und a NULL-string in all other fields. Next we perform a join and get a table with the items in its diagonal because addition with the NULL-value results in a NULL value and addition of empty string and item results in the item.

One max-aggregation and one concat-operation later we have the result: one row and all items concatinated.

Click to enlarge: Sample simpleNow we modify this attempt to perform an aggregation of items in multiple lines.

The matrix gets a space-character instead of the empty string from row 2 on (or any other delimeter). Next we insert a column with a partitioned rownumber and join the matrix by this. By keeping the line information we can group by this linenumber. Next the concat and we have our little christmas-song.

Click to enlarge: Thanks to Miriam Funke who implemented and tested it on the PDW.

We used this concept on a 4-node-pdw to aggregate receipt-items for side-by-side analysis. The matrix allows up to 200 items per receipt and we reduced about 60 mio lines to 5 mio lines in about 2 minutes even though we needed to perform a dense_rank and a row_number at once to eliminate same items of the receipt in one step.

Merry Christmas!

Here is a query to test the concept on standard smp including creation of sample data (remove "--" in last line to execute query):

  1. DECLARE @i INT
  2. DECLARE @i_max INT = 100 --max aggregate items
  3. DECLARE @query NVARCHAR(max)
  4. /***** 1 create matrix for join *****/
  5. SET @query =
  6. ,
  7. CREATE TABLE #T1
  8. (
  9. rowid INT
  10. ,
  11. SET @i = 1
  12. WHILE @i <= @i_max
  13. BEGIN
  14. SET @query +=
  15. ,
  16. 'T'+CAST(@i AS NVARCHAR(10)) + ,NVARCHAR(1) ,
  17. SET @i += 1
  18. END
  19. SET @query +=
  20. ,
  21.  )
  22. ,
  23. SET @i = 1
  24. WHILE @i <= @i_max
  25. BEGIN
  26. SET @query +=
  27. ,
  28. INSERT INTO #T1 (rowid, T'+CAST(@i AS NVARCHAR(10))+') SELECT '+CAST(@i AS NVARCHAR(10)) + ',''''
  29. ,
  30. SET @i+= 1
  31. END
  32. /***** CREATE SAMPLE DATA *****/
  33. SET @query +=
  34. ,
  35. CREATE TABLE #T2
  36. (
  37. ID INT IDENTITY(1,1),
  38. VBELN INT,
  39. POSNR NVARCHAR(10)
  40. )
  41.  
  42. DECLARE @i INT = 1
  43. DECLARE @imax INT = 1000000
  44.  
  45. WHILE @i <= @imax
  46. BEGIN
  47. INSERT INTO #T2
  48. SELECT ROUND(RAND(@i) * 30000,0), NULL
  49. SET @i+= 1
  50. END
  51. UPDATE T2 SET POSNR = POSNR_SOLL
  52. FROM #T2 AS T2
  53. JOIN
  54. (
  55. SELECT  ID, POSNR_SOLL = CAST(ROW_NUMBER() OVER (PARTITION BY VBELN ORDER BY VBELN)  AS nvarchar(10))
  56. FROM #T2 AS T2
  57. ) AS SOLL
  58. ON
  59. SOLL.ID = T2.ID
  60. ,
  61. /*************** MAIN TASK ***********/
  62. SET @query +=
  63. ,
  64. SELECT
  65. SINGULAR.VBELN,
  66. C_POSNR = CONCAT(
  67. ,
  68. SET @i = 1
  69. WHILE @i <= @i_max
  70. BEGIN
  71. IF @i = 1 SET @query += 'MAX(T'+CAST(@i AS NVARCHAR(10))+'+POSNR)'
  72. ELSE SET @query += ',MAX(+'',''+T'+CAST(@i AS NVARCHAR(10))+'+POSNR)'
  73. SET @i += 1
  74. END
  75. SET @query +=,
  76. )
  77. FROM
  78. (
  79. SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY VBELN ORDER BY POSNR) FROM #T2
  80. ) AS SING
  81. JOIN #T1 AS PIV
  82. ON
  83. PIV.ROWID = SING.RN
  84. GROUP BY
  85. SINGULAR.VBELN
  86. ORDER BY
  87. SINGULAR.VBELN
  88. ,
  89. SET @query +=
  90. ,
  91. DROP TABLE #T1
  92. DROP TABLE #T2
  93. ,
  94. SELECT @query
  95. EXEC (@query)

Kommentare

Ken Boone
Mi, 11.11.2015 - 01:43

My hat is off to you. This is very clever.

Neuen Kommentar schreiben

Der Inhalt dieses Feldes wird nicht öffentlich zugänglich angezeigt.

Klartext

  • Keine HTML-Tags erlaubt.
  • HTML - Zeilenumbrüche und Absätze werden automatisch erzeugt.
  • Web page addresses and email addresses turn into links automatically.
Teilen auf

Newsletter Anmeldung

Abonnieren Sie unseren Newsletter!
Lassen Sie sich regelmäßig über alle Neuigkeiten rundum ORAYLIS und die BI- & Big-Data-Branche informieren.

Jetzt anmelden

Unsere Website benutzt Cookies, um bestmögliche Funktionalitäten bieten zu können. Durch die Nutzung unserer Website, erklären Sie sich mit der Verwendung von Cookies einverstanden. In unserer Datenschutzerklärung finden Sie unter §5 Informationen und Links, wie Sie diesen Maßnahmen jederzeit widersprechen können.