26.06.2013

Initial Data Import from SMP to PDW

Technical Value

One of the first things in any PDW Project, is the migration of the existing data into the PDW. This could be necessary for POCs, Development Issues or the final move towards the new production system. Since the PDW is optimized for Star-Schema, we will most likely have non partitioned dimension tables and partitioned fact tables.   Exporting Data To split the existing data, we will export all non partitioned tables into one CSV File and the partitioned tables are split for each filled partition. The tool of choise for the SMP enviroment is the bcp (bulk copy program) tool. Bcp is a commandline tool delivered with the SQL-Server and is usually installed on the server itself. To create all the necessary statements, we will select the database we want to export, change the parameters, and execute the script. NOTE: In our example we imply that the partition columns are based on int values.

  1. -- START Parameters
  2. DECLARE @path VARCHAR(100) = 'C:\Export\'
  3. DECLARE @delimiter VARCHAR(100) = '|'
  4. -- END Parameters
  5. DECLARE @table NVARCHAR(100)
  6. DECLARE @partitionvalue SQL_VARIANT
  7. DECLARE @partitioncolumn SQL_VARIANT
  8. DECLARE @string NVARCHAR(max)
  9. DECLARE @database VARCHAR(50) = Db_name(Db_id())
  10. DECLARE @server VARCHAR(20) = @@SERVERNAME
  11. DECLARE cur_tabelle CURSOR fast_forward FOR
  12.   SELECT o.name                                    AS Tabelle,
  13.          Cast(Isnull(prv_left.value, 0) AS INT)    AS PartitionValue,
  14.          Cast(Isnull(c.name, '') AS NVARCHAR(100)) AS PartitionColumn
  15.   FROM   sys.objects AS o
  16.          LEFT JOIN sys.partitions AS p
  17.                 ON o.object_id = p.object_id
  18.          LEFT JOIN sys.indexes AS i
  19.                 ON i.object_id = p.object_id
  20.                    AND i.index_id = p.index_id
  21.          LEFT JOIN sys.data_spaces AS ds
  22.                 ON ds.data_space_id = i.data_space_id
  23.          LEFT JOIN sys.partition_schemes AS ps
  24.                 ON ps.data_space_id = ds.data_space_id
  25.          LEFT JOIN sys.partition_functions AS pf
  26.                 ON pf.function_id = ps.function_id
  27.          LEFT JOIN sys.destination_data_spaces AS dds2
  28.                 ON dds2.partition_scheme_id = ps.data_space_id
  29.                    AND dds2.destination_id = p.partition_number
  30.          LEFT JOIN sys.filegroups AS fg
  31.                 ON fg.data_space_id = dds2.data_space_id
  32.          LEFT OUTER JOIN sys.partition_range_values AS prv_left
  33.                       ON ps.function_id = prv_left.function_id
  34.                          AND prv_left.boundary_id = p.partition_number - 1
  35.          LEFT OUTER JOIN sys.index_columns ic
  36.                       ON --ic.partition_ordinal
  37.          ic.index_id = i.index_id
  38.          AND ic.object_id = o.object_id
  39.          LEFT JOIN sys.columns c
  40.                 ON c.object_id = ic.object_id
  41.                    AND c.column_id = ic.column_id
  42.   WHERE  o.type = 'U'
  43.          AND p.rows != 0
  44.  
  45. OPEN cur_tabelle
  46.  
  47. FETCH next FROM cur_tabelle INTO @table, @PartitionValue, @PartitionColumn
  48.  
  49. WHILE @@fetch_status = 0
  50.   BEGIN
  51.       IF @PartitionValue = 0
  52.         BEGIN
  53.             SET @string = 'bcp "select * from ' + @table
  54.                           + '" queryout "' + @path + @table + '.csv" -S '
  55.                           + @server + ' -T -C RAW -d ' + @database + ' -t"'
  56.                           + @delimiter + '" -c'
  57.         END
  58.       ELSE
  59.         BEGIN
  60.             SET @string = 'bcp "select * from ' + @table + ' where '
  61.                           + Cast(@PartitionColumn AS NVARCHAR(100))
  62.                           + ' = '
  63.                           + Cast (@PartitionValue AS NVARCHAR(10))
  64.                           + '" queryout "' + @path + @table + '_'
  65.                           + Cast (@PartitionValue AS NVARCHAR(100))
  66.                           + '.csv" -S ' + @server + ' -T -C RAW -d '
  67.                           + @database + ' -t"' + @delimiter + '" -c'
  68.         END
  69.  
  70.       PRINT @string
  71.  
  72.       FETCH next FROM cur_tabelle INTO @table, @PartitionValue, @PartitionColumn
  73.   END
  74.  
  75. CLOSE cur_tabelle
  76.  
  77. DEALLOCATE cur_tabelle 

The now created command line statements can easily be put into a regular batch file and be executed. It is also possible to split the result into multiple files to export in parallel.   Importing Data  As soon as all the necessary files are copied from the source to the Landingzone of the PDW, we can easily import the Data via DWHLoader Since the PDW doesn't support AD Logins, it is necessary to pass over some login information as well. The below shown script will also be executed on the source system in order to create the necessary import scripts. NOTE: In our example we imply that the database name stays the same and that the data model is an exact copy of the exported database.

  1. -- START Parameters
  2. DECLARE @path VARCHAR(100) = 'C:\Import\'
  3. DECLARE @delimiter VARCHAR(100) = '|'
  4. DECLARE @login VARCHAR(100) = 'username'
  5. DECLARE @password VARCHAR(100) = 'password'
  6. -- END Parameters
  7. DECLARE @table NVARCHAR(100)
  8. DECLARE @partitionvalue SQL_VARIANT
  9. DECLARE @partitioncolumn SQL_VARIANT
  10. DECLARE @string NVARCHAR(max)
  11. DECLARE @database VARCHAR(50) = Db_name(Db_id())
  12. DECLARE @server VARCHAR(20) = @@SERVERNAME
  13. DECLARE cur_tabelle CURSOR fast_forward FOR
  14.   SELECT o.name                                    AS Tabelle,
  15.          Cast(Isnull(prv_left.value, 0) AS INT)    AS PartitionValue,
  16.          Cast(Isnull(c.name, '') AS NVARCHAR(100)) AS PartitionColumn
  17.   FROM   sys.objects AS o
  18.          LEFT JOIN sys.partitions AS p
  19.                 ON o.object_id = p.object_id
  20.          LEFT JOIN sys.indexes AS i
  21.                 ON i.object_id = p.object_id
  22.                    AND i.index_id = p.index_id
  23.          LEFT JOIN sys.data_spaces AS ds
  24.                 ON ds.data_space_id = i.data_space_id
  25.          LEFT JOIN sys.partition_schemes AS ps
  26.                 ON ps.data_space_id = ds.data_space_id
  27.          LEFT JOIN sys.partition_functions AS pf
  28.                 ON pf.function_id = ps.function_id
  29.          LEFT JOIN sys.destination_data_spaces AS dds2
  30.                 ON dds2.partition_scheme_id = ps.data_space_id
  31.                    AND dds2.destination_id = p.partition_number
  32.          LEFT JOIN sys.filegroups AS fg
  33.                 ON fg.data_space_id = dds2.data_space_id
  34.          LEFT OUTER JOIN sys.partition_range_values AS prv_left
  35.                       ON ps.function_id = prv_left.function_id
  36.                          AND prv_left.boundary_id = p.partition_number - 1
  37.          LEFT OUTER JOIN sys.index_columns ic
  38.                       ON --ic.partition_ordinal
  39.          ic.index_id = i.index_id
  40.          AND ic.object_id = o.object_id
  41.          LEFT JOIN sys.columns c
  42.                 ON c.object_id = ic.object_id
  43.                    AND c.column_id = ic.column_id
  44.   WHERE  o.type = 'U'
  45.          AND p.rows != 0
  46.  
  47. OPEN cur_tabelle
  48.  
  49. FETCH next FROM cur_tabelle INTO @table, @PartitionValue, @PartitionColumn
  50.  
  51. WHILE @@fetch_status = 0
  52.   BEGIN
  53.       IF @PartitionValue = 0
  54.         BEGIN
  55.             SET @string =
  56. '"C:\Program Files\Microsoft SQL Server Parallel Data Warehouse\100\dwloader.exe" -U '
  57. + @login + ' -P ' + @password + ' -T ' + @database
  58. + '.dbo.' + @table + ' -i "' + @path + @table
  59. + '.csv" -R "' + @path + @table
  60. + '.reject.txt" -E -M append -fh 0 -e ASCII -rt value -rv 0 -t "'
  61. + @delimiter + '" -r 0x0d0x0a'
  62. END
  63. ELSE
  64.   BEGIN
  65.       SET @string =
  66. '"C:\Program Files\Microsoft SQL Server Parallel Data Warehouse\100\dwloader.exe" -U '
  67. + @login + ' -P ' + @password + ' -T ' + @database
  68. + '.dbo.' + @table + ' -i "' + @path + @table + '_'
  69. + Cast (@PartitionValue AS NVARCHAR(100))
  70. + '.csv" -R "' + @path + @table
  71. + '.reject.txt" -E -M fastappend -fh 0 -e ASCII -rt value -rv 0 -t "'
  72. + @delimiter + '" -r 0x0d0x0a'
  73. END
  74.  
  75.     PRINT @string
  76.  
  77.     FETCH next FROM cur_tabelle INTO @table, @PartitionValue, @PartitionColumn
  78. END
  79.  
  80. CLOSE cur_tabelle
  81.  
  82. DEALLOCATE cur_tabelle 

As already described after the export, the now created statements can be put into a batch file and let the DWLoader to the rest.

Teilen auf

Newsletter Anmeldung

Abonnieren Sie unseren Newsletter!
Lassen Sie sich regelmäßig über alle Neuigkeiten rundum ORAYLIS und die BI- & Big-Data-Branche informieren.

Jetzt anmelden