Agile BI Tools – Database versioning with SSDT #2
In my previuous posting (Agile BI Tools – Database versioning with SSDT #1 ) I showed how we can reuse the already existing version number from the SSDT backend DAC. Today I will explain how to write a migration script based on database version numbering.
First of all I create two deployment script, one for post- and one for pre-deployment. As you can see the Build Action is set to PostDeploy repectivly PreDeploy. This means SSDT's build mechanism will handle this files in a special way. In very simple terms the build prepends the Pre-Deploy-Script and appends the Post-Deploy-Script to the autogenerated deployment script at the end of the build.
What is Change Data Capture?
Referring to the Microsoft documentation Change Data Capture (CDC) is described as following: Change data capture records insert, update, and delete activity that is applied to a SQL Server table. This makes the details of the changes available in an easily consumed relational format. Column information and the metadata that is required to apply the changes to a target environment is captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables. Table-valued functions are provided to allow systematic access to the change data by consumers.
A good example of a data consumer that is targeted by this technology is an extraction, transformation, and loading (ETL) application. An ETL application incrementally loads change data from SQL Server source tables to a data warehouse or data mart. Although the representation of the source tables within the data warehouse must reflect changes in the source tables, an end-to-end technology that refreshes a replica of the source is not appropriate. Instead, you need a reliable stream of change data that is structured so that consumers can apply it to dissimilar target representations of the data. SQL Server change data capture provides this technology.
To know how to deal with CDC we need to know a little bit more about how CDC works: The picture bellow shows the CDC data flow. As you can see there is a Capture Process. This process captures any changes made on the underlying tables. These changes will be written to so called change tables. If a process incrementally loads change data from SQL Server source tables to a data warehouse or data mart it can use the CDC query functions to get only the new data since last run. But Change Data Capture comes with some limitations. Lets focus on the limitations which prevents a smooth database migrations with SSDTs built-in functionality. Change the table layout of an CDC enabled table does not work out of the box. If you try you will end up with a message like this:
- Update cannot proceed due to validation errors.
- Please correct the following errors and try again.
- SQL72035 :: [dbo].[MyCDCEnabledTable] is under change
- data capture control and cannot be modified
The reason for this behavior is that the corresponding change table layout is based on the layout of the CDC enabled table. The capture process therefore needs a fixed table layout. To come around this limitation you have to disable CDC for this table. Doing so will delete the corresponding change table with its content which means you will loose the current change state of the table.
How to deal with Change Data Capture during database migration?
At general to deal with CDC you will need to follow this recipe:
- Copy the change table with a SELECT INTO statement to save the current CDC state.
- Disable CDC on the CDC enabled table with the sys.sp_cdc_disable_table stored procedure.
- Deploy the Database with SSDT - This step for example will add, delete columns or change data types. Data must not be deleted in this step!
- Enable CDC with the sys.sp_cdc_enable_table stored procedure. An empty change table will be generated in this step.
- Copy the table backup to the empty change table with a INSERT INTO statement.
- Update the the column __$start_lsn in the cdc.change_tables table to the original value.
- Drop the backup table.
OK lets get a little more into detail: Step 1.1: Backup Change Table Step 1.2: Disable CDC Step 2.0: The original SSDT generated alter scripts. Step 3.1: Enable CDC Step 3.2: Restore Change Table Step 3.3: Restore the start lsn Step 3.4: Drop backup table
As you will certainly have noticed there is still a limitation in the above recipe. In step 3.2 we restore the data with a INSERT INTO ... SELECT * statement back to the change table. If you remove columns you will get an error because a column is missing which exist in the source table. Changing data types will lead you to explicitly define type casts. If you add columns with NOT NULL constrains you will end up with the problem to define default values. For the moment you have to modify step 3.2 for your needs.
Wrap Up Database Versioning
In the pre-deployment script we add some code to handle the target and the current database version: We now can reference the target database version in the pre- and post-deployment scripts with the TSQL variable @TargetDacVersion and the current database version with the SQLCMD variabele $(DacVersion). In the BEGIN ... END section we now can reuse the above CDC recipe.
I showed that you can use SSDT to handle some kind of special database migration, for example CDC. It would be nice to have a more "intelligent" mechanism to compare the database versions but then you would have to deal with parsing the version number or put the major, minor and hot fix portion in to separate variables. Some of the functionality could be hidden in stored procedures to leaf the deployment script in a clean state.
As I mentiond in my previuos post it seems to be that Microsofts preferred way maintaining migration scripts is to implement a build or deployment contributor (see here for reference http://msdn.microsoft.com/en-us/library/dn268597(v=vs.103).aspx). I will try cover this in my next post.