Agile BI Tools - Implementing Feature Toggles in SSIS / SSDT
Reminding the values of the Manifesto for Agile Software Development: Individuals and interactions over processes and tools, working software over comprehensive documentation, customer collaboration over contract negotiation and responding to change over following a plan.
Focussing on working software has lead us to a process commonly known as continuous integration. A working CI process will give us a continuous feedback about the code quality in our repository. This requires you to integrate your changes as early as possible. Feature branching, for example, is an anti-pattern for the whole CI process because it leads to isolated code areas (isolated from other features or software components) and therefore introduces a by-pass to this process. By the way, feature branching is not evil by nature, but it will slow down the whole CI process so you have to weigh all the pros and cons.
A Feature Toggle is a technique that attempts to provide an alternative to maintaining multiple source code branches. This posting addresses the question how to implement a feature toggle in SQL Server Integration Services (SSIS).
In SSIS we can have a feature on different scopes, so we have to define the scope for a feature toggle. The term "feature" is a bit squishy. My research discovers different scopes for an SSIS feature.
- Project - In SQL Server 2012 SSIS deployment uses a project centric approach. Projects will have one or more entry packages. These entry packages can be compared to the main() function of an executable. How these entry packages are get called is up to a scheduler or a worflow management tool and therefore out of scope of this article.
- Packages - Non-Entry-Packages commonly dealing with transporting data from A to B, data conversion or aggregating tasks.
- Control Flow - Every package has a control flow. This control flow consist of actions. These actions can be executed in sequential or parallel. Conditional construct like if-then-else (in a visual fashion) are possible.
- Data Flow - Every packages consist of zero to n data flows. A data flow is a specific action within a control flow and can transport data from multiple source to multiple destinations. A key task of an data flow is to extract, transform and load data. So a data flow can consists of multiple streams which get wider, narrower, split or brought together. For now I could not imagine a situation where a feature toggle would be useful in this context. If the new feature consists of a modified data flow then we have to copy the whole data flow on the control flow which leads us to a toggle on the control flow.
Feature Toggle on Packages and Control Flow Scope
Implementing feature toggles in a control flow is some kind of obvious because its similarity to other programming languages. The analogy of an if-then-else construct in SSIS are called Precedence Constraints (the links between the boxes).
Every precedence constraint can have an expression. This expression get evaluated during the execution of the package. You can use an project or package parameter to implement a feature toggle. Lets imagine we have created a project parameter called EnableBusinessLogic of type boolean.
Toggle by constraints
We now can refer to this parameter in the expression on the precedence constraint for the new feature:
- Evaluation operation - Expression and constraint
- Value - Success
- Expression - @[FeatureToggle::NewBusinessLogicFeature] = true
Also we can refer to this parameter in the expression on the precedence constraint for the outdated feature:
- Evaluation operation - Expression and constraint
- Value - Success
- Expression - @[$Package::EnableNewBusinessLogic] == false
Toggle by "Disable Execution" property
You can achive a simalar behaviour by using the disable execution property. Set an expression on the component like this:
- Disable - !@[$Package::EnableNewBusinessLogic]
How to implement two (or more) features in the same Data Flow
Imagine there is a old implementaton for loading customer data into a database. There is now a request to load some new fields from the CRM. Additionally at the end of the year the whole company will switch the CRM to a new provider. At the same time a bug in the old code has to be fixed. In the next release only the bug fix should be deployed because the other features needs more testing.
You now have two basic approaches to implement this features. The first one uses three branches, one for the extension of the old code, one for the migration to a completly different CRM solution, and one for the bug fix. The problem here is that the bug fix version and the version with the new fields are separated from each other So there is no integration testing for the two features. The second problem is that when for some reason the bug fix and the version with the new fields got be released together a merge of two branches is needed, which is not practicable for SSIS packages.
The second approach uses feature toggles without branching. Implementing toggles by using the disable execution property let us control which code got executed at runtime: When this features needs to be implemented in parallel by different developers separate the data flows to different packages by using the execute package task. All developers have to ensure that any changes they made are distributed to the other features as well (if needed). This task is the counterpart of merging branches, but ...
- You don't have to resolve conflicts because the different versions lay in the same solution file and branch
- You can easily switch which feature version is executed at runtime (by configuration)
How to control the different feature versions for a release?
Visual Studio can handle this by different solution and project configurations. Like a C# project, a SSIS project can have a Debug and a Release configuration with a slightly different behaviour. First of all create two solution configurations. You can achive this with the configuration manager in the toolbar: Development should hold the configuration for the current developer (you can also create differen development configurations). Release should reflect the configuration as it is needed for a procutive environment. Now we create three different project release configurations reflecting the different feature toggles: In the configuration manager we now can map the solution configuration to the project configuration. In this example we mark the "CustomerNewDataSource" feature as the current productive one: In the last step we need to add the default configuration values for the specific project configuration. You can do this in the package / project parameter view: Building the current SSIS project with the Release solution configuration (and therefore the Release_CustomerNewDataSource project configuration) activates this specific feature and disables the other ones.
Metadata on Data Flows Components
During development data flow components gain informations about tables, columns and data types of the used sources and destinations. This informations, called metadata, will be saved for runtime of the SSIS package. Before a package gets started this metadata will be checked. If for example a data source reads from a table named CustomerOrders and this table does not exist or has a different signature (different columns or data types) the validation of the package will fail. This will lead us to a problem.
Imagine you have to implement a new feature and there is at least on data flow affected by this new feature; the recommended way to implement this feature together with a feature toggle is to copy the old data flow and set an appropriate expression on the precedence constraint of the old and the new code. But what to do if the table signature is different? The package validation fails because the metadata will be checked for both the old and the new data flow.
A simple way to come around with this limitation is to set the DelayValidation property of the data flow to true. Now validation of the data flow metadata will be delayed to the time the data flow gets executed. Because the feature toggle prevents the execution of the inactive feature only the data flow of the active feature got validated.
How to manage which feature will be active?
First you need a release plan. This plan should respect budget, capacity, technical dependencies and dead lines. Now you can group functionalities to features with a fix delivery date. Your build should reflect this release plan. For example if the release with version 1.5.0 should contain the features A, B and C you need a post build test mechanism to validate that the corresponding feature toggles are set (or not set if the features are not part of this release). You can accomplish this with a simple PowerShell script or an XSL transformation on the SSIS packages. You could also push this test forward to the step of integration and directly test (a simple database test) the package or project parameters in your integration environment.
Implementing a feature toggle in SSIS is possible. We delay (not lose) the validation of the metadata on concurrent features (if a data flow is part of this feature). By doing so we can simply work in one single branch, and can avoid merges between branches, we can also reduce the problem we had with hot fixes in other branches that wasn't merged. We can also continuous deliver completed feature into production that is not yet enabled, but can be enabled by the product owner when he feels it's time to enable it. If a bug of a new deployed feature is found and a critical fix need to done, it can be turned off in production.