07.09.2016

Parallel execution of SSIS Packages

Technical Value

In BI projects, sooner or later you will face the challenge of loading and processing vast amounts of data via SSIS packages, a task that would usually take days or even weeks to complete. The first idea popping into mind will most certainly be to split the data to be processed into more or less equal heaps and to execute the SSIS processing package in parallel. At least this was my first idea when I encountered the task of loading several million files with a total of nearly 35 billion records into a database. In the following I will describe various approaches to parallel SSIS execution and the solution I finally found for the task at hand.

The server I worked on had a total of 20 cores available solely for this task. In my first design, I therefore started by preparing a processing package that receives the destination table name and the file name via parameters. To prevent deadlocks, I created 20 identical destination tables, which would then be populated by separate instances of the processing package. Next, sticking to standard SSIS components, I implemented a master package featuring 20 variables to store the file lists to be processed. I then fetched a list of all files that had to be processed, split it into roughly equal heaps (considering the number of files as well as the size of each file) and populated my variables accordingly. Finally, I executed the processing package in 20 parallel Execute Package Tasks. The result looked something like the following:

[Figure: RAS_Parallel_SSIS_Execution_flow_1 (master package with 20 parallel Execute Package Tasks)]

This solution worked, but the performance wasn’t even close to what I expected. Checking the Performance and Resource Monitor, I noticed that CPU utilization rarely hit the 50% mark and that most of the available CPU resources were idle. The main reason was most certainly the overhead of starting a package execution, processing a single file within the package, finishing the execution and starting the next one.

This led to the second design: I created a table in the database that stored the file list and its distribution into equal heaps. I also modified the processing package’s parameters so that it no longer receives a file name but a list ID referencing the newly created table. In addition, the processing package was changed to read all corresponding file names from the table and load the files accordingly. The design of the master package changed as follows:

[Figure: RAS_Parallel_SSIS_Execution_flow_2 (master package reading file lists by list ID)]
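To illustrate how the processing package could resolve its list ID into file names, here is a minimal sketch. The table dbo.FileList with the columns ListID and FileName is an assumption for the sake of the example, not the actual schema used in the project:

<code>
// Minimal sketch (assumed schema): read all file names belonging to one list ID.
// dbo.FileList(ListID, FileName) is a placeholder for the real file list table.
using System.Collections.Generic;
using System.Data.SqlClient;

public List<string> GetFilesForList(string connectionString, int listId)
{
    List<string> files = new List<string>();

    using (SqlConnection con = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(
        "SELECT FileName FROM dbo.FileList WHERE ListID = @ListID", con))
    {
        cmd.Parameters.AddWithValue("@ListID", listId);
        con.Open();

        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                files.Add(reader.GetString(0));
            }
        }
    }

    return files;
}
</code>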

With this done, the overhead of creating one package execution per file was reduced to one execution per list of files. The performance improved slightly, but it was still far from acceptable.

Leaving the path of standard SSIS components, I decided to modify my master package to create the executions via a C# Script Task. Luckily, starting with .NET 4.0, C# natively supports parallelism with Parallel.For() and Parallel.ForEach() loops. Basically, this does what I had already tried to implement with standard SSIS components: create numerous parallel instances of the same package. Reusing the file list table and the processing package from the previous version, the master package now looked something like the following:

[Figure: RAS_Parallel_SSIS_Execution_flow_3 (master package creating executions via a C# Script Task)]
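To give an idea of what the Script Task did at this stage, here is a minimal sketch of the Parallel.ForEach() variant. It assumes 20 list IDs and a CallPackage(int listId) helper that starts one instance of the processing package and waits for it to finish (an example of such a helper is shown in the comments below):

<code>
// Minimal sketch of the Parallel.ForEach() variant (assumes a CallPackage helper).
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public void RunListsInParallel()
{
    // One entry per pre-built file list in the database (here: lists 1..20).
    List<int> listIds = Enumerable.Range(1, 20).ToList();

    // Start one processing package instance per list, up to 20 at a time.
    Parallel.ForEach(listIds,
        new ParallelOptions { MaxDegreeOfParallelism = 20 },
        listId => CallPackage(listId));
}
</code>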

This gave the performance a really good boost: CPU utilization reached roughly 95%. But are we done? Unfortunately not yet: as it turned out, the 95% CPU utilization could not be sustained. After a certain time, utilization dropped back to 50% or less and would not rise again. The reason was apparently connected to the Parallel.ForEach() loop, which had problems keeping parallel executions of SSIS packages going, even with a forced delay of half a second between each start. This also showed in Task Manager: the number of running SSIS instances went down to 10 or even fewer.

This led to my final design: I slightly modified the C# Script Task from the previous version. Instead of the Parallel.ForEach() loop, I used a C# task list to handle the parallel execution of the processing package. Below is the essential section of the C# script component:

[Figure: RAS_Parallel_SSIS_Execution_script_task_component (essential section of the C# script component)]
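As the screenshot may be hard to read, here is a minimal sketch of the task list approach, again assuming 20 lists and the CallPackage(int ThreadID) helper shown in the comments below:

<code>
// Minimal sketch of the task list variant (assumes the CallPackage helper below).
using System.Collections.Generic;
using System.Threading.Tasks;

public void RunListsWithTaskList()
{
    List<Task> tasks = new List<Task>();

    for (int i = 1; i <= 20; i++)
    {
        int listId = i; // copy the loop variable so each task gets its own list ID
        tasks.Add(Task.Factory.StartNew(() => CallPackage(listId)));
    }

    // Block the master package until all processing package instances have finished.
    Task.WaitAll(tasks.ToArray());
}
</code>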

This change resulted in a steady count of 21 running SSIS instances (1 instance of the master package plus 20 instances of the processing package), a constant CPU utilization of 98% and, finally, a performance that allowed loading the amount of data mentioned above.

Comments

Waldemar Sarmiento
Wed, 12.04.2017 - 23:23

Hi Rene,
Thanks for the great article. I am trying to do exactly the same but I am having problems when invoking the package. Could you please share the code of the function named CallPackage?
Thank you,
Waldemar

René Schwarz
Wed, 19.04.2017 - 19:49

Hello Waldemar,

What problems are you facing when invoking the child package? Do you receive errors, or is the package not starting at all?

Anyway, here is an example implementation of CallPackage. I reduced it to the most essential parts - you might want to add exception handling etc. (package not found, parameter not set, DB connection not possible and so on). Furthermore, CallPackage requires several parameters in the package / project. For testing, you might want to replace the Dts.Variables at the beginning with hardcoded values:

<code>
// Required namespaces (in addition to the Script Task defaults):
// using System.Data.SqlClient;
// using System.Threading;
// using Microsoft.SqlServer.Management.IntegrationServices;

public void CallPackage(int ThreadID)
{
    string VAR_DBServer = Dts.Variables["$Project::Conn_DBServer"].Value.ToString();
    string VAR_SSISDB_Folder = Dts.Variables["$Project::SSISDB_Folder"].Value.ToString();
    string VAR_SSISDB_Project = Dts.Variables["$Project::SSISDB_Project"].Value.ToString();
    string VAR_SSISDB_Environment = Dts.Variables["$Project::SSISDB_Environment"].Value.ToString();
    string VAR_ChildPackageToRun = Dts.Variables["$Package::ChildPackageToRun"].Value.ToString();
    // Note: not used in this reduced example
    int VAR_ChildPackageLoggingLevel = Convert.ToInt16(Dts.Variables["$Package::ChildPackageLoggingLevel"].Value.ToString());

    // Connection to the server hosting the SSISDB
    SqlConnection SqlCon = new SqlConnection(VAR_DBServer);

    IntegrationServices integrationServices = new IntegrationServices(SqlCon);

    // Child package
    PackageInfo childpckg = integrationServices.Catalogs["SSISDB"].Folders[VAR_SSISDB_Folder].Projects[VAR_SSISDB_Project].Packages[VAR_ChildPackageToRun];

    // Environment reference
    EnvironmentReference EnvirRef = integrationServices.Catalogs["SSISDB"].Folders[VAR_SSISDB_Folder].Projects[VAR_SSISDB_Project].References[VAR_SSISDB_Environment, "."];

    // Parameter set for the package execution
    System.Collections.ObjectModel.Collection<PackageInfo.ExecutionValueParameterSet> executionValueParameterSet = new System.Collections.ObjectModel.Collection<PackageInfo.ExecutionValueParameterSet>();

    // Parameter mapping (depends on the parameters of the child package; ObjectType 30 = package parameter)
    executionValueParameterSet.Add(new PackageInfo.ExecutionValueParameterSet { ParameterName = "ThreadID", ParameterValue = ThreadID, ObjectType = 30 });

    // Execute the child package
    long executionIdentifier = childpckg.Execute(false, EnvirRef, executionValueParameterSet);

    // Check the status of the child package execution
    ExecutionOperation executionOperation = integrationServices.Catalogs["SSISDB"].Executions[executionIdentifier];

    // Wait until the child package execution is completed
    while (!executionOperation.Completed)
    {
        executionOperation.Refresh();
        Thread.Sleep(500);
    }

    if (executionOperation.Status == Operation.ServerOperationStatus.Failed)
    {
        Dts.TaskResult = (int)ScriptResults.Failure;
        Dts.Events.FireError(0, "Run child pkg for parent task", VAR_ChildPackageToRun + " failed.", string.Empty, 0);
    }
}
</code>

Best regards

Rene

Waldemar Sarmiento
Thu, 18.05.2017 - 05:58

Hi Rene,
Thank you very much. I did not see your reply until now and still have not found a good solution so I will try it that way tomorrow. I will also let you know the different approaches that I had unsuccessfully tried.
Thanks again,
Waldemar
