BLOG

SSIS Pattern: Naming Convention Part I

16.03.2012 Daniel Esser

A naming convention is a convention for naming things. The intent is to allow useful information to be deduced from the names based on regularities. For instance, in Manhattan, streets are named in two different fashions. East-West streets are called „Streets“ and North-South streets called „Avenues“.

First we have to identify which entities have to be named in SSIS:

  • Projects
  • Packages (filename and identification name)
  • Configurations
  • Loggers
  • Connections
  • Variables
  • Control Flow and Data Flow items

Secondly it is not important which naming convention you use. It is important that you use a naming convention. There is a good article about naming conventions on Wikipedia
you could use for inspiration. The CamelCase naming convention is often used in programming, so I decided to use it for the following examples.

Thirdly we need a clear and intuitive naming definition on how to describe the listed entities above. A naming definition of an entity assumes that we know how to describe important properties of the entity. So what are the important properties of the listed entities?

The Project Name

The name of the SSIS-Project gives a developer an hint which technical task or problem is solved inside. This name is the key entry point to a technical solution, a name for a jigsaw piece. Is is not best practise to bunch all technical things together. Sketch out the system boundaries and name them. Imagine it is a piece of a jigsaw. What picture (the name) have to be on the piece so you can assign it to the right place in the whole.

The Package Name

The package name is related to the given project name. A package is a small piece of the entire project. Without it the entire project can’t exist.

A package can do many things. It could be a control package, which starts other packages. It could be a package which start dimension or cube processing or it loads data from external or operative data sources into a local database for further processing. We can see that there are at least three different things a package can do: Control, Load and execute functional things like sending emails or starting dimension processing. So it suggests itself to use this information for the package naming definition. For example you could use the following prefixes in the package name:

  • Ctrl for packages which controls the execution of other packages
  • Load for packages which loads data
  • Func for packages which do some functional things like sending mails etc. pp.

Based on this we can go further. For Ctrl and Func packages it is obvious how to name them further. The simple questions are: What is the package to be controlled? What functional thing the package will do? Some examples:

  • CtrlMaster – A master control package for the whole project
  • CtrlCubeProcessing – A package which calls other packages to process dimensions and cubes in a non default manner
  • FuncSendSuccMail – Send a success mail to some people after processing cubes
  • FuncValidateMailAdresses – A package which validates a table with mail addresses. It marks which are correct and which not.

A load package is somewhat specific. Mostly there is at least one source and one target. So load packages can operate on different databases, tables, schemas etc. pp. The package name should reflect that fact. Here at ORAYLIS we have a process model on how to load data for Datawarehousing. We arrange the whole loading process into three layers: Extraction-Layer, Working-Layer and Frontroom. The Extraction-Layer hold the raw data from external and operative data sources. You could think of it as a one-to-one copy in a relational database or the interface to the outside world. In the Working-Layer we put data together (joining tables, calculating new columns, renaming, data cleansing etc.). The Frontroom-Layer is the actual Datawarehouse. So we have three important layers for loading a Datawarehouse. This could lead us to the following naming definition for Load-Packages.

  • LoadExtract – A package which loads data from an operative or external data source
  • LoadWork – A package which modifies the loaded data from the Extraction-Layer
  • LoadFront – A package which fills the Datawarehouse with data from the Work or Extraction-Layer

It is possible to have more than one Extraction-, Work- or Frontroom-Layer. Simply add the name of the layer (or other important information) at the end of the package name. For example LoadFrontSalesDimCustomer if the package fills the Customer Dimension for the Sales Datawarehouse which is stored in the Frontroom-Layer. The final naming definition for Load-Packages could be

„Load“ {Logical Layer} {Database Name or abbreviation} {Tablename/Dimension/Fact}

Connections

Based on the naming conventions above for packages it is best practice to name a connection based on a logical name rather a database name or a server name. A database name can be volatile or meaningless like EGH275. So calling a connection DWH123.EGH275 is either wrong after the Go-live because the server name changed or could be meaningless like EGH275. A logical server name could be DWH and a logical database could be Frontroom which leads to DWHFrontoom or DWHWork.

{Logical Server Name} {Layer}

Variables

Variables in SSIS can be described with a name, a name space and a context. The context is the world were the variable exists in, typically an SSIS container. So variables are context specific or global. The default name space for a variable is called User.

I use the name space to describe the main purpose for using the variable. Is it a variable to override connection strings? Is it a counter in an For-Each-Loop-Container? I found at least three name spaces:

  • User – holds mostly context specific variables
  • Config – holds variables which overloads package or task properties and is more global
  • Result – holds variables with information for further processing inside or outside the package. These variables are mostly global.

The variable name is highly specific. So the only advice I can give is to use lowerCamelCase.

Control and Data Flow Items

The name of a task or container is comparable with the name of a function or procedure. For example the function name Math.Add imply that it can add two numbers. A Task name should do also.

I often use the WhoN’How-Where-What-Pattern for Data Flow Items. For example a data source (Who?) which loads accounting periods (What?) from the Work-Layer (Where?) in the ABCD-Table (Where?) is called:

SSIS Naming Convention

Using CamelCase here would lead us to SrcWorkAbcdAccountingPeriod but this is too illegible. See here another example. A Lookup-Component. Proof it … what is done here?

SSIS Naming Convention

This name gives us the following information: A full cache lookup onto the ShippingAddress table in the Extract-Layer with the IDShipAddr key column which returns the StrName column named as Street. It is very long eeh? Of course it is, but having a meaningful name is better than having a non-meaningful name. Every time I see packages without naming conventions the first thing I do is to rename the items inside to get the big picture.

Conclusion

In this article I covered naming conventions for control and data flow items as well as for packages, connections and variables. Feel free to make comments and suggestions on how to name things and please keep in mind: It is not important which naming convention you use. It is important that you use a naming convention.

Your email address will not be published. Required fields are marked *

Join #teamoraylispeople

Gestalte mit uns
die Welt der Daten