Importing data from SAP using the SAP BW Delta Queue and Theobald Software Xtract IS
Importing data from SAP is a common task in many BI projects. Since Integration Services (SSIS) doesn’t have out of the box support for dealing with SAP as a data source, third party components should be considered to simplify this task. The components we’re using and which I am writing about here are supplied from Theobald Software and are called Xtract IS. A fully functional trial version is available and the examples in this blog post can be reproduced using this trial version. Xtract IS contains many different options to access a SAP system, for example
- read table (through rfc_read_table or by using a custom function included in Xtract IS that also supports compression) or sap queries
- call remote functions (RFCs) and BAPIs
- Retrieving data from legacy reports
- Connection to BW via OHS
- Connection to SAP extractors (aka “delta queue” or “generic data source”)
- much more…
For this short blog series I’d like to concentrate on the extractor interface which is optimal for reading large tables from SAP. Especially the support for the delta detection is important when loading tables or structures with maybe many billion rows of data. In order to avoid misunderstandings, Theobald’s delta queue component does not require SAP BW to be in place, but it basically acts like BW and uses the same interface to push the data to SSIS in a very efficient way.
Since this post got a little longer, let’s start with an agenda:
- How do I get access to a SAP system?
- Required customizations in SAP to use BW extractors (Delta Queue)
- How to use existing (pre-defined) extractors in Integration Services
- Create and use a simple custom extractor based on SAP query
- Create and use a delta-enabled custom extractor based on SAP query
How do I get access to a SAP system?
Before we start, of course we need some access to an existing SAP system. If you have a development system in house the administrators might give you the required access. If not, there are some other options to get started and to create a test environment for connecting SAP to Integration Services.
Most of the examples I’m using here in this blog post are working fine with the SAP ABAP NetWeaver trial version (so called “mini-SAP”). This edition runs for 90 days and can easily been extended for whatever time is needed. Since the download link changes from time to time it’s best to search for the download using your favorite search engine or to go to the SAP Community website (registration needed) and search for “SAP NetWeaver Trial Version ABAP”. There is an excellent blog series by Manfred Lutz about the ABAP trial version covering the installation process as well as many aspects of configuration. In addition, if you don’t want to install the trial version, I can also recommend having a look at the Consolut website as they’re also offering free of charge access to a mini sap. Until recently they also offered free of charge access to a full SAP IDES system which is somewhat like the AdventureWorks for Microsoft SQL Server (however IDES covers a lot more of real life business processes). Unfortunately they cannot offer this service any longer so you might want to take a look at the SAP offerings: https://training.sap.com/shop/live-access .
Required customizations in SAP to use BW extractors (Delta Queue)
Theobald Software has very detailed instructions on its web site about how to use and configure the SAP system. You can find the detailed step-by-step instructions here . For this post, I exactly followed those steps naming all of the created objects exactly as shown in the documentation.
How to use existing (pre-defined) extractors in Integration Services
In SSIS we first need to create a connection to the SAP system. With Theobald Software’s Xtract IS installed, you can add a new connection of type “XTRACT”:
SAP has some specific settings that need to be entered for making a connection (for example the client or the system number). The following dialog box is used to enter these settings which are usually provided by your administrator.
In order to use the source component in a data flow, we first have to create a data flow (obviously) and then drag the Xtract DeltaQ component to this data flow:
When opening the component’s settings you first start filling out the right four boxes. The search buttons help you to find the correct information.
If you’re using the Xtract IS components for the first time or if you are not sure if the customization of the SAP system was correct, it is a good idea to use the “Customization Check” link (marked with the red arrow in the screenshot). This function checks the security settings and customizations of the SAP system and finally gets back with a checklist:
If any of the tests fails the message usually gives administrators enough information to fix this. Next task to do is to find the extractor to deliver data to the data flow. If you’ve installed the SAP trial system, not much content is available. But as mentioned above, you could also connect to the Consolut IDES demo system containing a lot of pre-defined extractors. Now, how do you find the extractor for your data? The easiest thing is to search the web for “sap extractor for <<your topic>>”, for example “sap extractor for profit centers”. You could also go to http://help.sap.com directly or check the content available on the Consolut website .
For this demonstration, I’m using one of the profit center transaction extractors. The search function makes it easy to find the extractor by name or description. Please note that if you are using the Mini SAP installation, this extractor is not available so you need to go for one of the existing extractors there.
I’m using the first of the extractors in the list (0EC_PCA_1). After hitting ok, we’re back in the main configuration dialog now showing all the fields provided by the extractor:
If filters are defined for the columns in the data source/extractor you will notice an “Edit” link right beside the checkbox for including the column in the output. Filters can be defined following the typical way SAP handles filters (include/exclude, cascaded) and package variables may be used for filter values.
Some important settings can be found in the “Settings” dialog (link right below the dialog window header). Pay special attention to the “automatic data type conversion” setting:
With automatic type conversion enabled, Xtract IS derives the type correctly for many different columns. You can see this effect on the data flow (meta data):
|without automatic type conversion||with automatic type conversion|
Detailed explanation of all the settings for this dialog can be found on the Theobald Software website, for example here . The setting dialog also allows to name a package variable for passing the delta mode as shown below:
Another great feature is the ability to generate a documentation for all fields that are used in the extractor (button “generate documentation”). The output is in Excel format and looks like this:
Create and use a simple custom extractor based on SAP query
If the data you want to load is not covered by an extractor, you can still use the extractor functions by creating a custom extractor. You can do so by using a SAP View, a Query or by writing custom ABAP code. For this example, I’m using SAP Query to create a query based on the simple flight model data that is included in SAP systems (also in the SAP trial version) for educational purposes.
First we need to create an info set for the query using transaction code SQ02:
After clicking “Create” further information has to be provided:
We reference one of the tables of the flight model (sbook, the table for flight bookings) here. The other tables to join are defined in the next step. If you are unsure which tables are needed, a good starting point is to view the table definition using transaction se11 (ABAP dictionary). On the “Entry help/check” tab you see which tables are used to validate entry values. This is shown in the following dialog:
Also double clicking on the entries in the domain or search help column shows more information on the given structure. For this example, we will at least need the following additional tables:
- SCARR (airline)
- SFLIGHT (flight details)
- SPFLI (flight schedule)
From the info set editor you have to click the icon marked in red below in order to add more tables:
For tables that are required more than once (in this case the airport which is needed as deperature and arrival airport) you may create aliases before adding the table to the diagram:
SAP Query tries to automatically detect the links between the tables. However, in my case it didn’t work well, so you have to manually delete and recreate some of the links:
After adjusting the table links, we click on the “-> InfoSet” button to continue with the field selection. Here, the available fields of the query are defined. As a default we choose to start with the key fields here:
Next you can choose the fields you like by placing them in the right pane of the InfoSet editor (via drag and drop). Field groups are used mainly to structure fields a little bit:
Clicking on check validates the structures. In my case I had a wrong link created in the diagram:
So here is how the full diagram looks like after correcting it:
Next thing to do is to activate the InfoSet:
We can now create a new extractor (or “generic data source”) by using transaction code rso2. For my example I used “ZTESTHB” as the name for the new extractor. The name will be needed later on to refer to the extractor:
The entry box for the InfoSet becomes available once you hit the “Extraction by Query” button. If our InfoSet is not in the list it might be because of the visibility. In this case, go back to transaction SQ02 and click the “Role/User Group Assignment” button for the InfoSet.
You can now assign the InfoSet to all required user groups. Back in rso2 you may also need an application component which can be created using transaction code rsa6:
You may now save the extractor (aka generic data source). Depending on the fields you’ve chosen, you might get a warning about text fields:
Especially with large tables (the main reason why we’re using the extractor), it’s a good advise to transfer only the keys, not the text fields. Text fields and other descriptive information could be loaded separately (for example using the xtract table component). To remove textfields from the extractor, you can use this menu entry:
When saving the extractor you need to provide a package (or local object) and a workbench request (which you can also create from this dialog).
We’re now ready to use the new extractor from within SSIS. As shown above we need an XTRACT connection the SAP system and an Xtract IS DeltaQ component in the data flow. We can then configure this component:
The data source/extractor should be easy to find using the search function:
I simply select all fields here and activate the source:
A good way to check if everything works fine is the “Preview” functionality:
From within SAP you can use the gateway monitor (transaction code smgw) to see all active connections:
In the data flow the Xtract DeltaQ component has two outgoing connections. One is for the data retrieved, the other is for the log. Writing the log data to a table could also be helpful for tracking errors:
Create and use a delta-enabled custom extractor based on SAP query
So far, the data source/extractor generated does not support any delta mechanism. In order to change this we open the InfoSet again (using transaction code sq02) and add the booking data to the output (if it is not already there).
After re-generating the InfoSet we can edit the data source/extractor again (using transaction code rso2). Clicking on “Generic Delta” opens the following dialog to define the delta mechanism:
For the field we select our order_date field that we have just added to the field list. The safety interval results in the an overlapping data delivery so that we’re sure that we don’t miss any rows. However, in this case, each delta contains some rows from the previous batch which means that we need to take care of this in our data flow.
We can now switch the update mode to “C – Delta Initialization” (and I will show how this is done using a package variable in a minute). The next run of the package fetches all rows from the table.
Now switching the update mode to “D – Delta Update” gives no new rows on the next run:
The request log shows the detailed extraction:
In order to control the update mode from a package variable we first need to create the variable and then enter the variable name in the “Update Mode Variable” of the setting dialog. Also, don’t forget to change the update mode for the Xtract IS DeltaQ component to “V – SSIS variable” for the magic to happen:
At the end of this post I’d like to add some more tips on using the delta queue. First, you may test each extractor using then Extractor Checker (transaction code rsa3).
And delta queue maintenance (transaction code rsa7) may be used to check the current value for the delta field. Also, if something fails because of missing rights, a trace (transaction code ST01) may help (look for failed authority checks) and transaction code SU53 may be used to display the recently failed authority check. And finally, the Delta Queue Monitor (transaction code odqmon) could be helpful to monitor the state of delta enabled extractors.