Self-Service BI, PowerPivot and the future of traditional BI (DWH, OLAP, MDX)
SQL Server 2008R2 | Excel 2010/PowerPivot
Since I’m using the recently released Microsoft PowerPivot add-in for Excel 2010 and reading the rumors about the future of traditional OLAP and MDX, some questions arise about the big picture of a BI environment that includes self-service functionality. Basically, the BI world of the past had accepted the idea of a central data warehouse with a metadata layer such as OLAP to present the information to end users in the best possible way. What about this new player PowerPivot then? How does it fit into the picture? Is there still a future for things like OLAP, MDX and central data warehouses, or do we only need to roll out self-service BI functionality to every desktop PC? Some people have asked me about my point of view here, and although I’m not a Microsoft representative, I’d like to share my personal opinion with you:
Self-service BI tools are not, and will never be, a replacement for traditional BI systems, but a great enhancement to them.
In other words, adding self-service BI functionality to your BI system will increase the possibilities and the analytical power of the end users. But if you cut costs for centralized BI solutions believing you can replace them with self-service tools, you’ll end up with even higher costs: the work that was done only once in the centralized BI system will suddenly be performed redundantly, inconsistently, error-prone and with much more working time needed.
In order to understand this, let’s look at just some of the advantages a centralized BI system can offer which cannot be replaced by a decentralized self-service BI tool like PowerPivot:
Combination of data from multiple sources in a consistent and time-saving way
If this is done on a user-per-user basis, it is very likely that different users will get different results. Also, this work is highly redundant. Imagine different departments going to IT to get “their” data exported and then trying to combine it into a single data store. Often enough, this job requires additional mapping tables (customers, articles etc. may have different IDs in different systems). We support customers with multiple ERP systems (due to mergers), and the mapping can be quite complicated. But even with just a single source of information, mapping between different tables has to be done and requires skills and knowledge about the data models. For example, if you forget to consider a key field, the result may differ significantly. Or think about the need to exclude rows with a certain status because it means they’re cancelled. One end user might know about this, another might not.
This means you will still need your sophisticated ETL processes, a proper front-room model, slowly changing dimensions and all the other things we know from our typical BI projects.
Consistent use of common calculations considering approved business rules
In many cases, our current ETL processes include complicated business logic for calculations and data mappings. Key measures and performance indicators have to be calculated in a consistent way. Business rules are the backbone of a company’s information system. If different departments are comparing apples and oranges, there is a lot of room for confusion and wrong decisions. Think of a simple example, like reporting the revenue. Are warranty adjustments considered? What about partial payments or commissions? What about discounts (for example staff discounts)? For each of these aspects it has to be decided whether to include it in a certain key measure, and also which relation to the time dimension is correct. Is a partial payment counted as revenue in full at the date of purchase, or are the actual payments (cash flow) considered? If every user has to make these decisions, it is very unlikely that everybody performs the same calculations. Comparing results across departments (think of a sales meeting with the different product managers) then gets very difficult.
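To make this concrete: in a centralized OLAP solution such a key measure can be defined once in the cube’s MDX script, so every client sees exactly the same calculation. The following fragment is only a sketch with made-up measure names, not taken from a real model:

```mdx
-- Hypothetical example: "Net Revenue" defined once, centrally, in the cube.
-- Whether staff discounts or partial payments belong in [Discounts] or
-- [Gross Revenue] is decided here, not in every user's workbook.
CREATE MEMBER CURRENTCUBE.[Measures].[Net Revenue] AS
    [Measures].[Gross Revenue]
    - [Measures].[Discounts]
    - [Measures].[Warranty Adjustments],
FORMAT_STRING = "#,##0.00",
VISIBLE = 1;
```

Every report, pivot table or dashboard querying the cube then shows the same approved number for “Net Revenue”.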
Security can only be implemented in a central data store
Some real-world scenarios currently look like this: the IT department exports data for other organizational units, manually filtering out the data not intended for the recipient. This might work with very simple security structures, but with more security roles, user-dependent security or more information recipients, it leads to an enormous amount of export work. And if the company’s security model changes, all the exports have to be revisited. A central OLAP solution makes it easy to define security roles and access rights in one place, using the business view of the data. For example, in OLAP you can restrict a user to the cost centers he or she is responsible for, and OLAP takes care of all related data (for example, automatically filtering the cost facts down to these cost centers). There is no need for a huge number of data exports, as users retrieve the data and information they need while IT only has to make sure the data is available and secured.
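In Analysis Services, for example, such a rule can be expressed as dimension data security on a role. The allowed-member set below is a sketch of the common dynamic security pattern; the dimension, attribute and measure group names are invented for illustration:

```mdx
-- Hypothetical allowed set for a role's "Cost Center" dimension security:
-- each user only sees the cost centers linked to his or her login
-- via an (assumed) "Security Cost Center" bridge measure group.
Exists(
    [Cost Center].[Cost Center].[Cost Center].Members,
    StrToMember("[User].[Login].&[" + UserName() + "]"),
    "Security Cost Center"
)
```

Once the allowed set is in place, the fact data is filtered automatically — no per-recipient exports needed.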
The need for management reporting
Management needs an overview of some or all business units. Having the information (especially the calculations, KPIs etc.) in a decentralized environment makes it very difficult to produce this management reporting in a simple, time-saving way. More likely, IT has to do special exports which are then processed by the controlling department to build the management reports. This can result in controlling spending all its time on data management instead of information management and actual controlling. Also, in this scenario the data in the management reporting will most likely differ from the data of the departments. Just imagine the CEO going to some product manager saying “Hey, your product profitability is –5%” and the product manager replying “No, it isn’t. Look at MY report. Here it reads +3%”, and actually neither of them can say which result is the right one…
Then, after all, if the central BI environment is so important, do we really need decentralized self-service BI? Well, not every user will need self-service BI, but for some it can be a real time saver or give them a lot of analytical power. Here are just two important scenarios:
- The end user wants to analyze information by special properties of the data which are not present in the centralized data warehouse. Just think of product managers. Each product has a different target group and special market conditions, and therefore potentially aspects that differ from product to product and thus from product manager to product manager. This information might only be relevant for certain products. Having it in a central data warehouse would be confusing, as it has no meaning for most of the products. The best way here is to let each department cover the specific needs of its work while still providing the central information available in the data warehouse.
- The end user wants to combine the centrally provided information with other sources of data, for example information that has been purchased from external sources and which is not complete in terms of geography, time etc. Think of a marketing department planning a campaign. To do so, they want to analyze sales data in conjunction with external data on purchasing power. The external information was only purchased for the region where the campaign is planned. This kind of data cannot be loaded into the central data warehouse, but with self-service BI it can be analyzed side by side with the centrally provided sales data.
Or think of one department trying to improve product quality by changing some parameters during the production cycle for some of the production batches. These changes are tracked in some other system (let’s say Excel) but not in the central DWH, as they only apply to this single production line. Self-service BI allows us to analyze the changes in the parameters side by side with data from the central data warehouse (for example quality control data in this case).
So my opinion is that traditional centralized BI systems really benefit from self-service BI functionality. However, self-service BI can never replace traditional BI. But what about the technical aspects, especially the future of MDX? MDX is the query language for multidimensional databases (used by many vendors). In PowerPivot we can do a lot of the calculations using the new expression language called DAX. Will DAX be a replacement for MDX? Absolutely not! DAX is meant to bring analytical power to Excel users. It looks similar to Excel functions, and in fact many Excel functions can be used. Its strength is simplicity. Although one could imagine extending the expression-based DAX language with query functionality (in MDX you can write both queries and expressions), this would also complicate the use of DAX, which is clearly not intended. Even with today’s Excel, many users only know the operators +, –, / and * and the SUM function (advanced users know about SUMIF…). To enable end users, even power users, to leverage the power of a self-service BI solution, the calculation functions have to be as simple as possible. This is the idea of DAX.
However, when it comes to defining complex queries, building highly sophisticated business logic into calculations or KPIs, or implementing ease-of-use features like KPI trends, OLAP actions, drill-through queries or navigation in hierarchies (DAX has no hierarchies), MDX has everything that’s needed. Including this functionality in DAX would only make it complicated and more difficult to understand and use. Of course, this is only my opinion, but I’m sure that MDX will still be used for what it is used for today, while DAX expressions will be used for self-service calculations and mappings intended to be done by non-technical people.
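To illustrate the different scopes of the two languages: a DAX measure in PowerPivot is a single Excel-like expression, while MDX is a full query language with set operations and hierarchy navigation. Both snippets below use made-up table, cube and member names and are only meant as a sketch:

```mdx
-- DAX (PowerPivot): an expression, entered much like an Excel formula
=SUMX(Sales, Sales[Quantity] * Sales[Unit Price])

-- MDX: a complete query, navigating a (hypothetical) product hierarchy
SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    DESCENDANTS(
        [Product].[Product Categories].[Category].&[Bikes],
        [Product].[Product Categories].[Product]
    ) ON ROWS
FROM [Sales]
WHERE ( [Date].[Calendar Year].&[2010] )
```

The DAX line is something a power user can write next to the pivot table; the MDX query is the kind of statement a BI developer embeds in reports, actions or KPI definitions.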
When client-side technology is used to create any kind of informational insight, we have to monitor this process carefully. There has to be a process to maintain business requirements and implement changes in the central BI system in order to avoid the negative effects mentioned above. With PowerPivot, it is also possible to monitor which workbooks have been used and which data sources have been queried (if the PowerPivot sheets are published to SharePoint 2010). I think this is important in order to really understand whether the self-service BI tool is used in the way it is intended, or whether analysis requirements that should have been part of the central BI solution are now being solved redundantly in multiple departments.
So again, we will see cooperation between centralized BI systems and self-service BI solutions, as well as between MDX and DAX. Self-service BI and DAX are not the fox in the henhouse of traditional BI; they extend the vision and scope of the BI systems of today and the future.