When you create relationships between tables in PowerPivot (or in a SSAS tabular cube), you may run into the following message:
Below I’ll explain a way to solve this.
By setting relationships in the designer you tell PowerPivot how the tables are related. This is important because that’s how PowerPivot knows how to slice and dice data from one table by attributes from other tables. PowerPivot needs to know unambiguously which records in one table are related to records in another table. Therefore, if you follow the relationships in the direction of the arrows, there may only be one direct or indirect path from any table to any other table.
When PowerPivot encounters two conflicting paths, it will automatically deactivate one of them. In the PowerPivot designer this path then appears as a dashed arrow. If you wish to reactivate a relationship, you first must deactivate another conflicting relationship. Unfortunately, in real live scenarios you’ll often have a business need for both conflicting relationships, so by changing which relationship is deactivated you’re only moving the problem. Fortunately, there is a way to solve this!
Let’s take a simple example:
In this example we have the three tables:
- Location – this is a table with addresses (for simplicity we only included ZipCode above)
- Customer – this is a table of customers. The customer table has a reference to the Location table for the customer’s home address.
- Sales – this is a table of sales orders. The table has a reference to the customer table as well as a reference to the Location table for the address of the store where the sales was made.
Also, note that the Customer and the Sales table have calculated fields. These are calculated aggregates (not calculated columns), which is important as you’ll see shortly.
Assume you have the following business requirements:
- You should be able to show sales by store location
- You should be able to show sales by household size
- You should be able to show average household size by ZipCode
As you see in the diagram above, one of the relationships was deactivated because there were multiple paths from Sales to Location. Because the relationship from Sales to Customer has been deactivated, requirement 2 cannot be satisfied. You can try to solve this by first deactivating another relationship, however, that will lead to one of the other business requirement not being satisfied.
You can solve this by importing certain tables two times and re-routing relationships. The $64.000 question is: which tables should you load twice and where to put the relationships? I’ll first explain the steps and then the rational behind them.
Steps to solve the problem
- Rename all tables which have one or more calculated aggregates so that their friendly names in PowerPivot are prefixed with “Fact”. In our example the Sales table is renamed to FactSales and Customer is renamed to “FactCustomer”.
- Import all tables which you just renamed a second time into the model. In our example we’ll import Sales and Customer a second time (keeping their friendly names unchanged). Do not create calculated aggregates on these tables. Note: in case PowerPivot doesn’t let you add the same table twice, see the comments at the bottom of this page on how you can work-around that.
- Remove all relationships and recreate them such that all relationships originate from a fact table and their destination is a non-fact table. Don’t forget to also create relationships from the fact tables to their non-fact sibling tables.
The diagram for the example project now looks like this:
All relationships are active and all of the business requirements can now be satisfied!
Finally there’s some cleaning up we can do. The fact and non-fact tables have a lot of duplicate fields. My advice is to hide all the columns on the fact tables in PowerPivot so only the calculated aggregates from these tables are exposed to Excel or another OLAP client.
Once everything is working you can also remove unneeded columns from the data model in order to minimize the amount of data loaded into PowerPivot or the tabular cube.
Rational behind this method
The ordinary way many BI experts model data in OLAP solutions is through“dimensional modeling”. Dimensional modeling implies that there are basically two types of tables: dimension tables and fact tables. Dimension tables typically hold text fields that are used for slicing and dicing (e.g. customer name, product name, country) whereas fact tables have numeric fields which can be aggregated (e.g. sales amount). Fact tables also have foreign key columns which link them to dimension tables. This leads to “star-schema” diagrams where each fact table is linked via a 1:many relationship to multiple dimensions tables. (There’s a variant known as “snowflake-schema” where dimension tables can also link to other dimension tables.)
PowerPivot’s tabular model is simpler and more intuitive to business users since it doesn’t force them to learn dimensional modeling. The downside is that you easily run into problems like the one we covered here. The method I described in this post uses the principles of dimensional modeling to solve the problem.
I tried to explain dimension modeling in a couple of sentences, but it would take a few hundred pages to discuss it thoroughly. Even though PowerPivot does not enforce dimensional modeling, understanding the concepts and principles of it helps you design better solutions. If you want to learn more about dimensional modeling I recommend reading Ralph Kimball’s book “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling”.
If you’ve found another (better?) way of solving conflicting relationships in PowerPivot model, please let me know!