Kimball Hierarchy Bridge Table

Posted on January 15, 2011

1



The Bridge table advocated by Ralph Kimball is yet another method to navigate hierarchies. It works by storing all of the ancestors for each node. The leaf node is then joined to the fact table, and the ancestor joined to the dimension table ala:

The contents of the ProductBridge table would then look like this:

To save space I haven’t shown all results, and I’ve included the enumerated TreePath to give an understanding of the hierarchy involved.

So how do we populate this? Let me state that I’ve tried a number of methods that worked fine for the example provided, but not on a a hierarchy with 30,000 members. I initially tried building the TreePath and then using a scalar function inside a loop but I killed the job after an hour. I then tried converting the TreePath to the XML datatype and using an XQuery function to get each ancestor. This also proved too expensive and I killed the job after 30 minutes.

I then decided to use a cursor (they aren’t always bad), and use a recursive CTE to go up the hierarchy for each node. For 30,000 records it returned in 35 seconds (8 seconds without including the number of levels from the top node). A vast improvement.

So here’s the code:

IF OBJECT_ID('dbo.Product') IS NOT NULL DROP TABLE dbo.Product
GO
CREATE TABLE dbo.Product(ProductID int not null PRIMARY KEY
, ProductName varchar(100)
, ParentProductID INT)
GO
INSERT INTO dbo.Product(ProductID, ProductName, ParentProductID)
VALUES (0,'Baby Goods',NULL)
, (10,'Baby Food',0)
, (20,'Nappies',0)
, (100,'All Ages Baby Food',10)
, (150,'Beginners',10)
, (200,'Strawberry Yoghurt',100)
, (250,'Baby Cereal',100)
, (300,'Formula Milk',150)
, (310,'Heinz Formula',300)
, (400,'Small Pack Nappies',20)
, (450,'Bulk Pack Nappies',20)
GO
IF OBJECT_ID('dbo.ProductBridge') IS NOT NULL DROP TABLE dbo.ProductBridge
GO
CREATE TABLE dbo.ProductBridge(
   AncestorProductID INT
 , ProductID INT
 , HLevel INT)
GO
DECLARE @Node INT
DECLARE cur1 CURSOR LOCAL FAST_FORWARD FOR
SELECT ProductID
FROM dbo.Product a
 OPEN cur1
 FETCH NEXT FROM cur1 INTO @Node
 WHILE @@FETCH_STATUS = 0
 BEGIN
  ;WITH Ancestors
  As 
  ( SELECT  p.ProductID
    , ParentProductID
    , 0 AS Hlevel
   FROM dbo.Product p  
   WHERE ProductID = @Node
   UNION ALL
   SELECT p.ProductID
    , p.ParentProductID
    , h.Hlevel + 1
   FROM dbo.Product p  
   INNER JOIN Ancestors H
   ON H.ParentProductID = p.ProductID  
  ) 
   , FixLevels AS (
   SELECT ParentProductID, ProductID, MAX(HLevel) OVER() AS MaxLvl
   FROM Ancestors c 
   )
  INSERT INTO dbo.ProductBridge(AncestorProductID, ProductID, HLevel)
  SELECT COALESCE(ParentProductID, @Node) AS AncestorProductID
   , @Node AS ProductID
   , MaxLvl AS HLevel
  FROM FixLevels
  FETCH NEXT FROM cur1 INTO @Node
 END
 CLOSE cur1
 DEALLOCATE cur1

Now, to join this to the fact table to get the rollup of all levels in the hierarchy we would do the following:

IF OBJECT_ID('dbo.ProductSale') IS NOT NULL DROP TABLE dbo.ProductSale
GO
CREATE TABLE dbo.ProductSale(SaleDate DATE, ProductID INT, SaleQty INT)
GO
INSERT INTO dbo.ProductSale
VALUES (GETDATE(), 200, 50)
, (GETDATE(),250, 90)
, (GETDATE(), 300, 80)
, (GETDATE(), 400, 23)
, (GETDATE(), 450, 19)
GO
;WITH BridgeCTE AS (
SELECT p.AncestorProductID
 , p.ProductID
 , SaleQty
FROM dbo.ProductBridge p
LEFT OUTER JOIN dbo.ProductSale ps
on p.ProductID = ps.ProductID
)
SELECT p.ProductName, SUM(SaleQty) AS SaleQty
FROM  BridgeCTE ps
LEFT OUTER JOIN Product p
ON ps.AncestorProductID = p.ProductID
--where p.ProductName = 'Baby Goods'
GROUP BY p.ProductName


This returns:

ProductName SaleQty
All Ages Baby Food 140
Baby Cereal 90
Baby Food 220
Baby Goods 262
Beginners 80
Bulk Pack Nappies 19
Formula Milk 80
Heinz Formula NULL
Nappies 42
Small Pack Nappies 23
Strawberry Yoghurt 50

Finally, there are several downsides to the Bridge hierarchy method that need to be considered:

  • Difficult to maintain
  • Becomes more complex with Slowly Changing Dimensions
  • May suffer poor performance with large data sets
  • The bridge table explodes in size due to ancestor combinations
Advertisements
Posted in: SQL