Nested Set Hierarchy

Posted on February 14, 2011

2



Another method of modeling hierarchies in SQL, is the Nested Set method created by Michael J. Kamfonas and made popular by Joe Celko.

I’ve implemented this method in the past and through testing have found it to be extremely performant.

This method involves creating left and right values for each member in the hierarchy. Then using those left and right values to navigate the hierarchy.

Let’s create our test data again:

IF OBJECT_ID('dbo.Product') IS NOT NULL
DROP TABLE dbo.Product
GO
CREATE TABLE dbo.Product(ProductID int not null primary key
, ProductName varchar(100)
, ParentProductID INT)
GO
INSERT INTO dbo.Product(ProductID, ProductName, ParentProductID)
VALUES (0,'Baby Goods',NULL)
, (10,'Baby Food',0)
, (20,'Nappies',0)
, (100,'All Ages Baby Food',10)
, (150,'Beginners',10)
, (200,'Strawberry Yoghurt',100)
, (250,'Baby Cereal',100)
, (300,'Formula Milk',150)
, (310,'Heinz Formula',300)
, (400,'Small Pack Nappies',20)
, (450,'Bulk Pack Nappies',20)
GO

There are several ways to populate a nested set table. The best that I have found is described by Adam Machanic here:

http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/swinging-from-tree-to-tree-using-ctes-part-1-adjacency-to-nested-sets.aspx

So let’s use that method to populate our table:

IF OBJECT_ID('dbo.ProductsNested') IS NOT NULL
DROP TABLE dbo.ProductsNested
GO
;WITH ProductLevels
AS
( SELECT  p.ProductName
  , p.ProductID
  , 1 AS HLevel
  , CONVERT(VARCHAR(MAX), ProductID) AS TreePath
  , ParentProductID  
 FROM dbo.Product p  
 WHERE p.ParentProductID IS NULL
 UNION ALL
 SELECT p.ProductName
  , p.ProductID
  , x.HLevel+1
  , x.TreePath + '.'
  + CONVERT(VARCHAR(MAX), p.ProductID) AS TreePath
  , p.ParentProductID  
 FROM dbo.Product p  
 INNER JOIN ProductLevels x
 ON x.ProductID=p.ParentProductID  
)  
,ProductRows AS(
    SELECT
         ProductLevels.*,
         ROW_NUMBER() OVER (ORDER BY TreePath) AS Row
    FROM ProductLevels
)
SELECT ProductName
     ,ER.ProductID
     ,ParentProductID
     ,ER.TreePath
     ,ER.HLevel
     ,ER.Row
     ,(ER.Row * 2) - ER.HLevel AS Lft,
     ((ER.Row * 2) - ER.HLevel) +
        (
            SELECT COUNT(*) * 2
            FROM ProductRows ER2
            WHERE ER2.TreePath LIKE ER.TreePath + '.%'
        ) + 1 AS Rgt
INTO dbo.ProductsNested       
FROM ProductRows ER
ORDER BY TreePath

What this code does is convert this:

Adjacency List

To this:

 Nested Set Hierarchy

To get the descendants of Baby Foods, we do a self-join and look for all those descendants are between the left and right values i.e. where the left value is between 2 and 15. So looking at the graph above, we can see that the left value for All Ages Baby Food is 3, and 3 is between 2 and 15.  

Here’s the T-SQL:

SELECT child.ProductID
, REPLICATE('     ', child.Hlevel - parent.Hlevel)
  + child.ProductName AS ProductName
FROM ProductsNested parent
INNER JOIN ProductsNested child
ON parent.ProductName = 'Baby Food'
AND child.lft BETWEEN parent.lft AND parent.rgt
ORDER BY child.lft

This returns:

To get the ancestors of Heinz Formula we use:

SELECT parent.ProductID
, REPLICATE('   ', parent.Hlevel) + parent.ProductName AS ProductName
FROM ProductsNested parent
INNER JOIN ProductsNested child
ON child.ProductName = 'Heinz Formula'
AND child.lft BETWEEN parent.lft AND parent.rgt
ORDER BY child.lft

To retrieve all leaf nodes:

SELECT ProductName
FROM dbo.ProductsNested
WHERE rgt - lft = 1

Returning:

ProductName
Strawberry Yoghurt
Baby Cereal
Heinz Formula
Small Pack Nappies
Bulk Pack Nappies

To join to a dataset and retrieve the aggregated results:

IF OBJECT_ID('dbo.ProductSale') IS NOT NULL
DROP TABLE dbo.ProductSale
GO
CREATE TABLE dbo.ProductSale(SaleDate DATE
, ProductID INT
, SaleQty INT)
GO
INSERT INTO dbo.ProductSale
VALUES (GETDATE(), 200, 50)
, (GETDATE(),250, 90)
, (GETDATE(), 310, 80)
, (GETDATE(), 400, 23)
, (GETDATE(), 450, 19)
GO
--Get Totals
;WITH Sales AS (
 SELECT lft, ProductName,COALESCE(ps.SaleQty,0) AS SaleQty
 FROM dbo.ProductsNested child
 LEFT OUTER JOIN dbo.ProductSale ps
 ON ps.ProductID = child.ProductID
 )
SELECT REPLICATE('   ', parent.Hlevel) + parent.ProductName AS ProductName
 , SUM(SaleQty) AS SaleQty, parent.lft
FROM ProductsNested parent
INNER JOIN Sales child
ON child.lft BETWEEN parent.lft AND parent.rgt
GROUP BY parent.ProductName, parent.lft, parent.Hlevel
ORDER BY parent.lft

This returns the following:

 

From what I’ve read, this method is great for forests of trees, such as Discussion Forums, but can be slow with fast changing hierarchies where the entire table needs to be rebuilt to insert values between nodes e.g. if we inserted a child for Baby Cereal, then all of it’s parent and nodes to the right would have to reset their values.

From a Business Intelligence POV, I believe Nested Sets are ideal when dealing with ragged hierarchies in SQL, because the updates only occur once a night. The way I’ve implemented this in the past is to keep the parent-child values from the source system and simply update the lft and rgt values in my dimension table each night as part of the nightly batch job.

Advertisements
Posted in: Data Modeling, SQL