cte Archives - Tangra Inc.

Tag Archive for: cte

Convert Hex String to Binary String

April 3, 2008/1 Comment/in SQL Server, XML/by Plamen Ratchev

Here is another use of table with numbers and concatenation using FOR XML PATH. Given a hexadecimal string value like ‘7FE0’, convert it to the binary representation ‘0111111111100000’. The table with numbers generated by the CTE is used to slice the hex string, then replace it with the corresponding binary value and concatenate using FOR XML PATH with blank element.

CREATE FUNCTION dbo.Hex2Bin(@hex VARCHAR(256))
RETURNS VARCHAR(1024)
AS
BEGIN
 
DECLARE @bin VARCHAR(1024);
 
SET @bin = '';
 
WITH Converter (hex, bin)
AS
(SELECT '0', '0000' UNION ALL
 SELECT '1', '0001' UNION ALL
 SELECT '2', '0010' UNION ALL
 SELECT '3', '0011' UNION ALL
 SELECT '4', '0100' UNION ALL 
 SELECT '5', '0101' UNION ALL
 SELECT '6', '0110' UNION ALL
 SELECT '7', '0111' UNION ALL 
 SELECT '8', '1000' UNION ALL
 SELECT '9', '1001' UNION ALL
 SELECT 'A', '1010' UNION ALL 
 SELECT 'B', '1011' UNION ALL
 SELECT 'C', '1100' UNION ALL
 SELECT 'D', '1101' UNION ALL 
 SELECT 'E', '1110' UNION ALL 
 SELECT 'F', '1111'),
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n) 
           FROM N3 AS X, N3 AS Y)
SELECT @bin = (SELECT REPLACE(SUBSTRING(@hex, n, 1), hex, bin)
               FROM N4 AS Nums
               JOIN Converter AS C
                 ON SUBSTRING(@hex, n, 1) = hex
               WHERE n <= LEN(@hex)
               FOR XML PATH(''));
 
RETURN @bin;
 
END
 
GO
 
SELECT dbo.Hex2Bin('7FE0') AS bin;
 
-- Result
bin
----------------
0111111111100000

Obscure Real Data for Testing

April 2, 2008/1 Comment/in SQL Server, XML/by Plamen Ratchev

Very often we have a good set of production data but it cannot be used for training or testing because of data confidentiality. And the solution sometimes is just to tweak pieces of the data so it cannot be linked to the original data. Here is one method for obscure data in SQL Server 2005. It is based on a couple techniques:

– Using table with numbers (here generated on the fly by cross joining common table expressions)

– Using the table with numbers to slice each value into individual characters

– Generating random values in the normal character range that will be used to replace the real characters

– Using FOR XML PATH with empty element to concatenate back the obscured value.

CREATE TABLE Patients (
 pname VARCHAR(35),
 pname_masked VARCHAR(35));
 
INSERT INTO Patients VALUES('John Doe', NULL);
INSERT INTO Patients VALUES('Jeff Smith', NULL);
 
WITH 
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER() 
           OVER(ORDER BY X.n) FROM N3 AS X, N3 AS Y)
UPDATE Patients
SET pname_masked = 
    CAST(
    CAST((SELECT CASE 
                  WHEN SUBSTRING(pname, n, 1) = ' '
                  THEN ' '
                  ELSE CHAR(x + ASCII(SUBSTRING(pname, n, 1)))
                 END 
          FROM N4 AS Nums
          CROSS APPLY 
          (SELECT TOP(1) T.n AS x
           FROM N4 AS T
           WHERE T.n + ASCII(UPPER(SUBSTRING(pname, Nums.n, 1))) BETWEEN 65 AND 90
           ORDER BY CHECKSUM(NEWID())) AS N
          WHERE Nums.n <= LEN(pname)
          FOR XML PATH('')) AS XML) AS VARCHAR(35));
 
SELECT pname, pname_masked
FROM Patients;
 
-- Results
 
pname        pname_masked
------------ -------------
John Doe     Xwlv Lyi
Jeff Smith   Qmol Yuyvs

Updates with CTE

March 12, 2008/7 Comments/in SQL Server/by Plamen Ratchev

Performing updates on columns based on values from another table is a very common need. Using the ANSI UPDATE normally requires multiple subqueries, which can be very inefficient especially if multiple filters have to be applied. The Microsoft specific UPDATE with JOIN is one solution. However, common table expressions provide a very elegant alternative, which has the same efficient plan as UPDATE with JOIN, but is much easier to read and maintain. The sample below demonstrates how to perform update based on another table using join and CTE. Using MERGE in SQL Server 2008 will make it even better.

CREATE TABLE Products (
 sku CHAR(5) NOT NULL PRIMARY KEY,
 product_desc VARCHAR(35) NOT NULL,
 price DECIMAL(12, 2) NOT NULL DEFAULT 0);
 
CREATE TABLE ProductUpdates (
 sku CHAR(5) NOT NULL PRIMARY KEY,
 product_desc VARCHAR(35) NOT NULL,
 price DECIMAL(12, 2) NOT NULL DEFAULT 0,
 effective_date DATETIME NOT NULL);
 
INSERT INTO Products VALUES ('CHS01', 'Child seat', 25.50);
INSERT INTO Products VALUES ('CUP03', 'Water cup', 5.25);
INSERT INTO Products VALUES ('HOL01', 'Cup holder', 3.50);
 
INSERT INTO ProductUpdates VALUES ('CHS01', 'Child seat with cushion', 26.95, '20080301');
INSERT INTO ProductUpdates VALUES ('CUP03', 'Water cup with handle', 6.25, '20080405');
 
-- Update all current product descriptions and prices
-- with updates that have effective date past today (March 12, 2008)
WITH Updates 
AS
(SELECT P.product_desc,
        P.price,
        U.product_desc AS new_product_desc,
        U.price AS new_price
 FROM Products AS P
 JOIN ProductUpdates AS U
   ON P.sku = U.sku
 WHERE U.effective_date < CURRENT_TIMESTAMP)
UPDATE Updates
SET product_desc = new_product_desc,
    price = new_price;
 
SELECT sku, product_desc, price
FROM Products;
 
-- Results
sku   product_desc            price
----- ----------------------- -----
CHS01 Child seat with cushion 26.95
CUP03 Water cup                5.25
HOL01 Cup holder               3.50

Shortest Path for Friend Connections

February 20, 2008/5 Comments/in SQL Server/by Plamen Ratchev

An interesting problem to solve is finding relation paths in contact management systems. This is a limited case of the Dijkstra’s algorithm for finding the shortest path in a graph. Here we always have cost of 1 for each path and it is a two-way relation between the nodes. To put this in human readable format, the problem is to find the distance between friends, based on relationships defined. If A is friend with B and then B is friend with C, the path between A and C is A->B->C with distance 2.

Here is one solution using recursive CTEs in SQL Server 2005. The method is based on building relationship paths between all connected nodes and then searching the path for friend connections. If this searching if frequent, then the path can be materialized in a column.

-- Sample table with data
CREATE TABLE Contacts (
 c_from CHAR(1), 
 c_to CHAR(1),
 PRIMARY KEY (c_from, c_to));
 
INSERT    INTO Contacts
SELECT    'A', 'B' UNION ALL
SELECT    'B', 'D' UNION ALL
SELECT    'C', 'A' UNION ALL
SELECT    'C', 'E' UNION ALL
SELECT    'G', 'C' UNION ALL
SELECT    'B', 'G' UNION ALL
SELECT    'F', 'D' UNION ALL
SELECT    'E', 'F';
 
-- Table to store paths
CREATE TABLE Paths (c_path VARCHAR(200) PRIMARY KEY);
 
-- Recursive CTE to populate the paths
WITH PathCTE 
AS
(SELECT c_from, c_to,
        CAST('.' + CAST(c_from AS VARCHAR(8)) + '.' + 
        CAST(c_to AS VARCHAR(8)) + '.' AS VARCHAR(200)) AS c_path
 FROM Contacts AS C1
 UNION ALL
 SELECT C.c_from, C.c_to,
        CAST(P.c_path + C.c_to + '.' AS VARCHAR(200))
 FROM PathCTE AS P
 JOIN Contacts AS C
   ON P.c_to = C.c_from
 WHERE P.c_path NOT LIKE '%.' + 
                    CAST(C.c_from AS VARCHAR(10)) + 
                    '.' + 
                    CAST(C.c_to AS VARCHAR(10)) + 
                    '.%')
INSERT INTO Paths
SELECT c_path FROM PathCTE;
 
-- Show all paths between B and D
SELECT c_path
FROM Paths
WHERE c_path LIKE '.B.%'
  AND c_path LIKE '%.D.';
 
-- Shortest path distance, longest path distance, and number of paths
SELECT MIN(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS shortest_distance,
       MAX(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS longest_distance,
       COUNT(*) AS paths_cnt
FROM Paths
WHERE c_path LIKE '.B.%'
  AND c_path LIKE '%.D.';
 
-- Results
c_path
--------------
.B.D.
.B.G.C.A.B.D.
.B.G.C.E.F.D.
 
 
shortest_distance  longest_distance  paths_cnt
-----------------  ----------------  -----------
1                  5                 3

It is good to note that this method does not make effort to avoid reusing paths to reach a destination. If needed this can be handled by additional condition in the recursive CTE.

Hierarchies with CTEs

August 20, 2007/0 Comments/in SQL Server/by Plamen Ratchev

Common table expressions (CTEs) have many applications. However, one of their capabilities to implement recursive queries is very useful for navigating and manipulating hierarchies.

Here is one brief example of utilizing that. Given a table with employees and their managers represented as adjacency list, provide list of employees that report to particular manager, ordered by the natural hierarchy order of listing each employee under the corresponding manager.

-- Create sample table with employees and managers
CREATE TABLE Employees(
 employee_nbr INT NOT NULL PRIMARY KEY,
 employee_name VARCHAR(35),
 manager_nbr INT NULL REFERENCES Employees(employee_nbr));
 
INSERT INTO Employees VALUES (1, 'John Doe', NULL);
INSERT INTO Employees VALUES (2, 'James Brown', 1);
INSERT INTO Employees VALUES (3, 'Keith Green', NULL);
INSERT INTO Employees VALUES (4, 'Peter Roth', 2);
INSERT INTO Employees VALUES (5, 'Hans Gruber', 2);
INSERT INTO Employees VALUES (6, 'Kris Evans', 4);
INSERT INTO Employees VALUES (7, 'Jeff Colleman', NULL);
 
-- Use recursive CTE to build binary and charachter paths
WITH EmployeeHierarchy
AS
(SELECT employee_nbr, employee_name, manager_nbr,
        CAST(employee_nbr AS VARBINARY(MAX)) AS bpath,
        CAST('.' + 
             CAST(employee_nbr AS VARCHAR(4)) +
             '.' AS VARCHAR(MAX)) AS cpath
 FROM Employees
 WHERE manager_nbr IS NULL
 UNION ALL
 SELECT E.employee_nbr, E.employee_name, E.manager_nbr,
        H.bpath + CAST(E.employee_nbr AS BINARY(4)),
        H.cpath + CAST(E.employee_nbr AS VARCHAR(4)) + '.'
 FROM Employees AS E
 JOIN EmployeeHierarchy AS H
   ON E.manager_nbr = H.employee_nbr)
SELECT employee_nbr, employee_name, manager_nbr, bpath, cpath
FROM EmployeeHierarchy
WHERE cpath LIKE '%.2.%' -- filter all employees for manager 2
ORDER BY bpath;          -- order by natural hierarchy path
 
-- Results
employee_nbr employee_name  manager_nbr bpath                              cpath
------------ -------------- ----------- ---------------------------------- ----------
2            James Brown    1           0x0000000100000002                 .1.2.
4            Peter Roth     2           0x000000010000000200000004         .1.2.4.
6            Kris Evans     4           0x00000001000000020000000400000006 .1.2.4.6.
5            Hans Gruber    2           0x000000010000000200000005         .1.2.5.

The method above creates two manager/employee paths: a binary path that preserves the natural ordering at each level, which can be used to sort; a character path that can be used to filter by manager.

Cleaning Data with Recursive CTE

May 15, 2007/1 Comment/in SQL Server, XML/by Plamen Ratchev

SQL Server 2005 added a great new feature: Common Table Expressions (CTE). And even better than that – recursive CTEs. That provides a new powerful tool to solve many SQL problems. One of the areas where recursive CTEs shine is the hierarchical data management.

Here is another side of the recursive CTEs – utilizing them for some common tasks like cleaning data. The problem: a table has a column with values that have invalid characters. The task is to replace all those invalid characters with a space. Unfortunately the REPLACE function does not support pattern matching and each character in the column has to be verified individually and replaced if it falls in the invalid range. The solution below utilizes a recursive CTE to walk though the ACSII table of characters and to replace the invalid characters in the column values.

-- Create test table.
 
CREATE TABLE Foobar (
  key_col INT PRIMARY KEY,
  text_col NVARCHAR(100));
 
-- Populate sample data.
 
INSERT INTO Foobar VALUES (1, N'ABC!@#%DEFgh');
INSERT INTO Foobar VALUES (2, N'~!102WXY&*()_Z');
 
-- Perform the cleanup with recursive CTE.
 
WITH Clean (key_col, text_col, ch) 
AS
(SELECT key_col,
        REPLACE(text_col, CHAR(255), ' '),
        255
 FROM Foobar
 UNION ALL
 SELECT key_col,
        CASE WHEN
             CHAR(ch - 1) NOT LIKE '[A-Z]'
             THEN REPLACE(text_col, CHAR(ch - 1), ' ')
             ELSE text_col END,
        ch - 1
 FROM Clean
 WHERE ch > 1)
SELECT key_col, text_col 
FROM Clean
WHERE ch = 1
OPTION (MAXRECURSION 255);

On a side note – the recursive CTEs are not the best performers. Also, by default a CTE allows only 100 levels of recursion. The MAXRECURSION hint can be used to set higher level (a value between 0 and 32767; setting to 0 will remove the limit). Be aware that settings MAXRECURSION to 0 may create an infinite loop.

Here is a different method using utility table with numbers and FOR XML PATH, which is more effective:

WITH Clean (key_col, text_col)
AS
(SELECT key_col, REPLACE(CAST(
        (SELECT CASE 
                  WHEN SUBSTRING(text_col, n, 1) LIKE '[A-Z]' 
                  THEN SUBSTRING(text_col, n, 1) 
                  ELSE '.' 
                END
         FROM (SELECT number
               FROM master..spt_values
               WHERE type = 'P'
                 AND number BETWEEN 1 AND 100) AS Nums(n)
         WHERE n <= LEN(text_col)
         FOR XML PATH('')) AS NVARCHAR(100)), '.', ' ')
 FROM Foobar)
SELECT key_col, text_col
FROM Clean;