XML Archives - Tangra Inc.

Column Aggregates

August 21, 2009/in SQL Server, XML/by Plamen Ratchev

We all know how to use the aggregate functions MIN, MAX, COUNT, etc. to calculate aggregates across rows. For example, using MAX we can determine the maximum value for a group of rows. But what if we want to see the maximum value for each row across columns? There is no aggregate function that accepts a list of columns…

Let’s look at an example to illustrate this. Here is a sample table with four data columns:

CREATE TABLE Foo (
 keycol INT NOT NULL PRIMARY KEY,
 col1 INT NOT NULL,
 col2 INT NOT NULL,
 col3 INT NOT NULL,
 col4 INT NOT NULL);

INSERT INTO Foo VALUES(1, 5, 0, 1, 10);
INSERT INTO Foo VALUES(2, 0, 0, 3, 1);
INSERT INTO Foo VALUES(3, 0, 0, 0, 0);
INSERT INTO Foo VALUES(4, 9, 1, 22, 8);
INSERT INTO Foo VALUES(5, 8, 8, 8, 8);

Our goal is to calculate the maximum value of the four data columns for each row. With two or three columns this may be easy to do with a CASE expression, but each added column makes the CASE expression longer and more complex. Is there an easier way?
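For reference, here is a sketch of the CASE approach for the four columns of this example. It uses a table variable @Foo as a stand-in for Foo with the same rows. The multi-row INSERT ... VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Sketch: row-wise maximum with a CASE expression.
-- @Foo is a stand-in for the Foo table above (SQL Server 2008+ insert syntax).
DECLARE @Foo TABLE (
 keycol INT NOT NULL PRIMARY KEY,
 col1 INT NOT NULL,
 col2 INT NOT NULL,
 col3 INT NOT NULL,
 col4 INT NOT NULL);

INSERT INTO @Foo VALUES (1, 5, 0, 1, 10), (2, 0, 0, 3, 1), (3, 0, 0, 0, 0),
                        (4, 9, 1, 22, 8), (5, 8, 8, 8, 8);

-- Each branch compares one column with every column after it, so n columns
-- need n(n-1)/2 comparisons: 6 here, 45 for ten columns.
SELECT keycol,
       CASE
         WHEN col1 >= col2 AND col1 >= col3 AND col1 >= col4 THEN col1
         WHEN col2 >= col3 AND col2 >= col4 THEN col2
         WHEN col3 >= col4 THEN col3
         ELSE col4
       END AS max_val
FROM @Foo;
```

Reaching the second branch means col1 is smaller than at least one later column. The maximum must therefore be among col2 through col4, so each branch only needs to compare its column with the columns to its right.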

One shortcut is to use SQL Server's ability to generate an XML result (FOR XML) and then use XQuery to find the max value. Here is the query:

-- max across columns with XQuery
-- FOR XML PATH, TYPE serializes each row of Foo as a <row> element;
-- nodes() returns one <row> per key and XQuery max() scans its child elements
SELECT c.query('keycol').value('.', 'INT') AS keycol,
       c.value('local-name(./*[data(.)=
                max(../*[not(local-name(.)="keycol") ])][1])', 
                'VARCHAR(20)') AS max_col,
       c.value('max(*[not(local-name(.)="keycol")])', 'FLOAT') AS max_val
FROM (SELECT keycol, col1, col2, col3, col4
      FROM Foo
      FOR XML PATH, TYPE) AS T(x)
CROSS APPLY x.nodes('row') AS R(c);

The tricky part here is excluding the key column from the max evaluation. Here are the results:

keycol  max_col  max_val
------- -------- --------
1       col4     10
2       col3     3
3       col1     0
4       col3     22
5       col1     8

How about counting the number of non-zero values across columns? Here is a solution using a very similar query:

-- count non-zero columns with XQuery
SELECT c.query('keycol').value('.', 'INT') AS keycol,
       c.value('count(*[not(local-name(.)="keycol") 
                and not(.=0)])', 
               'INT')  AS cnt_non_zero
FROM (SELECT keycol, col1, col2, col3, col4
      FROM Foo
      FOR XML PATH, TYPE) AS T(x)
CROSS APPLY x.nodes('row') AS R(c);

And the results:

keycol  cnt_non_zero
------- ------------
1       3
2       2
3       0
4       4
5       4

This shows how XQuery can solve this problem in an untraditional way. The catch? Look at the execution plans of these queries: converting the rows to XML and shredding them back with nodes() adds significant overhead.

A more efficient method for the first problem is unpivoting. Here is a solution using the UNPIVOT operator (the same can be done with an ANSI query using a cross join with a table with numbers):

-- max across columns with UNPIVOT
-- ties are broken by column name, so col1 wins when all values are equal
SELECT keycol, col AS max_col, val AS max_val
FROM (SELECT keycol, val, col,
             ROW_NUMBER() OVER(PARTITION BY keycol
                               ORDER BY val DESC, col) AS rk
      FROM Foo
      UNPIVOT
      (val FOR col IN (col1, col2, col3, col4)) AS U) AS T
WHERE rk = 1;
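The ANSI alternative mentioned above can be sketched as follows. Each row is cross joined with a four-row set of numbers, and CASE expressions pick the column name and value for each number. The inline numbers list stands in for a permanent table with numbers. The table variable @Foo is a stand-in for Foo with the same rows, and its multi-row insert needs SQL Server 2008 or later. Both are assumptions of this sketch, which should return the same rows as the UNPIVOT query.

```sql
-- Sketch: max across columns with a cross join to a table with numbers
-- instead of the UNPIVOT operator. @Foo stands in for Foo.
DECLARE @Foo TABLE (
 keycol INT NOT NULL PRIMARY KEY,
 col1 INT NOT NULL,
 col2 INT NOT NULL,
 col3 INT NOT NULL,
 col4 INT NOT NULL);

INSERT INTO @Foo VALUES (1, 5, 0, 1, 10), (2, 0, 0, 3, 1), (3, 0, 0, 0, 0),
                        (4, 9, 1, 22, 8), (5, 8, 8, 8, 8);

SELECT keycol, col AS max_col, val AS max_val
FROM (SELECT keycol, col, val,
             ROW_NUMBER() OVER(PARTITION BY keycol
                               ORDER BY val DESC, col) AS rk
      -- one row per (key, column) pair: the number n selects the column
      FROM (SELECT F.keycol,
                   CASE N.n WHEN 1 THEN 'col1' WHEN 2 THEN 'col2'
                            WHEN 3 THEN 'col3' ELSE 'col4' END AS col,
                   CASE N.n WHEN 1 THEN F.col1 WHEN 2 THEN F.col2
                            WHEN 3 THEN F.col3 ELSE F.col4 END AS val
            FROM @Foo AS F
            CROSS JOIN (SELECT 1 UNION ALL SELECT 2
                        UNION ALL SELECT 3 UNION ALL SELECT 4) AS N(n)) AS U
     ) AS T
WHERE rk = 1;
```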

And a very similar solution using unpivoting solves the second problem:

-- count non-zero columns with UNPIVOT
-- NULLIF turns zeros into NULLs, which COUNT ignores
SELECT keycol, COUNT(NULLIF(val, 0)) AS cnt_non_zero
FROM Foo
UNPIVOT
(val FOR col IN (col1, col2, col3, col4)) AS U
GROUP BY keycol;

Import XML File to SQL Table

November 14, 2008/in SQL Server, XML/by Plamen Ratchev

Here is a brief example of importing an XML file into a SQL Server table. The file is loaded with the BULK option of OPENROWSET, and the XQuery capabilities of SQL Server then parse the XML into normalized table format. This example requires SQL Server 2005 or SQL Server 2008.

First, the following XML is saved to the file C:\Products.xml.

<Products>
  <Product>
    <SKU>1</SKU>
    <Desc>Book</Desc>
  </Product>
  <Product>
    <SKU>2</SKU>
    <Desc>DVD</Desc>
  </Product>
  <Product>
    <SKU>3</SKU>
    <Desc>Video</Desc>
  </Product>
</Products>

Next, a table named Products is created to store the XML data.

CREATE TABLE Products(
 sku INT PRIMARY KEY,
 product_desc VARCHAR(30));

Finally, the following statement loads the XML file, parses the XML elements into columns, and inserts the rows into the Products table:

-- SINGLE_BLOB returns the whole file as one VARBINARY(MAX) value, which is
-- cast to XML; nodes() then returns one row per <Product> element
INSERT INTO Products (sku, product_desc)
SELECT X.product.query('SKU').value('.', 'INT'),
       X.product.query('Desc').value('.', 'VARCHAR(30)')
FROM (SELECT CAST(x AS XML)
      FROM OPENROWSET(
           BULK 'C:\Products.xml',
           SINGLE_BLOB) AS T(x)
     ) AS T(x)
CROSS APPLY x.nodes('Products/Product') AS X(product);

Here are the results:

SELECT sku, product_desc 
FROM Products;
 
/*
 
Results:
 
sku         product_desc
----------- -------------
1           Book
2           DVD
3           Video
 
*/
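A variation is sketched below as an alternative to the INSERT above. Run one or the other, since sku is the primary key. This version loads the file into an XML variable first, so the document can be checked before it is shredded. It assumes the same C:\Products.xml file and Products table. BulkColumn is the name of the single column that OPENROWSET(BULK ..., SINGLE_BLOB) returns, and the variable name @doc is chosen for this example.

```sql
-- Sketch: load the file once into an XML variable, then shred it.
DECLARE @doc XML;

-- SINGLE_BLOB returns the file as one VARBINARY(MAX) value named BulkColumn
SELECT @doc = CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK 'C:\Products.xml', SINGLE_BLOB) AS B;

-- optional check before loading: how many <Product> elements were read
SELECT @doc.value('count(/Products/Product)', 'INT') AS product_count;

-- value() with a singleton path replaces the query().value() pair
INSERT INTO Products (sku, product_desc)
SELECT X.product.value('(SKU)[1]', 'INT'),
       X.product.value('(Desc)[1]', 'VARCHAR(30)')
FROM @doc.nodes('/Products/Product') AS X(product);
```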

Convert Hex String to Binary String

April 3, 2008/in SQL Server, XML/by Plamen Ratchev

Here is another use of a table with numbers and concatenation with FOR XML PATH. Given a hexadecimal string value like ‘7FE0’, convert it to the binary representation ‘0111111111100000’. The table with numbers generated by the CTE slices the hex string into individual digits. Each digit is then replaced with its binary equivalent, and the pieces are concatenated using FOR XML PATH with an empty element name.
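The slicing step can be seen on its own in the sketch below. It uses a hard-coded four-row list of numbers instead of the CTE-generated table in the function, which is enough for ‘7FE0’.

```sql
-- Sketch: slice a hex string into one row per digit with a table of numbers.
DECLARE @hex VARCHAR(256);
SET @hex = '7FE0';

SELECT Nums.n AS position,
       SUBSTRING(@hex, Nums.n, 1) AS hex_digit
FROM (SELECT 1 UNION ALL SELECT 2
      UNION ALL SELECT 3 UNION ALL SELECT 4) AS Nums(n)
WHERE Nums.n <= LEN(@hex)
ORDER BY Nums.n;
-- Returns (1, '7'), (2, 'F'), (3, 'E'), (4, '0')
```

The function then joins each digit to the Converter CTE and concatenates the four-character binary values.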

CREATE FUNCTION dbo.Hex2Bin(@hex VARCHAR(256))
RETURNS VARCHAR(1024)
AS
BEGIN
 
DECLARE @bin VARCHAR(1024);
 
SET @bin = '';
 
WITH Converter (hex, bin)
AS
(SELECT '0', '0000' UNION ALL
 SELECT '1', '0001' UNION ALL
 SELECT '2', '0010' UNION ALL
 SELECT '3', '0011' UNION ALL
 SELECT '4', '0100' UNION ALL 
 SELECT '5', '0101' UNION ALL
 SELECT '6', '0110' UNION ALL
 SELECT '7', '0111' UNION ALL 
 SELECT '8', '1000' UNION ALL
 SELECT '9', '1001' UNION ALL
 SELECT 'A', '1010' UNION ALL 
 SELECT 'B', '1011' UNION ALL
 SELECT 'C', '1100' UNION ALL
 SELECT 'D', '1101' UNION ALL 
 SELECT 'E', '1110' UNION ALL 
 SELECT 'F', '1111'),
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n) 
           FROM N3 AS X, N3 AS Y)
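-- ORDER BY n keeps the digits in their original left-to-right order;
-- without it the concatenation order is not guaranteed
SELECT @bin = (SELECT REPLACE(SUBSTRING(@hex, n, 1), hex, bin)
               FROM N4 AS Nums
               JOIN Converter AS C
                 ON SUBSTRING(@hex, n, 1) = hex
               WHERE n <= LEN(@hex)
               ORDER BY n
               FOR XML PATH(''));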
 
RETURN @bin;
 
END
 
GO
 
SELECT dbo.Hex2Bin('7FE0') AS bin;
 
-- Result
bin
----------------
0111111111100000

Obscure Real Data for Testing

April 2, 2008/in SQL Server, XML/by Plamen Ratchev

Very often we have a good set of production data, but it cannot be used for training or testing because of data confidentiality. Sometimes the solution is simply to tweak pieces of the data so they cannot be linked back to the original. Here is one method for obscuring data in SQL Server 2005. It is based on a couple of techniques:

– Using a table with numbers (here generated on the fly by cross joining common table expressions)

– Using the table with numbers to slice each value into individual characters

– Generating random values in the normal character range to replace the real characters

– Using FOR XML PATH with an empty element name to concatenate the obscured characters back into a value.
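The first technique in the list, the table with numbers generated on the fly, can be checked on its own. In this sketch each CTE cross joins the previous one with itself, which squares the row count.

```sql
-- Sketch: on-the-fly table with numbers.
-- N1 = 2 rows, N2 = 2 * 2 = 4, N3 = 4 * 4 = 16, N4 = 16 * 16 = 256.
WITH
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n)
           FROM N3 AS X, N3 AS Y)
SELECT COUNT(*) AS row_count, MIN(n) AS min_n, MAX(n) AS max_n
FROM N4;
-- Returns row_count = 256, min_n = 1, max_n = 256
```

Two hundred fifty-six numbers comfortably cover the VARCHAR(35) names below. Adding one more level, cross joining N4 with itself, would give 65,536 numbers.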

CREATE TABLE Patients (
 pname VARCHAR(35),
 pname_masked VARCHAR(35));
 
INSERT INTO Patients VALUES('John Doe', NULL);
INSERT INTO Patients VALUES('Jeff Smith', NULL);
 
WITH 
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER() 
           OVER(ORDER BY X.n) FROM N3 AS X, N3 AS Y)
UPDATE Patients
SET pname_masked = 
    CAST(
    CAST((SELECT CASE 
                  WHEN SUBSTRING(pname, n, 1) = ' '
                  THEN ' '
                  ELSE CHAR(x + ASCII(SUBSTRING(pname, n, 1)))
                 END 
          FROM N4 AS Nums
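          -- pick a random non-zero offset (-25 to 230) that moves the character
          -- into A-Z (a-z for lowercase input); allowing negative offsets keeps
          -- Z and z from being dropped for lack of a valid offset
          CROSS APPLY 
          (SELECT TOP(1) T.n - 26 AS x
           FROM N4 AS T
           WHERE T.n <> 26
             AND T.n - 26 + ASCII(UPPER(SUBSTRING(pname, Nums.n, 1))) BETWEEN 65 AND 90
           ORDER BY CHECKSUM(NEWID())) AS N
          WHERE Nums.n <= LEN(pname)
          -- keep the characters in their original order
          ORDER BY Nums.n
          FOR XML PATH('')) AS XML) AS VARCHAR(35));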
 
SELECT pname, pname_masked
FROM Patients;
 
-- Results
 
pname        pname_masked
------------ -------------
John Doe     Xwlv Lyi
Jeff Smith   Qmol Yuyvs

Row Concatenation with FOR XML PATH

March 22, 2008/in SQL Server, XML/by Plamen Ratchev

Reporting often requires summarizing normalized data into groups or lists of values. This is also known as row concatenation. Some reporting and client-side tools support this directly. Here is one approach to solve it in T-SQL using FOR XML PATH.

CREATE TABLE Products (
 sku INT PRIMARY KEY,
 product_desc VARCHAR(35));
 
CREATE TABLE Departments (
 department_nbr INT PRIMARY KEY,
 department_title VARCHAR(35));
 
CREATE TABLE DepartmentProducts (
 department_nbr INT 
   REFERENCES Departments (department_nbr),
 sku INT 
   REFERENCES Products (sku),
 PRIMARY KEY (department_nbr, sku));
 
INSERT INTO Products VALUES (1, 'Book');
INSERT INTO Products VALUES (2, 'Magazine');
INSERT INTO Products VALUES (3, 'DVD');
INSERT INTO Products VALUES (4, 'Video');
INSERT INTO Products VALUES (5, 'CD');
INSERT INTO Products VALUES (6, 'Map');
 
INSERT INTO Departments VALUES (1, 'Reference');
INSERT INTO Departments VALUES (2, 'Periodicals');
INSERT INTO Departments VALUES (3, 'Entertainment');
 
INSERT INTO DepartmentProducts VALUES (1, 1);
INSERT INTO DepartmentProducts VALUES (1, 6);
INSERT INTO DepartmentProducts VALUES (2, 2);
INSERT INTO DepartmentProducts VALUES (3, 3);
INSERT INTO DepartmentProducts VALUES (3, 4);
INSERT INTO DepartmentProducts VALUES (3, 5);
 
-- Using correlated subquery 
SELECT D.department_nbr,
       D.department_title,
       STUFF((SELECT ',' + product_desc
              FROM DepartmentProducts AS DP
              JOIN Products AS P
                ON P.sku = DP.sku
              WHERE DP.department_nbr = D.department_nbr
              ORDER BY product_desc
              FOR XML PATH('')), 1, 1, '') AS product_list
FROM Departments AS D;
 
-- Using CROSS APPLY
SELECT D.department_nbr,
       D.department_title,
       STUFF(P.product_list, 1, 1, '') AS product_list
FROM Departments AS D
CROSS APPLY (SELECT ',' + product_desc
             FROM DepartmentProducts AS DP
             JOIN Products AS P
               ON P.sku = DP.sku
             WHERE DP.department_nbr = D.department_nbr
             ORDER BY product_desc
             FOR XML PATH('')) AS P (product_list);
 
-- Results             
department_nbr department_title product_list
-------------- ---------------- ------------
1              Reference        Book,Map
2              Periodicals      Magazine
3              Entertainment    CD,DVD,Video

While this method is often called the XML blackbox method, the explanation of its effect is simple. Normally the PATH clause takes an input string that names the wrapper element generated for each row. Using PATH with an empty string skips the wrapper element. Because the expression ',' + product_desc has no column name, its value is added as plain text rather than as a child element. The text from all rows runs together into one string, which achieves the concatenation, and STUFF then removes the leading comma.
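One consequence of producing XML is that characters with special meaning in XML are entitized: an ampersand in product_desc comes back as &amp;. A common variation adds the TYPE directive and extracts the text with the value() method. The sketch below compares both forms using a hypothetical table variable @Items.

```sql
-- Sketch: plain FOR XML PATH('') versus TYPE plus value().
DECLARE @Items TABLE (item VARCHAR(35));
INSERT INTO @Items VALUES ('Bread');
INSERT INTO @Items VALUES ('Salt & Pepper');

SELECT STUFF((SELECT ',' + item
              FROM @Items
              ORDER BY item
              FOR XML PATH('')), 1, 1, '') AS plain_list,   -- Bread,Salt &amp; Pepper
       STUFF((SELECT ',' + item
              FROM @Items
              ORDER BY item
              FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'),
             1, 1, '') AS typed_list;                       -- Bread,Salt & Pepper
```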

Shredding XML in SQL Server 2005

June 10, 2007/in SQL Server, XML/by Plamen Ratchev

Using XML data has many applications. In databases in particular, it can be used for passing parameters from client applications, exchanging data between SQL modules, or storing details in XML columns. SQL Server 2005 offers new capabilities and native support for XML. Below is an example of using some of the new methods of the XML data type to shred XML data into table format.

-- Declare XML variable
DECLARE @data XML;
 
-- Element-centered XML
SET @data = 
N'<data>
    <customer>
      <id>1</id>
      <name>Allied Industries</name>
    </customer>
    <customer>
      <id>2</id>
      <name>Trades International</name>
    </customer>
  </data>';
 
-- Using the query() method
SELECT T.customer.query('id').value('.', 'INT') AS customer_id,
       T.customer.query('name').value('.', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer);
 
-- Using the value() method
SELECT T.customer.value('(id)[1]', 'INT') AS customer_id,
       T.customer.value('(name)[1]', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer);
 
-- Select only customer which id equals 2 using the exist() method
SELECT T.customer.value('(id)[1]', 'INT') AS customer_id,
       T.customer.value('(name)[1]', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer)
WHERE T.customer.exist('id/text()[. = "2"]') = 1;
 
-- Attribute-centered XML
SET @data = 
N'<data>
    <customer id="1" name="Allied Industries"/>
    <customer id="2" name="Trades International"/>
  </data>';
 
-- Using the value() method
SELECT T.customer.value('@id', 'INT') AS customer_id,
       T.customer.value('@name', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer);
 
 
-- Results
customer_id customer_name
----------- --------------------
1           Allied Industries
2           Trades International

Testing the processing of element-centered versus attribute-centered XML shows that attribute-centered XML is processed faster.
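A related tweak often suggested for element-centered, untyped XML is to read the text node directly, with a path such as (id/text())[1]. The sketch below reuses the element-centered sample. Whether it is measurably faster depends on the data and the SQL Server version, so compare the execution plans on your own workload.

```sql
-- Sketch: element-centered shredding that reads the text nodes directly.
DECLARE @data XML;
SET @data =
N'<data>
    <customer><id>1</id><name>Allied Industries</name></customer>
    <customer><id>2</id><name>Trades International</name></customer>
  </data>';

SELECT T.customer.value('(id/text())[1]', 'INT') AS customer_id,
       T.customer.value('(name/text())[1]', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer);
```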

Cleaning Data with Recursive CTE

May 15, 2007/in SQL Server, XML/by Plamen Ratchev

SQL Server 2005 added a great new feature: Common Table Expressions (CTEs). Even better, CTEs can be recursive. That provides a powerful new tool for solving many SQL problems. One of the areas where recursive CTEs shine is hierarchical data management.

Here is another side of recursive CTEs: using them for common tasks like cleaning data. The problem: a table has a column whose values contain invalid characters, and the task is to replace all of those invalid characters with a space. Unfortunately, the REPLACE function does not support pattern matching, so each character in the column has to be checked individually and replaced if it falls in the invalid range. The solution below uses a recursive CTE to walk through the ASCII table of characters and replace the invalid characters in the column values.

-- Create test table.
 
CREATE TABLE Foobar (
  key_col INT PRIMARY KEY,
  text_col NVARCHAR(100));
 
-- Populate sample data.
 
INSERT INTO Foobar VALUES (1, N'ABC!@#%DEFgh');
INSERT INTO Foobar VALUES (2, N'~!102WXY&*()_Z');
 
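-- Perform the cleanup with a recursive CTE: the anchor handles CHAR(255),
-- and each recursive step replaces the next lower character code (down to
-- CHAR(1)) unless it is a letter. The rows with ch = 1 hold the final result.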
 
WITH Clean (key_col, text_col, ch) 
AS
(SELECT key_col,
        REPLACE(text_col, CHAR(255), ' '),
        255
 FROM Foobar
 UNION ALL
 SELECT key_col,
        CASE WHEN
             CHAR(ch - 1) NOT LIKE '[A-Z]'
             THEN REPLACE(text_col, CHAR(ch - 1), ' ')
             ELSE text_col END,
        ch - 1
 FROM Clean
 WHERE ch > 1)
SELECT key_col, text_col 
FROM Clean
WHERE ch = 1
OPTION (MAXRECURSION 255);

On a side note: recursive CTEs are not the best performers. Also, by default a CTE allows only 100 levels of recursion. The MAXRECURSION hint can set a higher limit (a value between 0 and 32767; 0 removes the limit). Be aware that setting MAXRECURSION to 0 may let a faulty recursive query loop indefinitely.
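The sketch below shows the recursion limit in action by counting from 1 to 150 with a recursive CTE, which needs 149 levels of recursion. Without the OPTION clause the statement stops with an error once the default limit of 100 is exceeded. With MAXRECURSION 150 it completes. The CTE name Counter is chosen for this example.

```sql
-- Sketch: counting 1..150 needs 149 recursion levels, more than the
-- default limit of 100, so the hint is required for this statement.
WITH Counter (n) AS
(SELECT 1
 UNION ALL
 SELECT n + 1
 FROM Counter
 WHERE n < 150)
SELECT COUNT(*) AS row_count, MAX(n) AS max_n
FROM Counter
OPTION (MAXRECURSION 150);
-- Returns row_count = 150, max_n = 150
```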

Here is a different method using a utility table with numbers and FOR XML PATH, which is more efficient:

WITH Clean (key_col, text_col)
AS
(SELECT key_col, REPLACE(CAST(
        -- keep letters, mark every other character with '.'
        (SELECT CASE 
                  WHEN SUBSTRING(text_col, n, 1) LIKE '[A-Z]' 
                  THEN SUBSTRING(text_col, n, 1) 
                  ELSE '.' 
                END
         FROM (SELECT number
               FROM master..spt_values
               WHERE type = 'P'
                 AND number BETWEEN 1 AND 100) AS Nums(n)
         WHERE n <= LEN(text_col)
         -- keep the characters in their original order
         ORDER BY n
         FOR XML PATH('')) AS NVARCHAR(100)), '.', ' ')
 FROM Foobar)
SELECT key_col, text_col
FROM Clean;
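master..spt_values is an undocumented system table, so its contents may change between versions. The sketch below swaps in the cross-joined CTE table with numbers (1 to 256) used in the other posts and writes the cleaned values back with UPDATE. It runs against a table variable @Foobar holding the same sample rows. Like the queries above, it relies on a case-insensitive collation so that '[A-Z]' also matches lowercase letters.

```sql
-- Sketch: FOR XML PATH cleanup with an on-the-fly table with numbers.
DECLARE @Foobar TABLE (
  key_col INT PRIMARY KEY,
  text_col NVARCHAR(100));

INSERT INTO @Foobar VALUES (1, N'ABC!@#%DEFgh');
INSERT INTO @Foobar VALUES (2, N'~!102WXY&*()_Z');

WITH
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n)
           FROM N3 AS X, N3 AS Y)
UPDATE F
SET text_col = REPLACE(CAST(
      -- keep letters, mark everything else with '.', then turn '.' into ' '
      (SELECT CASE
                WHEN SUBSTRING(F.text_col, Nums.n, 1) LIKE '[A-Z]'
                THEN SUBSTRING(F.text_col, Nums.n, 1)
                ELSE '.'
              END
       FROM N4 AS Nums
       WHERE Nums.n <= LEN(F.text_col)
       ORDER BY Nums.n
       FOR XML PATH('')) AS NVARCHAR(100)), '.', ' ')
FROM @Foobar AS F;

SELECT key_col, text_col
FROM @Foobar;
```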