Pivoting on Multiple Columns

Pivoting data on more than one column is not a common request. But there are times when it is a very useful technique for reporting purposes. Here is one example to demonstrate different methods to handle that.

The goal is to report product data by pivoting columns with value and quantity for each product based on year. Here is the script to create the table and insert sample data.

CREATE TABLE Products (

product VARCHAR(30),

market_year INT,

value INT,

quantity INT,

CONSTRAINT pk_products

PRIMARY KEY (product, market_year));
INSERT INTO Products VALUES('Corn', 2003, 100, 20);
INSERT INTO Products VALUES('Corn', 2004, 200, 25);

INSERT INTO Products VALUES('Corn', 2005, 150, 30);

INSERT INTO Products VALUES('Corn', 2006, 150, 10);

The request is to produce the following output:

product v2003 q2003 v2004 q2004 v2005 q2005 v2006 q2006

------- ----- ----- ----- ----- ----- ----- ----- -----

Corn 100 20 200 25 150 30 150 10

As usual, the first suspect for pivoting solution is the CASE function. Using CASE expressions is the most flexible method to manipulate data for pivoting. There is not much difference when pivoting a single or multiple columns. Here is the solution with CASE:

SELECT product,

SUM(CASE WHEN market_year = 2003 THEN value ELSE 0 END) AS v2003,

SUM(CASE WHEN market_year = 2003 THEN quantity ELSE 0 END) AS q2003,

SUM(CASE WHEN market_year = 2004 THEN value ELSE 0 END) AS v2004,

SUM(CASE WHEN market_year = 2004 THEN quantity ELSE 0 END) AS q2004,

SUM(CASE WHEN market_year = 2005 THEN value ELSE 0 END) AS v2005,

SUM(CASE WHEN market_year = 2005 THEN quantity ELSE 0 END) AS q2005,

SUM(CASE WHEN market_year = 2006 THEN value ELSE 0 END) AS v2006,

SUM(CASE WHEN market_year = 2006 THEN quantity ELSE 0 END) AS q2006

FROM Products

GROUP BY product;

Next, let's look at the PIVOT operator that was added in SQL Server 2005. The PIVOT operator has a few limitations, and one of them is that it supports pivoting only on a single column. However, T-SQL allows to specify multiple PIVOT operators in the FROM clause, that way providing a solution for pivoting on multiple columns. There is one catch: the second PIVOT consumes the temporary result set output of the first PIVOT operator. Because of that the year column is not available (since it is used as pivoting column in the first PIVOT) and there is a need to define a virtual column that replicates the values for the year. Also, since the first PIVOT operator uses the year values as column names, the new virtual column subtracts 2000 from the year value to generate different column names for the second PIVOT operator. Here is the query using two PIVOT operators:

SELECT product,

MAX([2003]) AS v2003,

MAX([3]) AS q2003,

MAX([2004]) AS v2004,

MAX([4]) AS q2004,

MAX([2005]) AS v2005,

MAX([5]) AS q2005,

MAX([2006]) AS v2006,

MAX([6]) AS q2006

FROM (SELECT product, market_year,

market_year - 2000 AS market_year2,

SUM(value) AS value,

SUM(quantity) AS quantity

FROM Products

GROUP BY product, market_year) AS T

PIVOT

(SUM(value) FOR market_year IN

([2003], [2004], [2005], [2006])) AS P1

PIVOT

(SUM(quantity) FOR market_year2 IN

([3], [4], [5], [6])) AS P2

GROUP BY product;

In this particular case there is more elegant solution using a single PIVOT operator. With little math the value and the quantity can be combined into single column and then after the pivoting split back to two separate columns. Here is the solution using a single PIVOT operator:

SELECT product,

[2003] / 1000 AS v2003,

[2003] % 1000 AS q2003,

[2004] / 1000 AS v2004,

[2004] % 1000 AS q2004,

[2005] / 1000 AS v2005,

[2005] % 1000 AS q2005,

[2006] / 1000 AS v2006,

[2006] % 1000 AS q2006

FROM (SELECT product, market_year,

value * 1000 + quantity AS value

FROM Products) AS T

PIVOT

(SUM(value) FOR market_year IN

([2003], [2004], [2005], [2006])) AS P;

Dynamic Pivoting

SQL Server 2005 added the PIVOT operator which makes creating cross-tab queries very easy. However, as of this writing none of the SQL Server versions has built-in support for dynamic pivoting. The PIVOT operator only provides basic pivoting capabilities on a static list of values. In practice it is very often needed to perform this for dynamic list of values.

Here is one solution for dynamic pivoting that uses the ability in SQL Server 2005/2008 to concatenate row values using FOR XML PATH with blank element name. This method performs a query on the distinct values to pivot and creates a column list based on that. Then the resulting column list is used in a dynamic query utilizing the PIVOT operator and executed as dynamic SQL.

In this scenario the goal is to pivot order amounts by month.

-- Table with orders

CREATE TABLE Orders (

  order_id INT NOT NULL PRIMARY KEY,

  order_date DATETIME NOT NULL

            DEFAULT CURRENT_TIMESTAMP,

  amount DECIMAL(8, 2) NOT NULL DEFAULT 0

        CHECK (amount >= 0));

 

-- Sample data

INSERT INTO Orders

(order_id, order_date, amount)

SELECT 1, '20070101', 10.50

UNION ALL

SELECT 2, '20070126', 12.50

UNION ALL

SELECT 3, '20070130', 12.00

UNION ALL

SELECT 4, '20070214', 13.75

UNION ALL

SELECT 5, '20070220', 10.00

UNION ALL

SELECT 6, '20070306', 15.00

UNION ALL

SELECT 7, '20070310', 17.50

UNION ALL

SELECT 8, '20070329', 20.00;

 

-- Build list of column values to pivot

DECLARE @cols NVARCHAR(1000);

SELECT @cols =

STUFF((SELECT N'],[' + year_month

       FROM (SELECT DISTINCT CONVERT(NCHAR(7), order_date, 126)

            FROM Orders) AS O(year_month)

       ORDER BY year_month

       FOR XML PATH('')

      ), 1, 2, '') + N']';

 

-- Build dynamic SQL query for pivoting  

DECLARE @sql NVARCHAR(2000);

SET @sql =

N'SELECT order_year, ' + @cols +

N'FROM (SELECT DATEPART(yyyy, order_date) AS order_year, ' +

            N'CONVERT(NCHAR(7), order_date, 126) AS year_month, ' +

            N'amount ' +

      N'FROM Orders) AS O ' +

N'PIVOT ' +

N'(SUM(amount) FOR year_month IN (' + @cols + N')) AS P;';

 

EXEC(@sql);

 

/*

 

Results:

 

order_year 2007-01 2007-02 2007-03

----------- -------- -------- --------

2007        35.00    23.75    52.50

 

*/

This query can be further improved by using the QUOTENAME function to prevent any possibility of SQL injection.

How to extract IDENTITY_SEED and IDENTITY_INCREMENT from autoval column in syscolumns

There are many hints on the internet that in the autoval column in the system table syscolumns in MS SQL Server 2000 it is stored the identity seed and identity increment values for columns that have been set as identity. Problem is there are little clues on how to extract those values from that column, because the data is stored as a binary value.

After some testing on my side I found that the autoval column can split in four parts, two of which represent IDENTITY_SEED and IDENTITY_INCREMENT values. But there is a catch – those binary data of each segment must be first reversed and then the values can be extracted. Here is the script:

T-SQL:

SELECT tCol.[name]
        , IDENTITY_INCREMENT = CAST(CAST(REVERSE(SUBSTRING(autoval, 5, 4)) AS VARBINARY(8)) AS BIGINT)
        , IDENTITY_SEED = CAST(CAST(REVERSE(SUBSTRING(autoval, 9, 4)) AS VARBINARY(8)) AS BIGINT)
FROM [syscolumns] tCol
WHERE tCol.autoval IS NOT NULL

You would ask: “Why the heck to we need such complex query when there are built in function like OBJECTPROPERTY and COLUMNPRPERTY that retrieve those values?”. Well if you are working with linked servers, and you want to retrieve the values from the remote server and you are connected to the local one, those functions do not work 😉

Import XML File to SQL Table

Here is a brief example of importing an XML file into SQL Server table. This is accomplished by using the BULK option of OPENROWSET to load the file, and then utilizing the XQuery capabilities of SQL Server to parse the XML to normalized table format. This example requires SQL server 2005 or SQL Server 2008.

First, the following XML is saved to XML file C:Products.xml.

<Products>

  <Product>

    <SKU>1</SKU>

    <Desc>Book</Desc>

  </Product>

  <Product>

    <SKU>2</SKU>

    <Desc>DVD</Desc>

  </Product>

  <Product>

    <SKU>3</SKU>

    <Desc>Video</Desc>

  </Product>

</Products>

Next, a table named Products is created to store the XML data.

CREATE TABLE Products(

 sku INT PRIMARY KEY,

 product_desc VARCHAR(30));

Finally, the following statement will load the XML file, parse the XML elements to columns, and insert into the Products table:

INSERT INTO Products (sku, product_desc)

SELECT X.product.query('SKU').value('.', 'INT'),

       X.product.query('Desc').value('.', 'VARCHAR(30)')

FROM (

SELECT CAST(x AS XML)

FROM OPENROWSET(

    BULK 'C:Products.xml',

    SINGLE_BLOB) AS T(x)

    ) AS T(x)

CROSS APPLY x.nodes('Products/Product') AS X(product);

Here are the results:

SELECT sku, product_desc

FROM Products;

 

/*

 

Results:

 

sku         product_desc

----------- -------------

1           Book

2           DVD

3           Video

 

*/

Top N by Group

It is a very common request to select the top N items per group. Examples are the top 2 sales agents per region, the last two orders per customer, etc. There are various techniques to accomplish this. On SQL Server 2005 and 2008 this normally involves CROSS APPLY, TOP, and the ranking functions.

Here are a couple examples of solving this problem utilizing the ranking functions in SQL Server 2005/2008. These methods are very simple and efficient, at the same time providing flexibility to manage ties.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P'-- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Top 3 loans by loan type (no ties)

SELECT loan_nbr, loan_type, loan_amount, rk

FROM (SELECT loan_nbr, loan_type, loan_amount,

            ROW_NUMBER() OVER(PARTITION BY loan_type

                               ORDER BY loan_amount DESC) AS rk

      FROM Loans) AS L

WHERE rk <= 3;

 

-- Top 3 loans by loan type (with ties)

SELECT loan_nbr, loan_type, loan_amount, rk

FROM (SELECT loan_nbr, loan_type, loan_amount,

            DENSE_RANK() OVER(PARTITION BY loan_type

                               ORDER BY loan_amount DESC) AS rk

      FROM Loans) AS L

WHERE rk <= 3;

 

-- Latest loan for each customer

SELECT customer_nbr, loan_nbr, loan_type, loan_amount, loan_date

FROM (SELECT loan_nbr, loan_type, loan_amount,

            customer_nbr, loan_date,

            ROW_NUMBER() OVER(PARTITION BY customer_nbr

                               ORDER BY loan_date DESC) AS rk

      FROM Loans) AS L

WHERE rk = 1;

Aggregate Window Functions

The addition of the OVER clause in SQL Server was a great enhancement to the T-SQL language. Using the ranking functions has helped solve an array of problems in a very efficient manner. While there is a huge benefit of the ranking functions, it is often overlooked that the OVER clause supports now aggregate window functions. This means that the window aggregate function (SUM, AVG, COUNT, MIN, MAX) computes a value for each row from a result set derived from the window (partition). That opens an opportunity to solve problems using new methods.

Here are a couple examples to demonstrate using aggregate window functions with the OVER clause.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P'-- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Aggregate window functions with the OVER clause

SELECT loan_nbr, loan_type, loan_amount, customer_nbr,

       SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS total_customer_loans,

       AVG(loan_amount) OVER(PARTITION BY customer_nbr) AS avg_customer_loan_amt,

       MAX(loan_amount) OVER(PARTITION BY customer_nbr) AS max_customer_loan_amt,

       MIN(loan_amount) OVER(PARTITION BY customer_nbr) AS min_customer_loan_amt,

       COUNT(*) OVER(PARTITION BY customer_nbr) AS count_customer_loans

FROM Loans;

 

-- Calculate percent for current loan based on total customer loans

-- and total of all loans

SELECT loan_nbr, loan_type, loan_amount, customer_nbr,

       loan_amount /

       SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS percent_of_customer_loans,

       loan_amount /

       SUM(loan_amount) OVER() AS percent_of_all_loans

FROM Loans;

 

-- Get customers (and all their loans) with more than 2 loans

SELECT customer_nbr, loan_nbr, loan_amount, cnt

FROM (SELECT customer_nbr, loan_nbr, loan_amount,

            COUNT(*) OVER(PARTITION BY customer_nbr) AS cnt

      FROM Loans) AS L

WHERE cnt >= 2;

Resources:

OVER Clause
http://msdn.microsoft.com/en-us/library/ms189461.aspx

Paging with Ranking Functions

Paging through result sets is a very common need in applications. It provides a convenient way to deliver small chunks of data to the client application, minimizing network traffic and allowing end users to browse data in page size format.

There are different methods to accomplish the paging, both on client and server side. The introduction of the ranking functions in SQL Server 2005 (and SQL Server 2008) provides another efficient tool to implement paging.

The following example demonstrates paging utilizing the ROW_NUMBER function. Here it helps to generate sequence for each row ordered by the loan date column, and the sequence is later used to split the result set into pages.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P'-- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Paging

DECLARE @page_size INT = 5;

DECLARE @page_nbr  INT = 4;

 

WITH LoansRanked (loan_date, loan_amount, loan_type, seq)

AS

(SELECT loan_date, loan_amount, loan_type,

        ROW_NUMBER() OVER (ORDER BY loan_date, loan_nbr)

FROM Loans)

SELECT loan_date, loan_amount, loan_type, seq

FROM LoansRanked

WHERE seq > (@page_nbr - 1) * @page_size

  AND seq <= @page_nbr * @page_size;

 

/*

 

Results (4th page which contains only 3 rows):

 

loan_date               loan_amount   loan_type seq

----------------------- ------------- --------- ----

2008-08-16 00:00:00.000     2500.00 P         16

2008-08-17 00:00:00.000     10000.00 B         17

2008-08-20 00:00:00.000     6000.00 P         18

 

*/

Predicates in SQL

One of the most confusing topics for developers coming from traditional languages like C, C++, Java, C#, and VB to SQL, is the evaluation of predicates. In most languages predicate logic is evaluated left to right. SQL is a different story. It follows the concept of all-at-once operations. This means all logical predicates will be evaluated at the same time. In practice, the SQL engine is free to process the logical expressions in any order it finds appropriate and cost effective to retrieve the data.

Below is example to illustrate this behavior. Given a table with account policies, where a policy may be represented only as numeric values for certain accounts, and a mix of alpha-numeric characters for other accounts. Knowing that account 2 has policies that can be represented only as numeric values, the goal is to retrieve policies that are greater than 50000.

An attempt to add a first predicate to filter on account 2 and then a second one to select policies greater than 50000 will fail. Even if the query is formed with a derived table to extract first all policies for account 2 and then the outer query filters on the policy, it will not work. The derived table is expanded in the main query plan and a single query plan is produced. Then the query engine is free to push up and down the predicates in the plan as it finds efficient.

CREATE TABLE AccountPolicies (

 account_nbr INT,

 policy VARCHAR(20),

 PRIMARY KEY (account_nbr, policy));

 

INSERT INTO AccountPolicies VALUES(1, 'P1000234');

INSERT INTO AccountPolicies VALUES(1, 'P1020256');

INSERT INTO AccountPolicies VALUES(2, '1001');

INSERT INTO AccountPolicies VALUES(2, '5002');

INSERT INTO AccountPolicies VALUES(2, '50001');

INSERT INTO AccountPolicies VALUES(2, '50005');

INSERT INTO AccountPolicies VALUES(2, 'P50005');

 

-- Second predicate may be evaluated first

-- resulting in conversion error

SELECT account_nbr, policy

FROM AccountPolicies

WHERE account_nbr = 2

  AND CAST(policy AS INT) > 50000;

 

-- The derived table is expended in the

-- query plan and predicates can be pushed

-- up or down in the plan

SELECT account_nbr, policy

FROM (SELECT account_nbr, policy

      FROM AccountPolicies

      WHERE account_nbr = 2) AS P

WHERE CAST(policy AS INT) > 50000;

One way to solve this is to use a CASE expression to force the desired evaluation because WHEN clauses in CASE are evaluated in order. This solution not only demonstrates how to handle the evaluation process, but adds verification that only policies with numeric values are casted (as usual business rules change and later account 2 may be allowed to create policies with alpha-numeric characters).

-- Enforce sequence of evaluating conditions

-- and cast only valid values

SELECT account_nbr, policy

FROM AccountPolicies

WHERE CASE

        WHEN account_nbr = 2

        THEN CASE

               WHEN policy NOT LIKE '%[^0-9]%'

               THEN CASE

                      WHEN CAST(policy AS INT) > 50000

                      THEN 'True'

                      ELSE 'False'

                    END

               ELSE 'False'

            END            

        ELSE 'False'

      END = 'True';

SQL Server 2008 Launch Event

With SQL Server 2008 just released, there is a lot of interest of what it offers. On Friday, September 12, you can hear about all the new features of SQL Server 2008 in a full day of free training brought to you by JSSUG, Idea Integration and Pragmatic Works. Each session will dive deeply into new features from a BI, DBA and developer perspective around SQL Server 2008. To resister go to http://www.sqlsaturday.com and select the SQL Server 2008 Launch Event.

I will be presenting at this event the following session:

– T-SQL Enhancements in SQL Server 2008

Code Camp

On August 23, 2008 is the Jax Code Camp IV in Jacksonville. A full day of coding and free training on different technologies: WCF, LINQ, Cold Fusion, Flex/Air, Silverlight, Ruby on Rails, .Net Compact Framework, SharePoint, SQL Server 2008, and more. To sign up and see the full schedule visit the Code Camp website at http://www.jaxcodecamp.com/.

I will be presenting three sessions:

– Advanced SQL Querying Techniques
– SQL Injection
– SQL Server 2008 for Developers