Tag Archive for: T-SQL

Dynamic Pivoting

SQL Server 2005 added the PIVOT operator which makes creating cross-tab queries very easy. However, as of this writing none of the SQL Server versions has built-in support for dynamic pivoting. The PIVOT operator only provides basic pivoting capabilities on a static list of values. In practice it is very often needed to perform this for dynamic list of values.

Here is one solution for dynamic pivoting that uses the ability in SQL Server 2005/2008 to concatenate row values using FOR XML PATH with blank element name. This method performs a query on the distinct values to pivot and creates a column list based on that. Then the resulting column list is used in a dynamic query utilizing the PIVOT operator and executed as dynamic SQL.

In this scenario the goal is to pivot order amounts by month.

-- Table with orders

CREATE TABLE Orders (

  order_id INT NOT NULL PRIMARY KEY,

  order_date DATETIME NOT NULL

            DEFAULT CURRENT_TIMESTAMP,

  amount DECIMAL(8, 2) NOT NULL DEFAULT 0

        CHECK (amount >= 0));

 

-- Sample data

INSERT INTO Orders

(order_id, order_date, amount)

SELECT 1, '20070101', 10.50

UNION ALL

SELECT 2, '20070126', 12.50

UNION ALL

SELECT 3, '20070130', 12.00

UNION ALL

SELECT 4, '20070214', 13.75

UNION ALL

SELECT 5, '20070220', 10.00

UNION ALL

SELECT 6, '20070306', 15.00

UNION ALL

SELECT 7, '20070310', 17.50

UNION ALL

SELECT 8, '20070329', 20.00;

 

-- Build list of column values to pivot

DECLARE @cols NVARCHAR(1000);

SELECT @cols =

STUFF((SELECT N'],[' + year_month

       FROM (SELECT DISTINCT CONVERT(NCHAR(7), order_date, 126)

            FROM Orders) AS O(year_month)

       ORDER BY year_month

       FOR XML PATH('')

      ), 1, 2, '') + N']';

 

-- Build dynamic SQL query for pivoting  

DECLARE @sql NVARCHAR(2000);

SET @sql =

N'SELECT order_year, ' + @cols +

N'FROM (SELECT DATEPART(yyyy, order_date) AS order_year, ' +

            N'CONVERT(NCHAR(7), order_date, 126) AS year_month, ' +

            N'amount ' +

      N'FROM Orders) AS O ' +

N'PIVOT ' +

N'(SUM(amount) FOR year_month IN (' + @cols + N')) AS P;';

 

EXEC(@sql);

 

/*

 

Results:

 

order_year 2007-01 2007-02 2007-03

----------- -------- -------- --------

2007        35.00    23.75    52.50

 

*/

This query can be further improved by using the QUOTENAME function to prevent any possibility of SQL injection.

How to extract IDENTITY_SEED and IDENTITY_INCREMENT from autoval column in syscolumns

There are many hints on the internet that in the autoval column in the system table syscolumns in MS SQL Server 2000 it is stored the identity seed and identity increment values for columns that have been set as identity. Problem is there are little clues on how to extract those values from that column, because the data is stored as a binary value.

After some testing on my side I found that the autoval column can split in four parts, two of which represent IDENTITY_SEED and IDENTITY_INCREMENT values. But there is a catch – those binary data of each segment must be first reversed and then the values can be extracted. Here is the script:

T-SQL:

SELECT tCol.[name]
        , IDENTITY_INCREMENT = CAST(CAST(REVERSE(SUBSTRING(autoval, 5, 4)) AS VARBINARY(8)) AS BIGINT)
        , IDENTITY_SEED = CAST(CAST(REVERSE(SUBSTRING(autoval, 9, 4)) AS VARBINARY(8)) AS BIGINT)
FROM [syscolumns] tCol
WHERE tCol.autoval IS NOT NULL

You would ask: “Why the heck to we need such complex query when there are built in function like OBJECTPROPERTY and COLUMNPRPERTY that retrieve those values?”. Well if you are working with linked servers, and you want to retrieve the values from the remote server and you are connected to the local one, those functions do not work 😉

Import XML File to SQL Table

Here is a brief example of importing an XML file into SQL Server table. This is accomplished by using the BULK option of OPENROWSET to load the file, and then utilizing the XQuery capabilities of SQL Server to parse the XML to normalized table format. This example requires SQL server 2005 or SQL Server 2008.

First, the following XML is saved to XML file C:Products.xml.

<Products>

  <Product>

    <SKU>1</SKU>

    <Desc>Book</Desc>

  </Product>

  <Product>

    <SKU>2</SKU>

    <Desc>DVD</Desc>

  </Product>

  <Product>

    <SKU>3</SKU>

    <Desc>Video</Desc>

  </Product>

</Products>

Next, a table named Products is created to store the XML data.

CREATE TABLE Products(

 sku INT PRIMARY KEY,

 product_desc VARCHAR(30));

Finally, the following statement will load the XML file, parse the XML elements to columns, and insert into the Products table:

INSERT INTO Products (sku, product_desc)

SELECT X.product.query('SKU').value('.', 'INT'),

       X.product.query('Desc').value('.', 'VARCHAR(30)')

FROM (

SELECT CAST(x AS XML)

FROM OPENROWSET(

    BULK 'C:Products.xml',

    SINGLE_BLOB) AS T(x)

    ) AS T(x)

CROSS APPLY x.nodes('Products/Product') AS X(product);

Here are the results:

SELECT sku, product_desc

FROM Products;

 

/*

 

Results:

 

sku         product_desc

----------- -------------

1           Book

2           DVD

3           Video

 

*/

Top N by Group

It is a very common request to select the top N items per group. Examples are the top 2 sales agents per region, the last two orders per customer, etc. There are various techniques to accomplish this. On SQL Server 2005 and 2008 this normally involves CROSS APPLY, TOP, and the ranking functions.

Here are a couple examples of solving this problem utilizing the ranking functions in SQL Server 2005/2008. These methods are very simple and efficient, at the same time providing flexibility to manage ties.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P'-- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Top 3 loans by loan type (no ties)

SELECT loan_nbr, loan_type, loan_amount, rk

FROM (SELECT loan_nbr, loan_type, loan_amount,

            ROW_NUMBER() OVER(PARTITION BY loan_type

                               ORDER BY loan_amount DESC) AS rk

      FROM Loans) AS L

WHERE rk <= 3;

 

-- Top 3 loans by loan type (with ties)

SELECT loan_nbr, loan_type, loan_amount, rk

FROM (SELECT loan_nbr, loan_type, loan_amount,

            DENSE_RANK() OVER(PARTITION BY loan_type

                               ORDER BY loan_amount DESC) AS rk

      FROM Loans) AS L

WHERE rk <= 3;

 

-- Latest loan for each customer

SELECT customer_nbr, loan_nbr, loan_type, loan_amount, loan_date

FROM (SELECT loan_nbr, loan_type, loan_amount,

            customer_nbr, loan_date,

            ROW_NUMBER() OVER(PARTITION BY customer_nbr

                               ORDER BY loan_date DESC) AS rk

      FROM Loans) AS L

WHERE rk = 1;

Aggregate Window Functions

The addition of the OVER clause in SQL Server was a great enhancement to the T-SQL language. Using the ranking functions has helped solve an array of problems in a very efficient manner. While there is a huge benefit of the ranking functions, it is often overlooked that the OVER clause supports now aggregate window functions. This means that the window aggregate function (SUM, AVG, COUNT, MIN, MAX) computes a value for each row from a result set derived from the window (partition). That opens an opportunity to solve problems using new methods.

Here are a couple examples to demonstrate using aggregate window functions with the OVER clause.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P'-- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Aggregate window functions with the OVER clause

SELECT loan_nbr, loan_type, loan_amount, customer_nbr,

       SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS total_customer_loans,

       AVG(loan_amount) OVER(PARTITION BY customer_nbr) AS avg_customer_loan_amt,

       MAX(loan_amount) OVER(PARTITION BY customer_nbr) AS max_customer_loan_amt,

       MIN(loan_amount) OVER(PARTITION BY customer_nbr) AS min_customer_loan_amt,

       COUNT(*) OVER(PARTITION BY customer_nbr) AS count_customer_loans

FROM Loans;

 

-- Calculate percent for current loan based on total customer loans

-- and total of all loans

SELECT loan_nbr, loan_type, loan_amount, customer_nbr,

       loan_amount /

       SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS percent_of_customer_loans,

       loan_amount /

       SUM(loan_amount) OVER() AS percent_of_all_loans

FROM Loans;

 

-- Get customers (and all their loans) with more than 2 loans

SELECT customer_nbr, loan_nbr, loan_amount, cnt

FROM (SELECT customer_nbr, loan_nbr, loan_amount,

            COUNT(*) OVER(PARTITION BY customer_nbr) AS cnt

      FROM Loans) AS L

WHERE cnt >= 2;

Resources:

OVER Clause
http://msdn.microsoft.com/en-us/library/ms189461.aspx

Paging with Ranking Functions

Paging through result sets is a very common need in applications. It provides a convenient way to deliver small chunks of data to the client application, minimizing network traffic and allowing end users to browse data in page size format.

There are different methods to accomplish the paging, both on client and server side. The introduction of the ranking functions in SQL Server 2005 (and SQL Server 2008) provides another efficient tool to implement paging.

The following example demonstrates paging utilizing the ROW_NUMBER function. Here it helps to generate sequence for each row ordered by the loan date column, and the sequence is later used to split the result set into pages.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P'-- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Paging

DECLARE @page_size INT = 5;

DECLARE @page_nbr  INT = 4;

 

WITH LoansRanked (loan_date, loan_amount, loan_type, seq)

AS

(SELECT loan_date, loan_amount, loan_type,

        ROW_NUMBER() OVER (ORDER BY loan_date, loan_nbr)

FROM Loans)

SELECT loan_date, loan_amount, loan_type, seq

FROM LoansRanked

WHERE seq > (@page_nbr - 1) * @page_size

  AND seq <= @page_nbr * @page_size;

 

/*

 

Results (4th page which contains only 3 rows):

 

loan_date               loan_amount   loan_type seq

----------------------- ------------- --------- ----

2008-08-16 00:00:00.000     2500.00 P         16

2008-08-17 00:00:00.000     10000.00 B         17

2008-08-20 00:00:00.000     6000.00 P         18

 

*/

Predicates in SQL

One of the most confusing topics for developers coming from traditional languages like C, C++, Java, C#, and VB to SQL, is the evaluation of predicates. In most languages predicate logic is evaluated left to right. SQL is a different story. It follows the concept of all-at-once operations. This means all logical predicates will be evaluated at the same time. In practice, the SQL engine is free to process the logical expressions in any order it finds appropriate and cost effective to retrieve the data.

Below is example to illustrate this behavior. Given a table with account policies, where a policy may be represented only as numeric values for certain accounts, and a mix of alpha-numeric characters for other accounts. Knowing that account 2 has policies that can be represented only as numeric values, the goal is to retrieve policies that are greater than 50000.

An attempt to add a first predicate to filter on account 2 and then a second one to select policies greater than 50000 will fail. Even if the query is formed with a derived table to extract first all policies for account 2 and then the outer query filters on the policy, it will not work. The derived table is expanded in the main query plan and a single query plan is produced. Then the query engine is free to push up and down the predicates in the plan as it finds efficient.

CREATE TABLE AccountPolicies (

 account_nbr INT,

 policy VARCHAR(20),

 PRIMARY KEY (account_nbr, policy));

 

INSERT INTO AccountPolicies VALUES(1, 'P1000234');

INSERT INTO AccountPolicies VALUES(1, 'P1020256');

INSERT INTO AccountPolicies VALUES(2, '1001');

INSERT INTO AccountPolicies VALUES(2, '5002');

INSERT INTO AccountPolicies VALUES(2, '50001');

INSERT INTO AccountPolicies VALUES(2, '50005');

INSERT INTO AccountPolicies VALUES(2, 'P50005');

 

-- Second predicate may be evaluated first

-- resulting in conversion error

SELECT account_nbr, policy

FROM AccountPolicies

WHERE account_nbr = 2

  AND CAST(policy AS INT) > 50000;

 

-- The derived table is expended in the

-- query plan and predicates can be pushed

-- up or down in the plan

SELECT account_nbr, policy

FROM (SELECT account_nbr, policy

      FROM AccountPolicies

      WHERE account_nbr = 2) AS P

WHERE CAST(policy AS INT) > 50000;

One way to solve this is to use a CASE expression to force the desired evaluation because WHEN clauses in CASE are evaluated in order. This solution not only demonstrates how to handle the evaluation process, but adds verification that only policies with numeric values are casted (as usual business rules change and later account 2 may be allowed to create policies with alpha-numeric characters).

-- Enforce sequence of evaluating conditions

-- and cast only valid values

SELECT account_nbr, policy

FROM AccountPolicies

WHERE CASE

        WHEN account_nbr = 2

        THEN CASE

               WHEN policy NOT LIKE '%[^0-9]%'

               THEN CASE

                      WHEN CAST(policy AS INT) > 50000

                      THEN 'True'

                      ELSE 'False'

                    END

               ELSE 'False'

            END            

        ELSE 'False'

      END = 'True';

Division by Zero

Handling division by zero is a common need in SQL queries. Here is the standard way of performing a check if divisor is zero using the CASE function.

-- Sample table

CREATE TABLE Foo (

 keycol INT PRIMARY KEY,

 x INT,

 y INT);

 

INSERT INTO Foo

VALUES(1, 15, 3), (2, 10, 0);

 

-- Using CASE to check divisor for zero

SELECT keycol,

       CASE WHEN y = 0

            THEN 0

            ELSE x / y

       END AS result

FROM Foo;

While this is a very natural way to handle this problem, here is another approach which is more compact and as effective. The method below uses the NULLIF function to set the divisor to NULL if it is zero, then since the result from the division will be NULL the COALESCE function is used to return 0.

-- Using NULLIF and COALESCE

SELECT keycol,

       COALESCE(x / NULLIF(y, 0), 0) AS result

FROM Foo;

Top 10 T-SQL Enhancements in SQL Server 2008

As SQL Server 2008 is getting in the advances phases of release, a lot has been published about new features and changes. There have been a few enhancements to Transact SQL that will be a great help for both developers and administrators. Here is list of the top 10 in no particular order, with links to detailed examples. In other words:

SELECT TOP(10) t_sql_enhancements

FROM SQLServer2008

ORDER BY CHECKSUM(NEWID());

• Delighters: http://pratchev.blogspot.com/2008/04/delighters-in-sql-server-2008.html
• Row constructors: http://pratchev.blogspot.com/2008/01/table-value-constructors-in-sql-server.html
• MERGE: http://pratchev.blogspot.com/2008/03/upsert-and-more-with-merge.html
• Composable DML: http://pratchev.blogspot.com/2008/04/composable-dml.html
• Table valued parameters: http://pratchev.blogspot.com/2008/04/table-valued-parameters.html
• Filtered indexes: http://pratchev.blogspot.com/2008/04/filtered-indexes.html
• Sparse columns: http://pratchev.blogspot.com/2008/04/sparse-columns.html
• Hierarchy ID: http://pratchev.blogspot.com/2008/05/hierarchies-in-sql-server-2008.html
• Date and Time data types: http://pratchev.blogspot.com/2008/05/new-date-and-time-data-types.html
• FILESTREAM data type: http://pratchev.blogspot.com/2008/05/filestream-data-type.html

Here are a few more that did not make the list (casualty of random ordering) but are still of importance:

• Spatial data (GEOMETRY and GEOGRAPHY): http://pratchev.blogspot.com/2008/06/spatial-support-in-sql-server.html
• Grouping Sets: http://msdn.microsoft.com/en-us/library/bb522495(SQL.100).aspx
• Table hints: http://msdn.microsoft.com/en-us/library/bb510478(SQL.100).aspx
• Star join query optimizations: http://technet.microsoft.com/en-us/magazine/cc434693(TechNet.10).aspx
• T-SQL debugger: http://blogs.msdn.com/buckwoody/archive/2008/04/25/sql-server-2008-management-improvements-t-sql-debugger.aspx

Sparse Columns

Sparse columns are another addition to the SQL Server 2008 tool belt. They offer a good trade-off for many applications: taking no space if they are empty and more space if they have data. In other words they optimize storage for NULL values.

Sparse columns are just like ordinary columns with a few limitations. They are defined with the SPARSE keyword and there is no difference on how they are used in data manipulation statements.

Here are the details:

Pros:
• Storing NULL in a sparse column takes up no space at all
• Up to 30,000 columns
• To any external application the column will behave the same
• Sparse columns fit well with filtered indexes to index on non-NULL values

Cons:
• If a sparse column has data it takes 4 more bytes than a normal column
• Not all data types can be sparse: TEXT, NTEXT, IMAGE, TIMESTAMP, user-defined data types, GEOMETRY, GEOGRAPHY, and VARBINARY(MAX) with the FILESTREAM attribute
• Computed columns cannot be sparse
• Cannot have default values

Here is example of creating and using sparse columns.

CREATE TABLE Survey (

  survey_nbr INT NOT NULL PRIMARY KEY,

  survey_desc VARCHAR(30),

  survey_info1 VARCHAR(30) SPARSE NULL,

  survey_info2 VARCHAR(30) SPARSE NULL,

  survey_info3 VARCHAR(30) SPARSE NULL,

  survey_info4 VARCHAR(30) SPARSE NULL);

 

INSERT INTO Survey(survey_nbr, survey_desc, survey_info1)

VALUES(1, 'Survey 1', 'some data'),

      (2, 'Survey 2', 'good');

Tables with sparse columns can have defined column set to manipulate all sparse columns as a structure. The column set is untyped XML column and is treated as any other XML column.

Here are the characteristics of column sets:

• Defined as XML data type column
• Not stored, like computed column, but updatable
• Only one column set per table
• Cannot be changed
• Cannot be added if the table already has sparse columns
• Limited by XML data size limit of 2 GB (all sparse columns in a row cannot exceed 2 GB)
• Cannot define constraints or default values
• Allows to update or insert sparse column values
• On update all sparse columns are updated, when no value provided sets the column to NULL

Follows example of creating a column set and inserting data into sparse columns using the column set.

CREATE TABLE Survey (

  survey_nbr INT NOT NULL PRIMARY KEY,

  survey_desc VARCHAR(30),

  survey_info1 VARCHAR(30) SPARSE NULL,

  survey_info2 VARCHAR(30) SPARSE NULL,

  survey_info3 VARCHAR(30) SPARSE NULL,

  survey_info4 VARCHAR(30) SPARSE NULL,

  survey_set XML column_set FOR ALL_SPARSE_COLUMNS);

 

-- Insert using the regular sparse columns

INSERT INTO Survey(survey_nbr, survey_desc, survey_info1)

VALUES(1, 'Survey 1', 'some data'),

      (2, 'Survey 2', 'good');

 

-- Insert using the column set

INSERT INTO Survey(survey_nbr, survey_desc, survey_set)

VALUES(3, 'Survey 3', '<survey_info3>data 3</survey_info3><survey_info4>answer 4</survey_info4>');

Any mistyping of column name in the XML column set or providing invalid data type value will result in error as when manipulating directly the columns.

Resources:

Using Sparse Columns
http://technet.microsoft.com/en-us/library/cc280604(SQL.100).aspx

Using Column Sets
http://technet.microsoft.com/en-us/library/cc280521(SQL.100).aspx