## Unpivoting Data

Unpivoting data is a very useful technique for normalizing denormalized tables. It refers to rotating values from multiple columns into rows of a single column. In other words, taking the data below:

```
sales_year  first_quarter  second_quarter   third_quarter   fourth_quarter
----------  -------------  --------------   -------------   --------------
2006        211203.50      381594.95        439187.00       503155.80
2007        231205.10      451101.25        601209.40       531907.30
```

and converting to look like this:

```
sales_year   sales_quarter  sales_amount
-----------  -------------  ------------
2006         1              211203.50
2006         2              381594.95
2006         3              439187.00
2006         4              503155.80
2007         1              231205.10
2007         2              451101.25
2007         3              601209.40
2007         4              531907.30
```

Here are three different methods to achieve that:

```sql
-- Create sample table
CREATE TABLE QuarterlySales (
  sales_year INT PRIMARY KEY,
  first_quarter DECIMAL(10, 2),
  second_quarter DECIMAL(10, 2),
  third_quarter DECIMAL(10, 2),
  fourth_quarter DECIMAL(10, 2));

-- Insert data
INSERT INTO QuarterlySales
VALUES(2006, 211203.50, 381594.95, 439187.00, 503155.80);

INSERT INTO QuarterlySales
VALUES(2007, 231205.10, 451101.25, 601209.40, 531907.30);

-- Method 1) Using UNION ALL
SELECT sales_year,
       1 AS sales_quarter,
       first_quarter AS sales_amount
FROM QuarterlySales
UNION ALL
SELECT sales_year, 2, second_quarter
FROM QuarterlySales
UNION ALL
SELECT sales_year, 3, third_quarter
FROM QuarterlySales
UNION ALL
SELECT sales_year, 4, fourth_quarter
FROM QuarterlySales
ORDER BY sales_year, sales_quarter;

-- Method 2) Using cross join with table with numbers
SELECT sales_year,
       qtr AS sales_quarter,
       CASE qtr
         WHEN 1 THEN first_quarter
         WHEN 2 THEN second_quarter
         WHEN 3 THEN third_quarter
         WHEN 4 THEN fourth_quarter
       END AS sales_amount
FROM QuarterlySales
CROSS JOIN
(SELECT 1 UNION ALL SELECT 2
 UNION ALL SELECT 3
 UNION ALL SELECT 4) AS Quarters(qtr)
ORDER BY sales_year, sales_quarter;

-- Method 3) Using the UNPIVOT operator in SQL Server 2005
SELECT sales_year,
       CASE sales_quarter
         WHEN 'first_quarter' THEN 1
         WHEN 'second_quarter' THEN 2
         WHEN 'third_quarter' THEN 3
         WHEN 'fourth_quarter' THEN 4
       END AS sales_quarter,
       sales_amount
FROM QuarterlySales
UNPIVOT
(sales_amount FOR sales_quarter IN
  (first_quarter, second_quarter,
   third_quarter, fourth_quarter)) AS U
ORDER BY sales_year, sales_quarter;
```

It is worth noting that unpivoting multiple column groups (like amount1, date1, amount2, date2 into amount and date columns) is not supported by the SQL Server 2005 UNPIVOT operator. That can be achieved only with the first two methods.
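As a rough sketch of that multi-column case, the cross join technique from Method 2 extends naturally by using one CASE expression per target column (the Payments table and its column names below are hypothetical, not from the examples above):

```sql
-- Hypothetical table with repeating amount/date column pairs
CREATE TABLE Payments (
  pmt_id INT PRIMARY KEY,
  amount1 DECIMAL(10, 2), date1 DATETIME,
  amount2 DECIMAL(10, 2), date2 DATETIME);

-- One CASE expression per target column, all driven by the same number
SELECT pmt_id,
       n AS pmt_nbr,
       CASE n WHEN 1 THEN amount1 WHEN 2 THEN amount2 END AS amount,
       CASE n WHEN 1 THEN date1   WHEN 2 THEN date2   END AS pmt_date
FROM Payments
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) AS Nbrs(n)
ORDER BY pmt_id, pmt_nbr;
```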

Using PIVOT and UNPIVOT
http://msdn2.microsoft.com/en-us/library/ms177410.aspx

## Indexes on Table Variables

A common misunderstanding when comparing temporary tables and table variables is that table variables cannot have indexes. While you cannot execute CREATE INDEX on a table variable, an index can be part of the table variable's definition, via a PRIMARY KEY or UNIQUE constraint.

Here is one example with two indexes on a table variable.

```sql
SET SHOWPLAN_TEXT ON;
GO

DECLARE @Employees TABLE (
  emp_nbr INT PRIMARY KEY NONCLUSTERED,
  emp_name VARCHAR(35),
  ssn CHAR(11) UNIQUE CLUSTERED);

INSERT INTO @Employees VALUES(1, 'Jim Brown', '987-11-0234');
INSERT INTO @Employees VALUES(2, 'Chris Defoe', '987-13-9899');
INSERT INTO @Employees VALUES(3, 'Keith Goldwin', '986-31-1543');

SELECT emp_nbr, emp_name, ssn
FROM @Employees
WHERE emp_nbr = 2;

SELECT emp_nbr, emp_name, ssn
FROM @Employees
WHERE ssn LIKE '998-13%';

-- Results
  |--Nested Loops(Inner Join, OUTER REFERENCES:([ssn]))
       |--Index Seek(OBJECT:(@Employees), SEEK:([emp_nbr]=(2)) ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:(@Employees), SEEK:([ssn]=[ssn]) LOOKUP ORDERED FORWARD)

  |--Clustered Index Seek(OBJECT:(@Employees), SEEK:([ssn] >= '998-12þ' AND [ssn] < '998-14'), WHERE:([ssn] like '998-13%') ORDERED FORWARD)
```

## Use ABS to Find the Closest Match

The ABS function in SQL Server is a little overlooked and underutilized. It returns the absolute (positive) value of the given numeric expression.

One class of problems where ABS is very useful is finding the closest match to a given value, such as the closest ZIP code or date. Below are two examples of using ABS for this purpose.

```sql
-- Create table with sales regions
CREATE TABLE SalesRegions (
  zip_code INT PRIMARY KEY,
  region_name VARCHAR(35));

-- Insert sample data
INSERT INTO SalesRegions VALUES(40320, 'North Area');
INSERT INTO SalesRegions VALUES(40324, 'North/East Area');
INSERT INTO SalesRegions VALUES(40326, 'North/West Area');

-- Define search zip code
DECLARE @search_zip INT;
SET @search_zip = 40323;

-- Find the closest region by zip code
SELECT TOP(1) zip_code, region_name
FROM SalesRegions
ORDER BY ABS(zip_code - @search_zip);

-- Results
zip_code    region_name
----------- ---------------
40324       North/East Area

-- Create Employee table
CREATE TABLE Employees (
  employee_nbr INT PRIMARY KEY,
  employee_name VARCHAR(35));

-- Employee evaluation dates
CREATE TABLE EmployeeEvaluations (
  employee_nbr INT,
  evaluation_date DATETIME,
  PRIMARY KEY (employee_nbr, evaluation_date));

-- Sample data
INSERT INTO Employees VALUES(1, 'John Doe');
INSERT INTO Employees VALUES(2, 'Jeff Brown');

INSERT INTO EmployeeEvaluations VALUES(1, '20070101');
INSERT INTO EmployeeEvaluations VALUES(1, '20080101');
INSERT INTO EmployeeEvaluations VALUES(1, '20080304');
INSERT INTO EmployeeEvaluations VALUES(2, '20080604');

-- Find the closest evaluation date (could be in the future)
SELECT A.employee_nbr, A.employee_name, B.evaluation_date
FROM Employees AS A
CROSS APPLY
(SELECT TOP(1) B.evaluation_date
 FROM EmployeeEvaluations AS B
 WHERE A.employee_nbr = B.employee_nbr
 ORDER BY ABS(DATEDIFF(DAY, B.evaluation_date, CURRENT_TIMESTAMP))) AS B;

-- Results
employee_nbr employee_name  evaluation_date
------------ -------------- -----------------------
1            John Doe       2008-03-04 00:00:00.000
2            Jeff Brown     2008-06-04 00:00:00.000
```

## Hierarchies with CTEs

Common table expressions (CTEs) have many applications. In particular, their ability to implement recursive queries is very useful for navigating and manipulating hierarchies.

Here is one brief example. Given a table with employees and their managers represented as an adjacency list, produce a list of the employees that report to a particular manager, ordered by the natural hierarchy order (each employee listed under the corresponding manager).

```sql
-- Create sample table with employees and managers
CREATE TABLE Employees (
  employee_nbr INT NOT NULL PRIMARY KEY,
  employee_name VARCHAR(35),
  manager_nbr INT NULL REFERENCES Employees(employee_nbr));

INSERT INTO Employees VALUES (1, 'John Doe', NULL);
INSERT INTO Employees VALUES (2, 'James Brown', 1);
INSERT INTO Employees VALUES (3, 'Keith Green', NULL);
INSERT INTO Employees VALUES (4, 'Peter Roth', 2);
INSERT INTO Employees VALUES (5, 'Hans Gruber', 2);
INSERT INTO Employees VALUES (6, 'Kris Evans', 4);
INSERT INTO Employees VALUES (7, 'Jeff Colleman', NULL);

-- Use recursive CTE to build binary and character paths
WITH EmployeeHierarchy
AS
(SELECT employee_nbr, employee_name, manager_nbr,
        CAST(employee_nbr AS VARBINARY(MAX)) AS bpath,
        CAST('.' +
             CAST(employee_nbr AS VARCHAR(4)) +
             '.' AS VARCHAR(MAX)) AS cpath
 FROM Employees
 WHERE manager_nbr IS NULL
 UNION ALL
 SELECT E.employee_nbr, E.employee_name, E.manager_nbr,
        H.bpath + CAST(E.employee_nbr AS BINARY(4)),
        H.cpath + CAST(E.employee_nbr AS VARCHAR(4)) + '.'
 FROM Employees AS E
 JOIN EmployeeHierarchy AS H
   ON E.manager_nbr = H.employee_nbr)
SELECT employee_nbr, employee_name, manager_nbr, bpath, cpath
FROM EmployeeHierarchy
WHERE cpath LIKE '%.2.%' -- filter all employees for manager 2
ORDER BY bpath;          -- order by natural hierarchy path

-- Results
employee_nbr employee_name  manager_nbr bpath                              cpath
------------ -------------- ----------- ---------------------------------- ----------
2            James Brown    1           0x0000000100000002                 .1.2.
4            Peter Roth     2           0x000000010000000200000004         .1.2.4.
6            Kris Evans     4           0x00000001000000020000000400000006 .1.2.4.6.
5            Hans Gruber    2           0x000000010000000200000005         .1.2.5.
```

The method above creates two manager/employee paths: a binary path that preserves the natural ordering at each level and can be used for sorting, and a character path that can be used to filter by manager.

## Parameter Sniffing

What is “parameter sniffing”? When a stored procedure is first executed, SQL Server looks at the input parameter values and uses them as guidance for building the query plan. This is known as “parameter sniffing”.

This is good as long as the input parameters of the first invocation are typical of future invocations. But if that is not the case, it will cause performance problems.

For example, consider a procedure that retrieves all order rows for a given customer, with a nonclustered index on the customer column. If the first invocation returns a small set of orders, an index seek may be the most efficient plan. Later invocations may return large sets of orders, but the cached plan with the index seek will be reused, resulting in poor performance (instead of using a scan).

Here is one example stored procedure and different methods to handle parameter sniffing.

```sql
CREATE PROCEDURE GetCustomerOrders
  @customerid NCHAR(5)
AS
BEGIN

    SELECT orderid, customerid, orderdate, shippeddate
    FROM Orders
    WHERE customerid = @customerid;

END
```

### Replace parameters with local variables

This solution assigns the stored procedure parameters to local variables and then uses the local variables in the query. It works because SQL Server does not sniff local variables; using them in place of the parameters forces a plan based on the general column statistics (in effect, disabling parameter sniffing).

```sql
CREATE PROCEDURE GetCustomerOrders
  @customerid NCHAR(5)
AS
BEGIN

    DECLARE @local_customerid NCHAR(5);

    SET @local_customerid = @customerid;

    SELECT orderid, customerid, orderdate, shippeddate
    FROM Orders
    WHERE customerid = @local_customerid;

END
```

### Execute using WITH RECOMPILE

This solution forces a recompile of the stored procedure on each run, thereby generating a fresh query plan for the current parameters. Note that it recompiles all statements inside the stored procedure.

```sql
EXEC GetCustomerOrders @customerid = N'CACYK' WITH RECOMPILE;
```

### Query hint RECOMPILE

SQL Server 2005 offers the new query hint RECOMPILE, which forces recompilation of an individual query. This method is better than the previous one because recompilation affects only that one statement; the other queries in the stored procedure are not recompiled.

```sql
CREATE PROCEDURE GetCustomerOrders
  @customerid NCHAR(5)
AS
BEGIN

    SELECT orderid, customerid, orderdate, shippeddate
    FROM Orders
    WHERE customerid = @customerid
    OPTION (RECOMPILE);

END
```

### Query hint OPTIMIZE FOR

Another new query hint in SQL Server 2005 is OPTIMIZE FOR. It allows specifying a constant that will be used to optimize the query plan instead of the sniffed parameter value. This can be useful when a particular selective value is known to be used for most invocations of the stored procedure. However, any other parameter value will suffer the same performance problems.

```sql
CREATE PROCEDURE GetCustomerOrders
  @customerid NCHAR(5)
AS
BEGIN

    SELECT orderid, customerid, orderdate, shippeddate
    FROM Orders
    WHERE customerid = @customerid
    OPTION (OPTIMIZE FOR (@customerid = N'CACYK'));

END
```

Note: SQL Server 2008 adds the option to specify “OPTION (OPTIMIZE FOR UNKNOWN)”. This directs the query optimizer to use statistical data instead of the initial parameter value when optimizing the query.
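As a sketch of that 2008-only option, the hint fits in the same place as the OPTIMIZE FOR constant above:

```sql
CREATE PROCEDURE GetCustomerOrders
  @customerid NCHAR(5)
AS
BEGIN

    SELECT orderid, customerid, orderdate, shippeddate
    FROM Orders
    WHERE customerid = @customerid
    OPTION (OPTIMIZE FOR UNKNOWN);  -- SQL Server 2008 and later

END
```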

### Plan Guides

Plan guides in SQL Server 2005 provide the opportunity to optimize a query without changing its actual code. This is especially useful with third party vendor applications where access to the code may not be available. A plan guide associates query hints with a query without modifying the query itself.

```sql
EXEC sp_create_plan_guide
     @name = N'SolveParameterSniffing',
     @stmt = N'SELECT orderid, customerid, orderdate, shippeddate
               FROM Orders
               WHERE customerid = @customerid',
     @type = N'OBJECT',
     @module_or_batch = N'GetCustomerOrders',
     @params = NULL,
     @hints = N'OPTION (RECOMPILE)';
```

### USE PLAN query hint

Another plan stability feature in SQL Server 2005 is the USE PLAN query hint, which takes an XML showplan as its argument and forces the use of that specific plan every time the query is run.
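A minimal sketch of the shape of the hint is below. The XML showplan must first be captured (for example, with SET STATISTICS XML ON), and the plan text here is an elided placeholder, not a real plan:

```sql
SELECT orderid, customerid, orderdate, shippeddate
FROM Orders
WHERE customerid = N'CACYK'
OPTION (USE PLAN N'<ShowPlanXML ...>...</ShowPlanXML>');
```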

Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005
http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx

## Column Properties

There are different ways to query metadata in SQL Server, and the system catalog views in SQL Server 2005 are a great improvement. Here is an example of retrieving properties for all columns, including default values and descriptions (when defined as the MS_Description extended property).

```sql
SELECT SCHEMA_NAME(T.schema_id) AS 'Schema',
       T.name AS 'Table Name',
       C.name AS 'Column Name',
       S.name AS 'Data Type',
       C.precision AS 'Precision',
       C.scale AS 'Scale',
       CASE WHEN S.name IN ('nvarchar', 'nchar')
            THEN C.max_length/2
            ELSE C.max_length
       END AS 'Length',
       CASE WHEN C.is_nullable = 1
            THEN 'Y'
            ELSE 'N'
       END AS 'Allow NULLs',
       D.definition AS 'Default Value',
       P.value AS 'Description'
FROM sys.tables AS T
JOIN sys.columns AS C
  ON T.object_id = C.object_id
JOIN sys.types AS S
  ON C.user_type_id = S.user_type_id
LEFT OUTER JOIN sys.default_constraints AS D
  ON C.object_id = D.parent_object_id
 AND C.column_id = D.parent_column_id
LEFT OUTER JOIN sys.extended_properties AS P
  ON C.object_id = P.major_id
 AND C.column_id = P.minor_id
 AND P.class = 1
 AND P.name = 'MS_Description'
WHERE T.type = 'U'
ORDER BY T.name, C.column_id;
```

## Shredding XML in SQL Server 2005

XML data has many applications. In databases in particular, it can be used for passing parameters from client applications, exchanging data between SQL modules, or storing details in XML columns. SQL Server 2005 offers native support and new capabilities for XML. Below is an example of using some of the new XML data type methods to shred XML data into table format.

```sql
-- Declare XML variable
DECLARE @data XML;

-- Element-centered XML
SET @data = N'<data>
    <customer>
      <id>1</id>
      <name>Allied Industries</name>
    </customer>
    <customer>
      <id>2</id>
      <name>Trades International</name>
    </customer>
  </data>';

-- Using the query() method
SELECT T.customer.query('id').value('.', 'INT') AS customer_id,
       T.customer.query('name').value('.', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer);

-- Using the value() method
SELECT T.customer.value('(id)[1]', 'INT') AS customer_id,
       T.customer.value('(name)[1]', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer);

-- Select only the customer whose id equals 2, using the exist() method
SELECT T.customer.value('(id)[1]', 'INT') AS customer_id,
       T.customer.value('(name)[1]', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer)
WHERE T.customer.exist('id/text()[. = "2"]') = 1;

-- Attribute-centered XML
SET @data = N'<data>
    <customer id="1" name="Allied Industries"/>
    <customer id="2" name="Trades International"/>
  </data>';

-- Using the value() method
SELECT T.customer.value('@id', 'INT') AS customer_id,
       T.customer.value('@name', 'VARCHAR(20)') AS customer_name
FROM @data.nodes('data/customer') AS T(customer);

-- Results
customer_id customer_name
----------- --------------------
1           Allied Industries
2           Trades International
```

Testing the processing of element-centered XML versus attribute-centered XML shows that attribute-centered XML is processed faster.

Resources:

XML Support in Microsoft SQL Server 2005
http://msdn2.microsoft.com/en-us/library/ms345117.aspx

## Anatomy of a Query

To write a good and correct query, it is very important to understand the logical processing of a query. It is sad that very few SQL programming manuals start with that. Here is a look at the insides of logical query processing.

First, a note: the logical processing of a query does not guarantee that the query will be executed that way. Since SQL Server's cost-based optimizer goes through several steps to find the optimal execution plan, it may choose to process the query in a different way than the sequence of steps below. Examples are when a WHERE predicate is applied as part of a JOIN, or when a HAVING predicate is pushed down into the WHERE phase.

The logical query processing determines logically what the final result set would look like, regardless of how the database engine will manage to generate that result physically.

Here are the phases of the logical query processing in order of processing:

FROM: This is where it all starts! The FROM defines the source table(s) for the processing. If more than one table is included, then a Cartesian product (cross join) is performed between the first two tables. The result of the Cartesian product is every possible combination of a row from the first table and a row from the second table. That is, if one of the tables has M rows and the other one has N rows then the result set will be M x N rows.

ON: Next, the ON filter is applied to the result set. Only rows that match the join condition are selected.

OUTER: If an OUTER join is used, then rows from the preserved table for which a match was not found are added back. In INNER joins this phase is skipped. If more than two tables are in the FROM clause, then FROM/ON/OUTER are applied between the current result set and the next table, until all tables are processed.

WHERE: The next step is to apply the WHERE filter to the result set. Rows that satisfy the where conditions are selected. Only conditions that evaluate to TRUE (not UNKNOWN or FALSE) will pass.

GROUP BY: The result set rows are divided into groups based on the column list specified in GROUP BY. Items in the grouped result set can only be grouping columns, aggregate functions (i.e. COUNT, SUM, AVG, MIN, MAX), functions or constants, or expressions made up of the first three. Important to note here is that NULL values are considered equal and are grouped into one group.

HAVING: Next, the HAVING filter is applied. Only groups that match the having conditions are selected. If a group doesn’t satisfy the conditions, then the whole group is removed from the result set. If there is no GROUP BY the entire result set is treated as one group.

SELECT: Here it is, the SELECT list! The SELECT list is processed down here, even though it is the first line of the query. At this point column aliases are added; the steps prior to this one cannot use the column aliases, only the following steps see them. This is the step that defines the columns of the result set.

If there is a SELECT DISTINCT, then the duplicate rows are removed. Just like in GROUP BY, NULLs are treated as matching.

ORDER BY: The rows of the result set are sorted according to the column list specified in the ORDER BY clause. Only ORDER BY can guarantee a sort order for the rows; otherwise, tables are unordered sets. At this step the result set is transformed into a cursor. NULLs are considered equal for sorting. The ANSI standard doesn't allow ordering by columns that are not included in the SELECT list, but SQL Server allows it (even by expressions based on those columns). Columns in the ORDER BY clause can be referenced by alias or by ordinal number.

A couple more notes. CUBE and ROLLUP are processed after GROUP BY; at that point the super-groups are added to the result set. Also, TOP (which is T-SQL specific) is processed after ORDER BY, and there the specified number or percentage of rows is selected into the final result set.

For nested queries, the innermost queries can reference columns and tables in the queries in which they are contained.
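One practical consequence of this ordering can be demonstrated with column aliases, using the same hypothetical Orders table as in the parameter sniffing section:

```sql
-- Fails: WHERE is processed before SELECT, so the alias does not exist yet
SELECT orderid, YEAR(orderdate) AS order_year
FROM Orders
WHERE order_year = 2007;

-- Works: repeat the expression (or wrap the query in a derived table)
SELECT orderid, YEAR(orderdate) AS order_year
FROM Orders
WHERE YEAR(orderdate) = 2007;

-- Works: ORDER BY is processed after SELECT and can see the alias
SELECT orderid, YEAR(orderdate) AS order_year
FROM Orders
ORDER BY order_year;
```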

## Bulk Loading Images in SQL Server

Loading images and other binary files (like Adobe PDF documents) into a database is not always the best option, but it is sometimes needed. In SQL Server 2005 this process is simplified a lot by the BULK option of OPENROWSET. Here is an example of inserting image data into a VARBINARY(MAX) column (the same applies to loading an Adobe PDF file or any other binary file):

```sql
CREATE TABLE Foobar (
  image_data VARBINARY(MAX));

INSERT INTO Foobar (image_data)
SELECT image_data
FROM OPENROWSET(
        BULK N'C:\image.jpg',
        SINGLE_BLOB) AS ImageSource(image_data);
```

Note the use of the SINGLE_BLOB option.

## Keeping History Data in SQL Server

Very often there is a need to archive data on a daily basis. I just had that question today, and here is a trimmed-down solution based on a recent project.

The task is to keep daily history of changed data only (that is, only rows that have changed during the last day). The scenario is a Loans table with loan number and loan amount. If any column in the table changes, that change needs to be inserted into a history table. Further, the history table should provide a convenient way to query the latest data (the most recently updated row) for reporting purposes.

Here is the code to that solution:

```sql
-- Create sample tables.
CREATE TABLE Loans (
  loan_nbr INTEGER NOT NULL PRIMARY KEY,
  loan_amount DECIMAL(12, 2) DEFAULT 0.0 NOT NULL);

CREATE TABLE LoansHistory (
  loan_nbr INTEGER NOT NULL,
  loan_amount DECIMAL(12, 2) NOT NULL,
  change_date DATETIME
              DEFAULT CAST(DATEDIFF(day, 0, CURRENT_TIMESTAMP) AS DATETIME)
              NOT NULL,
  PRIMARY KEY (loan_nbr, change_date),
  archive_date DATETIME NULL,
  CHECK (change_date < archive_date));
GO

-- Sample data.
INSERT INTO Loans VALUES (1, 100.00);
INSERT INTO Loans VALUES (2, 150.00);
INSERT INTO Loans VALUES (3, 120.00);
INSERT INTO Loans VALUES (4, 160.00);

INSERT INTO LoansHistory VALUES (1, 100.00, '20070501', NULL);
INSERT INTO LoansHistory VALUES (2, 95.00, '20070503', NULL);
INSERT INTO LoansHistory VALUES (3, 80.00, '20070506', '20070508');
INSERT INTO LoansHistory VALUES (3, 85.00, '20070508', NULL);
GO

-- View to use for latest loans.
CREATE VIEW CurrentLoans (loan_nbr, loan_amount)
AS
SELECT loan_nbr, loan_amount
FROM LoansHistory
WHERE archive_date IS NULL;
GO

-- Insert changed rows.

-- SQL Server 2005.
INSERT INTO LoansHistory (loan_nbr, loan_amount)
SELECT loan_nbr, loan_amount FROM Loans
EXCEPT
SELECT loan_nbr, loan_amount FROM CurrentLoans;

-- SQL Server 2000.
INSERT INTO LoansHistory (loan_nbr, loan_amount)
SELECT loan_nbr, loan_amount
FROM ( SELECT 'Live' AS Source,
              loan_nbr,
              loan_amount
         FROM Loans
       UNION ALL
       SELECT 'History' AS Source,
              loan_nbr,
              loan_amount
         FROM CurrentLoans ) AS L
GROUP BY loan_nbr, loan_amount
HAVING COUNT(*) = 1
   AND MIN(Source) = 'Live';

-- Update archive date for old history.
-- Runs in the same transaction as the insert of new history.
UPDATE LoansHistory
SET archive_date = CAST(DATEDIFF(day, 0, CURRENT_TIMESTAMP) AS DATETIME)
WHERE archive_date IS NULL
  AND EXISTS (SELECT *
              FROM LoansHistory AS H
              WHERE H.loan_nbr = LoansHistory.loan_nbr
                AND H.change_date > LoansHistory.change_date);
GO

-- Use the view for all reporting purposes.
SELECT loan_nbr, loan_amount
FROM CurrentLoans
ORDER BY loan_nbr;

-- Loans as of 2007-05-08.
SELECT loan_nbr, loan_amount
FROM LoansHistory
WHERE change_date <= '20070508'
  AND COALESCE(archive_date, CURRENT_TIMESTAMP) > '20070508';
```

The process is fairly simple and keeps two dates: a change date for when the last change occurred (the date the row was inserted into history), and an archive date indicating that the row is archived and a more recent change exists. If the archive date is NULL, the row is the most recent change. To simplify querying, both dates have the time portion set to midnight.