Sparse Columns

Sparse columns are another addition to the SQL Server 2008 tool belt. They offer a good trade-off for many applications: they take no space when empty and slightly more space when they hold data. In other words, they optimize storage for NULL values.

Sparse columns behave just like ordinary columns, with a few limitations. They are defined with the SPARSE keyword, and there is no difference in how they are used in data manipulation statements.

Here are the details:

Pros:
• Storing NULL in a sparse column takes up no space at all
• A table can have up to 30,000 sparse columns
• To any external application the column behaves the same as a regular column
• Sparse columns work well with filtered indexes to index only the non-NULL values (see the sketch after the example below)

Cons:
• When a sparse column does have data, it takes 4 bytes more than a normal column
• Not all data types can be sparse: TEXT, NTEXT, IMAGE, TIMESTAMP, user-defined data types, GEOMETRY, GEOGRAPHY, and VARBINARY(MAX) with the FILESTREAM attribute
• Computed columns cannot be sparse
• Cannot have default values

Here is an example of creating and using sparse columns.

CREATE TABLE Survey (
  survey_nbr INT NOT NULL PRIMARY KEY,
  survey_desc VARCHAR(30),
  survey_info1 VARCHAR(30) SPARSE NULL,
  survey_info2 VARCHAR(30) SPARSE NULL,
  survey_info3 VARCHAR(30) SPARSE NULL,
  survey_info4 VARCHAR(30) SPARSE NULL);

INSERT INTO Survey(survey_nbr, survey_desc, survey_info1)
VALUES(1, 'Survey 1', 'some data'),
      (2, 'Survey 2', 'good');
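
To illustrate the filtered index point from the Pros list, here is a minimal sketch against the table above (the index name is made up for illustration) that indexes only the rows where one of the sparse columns actually has data:

-- Filtered index covering only the non-NULL values of a sparse column
CREATE NONCLUSTERED INDEX ix_SurveyInfo1
  ON Survey (survey_info1)
  WHERE survey_info1 IS NOT NULL;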

Tables with sparse columns can have a column set defined to manipulate all sparse columns as a single structure. The column set is an untyped XML column and is treated like any other XML column.

Here are the characteristics of column sets:

• Defined as an XML data type column
• Not physically stored (similar to a computed column), but updatable
• Only one column set per table
• Cannot be changed once defined
• Cannot be added if the table already has sparse columns
• Limited by the XML data size limit of 2 GB (all sparse columns in a row cannot exceed 2 GB)
• Cannot have constraints or default values
• Allows inserting or updating sparse column values through the column set
• On update through the column set, all sparse columns are updated; any column not given a value is set to NULL (see the sketch after the example below)

The following example creates a column set and inserts data into sparse columns through it.

CREATE TABLE Survey (
  survey_nbr INT NOT NULL PRIMARY KEY,
  survey_desc VARCHAR(30),
  survey_info1 VARCHAR(30) SPARSE NULL,
  survey_info2 VARCHAR(30) SPARSE NULL,
  survey_info3 VARCHAR(30) SPARSE NULL,
  survey_info4 VARCHAR(30) SPARSE NULL,
  survey_set XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);

-- Insert using the regular sparse columns
INSERT INTO Survey(survey_nbr, survey_desc, survey_info1)
VALUES(1, 'Survey 1', 'some data'),
      (2, 'Survey 2', 'good');

-- Insert using the column set
INSERT INTO Survey(survey_nbr, survey_desc, survey_set)
VALUES(3, 'Survey 3', '<survey_info3>data 3</survey_info3><survey_info4>answer 4</survey_info4>');
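
To illustrate the update behavior listed in the column set characteristics above, here is a quick sketch against the Survey table just created. Updating a row through the column set replaces all of its sparse columns, so any sparse column not mentioned in the XML value ends up NULL:

-- Update row 1 through the column set;
-- survey_info1 is not mentioned in the XML, so it is set to NULL
UPDATE Survey
SET survey_set = '<survey_info2>updated 2</survey_info2>'
WHERE survey_nbr = 1;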

Mistyping a column name in the column set XML, or providing a value of an invalid data type, results in an error, just as when manipulating the columns directly.

Resources:

Using Sparse Columns
http://technet.microsoft.com/en-us/library/cc280604(SQL.100).aspx

Using Column Sets
http://technet.microsoft.com/en-us/library/cc280521(SQL.100).aspx

Passing DataSet to a WCF Method

Recently I made a simple WCF service with a method that takes a MemoryStream parameter. After some time I had to add another method that should take a whole DataSet as a parameter. And then my problems began!

After I added the method and refreshed the service reference in my client, the proxy class generator decided to create its own MemoryStream class in the namespace of my WCF service! Then I knew that the true evil of DataSets was back (for more information see Scott Hanselman’s post about using DataSets in web services). The solution: use the DataSet’s methods to get the data in XML format and pass the XML string to the service instead of the DataSet itself. Here is some sample code:

IService.vb:

<ServiceContract()> _
Public Interface IService
    <OperationContract()> _
    Function PassDataSet(ByVal sXMLSchema As String, ByVal sXML As String) As Boolean
End Interface

Note that because the service does not know the schema of the DataSet, we also pass the DataSet’s XML schema.

Service.vb:

Public Class Service
    Implements IService
    Public Function PassDataSet(ByVal sXMLSchema As String, ByVal sXML As String) As Boolean Implements IService.PassDataSet
        Dim dsObjects As New Data.DataSet
        'First read the schema of the DataSet
        dsObjects.ReadXmlSchema(New MemoryStream(Text.Encoding.Unicode.GetBytes(sXMLSchema)))
        'Then read the data itself
        dsObjects.ReadXml(New MemoryStream(Text.Encoding.Unicode.GetBytes(sXML)))
        'Here do what ever you like with your DataSet

        'Finally return a value to the client
        Return True
    End Function
End Class

And finally here is how you call the method:

Client:

Dim proxy As New ServiceClient()
Dim dsDataSet As New DataSet()

'Here load your data in your DataSet

'Finally call the service
proxy.PassDataSet(dsDataSet.GetXmlSchema(), dsDataSet.GetXml())

Now you can enjoy your WCF service that accepts a DataSet 🙂

SQL Saturday

SQL Saturday is a free event for SQL Server professionals. It offers multiple tracks with sessions for different skill levels. It is a great way to learn, share knowledge, and network with colleagues in the SQL Server field.

The upcoming SQL Saturday #3 in Jacksonville (May 3rd, 2008) includes four complete tracks for developers, DBAs, SQL Server 2008, and performance tuning. Here is the list of speakers:
Andy Warren, Bayer White, Brandie Tarvin, Brian Kelley, Brian Knight (2 of them), Bryan Oliver, Chris Rock, Darren Herbold, David Dye, David Fekke, Devin Knight, Fadi Albatal, Geoff Hiten, Jeffrey Garbus, Joe Healy, Joseph Memmo, Kent Waldrop, Mark Polino, Mike Mollenhour, Pam Shaw, Patrick Thompson, Plamen Ratchev, Rodney Landrum, Shawn Weisfeld, Sherif Elfayoumy, and Tim Mitchell.

I will be presenting two sessions: “Performance Tuning and Query Optimization” and “T-SQL Enhancements in SQL Server 2008”.

Visit http://www.sqlsaturday.com/ to see the complete schedule and register.

Auditing Data Modification Statements Using Trigger

Here is an interesting problem: how to audit data modification statements against a table? The goal is to log the SQL executed to modify the table using a trigger. Of course, SELECT statements cannot be logged, since only INSERT/UPDATE/DELETE can invoke a trigger. One way to implement this is with a server-side trace (which can capture the SELECT statements too), but that brings unnecessary overhead.

The following query will get the current SQL statement in SQL Server 2005.

SELECT [text]
FROM sys.dm_exec_requests AS R
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS S
WHERE session_id = @@SPID;

However, when executed in a trigger body it simply returns the CREATE TRIGGER statement, not the actual modification statement. A better approach is to use DBCC INPUTBUFFER. The only tricky part is that the DBCC output cannot be saved directly to a table; the workaround is to use INSERT...EXEC to run DBCC INPUTBUFFER as dynamic SQL and insert the results into a table. Here is the complete solution.

-- SQL log table
CREATE TABLE SQLLog (
 language_event NVARCHAR(100),
 parameters INT,
 event_info NVARCHAR(4000),
 event_time DATETIME DEFAULT CURRENT_TIMESTAMP);

-- Sample table to audit actions for
CREATE TABLE Foo (
 keycol INT PRIMARY KEY,
 datacol CHAR(1));

-- Sample data
INSERT INTO Foo VALUES (1, 'a');
INSERT INTO Foo VALUES (2, 'b');
INSERT INTO Foo VALUES (3, 'c');
GO

-- Audit trigger
CREATE TRIGGER LogMySQL
ON Foo
AFTER INSERT, UPDATE, DELETE
AS
 INSERT INTO SQLLog (language_event, parameters, event_info)
 EXEC('DBCC INPUTBUFFER(@@SPID);');
GO

 

-- Perform some logged actions
GO

INSERT INTO Foo VALUES (4, 'd');
GO

DELETE Foo
WHERE keycol = 1;
GO

UPDATE Foo
SET datacol = 'f'
WHERE keycol = 2;
GO

-- Perform non-logged action
-- SELECT cannot be logged
SELECT datacol
FROM Foo
WHERE keycol = 4;
GO

-- Check what we have in the log
SELECT event_info, event_time
FROM SQLLog;

 

/*

-- Results

event_info                                       event_time
------------------------------------------------ -----------------------
INSERT INTO Foo VALUES (4, 'd');                 2008-04-24 22:24:31.153
DELETE Foo WHERE keycol = 1;                     2008-04-24 22:24:31.170
UPDATE Foo SET datacol = 'f' WHERE keycol = 2;   2008-04-24 22:24:31.170

*/

It is good to note here that DBCC INPUTBUFFER requires the executing user to be a member of the sysadmin fixed server role. One way to handle this is to specify a login or user with sufficient privileges in the EXEC call (using the AS LOGIN or AS USER clause of EXEC):

INSERT INTO SQLLog (language_event, parameters, event_info)
EXEC('DBCC INPUTBUFFER(@@SPID);') AS LOGIN = 'admin_login';

Table Valued Parameters

Passing multiple rows of data to stored procedures has always intrigued application developers. Many different methods have been used, mostly revolving around XML or some form of delimited list. Those approaches require parsing before the data can be used in normalized table format.

SQL Server 2008 adds another option: table-valued parameters. This allows sending a table as a parameter to a stored procedure. The current implementation still has shortcomings: the parameter is read-only, and it cannot be used as an output parameter.

Here is one example that demonstrates using a table-valued parameter to pass a table as stored procedure input.

CREATE TABLE Loans (
 loan_nbr INT PRIMARY KEY,
 loan_date DATETIME,
 loan_amount DECIMAL(15, 2));

-- User-defined table type
CREATE TYPE LoanTable
AS TABLE (
 loan_nbr INT PRIMARY KEY,
 loan_date DATETIME,
 loan_amount DECIMAL(15, 2));
GO

-- Procedure with table valued parameter
-- Must use the READONLY clause
CREATE PROCEDURE InsertLoans
 @Loans LoanTable READONLY
AS
 INSERT INTO Loans
 SELECT loan_nbr, loan_date, loan_amount
 FROM @Loans;
GO

-- Declare table variable of the new type
DECLARE @NewLoans LoanTable;

-- Initialize the table variable with data
INSERT INTO @NewLoans
VALUES (1, '20080101', 10000.00),
       (2, '20080203', 15000.00),
       (3, '20080315', 25000.00);

-- Insert new loans using the table variable as parameter
EXEC InsertLoans @Loans = @NewLoans;
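
Although the parameter is read-only inside the procedure, it can still be queried and joined like any other table variable. Here is a small sketch using the same table type (the procedure name GetMatchingLoans is made up for illustration):

-- A table valued parameter can be read and joined like a table variable
CREATE PROCEDURE GetMatchingLoans
 @Loans LoanTable READONLY
AS
 SELECT L.loan_nbr, L.loan_date, L.loan_amount
 FROM Loans AS L
 JOIN @Loans AS N
   ON L.loan_nbr = N.loan_nbr;
GO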

The really cool part is that table-valued parameters are fully supported on the client side via ADO.NET (.NET Framework 3.5 or later). That means we could easily have a data-bound control, like a grid, on the client side that feeds a table-valued parameter to update a database table via a stored procedure. Here is what that client code looks like.

DataTable loans;

loans = new DataTable();
loans.Columns.Add("loan_nbr", typeof(int));
loans.Columns.Add("loan_date", typeof(System.DateTime));
loans.Columns.Add("loan_amount", typeof(decimal));

// conn is an open SqlConnection
using (SqlCommand cmd = new SqlCommand("InsertLoans", conn))
{
  cmd.CommandType = CommandType.StoredProcedure;
  cmd.Parameters.AddWithValue("@Loans", loans).SqlDbType = SqlDbType.Structured;
  cmd.ExecuteNonQuery();
}

Resources:

Table-Valued Parameters
http://msdn2.microsoft.com/en-us/library/bb510489(SQL.100).aspx

Filtered Indexes

Filtered indexes are another welcome addition in SQL Server 2008. They allow creating an index restricted with a WHERE clause. While similar results were achievable in SQL Server 2005 using indexed views over the table, the natural approach of defining a filtered index directly is much more appealing.

Below are some examples of using filtered indexes. The first demonstrates how a filtered index assists queries on commonly searched values, resulting in an index seek for those and a table scan for less frequently searched values. The second implements a very common request: enforcing unique values in a column while still allowing multiple NULL values (a regular UNIQUE index allows only a single NULL).

CREATE TABLE Regions (
 region_cd CHAR(2),
 region_name VARCHAR(35),
 region_phone VARCHAR(12) NULL);

INSERT INTO Regions VALUES ('NE', 'North East', NULL),
                           ('SW', 'South West', NULL),
                           ('SE', 'South East', '902-202-1234');

-- Index to filter on frequently queried values
CREATE NONCLUSTERED INDEX ix_SouthEastRegion
  ON Regions (region_cd)
  INCLUDE(region_name, region_phone)
  WHERE region_cd = 'SE';
GO

SET SHOWPLAN_TEXT ON;
GO

-- Typical user query
SELECT region_cd, region_name, region_phone
FROM Regions
WHERE region_cd = 'SE';

 

/*

-- Execution plan
StmtText
--------------------------------------------------------------
  |--Index Seek(OBJECT:([Testing].[dbo].[Regions].[ix_SouthEastRegion]),
           SEEK:([Testing].[dbo].[Regions].[region_cd]='SE') ORDERED FORWARD)

*/

-- Less frequent user query
SELECT region_cd, region_name, region_phone
FROM Regions
WHERE region_cd = 'NE';

/*

-- Execution plan
StmtText
--------------------------------------------------------------
  |--Table Scan(OBJECT:([Testing].[dbo].[Regions]),
           WHERE:([Testing].[dbo].[Regions].[region_cd]='NE'))

*/

 

GO
SET SHOWPLAN_TEXT OFF;
GO

-- Guarantee unique values excluding NULLs
CREATE UNIQUE NONCLUSTERED INDEX ix_RegionPhone
  ON Regions (region_phone)
  WHERE region_phone IS NOT NULL;

-- OK, multiple NULLs allowed because filtered out from the UNIQUE index
INSERT INTO Regions VALUES ('NW', 'North West', NULL);

-- Attempt to insert duplicate non NULL value
INSERT INTO Regions VALUES ('NW', 'North West', '902-202-1234');

/*

-- Error
Msg 2601, Level 14, State 1, Line 11
Cannot insert duplicate key row in object 'dbo.Regions' with unique index 'ix_RegionPhone'.
The statement has been terminated.

*/

Composable DML

Another powerful T-SQL enhancement in SQL Server 2008 is the ability to write composable DML. It allows the result set of the OUTPUT clause to be consumed directly as the source for another query.

Here is one example to demonstrate it. The scenario is to merge data from a daily updated branch office table into a static central office table. An additional requirement is to store auditing data for any changes to the company name column. In prior versions of SQL Server this would require a couple of SQL statements. In SQL Server 2008 the new MERGE statement performs all merge actions in a single statement and then feeds data for the auditing table via the OUTPUT clause. That way a single SQL statement completes the task.

-- Create central accounts static table
CREATE TABLE CentralOfficeAccounts (
 account_nbr INT PRIMARY KEY,
 company_name VARCHAR(35),
 primary_contact VARCHAR(35),
 contact_phone VARCHAR(12));

-- Daily updated branch data
CREATE TABLE BranchOfficeAccounts (
 account_nbr INT PRIMARY KEY,
 company_name VARCHAR(35),
 primary_contact VARCHAR(35),
 contact_phone VARCHAR(12));

-- Insert sample data
INSERT INTO CentralOfficeAccounts
VALUES (1, 'Bayside Motors', 'Jim Smith', '902-203-1234'),
       (2, 'Dallas Industries', 'Joe Doe', '301-663-9134'),
       (3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053');

INSERT INTO BranchOfficeAccounts
VALUES (2, 'Dallas Industries, Inc.', 'Rick Gross', '301-663-9134'),    -- changed
       (3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053'),       -- same
       (4, 'Auto Insurance Co.', 'Chris Jefferson', '313-601-6201');    -- new

-- Table for auditing
CREATE TABLE AccountsAudit (
 account_nbr INT,
 change_action NVARCHAR(10),
 change_date DATETIME DEFAULT CURRENT_TIMESTAMP,
 old_company_name VARCHAR(35),
 new_company_name VARCHAR(35),
 PRIMARY KEY(account_nbr, change_action, change_date));

 

-- Single SQL statement performing the merge and auditing actions
INSERT INTO AccountsAudit
 (account_nbr, change_action, old_company_name, new_company_name)
SELECT account_nbr, merge_action, old_company_name, new_company_name
FROM (MERGE INTO CentralOfficeAccounts AS C
      USING BranchOfficeAccounts AS B
        ON C.account_nbr = B.account_nbr
      WHEN MATCHED
       AND C.company_name <> B.company_name THEN
           UPDATE SET C.company_name = B.company_name
      WHEN NOT MATCHED THEN
           INSERT (account_nbr, company_name,
                   primary_contact, contact_phone)
           VALUES (B.account_nbr, B.company_name,
                   B.primary_contact, B.contact_phone)
      WHEN SOURCE NOT MATCHED THEN
           DELETE
      OUTPUT $action,
             COALESCE(inserted.account_nbr, deleted.account_nbr),
             deleted.company_name,
             inserted.company_name
     ) AS T(merge_action, account_nbr, old_company_name, new_company_name);

SELECT account_nbr, change_action, change_date,
       old_company_name, new_company_name
FROM AccountsAudit;

 

/*

-- Results

account_nbr change_action change_date old_company_name  new_company_name
----------- ------------- ----------- ----------------- -----------------------
1           DELETE        2008-04-15  Bayside Motors    NULL
2           UPDATE        2008-04-15  Dallas Industries Dallas Industries, Inc.
4           INSERT        2008-04-15  NULL              Auto Insurance Co.

*/

Note:

This code has been tested with SQL Server 2008 CTP 6 (February 2008). As of SQL Server 2008 Release Candidate 0 (June 2008), the clause “WHEN SOURCE NOT MATCHED” has been replaced with “WHEN NOT MATCHED BY SOURCE” (see the sketch below), which makes the wording clearer. It is worth noting that this clause of MERGE is not part of the SQL standard.
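
For reference, here is a minimal sketch of just the delete branch written with the later syntax, against the same tables as above:

-- RC0 and later spelling of the clause
MERGE INTO CentralOfficeAccounts AS C
USING BranchOfficeAccounts AS B
  ON C.account_nbr = B.account_nbr
WHEN NOT MATCHED BY SOURCE THEN
     DELETE;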

Delighters in SQL Server 2008

Two of the new T-SQL enhancements in SQL Server 2008 are inline variable initialization and compound assignment (also called delighters). We have had these for a long time in languages like C, C++, C#, and VB, and they are finally coming to T-SQL. While not of significant value, they make writing code easier and more compact.

Here are a couple of examples to illustrate their use.

-- Declare and initialize variable
DECLARE @count INT = 1;

-- Compound assignments
SET @count += 1;
SET @count /= 2;
SET @count *= 5;
SET @count %= 3;
SET @count -= 1;

SELECT @count;

/*

-- Result
1

*/

 

-- Create sample table for employee pay rates
CREATE TABLE PayRates (
 employee_nbr INT PRIMARY KEY,
 pay_rate DECIMAL(15, 2),
 performance_score INT);

-- Insert data using the new row constructors
INSERT INTO PayRates VALUES (1, 40.00, 5), (2, 45.50, 4), (3, 39.50, 6);

-- Apply pay rate increase
-- Compound assignments can be used with columns on the right side
UPDATE PayRates
SET pay_rate *= performance_score * 0.25;

SELECT employee_nbr, pay_rate, performance_score
FROM PayRates;

-- Declare variable and initialize using query
DECLARE @max_pay_rate DECIMAL(15, 2) = (SELECT MAX(pay_rate)
                                        FROM PayRates);

SELECT @max_pay_rate AS max_pay_rate;

Convert Hex String to Binary String

Here is another use of a numbers table and concatenation using FOR XML PATH. Given a hexadecimal string value like ‘7FE0’, convert it to the binary representation ‘0111111111100000’. The numbers table generated by the CTE is used to slice the hex string into individual digits; each digit is then replaced with the corresponding binary value, and the results are concatenated using FOR XML PATH with an empty element name.

CREATE FUNCTION dbo.Hex2Bin(@hex VARCHAR(256))
RETURNS VARCHAR(1024)
AS
BEGIN

DECLARE @bin VARCHAR(1024);

SET @bin = '';

WITH Converter (hex, bin)
AS
(SELECT '0', '0000' UNION ALL
 SELECT '1', '0001' UNION ALL
 SELECT '2', '0010' UNION ALL
 SELECT '3', '0011' UNION ALL
 SELECT '4', '0100' UNION ALL
 SELECT '5', '0101' UNION ALL
 SELECT '6', '0110' UNION ALL
 SELECT '7', '0111' UNION ALL
 SELECT '8', '1000' UNION ALL
 SELECT '9', '1001' UNION ALL
 SELECT 'A', '1010' UNION ALL
 SELECT 'B', '1011' UNION ALL
 SELECT 'C', '1100' UNION ALL
 SELECT 'D', '1101' UNION ALL
 SELECT 'E', '1110' UNION ALL
 SELECT 'F', '1111'),
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n)
           FROM N3 AS X, N3 AS Y)
SELECT @bin = (SELECT REPLACE(SUBSTRING(@hex, n, 1), hex, bin)
               FROM N4 AS Nums
               JOIN Converter AS C
                ON SUBSTRING(@hex, n, 1) = hex
               WHERE n <= LEN(@hex)
               FOR XML PATH(''));

RETURN @bin;

END
GO

SELECT dbo.Hex2Bin('7FE0') AS bin;

-- Result
bin
----------------
0111111111100000

Obscure Real Data for Testing

Very often we have a good set of production data, but it cannot be used for training or testing because of data confidentiality. Sometimes the solution is simply to tweak pieces of the data so it can no longer be linked to the original. Here is one method to obscure data in SQL Server 2005. It is based on a few techniques:

– Using a table with numbers (here generated on the fly by cross joining common table expressions)

– Using the numbers table to slice each value into individual characters

– Generating random values in the normal character range to replace the real characters

– Using FOR XML PATH with an empty element name to concatenate the obscured value back together.

CREATE TABLE Patients (
 pname VARCHAR(35),
 pname_masked VARCHAR(35));

INSERT INTO Patients VALUES('John Doe', NULL);
INSERT INTO Patients VALUES('Jeff Smith', NULL);

WITH
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT ROW_NUMBER()
           OVER(ORDER BY X.n) FROM N3 AS X, N3 AS Y)
UPDATE Patients
SET pname_masked =
    CAST(
    CAST((SELECT CASE
                  WHEN SUBSTRING(pname, n, 1) = ' '
                  THEN ' '
                  ELSE CHAR(x + ASCII(SUBSTRING(pname, n, 1)))
                END
          FROM N4 AS Nums
          CROSS APPLY
          (SELECT TOP(1) T.n AS x
           FROM N4 AS T
           WHERE T.n + ASCII(UPPER(SUBSTRING(pname, Nums.n, 1))) BETWEEN 65 AND 90
           ORDER BY CHECKSUM(NEWID())) AS N
          WHERE Nums.n <= LEN(pname)
          FOR XML PATH('')) AS XML) AS VARCHAR(35));

SELECT pname, pname_masked
FROM Patients;

-- Results

pname        pname_masked
------------ -------------
John Doe     Xwlv Lyi
Jeff Smith   Qmol Yuyvs