March 2008 - Tangra Inc.

UPSERT and More with MERGE

March 25, 2008/3 Comments/in SQL Server/by Plamen Ratchev

One very common requirement is to merge source data to a target table. For example, merge the daily changes to accounts data from branch office table to central accounts table. On SQL Server version 2005 and prior this was accomplished using separate INSERT and UPDATE statements. This involved checking if key column exists to perform update and insert if not, or attempt an update first and then if not rows were affected perform insert. Not to mention if we have to check if account is missing from the branch office data and needs to be deleted from the central accounts table. That way we end up with multiple (sometimes complex) statements to implement one transaction, accessing both source and target tables multiple times.

SQL Server 2008 offers a lot more elegant way using the MERGE statement (MERGE is supported by ANSI SQL). It makes data merging very simple and elegant, as well as efficient.

Here is the base syntax for MERGE:

[ WITH <common_table_expression> [,...n] ]
MERGE 
[ TOP ( expression ) [ PERCENT ] ] 
[ INTO ] target_table [ [ AS ] table_alias ]
        [ WITH ( <merge_hint> ) ]
USING <table_source>
ON <search_condition>
[ WHEN MATCHED [ AND <search_condition> ]
        THEN <merge_matched> ]
[ WHEN [TARGET] NOT MATCHED [ AND <search_condition> ]
        THEN <merge_not_matched> ]
[ WHEN SOURCE NOT MATCHED [ AND <search_condition> ]
        THEN <merge_matched> ]
<output_clause>
[ OPTION ( <query_hint> [ ,...n ] ) ]    
;

Below is the initial setup for the source and target tables.

-- Static table as target for merging data
CREATE TABLE CentralOfficeAccounts (
 account_nbr INT PRIMARY KEY,
 company_name VARCHAR(35),
 primary_contact VARCHAR(35),
 contact_phone VARCHAR(12));
 
-- Dynamic table with daily updates to be merged
CREATE TABLE BranchOfficeAccounts (
 account_nbr INT PRIMARY KEY,
 company_name VARCHAR(35),
 primary_contact VARCHAR(35),
 contact_phone VARCHAR(12));
 
-- Sample centrall office static data 
INSERT INTO CentralOfficeAccounts
VALUES (1, 'Bayside Motors', 'Jim Smith', '902-203-1234'),
       (2, 'Dallas Industries', 'Joe Doe', '301-663-9134'),
       (3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053');
 
-- Daily updated branch office data
INSERT INTO BranchOfficeAccounts
VALUES (2, 'Dallas Industries, Inc.', 'Rick Gross', '301-663-9134'), -- changed
       (3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053'),    -- same
       (4, 'Auto Insurance Co.', 'Chris Jefferson', '313-601-6201'); -- new

Here is how MERGE can be used to perform an update on existing accounts and insert of new accounts.

-- Update existing and add missing
MERGE INTO CentralOfficeAccounts AS C    -- Target
USING BranchOfficeAccounts AS B          -- Source
   ON C.account_nbr = B.account_nbr
WHEN MATCHED THEN                        -- On match update
     UPDATE SET C.company_name = B.company_name,
                C.primary_contact = B.primary_contact,
                C.contact_phone = B.contact_phone
WHEN NOT MATCHED THEN                    -- Add missing
     INSERT (account_nbr, company_name, primary_contact, contact_phone)
     VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone);
 
SELECT account_nbr, company_name, primary_contact, contact_phone 
FROM CentralOfficeAccounts; 
 
/*
 
-- Results:
 
account_nbr company_name             primary_contact  contact_phone
----------- ------------------------ ---------------- -------------
1           Bayside Motors           Jim Smith        902-203-1234
2           Dallas Industries, Inc.  Rick Gross       301-663-9134
3           Sky Computer Systems     Jane Brown       201-943-6053
4           Auto Insurance Co.       Chris Jefferson  313-601-6201
 
*/

Next step is to enhance the statement including predicates to check and update only accounts that have changed. The only difference here is the additional conditions in the MATCHED clause, results will be the same as before.

-- Update existing that changed and add missing
-- Use of predicates 
MERGE INTO CentralOfficeAccounts AS C   -- Target
USING BranchOfficeAccounts AS B         -- Source
   ON C.account_nbr = B.account_nbr
WHEN MATCHED                            -- On match update
 AND (C.company_name <> B.company_name  -- Additional search conditions
   OR C.primary_contact <> B.primary_contact
   OR C.contact_phone <> B.contact_phone) THEN                        
     UPDATE SET C.company_name = B.company_name,
                C.primary_contact = B.primary_contact,
                C.contact_phone = B.contact_phone
WHEN NOT MATCHED THEN                   -- Add missing
     INSERT (account_nbr, company_name, primary_contact, contact_phone)
     VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone);

Going one step further, let’s add the option to delete accounts that do not exist in the source table from the target table.

--Update existing that changed and add missing, delete missing in source
MERGE INTO CentralOfficeAccounts AS C   -- Target
USING BranchOfficeAccounts AS B         -- Source
   ON C.account_nbr = B.account_nbr
WHEN MATCHED                            -- On match update
 AND (C.company_name <> B.company_name  -- Additional search conditions
   OR C.primary_contact <> B.primary_contact
   OR C.contact_phone <> B.contact_phone) THEN                        
     UPDATE SET C.company_name = B.company_name,
                C.primary_contact = B.primary_contact,
                C.contact_phone = B.contact_phone
WHEN NOT MATCHED THEN                   -- Add missing
     INSERT (account_nbr, company_name, primary_contact, contact_phone)
     VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone)
WHEN SOURCE NOT MATCHED THEN            -- Delete missing from source
     DELETE;
 
SELECT account_nbr, company_name, primary_contact, contact_phone 
FROM CentralOfficeAccounts; 
 
/*
 
-- Results:
 
account_nbr company_name             primary_contact  contact_phone
----------- ------------------------ ---------------- -------------
2           Dallas Industries, Inc.  Rick Gross       301-663-9134
3           Sky Computer Systems     Jane Brown       201-943-6053
4           Auto Insurance Co.       Chris Jefferson  313-601-6201
 
*/

There is a lot more to MERGE, but this just shows the tremendous power it provides for merging data.

Note:

This code has been tested with SQL Server 2008 CTP 6 (February 2008). As of SQL Server 2008 Release Candidate 0 (June 2008) the clause “WHEN SOURCE NOT MATCHED” has been replaced with “WHEN NOT MATCHED BY SOURCE”. This makes the wording clearer. It is good to note that this clause of MERGE is not standard.

References:

MERGE (Transact-SQL)
http://technet.microsoft.com/en-us/library/bb510625(SQL.100).aspx

Row Concatenation with FOR XML PATH

March 22, 2008/8 Comments/in SQL Server, XML/by Plamen Ratchev

Many times it is needed for reporting purposes to summarize normalized data into groups or lists of values. This is also known as rows concatenation. Some reporting and client side tools support this directly. Here is one approach to solve this in T-SQL using FOR XML PATH.

CREATE TABLE Products (
 sku INT PRIMARY KEY,
 product_desc VARCHAR(35));
 
CREATE TABLE Departments (
 department_nbr INT PRIMARY KEY,
 department_title VARCHAR(35));
 
CREATE TABLE DepartmentProducts (
 department_nbr INT 
   REFERENCES Departments (department_nbr),
 sku INT 
   REFERENCES Products (sku),
 PRIMARY KEY (department_nbr, sku));
 
INSERT INTO Products VALUES (1, 'Book');
INSERT INTO Products VALUES (2, 'Magazine');
INSERT INTO Products VALUES (3, 'DVD');
INSERT INTO Products VALUES (4, 'Video');
INSERT INTO Products VALUES (5, 'CD');
INSERT INTO Products VALUES (6, 'Map');
 
INSERT INTO Departments VALUES (1, 'Reference');
INSERT INTO Departments VALUES (2, 'Periodicals');
INSERT INTO Departments VALUES (3, 'Entertainment');
 
INSERT INTO DepartmentProducts VALUES (1, 1);
INSERT INTO DepartmentProducts VALUES (1, 6);
INSERT INTO DepartmentProducts VALUES (2, 2);
INSERT INTO DepartmentProducts VALUES (3, 3);
INSERT INTO DepartmentProducts VALUES (3, 4);
INSERT INTO DepartmentProducts VALUES (3, 5);
 
-- Using correlated subquery 
SELECT D.department_nbr,
       D.department_title,
       STUFF((SELECT ',' + product_desc
              FROM DepartmentProducts AS DP
              JOIN Products AS P
                ON P.sku = DP.sku
              WHERE DP.department_nbr = D.department_nbr
              ORDER BY product_desc
              FOR XML PATH('')), 1, 1, '') AS product_list
FROM Departments AS D;
 
-- Using CROSS APPLY
SELECT D.department_nbr,
       D.department_title,
       STUFF(P.product_list, 1, 1, '') AS product_list
FROM Departments AS D
CROSS APPLY (SELECT ',' + product_desc
             FROM DepartmentProducts AS DP
             JOIN Products AS P
               ON P.sku = DP.sku
             WHERE DP.department_nbr = D.department_nbr
             ORDER BY product_desc
             FOR XML PATH('')) AS P (product_list);
 
-- Results             
department_nbr department_title product_list
-------------- ---------------- ------------
1              Reference        Book,Map
2              Periodicals      Magazine
3              Entertainment    CD,DVD,Video

While this method is often called the XML blackbox method, the explanation of this effect using FOR XML PATH is simple. Normally the PATH clause is used with input string that indicates the name of the wrapper element that will be created. However, using the PATH clause with empty string as input results in skipping the wrapper element generation. And since the content is retrieved as text it achieves the effect of concatenation.

Resources:

What’s New in FOR XML in Microsoft SQL Server 2005
http://msdn2.microsoft.com/en-us/library/ms345137.aspx

Grouping with ROW_NUMBER

March 14, 2008/2 Comments/in SQL Server/by Plamen Ratchev

The new ranking functions in SQL Server 2005 have great impact on how problems can be solved. Here is just another example of that.

Given a table with events at different venues (all event dates are unique), find the duration intervals of start/end date of uninterrupted performances at a given venue. Solving this without the use of ROW_NUMBER would require a lot more complex SQL, not to mention that this solution has excellent performance. Running a test with 10 years of random sample data completes on average 15 milliseconds.

-- Create the sample events table
CREATE TABLE Events
(event_date DATETIME NOT NULL PRIMARY KEY,
 event_venue VARCHAR(20) NOT NULL);
 
-- Insert venue event dates
INSERT INTO Events VALUES ('20080101', 'The Palace');
INSERT INTO Events VALUES ('20080201', 'The Palace');
INSERT INTO Events VALUES ('20080301', 'The Palace');
INSERT INTO Events VALUES ('20080401', 'The Palace');
INSERT INTO Events VALUES ('20080501', 'The Palace');
INSERT INTO Events VALUES ('20080601', 'Fox Theater');
INSERT INTO Events VALUES ('20080701', 'Fox Theater');
INSERT INTO Events VALUES ('20080801', 'Grand Hall');
INSERT INTO Events VALUES ('20080901', 'Grand Hall');
INSERT INTO Events VALUES ('20081001', 'Grand Hall');
INSERT INTO Events VALUES ('20081101', 'The Palace');
INSERT INTO Events VALUES ('20071201', 'The River Place');
INSERT INTO Events VALUES ('20081202', 'The River Place');
 
-- Group the event period dates at each venue
SELECT MIN(event_venue) AS venue, 
       MIN(event_date) AS venue_start_date, 
       MAX(event_date) AS venue_end_date
FROM (
SELECT event_venue , event_date ,
       ROW_NUMBER() OVER (ORDER BY event_venue, event_date) -
       ROW_NUMBER() OVER (PARTITION BY event_venue
                          ORDER BY event_date),
       ROW_NUMBER() OVER (ORDER BY event_date) -
       ROW_NUMBER() OVER (PARTITION BY event_venue
                          ORDER BY event_date)
FROM Events) AS X(event_venue, event_date, grp1, grp2)
GROUP BY grp1, grp2;
 
/*
 
-- Results
 
venue                venue_start_date  venue_end_date
-------------------- ----------------- --------------
The River Place      2007-12-01        2007-12-01
The Palace           2008-01-01        2008-05-01
Fox Theater          2008-06-01        2008-07-01
The Palace           2008-11-01        2008-11-01
Grand Hall           2008-08-01        2008-10-01
The River Place      2008-12-02        2008-12-02
 
*/

Updates with CTE

March 12, 2008/7 Comments/in SQL Server/by Plamen Ratchev

Performing updates on columns based on values from another table is a very common need. Using the ANSI UPDATE normally requires multiple subqueries, which can be very inefficient especially if multiple filters have to be applied. The Microsoft specific UPDATE with JOIN is one solution. However, common table expressions provide a very elegant alternative, which has the same efficient plan as UPDATE with JOIN, but is much easier to read and maintain. The sample below demonstrates how to perform update based on another table using join and CTE. Using MERGE in SQL Server 2008 will make it even better.

CREATE TABLE Products (
 sku CHAR(5) NOT NULL PRIMARY KEY,
 product_desc VARCHAR(35) NOT NULL,
 price DECIMAL(12, 2) NOT NULL DEFAULT 0);
 
CREATE TABLE ProductUpdates (
 sku CHAR(5) NOT NULL PRIMARY KEY,
 product_desc VARCHAR(35) NOT NULL,
 price DECIMAL(12, 2) NOT NULL DEFAULT 0,
 effective_date DATETIME NOT NULL);
 
INSERT INTO Products VALUES ('CHS01', 'Child seat', 25.50);
INSERT INTO Products VALUES ('CUP03', 'Water cup', 5.25);
INSERT INTO Products VALUES ('HOL01', 'Cup holder', 3.50);
 
INSERT INTO ProductUpdates VALUES ('CHS01', 'Child seat with cushion', 26.95, '20080301');
INSERT INTO ProductUpdates VALUES ('CUP03', 'Water cup with handle', 6.25, '20080405');
 
-- Update all current product descriptions and prices
-- with updates that have effective date past today (March 12, 2008)
WITH Updates 
AS
(SELECT P.product_desc,
        P.price,
        U.product_desc AS new_product_desc,
        U.price AS new_price
 FROM Products AS P
 JOIN ProductUpdates AS U
   ON P.sku = U.sku
 WHERE U.effective_date < CURRENT_TIMESTAMP)
UPDATE Updates
SET product_desc = new_product_desc,
    price = new_price;
 
SELECT sku, product_desc, price
FROM Products;
 
-- Results
sku   product_desc            price
----- ----------------------- -----
CHS01 Child seat with cushion 26.95
CUP03 Water cup                5.25
HOL01 Cup holder               3.50

Extracting List Item from Delimited String

March 3, 2008/1 Comment/in CLR, SQL Server/by Plamen Ratchev

Performing string manipulations is not the best use of T-SQL. But there are a lot of operations that can be done using pure SQL. Here is one example. Given an input string that has list elements separated by delimiter, slice the list into individual element, and provided index parameter return the matching item based on index position in the list, NULL if the index was not found, or the entire string if the input index is 0. For this method we need utility table with numbers.

-- Create sample utility table with numbers
CREATE TABLE Numbers (
 n INT PRIMARY KEY)
 
INSERT INTO Numbers
SELECT number
FROM master..spt_values
WHERE type = 'P'
  AND number BETWEEN 1 AND 100
 
GO
 
-- Extract function to perform the string split
CREATE FUNCTION dbo.Extract(
 @string VARCHAR(200),
 @delimiter CHAR(1),
 @idx INT)
RETURNS VARCHAR(200)
AS
BEGIN
 
RETURN
 CASE @idx
 WHEN 0 THEN @string
 ELSE
 (SELECT string
  FROM
      (SELECT SUBSTRING(@string, n,
              CHARINDEX( @delimiter, @string + @delimiter, n ) - n ),
              n + 1 - LEN(REPLACE(LEFT(@string, n), @delimiter, ''))
       FROM Numbers
       WHERE SUBSTRING(@delimiter + @string, n, 1) = @delimiter
         AND n < LEN(@string) + 1) AS T(string, idx)
  WHERE idx = @idx)
 END
 
END
GO
 
-- Test
DECLARE @foo VARCHAR(100)
SET @foo = 'Joe*Doe*123 Main Street'
 
SELECT dbo.Extract(@foo, '*', 2) -- returns 'Doe'
SELECT dbo.Extract(@foo, '*', 3) -- returns '123 Main Street'
SELECT dbo.Extract(@foo, '*', 0) -- returns entire string
SELECT dbo.Extract(@foo, '*', 9) -- returns null

If running SQL Server 2005 this can be accomplished much easier using CLR:

using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
 
public class CLRSplit
{
    [Microsoft.SqlServer.Server.SqlFunction()]
    public static string extract(string str, char delimiter, int index)
    {
        if (index == 0)
        { return str; }
        try
        {
            string[] list = new string[100];
            list = str.Split(new char[] { delimiter });
            return list[index - 1];
        }
        catch
        {
            return null;
        }
    }
}

UPSERT and More with MERGE

Row Concatenation with FOR XML PATH

Grouping with ROW_NUMBER

Updates with CTE

Extracting List Item from Delimited String

SITE MAP

SEARCH