February 2008 - Tangra Inc.

Shortest Path for Friend Connections

February 20, 2008/5 Comments/in SQL Server/by Plamen Ratchev

An interesting problem to solve is finding relation paths in contact management systems. This is a limited case of the Dijkstra’s algorithm for finding the shortest path in a graph. Here we always have cost of 1 for each path and it is a two-way relation between the nodes. To put this in human readable format, the problem is to find the distance between friends, based on relationships defined. If A is friend with B and then B is friend with C, the path between A and C is A->B->C with distance 2.

Here is one solution using recursive CTEs in SQL Server 2005. The method is based on building relationship paths between all connected nodes and then searching the path for friend connections. If this searching if frequent, then the path can be materialized in a column.

-- Sample table with data
CREATE TABLE Contacts (
 c_from CHAR(1), 
 c_to CHAR(1),
 PRIMARY KEY (c_from, c_to));
 
INSERT    INTO Contacts
SELECT    'A', 'B' UNION ALL
SELECT    'B', 'D' UNION ALL
SELECT    'C', 'A' UNION ALL
SELECT    'C', 'E' UNION ALL
SELECT    'G', 'C' UNION ALL
SELECT    'B', 'G' UNION ALL
SELECT    'F', 'D' UNION ALL
SELECT    'E', 'F';
 
-- Table to store paths
CREATE TABLE Paths (c_path VARCHAR(200) PRIMARY KEY);
 
-- Recursive CTE to populate the paths
WITH PathCTE 
AS
(SELECT c_from, c_to,
        CAST('.' + CAST(c_from AS VARCHAR(8)) + '.' + 
        CAST(c_to AS VARCHAR(8)) + '.' AS VARCHAR(200)) AS c_path
 FROM Contacts AS C1
 UNION ALL
 SELECT C.c_from, C.c_to,
        CAST(P.c_path + C.c_to + '.' AS VARCHAR(200))
 FROM PathCTE AS P
 JOIN Contacts AS C
   ON P.c_to = C.c_from
 WHERE P.c_path NOT LIKE '%.' + 
                    CAST(C.c_from AS VARCHAR(10)) + 
                    '.' + 
                    CAST(C.c_to AS VARCHAR(10)) + 
                    '.%')
INSERT INTO Paths
SELECT c_path FROM PathCTE;
 
-- Show all paths between B and D
SELECT c_path
FROM Paths
WHERE c_path LIKE '.B.%'
  AND c_path LIKE '%.D.';
 
-- Shortest path distance, longest path distance, and number of paths
SELECT MIN(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS shortest_distance,
       MAX(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS longest_distance,
       COUNT(*) AS paths_cnt
FROM Paths
WHERE c_path LIKE '.B.%'
  AND c_path LIKE '%.D.';
 
-- Results
c_path
--------------
.B.D.
.B.G.C.A.B.D.
.B.G.C.E.F.D.
 
 
shortest_distance  longest_distance  paths_cnt
-----------------  ----------------  -----------
1                  5                 3

It is good to note that this method does not make effort to avoid reusing paths to reach a destination. If needed this can be handled by additional condition in the recursive CTE.

Column Alias Based on Variable

February 18, 2008/9 Comments/in SQL Server/by Plamen Ratchev

Although formatting output belongs to the reporting or front-end interface, occasionally there could be the need to change a column alias based on variable. Since T-SQL does not support proving variable as column alias in a query, here are two methods to handle that.

CREATE TABLE Foo (
 keycol INT PRIMARY KEY,
 datacol CHAR(1))
 
INSERT INTO Foo VALUES (1, 'a')
INSERT INTO Foo VALUES (2, 'b')
 
DECLARE @column_alias VARCHAR(30)
SET @column_alias = 'new_title'
 
-- 1). Using dynamic SQL
DECLARE @sql NVARCHAR(200)
 
SET @sql = 'SELECT keycol, datacol AS ' + @column_alias + ' FROM Foo'
 
EXEC sp_executesql @sql
 
-- 2). Using results table and renaming the column
CREATE TABLE Results (
 keycol INT PRIMARY KEY,
 datacol CHAR(1)) 
 
INSERT INTO Results
SELECT keycol, datacol
FROM Foo
 
EXEC sp_rename 'Results.datacol', @column_alias, 'COLUMN'
 
SELECT * FROM Results

Performing UPSERT in T-SQL

February 13, 2008/0 Comments/in SQL Server/by Plamen Ratchev

Very often there is the need to check if a key value exists to perform an update, and if it does not exist to insert new data. The upcoming SQL Server 2008 provides the MERGE statement (MERGE actually allows to do more: simultaneous UPDATE, INSERT and/or DELETE operations on the table), but until it is released we have to wait.

Here is just one way to implement in the current versions of T-SQL.

CREATE TABLE Foo (
 keycol INT PRIMARY KEY,
 datacol CHAR(1) NOT NULL);
 
-- Sample data
INSERT INTO Foo VALUES (1, 'a');
INSERT INTO Foo VALUES (2, 'b');
INSERT INTO Foo VALUES (4, 'd');
 
-- New values to insert/update
DECLARE @key INT;
DECLARE @data CHAR(1);
 
-- New key, will perform insert
SET @key = 3;
SET @data = 'c';
 
BEGIN TRAN
 
-- Try update
UPDATE Foo WITH (SERIALIZABLE)
SET datacol = @data
WHERE keycol = @key;
 
-- If no rows updated then must be new value, perform insert
IF @@ROWCOUNT = 0 
INSERT INTO Foo VALUES (@key, @data);
 
COMMIT
 
-- Existing key, will perform update
SET @key = 4;
SET @data = 'x';
 
BEGIN TRAN
 
-- Try update
UPDATE Foo WITH (SERIALIZABLE)
SET datacol = @data
WHERE keycol = @key;
 
-- If no rows updated then must be new value, perform insert
IF @@ROWCOUNT = 0 
INSERT INTO Foo VALUES (@key, @data);
 
COMMIT
 
SELECT keycol, datacol 
FROM Foo;

The SERIALIZABLE hint here is very important to avoid deadlocks.

Shortest Path for Friend Connections

Column Alias Based on Variable

Performing UPSERT in T-SQL

SITE MAP

SEARCH