## Shortest Path for Friend Connections

An interesting problem to solve is finding relation paths in contact management systems. This is a limited case of the Dijkstra’s algorithm for finding the shortest path in a graph. Here we always have cost of 1 for each path and it is a two-way relation between the nodes. To put this in human readable format, the problem is to find the distance between friends, based on relationships defined. If A is friend with B and then B is friend with C, the path between A and C is A->B->C with distance 2.

Here is one solution using recursive CTEs in SQL Server 2005. The method is based on building relationship paths between all connected nodes and then searching the path for friend connections. If this searching if frequent, then the path can be materialized in a column.

-- Sample table with data

CREATE TABLE Contacts (

c_from CHAR(1),

c_to CHAR(1),

PRIMARY KEY (c_from, c_to));

INSERT INTO Contacts

SELECT 'A', 'B' UNION ALL

SELECT 'B', 'D' UNION ALL

SELECT 'C', 'A' UNION ALL

SELECT 'C', 'E' UNION ALL

SELECT 'G', 'C' UNION ALL

SELECT 'B', 'G' UNION ALL

SELECT 'F', 'D' UNION ALL

SELECT 'E', 'F';

-- Table to store paths

CREATE TABLE Paths (c_path VARCHAR(200) PRIMARY KEY);

-- Recursive CTE to populate the paths

WITH PathCTE

AS

(SELECT c_from, c_to,

CAST('.' + CAST(c_from AS VARCHAR(8)) + '.' +

CAST(c_to AS VARCHAR(8)) + '.' AS VARCHAR(200)) AS c_path

FROM Contacts AS C1

UNION ALL

SELECT C.c_from, C.c_to,

CAST(P.c_path + C.c_to + '.' AS VARCHAR(200))

FROM PathCTE AS P

JOIN Contacts AS C

ON P.c_to = C.c_from

WHERE P.c_path NOT LIKE '%.' +

CAST(C.c_from AS VARCHAR(10)) +

'.' +

CAST(C.c_to AS VARCHAR(10)) +

'.%')

INSERT INTO Paths

SELECT c_path FROM PathCTE;

-- Show all paths between B and D

SELECT c_path

FROM Paths

WHERE c_path LIKE '.B.%'

AND c_path LIKE '%.D.';

-- Shortest path distance, longest path distance, and number of paths

SELECT MIN(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS shortest_distance,

MAX(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS longest_distance,

COUNT(*) AS paths_cnt

FROM Paths

WHERE c_path LIKE '.B.%'

AND c_path LIKE '%.D.';

-- Results

c_path

--------------

.B.D.

.B.G.C.A.B.D.

.B.G.C.E.F.D.

shortest_distance longest_distance paths_cnt

----------------- ---------------- -----------

1 5 3

It is good to note that this method does not make effort to avoid reusing paths to reach a destination. If needed this can be handled by additional condition in the recursive CTE.