# Shortest Path for Friend Connections

An interesting problem to solve is finding relation paths in contact management systems. This is a limited case of the Dijkstra’s algorithm for finding the shortest path in a graph. Here we always have cost of 1 for each path and it is a two-way relation between the nodes. To put this in human readable format, the problem is to find the distance between friends, based on relationships defined. If A is friend with B and then B is friend with C, the path between A and C is A->B->C with distance 2.

Here is one solution using recursive CTEs in SQL Server 2005. The method is based on building relationship paths between all connected nodes and then searching the path for friend connections. If this searching if frequent, then the path can be materialized in a column.

`-- Sample table with dataCREATE TABLE Contacts ( c_from CHAR(1),  c_to CHAR(1), PRIMARY KEY (c_from, c_to)); INSERT    INTO ContactsSELECT    'A', 'B' UNION ALLSELECT    'B', 'D' UNION ALLSELECT    'C', 'A' UNION ALLSELECT    'C', 'E' UNION ALLSELECT    'G', 'C' UNION ALLSELECT    'B', 'G' UNION ALLSELECT    'F', 'D' UNION ALLSELECT    'E', 'F'; -- Table to store pathsCREATE TABLE Paths (c_path VARCHAR(200) PRIMARY KEY); -- Recursive CTE to populate the pathsWITH PathCTE AS(SELECT c_from, c_to,        CAST('.' + CAST(c_from AS VARCHAR(8)) + '.' +         CAST(c_to AS VARCHAR(8)) + '.' AS VARCHAR(200)) AS c_path FROM Contacts AS C1 UNION ALL SELECT C.c_from, C.c_to,        CAST(P.c_path + C.c_to + '.' AS VARCHAR(200)) FROM PathCTE AS P JOIN Contacts AS C   ON P.c_to = C.c_from WHERE P.c_path NOT LIKE '%.' +                     CAST(C.c_from AS VARCHAR(10)) +                     '.' +                     CAST(C.c_to AS VARCHAR(10)) +                     '.%')INSERT INTO PathsSELECT c_path FROM PathCTE; -- Show all paths between B and DSELECT c_pathFROM PathsWHERE c_path LIKE '.B.%'  AND c_path LIKE '%.D.'; -- Shortest path distance, longest path distance, and number of pathsSELECT MIN(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS shortest_distance,       MAX(LEN(c_path) - LEN(REPLACE(c_path, '.', '')) - 2) AS longest_distance,       COUNT(*) AS paths_cntFROM PathsWHERE c_path LIKE '.B.%'  AND c_path LIKE '%.D.'; -- Resultsc_path--------------.B.D..B.G.C.A.B.D..B.G.C.E.F.D.  shortest_distance  longest_distance  paths_cnt-----------------  ----------------  -----------1                  5                 3`

It is good to note that this method does not make effort to avoid reusing paths to reach a destination. If needed this can be handled by additional condition in the recursive CTE.

5 replies
1. Secure Place says:

Hi, could you give us an example that is able to deal with relations like in a real database, in detail:

A,C
C,A
A,B
B,A
A,D
D,A

Path from B->D should be B->A->D

The Problem is, it seems with your sql script there is an endless loop 🙁 But I think you will find a way to solve this issue.
Everything works well if I do not have duplicated entries in the “from” field but this is a must have…

2. Ajay Sharma says:

awesome post..but i guess its not connecting properly….could you edit the query ..enabling it to consider all the possible connection…right now if A->B & and A->C then its not considering the fact that B->C .

except it a very nice post ..u r a true sql master.

regards
Ajay

3. Plamen Ratchev says:

Hi secure place,

Did you try the query with the sample data you provided? Here are the results if you have not:

c_path
———–
.B.A.C.A.D.
.B.A.D.

The query does not go into endless loop because of the condition in the recursive member of the CTE. The condition WHERE P.c_path NOT LIKE … prevent the cycles in the path. Since your data has two valid distinct paths that connect B and D you see both.

4. Plamen Ratchev says:

Hi Ajay,

The case here was to demonstrate directed connections. If you want to consider all possible combinations (in all directions), that will increase a lot your result set. It is very easy to achieve this, simply add one more query to the CTE to create all possible combinations in both directions (transitive closure for undirected cyclic graph).

WITH AllContacts (c_from, c_to)
AS
(SELECT c_from, c_to
FROM Contacts
UNION ALL
SELECT c_to, c_from
FROM Contacts),
PathCTE
AS
(SELECT c_from, c_to,
CAST(‘.’ + CAST(c_from AS VARCHAR(8)) + ‘.’ +
CAST(c_to AS VARCHAR(8)) + ‘.’ AS VARCHAR(200)) AS c_path
FROM AllContacts AS C1
UNION ALL
SELECT C.c_from, C.c_to,
CAST(P.c_path + C.c_to + ‘.’ AS VARCHAR(200))
FROM PathCTE AS P
JOIN AllContacts AS C
ON P.c_to = C.c_from
WHERE P.c_path NOT LIKE ‘%.’ +
CAST(C.c_from AS VARCHAR(10)) + ‘.’ +
CAST(C.c_to AS VARCHAR(10)) + ‘.%’)
INSERT INTO Paths
SELECT c_path FROM PathCTE;

5. Anonymous says:

Hello,

I know this post is very old but I need to understand something :

SELECT C.c_from, C.c_to,

CAST(P.c_path + C.c_to + '.' AS VARCHAR(200))

FROM PathCTE AS P

JOIN Contacts AS C

ON P.c_to = C.c_from

I don't understand how P.c_path can and where it is defined.

Second, is P one step begind the current select result ?

P.c_to = C.c_from

c_to is before c_from, no ?

I don't understand everything out, I am a noob.