Grouping with ROW_NUMBER

The new ranking functions in SQL Server 2005 have great impact on how problems can be solved. Here is just another example of that.

Given a table with events at different venues (all event dates are unique), find the duration intervals of start/end date of uninterrupted performances at a given venue. Solving this without the use of ROW_NUMBER would require a lot more complex SQL, not to mention that this solution has excellent performance. Running a test with 10 years of random sample data completes on average 15 milliseconds.

-- Create the sample events table

CREATE TABLE Events

(event_date DATETIME NOT NULL PRIMARY KEY,

 event_venue VARCHAR(20) NOT NULL);

 

-- Insert venue event dates

INSERT INTO Events VALUES ('20080101', 'The Palace');

INSERT INTO Events VALUES ('20080201', 'The Palace');

INSERT INTO Events VALUES ('20080301', 'The Palace');

INSERT INTO Events VALUES ('20080401', 'The Palace');

INSERT INTO Events VALUES ('20080501', 'The Palace');

INSERT INTO Events VALUES ('20080601', 'Fox Theater');

INSERT INTO Events VALUES ('20080701', 'Fox Theater');

INSERT INTO Events VALUES ('20080801', 'Grand Hall');

INSERT INTO Events VALUES ('20080901', 'Grand Hall');

INSERT INTO Events VALUES ('20081001', 'Grand Hall');

INSERT INTO Events VALUES ('20081101', 'The Palace');

INSERT INTO Events VALUES ('20071201', 'The River Place');

INSERT INTO Events VALUES ('20081202', 'The River Place');

 

-- Group the event period dates at each venue

SELECT MIN(event_venue) AS venue,

       MIN(event_date) AS venue_start_date,

       MAX(event_date) AS venue_end_date

FROM (

SELECT event_venue , event_date ,

       ROW_NUMBER() OVER (ORDER BY event_venue, event_date) -

       ROW_NUMBER() OVER (PARTITION BY event_venue

                          ORDER BY event_date),

       ROW_NUMBER() OVER (ORDER BY event_date) -

       ROW_NUMBER() OVER (PARTITION BY event_venue

                          ORDER BY event_date)

FROM Events) AS X(event_venue, event_date, grp1, grp2)

GROUP BY grp1, grp2;

 

/*

 

-- Results

 

venue                venue_start_date venue_end_date

-------------------- ----------------- --------------

The River Place     2007-12-01        2007-12-01

The Palace           2008-01-01        2008-05-01

Fox Theater         2008-06-01        2008-07-01

The Palace           2008-11-01        2008-11-01

Grand Hall           2008-08-01        2008-10-01

The River Place     2008-12-02        2008-12-02

 

*/

2 replies
  1. Naomi
    Naomi says:

    This solution does not work for the first record. In other words, change your input data to be

    — Create the sample events table
    CREATE TABLE Events
    (event_date DATETIME NOT NULL PRIMARY KEY,
    event_venue VARCHAR(20) NOT NULL);

    — Insert venue event dates
    INSERT INTO Events VALUES ('20080101', 'The Palace');
    INSERT INTO Events VALUES ('20080401', 'The Palace');
    INSERT INTO Events VALUES ('20080501', 'The Palace');
    INSERT INTO Events VALUES ('20080601', 'Fox Theater');
    INSERT INTO Events VALUES ('20080701', 'Fox Theater');
    INSERT INTO Events VALUES ('20080801', 'Grand Hall');
    INSERT INTO Events VALUES ('20080901', 'Grand Hall');
    INSERT INTO Events VALUES ('20081001', 'Grand Hall');
    INSERT INTO Events VALUES ('20081101', 'The Palace');
    INSERT INTO Events VALUES ('20071201', 'The River Place');
    INSERT INTO Events VALUES ('20081202', 'The River Place');

    — Group the event period dates at each venue
    SELECT MIN(event_venue) AS venue,
    MIN(event_date) AS venue_start_date,
    MAX(event_date) AS venue_end_date
    FROM (
    SELECT event_venue , event_date ,
    ROW_NUMBER() OVER (ORDER BY event_venue, event_date) –
    ROW_NUMBER() OVER (PARTITION BY event_venue
    ORDER BY event_date),
    ROW_NUMBER() OVER (ORDER BY event_date) –
    ROW_NUMBER() OVER (PARTITION BY event_venue
    ORDER BY event_date)
    FROM Events) AS X(event_venue, event_date, grp1, grp2)
    GROUP BY grp1, grp2;

    /

    — Results

    venue venue_start_date venue_end_date
    ——————– —————– ————–
    The River Place 2007-12-01 2007-12-01
    The Palace 2008-01-01 2008-05-01
    Fox Theater 2008-06-01 2008-07-01
    The Palace 2008-11-01 2008-11-01
    Grand Hall 2008-08-01 2008-10-01
    The River Place 2008-12-02 2008-12-02

    /

    You can see that now the result is wrong for the Palace

    Reply
  2. Plamen Ratchev
    Plamen Ratchev says:

    Hi Naomi, the solution works exactly as it is supposed to work, even with your input. The goal here is to find uninterrupted performance dates, which means the performances at one venue without having performance at another venue in between. It is not to find performances without gaps of dates. In your input all is correct for The Palace because there is no performance at another venue between the dates of 20080101 and 20080401. To test this insert another row:

    INSERT INTO Events VALUES ('20080201', 'Another Venue');

    Then the result looks like this:

    venue venue_start_date venue_end_date
    ——————– ———————– ———————–
    The River Place 2007-12-01 00:00:00.000 2007-12-01 00:00:00.000
    The Palace 2008-01-01 00:00:00.000 2008-01-01 00:00:00.000
    Another Venue 2008-02-01 00:00:00.000 2008-02-01 00:00:00.000
    The Palace 2008-04-01 00:00:00.000 2008-05-01 00:00:00.000
    Fox Theater 2008-06-01 00:00:00.000 2008-07-01 00:00:00.000
    Grand Hall 2008-08-01 00:00:00.000 2008-10-01 00:00:00.000
    The Palace 2008-11-01 00:00:00.000 2008-11-01 00:00:00.000
    The River Place 2008-12-02 00:00:00.000 2008-12-02 00:00:00.000

    Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *