I have a table that contains events of various types:
create table events (
event_id integer not null,
event_type varchar2(2) not null,
event_time date not null,
-- some other columns with the event details
);
The events may arrive sporadically, but many times multiple consecutive events of the same type arrive in a fixed frequency (e.g., “events of type A happen every 10 minutes between 8:10am and 8:50am”).
I want to list all the event times in a specific period for every event type.
This is a simple query, but there are many results, so it’s not very easy to read them:
SQL> break on event_type skip 1 dup
SQL> select event_type,
2 event_time
3 from events
4 where event_time > timestamp'2022-07-29 08:00:00'
5 order by event_type,
6 event_time;
EVENT
TYPE EVENT_TIME
----- -------------------
A 2022-07-29 08:10:00
A 2022-07-29 08:20:00
A 2022-07-29 08:30:00
A 2022-07-29 08:40:00
A 2022-07-29 08:50:00
A 2022-07-29 08:58:00
A 2022-07-29 09:06:00
A 2022-07-29 09:14:00
A 2022-07-29 09:22:00
A 2022-07-29 09:33:00
A 2022-07-29 09:42:00
A 2022-07-29 09:52:00
A 2022-07-29 10:02:00
A 2022-07-29 10:12:00
A 2022-07-29 10:22:00
A 2022-07-29 10:32:00
A 2022-07-29 10:42:00
A 2022-07-29 10:52:00
A 2022-07-29 10:59:00
A 2022-07-29 11:03:00
A 2022-07-29 11:07:00
A 2022-07-29 11:11:00
A 2022-07-29 11:21:00
A 2022-07-29 11:31:00
A 2022-07-29 11:41:00
A 2022-07-29 11:51:00
A 2022-07-29 12:01:00
B 2022-07-29 08:15:00
B 2022-07-29 08:35:00
B 2022-07-29 08:55:00
B 2022-07-29 09:15:00
B 2022-07-29 09:35:00
B 2022-07-29 09:55:00
B 2022-07-29 10:10:00
B 2022-07-29 10:25:00
B 2022-07-29 10:50:00
B 2022-07-29 11:15:00
B 2022-07-29 11:40:00
B 2022-07-29 12:05:00
C 2022-07-29 08:02:00
C 2022-07-29 08:04:00
C 2022-07-29 08:06:00
C 2022-07-29 08:08:00
C 2022-07-29 08:10:00
C 2022-07-29 08:12:00
C 2022-07-29 08:14:00
C 2022-07-29 08:16:00
C 2022-07-29 08:18:00
C 2022-07-29 08:20:00
C 2022-07-29 08:22:00
C 2022-07-29 08:24:00
C 2022-07-29 08:26:00
C 2022-07-29 08:28:00
C 2022-07-29 08:30:00
C 2022-07-29 08:32:00
C 2022-07-29 08:49:00
C 2022-07-29 08:51:00
...
114 rows selected.
Because this list contains batches of events with a fixed frequency, we can change the query to get a nicer output – with a single line for every batch.
Identifying these batches is easy with the MATCH_RECOGNIZE clause.
Let’s see first the output of the query:
EVENT
TYPE DATE EVENT_TIMES
----- ------------ -------------------------------------------
A 2022-07-29 08:10 - 08:50 every 10 minutes (5 events)
A 2022-07-29 08:58 - 09:22 every 8 minutes (4 events)
A 2022-07-29 09:33
A 2022-07-29 09:42 - 10:52 every 10 minutes (8 events)
A 2022-07-29 10:59 - 11:11 every 4 minutes (4 events)
A 2022-07-29 11:21 - 12:01 every 10 minutes (5 events)
B 2022-07-29 08:15 - 09:55 every 20 minutes (6 events)
B 2022-07-29 10:10
B 2022-07-29 10:25 - 12:05 every 25 minutes (5 events)
C 2022-07-29 08:02 - 08:32 every 2 minutes (16 events)
C 2022-07-29 08:49 - 09:05 every 2 minutes (9 events)
C 2022-07-29 09:08 - 09:20 every 4 minutes (4 events)
C 2022-07-29 09:22 - 10:16 every 2 minutes (28 events)
C 2022-07-29 11:46 - 12:20 every 2 minutes (18 events)
14 rows selected.
Much nicer, isn’t it? 🙂
Here is the query:
SQL> select event_type,
2 to_char(from_time,'yyyy-mm-dd') "DATE",
3 to_char(from_time,'hh24:mi') ||
4 nvl2(to_time,
5 ' - ' || to_char(to_time,'hh24:mi')
6 || ' every ' || minutes || ' minutes ('
7 || number_of_events || ' events)',
8 null) event_times
9 from (
10 select event_type,event_time
11 from events
12 where event_time > timestamp'2022-07-29 08:00:00'
13 ) match_recognize (
14 partition by event_type
15 order by event_time
16 measures X.event_time from_time,
17 last(Y.event_time) to_time,
18 (first(Y.event_time)-X.event_time)*24*60 minutes,
19 count(*) number_of_events
20 pattern (X Y{2,} | X)
21 define
22 Y as event_time-prev(event_time) = first(Y.event_time)-X.event_time
23 )
24 order by event_type, from_time;
The original query [lines 10-12] is the input to the MATCH_RECOGNIZE clause [lines 13-23].
We partition the input record-set by EVENT_TYPE [line 14], as we want to get a separate list for each type.
MATCH_RECOGNIZE is used for finding consecutive rows that match some pattern. “Consecutive” means we have to define the order of the input record-set (in each partition). In this case, we order the events of each type by EVENT_TIME [line 15].
Now, let’s define the pattern [lines 20-22]. We’re looking for at least 3 consecutive events – an X record followed by two or more Y records [“X Y{2,}”].
Any event is potentially an X record, so I didn’t define it explicitly.
Y is explicitly defined [line 22] as a record that its time difference from the previous record [event_time-prev(event_time)] is the same as the time difference between the first event [X] and the second event [the first Y] in the current batch [first(Y.event_time)-X.event_time].
If we don’t find such a batch of events [X followed by two or more Ys], we have a single event to return – X. This is expressed by the pattern “X Y{2,} | X” [line 20].
For every match of this pattern, 4 expressions are calculated [lines 16-19]:
FROM_TIME: the time of the first event in the match [line 16]
TO_TIME: the time of the last event in the match [line 17]
MINUTES: the time difference between every two consecutive events [line 18]
NUMBER_OF_EVENTS: the total number of events in the match [line 19]
Finally, we display the results returned by the MATCH_RECOGNIZE operator, using the expressions calculated in the MEASURES clause [lines 1-8].
The same technique can be applied for different tables. See the next post for a generalized solution, using a Table SQL Macro.