I have a table that contains events of various types:
create table events ( event_id integer not null, event_type varchar2(2) not null, event_time date not null, -- some other columns with the event details );
The events may arrive sporadically, but many times multiple consecutive events of the same type arrive in a fixed frequency (e.g., “events of type A happen every 10 minutes between 8:10am and 8:50am”).
I want to list all the event times in a specific period for every event type.
This is a simple query, but there are many results, so it’s not very easy to read them:
SQL> break on event_type skip 1 dup SQL> select event_type, 2 event_time 3 from events 4 where event_time > timestamp'2022-07-29 08:00:00' 5 order by event_type, 6 event_time; EVENT TYPE EVENT_TIME ----- ------------------- A 2022-07-29 08:10:00 A 2022-07-29 08:20:00 A 2022-07-29 08:30:00 A 2022-07-29 08:40:00 A 2022-07-29 08:50:00 A 2022-07-29 08:58:00 A 2022-07-29 09:06:00 A 2022-07-29 09:14:00 A 2022-07-29 09:22:00 A 2022-07-29 09:33:00 A 2022-07-29 09:42:00 A 2022-07-29 09:52:00 A 2022-07-29 10:02:00 A 2022-07-29 10:12:00 A 2022-07-29 10:22:00 A 2022-07-29 10:32:00 A 2022-07-29 10:42:00 A 2022-07-29 10:52:00 A 2022-07-29 10:59:00 A 2022-07-29 11:03:00 A 2022-07-29 11:07:00 A 2022-07-29 11:11:00 A 2022-07-29 11:21:00 A 2022-07-29 11:31:00 A 2022-07-29 11:41:00 A 2022-07-29 11:51:00 A 2022-07-29 12:01:00 B 2022-07-29 08:15:00 B 2022-07-29 08:35:00 B 2022-07-29 08:55:00 B 2022-07-29 09:15:00 B 2022-07-29 09:35:00 B 2022-07-29 09:55:00 B 2022-07-29 10:10:00 B 2022-07-29 10:25:00 B 2022-07-29 10:50:00 B 2022-07-29 11:15:00 B 2022-07-29 11:40:00 B 2022-07-29 12:05:00 C 2022-07-29 08:02:00 C 2022-07-29 08:04:00 C 2022-07-29 08:06:00 C 2022-07-29 08:08:00 C 2022-07-29 08:10:00 C 2022-07-29 08:12:00 C 2022-07-29 08:14:00 C 2022-07-29 08:16:00 C 2022-07-29 08:18:00 C 2022-07-29 08:20:00 C 2022-07-29 08:22:00 C 2022-07-29 08:24:00 C 2022-07-29 08:26:00 C 2022-07-29 08:28:00 C 2022-07-29 08:30:00 C 2022-07-29 08:32:00 C 2022-07-29 08:49:00 C 2022-07-29 08:51:00 ... 114 rows selected.
Because this list contains batches of events with a fixed frequency, we can change the query to get a nicer output – with a single line for every batch.
Identifying these batches is easy with the MATCH_RECOGNIZE clause.
Let’s see first the output of the query:
EVENT TYPE DATE EVENT_TIMES ----- ------------ ------------------------------------------- A 2022-07-29 08:10 - 08:50 every 10 minutes (5 events) A 2022-07-29 08:58 - 09:22 every 8 minutes (4 events) A 2022-07-29 09:33 A 2022-07-29 09:42 - 10:52 every 10 minutes (8 events) A 2022-07-29 10:59 - 11:11 every 4 minutes (4 events) A 2022-07-29 11:21 - 12:01 every 10 minutes (5 events) B 2022-07-29 08:15 - 09:55 every 20 minutes (6 events) B 2022-07-29 10:10 B 2022-07-29 10:25 - 12:05 every 25 minutes (5 events) C 2022-07-29 08:02 - 08:32 every 2 minutes (16 events) C 2022-07-29 08:49 - 09:05 every 2 minutes (9 events) C 2022-07-29 09:08 - 09:20 every 4 minutes (4 events) C 2022-07-29 09:22 - 10:16 every 2 minutes (28 events) C 2022-07-29 11:46 - 12:20 every 2 minutes (18 events) 14 rows selected.
Much nicer, isn’t it? 🙂
Here is the query:
SQL> select event_type, 2 to_char(from_time,'yyyy-mm-dd') "DATE", 3 to_char(from_time,'hh24:mi') || 4 nvl2(to_time, 5 ' - ' || to_char(to_time,'hh24:mi') 6 || ' every ' || minutes || ' minutes (' 7 || number_of_events || ' events)', 8 null) event_times 9 from ( 10 select event_type,event_time 11 from events 12 where event_time > timestamp'2022-07-29 08:00:00' 13 ) match_recognize ( 14 partition by event_type 15 order by event_time 16 measures X.event_time from_time, 17 last(Y.event_time) to_time, 18 (first(Y.event_time)-X.event_time)*24*60 minutes, 19 count(*) number_of_events 20 pattern (X Y{2,} | X) 21 define 22 Y as event_time-prev(event_time) = first(Y.event_time)-X.event_time 23 ) 24 order by event_type, from_time;
The original query [lines 10-12] is the input to the MATCH_RECOGNIZE clause [lines 13-23].
We partition the input record-set by EVENT_TYPE [line 14], as we want to get a separate list for each type.
MATCH_RECOGNIZE is used for finding consecutive rows that match some pattern. “Consecutive” means we have to define the order of the input record-set (in each partition). In this case, we order the events of each type by EVENT_TIME [line 15].
Now, let’s define the pattern [lines 20-22]. We’re looking for at least 3 consecutive events – an X record followed by two or more Y records [“X Y{2,}”].
Any event is potentially an X record, so I didn’t define it explicitly.
Y is explicitly defined [line 22] as a record that its time difference from the previous record [event_time-prev(event_time)] is the same as the time difference between the first event [X] and the second event [the first Y] in the current batch [first(Y.event_time)-X.event_time].
If we don’t find such a batch of events [X followed by two or more Ys], we have a single event to return – X. This is expressed by the pattern “X Y{2,} | X” [line 20].
For every match of this pattern, 4 expressions are calculated [lines 16-19]:
FROM_TIME: the time of the first event in the match [line 16]
TO_TIME: the time of the last event in the match [line 17]
MINUTES: the time difference between every two consecutive events [line 18]
NUMBER_OF_EVENTS: the total number of events in the match [line 19]
Finally, we display the results returned by the MATCH_RECOGNIZE operator, using the expressions calculated in the MEASURES clause [lines 1-8].
The same technique can be applied for different tables. See the next post for a generalized solution, using a Table SQL Macro.