Pattern Matching and Event Frequencies

I have a table that contains events of various types:

create table events (
  event_id   integer     not null,
  event_type varchar2(2) not null,
  event_time date        not null,
  -- some other columns with the event details
);

The events may arrive sporadically, but many times multiple consecutive events of the same type arrive in a fixed frequency (e.g., “events of type A happen every 10 minutes between 8:10am and 8:50am”).

I want to list all the event times in a specific period for every event type.

This is a simple query, but there are many results, so it’s not very easy to read them:

SQL> break on event_type skip 1 dup
SQL> select event_type,
  2         event_time
  3  from   events
  4  where  event_time > timestamp'2022-07-29 08:00:00'
  5  order  by event_type,
  6            event_time;

EVENT
TYPE  EVENT_TIME
----- -------------------
A     2022-07-29 08:10:00
A     2022-07-29 08:20:00
A     2022-07-29 08:30:00
A     2022-07-29 08:40:00
A     2022-07-29 08:50:00
A     2022-07-29 08:58:00
A     2022-07-29 09:06:00
A     2022-07-29 09:14:00
A     2022-07-29 09:22:00
A     2022-07-29 09:33:00
A     2022-07-29 09:42:00
A     2022-07-29 09:52:00
A     2022-07-29 10:02:00
A     2022-07-29 10:12:00
A     2022-07-29 10:22:00
A     2022-07-29 10:32:00
A     2022-07-29 10:42:00
A     2022-07-29 10:52:00
A     2022-07-29 10:59:00
A     2022-07-29 11:03:00
A     2022-07-29 11:07:00
A     2022-07-29 11:11:00
A     2022-07-29 11:21:00
A     2022-07-29 11:31:00
A     2022-07-29 11:41:00
A     2022-07-29 11:51:00
A     2022-07-29 12:01:00

B     2022-07-29 08:15:00
B     2022-07-29 08:35:00
B     2022-07-29 08:55:00
B     2022-07-29 09:15:00
B     2022-07-29 09:35:00
B     2022-07-29 09:55:00
B     2022-07-29 10:10:00
B     2022-07-29 10:25:00
B     2022-07-29 10:50:00
B     2022-07-29 11:15:00
B     2022-07-29 11:40:00
B     2022-07-29 12:05:00

C     2022-07-29 08:02:00
C     2022-07-29 08:04:00
C     2022-07-29 08:06:00
C     2022-07-29 08:08:00
C     2022-07-29 08:10:00
C     2022-07-29 08:12:00
C     2022-07-29 08:14:00
C     2022-07-29 08:16:00
C     2022-07-29 08:18:00
C     2022-07-29 08:20:00
C     2022-07-29 08:22:00
C     2022-07-29 08:24:00
C     2022-07-29 08:26:00
C     2022-07-29 08:28:00
C     2022-07-29 08:30:00
C     2022-07-29 08:32:00
C     2022-07-29 08:49:00
C     2022-07-29 08:51:00
...

114 rows selected.

Because this list contains batches of events with a fixed frequency, we can change the query to get a nicer output – with a single line for every batch.
Identifying these batches is easy with the MATCH_RECOGNIZE clause.

Let’s see first the output of the query:

EVENT
TYPE  DATE         EVENT_TIMES
----- ------------ -------------------------------------------
A     2022-07-29   08:10 - 08:50 every 10 minutes (5 events)
A     2022-07-29   08:58 - 09:22 every 8 minutes (4 events)
A     2022-07-29   09:33
A     2022-07-29   09:42 - 10:52 every 10 minutes (8 events)
A     2022-07-29   10:59 - 11:11 every 4 minutes (4 events)
A     2022-07-29   11:21 - 12:01 every 10 minutes (5 events)

B     2022-07-29   08:15 - 09:55 every 20 minutes (6 events)
B     2022-07-29   10:10
B     2022-07-29   10:25 - 12:05 every 25 minutes (5 events)

C     2022-07-29   08:02 - 08:32 every 2 minutes (16 events)
C     2022-07-29   08:49 - 09:05 every 2 minutes (9 events)
C     2022-07-29   09:08 - 09:20 every 4 minutes (4 events)
C     2022-07-29   09:22 - 10:16 every 2 minutes (28 events)
C     2022-07-29   11:46 - 12:20 every 2 minutes (18 events)


14 rows selected.

Much nicer, isn’t it? 🙂

Here is the query:

SQL> select event_type,
  2         to_char(from_time,'yyyy-mm-dd') "DATE",
  3         to_char(from_time,'hh24:mi') ||
  4           nvl2(to_time,
  5                ' - ' || to_char(to_time,'hh24:mi')
  6                      || ' every ' || minutes || ' minutes ('
  7                      || number_of_events || ' events)',
  8                null) event_times
  9  from (
 10    select event_type,event_time
 11    from   events
 12    where  event_time > timestamp'2022-07-29 08:00:00'
 13  ) match_recognize (
 14      partition by event_type
 15      order by event_time
 16      measures X.event_time from_time,
 17               last(Y.event_time) to_time,
 18               (first(Y.event_time)-X.event_time)*24*60 minutes,
 19               count(*) number_of_events
 20      pattern (X Y{2,} | X)
 21      define
 22        Y as event_time-prev(event_time) = first(Y.event_time)-X.event_time
 23  )
 24  order by event_type, from_time;

The original query [lines 10-12] is the input to the MATCH_RECOGNIZE clause [lines 13-23].

We partition the input record-set by EVENT_TYPE [line 14], as we want to get a separate list for each type.

MATCH_RECOGNIZE is used for finding consecutive rows that match some pattern. “Consecutive” means we have to define the order of the input record-set (in each partition). In this case, we order the events of each type by EVENT_TIME [line 15].

Now, let’s define the pattern [lines 20-22]. We’re looking for at least 3 consecutive events – an X record followed by two or more Y records [“X Y{2,}”].
Any event is potentially an X record, so I didn’t define it explicitly.
Y is explicitly defined [line 22] as a record that its time difference from the previous record [event_time-prev(event_time)] is the same as the time difference between the first event [X] and the second event [the first Y] in the current batch [first(Y.event_time)-X.event_time].
If we don’t find such a batch of events [X followed by two or more Ys], we have a single event to return – X. This is expressed by the pattern “X Y{2,} | X” [line 20].

For every match of this pattern, 4 expressions are calculated [lines 16-19]:
FROM_TIME: the time of the first event in the match [line 16]
TO_TIME: the time of the last event in the match [line 17]
MINUTES: the time difference between every two consecutive events [line 18]
NUMBER_OF_EVENTS: the total number of events in the match [line 19]

Finally, we display the results returned by the MATCH_RECOGNIZE operator, using the expressions calculated in the MEASURES clause [lines 1-8].

The same technique can be applied for different tables. See the next post for a generalized solution, using a Table SQL Macro.

Leave a Reply

Your email address will not be published. Required fields are marked *