Aggregation Filters

Overview

The FILTER clause is a nice addition to the SQL syntax in Oracle 26ai (23.26.1). It allows to restrict the rows considered by aggregate functions.

An aggregate query divides a set of records into groups, and returns a single result row for each group.
By default, the input for every aggregate function in the query includes all the records of the group.
However, sometimes we want the input to a specific aggregate function in the query to be only a subset of the group’s records. The FILTER clause provides an elegant solution for this requirement.

Example

As an example, I’ll use the EUROVISION_WINNERS table that I created for my Pattern Matching presentation.
This table contains a record for every year of the Eurovision Song Contest, with some details about the winner of that year.

SQL> select * from eurovision_winners order by year;

I want to analyze the languages of the winning songs. For each language, I’d like to get: Continue reading “Aggregation Filters”

Mixed Feelings About MERGE

I have a love-hate relationship with the MERGE SQL statement.
Well, maybe it’s a bit harsh. Let’s make it a like-dislike relationship.

Why I Like It

Because it provides an elegant way to perform an “insert if not exists, update if exists” logic in a single statement. It has a rich syntax with various options for different use cases. One of the very first posts I wrote in this blog, in 2013, was a primer of the MERGE statement.

Why I Dislike It

I don’t like how it deals with concurrent transactions that try to handle the same key.
Let me show you what I mean exactly.

Demo Setup

I have a table t, with a primary key on the id column. Continue reading “Mixed Feelings About MERGE”

Happy New Square Year

At a family dinner a few days ago, we realized that the upcoming year – 2025 – is a square number 🤩
Promptly the kids checked and announced that it’s the only year in the 21st century that is a square number.
We wondered how many square years there are per century over time, and, as always, such questions are answered using a simple SQL query.

with
   n as (select 30 centuries from dual),
   c as (select rownum century from n connect by level <= n.centuries),
   y as (select rownum*rownum year from n connect by level * level <= n.centuries*100),
   yc as (select y.year, to_number(to_char(to_date(y.year,'yyyy'),'cc')) century from y)
select
  c.century,
  count(yc.year) number_of_sqaure_years,
  decode(count(yc.year), 1, any_value(yc.year)) the_only_sqaure_in_the_century
from c, yc
where yc.century(+) = c.century
group by c.century
order by c.century;

   CENTURY NUMBER_OF_SQAURE_YEARS THE_ONLY_SQAURE_IN_THE_CENTURY
---------- ---------------------- ------------------------------
         1                     10
         2                      4
         3                      3
         4                      3
         5                      2
         6                      2
         7                      2
         8                      2
         9                      2
        10                      1                            961
        11                      2
        12                      1                           1156
        13                      2
        14                      1                           1369
        15                      1                           1444
        16                      2
        17                      1                           1681
        18                      1                           1764
        19                      1                           1849
        20                      1                           1936
        21                      1                           2025 -- We Are Here!
        22                      1                           2116
        23                      1                           2209
        24                      1                           2304
        25                      2
        26                      0
        27                      1                           2601
        28                      1                           2704
        29                      1                           2809
        30                      1                           2916

30 rows selected.

Looking at the first 30 centuries, the 10th century was the first to have only one square year. Since the 17th century, each century has exactly one square year, except for the 25th that has two and the 26th that has none.

Happy new year!

Optimization of NOT NULL Constraint Creation

Several years ago I wrote a series of 8 posts about constraint creation optimization. I think it’s time to add some more posts to the series.
I showed there, among other things, that Oracle does a nice optimization – that may save plenty of time – when we add in a single ALTER TABLE statement a new (nullable with no default value) column and a corresponding:
– inline (a.k.a. “column-level”) check constraint
or
– foreign key constraint (either inline or out-of-line)

What about NOT NULL constraints?

Oracle has a special optimization for the case of adding a column with a NOT NULL constraint (and no default value).
It checks if the table contains at least one record. If it does, then we cannot add the new column, becuase the NOT NULL constraint will be violated for all the existing rows. If the table doesn’t contain any records, the new column can be added.

Let’s start with an empty table: Continue reading “Optimization of NOT NULL Constraint Creation”

DECODE With Ranges

Tim Hall wrote this post yesterday (as part of Joel Kallman Day 2022) with some examples for “the old way” vs. “the new way” to do some stuff.
One of these examples is DECODE and CASE Expressions. As Tim wrote:

CASE expressions were first released in Oracle 8i […] Prior to that if we wanted something similar to a CASE expression in our SQL, we had to use the DECODE function.

This reminded me of a trick I came up with many years ago, probably before 8i 🙂

A table T has a column X, and we need a query that one of the items it returns is a rank – either “Low”, “Medium” or “High”. This rank is determined by the value of X and two thresholds – LOW_TH and HIGH_TH.

With a (Searched) CASE Expression it’s as simple as this: Continue reading “DECODE With Ranges”

Eurovision Analysis #JoelKallmanDay

Here’s my contribution to Joel Kallman Day 2022.

The picture is from the last time I met Joel. It was taken at ilOUG Tech Days 2020, just before an invigorating keynote by Joel.

Four years ago, as part of ODC Appreciation Day 2018 (as the annual Oracle community appreciation day was called that year), I wrote about Pattern Matching in SQL. That post was not a technical one.

This time I decided to return to the same topic, with a technical post.
I recorded a video that explains Pattern Matching by analyzing the winners of the Eurovision Song Contest over the years.

Pre-19c Crossedition Triggers

As shown in the previous post, finding all the crossedition triggers in the database is easy using DBA_TRIGGERS_AE, but this can be done only as of Oracle 19c – the release in which this data dictionary view was added.

Here are two options that can be used before 19c.

A Proper Way

We can use DBA_OBJECTS_AE to find all the triggers across all editions, and then for each one to “jump” to its edition using dbms_sql and check whether it is a crosseedition trigger or not using DBA_TRIGGERS. Continue reading “Pre-19c Crossedition Triggers”

USER_TRIGGERS_AE

One of the fundamental building blocks of EBR (Edition Based Redefinition) is Crossedition Triggers.

This special type of triggers is used when we change the structure of tables in an online application upgrade, and the data of these tables may be changed while we do the upgrade and afterwards, until no session uses the pre-upgrade edition. These triggers keep the pre-upgrade representation of the data and post-upgrade representation of the data in sync.

Crossedition triggers are temporary in nature – they should be dropped when all the clients use the edition in which the triggers were created (or one of its descendent editions).

In theory, crossedition triggers live for a very short period.

In real life, when we have multiple active editions and different types of clients that may keep using older editions for a while before upgrading to the latest edition, it may take days and even weeks before we can drop the crossedition triggers. In my opinion, this is perfectly fine, and actually one of the benefits that I see in EBR.

Another significant difference between crossedition triggers and regular triggers is that crossedition triggers are not inherited by descendent editions. However, they may be triggered by DML operations that are executed in a descendent edition.

This means that we need a way to check which crossedition triggers exist in our schema/database.
Continue reading “USER_TRIGGERS_AE”

Frequency-Based Ranges – Generalized with SQL Macro

In the previous post I used MATCH_RECOGNIZE to convert a list of concrete event times to frequency-based ranges.
The same technique can be applied with different tables, so I wrapped the query in a Table SQL Macro.

The parameters are the table, the “type” column, the “time” column, and (optionally) the period used for filtering.
I assumed the frequencies are always in minutes. The macro can be further generalized to support different time resolutions.

Here is the code:

create or replace function freq_view
(
    i_table     in dbms_tf.table_t,
    i_columns   in dbms_tf.columns_t,
    i_from_time in date default date '1970-01-01',
    i_to_time   in date default date '9999-12-31'
) return varchar2 sql_macro as
begin
    return 'select event_type ' || i_columns(1) || ',
       to_char(from_time,''yyyy-mm-dd'') "DATE",
       to_char(from_time,''hh24:mi'') ||
         nvl2(to_time,
              '' - '' || to_char(to_time,''hh24:mi'')
                    || '' every '' || minutes || '' minutes (''
                    || number_of_events || '' events)'',
              null) event_times
from (
  select ' || i_columns(1) || ' as event_type, trunc(' || i_columns(2) || ',''mi'') as event_time
  from   i_table
  where  ' || i_columns(2) || ' between i_from_time and i_to_time
) match_recognize (
    partition by event_type
    order by event_time
    measures X.event_time from_time,
             last(Y.event_time) to_time,
             round((first(Y.event_time)-X.event_time)*24*60) minutes,
             count(*) number_of_events
    pattern (X Y{2,} | X)
    define
      Y as event_time-prev(event_time) = first(Y.event_time)-X.event_time
)
order by event_type, from_time';
end freq_view;
/

Let’s use it for the same data set used in the original post:

SQL> break on event_type skip 1 dup
SQL> select * from freq_view(events,columns(event_type,event_time),timestamp'2022-07-29 08:00:00');

EVENT_ DATE           EVENT_TIMES
------ -------------- ----------------------------------------------
A      2022-07-29     08:10 - 08:50 every 10 minutes (5 events)
A      2022-07-29     08:58 - 09:22 every 8 minutes (4 events)
A      2022-07-29     09:33
A      2022-07-29     09:42 - 10:52 every 10 minutes (8 events)
A      2022-07-29     10:59 - 11:11 every 4 minutes (4 events)
A      2022-07-29     11:21 - 12:01 every 10 minutes (5 events)

B      2022-07-29     08:15 - 09:55 every 20 minutes (6 events)
B      2022-07-29     10:10
B      2022-07-29     10:25 - 12:05 every 25 minutes (5 events)

C      2022-07-29     08:02 - 08:32 every 2 minutes (16 events)
C      2022-07-29     08:49 - 09:05 every 2 minutes (9 events)
C      2022-07-29     09:08 - 09:20 every 4 minutes (4 events)
C      2022-07-29     09:22 - 10:16 every 2 minutes (28 events)
C      2022-07-29     11:46 - 12:20 every 2 minutes (18 events)


14 rows selected.

And now let’s use it for something else – the DBMS_SCHEDULER executions from the last 3 hours:

SQL> select * from freq_view(user_scheduler_job_run_details,columns(job_name,log_date),sysdate-3/24);

JOB_NAME     DATE            EVENT_TIMES
------------ --------------- ------------------------------------------------
JOB_AAA      2022-07-30      08:45 - 10:45 every 60 minutes (3 events)
JOB_BBB      2022-07-30      08:10 - 10:10 every 60 minutes (3 events)
JOB_CCC      2022-07-30      08:07 - 11:02 every 5 minutes (36 events)
JOB_DDD      2022-07-30      10:00
JOB_EEE      2022-07-30      08:06 - 11:05 every 1 minutes (180 events)

Pattern Matching and Event Frequencies

I have a table that contains events of various types:

create table events (
  event_id   integer     not null,
  event_type varchar2(2) not null,
  event_time date        not null,
  -- some other columns with the event details
);

The events may arrive sporadically, but many times multiple consecutive events of the same type arrive in a fixed frequency (e.g., “events of type A happen every 10 minutes between 8:10am and 8:50am”).

I want to list all the event times in a specific period for every event type.

This is a simple query, but there are many results, so it’s not very easy to read them:

SQL> break on event_type skip 1 dup
SQL> select event_type,
  2         event_time
  3  from   events
  4  where  event_time > timestamp'2022-07-29 08:00:00'
  5  order  by event_type,
  6            event_time;

EVENT
TYPE  EVENT_TIME
----- -------------------
A     2022-07-29 08:10:00
A     2022-07-29 08:20:00
A     2022-07-29 08:30:00
A     2022-07-29 08:40:00
A     2022-07-29 08:50:00
A     2022-07-29 08:58:00
A     2022-07-29 09:06:00
A     2022-07-29 09:14:00
A     2022-07-29 09:22:00
A     2022-07-29 09:33:00
A     2022-07-29 09:42:00
A     2022-07-29 09:52:00
A     2022-07-29 10:02:00
A     2022-07-29 10:12:00
A     2022-07-29 10:22:00
A     2022-07-29 10:32:00
A     2022-07-29 10:42:00
A     2022-07-29 10:52:00
A     2022-07-29 10:59:00
A     2022-07-29 11:03:00
A     2022-07-29 11:07:00
A     2022-07-29 11:11:00
A     2022-07-29 11:21:00
A     2022-07-29 11:31:00
A     2022-07-29 11:41:00
A     2022-07-29 11:51:00
A     2022-07-29 12:01:00

B     2022-07-29 08:15:00
B     2022-07-29 08:35:00
B     2022-07-29 08:55:00
B     2022-07-29 09:15:00
B     2022-07-29 09:35:00
B     2022-07-29 09:55:00
B     2022-07-29 10:10:00
B     2022-07-29 10:25:00
B     2022-07-29 10:50:00
B     2022-07-29 11:15:00
B     2022-07-29 11:40:00
B     2022-07-29 12:05:00

C     2022-07-29 08:02:00
C     2022-07-29 08:04:00
C     2022-07-29 08:06:00
C     2022-07-29 08:08:00
C     2022-07-29 08:10:00
C     2022-07-29 08:12:00
C     2022-07-29 08:14:00
C     2022-07-29 08:16:00
C     2022-07-29 08:18:00
C     2022-07-29 08:20:00
C     2022-07-29 08:22:00
C     2022-07-29 08:24:00
C     2022-07-29 08:26:00
C     2022-07-29 08:28:00
C     2022-07-29 08:30:00
C     2022-07-29 08:32:00
C     2022-07-29 08:49:00
C     2022-07-29 08:51:00
...

114 rows selected.

Because this list contains batches of events with a fixed frequency, we can change the query to get a nicer output – with a single line for every batch.
Identifying these batches is easy with the MATCH_RECOGNIZE clause.

Let’s see first the output of the query:

EVENT
TYPE  DATE         EVENT_TIMES
----- ------------ -------------------------------------------
A     2022-07-29   08:10 - 08:50 every 10 minutes (5 events)
A     2022-07-29   08:58 - 09:22 every 8 minutes (4 events)
A     2022-07-29   09:33
A     2022-07-29   09:42 - 10:52 every 10 minutes (8 events)
A     2022-07-29   10:59 - 11:11 every 4 minutes (4 events)
A     2022-07-29   11:21 - 12:01 every 10 minutes (5 events)

B     2022-07-29   08:15 - 09:55 every 20 minutes (6 events)
B     2022-07-29   10:10
B     2022-07-29   10:25 - 12:05 every 25 minutes (5 events)

C     2022-07-29   08:02 - 08:32 every 2 minutes (16 events)
C     2022-07-29   08:49 - 09:05 every 2 minutes (9 events)
C     2022-07-29   09:08 - 09:20 every 4 minutes (4 events)
C     2022-07-29   09:22 - 10:16 every 2 minutes (28 events)
C     2022-07-29   11:46 - 12:20 every 2 minutes (18 events)


14 rows selected.

Much nicer, isn’t it? 🙂

Here is the query:

SQL> select event_type,
  2         to_char(from_time,'yyyy-mm-dd') "DATE",
  3         to_char(from_time,'hh24:mi') ||
  4           nvl2(to_time,
  5                ' - ' || to_char(to_time,'hh24:mi')
  6                      || ' every ' || minutes || ' minutes ('
  7                      || number_of_events || ' events)',
  8                null) event_times
  9  from (
 10    select event_type,event_time
 11    from   events
 12    where  event_time > timestamp'2022-07-29 08:00:00'
 13  ) match_recognize (
 14      partition by event_type
 15      order by event_time
 16      measures X.event_time from_time,
 17               last(Y.event_time) to_time,
 18               (first(Y.event_time)-X.event_time)*24*60 minutes,
 19               count(*) number_of_events
 20      pattern (X Y{2,} | X)
 21      define
 22        Y as event_time-prev(event_time) = first(Y.event_time)-X.event_time
 23  )
 24  order by event_type, from_time;

The original query [lines 10-12] is the input to the MATCH_RECOGNIZE clause [lines 13-23].

We partition the input record-set by EVENT_TYPE [line 14], as we want to get a separate list for each type.

MATCH_RECOGNIZE is used for finding consecutive rows that match some pattern. “Consecutive” means we have to define the order of the input record-set (in each partition). In this case, we order the events of each type by EVENT_TIME [line 15].

Now, let’s define the pattern [lines 20-22]. We’re looking for at least 3 consecutive events – an X record followed by two or more Y records [“X Y{2,}”].
Any event is potentially an X record, so I didn’t define it explicitly.
Y is explicitly defined [line 22] as a record that its time difference from the previous record [event_time-prev(event_time)] is the same as the time difference between the first event [X] and the second event [the first Y] in the current batch [first(Y.event_time)-X.event_time].
If we don’t find such a batch of events [X followed by two or more Ys], we have a single event to return – X. This is expressed by the pattern “X Y{2,} | X” [line 20].

For every match of this pattern, 4 expressions are calculated [lines 16-19]:
FROM_TIME: the time of the first event in the match [line 16]
TO_TIME: the time of the last event in the match [line 17]
MINUTES: the time difference between every two consecutive events [line 18]
NUMBER_OF_EVENTS: the total number of events in the match [line 19]

Finally, we display the results returned by the MATCH_RECOGNIZE operator, using the expressions calculated in the MEASURES clause [lines 1-8].

The same technique can be applied for different tables. See the next post for a generalized solution, using a Table SQL Macro.