ANY_VALUE

A new aggregate function – ANY_VALUE – was added to Oracle. It is documented as of Oracle 21c, but apparently it exists also in 19c (at least in 19.8 – the version in which I tested it).

Many times when writing an aggregate query we add expressions to the GROUP BY clause just because we want to add them to the select list, although they don’t change the aggregation result.

For example, let’s count the number of cities per country_id:

select c2.country_id,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id;

Now, what if we want to return also the country name?

country_id is the primary key of the countries table, so there is no reason to change the aggregation key. But in an aggregate query the select list can include only aggregate functions, GROUP BY expressions, constants, or expressions involving one of these.

Option 1

So a common practice is to add country_name to the GROUP BY clause:

select c2.country_id,
       c2.country_name,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id,
          c2.country_name;

Adding country_name to the GROUP BY clause is artificial, makes the query less clean, and the SQL engine may need to work harder as the aggregation key is wider.

Option 2

An alternative approach is to choose some simple aggregate function that won’t change the desired result, like MIN or MAX (but not SUM or COUNT…), and apply it on country_name in the select list:

select c2.country_id,
       min(c2.country_name) country_name,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id;

Since all the records in the same group belong to the same country_id, then country_name is the same for all of them, so MIN(country_name)=MAX(country_name)=the right value.
In this option we don’t litter the GROUP BY clause, but we add an arbitrary aggregate function to the select list, which still feels artificial, and adds some (small) extra work for the SQL engine.

Option 3 – ANY_VALUE

Now we can use ANY_VALUE instead. ANY_VALUE(country_name) returns, for each group, the value of country_name from one of the records in that group.

select c2.country_id,
       any_value(c2.country_name) country_name,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id;

Since ANY_VALUE is considered an aggregate function, we don’t need to add it to the GROUP BY clause. But, by definition, it’s a very simple function, so the performance overhead should be negligible with respect to other aggregate functions.

Using ANY_VALUE also makes the query more readable, in my opinion. It makes our intention clear (unlike the use of the MIN function in the previous example).

Option 4

There is another alternative to solve our example:

select c2.country_id,
       c2.country_name,
       c1.number_of_cities
from   (select country_id,
               count(*) number_of_cities
        from   cities
        group  by country_id) c1,
       countries c2
where  c2.country_id = c1.country_id;

Here we “aggregate and then join”, rather than “join and then aggregate”.

Use ANY_VALUE Wisely

I think that ANY_VALUE is a great addition to Oracle SQL. But use it wisely. Remember that by definition it is not deterministic. If you know that some_expression is unique for all the records in the same group in your query, then any_value(some_expression) is deterministic in your query and can be used safely. Otherwise, think very carefully if it’s appropriate for your case.

Leave a Reply

Your email address will not be published. Required fields are marked *