Overview
One of the first posts I wrote in this blog (almost five years ago) was about the FIRST and LAST aggregate functions.
These functions are, in a way, extended versions of the much more popular aggregate functions MIN and MAX.
MIN and MAX allow you (conceptually) to sort a group of rows by some column and return the value of that column from the first or last row in the sorted group.
The FIRST and LAST functions extend this ability, and allow you to sort a group of rows by one column, but return the value of another column from the first or last row in the sorted group.
You are welcome to read the original post for more details about the functionality and syntax of these functions, and for seeing some examples.
In this post I’d like to focus on the performance of the FIRST and LAST functions.
Setup
First let’s setup a demo table – T – with 3 columns and an index, and populate it with one million rows.
SQL> create table t ( 2 c1 number not null, 3 c2 number not null, 4 c3 varchar2(100) 5 ); Table created. SQL> SQL> insert /*+ append */ 2 into t(c1,c2,c3) 3 select mod(rownum, 10000), 4 mod(rownum, 13), 5 lpad(rownum, 100) 6 from dual 7 connect by level <= 1000000; 1000000 rows created. SQL> commit; Commit complete. SQL> SQL> exec dbms_stats.gather_table_stats(user,'T') PL/SQL procedure successfully completed. SQL> SQL> create index t_c1_c2_ix on t(c1,c2); Index created.
Note that the C1 column has 10,000 distinct values, and for each of these values there are 100 records in the table.
I executed the following tests on 11.2.0.4, 12.1.0.2 and 12.2.0.1.
MIN/MAX Optimization
For starters, let’s find the minimum value of C1 in the table:
SQL> select min(c1) 2 from t; MIN(C1) ---------- 0
Since we have an index with C1 as the leading column, the optimizer has a special treatment for such a query – using the INDEX FULL SCAN (MIN/MAX) access path:
Execution Plan ---------------------------------------------------------- Plan hash value: 2417559164 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | | 2 | INDEX FULL SCAN (MIN/MAX)| T_C1_C2_IX | 1 | 4 | 3 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 540 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The optimizer knows that it only needs to look at the first entry of the index, and that’s it. And indeed, we can see it took only 3 consistent gets (the height of the index).
The same optimization is done the for the MAX function.
FIRST/LAST Lack of Optimization
Unfortunately, it seems there is no such optimization for the FIRST and LAST functions, although a very similar principle could have been applied.
Let’s use the FIRST function for finding the average value of C2 amongst all the records that have the minimum value of C1:
SQL> select avg(c2) keep (dense_rank first order by c1) avg_c2_of_min_c1 2 from t; AVG_C2_OF_MIN_C1 ---------------- 6.03 Execution Plan ---------------------------------------------------------- Plan hash value: 2534426607 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 7 | 684 (1)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 7 | | | | 2 | INDEX FAST FULL SCAN| T_C1_C2_IX | 1000K| 6835K| 684 (1)| 00:00:01 | ------------------------------------------------------------------------------------ Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2516 consistent gets 0 physical reads 0 redo size 551 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Oracle scanned all the million entries of the index (2516 consistent gets), while only the first 100 are relevant.
In this case, both columns (C1 and C2) are included in the index. If we change the query to find the average value of C3 (instead of C2) amongst all the records that have the minimum value of C1, the Index Fast Full Scan becomes a Full Table Scan (with 15658 consistent gets):
SQL> select avg(c3) keep (dense_rank first order by c1) avg_c3_of_min_c1 2 from t; AVG_C3_OF_MIN_C1 ---------------- 505000 Execution Plan ---------------------------------------------------------- Plan hash value: 2966233522 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 105 | 4246 (1)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 105 | | | | 2 | TABLE ACCESS FULL| T | 1000K| 100M| 4246 (1)| 00:00:01 | --------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 11 db block gets 15658 consistent gets 0 physical reads 0 redo size 551 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
An Efficient Alternative
We can achieve the same functionality with much better performance, by discarding the FIRST function and writing another query instead – a query that is more complex but takes advantage of the Optimizer’s smart mechanisms. Following is an idea for such a query.
First, for finding the average value of C2 amongst all the records that have the minimum value of C1:
SQL> select avg(c2) 2 from (select c2, 3 rank() over (order by c1) rnk 4 from t) 5 where rnk = 1; AVG(C2) ---------- 6.03 Execution Plan ---------------------------------------------------------- Plan hash value: 4145734408 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 26 | 2509 (1)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 26 | | | |* 2 | VIEW | | 1000K| 24M| 2509 (1)| 00:00:01 | |* 3 | WINDOW NOSORT STOPKEY| | 1000K| 6835K| 2509 (1)| 00:00:01 | | 4 | INDEX FULL SCAN | T_C1_C2_IX | 1000K| 6835K| 2509 (1)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("RNK"=1) 3 - filter(RANK() OVER ( ORDER BY "C1")<=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 542 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We got the same result as before, but using only 3 consistent gets, compared to 2516.
Oracle starts scanning the index, and stops as soon as it gets to the second rank (i.e., the second unique value of C1) – after fetching only 100 entries.
Note the WINDOW NOSORT STOPKEY operation in the execution plan – the optimizer knows that the records are fetched in the same order they are sorted in the index, and therefore NO SORT is needed, and it knows to STOP fetching as soon as the first KEY (rank) is over.
The height of the index is 3, so it means that all the 100 entries were found in the first block.
Sadly, the optimizer is very pessimistic when it comes to the estimation of rows returned by the WINDOW NOSORT STOPKEY operation. It assumes that no row is filtered out by this operation. This is why I had to hint the following query, that finds the average value of C3 amongst all the records that have the minimum value of C1:
SQL> select avg(c3) 2 from (select /*+ index (t,(c1)) */ c3, 3 rank() over (order by c1) rnk 4 from t) 5 where rnk = 1; AVG(C3) ---------- 505000 Execution Plan ---------------------------------------------------------- Plan hash value: 92880731 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 65 | 1002K (1)| 00:00:40 | | 1 | SORT AGGREGATE | | 1 | 65 | | | |* 2 | VIEW | | 1000K| 61M| 1002K (1)| 00:00:40 | |* 3 | WINDOW NOSORT STOPKEY | | 1000K| 100M| 1002K (1)| 00:00:40 | | 4 | TABLE ACCESS BY INDEX ROWID| T | 1000K| 100M| 1002K (1)| 00:00:40 | | 5 | INDEX FULL SCAN | T_C1_C2_IX | 1000K| | 2509 (1)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("RNK"=1) 3 - filter(RANK() OVER ( ORDER BY "C1")<=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 104 consistent gets 0 physical reads 0 redo size 542 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Same idea, and we get the answer using only 104 consistent gets, compared to 15658 in the query that used the FIRST function.
Conclusion
I really like the FIRST and LAST aggregate functions for the functionality they provide, as built-in SQL functions with a concise syntax.
When I need this functionality, and performance is not an issue, I prefer using them.
However, as shown in this post, performance may certianly be an issue. In such cases I’d prefer to use another approach – such as the one presented above.
The documentation says:
When you need a value from the first or last row of a sorted group, but the needed value is not the sort key, the FIRST and LAST functions eliminate the need for self-joins or views and enable better performance.
I wonder – “better” than what?
Nice post. Just last week I had to deal with exactly the same performance problem using the FIRST/LAST functions. I like your alternative solution using another analytic function (rank) that provides the same result but with good performance.
I used the old fashioned way of querying we used to do before analytic functions were available.
In this case these old fashioned queries which do need self joins have better performance than the analytic version using FIRST/LAST functions.
very helpful post, thank you.