The Performance of the FIRST and LAST Functions

Overview

One of the first posts I wrote in this blog (almost five years ago) was about the FIRST and LAST aggregate functions.
These functions are, in a way, extended versions of the much more popular aggregate functions MIN and MAX.
MIN and MAX allow you (conceptually) to sort a group of rows by some column and return the value of that column from the first or last row in the sorted group.
The FIRST and LAST functions extend this ability, and allow you to sort a group of rows by one column, but return the value of another column from the first or last row in the sorted group.
You are welcome to read the original post for more details about the functionality and syntax of these functions, and for seeing some examples.

In this post I’d like to focus on the performance of the FIRST and LAST functions.

Setup

First let’s setup a demo table – T – with 3 columns and an index, and populate it with one million rows.

SQL> create table t (
  2    c1 number not null,
  3    c2 number not null,
  4    c3 varchar2(100)
  5  );

Table created.

SQL>
SQL> insert /*+ append */
  2  into t(c1,c2,c3)
  3      select mod(rownum, 10000),
  4             mod(rownum, 13),
  5             lpad(rownum, 100)
  6      from   dual
  7      connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> exec dbms_stats.gather_table_stats(user,'T')

PL/SQL procedure successfully completed.

SQL>
SQL> create index t_c1_c2_ix on t(c1,c2);

Index created.

Note that the C1 column has 10,000 distinct values, and for each of these values there are 100 records in the table.

I executed the following tests on 11.2.0.4, 12.1.0.2 and 12.2.0.1.

MIN/MAX Optimization

For starters, let’s find the minimum value of C1 in the table:

SQL> select min(c1)
  2  from t;

   MIN(C1)
----------
         0

Since we have an index with C1 as the leading column, the optimizer has a special treatment for such a query – using the INDEX FULL SCAN (MIN/MAX) access path:

Execution Plan
----------------------------------------------------------
Plan hash value: 2417559164

-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |            |     1 |     4 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T_C1_C2_IX |     1 |     4 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        540  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
1	rows processed

The optimizer knows that it only needs to look at the first entry of the index, and that’s it. And indeed, we can see it took only 3 consistent gets (the height of the index).
The same optimization is done the for the MAX function.

FIRST/LAST Lack of Optimization

Unfortunately, it seems there is no such optimization for the FIRST and LAST functions, although a very similar principle could have been applied.

Let’s use the FIRST function for finding the average value of C2 amongst all the records that have the minimum value of C1:

SQL> select avg(c2) keep (dense_rank first order by c1) avg_c2_of_min_c1
  2  from t;

AVG_C2_OF_MIN_C1
----------------
            6.03


Execution Plan
----------------------------------------------------------
Plan hash value: 2534426607

------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |     7 |   684   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |            |     1 |     7 |            |          |
|   2 |   INDEX FAST FULL SCAN| T_C1_C2_IX |  1000K|  6835K|   684   (1)| 00:00:01 |
------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2516  consistent gets
          0  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Oracle scanned all the million entries of the index (2516 consistent gets), while only the first 100 are relevant.

In this case, both columns (C1 and C2) are included in the index. If we change the query to find the average value of C3 (instead of C2) amongst all the records that have the minimum value of C1, the Index Fast Full Scan becomes a Full Table Scan (with 15658 consistent gets):

SQL> select avg(c3) keep (dense_rank first order by c1) avg_c3_of_min_c1
  2  from t;

AVG_C3_OF_MIN_C1
----------------
          505000


Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   105 |  4246   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |   105 |            |          |
|   2 |   TABLE ACCESS FULL| T    |  1000K|   100M|  4246   (1)| 00:00:01 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
         11  db block gets
      15658  consistent gets
          0  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

An Efficient Alternative

We can achieve the same functionality with much better performance, by discarding the FIRST function and writing another query instead – a query that is more complex but takes advantage of the Optimizer’s smart mechanisms. Following is an idea for such a query.
First, for finding the average value of C2 amongst all the records that have the minimum value of C1:

SQL> select avg(c2)
  2  from (select c2,
  3               rank() over (order by c1) rnk
  4        from t)
  5  where rnk = 1;

   AVG(C2)
----------
      6.03


Execution Plan
----------------------------------------------------------
Plan hash value: 4145734408

--------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |     1 |    26 |  2509   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE         |            |     1 |    26 |            |          |
|*  2 |   VIEW                  |            |  1000K|    24M|  2509   (1)| 00:00:01 |
|*  3 |    WINDOW NOSORT STOPKEY|            |  1000K|  6835K|  2509   (1)| 00:00:01 |
|   4 |     INDEX FULL SCAN     | T_C1_C2_IX |  1000K|  6835K|  2509   (1)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("RNK"=1)
   3 - filter(RANK() OVER ( ORDER BY "C1")<=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        542  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We got the same result as before, but using only 3 consistent gets, compared to 2516.
Oracle starts scanning the index, and stops as soon as it gets to the second rank (i.e., the second unique value of C1) – after fetching only 100 entries.
Note the WINDOW NOSORT STOPKEY operation in the execution plan – the optimizer knows that the records are fetched in the same order they are sorted in the index, and therefore NO SORT is needed, and it knows to STOP fetching as soon as the first KEY (rank) is over.
The height of the index is 3, so it means that all the 100 entries were found in the first block.
Sadly, the optimizer is very pessimistic when it comes to the estimation of rows returned by the WINDOW NOSORT STOPKEY operation. It assumes that no row is filtered out by this operation. This is why I had to hint the following query, that finds the average value of C3 amongst all the records that have the minimum value of C1:

SQL> select avg(c3)
  2  from (select /*+ index (t,(c1)) */ c3,
  3               rank() over (order by c1) rnk
  4        from t)
  5  where rnk = 1;

   AVG(C3)
----------
    505000


Execution Plan
----------------------------------------------------------
Plan hash value: 92880731

---------------------------------------------------------------------------------------------
| Id  | Operation                      | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |            |     1 |    65 |  1002K  (1)| 00:00:40 |
|   1 |  SORT AGGREGATE                |            |     1 |    65 |            |          |
|*  2 |   VIEW                         |            |  1000K|    61M|  1002K  (1)| 00:00:40 |
|*  3 |    WINDOW NOSORT STOPKEY       |            |  1000K|   100M|  1002K  (1)| 00:00:40 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T          |  1000K|   100M|  1002K  (1)| 00:00:40 |
|   5 |      INDEX FULL SCAN           | T_C1_C2_IX |  1000K|       |  2509   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("RNK"=1)
   3 - filter(RANK() OVER ( ORDER BY "C1")<=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        104  consistent gets
          0  physical reads
          0  redo size
        542  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Same idea, and we get the answer using only 104 consistent gets, compared to 15658 in the query that used the FIRST function.

Conclusion

I really like the FIRST and LAST aggregate functions for the functionality they provide, as built-in SQL functions with a concise syntax.
When I need this functionality, and performance is not an issue, I prefer using them.
However, as shown in this post, performance may certianly be an issue. In such cases I’d prefer to use another approach – such as the one presented above.

The documentation says:

When you need a value from the first or last row of a sorted group, but the needed value is not the sort key, the FIRST and LAST functions eliminate the need for self-joins or views and enable better performance.

I wonder – “better” than what?

Nice post. Just last week I had to deal with exactly the same performance problem using the FIRST/LAST functions. I like your alternative solution using another analytic function (rank) that provides the same result but with good performance.

I used the old fashioned way of querying we used to do before analytic functions were available.

In this case these old fashioned queries which do need self joins have better performance than the analytic version using FIRST/LAST functions.

select avg(c2) 
from   t
where  t.c1 = (select min(c1) from t)
;
   AVG(C2)
----------
6,03

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
Plan hash value: 269985105

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |     7 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |            |     1 |     7 |            |          |
|*  2 |   INDEX RANGE SCAN           | T_C1_C2_IX |   100 |   700 |     3   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE            |            |     1 |     4 |            |          |
|   4 |     INDEX FULL SCAN (MIN/MAX)| T_C1_C2_IX |     1 |     4 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."C1"= (SELECT MIN("C1") FROM "T" "T"))

Statistics
-----------------------------------------------------------
               7  CPU used when call started
              11  DB time
              69  Requests to/from client
              69  SQL*Net roundtrips to/from client
               2  buffer is not pinned count
             802  bytes received via SQL*Net from client
           27993  bytes sent via SQL*Net to client
               2  calls to get snapshot scn: kcmgss
               6  consistent gets
               4  consistent gets - examination
               6  consistent gets from cache
               2  consistent gets from cache (fastpath)
               1  cursor authentications
               1  enqueue releases
               1  enqueue requests
               2  execute count
               2  ges messages sent
              12  global enqueue gets sync
              12  global enqueue releases
               2  index scans kdiixs1
           49152  logical read bytes from cache
               2  no work - consistent read gets
              75  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               1  parse count (hard)
               2  parse count (total)
               1  parse time elapsed
               1  recursive calls
               6  session logical reads
               1  sorts (memory)

               
select avg(c3) 
from   t
where  t.c1 = (select min(c1) from t)
;               

   AVG(C3)
----------
    505000

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
Plan hash value: 3317387217

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |     1 |   105 |   103   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |            |     1 |   105 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T          |   100 | 10500 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | T_C1_C2_IX |   100 |       |     3   (0)| 00:00:01 |
|   4 |     SORT AGGREGATE            |            |     1 |     4 |            |          |
|   5 |      INDEX FULL SCAN (MIN/MAX)| T_C1_C2_IX |     1 |     4 |     3   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T"."C1"= (SELECT MIN("C1") FROM "T" "T"))

Statistics
-----------------------------------------------------------
               1  CPU used by this session
               1  CPU used when call started
              99  Cached Commit SCN referenced
               1  Commit SCN cached
              69  Requests to/from client
              69  SQL*Net roundtrips to/from client
             103  buffer is not pinned count
              99  buffer is pinned count
             802  bytes received via SQL*Net from client
           28018  bytes sent via SQL*Net to client
               2  calls to get snapshot scn: kcmgss
               1  cleanout - number of ktugct calls
               1  cleanouts only - consistent read gets
               1  commit txn count during cleanout
               1  concurrency wait time
             107  consistent gets
               5  consistent gets - examination
             107  consistent gets from cache
             102  consistent gets from cache (fastpath)
               1  db block changes
               1  enqueue releases
               1  enqueue requests
               2  execute count
               1  gc remote grants
               1  gcs messages sent
               2  ges messages sent
               2  global enqueue gets sync
               2  global enqueue releases
               1  immediate (CR) block cleanout applications
               2  index scans kdiixs1
          876544  logical read bytes from cache
             101  no work - consistent read gets
              77  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               1  parse count (hard)
               2  parse count (total)
               1  recursive calls
               1  redo entries
              72  redo size
             107  session logical reads
               1  sorts (memory)
             679  sorts (rows)
               1  sql area evicted
             100  table fetch by rowid
              72  user calls