Introduction
I’ve just realized, once again, that the Oracle Optimizer is even smarter than I thought.
The comments (by Iudith Mentzel and Thomas Mautsch) to my previous post, GROUPING SETS and COLLECT don’t get along, made me understand that not all GROUPING SETS were created equal.
The examples in this post are from Oracle 18.3
Extended Aggregation Options
GROUPING SETS, ROLLUP and CUBE are great features that enable us to perform several aggregations on the same population in a single statement, that is shorter, more elegant, and usually more efficient, than the alternative of combining (e.g., using UNION ALL) the results of the individual aggregations.
In my opinion, GROUPING SETS, ROLLUP and CUBE should be in the toolbox of every Oracle developer. If you are not familiar with them, I highly recommend reading Tim Hall‘s article https://oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets.
Under the Hood
We can use GROUPING SETS with many different combinations, but the implementation under the hood may be different for different combinations.
Let’s take, for example, the following two queries. Note that the first query has to perform 4 aggregations (some of them with a composite grouping key), while the second query performs only 2 aggregations (with a single column grouping key).
Try to guess which of the two will run faster…
select grouping_id(x,y,z) grp_id,
x,
y,
z,
count(w)
from t
group by grouping sets((x),(y,x),(z,x,y),());
select grouping_id(x,y) grp_id,
x,
y,
count(w)
from t
group by grouping sets((x),(y));
This is (of course) a trick question. First, because I wouldn’t ask it if the answer was the intuitive one. And second, because the answer depends on various factors which I didn’t show, like the table structure and its content. But anyway, take a guess, you have nothing to lose 🙂
Setup
So let’s create the table and populate it with some data.
SQL> create table t (
2 x number,
3 y number,
4 z number,
5 w char(1000)
6 );
Table created.
SQL>
SQL> insert into t (x,y,z,w)
2 select mod(rownum, 3),
3 mod(rownum, 3)+1,
4 mod(rownum, 3)+2,
5 rownum
6 from dual
7 connect by level <= 1000000;
1000000 rows created.
SQL>
SQL> commit;
Commit complete.
SQL>
SQL> exec dbms_stats.gather_table_stats(user,'T')
PL/SQL procedure successfully completed.
The Test
Let’s turn timing and statistics on:
SQL> set timing on SQL> set autotrace on stat
And execute the queries.
SQL> select grouping_id(x,y,z) grp_id, 2 x, 3 y, 4 z, 5 count(w) 6 from t 7 group by grouping sets((x),(y,x),(z,x,y),()); GRP_ID X Y Z COUNT(W) ---------- ---------- ---------- ---------- ---------- 0 0 1 2 333333 1 0 1 333333 3 0 333333 0 1 2 3 333334 1 1 2 333334 3 1 333334 0 2 3 4 333333 1 2 3 333333 3 2 333333 7 1000000 10 rows selected. Elapsed: 00:00:00.40 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 143027 consistent gets 0 physical reads 0 redo size 1015 bytes sent via SQL*Net to client 623 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 10 rows processed
SQL> select grouping_id(x,y) grp_id, 2 x, 3 y, 4 count(w) 5 from t 6 group by grouping sets((x),(y)); GRP_ID X Y COUNT(W) ---------- ---------- ---------- ---------- 2 1 333333 2 2 333334 2 3 333333 1 1 333334 1 2 333333 1 0 333333 6 rows selected. Elapsed: 00:00:06.85 Statistics ---------------------------------------------------------- 1118 recursive calls 145711 db block gets 428973 consistent gets 142858 physical reads 624 redo size 890 bytes sent via SQL*Net to client 623 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
The first query performed 4 aggregations in less than a second, while the second query performed 2 (allegedly simpler) aggregations on the same population in more than 6 seconds and consumed much more resources.
Analysis of the First Query
The reason for the “winning” of the first query is that the 4 grouping keys in this query are “telescopic” – they are all included (either partially or completely) in the leading part of the same key – (x,y,z)
() (x) (x,y) (x,y,z)
To make it a bit more interesting, I used (y,x) in the query instead of (x,y) and (z,x,y) instead of (x,y,z), but they are equivalent as grouping keys
So in this case it’s possible to sort the table by x,y,z and to calculate all 4 aggregations in a single scan. And indeed, that’s what Oracle does here; the Optimizer has a special operation for that – SORT GROUP BY ROLLUP. Here is the execution plan of the first query:
SQL> select grouping_id(x,y,z) grp_id, 2 x, 3 y, 4 z, 5 count(w) 6 from t 7 group by grouping sets((x),(y,x),(z,x,y),()); ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 14140 | 40080 (1)| 00:00:02 | | 1 | SORT GROUP BY ROLLUP| | 14 | 14140 | 40080 (1)| 00:00:02 | | 2 | TABLE ACCESS FULL | T | 1000K| 963M| 40007 (1)| 00:00:02 | -----------------------------------------------------------------------------
GROUPING SETS with these 4 grouping keys is exactly the same as ROLLUP(x,y,z), so the operation name SORT GROUP BY ROLLUP makes lots of sense.
Even if we use just a subset of a ROLLUP, Oracle can still take advantage of the SORT GROUP BY ROLLUP operation.
For example:
SQL> select grouping_id(x,y,z) grp_id,
2 x,
3 y,
4 z,
5 count(*)
6 from t
7 group by grouping sets((x),(x,y,z));
GRP_ID X Y Z COUNT(*)
---------- ---------- ---------- ---------- ----------
0 0 1 2 333333
3 0 333333
0 1 2 3 333334
3 1 333334
0 2 3 4 333333
3 2 333333
6 rows selected.
Elapsed: 00:00:00.41
Execution Plan
----------------------------------------------------------
Plan hash value: 372929889
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 126 | 40079 (1)| 00:00:02 |
| 1 | SORT GROUP BY ROLLUP| | 14 | 126 | 40079 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL | T | 1000K| 8789K| 40006 (1)| 00:00:02 |
-----------------------------------------------------------------------------
Analysis of the Second Query
What about the second query? I used GROUPING SETS ((x),(y)), and here the “telescopic” (or rollup) characteristic does not apply.
Let’s see the execution plan:
SQL> select grouping_id(x,y) grp_id, 2 x, 3 y, 4 count(w) 5 from t 6 group by grouping sets((x),(y)); ----------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 364 | 114K (1)| 00:00:05 | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D667A_2180326 | | | | | | 3 | TABLE ACCESS FULL | T | 1000K| 960M| 40007 (1)| 00:00:02 | | 4 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D667B_2180326 | | | | | | 5 | HASH GROUP BY | | 3 | 3012 | 37160 (1)| 00:00:02 | | 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D667A_2180326 | 1000K| 957M| 37086 (1)| 00:00:02 | | 7 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D667B_2180326 | | | | | | 8 | HASH GROUP BY | | 3 | 3012 | 37160 (1)| 00:00:02 | | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D667A_2180326 | 1000K| 957M| 37086 (1)| 00:00:02 | | 10 | VIEW | | 3 | 156 | 2 (0)| 00:00:01 | | 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D667B_2180326 | 3 | 126 | 2 (0)| 00:00:01 | -----------------------------------------------------------------------------------------------------------------------
Oracle loads the table into a temporary table (SYS_TEMP_0FD9D667A_2180326).
Then it scans SYS_TEMP_0FD9D667A_2180326 twice, for performing two separate “simple” GROUP BY operations (one by X and one by Y; using HASH GROUP BY operations), and inserts the results into a second temporary table (SYS_TEMP_0FD9D667B_2180326).
And finally it scans SYS_TEMP_0FD9D667B_2180326 for returning the combined results.
Summary
In this post we saw two possible execution plans for queries with GROUP BY GROUPING SETS.
I will write about additional plans in the next post.
Hi Oren,
Amazing analysis, hats off :):)
I guess that copying the data to the temporary table for the second query is done for achieving read consistency of the 2 independent
GROUP BY results.
I just wonder why does this approach look “more appealing” to Oracle
than using (internally) a flashback to the overall query starting SCN,
which it is doing anyway for any query, with the only difference that here
this should be repeated for each GROUP BY operation.
Or, maybe this decision depends in some way on the table size ?
And/or, possibly, on some (tracked) “change behavior” of the table ?
Waiting eagerly for you further posts 🙂
Cheers & Have a nice weekend !
Iudith
Thanks Iudith 🙂
And thanks for the comment in the previous post, that triggered my investigation.
I believe the use of temporary table here is an attempt to reduce I/O (assuming the base row set will fit in memory), similarly to the materialization of subquery factoring.
I will show in the next post an example where the optimizer chooses to scan the actual table twice instead of using a temporary table.
Thanks,
Oren.
Hi Oren,
Amazing analysis
but a more complex case:
select x,y,z from t group by cube(x), rollup(y,z);
the plan is:
—————————————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 39 | 15 (20)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | MULTI-TABLE INSERT | | | | | |
| 3 | SORT GROUP BY ROLLUP | | 1 | 39 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T | 1 | 39 | 2 (0)| 00:00:01 |
| 5 | DIRECT LOAD INTO (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6DEC_3AB9AAFD | | | | |
| 6 | DIRECT LOAD INTO (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6DED_3AB9AAFD | | | | |
| 7 | MULTI-TABLE INSERT | | | | | |
| 8 | SORT GROUP BY ROLLUP | | 1 | 26 | 3 (34)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6DEC_3AB9AAFD | 1 | 26 | 2 (0)| 00:00:01 |
| 10 | DIRECT LOAD INTO (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6DEE_3AB9AAFD | | | | |
| 11 | DIRECT LOAD INTO (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6DED_3AB9AAFD | | | | |
| 12 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6DED_3AB9AAFD | | | | |
| 13 | SORT GROUP BY ROLLUP | | 1 | 13 | 3 (34)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6DEE_3AB9AAFD | 1 | 13 | 2 (0)| 00:00:01 |
| 15 | VIEW | | 3 | 117 | 6 (0)| 00:00:01 |
| 16 | VIEW | | 3 | 117 | 6 (0)| 00:00:01 |
| 17 | UNION-ALL | | | | | |
| 18 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6DEC_3AB9AAFD | 1 | 39 | 2 (0)| 00:00:01 |
| 19 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6DED_3AB9AAFD | 1 | 39 | 2 (0)| 00:00:01 |
| 20 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6DEE_3AB9AAFD | 1 | 26 | 2 (0)| 00:00:01 |
—————————————————————————————————————————
the key pointer is at the cursor memory duration, and the temp file can be reused.