Let’s assume the EMPLOYEES table (from the HR schema) contains many records, and we want to write an (efficient) SQL query that filters it by one or more of the following columns: DEPARTMENT_ID, JOB_ID, MANAGER_ID and LAST_NAME.
For example:
- in one execution we may want to get all the employees with DEPARTMENT_ID = 80
- in a second execution all the employees whose LAST_NAME is King
- in a third execution all the employees whose JOB_ID is ST_CLERK and whose MANAGER_ID is 124
- and so on
These columns are indexed, each one in a separate index:
SQL> select index_name,
  2         listagg(column_name, ',') within group(order by column_position) index_columns
  3  from user_ind_columns
  4  where table_name = 'EMPLOYEES'
  5  group by index_name;

INDEX_NAME           INDEX_COLUMNS
-------------------- --------------------
EMP_DEPARTMENT_IX    DEPARTMENT_ID
EMP_EMAIL_UK         EMAIL
EMP_EMP_ID_PK        EMPLOYEE_ID
EMP_JOB_IX           JOB_ID
EMP_MANAGER_IX       MANAGER_ID
EMP_NAME_IX          LAST_NAME,FIRST_NAME

6 rows selected.
Many Queries, Many Indexes
We can write 15 different queries – a query for every possible combination.
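To see where 15 comes from: with four optional filter columns there are 2⁴ − 1 = 15 non-empty combinations. A quick Python sketch, just to illustrate the counting (the column names are the ones from the example above):

```python
from itertools import combinations

cols = ["DEPARTMENT_ID", "JOB_ID", "MANAGER_ID", "LAST_NAME"]

# one query per non-empty subset of the filter columns
subsets = [s for r in range(1, len(cols) + 1)
           for s in combinations(cols, r)]

print(len(subsets))  # 15
```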
For example, the query that filters only by DEPARTMENT_ID may be written like this:
select * from employees where department_id = :department_id;
the query that filters by JOB_ID and MANAGER_ID may be written like this:
select * from employees where job_id = :job_id and manager_id = :manager_id;
and the one that filters by DEPARTMENT_ID, JOB_ID and LAST_NAME may be written like this:
select * from employees where department_id = :department_id and job_id = :job_id and last_name = :last_name;
For queries of this kind we will probably want to add several composite indexes (and get rid of the original single-column indexes that become redundant).
For example, an index on (DEPARTMENT_ID,JOB_ID,LAST_NAME) will help the first and the third examples, and an index on (JOB_ID,MANAGER_ID) will be good for the second example.
The INTERSECT Approach
We can consider a different approach: still writing a dedicated query for every combination, but taking advantage of the existing indexes. This can be achieved by writing a query per filter column, and then intersecting the results.
So the query that filters only by DEPARTMENT_ID will be written like this:
select * from employees where department_id = :department_id;
the query that filters by JOB_ID and MANAGER_ID will be written like this:
select * from employees where job_id = :job_id
intersect
select * from employees where manager_id = :manager_id;
and the one that filters by DEPARTMENT_ID, JOB_ID and LAST_NAME may be written like this:
select * from employees where department_id = :department_id
intersect
select * from employees where job_id = :job_id
intersect
select * from employees where last_name = :last_name;
An Improved INTERSECT Approach
Actually, we don’t really need to intersect the whole result sets. We can intersect only the ROWIDs, and then fetch the full records just for the ROWIDs that survive the intersection. So the last example becomes:
select * from employees
where rowid in (
  select rowid from employees where department_id = :department_id
  intersect
  select rowid from employees where job_id = :job_id
  intersect
  select rowid from employees where last_name = :last_name
);
Note that although we wrote “from employees” 4 times in the query, we actually access the table only once. The other 3 times we access only the indexes (a different index each time), and since we get the ROWIDs from the indexes, we don’t need to access the table at all in those 3 steps.
This can be clearly seen from the execution plan:
---------------------------------------------------------
| Id  | Operation                   | Name              |
---------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |
|   1 |  NESTED LOOPS               |                   |
|   2 |   VIEW                      | VW_NSO_1          |
|   3 |    INTERSECTION             |                   |
|   4 |     INTERSECTION            |                   |
|   5 |      SORT UNIQUE            |                   |
|*  6 |       INDEX RANGE SCAN      | EMP_DEPARTMENT_IX |
|   7 |      SORT UNIQUE            |                   |
|*  8 |       INDEX RANGE SCAN      | EMP_JOB_IX        |
|   9 |     SORT UNIQUE             |                   |
|* 10 |      INDEX RANGE SCAN       | EMP_NAME_IX       |
|  11 |  TABLE ACCESS BY USER ROWID | EMPLOYEES         |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("DEPARTMENT_ID"=TO_NUMBER(:DEPARTMENT_ID))
   8 - access("JOB_ID"=:JOB_ID)
  10 - access("LAST_NAME"=:LAST_NAME)
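The mechanics of this plan can also be sketched outside the database. In this Python analogy (the rowids and data are made up, purely for illustration), each per-column “index” yields only rowids, the sets are intersected, and the “table” is touched only for the rowids that survive:

```python
# Toy model of the ROWID-intersection approach: each "index scan"
# maps a predicate to the set of rowids satisfying it; the table
# itself is accessed only for rowids surviving the intersection.
table = {
    "AAA1": {"department_id": 80, "job_id": "SA_REP",   "last_name": "King"},
    "AAA2": {"department_id": 80, "job_id": "ST_CLERK", "last_name": "King"},
    "AAA3": {"department_id": 50, "job_id": "ST_CLERK", "last_name": "King"},
}

def index_scan(column, value):
    # simulates an INDEX RANGE SCAN: returns rowids only, no table access
    return {rid for rid, row in table.items() if row[column] == value}

surviving = (index_scan("department_id", 80)
             & index_scan("job_id", "ST_CLERK")
             & index_scan("last_name", "King"))

# simulates TABLE ACCESS BY USER ROWID: fetch full rows only for survivors
result = [table[rid] for rid in sorted(surviving)]
print(sorted(surviving))  # ['AAA2']
```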
A Single Query
But do we really want to write 15 different queries, and possibly add several indexes to the table? And think of a similar case where we have not 4, but 5 or 6 or 10 filter columns. Even if we use one query for several combinations, we may still end up with many different queries to write and maintain.
So, can we write a single query for that? Of course we can – the straightforward way would be this:
select * from employees
where (:department_id is null or department_id = :department_id)
  and (:job_id is null or job_id = :job_id)
  and (:manager_id is null or manager_id = :manager_id)
  and (:last_name is null or last_name = :last_name);
But will it perform well? Probably not…
While working on a similar (but more complex) case, I had an idea that takes the concept of the INTERSECT approach – and therefore has similar performance and uses only the existing indexes – but can be written as a single query for all the combinations, by changing the implementation from INTERSECT to UNION ALL, GROUP BY and HAVING.
Here it is:
select * from employees
where rowid in (
  select rid
  from (select rowid rid from employees where department_id = :department_id
        union all
        select rowid rid from employees where job_id = :job_id
        union all
        select rowid rid from employees where manager_id = :manager_id
        union all
        select rowid rid from employees where last_name = :last_name)
  group by rid
  having count(*) = nvl2(:department_id,1,0) +
                    nvl2(:job_id,1,0) +
                    nvl2(:manager_id,1,0) +
                    nvl2(:last_name,1,0)
);
If, for example, :department_id = 80, :job_id = ST_CLERK, and :manager_id = :last_name = NULL, then the first branch of the UNION ALL will return the ROWIDs of the employees in DEPARTMENT_ID 80, the second branch will return the ROWIDs of the employees with JOB_ID ST_CLERK, and the third and the fourth branches will return no ROWIDs.
As we are interested only in employees that are in DEPARTMENT_ID 80 and whose JOB_ID is ST_CLERK, we want only the ROWIDs that are returned exactly twice, and this is exactly what the HAVING clause does – because nvl2(80,1,0) + nvl2('ST_CLERK',1,0) + nvl2(NULL,1,0) + nvl2(NULL,1,0) evaluates to 2.
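The counting trick can be simulated outside the database as well. This Python sketch (toy data and made-up rowids, purely illustrative) concatenates the rowids returned by each non-null filter – the UNION ALL – then keeps only the rowids whose occurrence count equals the number of non-null bind variables – the GROUP BY / HAVING with the NVL2 sum:

```python
from collections import Counter

table = {
    "AAA1": {"department_id": 80, "job_id": "SA_REP"},
    "AAA2": {"department_id": 80, "job_id": "ST_CLERK"},
    "AAA3": {"department_id": 50, "job_id": "ST_CLERK"},
}

def branch(column, value):
    # one UNION ALL branch: returns no rowids when the bind is null (None)
    if value is None:
        return []
    return [rid for rid, row in table.items() if row[column] == value]

binds = {"department_id": 80, "job_id": "ST_CLERK",
         "manager_id": None, "last_name": None}

rids = []
for col, val in binds.items():
    rids += branch(col, val)          # the UNION ALL of all branches

# the NVL2 sum: 1 per non-null bind variable
expected = sum(v is not None for v in binds.values())

# GROUP BY rid HAVING count(*) = expected
matching = [rid for rid, n in Counter(rids).items() if n == expected]
print(matching)  # ['AAA2']
```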
Finally, here is the execution plan:
---------------------------------------------------------
| Id  | Operation                   | Name              |
---------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |
|   1 |  NESTED LOOPS               |                   |
|   2 |   VIEW                      | VW_NSO_1          |
|*  3 |    FILTER                   |                   |
|   4 |     HASH GROUP BY           |                   |
|   5 |      VIEW                   |                   |
|   6 |       UNION-ALL             |                   |
|*  7 |        INDEX RANGE SCAN     | EMP_DEPARTMENT_IX |
|*  8 |        INDEX RANGE SCAN     | EMP_JOB_IX        |
|*  9 |        INDEX RANGE SCAN     | EMP_MANAGER_IX    |
|* 10 |        INDEX RANGE SCAN     | EMP_NAME_IX       |
|  11 |  TABLE ACCESS BY USER ROWID | EMPLOYEES         |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(COUNT(*)=NVL2(:DEPARTMENT_ID,1,0)+NVL2(:JOB_ID,1,0)+
              NVL2(:MANAGER_ID,1,0)+NVL2(:LAST_NAME,1,0))
   7 - access("DEPARTMENT_ID"=TO_NUMBER(:DEPARTMENT_ID))
   8 - access("JOB_ID"=:JOB_ID)
   9 - access("MANAGER_ID"=TO_NUMBER(:MANAGER_ID))
  10 - access("LAST_NAME"=:LAST_NAME)
See part 2 for the case of filtering by zero or more of the columns (rather than “one or more”).
See part 3 for the case of a mandatory filter and zero or more optional filters.
Summary
The suggested solution is obviously not the optimal one for all the possible filter combinations. There is no “one size fits all” solution.
An optimal solution for this problem will usually require writing many different queries and creating many indexes.
Many times the suggested solution is a “good enough” compromise.
If not, don’t use it.
Or use some variation of it.
If you know, for example, that filtering by LAST_NAME has very high selectivity, then split the solution into two queries – the one I presented in this post for the case where :last_name is null, and a dedicated query for the other case; like this:
select * from employees
where last_name = :last_name
  and (:department_id is null or department_id = :department_id)
  and (:job_id is null or job_id = :job_id)
  and (:manager_id is null or manager_id = :manager_id)
union all
select * from employees
where rowid in (
  select rid
  from (select rowid rid from employees where department_id = :department_id
        union all
        select rowid rid from employees where job_id = :job_id
        union all
        select rowid rid from employees where manager_id = :manager_id)
  group by rid
  having count(*) = nvl2(:department_id,1,0) +
                    nvl2(:job_id,1,0) +
                    nvl2(:manager_id,1,0)
)
and :last_name is null;
Or if you know, for example, that filtering by DEPARTMENT_ID has low selectivity, use it outside of the UNION ALL; like this:
-- assuming for this example that at least one of
-- :job_id, :manager_id and :last_name is not null
select * from employees
where rowid in (
  select rid
  from (select rowid rid from employees where job_id = :job_id
        union all
        select rowid rid from employees where manager_id = :manager_id
        union all
        select rowid rid from employees where last_name = :last_name)
  group by rid
  having count(*) = nvl2(:job_id,1,0) +
                    nvl2(:manager_id,1,0) +
                    nvl2(:last_name,1,0)
)
and (:department_id is null or department_id = :department_id);
Another example: if you know that :job_id and :department_id are always used together – either both are nulls or both are not null – create one composite index that contains both of them and use a single branch for them in the UNION ALL.
The suggested technique is just one more tool you can add to your toolbox…
Oren,
Interesting Idea, but I think the problem with this approach (not using composite indexes) is that if the table is large and one of the predicates has very weak selectivity, the index range scan on this column potentially does a lot of unnecessary buffer or even disk reads.
The combination of predicates may have strong selectivity, and an index range scan on a composite index would then require minimal block reads.
Thanks for your comment Thomas.
I’ve added a Summary section to the post, to address your comment (as well as others’).
Thanks,
Oren.
This is awesome. Will definitely file this away for use in the future. Thanks!
Thanks Chris 🙂
Hi Oren,
This is great stuff, I really like it!
It appears that the index range scans do no consistent gets to the index if the bind variable is null. What could be better?
Best regards, Stew
Thanks Stew 🙂
Yes, it’s a really nice optimization done by the Index Range Scan.
I referred to your comment in the followup post.
Thanks,
Oren.
Hi Oren,
I wonder whether, in case the final list of ROWIDs is “too long”,
Oracle will be able to “switch” from retrieving the (too many) rows by ROWID access back to a FULL TABLE SCAN.
This may happen if none of the non-null bind variables produces a filter that is efficient enough for an index range scan, that is, in cases when for a “normal” query the optimizer would have decided NOT to use an index.
Second, if one of the non-null filters is already selective enough,
the question is whether it will not be more efficient to use that filter only for an index range scan followed by a normal filter operation for the other conditions (which is how Oracle executes a normal query with several conditions), instead of “forcing” the use
of ALL the indexes, including the less selective ones, as the UNION ALL approach does.
I remember that this was one of the explanations for the fact that the
AND_EQUAL operation and hint are no longer used in the last several versions.
Thanks a lot & Best Regards,
Iudith
Thanks for your comment Iudith.
I’ve added a Summary section to the post, to address your comment (as well as others’).
Thanks,
Oren.