A Single Query with Many Filter Combinations - Part 3

In a recent post I suggested a way to write a single SQL query that filters a table by one or more of several columns.
Here is the query from that post:

select * from employees where rowid in (
  select rid from (
    select rowid rid from employees where department_id = :department_id
    union all
    select rowid rid from employees where job_id = :job_id
    union all
    select rowid rid from employees where manager_id = :manager_id
    union all
    select rowid rid from employees where last_name = :last_name)
  group by rid
  having count(*) = nvl2(:department_id,1,0) + 
                    nvl2(:job_id,1,0) + 
                    nvl2(:manager_id,1,0) + 
                    nvl2(:last_name,1,0)
);

This is actually a simplification of a real task I handled some time ago.
In today’s post I’d like to look at a variation of the problem, which is closer to the real issue I handled.

In the previous example we had 4 columns that each one of them may, or may not, be filtered by. Now, let’s think of another case, where, in addition to the 4 optional filters, there is one mandatory filter.

For example, let’s extend the HR data model, so it will include companies, and each company has its own departments, jobs and employees. There should be no dependency between records from different companies.
So I’ll add a new table – COMPANIES, and a mandatory column – COMPANY_ID – to each of the other tables (DEPARTMENTS, JOBS, and EMPLOYEES). COMPANY_ID will become part of the primary key in every table, and all the foreign key constraints from EMPLOYEES to the other tables will include the COMPANY_ID column as well.

Here are the DDL statements for this model:

create table companies (
  company_id   number(4)    not null,
  company_name varchar2(30) not null,
  --
  constraint companies_pk primary key (company_id)
); 
  
create table departments (
  company_id      number(4)    not null, 
  department_id   number(4)    not null,
  department_name varchar2(30) not null,
  manager_id      number(6),
  location_id     number(4),
  --
  constraint departments_pk primary key (company_id,department_id)
);

create table jobs(
  company_id     number(4)    not null, 
  job_id         varchar2(10) not null,
  job_title      varchar2(35) not null,
  min_salary     number(6),
  max_salary     number(6),
  --
  constraint jobs_pk primary key(company_id,job_id)
);

create table employees (
  company_id     number(4)    not null, 
  employee_id    number(6)    not null,
  first_name     varchar2(20),
  last_name      varchar2(25) not null,
  email          varchar2(25) not null,
  phone_number   varchar2(20),
  hire_date      date         not null,
  job_id         varchar2(10) not null,
  salary         number(8,2),
  commission_pct number(2,2),
  manager_id     number(6),
  department_id  number(4),
  --
  constraint emp_pk primary key (company_id,employee_id),
  constraint emp_salary_min check (salary > 0),
  constraint emp_email_uk unique (company_id,email),
  constraint emp_dept_fk foreign key (company_id,department_id)
      references departments (company_id,department_id),
  constraint emp_job_fk foreign key (company_id,job_id)
      references jobs (company_id,job_id),
  constraint emp_manager_fk foreign key (company_id,manager_id)
      references employees (company_id,employee_id)
);

create index emp_department_ix on employees (company_id,department_id);
create index emp_job_ix on employees (company_id,job_id);
create index emp_manager_ix on employees (company_id,manager_id);
create index emp_name_ix on employees (company_id, last_name, first_name);

And now we want to modify the SQL query, so it will filter the employees by COMPANY_ID (always), and by zero or more of the following columns: DEPARTMENT_ID, JOB_ID, MANAGER_ID and LAST_NAME. For that I’ll make 3 changes in the original query:

One change is adding the condition on COMPANY_ID. Note how we can add it once, and let the optimizer push the predicate into each one of the UNION ALL branches.
A second change is adding another branch to the UNION ALL, to handle the case of zero optional filters (the original query handled one or more optional filters).
And finally we need to change the HAVING clause. The original expression in the HAVING clause is evaluated to the number of non-null variable values – between 1 and 4. Now, if all the optional variables are nulls it is evaluated to 0, but then (and only then) the new branch of the UNION ALL is executed, so we’d like to count exactly 1 result per rowid. Hence, we take the greatest number between 1 and the original expression.

Here is the revised query:

select * from employees where rowid in (
  select rid from (
    select rowid rid,company_id from employees
        where department_id = :department_id
    union all
    select rowid rid,company_id from employees 
        where job_id = :job_id
    union all
    select rowid rid,company_id from employees 
        where manager_id = :manager_id
    union all
    select rowid rid,company_id from employees
        where last_name = :last_name
    union all
    select rowid rid,company_id from employees
        where :department_id is null
          and :job_id is null
          and :manager_id is null
          and :last_name is null)
  where company_id = :company_id
  group by rid
  having count(*) = greatest (1, nvl2(:department_id,1,0) + 
                                 nvl2(:job_id,1,0) + 
                                 nvl2(:manager_id,1,0) + 
                                 nvl2(:last_name,1,0) )
);

And here is the execution plan:

---------------------------------------------------------
| Id  | Operation                   | Name              |
---------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |
|   1 |  NESTED LOOPS               |                   |
|   2 |   VIEW                      | VW_NSO_1          |
|*  3 |    FILTER                   |                   |
|   4 |     HASH GROUP BY           |                   |
|   5 |      VIEW                   |                   |
|   6 |       UNION-ALL             |                   |
|*  7 |        INDEX RANGE SCAN     | EMP_DEPARTMENT_IX |
|*  8 |        INDEX RANGE SCAN     | EMP_JOB_IX        |
|*  9 |        INDEX RANGE SCAN     | EMP_MANAGER_IX    |
|* 10 |        INDEX RANGE SCAN     | EMP_NAME_IX       |
|* 11 |        FILTER               |                   |
|* 12 |         INDEX RANGE SCAN    | EMP_DEPARTMENT_IX |
|  13 |   TABLE ACCESS BY USER ROWID| EMPLOYEES         |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(COUNT(*)=GREATEST(1,NVL2(:DEPARTMENT_ID,1,0)+NVL2(:JOB_ID,1,0)+NVL2(:MANAGE
              R_ID,1,0)+NVL2(:LAST_NAME,1,0)))
   7 - access("COMPANY_ID"=TO_NUMBER(:COMPANY_ID) AND
              "DEPARTMENT_ID"=TO_NUMBER(:DEPARTMENT_ID))
   8 - access("COMPANY_ID"=TO_NUMBER(:COMPANY_ID) AND "JOB_ID"=:JOB_ID)
   9 - access("COMPANY_ID"=TO_NUMBER(:COMPANY_ID) AND
              "MANAGER_ID"=TO_NUMBER(:MANAGER_ID))
  10 - access("COMPANY_ID"=TO_NUMBER(:COMPANY_ID) AND "LAST_NAME"=:LAST_NAME)
  11 - filter(:LAST_NAME IS NULL AND :MANAGER_ID IS NULL AND :JOB_ID IS NULL AND
              :DEPARTMENT_ID IS NULL)
  12 - access("COMPANY_ID"=TO_NUMBER(:COMPANY_ID))

A Single Query with Many Filter Combinations – Part 3

Leave a Reply Cancel reply