Write Less with More – Part 2 (Identity Columns)

This post is part 2 of the Write Less with More series of blog posts, focusing on new features of Oracle 12c that allow us developers to write less than in previous versions in order to achieve the same functionality. Each part is dedicated to one new feature that solves some common development task. For more details, including the setup of the examples used throughout the series and the list of development tasks that drive the series, please see part 1.

All the Series Parts

Published

Part 1 – write less configuration with SQL*Loader Express Mode
Part 2 – write less application code with Identity Columns
Part 3 – write less application code with In-Database Archiving
Part 4 – write less application code with Temporal Validity
Part 5 – write less code in SQL with Row Limiting
Part 6 – write less “inappropriately located” code with SELECT FROM Package-Level Collection Types
Part 7 – write less code in SQL with Lateral Inline Views
Part 8 – write less “inappropriately located” code with PL/SQL in the WITH Clause

Task #2

Whenever a new record is inserted into the PROJECT_ASSIGNMENTS table, assign it a unique number in the ASSIGNMENT_ID column – automatically and implicitly.

The PROJECT_ASSIGNMENTS table has a surrogate* primary key – each record should have a unique number in the ASSIGNMENT_ID column. This number has no business meaning; the only requirement for such a key is that the number be unique – that no two records in the table have the same number.

* In a recent post about defining primary keys using Data Modeler, HeliFromFinland nicely walks through the decision-making process of choosing between natural and surrogate keys.



A Pre-12c Solution

This is a very common scenario, and almost always involves a sequence in Oracle. This makes sense, as a sequence is an object that generates unique numbers.

Before Oracle 12c there was no built-in relationship between tables and sequences, although logically there is almost always a one-to-one relationship between a table and a sequence.

Note: with the lack of a built-in relationship between tables and sequences, it is a good practice to use a naming convention to represent this relationship; for example, naming the sequence that is responsible for assigning numbers to the ABC table ABC_SEQ (this specific convention implies yet another rule – that the length of table names should not exceed 26 characters; otherwise, the related sequence name would exceed the maximum length allowed for Oracle object names – 30 characters). Obviously, if two or more columns in one table are populated from different sequences, a more complex naming convention is needed.

So, the first part of the solution for our task is creating a sequence:

CREATE SEQUENCE project_assignments_seq;

The second part is to use the sequence whenever a new record is inserted into PROJECT_ASSIGNMENTS. If we know for sure about all the places that insert records into the table (preferably a single place) we can use the sequence explicitly in those places. However, if we cannot guarantee this, we should make sure the sequence is used implicitly (as the task requirement specifies). The way to achieve this before 12c is with a BEFORE INSERT trigger, like this:

CREATE TRIGGER project_assignments_bir_tr
  BEFORE INSERT ON project_assignments
  FOR EACH ROW
BEGIN
  :new.assignment_id := project_assignments_seq.nextval; -- assuming 11g; in earlier versions use SELECT...
END;
/

Note that with this solution the sequence is used for each and every new record. Even if a value is specified explicitly in the INSERT (or MERGE) statement, this value will be ignored and overridden by the trigger, using the next value generated by the sequence.
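
For example (a quick sketch, assuming the table and the first trigger above are in place), an explicitly specified value is silently replaced:

insert into project_assignments (assignment_id, person_id, project_id) values (999, 101, 1);
-- the explicit value 999 is ignored; the trigger assigns project_assignments_seq.nextval instead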

If the requirement were weaker – allowing values to be specified explicitly and using the sequence only when no value is specified – then the solution would be:

CREATE TRIGGER project_assignments_bir_tr
  BEFORE INSERT ON project_assignments
  FOR EACH ROW
  WHEN (new.assignment_id IS NULL)
BEGIN
  :new.assignment_id := project_assignments_seq.nextval;
END;
/

Note that this option may cause problems. If you use it, make sure to allocate different (non-overlapping) ranges of numbers to the sequence and to the explicitly specified values. Otherwise, conflicts may happen.
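
One possible way to reduce the risk (just a sketch – the actual ranges are up to you) is to create the sequence so it generates values from a range that the application never uses for explicit values, for example:

CREATE SEQUENCE project_assignments_seq START WITH 1000000;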

A 12c Solution

Identity Columns

In Oracle 12c a table column can be created as “identity”. As a result, the column implicitly becomes mandatory, and a sequence is automatically created and associated with the table. Then (depending on how exactly the identity is defined) the sequence is automatically used to produce values for the identity column when new rows are inserted.

If you want to learn about many more new features of Oracle 12c, you may be interested in one of my in-house training sessions, particularly the Oracle 12c New Features for Developers full-day seminar.

The GENERATED ALWAYS Option

An identity column can behave in one of three different ways. In the first one – when the column is defined with the GENERATED ALWAYS AS IDENTITY option (or simply GENERATED AS IDENTITY) – the sequence is always used, and specifying explicit values is forbidden. Let’s use this option to implement our task:

drop table project_assignments purge;
create table project_assignments (
       assignment_id integer GENERATED AS IDENTITY constraint project_assignments_pk primary key,
       person_id integer not null constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);

We’ll now insert two rows into the PROJECT_ASSIGNMENTS table, without specifying the ASSIGNMENT_ID column, and we’ll see that each row gets a unique ASSIGNMENT_ID value:

> insert into project_assignments (person_id,project_id) values (101,1);
1 row created.
> insert into project_assignments (person_id,project_id) values (102,2);
1 row created.
> select * from project_assignments ;

ASSIGNMENT_ID PERSON_ID  PROJECT_ID
------------- ---------- ----------
            1        101          1
            2        102          2

If we try to specify an ASSIGNMENT_ID value explicitly, we’ll get the following exception:

> insert into project_assignments (assignment_id,person_id,project_id) values (3,103,3);

ERROR at line 1:
ORA-32795: cannot insert into a generated always identity column

The GENERATED BY DEFAULT Option

If we do want the ability to specify the ASSIGNMENT_ID explicitly, and let Oracle assign a value automatically only when we do not specify one, we can use the GENERATED BY DEFAULT AS IDENTITY option instead. Let’s change the ASSIGNMENT_ID column to behave in this way:

> alter table project_assignments modify (
>   assignment_id generated BY DEFAULT as identity
> );
Table altered.

Retrying the last INSERT statement will succeed now:

> insert into project_assignments (assignment_id,person_id,project_id) values (3,103,3);
1 row created.

As I noted before, in the “A Pre-12c Solution” section, mixing manual and automatic assignment of values to the same column may cause conflicts, and unless there is a good reason to do it, I would refrain from it. As mentioned above, the implementation of identity columns uses sequences. In our example, this behind-the-scenes sequence has already generated the numbers 1 and 2, so the next time it is used it will generate the number 3. However, in the third row that we inserted ASSIGNMENT_ID already has the value 3, because we specified it explicitly. ASSIGNMENT_ID is the primary key of the table, so it cannot have the same value in two records. I feel an imminent conflict…

And indeed, trying to insert a record without specifying ASSIGNMENT_ID will fail on a unique constraint violation:

> insert into project_assignments (person_id,project_id) values (104,4);

ERROR at line 1:
ORA-00001: unique constraint (DEMO5.PROJECT_ASSIGNMENTS_PK) violated

Another attempt will succeed, as the sequence has already been incremented:

> insert into project_assignments (person_id,project_id) values (104,4);
1 row created.


The GENERATED BY DEFAULT ON NULL Option

Now, what will happen if we try to set ASSIGNMENT_ID to NULL during INSERT? An identity column is always mandatory – the column is defined as NOT NULL as part of the identity definition (just as happens when we define a column as PRIMARY KEY) – and since we defined ASSIGNMENT_ID as GENERATED BY DEFAULT AS IDENTITY, the following statement will simply try to insert NULL into ASSIGNMENT_ID, and will fail:

> insert into project_assignments (assignment_id,person_id,project_id) values (null,201,1);

ERROR at line 1:
ORA-01400: cannot insert NULL into ("DEMO5"."PROJECT_ASSIGNMENTS"."ASSIGNMENT_ID")

We can use the third form of the identity clause – GENERATED BY DEFAULT ON NULL AS IDENTITY – which means that Oracle will generate a value from the sequence whenever the identity column is not specified explicitly or is explicitly set to NULL. Let’s change ASSIGNMENT_ID to behave in this way:

> alter table project_assignments modify (
>   assignment_id generated BY DEFAULT ON NULL as identity
> );

Table altered.

Retrying the last INSERT statement will succeed now:

> insert into project_assignments (assignment_id,person_id,project_id) values (null,201,1);
1 row created.

> select * from project_assignments;

ASSIGNMENT_ID PERSON_ID  PROJECT_ID
------------- ---------- ----------
            1        101          1
            2        102          2
            3        103          3
            4        104          4
            5        201          1

5 rows selected.

Note: DEFAULT ON NULL is actually an independent feature (new in 12c) – it can be used in the definition of any column, not only identity columns.
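
For example, here is a minimal sketch of DEFAULT ON NULL on a regular (non-identity) column; the TASKS table here is hypothetical and not part of our schema:

create table tasks (
  task_id integer constraint tasks_pk primary key,
  status  varchar2(10) default on null 'NEW' not null
);

insert into tasks (task_id, status) values (1, null); -- STATUS gets the value 'NEW'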

Note: It is also possible (as of 12c) to use a user-defined sequence as the default value of a numeric non-identity column. Identity columns, however, give a more complete solution.
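
A minimal sketch of such a column (the sequence and table here are hypothetical):

create sequence my_seq;

create table t1 (
  id   number default my_seq.nextval constraint t1_pk primary key,
  data varchar2(100)
);

insert into t1 (data) values ('some data'); -- ID gets the next value of MY_SEQ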

Data Dictionary

Let’s see now what has changed in the data dictionary in order to represent identity columns. I’ll refer to the USER_* views, but the same changes are relevant to the corresponding ALL_* and DBA_* views as well.

USER_TABLES has a new column – HAS_IDENTITY – that contains YES if the table has an identity column and NO if not.

select table_name,HAS_IDENTITY from user_tables;

TABLE_NAME                     HAS_IDENTITY
------------------------------ ------------
PEOPLE                         NO
PROJECTS                       NO
PROJECT_ASSIGNMENTS            YES

The fact that USER_TABLES has such a Boolean attribute may hint that a table cannot have more than one identity column, and this is indeed one of the restrictions of this feature (which seems a very sensible restriction, in my opinion).
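
For example, trying to create a table with two identity columns fails (a sketch with a hypothetical table; the specific error message is omitted here):

create table two_identities (
  id1 integer generated as identity,
  id2 integer generated as identity -- raises an error – only one identity column is allowed per table
);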

USER_TAB_COLUMNS has two new relevant columns: IDENTITY_COLUMN and DEFAULT_ON_NULL:

select column_name,data_type,nullable,
       column_id,IDENTITY_COLUMN,DEFAULT_ON_NULL 
from user_tab_columns
where table_name = 'PROJECT_ASSIGNMENTS'
order by column_id;

COLUMN_NAME   DATA_TYPE  NUL  COLUMN_ID IDENTITY_ DEFAULT_O
------------- ---------- --- ---------- --------- ---------
ASSIGNMENT_ID NUMBER     N            1 YES       YES
PERSON_ID     NUMBER     N            2 NO        NO
PROJECT_ID    NUMBER     N            3 NO        NO

A new view – USER_TAB_IDENTITY_COLS – contains the details of all the identity columns. In 12.1.0.1 this view contained TABLE_NAME, COLUMN_NAME, GENERATION_TYPE (which can be either ALWAYS or BY DEFAULT) and IDENTITY_OPTIONS (the configuration of the associated sequence). In 12.1.0.2 another column was added – SEQUENCE_NAME – which makes the relationship between the table column and the associated sequence quite formal.

select * From USER_TAB_IDENTITY_COLS;

                                  GENERATION
TABLE_NAME          COLUMN_NAME   TYPE       SEQUENCE_NAME IDENTITY_OPTIONS
------------------- ------------- ---------- ------------- -------------------------
PROJECT_ASSIGNMENTS ASSIGNMENT_ID BY DEFAULT ISEQ$$_111567 START WITH: 1, INCREMENT
                                                           BY: 1, MAX_VALUE: 9999999
                                                           999999999999999999999, MI
                                                           N_VALUE: 1, CYCLE_FLAG: N
                                                           , CACHE_SIZE: 20, ORDER_F
                                                           LAG: N

Let’s take a look at the objects we have in our schema at this point:

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
SEQUENCE     ISEQ$$_111567                      111568
TABLE        PEOPLE                             111530
INDEX        PEOPLE_PK                          111531
TABLE        PROJECTS                           111532
INDEX        PROJECTS_PK                        111533
TABLE        PROJECT_ASSIGNMENTS                111567
INDEX        PROJECT_ASSIGNMENTS_PK             111569

We can see the sequence that was created automatically by Oracle for supporting the ASSIGNMENT_ID identity column of the PROJECT_ASSIGNMENTS table. Note that the sequence_name contains the object_id of the associated table.

What happens when we drop the table?

drop table project_assignments;

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
INDEX        PROJECTS_PK                        111533
TABLE        PROJECTS                           111532
INDEX        PEOPLE_PK                          111531
TABLE        PEOPLE                             111530
SEQUENCE     ISEQ$$_111567                      111568

We can see that the table was dropped but the sequence was not. This may seem disappointing at first, but actually this is quite clever. The table was dropped but not purged, so it is still in the recycle bin (I’m using the default setting of the RECYCLEBIN parameter, which is “on”):

select object_name,original_name,operation,type from user_recyclebin;

OBJECT_NAME                    ORIGINAL_NAME          OPERATION TYPE
------------------------------ ---------------------- --------- ----------
BIN$lNxGd9cXTAuYpOa9kda26w==$0 PROJECT_ASSIGNMENTS    DROP      TABLE
BIN$qvScIgi+Rrel7veFMBxXsQ==$0 PROJECT_ASSIGNMENTS_PK DROP      INDEX

If we revive the table using FLASHBACK TABLE

flashback table project_assignments to before drop;

we can resume working with the table, and particularly with the identity column, because the sequence was not dropped:

insert into project_assignments (person_id,project_id) values (201,1);

select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
            1        101          1
            2        102          2
            3        103          3
            4        104          4
            5        201          1
            6        201          1

If we drop the table and purge it (or purge the table from the recyclebin), then the associated sequence is dropped as expected:

drop table project_assignments purge;

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
TABLE        PEOPLE                             111530
INDEX        PEOPLE_PK                          111531
TABLE        PROJECTS                           111532
INDEX        PROJECTS_PK                        111533

select object_name,original_name,operation,type from user_recyclebin;

no rows selected

Sequence Configuration

When we create a sequence explicitly we can configure its attributes – from which number to start, the interval between numbers, the cache size, and more. When we create an identity column, we can similarly configure the implicitly-created sequence. For example, let’s recreate the PROJECT_ASSIGNMENTS table, but this time with a sequence that starts with 1000 and increments by 10:

create table project_assignments (
       assignment_id integer GENERATED AS IDENTITY 
                             (start with 1000 increment by 10)
                             constraint project_assignments_pk primary key,
       person_id integer not null  constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);

insert into project_assignments (person_id,project_id) values (101,1);
insert into project_assignments (person_id,project_id) values (102,2);
insert into project_assignments (person_id,project_id) values (103,3);

select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
         1000        101          1
         1010        102          2
         1020        103          3

DROP IDENTITY

A non-identity column cannot become an identity column. The opposite, however, is possible: an identity column can become a non-identity column, and its associated sequence is automatically dropped:

alter table project_assignments modify assignment_id DROP IDENTITY;
select * from user_tab_identity_cols;

no rows selected

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
TABLE        PEOPLE                             111530
INDEX        PEOPLE_PK                          111531
TABLE        PROJECTS                           111532
INDEX        PROJECTS_PK                        111533
TABLE        PROJECT_ASSIGNMENTS                111570
INDEX        PROJECT_ASSIGNMENTS_PK             111572

START WITH LIMIT VALUE

Let’s see one last option that is possible only because there is a real association between the table column and the sequence. I’ll drop the PROJECT_ASSIGNMENTS table and recreate it once more, with the GENERATED BY DEFAULT AS IDENTITY clause, and insert some records with explicit values in ASSIGNMENT_ID:

drop table project_assignments purge;
create table project_assignments (
       assignment_id integer GENERATED BY DEFAULT AS IDENTITY constraint project_assignments_pk primary key,
       person_id integer not null constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);
insert into project_assignments (assignment_id,person_id,project_id) values (18,101,1);
insert into project_assignments (assignment_id,person_id,project_id) values (22,102,2);
insert into project_assignments (assignment_id,person_id,project_id) values (34,103,3);

select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
           18        101          1
           22        102          2
           34        103          3

Suppose that now I want to convert the identity column to be GENERATED ALWAYS AS IDENTITY. Since the table already contains records, I need to prevent collisions between the existing values and the future to-be-automatically-generated values, and therefore I’d like to start the sequence from a number that is higher than all the existing ASSIGNMENT_ID values (in our case, 35).

We can make Oracle do this automatically, by using the START WITH LIMIT VALUE clause:

alter table project_assignments modify assignment_id 
generated always as identity (START WITH LIMIT VALUE);

insert into project_assignments (person_id,project_id) values (104,4);
select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
           18        101          1
           22        102          2
           34        103          3
           35        104          4

Conclusion

We saw in this part of the Write Less with More series that Identity Columns allow us to write less application code.

The next post will be about another new Oracle 12c feature – In-Database Archiving.

SET and CARDINALITY

In the last issue of Oracle Magazine, Tom Kyte shows cool solutions to some questions he got on asktom.oracle.com, using pure SQL. One of the nice things in SQL is that one problem may have many different solutions (I wrote about it once here).
One of the questions there is about counting distinct values across several columns within each row. You can see the question, Tom’s answer and several alternative answers suggested by other people here. I added a suggested alternative of my own, using two SQL functions that are not very well known, so I thought it worth mentioning them here.

These two functions are SET and CARDINALITY, and both get a nested table as their input parameter.

SET returns a nested table of the same type as the input parameter, after removing duplicate elements from it.
CARDINALITY returns the number of elements in a nested table.

Both functions have existed since Oracle 10g Release 1.

Let’s create a nested table type:

create type string_ntt as table of varchar2(4000)
/

and construct an object of this type, using the default constructor, containing 6 elements, out of which 4 are unique:

select string_ntt('A','B','C','B','D','A') as my_list from dual;

MY_LIST
------------------------------------------
STRING_NTT('A', 'B', 'C', 'B', 'D', 'A')

If we apply the CARDINALITY function on this object, we will get the number of elements it contains:

select CARDINALITY(string_ntt('A','B','C','B','D','A')) as num_of_elements from dual;

NUM_OF_ELEMENTS
---------------
              6

If we apply the SET function on this object, we will get a new object of the same type, with only the 4 unique values:

select SET(string_ntt('A','B','C','B','D','A')) as my_set from dual;

MY_SET
----------------------------------------
STRING_NTT('A', 'B', 'C', 'D')

And by combining both CARDINALITY and SET, we will get the number of unique elements in the nested table object:

select CARDINALITY(SET(string_ntt('A','B','C','B','D','A'))) as num_of_unique_elements from dual;

NUM_OF_UNIQUE_ELEMENTS
----------------------
                     4
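
Applied to the original question – counting the distinct values across several columns within each row – the idea looks roughly like this (a hedged sketch; the table SOME_TABLE and its columns are hypothetical, and their values are assumed to fit the STRING_NTT element type):

select t.*,
       CARDINALITY(SET(string_ntt(col1, col2, col3))) as num_of_distinct_values
from   some_table t;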

Optimizer bug leads to wrong results

A few days ago a query that I wrote did not return the expected results. After some investigation I came to the conclusion that the reason is an optimizer bug. Here is a very simplified example (quite silly functionality-wise, but I guarantee you that the original, more complex, query does make sense).
I tried it with Oracle 11.2.0.2, 11.2.0.4 and 12.1.0.2, and got the same (wrong) behavior in all three cases.

Update [August 2, 2017]
I tested it with 12.2.0.1, and it’s working fine!

The query is based on the following table:

create table t(
  id   number not null constraint t_pk primary key,
  num1 number not null
);

Let’s fill it with some records:

insert into t values (1,74);
insert into t values (2,96);
insert into t values (3,41);

We start with this query (that works just fine):

select (select max(level) from dual connect by level <= t.num1) 
from   t
where  t.id = :id;

For our sample data we would expect the query to return 74 for :id=1, 96 for :id=2 and 41 for :id=3, and indeed these are the results that we get.
This is the execution plan:

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |    26 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |      |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
|   4 |  TABLE ACCESS BY INDEX ROWID  | T    |     1 |    26 |     1   (0)| 00:00:01 |
|*  5 |   INDEX UNIQUE SCAN           | T_PK |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   5 - access("T"."ID"=TO_NUMBER(:ID))

Now, I will only add an innocent inline view to the query:

select (select max(level) from dual connect by level <= t.num1) 
from   t,
       (select distinct 1 from dual)
where  t.id = :id;

Obviously the existence of the inline view - (select distinct 1 from dual) - should make no difference. It returns a single record, we don't use its result set, and we have no join conditions in the query (so we actually have a cartesian product between 1 record and [0 or 1] record).

But now the results are wrong. The query returns 1 in all the cases, and the reason is that the scalar subquery - (select max(level) from dual connect by level <= t.num1) - is always executed with t.num1 = null.

Let's look at the execution plan for this case:

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    13 |     6  (17)| 00:00:01 |
|   1 |  SORT AGGREGATE               |           |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|           |       |       |            |          |
|   3 |    FAST DUAL                  |           |     1 |       |     2   (0)| 00:00:01 |
|   4 |  VIEW                         | VM_NWVW_0 |     1 |    13 |     4  (25)| 00:00:01 |
|   5 |   NESTED LOOPS                |           |     1 |    38 |     3   (0)| 00:00:01 |
|   6 |    FAST DUAL                  |           |     1 |       |     2   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN          | T_PK      |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   7 - access("T"."ID"=TO_NUMBER(:ID))

For some reason, the optimizer decided that it is unnecessary to go to the table T. Oracle accesses only the index T_PK, which contains only the ID column. With this execution plan it is impossible to get the value of the NUM1 column, and indeed the results show that NUM1 is allegedly NULL.

Step 4 in the execution plan refers to VM_NWVW_0, which is usually related to Complex View Merging. So let's try to disable view merging using the NO_MERGE hint:

select (select max(level) from dual connect by level <= t.num1) 
from   t,
       (select /*+ NO_MERGE */ distinct 1 from dual)
where  t.id = :id;

And indeed we get the correct results, with this execution plan:

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |    26 |     6  (17)| 00:00:01 |
|   1 |  SORT AGGREGATE               |      |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
|   4 |  NESTED LOOPS                 |      |     1 |    26 |     4  (25)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID | T    |     1 |    26 |     1   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN          | T_PK |     1 |       |     1   (0)| 00:00:01 |
|   7 |   VIEW                        |      |     1 |       |     3  (34)| 00:00:01 |
|   8 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   6 - access("T"."ID"=TO_NUMBER(:ID))

Also, if we remove the DISTINCT from the inline view, there will be no complex view merging anymore, and the execution plan (and therefore the results) is fine:

select (select max(level) from dual connect by level <= t.num1) 
from   t,
       (select 1 from dual)
where  t.id = :id;

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |    26 |     5   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |      |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
|   4 |  NESTED LOOPS                 |      |     1 |    26 |     3   (0)| 00:00:01 |
|   5 |   FAST DUAL                   |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T    |     1 |    26 |     1   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN          | T_PK |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   7 - access("T"."ID"=TO_NUMBER(:ID))

(However, in my original query the DISTINCT was required...)

You are most welcome to comment or to ask questions in this page, or to write me at oren@db-oriented.com.

3 Points about Indexes and Order

When dealing with indexes, order may be very significant, in several aspects.
Perhaps that’s not surprising when talking about a data structure whose purpose is to keep data in order.
We’ll refer to three different points:
1. Is the order of columns in a composite index important?
2. Which order is better: filling a table with data and then creating indexes, or creating indexes and then filling the table with data?
3. When creating several indexes, is the order of creation important?
There is not always one correct answer that covers all the different cases, but it is always worth asking the questions before rushing to execute.

Is the order of columns in a composite index important?
Certainly yes. Let’s take a look at the following two indexes – they both contain the same two columns, but in a different order:

CREATE INDEX T_IDX_1 ON T (COL_A,COL_B);
CREATE INDEX T_IDX_2 ON T (COL_B,COL_A);

Let’s assume that the table T contains many records and that the following queries are highly selective (i.e., they return a relatively small number of records from the table), so it is likely that using an index is better than full scanning the table.

For the following query both indexes are equally good:

SELECT *
FROM T
WHERE COL_A = :VAL1
  AND COL_B = :VAL2;

But for the following query, the index T_IDX_1 is good, while T_IDX_2 is not:

SELECT *
FROM T
WHERE COL_A = :VAL1;

Although the index T_IDX_2 contains the column COL_A, it is not enough, as it does not contain it in its leading part. The order of columns in the index matters.
It’s easy to understand the difference when thinking of the following example: in the phone book the names are ordered first by last name, and then by first name. We can find very quickly all the subscribers whose last name is “Nakdimon”, but we must scan the whole phone book to find all the subscribers whose first name is “Oren”.

Note: if the table contains a relatively small number of distinct values in the column COL_B, Oracle will still be able to use the index T_IDX_2 for answering the last query by using the Index Skip Scan access path, but using T_IDX_1 with an Index Range Scan will still be better.

For the next two questions we’ll consider the following case: we need to create a new table with the following two indexes, and fill it with plenty of data:

CREATE TABLE T (
   COL_A NUMBER,
   COL_B DATE,
   COL_C VARCHAR2(4000),
   …
);
CREATE INDEX T_IDX_A ON T (COL_A);
CREATE INDEX T_IDX_BA ON T (COL_B,COL_A);

Which order is better: filling a table with data and then creating indexes, or creating indexes and then filling the table with data?
Usually working in the former order (creating the indexes when the table is already filled) will take less time than working in the latter order (filling the table when the indexes already exist), since in the latter case the indexes need to be maintained with the insertion of each and every record.

When creating several indexes, is the order of creation important?
Here the answer is positive in certain cases.
Suppose that we created the table T and filled it with many rows, and now it’s time to create the indexes. We can create T_IDX_A first and T_IDX_BA second, or vice versa. Let’s examine both options:

Option 1:

  • We’ll create T_IDX_A first. For that, Oracle will do a Full Table Scan of T (and will take the value of COL_A from every record, and of course the ROWID of every record to know where to point to from the index)
  • Now we’ll create T_IDX_BA. Once again, Oracle will do a Full Table Scan of T (and will take the values of COL_B and COL_A and the ROWID from every record)

Option 2:

  • We’ll create T_IDX_BA first. For that, Oracle will do a Full Table Scan of T (and will take the values of COL_B and COL_A and the ROWID from every record)
  • Now we’ll create T_IDX_A, and this is where the plot changes. Oracle can do a Full Table Scan of T here as well, but in this case it has another alternative, a better one in most cases. The only details that are needed in order to build the index are the values of COL_A and the ROWID of all the records in the table (where COL_A is not null), and these details already exist in the index T_IDX_BA. Therefore, Oracle can do an Index Fast Full Scan of T_IDX_BA, instead of Full Table Scan of the table.

So, if all the columns of one index are included in a second index, it is recommended to create the second (“superset”) index first, and only then the first one, enabling Oracle to consider more alternatives. The more columns the table has that do not exist in the indexes, the more significant the improvement in the creation time of the second index (an Index Fast Full Scan instead of a Full Table Scan).
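
In our example this means creating the indexes in the following order (a sketch of the DDL only – the actual benefit depends on the data, of course):

-- create the composite ("superset") index first...
CREATE INDEX T_IDX_BA ON T (COL_B,COL_A);
-- ...so that this index can be built with an Index Fast Full Scan of T_IDX_BA
-- instead of a Full Table Scan of T
CREATE INDEX T_IDX_A ON T (COL_A);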

You are most welcome to comment or to ask questions in this page, or to write me at oren@db-oriented.com.

One Problem – Many Solutions

One of the things I love in the SQL language is that one problem may have many different solutions, or more specifically, one functional question may be solved by different SQL queries.

It doesn’t mean that all the solutions are equivalent in all aspects. If all the solutions solve the functional question correctly, then all the result sets must be the same. But they may be different in performance, for example (it’s so satisfying reducing the execution time of a query from hours to seconds just by rewriting it).

Another example: one query may be short and elegant while another “equivalent” query may be long and cumbersome.

And not all the solutions are even legitimate in all versions of the database – many times a new feature can make a query much simpler (analytic functions are a great example of that – I still remember how their appearance [in Oracle 8.1.6, if I remember correctly] had a huge effect on my query writing, making my queries shorter and often faster – and I see the same effect on people today when I rewrite their queries and introduce them to analytic functions at the same time).

Oracle 12c introduced some new features that let us write less to achieve the same functionality, like Lateral Inline Views or the Row Limiting clause.

Here is a nice example for one problem with several solutions.

A few days ago, Lucas Jellema from AMIS raised in this blog post the question of “packing” multiple rows that represent adjacent time periods into a single row that represents the unified period, and suggested the following solution, which nicely demonstrates some important features such as the LAG and RANK analytic functions, Subquery Factoring (supported since Oracle 9i), and Recursive Subquery Factoring (introduced in Oracle 11g):

with chairpeople as
( select chairperson
  ,      date_from
  ,      date_to
  ,      case date_from - lag(date_to) over (partition by chairperson order by date_from asc)
         when 1 then 'PACK'
         end action
  ,      rank()  over (partition by chairperson order by date_from asc) rnk
  from   chairmanships
)
, packing_chairs (chair, date_from, date_to, lvl) as
( select chairperson, date_from, date_to, 1
  from   chairpeople
  where  action is null
  union all
  select p.chair, p.date_from, c.date_to, lvl+1
  from   chairpeople c
         join
         packing_chairs p
         on (c.chairperson = p.chair and c.rnk = p.lvl+1)
  where  c.action='PACK'
  )
, packed_chairs as
( select chair, date_from, nullif(max(nvl(date_to,date'9999-12-31')),date'9999-12-31') date_to
  from   packing_chairs
  group
  by     chair, date_from
)
select *
from   packed_chairs
order
by     date_from;

-- note: this is a slightly revised version of the query from the original post

I suggested another solution, based on the LAST_VALUE analytic function:

select chairperson,
       date_from,
       max(date_to) keep(dense_rank last order by date_to) date_to
from   (select chairperson,
               last_value(new_period_date_from ignore nulls) over(partition by chairperson order by date_from) date_from,
               date_to
        from   (select chairperson,
                       date_from,
                       date_to,
                        case when lnnvl(date_from - lag(date_to) over(partition by chairperson order by date_from) = 1) then date_from end new_period_date_from
                from   chairmanships))
group  by chairperson,
          date_from
order  by date_from;

In another comment to Lucas’ post, Sayan Malakshinov suggested an even simpler and shorter solution:

select 
  chairperson
 ,min(date_from) keep (dense_rank first order by date_from,date_to) as date_from
 ,max(date_to  ) keep (dense_rank last  order by date_from,date_to) as date_to
from (
      select
           chairperson 
         , date_from 
         , date_to 
         , sum(flag) over(partition by chairperson order by date_from,date_to) grp
      from (
            select 
                 chairperson 
               , date_from 
               , date_to 
               , decode( 1 + lag(date_to)over(partition by chairperson order by date_from,date_to), date_from, 0, 1) flag
            from chairmanships
           )
     )
group by chairperson, grp
order by chairperson, grp;

Finally (for now at least), I suggested yet another solution, using a new Oracle 12c feature – Pattern Matching:

select * from chairmanships 
  match_recognize (
     partition by chairperson
     order by date_from
     measures frst.date_from as date_from,
              date_to as date_to
     one row per match
     pattern (frst nxt*)
     define nxt as nxt.date_from = prev(nxt.date_to)+1) 
order by chairperson,date_from;

So, there you go – one question, four very different solutions.

The COALESCE Function

Everybody knows the NVL function and uses it frequently. On the other hand, the COALESCE function, which belongs to the same family of functions, is less well known, although it has existed since version 9i and has more capabilities than good old NVL.

Let’s start with the similarities between these functions:

What?

Both functions have a similar purpose – to return the first non-null element from a list of elements. If all the elements in the list are null, the result will be null as well.

Where?

Both functions can be called from SQL and from PL/SQL.


Regarding the differences between the functions – there are two major and significant ones:

How Many?

The first difference is in the number of arguments the functions get.

NVL accepts exactly two arguments. The expression

NVL(a,b)

returns the value of a if it is not null, and the value of b if a is null.

The COALESCE function supports two or more arguments.

Let’s look at the expression

COALESCE(X1, X2, X3,..., Xn)
  • If X1 is not null its value is returned
  • Otherwise, if X2 is not null its value is returned
  • Otherwise, if X3 is not null its value is returned
  • And so on…

So if you sometimes write something like

NVL(a,NVL(b,c))

or alternatively

CASE WHEN a IS NOT NULL THEN a WHEN b IS NOT NULL THEN b ELSE c END

then you can write instead

COALESCE(a,b,c)

When?

The second difference is less evident, but very significant in some cases.

For many years I assumed that if I write

NVL(expression1,expression2)

then expression2 will be evaluated only if expression1 is null. It seems logical, doesn’t it? But this is not how it works. In NVL both expressions are always evaluated. In COALESCE, on the other hand, the expressions are evaluated only when needed (and so is the case with DECODE and CASE, by the way).

Let’s see an example where this behavior of NVL is problematic:

In the following query we’d like to return the value of the DESCRIPTION column if it is not null. If the column is empty, we’d like to return the result of some function – get_description – that gets the columns A_ID and B_ID as input.

SELECT NVL(DESCRIPTION, get_description(A_ID,B_ID))
FROM TEST_TABLE;

It doesn’t matter what get_description does exactly. What matters is that every call to the function when DESCRIPTION contains a non-null value is unnecessary, but this is exactly what happens here, and the performance may be hurt significantly.

But if we use COALESCE instead of NVL, we’ll achieve the same functionality, while removing the unnecessary calls. get_description will be called only for records in which the DESCRIPTION column is empty:

SELECT COALESCE(DESCRIPTION, get_description(A_ID,B_ID))
FROM TEST_TABLE;

One advantage NVL does have over COALESCE is the number of characters we need to type.

So if we have exactly two expressions, and it does no harm to evaluate both of them, we can continue using NVL. In the other cases, using COALESCE is probably a better choice.

External Tables

This is a feature I use a lot…

An external table in Oracle is a table whose structure is defined in the data dictionary, but whose content is taken from a standalone file and is not stored inside the database. This feature is very useful as part of an ETL process, where you need to load the content of some files into the database.

The external file must be located in the file system of the database server machine (or at least in a directory that the server can access). This is the major drawback I see in this feature with respect to its predecessor, SQL*Loader (which loads files located on the client machine). But the advantages significantly outweigh the disadvantages.

The definition of the external table describes how to locate, parse and process the file:
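
Here is a minimal sketch of such a definition; the directory object, the file name and the column list are hypothetical, and the access parameters are kept to a bare minimum:

create directory data_dir as '/path/to/load/files';

create table people_ext (
  person_id  number,
  first_name varchar2(20),
  last_name  varchar2(30)
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
  )
  location ('people.csv')
);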

Literals

It is possible to specify a date value as a literal, like this:

DATE '2013-08-29'

This instantiates a date constant, which is equivalent to the result of

to_date('29/08/2013','dd/mm/yyyy')

The date format in this case is always yyyy-mm-dd, and is not influenced by any parameter (the default format of the to_date function is defined by the NLS_DATE_FORMAT parameter).

For example, to find all the employees that were hired during August 2013 we can use the following query:

select * 
from employees 
where hire_date >= date'2013-08-01' 
  and hire_date < date'2013-09-01';

To specify a value that also contains a time part as a literal, you can use:

TIMESTAMP '2013-08-29 13:45:00'

The format is always yyyy-mm-dd hh24:mi:ss (fractions of seconds may also be specified).

The MERGE statement

The SQL statement MERGE (available from version 9i) provides the functionality of “INSERT or UPDATE” in a single statement.

The basic syntax is:

MERGE INTO <target>
USING <source>
ON (<join conditions>)
WHEN MATCHED THEN UPDATE <SET clause>
WHEN NOT MATCHED THEN INSERT (<column list>) VALUES (<expression list>);

where:

  • <target> is the table that we’d like to insert new records into and/or update existing records in
  • <source> is usually some SQL query (it may also be simply a table)
  • <join conditions> specify the relationship between the source and the target – these conditions determine whether a row from the source exists in the target or not

Each row from the source is looked up in the target table, using the conditions specified in the ON clause. If the row exists in the target table, then the WHEN MATCHED clause is executed – the target row is updated. If the row does not exist in the target table, then the WHEN NOT MATCHED clause is executed – a new row is inserted into the target table.

Let’s take an example.
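
A hedged sketch of the syntax in action – the EMPLOYEES and SALARY_UPDATES tables and their columns here are hypothetical:

merge into employees tgt
using (select employee_id, salary from salary_updates) src
on (tgt.employee_id = src.employee_id)
when matched then update set salary = src.salary
when not matched then insert (employee_id, salary)
                      values (src.employee_id, src.salary);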

Lateral Inline Views

Lateral Inline Views – an official feature of Oracle 12c (they were hidden/unsupported/undocumented in 11g) – let you refer, within an inline view, to tables that appear to its left in the FROM clause.

Let’s take an example (a simplification of a real-life use case I handled recently). We have a SCHEDULES table, which includes, for each repeatable task, the time range in which it should be executed and the frequency:

create table schedules (
  name            varchar2(100) not null primary key,
  start_time      date,
  end_time        date, 
  repeat_interval interval day to second
);

Let’s fill it with 3 records:
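
A hedged sketch (not necessarily the records from the original post), followed by a lateral inline view that expands each schedule into its individual execution times:

insert into schedules values ('daily backup',  date'2015-01-01', date'2015-01-05', interval '1' day);
insert into schedules values ('hourly stats',  date'2015-01-01', date'2015-01-01' + 6/24, interval '1' hour);
insert into schedules values ('weekly report', date'2015-01-01', date'2015-01-31', interval '7' day);
commit;

select s.name, t.execution_time
from   schedules s,
       lateral (select s.start_time + (level - 1) * s.repeat_interval as execution_time
                from   dual
                connect by s.start_time + (level - 1) * s.repeat_interval <= s.end_time) t
order by s.name, t.execution_time;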