Optimization of Foreign Key Constraint Creation

In a recent post I showed that Oracle does a nice optimization when we add a new (nullable, with no default value) column with an inline (a.k.a. “column-level”) check constraint in a single ALTER TABLE statement. This optimization does not apply to out-of-line (“table-level”) check constraints.

So, what about foreign key constraints?

Clearly, when we add a new nullable column with no default value to a table that contains records, the new column, by definition, has NULL in all the existing records. If in the same ALTER TABLE statement we also create a foreign key constraint on the new column, then, by definition, the constraint is valid for all the existing records, because NULLs always pass the foreign key constraint check. So in this case it is unnecessary to access each and every record of the table during the validation phase of the foreign key constraint creation, and indeed Oracle applies this optimization, for both inline and out-of-line foreign key constraints. Let’s see it.
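The two statement forms in question look roughly like this (just a sketch – the table, column and constraint names are illustrative, and the two statements are alternative ways of writing the same change, not meant to be run together):

-- inline ("column-level") foreign key constraint, added together with the new column:
ALTER TABLE T ADD (
  PARENT_ID NUMBER CONSTRAINT T_FK REFERENCES PARENTS (ID)
);

-- out-of-line ("table-level") foreign key constraint, added together with the new column:
ALTER TABLE T ADD (
  PARENT_ID NUMBER,
  CONSTRAINT T_FK FOREIGN KEY (PARENT_ID) REFERENCES PARENTS (ID)
);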
Continue reading “Optimization of Foreign Key Constraint Creation”

Optimization that Violates Data Integrity

In the previous post I showed that Oracle does a nice optimization when we add a new (nullable with no default value) column with an inline check constraint in a single ALTER TABLE statement.
However, there is one case where this optimization allows a data integrity violation instead of preventing it (which makes it, in this specific case, a bug rather than an optimization). It happens when the check constraint is “column IS NOT NULL”.

I’ll set up a table with a few records to demonstrate this:

SQL> create table t (x number);

Table created.

SQL> insert into t select rownum from dual connect by level<=10;

10 rows created.

SQL> commit;

Commit complete.

First, let's try to add a column with an out-of-line constraint, so the optimization will not apply:

SQL>
SQL> ALTER TABLE T ADD (
  2    C NUMBER,
  3    CONSTRAINT C_CHK CHECK (C IS NOT NULL)
  4  );
  CONSTRAINT C_CHK CHECK (C IS NOT NULL)
             *
ERROR at line 3:
ORA-02293: cannot validate (TRANZMATE_PRODUCTION.C_CHK) - check constraint violated

It failed, and that's good.

Now, let's try to add the same column with the same check constraint, but this time we'll use an inline constraint:

SQL> ALTER TABLE T ADD (
  2    C NUMBER CONSTRAINT C_CHK CHECK (C IS NOT NULL)
  3  );

Table altered.

And it succeeded, although it should have failed.
So now we have a validated constraint, stating that C IS NOT NULL, and at the same time the table contains records, and in all of them C IS NULL:

SQL> select table_name,search_condition,status,validated,rely
  2  from user_constraints where constraint_name='C_CHK';

TABLE_NAME SEARCH_CONDITION     STATUS     VALIDATED  RELY
---------- -------------------- ---------- ---------- ------------
T          C IS NOT NULL        ENABLED    VALIDATED

SQL> select * from t;

         X          C
---------- ----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

Since we're talking about adding a nullable column with no default value to an existing table that already contains rows, specifying a check constraint stating that this column is not nullable doesn't make much sense; if someone tries to do it, it's probably a bug on their side. Still, it would be nice to get an error in this case, rather than a silent data integrity violation.

Why does it happen?

When Oracle validates a check constraint, it checks whether there are records for which the constraint condition is FALSE. When the column is NULL, the condition usually evaluates to NULL rather than FALSE, so the constraint is considered valid. For example, when C is NULL, “C>0” is NULL (neither FALSE nor TRUE).
I guess this is why Oracle skips the validation step completely as part of this optimization.
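A quick way to see this three-valued behavior from SQL (just an illustration against DUAL):

with v as (select cast(null as number) c from dual)
select case when c > 0       then 'TRUE'
            when not (c > 0) then 'FALSE'
            else 'UNKNOWN'
       end as "C > 0",
       case when c is not null then 'TRUE' else 'FALSE' end as "C IS NOT NULL"
from v;

-- "C > 0" evaluates to UNKNOWN, while "C IS NOT NULL" evaluates to FALSE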

What should happen?

It's good that Oracle doesn't validate each and every record, as it's really unnecessary, but to make this optimization foolproof it should validate a single value – NULL.
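In the meantime, the contradiction is easy to demonstrate: the table data violates the “validated” constraint, and an explicit re-validation (which, unlike the buggy path, does scan the table) is expected to fail. A hedged follow-up to the demo above:

-- every existing row violates the supposedly validated constraint:
select count(*) from t where not (c is not null);

-- re-validating the constraint explicitly should fail with ORA-02293:
alter table t modify constraint c_chk novalidate;
alter table t modify constraint c_chk validate;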

Optimization of Check Constraint Creation

I have a table T with many records and I want to add a new column C to this table with some check constraint on C.

Does it matter if I use the following statement

ALTER TABLE T ADD (C NUMBER, CONSTRAINT C_CHK CHECK (C>0));

or this one

ALTER TABLE T ADD (C NUMBER CONSTRAINT C_CHK CHECK (C>0));

?
Note that the only difference is the comma that appears in the first option and not in the second one.

It certainly matters.
In the first option the constraint is an out-of-line (“table-level”) check constraint, while in the second option it is an inline (“column-level”) one.
With an out-of-line check constraint, Oracle scans the table and validates that each and every record satisfies the constraint. This is similar to adding the column in one statement, and then adding the constraint in a second statement.
But when we add the column with an inline check constraint in a single statement, Oracle optimizes this process. Since the added column is NULL for all the existing records, it is actually unnecessary to access the records in order to perform the validation.

We can see a small example (tested in 11.2.0.4, 12.1.0.2, 12.2.0.1). First, adding the column with the out-of-line check constraint:

> create table t (x varchar2(1000));

Table created.

> insert /*+ append */ into t
> select lpad(rownum,1000,'x') from dual connect by level<=1e6;

1000000 rows created.

> commit;

Commit complete.

> col value new_value value_0 noprint
> select b.value from v$statname a, v$mystat b 
> where a.statistic# = b.statistic# and a.name = 'session logical reads';

> set timing on
> ALTER TABLE T ADD (C NUMBER, CONSTRAINT C_CHK CHECK (C>0));

Table altered.

Elapsed: 00:00:04.27

> set timing off
> select b.value-&VALUE_0 logical_reads from v$statname a, v$mystat b
> where a.statistic# = b.statistic# and a.name = 'session logical reads';

LOGICAL_READS
-------------
       143243

Note the elapsed time and the number of logical reads – most of them are due to accessing all the rows of T in order to validate the check constraint.
Now, let’s see the same example but with the inline check constraint instead of the out-of-line one:

> drop table t purge;

Table dropped.

> create table t (x varchar2(1000));

Table created.

> insert /*+ append */ into t
> select lpad(rownum,1000,'x') from dual connect by level<=1e6;

1000000 rows created.

> commit;

Commit complete.

> col value new_value value_0 noprint
> select b.value from v$statname a, v$mystat b
> where a.statistic# = b.statistic# and a.name = 'session logical reads';

> set timing on
> ALTER TABLE T ADD (C NUMBER CONSTRAINT C_CHK CHECK (C>0));

Table altered.

Elapsed: 00:00:00.01

> set timing off
> select b.value-&VALUE_0 logical_reads from v$statname a, v$mystat b
> where a.statistic# = b.statistic# and a.name = 'session logical reads';

LOGICAL_READS
-------------
          359

Much faster, with significantly fewer logical reads, since the existing records are not accessed at all.

A Recipe for Summoning the RBO Monster (even in Oracle 12c): On Delete Cascade, Function-Based Index and Missing Table Statistics

The last version of Oracle in which CHOOSE was officially supported as an OPTIMIZER_MODE parameter value was 9.2.
This is what the documentation of Oracle 9.2 says about it:

choose
The optimizer chooses between a cost-based approach and a rule-based approach based on whether statistics are available.
If the data dictionary contains statistics for at least one of the accessed tables, then the optimizer uses a cost-based approach and optimizes with a goal of best throughput. If the data dictionary contains only some statistics, then the cost-based approach is used, and the optimizer must guess the statistics for the subjects without any statistics. This can result in sub-optimal execution plans. If the data dictionary contains no statistics for any of the accessed tables, then the optimizer uses a rule-based approach.

In subsequent versions CHOOSE does not appear in the documentation as a valid value, and the “What’s New in Oracle Performance?” chapter in the documentation of Oracle 10.1 includes this sentence:

Some consequences of the desupport of RBO are:
• CHOOSE and RULE are no longer supported as OPTIMIZER_MODE initialization parameter values …

But the Oracle database keeps using CHOOSE, and therefore keeps potentially using the RBO, internally, even in version 12c. And last week I hit a performance issue that was caused by this fact combined with some unfortunate circumstances.

Here is a simplified example, running on Oracle 12.1.0.2.

We have two tables – PARENTS and CHILDREN – with an “on delete cascade” foreign key constraint.

> create table parents (
    id number not null primary key
  );

Table created.

> insert into parents select rownum from dual connect by level<=10000;

10000 rows created.

> create table children (
    id number not null primary key,
    parent_id number not null references parents (id) on delete cascade,
    filler varchar2(4000)
  );

Table created.

> insert into children
  select rownum,mod(rownum,10000)+1,lpad('x',4000,'x')
  from dual
  connect by level<=100000;

100000 rows created.

> commit;

Commit complete.

There is an index that supports the foreign key, and it is a function-based index. Note that the leading column of the index is simply the referencing column.

> create index children_idx on children (parent_id, substr(filler,1,10));

Index created. 

Now let’s delete a record from the PARENTS table.

> alter system flush buffer_cache;

System altered.

> set timing on
> delete parents where id=1;

1 row deleted.

Elapsed: 00:00:27.80
> set timing off
> roll
Rollback complete.

Why did it take so much time?

When we delete a record from PARENTS, Oracle implicitly deletes the child records of this parent (because of the “on delete cascade” foreign key). This is the recursive SQL statement:

 delete from "DEMO"."CHILDREN" where "PARENT_ID" = :1

Oracle uses the CHOOSE optimizer mode for the implicit deletion from CHILDREN.

> select value,isdefault
  from v$sql_optimizer_env
  where sql_id='f7j1aq9z8k6r1'
  and child_number=0
  and name='optimizer_mode';

VALUE                ISDEFAULT
-------------------- ---------
choose               NO

1 row selected.

Unfortunately I forgot to gather statistics on the CHILDREN table. The lack of statistics means that Oracle chose to use the rule-based optimizer for the implicit DELETE statement.
And this means that it cannot use the CHILDREN_IDX index, because it is a function-based index.

> SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('f7j1aq9z8k6r1',0));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  f7j1aq9z8k6r1, child number 0
-------------------------------------
 delete from "DEMO"."CHILDREN" where "PARENT_ID" = :1

Plan hash value: 3379301338

---------------------------------------
| Id  | Operation          | Name     |
---------------------------------------
|   0 | DELETE STATEMENT   |          |
|   1 |  DELETE            | CHILDREN |
|*  2 |   TABLE ACCESS FULL| CHILDREN |
---------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PARENT_ID"=:1)

Note
-----
   - rule based optimizer used (consider using cbo)


23 rows selected.

If I execute the DELETE FROM CHILDREN statement explicitly, the CBO chooses a good execution plan that uses the index, because the statistics on the index were gathered automatically during the index creation. It’s the missing statistics on the table that caused the implicit (recursive) DELETE statement to use the RBO.

> alter system flush buffer_cache;

System altered.

> set timing on
> delete from "DEMO"."CHILDREN" where "PARENT_ID" = 1;

10 rows deleted.

Elapsed: 00:00:00.21
> set timing off
> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0cm664fx8b944, child number 0
-------------------------------------
delete from "DEMO"."CHILDREN" where "PARENT_ID" = 1

Plan hash value: 1030488021

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |              |       |       |     2 (100)|          |
|   1 |  DELETE           | CHILDREN     |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| CHILDREN_IDX |    10 | 40100 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PARENT_ID"=1)


19 rows selected.

Even if I didn’t have statistics on the index, the CBO would still choose the index for performing the explicit DELETE, because it would use dynamic sampling:

> exec dbms_stats.delete_index_stats (user,'CHILDREN_IDX')

PL/SQL procedure successfully completed.

> delete from "DEMO"."CHILDREN" where "PARENT_ID" = 3;

10 rows deleted.

> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  a8x8wszgx1g6r, child number 0
-------------------------------------
delete from "DEMO"."CHILDREN" where "PARENT_ID" = 3

Plan hash value: 1030488021

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |              |       |       |     3 (100)|          |
|   1 |  DELETE           | CHILDREN     |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| CHILDREN_IDX |  1066 |  2111K|     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PARENT_ID"=3)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


23 rows selected.

Finally, if the index were a regular one rather than function-based, then a good plan would have been chosen for the recursive SQL, because the RBO prefers an index range scan over a full table scan:

> drop index children_idx;

Index dropped.

> create index children_idx on children (parent_id, id);

Index created.

> alter system flush buffer_cache;

System altered.

> set timing on
> delete parents where id=4;

1 row deleted.

Elapsed: 00:00:00.14
> set timing off

> select * from table(DBMS_XPLAN.DISPLAY_CURSOR('f7j1aq9z8k6r1',0));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  f7j1aq9z8k6r1, child number 0
-------------------------------------
 delete from "DEMO"."CHILDREN" where "PARENT_ID" = :1

Plan hash value: 1030488021

------------------------------------------
| Id  | Operation         | Name         |
------------------------------------------
|   0 | DELETE STATEMENT  |              |
|   1 |  DELETE           | CHILDREN     |
|*  2 |   INDEX RANGE SCAN| CHILDREN_IDX |
------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PARENT_ID"=:1)

Note
-----
   - rule based optimizer used (consider using cbo)


23 rows selected.
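
The obvious way out of this trap (not part of the original test case, but it follows directly from it) is to make sure the CHILDREN table has statistics, so that CHOOSE resolves to the CBO for the recursive DELETE as well:

-- gather statistics on the table so the recursive DELETE is optimized by the CBO:
exec dbms_stats.gather_table_stats(user,'CHILDREN')

With table statistics in place, the recursive DELETE should get a costed plan that uses CHILDREN_IDX, just like the explicit DELETE did.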

Implementing Arc Relationships with Virtual Columns

Virtual columns were added in Oracle 11g Release 1. We can implement the same concept using views, but one of the advantages that I see in virtual columns is that we can define foreign key constraints on them. Well, we can define foreign key constraints on views as well, but only in DISABLE NOVALIDATE mode, which makes them a decoration rather than a data integrity protector. So to be more precise, we can define enabled foreign key constraints on virtual columns.

This post is about using virtual columns for implementing arc relationships.

In the following ERD, the arc represents the rule “each Entity Address must be owned by either a Person or a Company”:
[ERD diagram]
There are three common ways to implement this logical data model.

Multiple Tables

We can split the ENTITY_ADDRESS entity into two tables – say, PERSON_ADDRESSES and COMPANY_ADDRESSES. Disadvantages of this solution are that we need to maintain two (or more) tables and the code parts that manipulate them, although they are very similar. In addition, if later on we need to add another entity to the relationship (for example, Store, in addition to Person and Company) we will have to add another join table – STORE_ADDRESSES – and the relevant code.

A Single Table with Dedicated Columns

We can implement the ENTITY_ADDRESS entity as a single table – say, ENTITY_ADDRESSES – with a dedicated column for each referenced entity – PERSON_ID and COMPANY_ID (in addition to ADDRESS_ID) – and a check constraint to make sure one and only one of the columns PERSON_ID and COMPANY_ID is not null. The disadvantage of this solution is that we still need to maintain the dedicated columns and the code parts that manipulate them, although they are very similar. And if we need to add another entity to the relationship we will have to add another column to the table and to add the relevant code to manipulate the new column.
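
A minimal sketch of this variant (not the solution used later in this post; the names are illustrative, and it relies on the ADDRESSES, PEOPLE and COMPANIES tables that are created further down):

create table entity_addresses (
  address_id integer not null references addresses,
  person_id  integer references people,
  company_id integer references companies,
  constraint entity_addresses_arc_chk check (
    (person_id is not null and company_id is null) or
    (person_id is null and company_id is not null)
  )
);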

A Generic Table

We can implement the ENTITY_ADDRESS entity as a single table – say, ENTITY_ADDRESSES – with a generic column for the referenced entity and a column that stores the entity type to which the generic column references – e.g., ADDRESS_ID, OWNER_TYPE_ID, OWNER_ID. The disadvantage of this solution is that we cannot define foreign key constraints from ENTITY_ADDRESSES to the PEOPLE and COMPANIES tables.

Using Virtual Columns

I’d like to suggest another solution that, using virtual columns, overcomes the above disadvantages.
This solution is based on the generic table one, but we also add a virtual column for each referenced entity. The purpose of the virtual columns is only to enforce the referential integrity, and the application should not be aware of them. Therefore, if another entity is added to the arc later on, we will add another virtual column, but we will have no application code changes to make.

Here it is:

> create table people (
>   id integer not null primary key,
>   first_name varchar2(30) not null,
>   last_name varchar2(30) not null
> );

Table created.

> create table companies (
>   id integer not null primary key,
>   name varchar2(100) not null,
>   number_of_employees integer,
>   description varchar2(1000)
> );

Table created.

> create table addresses (
>   id integer not null primary key,
>   street varchar2(30),
>   house_number varchar2(10),
>   city varchar2(30),
>   country varchar2(30)
> );

Table created.
> create table entity_addresses (
>   address_id integer not null references addresses,
>   owner_type_id char(1) not null check (owner_type_id in ('P','C')),
>   owner_id integer not null,
>   --
>   primary key (owner_id,owner_type_id,address_id),
>   --
>   person_id generated always as (decode(owner_type_id,'P',owner_id)) virtual
>     constraint address_fk_people references people,
>   company_id generated always as (decode(owner_type_id,'C',owner_id)) virtual
>     constraint address_fk_companies references companies
> );

Table created. 
> insert into people (id,first_name,last_name) values (1,'John','Doe');

1 row created.

> insert into companies (id,name) values (101,'DB Oriented');

1 row created.

> insert into addresses (id,street,city,country) values (1,'Dekel','Zurit','Israel');

1 row created.

> insert into entity_addresses (address_id,owner_type_id,owner_id) 
> values (1,'P',1);

1 row created.

> insert into entity_addresses (address_id,owner_type_id,owner_id) 
> values (1,'P',101);
insert into entity_addresses (address_id,owner_type_id,owner_id) values (1,'P',101)
*
ERROR at line 1:
ORA-02291: integrity constraint (DEMO5.ADDRESS_FK_PEOPLE) violated - parent key not found


> insert into entity_addresses (address_id,owner_type_id,owner_id) 
> values (1,'C',1);
insert into entity_addresses (address_id,owner_type_id,owner_id) values (1,'C',1)
*
ERROR at line 1:
ORA-02291: integrity constraint (DEMO5.ADDRESS_FK_COMPANIES) violated - parent key not found


> insert into entity_addresses (address_id,owner_type_id,owner_id) 
> values (1,'C',101);

1 row created.
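
If, for example, a Store entity later joins the arc, the change stays local to the table definition and the existing application code is untouched. A sketch, assuming a STORES table with an ID primary key exists (note that the check constraint on OWNER_TYPE_ID must also be extended to allow 'S'; it is system-named above, so you would look its name up in USER_CONSTRAINTS, drop it, and re-create it with ('P','C','S')):

-- add a virtual column that enforces the reference to STORES:
alter table entity_addresses add (
  store_id generated always as (decode(owner_type_id,'S',owner_id)) virtual
    constraint address_fk_stores references stores
);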

Hiding the Virtual Columns

Since the virtual columns in this solution are just for enforcing the referential integrity, I prefer to hide them, so adding new entities to the arc (or removing entities from it) will be transparent to the application.

Invisible Columns

In Oracle 12c we can do it by making these columns invisible.

> alter table entity_addresses modify (
>   person_id invisible,
>   company_id invisible
> );

Table altered.

> select * from entity_addresses;

ADDRESS_ID OWN   OWNER_ID
---------- --- ----------
         1 P            1
         1 C          101

Editioning Views

Being an Edition-Based Redefinition (EBR) evangelist, I make sure my application code never refers to tables directly. Each table is covered by an editioning view, and the application code refers only to views. This practice makes it very easy to hide the virtual columns, simply by not including them in the editioning view:

> create editioning view entity_addresses_v as
>   select address_id,
>          owner_type_id,
>          owner_id
>   from entity_addresses;

View created.

Unfortunately, EBR is not widely used 🙁
You can, however, use the same idea in a non-EBR environment, by creating a regular view rather than an editioning view and referring to this view from the application code.
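For example, with a regular view that has the same body as the editioning view above:

create or replace view entity_addresses_v as
  select address_id,
         owner_type_id,
         owner_id
  from entity_addresses;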

impdp – which rows failed?

Since version 11.1, Data Pump Import supports the SKIP_CONSTRAINT_ERRORS option, which specifies that you want the import operation to proceed even if (non-deferred) constraint violations are encountered. It logs any rows that cause constraint violations, but does not stop, and does load the other rows. If SKIP_CONSTRAINT_ERRORS is not set, then the default behavior is to roll back the entire load of the data object on which constraint violations are encountered.

For example:

The table presidents was created by the following statement:

create table presidents (
  id         number       primary key,
  first_name varchar2(20),
  last_name  varchar2(20)
);

and is currently containing the following data:

        ID FIRST_NAME           LAST_NAME
---------- -------------------- --------------------
         1 George               Washington
         2 John                 Adams
         3 Thomas               Jefferson
         4 James                Madison
         5 James                Monroe
         6 John Quincy          Adams
         7 Andrew               Jackson
         8 Martin               Van Buren

We have a dump file that contains an export of the presidents table (from somewhere else or from some previous point in time), and we’d like to append the data from the dump file to the table.
The dump file contains the following data:

        ID FIRST_NAME           LAST_NAME
---------- -------------------- --------------------
         7 William              Harrison
         8 John                 Tyler
         9 James                Polk
        10 Zachary              Taylor
        11 Millard              Fillmore
        12 Franklin             Pierce

Obviously, the records with ID 7 and 8 cannot be added to the table, because a primary key constraint is defined on the ID column, and the table already contains records with these ID values.

If we try to execute the following import, without the SKIP_CONSTRAINT_ERRORS option, we will get an error and the whole operation will be rolled back:

impdp oren/oren tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp1.log

Import: Release 11.2.0.4.0 - Production on Fri Jul 18 19:33:33 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
 Master table "OREN"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
 Starting "OREN"."SYS_IMPORT_TABLE_01": oren/******** tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp1.log
 Processing object type TABLE_EXPORT/TABLE/TABLE
 ORA-39152: Table "OREN"."PRESIDENTS" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
 Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
 ORA-31693: Table data object "OREN"."PRESIDENTS" failed to load/unload and is being skipped due to error:
 ORA-00001: unique constraint (OREN.SYS_C0023857) violated
 Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
 Job "OREN"."SYS_IMPORT_TABLE_01" completed with 2 error(s) at 16:33:35

As expected, if we check the contents of the table we see that no records have been added from the dump file:

select * from presidents;

        ID FIRST_NAME           LAST_NAME
---------- -------------------- --------------------
         1 George               Washington
         2 John                 Adams
         3 Thomas               Jefferson
         4 James                Madison
         5 James                Monroe
         6 John Quincy          Adams
         7 Andrew               Jackson
         8 Martin               Van Buren

Now let’s execute the import with the SKIP_CONSTRAINT_ERRORS option:

impdp oren/oren tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp2.log data_options=SKIP_CONSTRAINT_ERRORS

Import: Release 11.2.0.4.0 - Production on Fri Jul 18 19:38:00 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
Master table "OREN"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "OREN"."SYS_IMPORT_TABLE_01": oren/******** tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp2.log data_options=SKIP_CONSTRAINT_ERRORS
Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39152: Table "OREN"."PRESIDENTS" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "OREN"."PRESIDENTS" 5.945 KB 4 out of 6 rows
2 row(s) were rejected with the following error:
ORA-00001: unique constraint (OREN.SYS_C0023857) violated

Rejected rows with the primary keys are:
Rejected row #1:
column ID: 7
Rejected row #2:
column ID: 8
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Job "OREN"."SYS_IMPORT_TABLE_01" completed with 1 error(s) at 16:38:06

We see that 4 rows were imported and that 2 rows were rejected due to a unique constraint violation. We can also see from the log which rows were rejected, or more precisely, the primary key values of the rejected rows.

But what if we want to know the values of the other columns in the rejected rows?
I don't know of such a feature, but I did some digging and managed to come up with a solution.

First I looked at the statements that Data Pump executed behind the scenes (by querying V$SQL), and one of the statements I found was this:

INSERT INTO "OREN"."ERR$DP004BC9640001" (
  ORA_ERR_NUMBER$, ORA_ERR_MESG$, ORA_ERR_ROWID$, ORA_ERR_OPTYP$, ORA_ERR_TAG$,
  "ID", "FIRST_NAME", "LAST_NAME")
VALUES (:1, :2, :3, :4, :5, :6, :7, :8)

So apparently Data Pump simply uses DML Error Logging to support the SKIP_CONSTRAINT_ERRORS option. And this means that the values of all the columns (except for LOBs etc.) of the rejected rows are basically available. All we need to do is to select from this ERR$DP004BC9640001 table, right?
Well, there are two problems with this:

  • The name of the table is not consistent. I ran it several times and got a different name in each run (but it always started with ERR$DP).
  • The ERR$DP… table is dropped as soon as the import is done, so by the time we want to select from it, it is already gone.
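
As a side note, this is roughly what standalone DML error logging (the mechanism Data Pump appears to use here) looks like outside of Data Pump; the PRESIDENTS_STAGE table below is made up for the illustration:

-- create an error logging table for PRESIDENTS (default name: ERR$_PRESIDENTS)
exec dbms_errlog.create_error_log('PRESIDENTS')

-- rows that violate constraints are diverted to the error log instead of failing the statement
insert into presidents (id, first_name, last_name)
select id, first_name, last_name
from   presidents_stage
log errors into err$_presidents ('my load') reject limit unlimited;

-- the rejected rows, with all their column values, remain queryable afterwards
select ora_err_mesg$, id, first_name, last_name from err$_presidents;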

But we can overcome these issues with the following flow:

1. Before starting the import process, open a SQL*Plus session and execute:

lock table presidents in share mode;

and keep this session (and transaction) open. This will block the import process (as well as any other process!) when it tries to insert records into the table, and will give us the opportunity to check the name of the error logging table, which, as we’ll see, will already exist at that point in time.

2. Start the import:

impdp oren/oren tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp.log data_options=SKIP_CONSTRAINT_ERRORS

Import: Release 11.2.0.4.0 - Production on Fri Jul 18 23:13:14 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
Master table "OREN"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "OREN"."SYS_IMPORT_TABLE_01": oren/******** tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp.log data_options=SKIP_CONSTRAINT_ERRORS
Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39152: Table "OREN"."PRESIDENTS" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

3. The import is blocked now. Going back to the SQL*Plus session, let’s find out the error logging table name:

select table_name from user_tables where table_name like 'ERR$DP%';

TABLE_NAME
-----------------------
ERR$DP0047ABBF0001

4. Start a second SQL*Plus session and lock the error logging table in a mode that will allow the import to fill it but not to drop it:

lock table ERR$DP0047ABBF0001 in row share mode;

5. In the first SQL*Plus session, release the lock on the presidents table so the import can continue:

Rollback;

And the import continues:

. . imported "OREN"."PRESIDENTS" 5.945 KB 4 out of 6 rows
2 row(s) were rejected with the following error:
ORA-00001: unique constraint (OREN.SYS_C0023889) violated

Rejected rows with the primary keys are:
Rejected row #1:
column ID: 7
Rejected row #2:
column ID: 8
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Job "OREN"."SYS_IMPORT_TABLE_01" completed with 1 error(s) at 20:35:21

6. When the import is done, we can see all the rejected rows from one of the SQL*Plus sessions:

select id,first_name,last_name from ERR$DP0047ABBF0001;

ID  FIRST_NAME           LAST_NAME
--- -------------------- --------------------
7   William              Harrison
8   John                 Tyler

7. And finally, to cleanup, we can drop the error logging table from the second SQL*Plus session:

drop table ERR$DP0047ABBF0001;

You are most welcome to comment or to ask questions on this page, or to write me at oren@db-oriented.com.

Creating an Index on a Static Table Referenced by an Active Table

We have a parent-child pair of tables with a foreign key constraint between them, and we need to add an index to the parent table, while the application is active.

The parent table is static during the creation of the index (no DML on it), but lots of DML statements are issued against the child table. The parent table is relatively big, so the index creation takes a significant amount of time.

Continue reading “Creating an Index on a Static Table Referenced by an Active Table”