RETURNING INTO – Enhancement Suggestion

Update: this enhancement suggestion got accepted, and it is now part of Oracle Database 23c

The RETURNING INTO clause is one of my favorite features.
It returns data from the rows that have been affected by the DML statement, and as I wrote in this previous post:
For INSERT it returns the after-insert values of the new row’s columns.
For UPDATE it returns the after-update values of the affected rows’ columns.
For DELETE it returns the before-delete values of the affected rows’ columns.

For INSERT there are no before-insert values, so the “after-insert values” is the only reasonable option.
Likewise, for DELETE there are no after-delete values, so the “before-delete values” is the only reasonable option.
But for UPDATE there are both before-update and after-update values, but currently the RETURNING INTO clause supports only the after-update values.

I think it will be really useful if for UPDATE we’ll be able to get the before-update values in addition to the after-update values. This will also make the feature more symmetric and complete, in my opinion.
Currently, if we need to know the before-update values, we have to make two SQL statement calls: SELECT (FOR UPDATE) for the record(s) we’re about to update, and then the UPDATE itself.

As a simple example, consider the following procedure, that increases the salary of an employee by some amount, up to maximum of 10000, and returns the percentage in which the salary was changed:

create or replace procedure raise_salary
(
    i_employee_id  in employees.employee_id%type,
    i_raise_amount in employees.salary%type,
    o_raise_pct    out number
) as
    l_old_salary employees.salary%type;
    l_new_salary employees.salary%type;
begin
    select e.salary
    into   l_old_salary
    from   employees e
    where  e.employee_id = i_employee_id
    for    update of e.salary;

    update employees e
    set    e.salary = least(e.salary + i_raise_amount, greatest (e.salary,10000))
    where  e.employee_id = i_employee_id
    returning e.salary into l_new_salary;

    o_raise_pct := l_new_salary / l_old_salary;

end raise_salary;

With the suggested enhancement, we could achieve the same functionality with one context switch to the SQL engine rather than two (obviously the syntax I’m using is made-up, just for this example):

create or replace procedure raise_salary
(
    i_employee_id  in employees.employee_id%type,
    i_raise_amount in employees.salary%type,
    o_raise_pct    out number
) as
    l_old_salary employees.salary%type;
    l_new_salary employees.salary%type;
begin

    update employees e
    set    e.salary = least(e.salary + i_raise_amount, greatest (e.salary,10000))
    where  e.employee_id = i_employee_id
    returning old e.salary, new e.salary 
    into l_old_salary, l_new_salary;

    o_raise_pct := l_new_salary / l_old_salary;

end raise_salary;

I submitted this suggestion to the Database Ideas space of the Oracle Developer Community (formerly known as OTN) web site.
If you think it’s a good idea, please vote up 🙂

Note

As a (dirty, confusing, and most probably inferior performance-wise to the “real” solution) workaround, we can achieve the same functionality for this example today, using the fact that a scalar subquery in the RETURNING INTO clause obeys the statement-level read consistency rule – it returns the data as it was before the DML happened:

create or replace procedure raise_salary
(
    i_employee_id  in employees.employee_id%type,
    i_raise_amount in employees.salary%type,
    o_raise_pct    out number
) as
    l_old_salary employees.salary%type;
    l_new_salary employees.salary%type;
begin

    update employees e
    set    e.salary = least(e.salary + i_raise_amount, greatest (e.salary,10000))
    where  e.employee_id = i_employee_id
    returning 
      (select e1.salary from employees e1 where e1.employee_id = i_employee_id),
      e.salary
    into l_old_salary, l_new_salary;

    o_raise_pct := l_new_salary / l_old_salary;

end raise_salary;

Implementing Arc Relationships with Virtual Columns? Or Not?

I wrote a post some time ago about implementing arc relationships using virtual columns.
Recently, Toon Koppelaars wrote a detailed and reasoned comment to that post. Since I admire Toon, getting his point of view on something that I wrote is a privilege for me, regardless if he agrees with me or disagrees (and just to be clear, it’s the latter this time). I think that having a public (and civilized) discussion – this time about principles of data modeling and implementation – is a good thing, even if at the end we don’t convince each other.
Therefore I thought Toon’s comment deserves a post of its own. So I’ll quote here everything that he wrote in the comment, with my response after every sentence or paragraph. And everyone is welcome to add their own points of view in the comments section.

I’d like to emphasize that basically I agree with Toon about the general guidelines. What makes life harder, as always, are the nuances, or as we know them well as “it depends”.

And just one general observation first. The starting point of the original post was that the design included an arc. This is the given fact, and my intent was to suggest an implementation for this given fact. I don’t want to put words in your mouth (or keyboard), Toon, but I think that the basis for your arguments is that the arc, as a design concept, is wrong. I think you would prefer to see several entities in the ERD instead of the one with the arc. Am I right in this interpretation?

Toon wrote:
Oren, I just cannot, not comment on this post 😉

And I really appreciate it. Really.

Toon wrote:
Re: Multiple Tables
1) What do you mean by “maintaining another table”? And why is that a disadvantage?

I’ll refer to it after the next paragraph.

Toon wrote:
In my opinion, you should always have a software factory in which the act of introducing a new table is cheap. Why? Because otherwise you end up with database designs with few tables, into which more than few “things” are stored. Which is bad. It’s bad for understandability, it’s bad for performance, it’s bad for future maintenance.

And I completely agree with you.

Toon wrote:
Apparently adding a table is not cheap in your factory?
But, yet, dropping and recreating a CHECK constraint (to involve a new single-char value) is cheap?

As you know me and my database development principles, beliefs and guidelines (for example, here and here), I assume this is a rhetorical question. Of course adding a table is cheap. Actually it is one of the cheapest tasks.
And dropping and recreating a CHECK constraint is not expensive either. It may take some time to VALIDATE it if the table is big, but this is not a reason not to do it. And anyway, if, in order to implement a good design, something is “expensive” (but still realistic), it should not be a reason not to do it (at least, not a good reason).

But, if we have, by design, several tables that are identical – in column names, data types, indexes, constraints – everything except for the table on the other side of the “owner” foreign key, then I start to feel uncomfortable. I get the same feeling that I get when I see two procedures that do the same (or almost the same) thing.
And although many times the arc appears for simple join tables, as in my example, this is not always the case. The entity with the arc may include many columns.
In addition, by “maintaining another table” I did not mean just the CREATE TABLE statement. When a new attribute is added to the designed entity, we need to add a new column – exactly the same column – to all the tables. And the same is true for any change – renaming a column, changing data types, adding or removing constraints, etc.
In my opinion, if I have to repeat the same work over and over instead of doing it just once, I probably do something wrong.
It also reminds me of denormalization. Usually when we talk about denormalization we mean that the same data is kept more than once. I think it’s the same here – just that the duplicate data is kept in the data dictionary tables and not in our schema. And one of the disadvantages of denormalization is that it increases the risk for bugs; in this case, due to a human error we may end up with PERSON_ADDRESSES and COMPANY_ADDRESSESS that are not exactly the same, although they should be.

Toon wrote:
2) Why is the implied code-difference for this solution, more difficult to maintain?

Here I was mainly thinking about lack of reusability.

Toon wrote:
Instead of having these two inserts:
insert into entity_addresses values(AID1,’P’,PID1), and insert into entity_addresses values(AID2,’C’,CID1).

You’d have these two:
insert into person_addresses(AID1,PID1), and insert into company_addresses(AID2,CID1).

Why is the former “cheaper”?

I was thinking of writing only one INSERT statement:

CREATE OR REPLACE PACKAGE BODY address_mgr AS

    PROCEDURE add_entity_address
    (
        i_address_id    IN entity_addresses.address_id%TYPE,
        i_owner_type_id IN entity_addresses.owner_type_id%TYPE,
        i_owner_id      IN entity_addresses.owner_id%TYPE
    ) IS
    BEGIN
        INSERT INTO entity_addresses
            (address_id,
             owner_type_id,
             owner_id)
        VALUES
            (i_address_id,
             i_owner_type_id,
             i_owner_id);
    END add_entity_address;
.
.
.
END address_mgr;
/

Toon wrote:
And this trickles down to all other kinds of SQL statements. Why is:

select *
from adresses a
entity_addresses ea
,persons p
where a.id = ea.address_id
and ea.object_type = ‘P’
and ea.object_id = p.person_id

apparently easier to write and/or maintain (i.e. cheaper), than:

select *
from adresses a
person_addresses pa
,persons p
where a.id = ea.person_id
and pa.person_id = p.person_id

?
I just don’t get that…

Again, I was thinking of a more generic code, rather than duplicating the same code pattern. Something like this:

create type address_t as object (
  id integer,
  street varchar2(30),
  house_number varchar2(10),
  city varchar2(30),
  country varchar2(30)
)
/

create type address_tt as table of address_t 
/

CREATE OR REPLACE PACKAGE BODY address_mgr AS
.
.
.
    PROCEDURE get_addresses
    (
        i_owner_type_id IN entity_addresses.owner_type_id%TYPE,
        i_owner_id      IN entity_addresses.owner_id%TYPE,
        o_addresses     OUT address_tt
    ) IS
    BEGIN
        SELECT address_t(a.id, a.street, a.house_number, a.city, a.country)
        BULK   COLLECT
        INTO   o_addresses
        FROM   addresses        a,
               entity_addresses ea
        WHERE  ea.owner_type_id = i_owner_type_id
        AND    ea.owner_id = i_owner_id
        AND    a.id = ea.address_id;
    END get_addresses;
.
.
.
END address_mgr;
/

CREATE OR REPLACE PACKAGE BODY person_mgr AS

    PROCEDURE get_person
    (
        i_person_id  IN people.id%TYPE,
        o_first_name OUT people.first_name%TYPE,
        o_last_name  OUT people.last_name%TYPE,
        o_addresses  OUT address_tt
    ) IS
    BEGIN
        SELECT p.first_name,
               p.last_name
        INTO   o_first_name,
               o_last_name
        FROM   people p
        WHERE  p.id = i_person_id;
    
        address_mgr.get_addresses(i_owner_type_id => 'P',
                                  i_owner_id      => i_person_id,
                                  o_addresses     => o_addresses);
    END get_person;
.
.
.
END person_mgr;
/

CREATE OR REPLACE PACKAGE BODY company_mgr AS

    PROCEDURE get_company
    (
        i_company_id          IN companies.id%TYPE,
        o_name                OUT companies.name%TYPE,
        o_number_of_employees OUT companies.number_of_employees%TYPE,
        o_description         OUT companies.description%TYPE,
        o_addresses           OUT address_tt
    ) IS
    BEGIN
        SELECT c.name,
               c.number_of_employees,
               c.description
        INTO   o_name,
               o_number_of_employees,
               o_description
        FROM   companies c
        WHERE  c.id = i_company_id;
    
        address_mgr.get_addresses(i_owner_type_id => 'C',
                                  i_owner_id      => i_company_id,
                                  o_addresses     => o_addresses);
    END get_company;
.
.
.
END company_mgr;
/

By the way, we can achieve the same level of code reusability with the “multiple tables” option, by using dynamic SQL instead of static SQL, but I think both of us do not want to go there, do we?

Toon wrote:
To me this alternative (Multiple Tables) *is* the way to deal with the information requirement.
Having entity_adresses, introduces a column in my design (owner_id) whose meaning depends upon the value of another column (owner_type), which is just bad in my opinion.

Isn’t this the case with every composite key, that we need the values of all the columns that compose the key in order to uniquely identify the entity?

Toon wrote:
This alternative also offers the best chances of being able to easily cater for possible future differences in information requirements at the relationship-level. Eg. we may want to track since when a person has had a particular address, but we don’t care for that information requirement in case of companies: you then just add the ‘as-of’ date column to the person_addresses table only, and be done with it.

In my opinion, this becomes a different case. If the entities are not identical in their attributes, I would consider them different entities, and therefore each one deserves its own table. And since I’m not afraid of changing my schema and my code, I would split the generic table into multiple tables only when the need arises.

Toon wrote:
It is the most simple, and easy to understand for others that come join the maintenance team.
Imagine coming in and seeing the table with invisible generated virtual columns that have FK’s on them. Hmmm…

I would probably say: “what a cool idea” 😉

Write (Even) Less with More – VALIDATE_CONVERSION

I wrote the post Write Less with More – Part 8 – PL/SQL in the WITH Clause in November 2015, when the latest released Oracle version was 12.1.
In that post I explained about PL/SQL in the WITH Clause – a new 12.1 feature – and demonstrated it using the following example:

todo8

Since then Oracle 12.2 was released, and introduced a new feature that enables solving this task in a simpler way – the VALIDATE_CONVERSION function. This function gets an expression and a data type, and returns 1 if the expression can be converted to the data type and 0 if not.
Using the same setup from the original post, the requested query becomes as simple as:

> select *
  from   people
  where  general_info is not null
  and    validate_conversion(general_info as date, 'dd/mm/yyyy') = 1;

PERSON_ID FIRST_NAME LAST_NAME       GENERAL_INFO
---------- ---------- --------------- --------------------
       102 Paul       McCartney       18/6/1942
       202 Ella       Fitzgerald      15/6/1996
       203 Etta       James           20/1/2012

In addition to introducing the new VALIDATE_CONVERSION function, the older CAST and some of the TO_* conversion functions have been enhanced in Oracle 12.2 and include a DEFAULT ON CONVERSION ERROR clause, so when data type conversion fails we can get some default value instead of an error.

> select p.person_id,
         p.first_name,
         p.last_name,
         to_date(p.general_info default null on conversion error, 'dd/mm/yyyy') my_date
  from   people p;

 PERSON_ID FIRST_NAME LAST_NAME       MY_DATE
---------- ---------- --------------- ----------
       101 John       Lennon
       102 Paul       McCartney       18/06/1942
       103 Ringo      Starr
       104 George     Harisson
       201 Louis      Armstrong
       202 Ella       Fitzgerald      15/06/1996
       203 Etta       James           20/01/2012
       317 Julie      Andrews

8 rows selected.

PL/SQL in SQL in View in SQL in PL/SQL

I presented “Write Less (Code) With More (Oracle 12c New Features)” yesterday at OGh Tech Experience 2017.
One of the features I talked about was PL/SQL in the WITH Clause. One of the restrictions of this feature is that you cannot embed a static SQL query, that contains PL/SQL in the WITH clause, in PL/SQL (see the section PL/SQL in SQL in PL/SQL in this post).
I was asked, regarding this restriction, if it’s possible to embed in PL/SQL a static “regular” SQL query, that selects from a view, that contains PL/SQL in the WITH clause. The answer is yes, since the restriction is only syntactic.

Using the same example from the original post:

create view people_with_dates_v as      
with 
  function is_date(i_info in varchar2) return number as
    l_date date;
  begin
    if i_info is null then
      return 0;
    else
      l_date := to_date(i_info, 'dd/mm/yyyy');
      return 1;
    end if;
  exception
    when others then
      return 0;
  end;
select p.*
from   people p
where  is_date(p.general_info) = 1;
/

View created.
create or replace procedure show_date_people as
begin
    for l_rec in (
      select * from people_with_dates_v
      )
    loop
        dbms_output.put_line(l_rec.person_id || ': ' || l_rec.first_name || ' ' || l_rec.last_name);
    end loop;
end show_date_people;
/
> exec show_date_people
102: Paul McCartney
202: Ella Fitzgerald
203: Etta James

PL/SQL procedure successfully completed.

RETURNING INTO

The RETURNING INTO clause is one of my favorite PL/SQL features. It allows to write less code, improves readability and reduces context switches between PL/SQL and SQL.
In this post I’d like to highlight some less-known characteristics of the RETURNING INTO clause and emphasize differences that exist when it is used in different DML statements.

Supported Statements

The RETURNING INTO clause is supported by the UPDATE, DELETE, and single-table single-row (“values-based”) INSERT statements.
It is not supported by subquery-based INSERT, multi-table INSERT, and MERGE. Some additional restrictions are documented.

What is returned?

The RETURNING INTO clause returns data from the rows that have been affected by the DML statement.
For INSERT it returns the after-insert values of the new row’s columns.
For UPDATE it returns the after-update values of the affected rows’ columns.
For DELETE it returns the before-delete values of the affected rows’ columns.

> create table t (
    id number generated as identity,
    data number
  );

Table created.

> declare
    ret1 number;
    ret2 number;
  begin
    insert into t (data) values (10) returning id,data into ret1,ret2;
    dbms_output.put_line('id='||ret1||' ; data='||ret2);
    insert into t (data) values (20) returning id,data into ret1,ret2;
    dbms_output.put_line('id='||ret1||' ; data='||ret2);
    insert into t (data) values (30) returning id,data into ret1,ret2;
    dbms_output.put_line('id='||ret1||' ; data='||ret2);
    commit;
  end;
  /

id=1 ; data=10
id=2 ; data=20
id=3 ; data=30

PL/SQL procedure successfully completed.
> declare
    ret number;
  begin
    update t set data=111 where id=1 returning data into ret;
    dbms_output.put_line('post-update value = '||ret);
    delete t where id=2 returning data into ret;
    dbms_output.put_line('pre-delete value = '||ret);
    rollback;
  end;
  /

post-update value = 111
pre-delete value = 20

PL/SQL procedure successfully completed.

Expressions

The returned values are not limited just to simple column names. It can virtually be any expression you can place in the select-list of a SELECT statement from the affected table.

> declare
    ret1 number;
    ret2 varchar2(10);
  begin
    update t
    set data = round(dbms_random.value(1,100))
    where id = 1
    returning data, decode(mod(data,2),0,'even','odd')
    into ret1, ret2;
 
    dbms_output.put_line('the new value is '||ret1||', an '||ret2||' number');
    rollback;
  end;
  /

the new value is 19, an odd number

PL/SQL procedure successfully completed.

Warning: you can even include in the returned expression a scalar subquery against the affected table. This is probably not a very good idea, especially with INSERT and UPDATE, which return the post-DML values. The scalar subquery, on the other hand, obeys the statement-level read consistency rule – it returns the data as it was before the DML happened. So it may lead to confusing inconsistent results:

> declare
    ret1 number;
    ret2 number;
  begin
    update t
    set data = 29081969
    where id = 1
    returning data, (select max(data) from t)
    into ret1, ret2;

    dbms_output.put_line('just set DATA of a row to ' || ret1 ||
                         ' and the maximum value is (no! *was*) ' || ret2);
    rollback;
  end;
  /

just set DATA of a row to 29081969 and the maximum value is (no! *was*) 30

PL/SQL procedure successfully completed.

BULK COLLECT

The returned values can be placed into collection variables, using RETURNING with the BULK COLLECT INTO clause. This is very useful for the UPDATE and DELETE statements, that may affect multiple rows in a single statement (but this syntax is supported also for the INSERT statement – perhaps as a preparation for supporting RETURNING in multi-row INSERT in the future?).

> declare
    type ntt is table of number;
    ret1 ntt;
    ret2 ntt;
  begin
    update t
    set data = data * 2
    returning id, data
    bulk collect into ret1, ret2;
 
    for i in 1..ret1.count loop
      dbms_output.put_line('id='||ret1(i)||' ; data='||ret2(i));
    end loop;
 
    rollback;
  end;
  /

id=1 ; data=20
id=2 ; data=40
id=3 ; data=60

PL/SQL procedure successfully completed.

Aggregation

For UPDATE and DELETE the returned expression can be an aggregate function. The aggregation is performed on the affected rows, and (since the aggregated value is scalar) BULK COLLECT is not needed when using this option. This is not supported for INSERT.

> declare
    ret1 number;
    ret2 number;
    ret3 varchar2(100);
  begin
    update t
    set data = data * 2
    where id < 3
    returning sum(data),
              avg(data),
              listagg(id||':'||data,',') within group (order by id)
    into ret1, ret2, ret3;
 
    dbms_output.put_line('sum(data) of updated rows = ' || ret1);
    dbms_output.put_line('avg(data) of updated rows = ' || ret2);
    dbms_output.put_line('new values = ' || ret3);
 
    rollback;
  end;
  /
sum(data) of updated rows = 60
avg(data) of updated rows = 30
new values = 1:20,2:40

PL/SQL procedure successfully completed.

RETURNING or RETURN?

One last anecdote: the RETURN keyword can be used instead of RETURNING. I prefer RETURNING.

What about MERGE?

As many others, I wish the MERGE statement would support the RETURNING INTO clause as well.
However, since MERGE is a combination of INSERT, UPDATE and DELETE, and given the differences in the way RETURNING works with these 3 DML statements (as listed above), supporting RETURNING INTO in MERGE is clearly not a simple task.
It may also lead to confusing situations – think for example about a MERGE statement that includes the UPDATE SET and DELETE WHERE clauses – what will RETURNING return for rows affected by the DELETE WHERE clause? The pre-MERGE value, the post-UPDATE-pre-DELETE value, or perhaps simply 42?

Adding a Column with a Default Value and a Constraint

The Constraint Optimization series:


In the previous parts of this series I showed that Oracle does a nice optimization – that may save plenty of time – when we add in a single ALTER TABLE statement a new nullable with no default value column and a corresponding:
– inline (a.k.a. “column-level”) check constraint
or
– foreign key constraint (either inline or out-of-line)

In these cases Oracle enables the constraint (for future DML statements) and marks the constraint as VALIDATED (stating that existing records respect the constraint) without actually performing any check – counting on the fact that all the existing records have NULL in the new column. This is a beautiful optimization – if the table already contains many records, then skipping the (unnecessary) validation phase can save lots of time (during which the table is locked in a highly restrictive mode [I’ll discuss locking in the next part of this series]).

What about adding a new column with a (constant) default value?

Since all the existing records will have the same value in the new column (the default value) – then the validation phase can be very simple and short. Oracle should only check that this single value respects the constraint; there is no reason to visit each and every record in the table for repeating the same validation over and over again. Unfortunately, this kind of optimization is not done.

I added this suggestion to the OTN Database Ideas section: http://community.oracle.com/ideas/17751. Please vote up if you think it’s a good idea.

With the latest releases of Oracle, such an optimization would be valuable in particular, because as of 11g adding a NOT NULLable column with a default value is a metadata-only operation, and as of 12c the same is true also for adding a NULLable column with a default value.

Here is an example with a check constraint:

> alter table t add (c number default 8 not null constraint c_chk check(c>0)); 

We can see in the SQL trace file the following:

=====================
PARSING IN CURSOR #407309168 len=110 dep=1 uid=194 oct=3 lid=0 tim=745434447778 hv=3957567910 ad='7ffb3434b250' sqlid='2z018fgpy7cd6'
 select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3 from "DEMO"."T" A where not ( c>0)
END OF STMT
.
.
.
FETCH #407309168:c=781250,e=3187349,p=142858,cr=142882,cu=0,mis=0,r=0,dep=1,og=1,plh=1601196873,tim=745437636908
STAT #407309168 id=1 cnt=0 pid=0 pos=1 obj=99064 op='TABLE ACCESS FULL T (cr=142882 pr=142858 pw=0 time=3187348 us cost=38895 size=1250000 card=50000)'
CLOSE #407309168:c=0,e=3,dep=1,type=0,tim=745437636980

And here is an example with a foreign key constraint:

> alter table c add (
  parent_id number default 1 not null constraint c_fk_p references p(id)
  );
PARSING IN CURSOR #386313832 len=207 dep=1 uid=0 oct=3 lid=0 tim=747182328580 hv=3249972061 ad='7ffb09624948' sqlid='83w6f8v0vd8ux'
 select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3 
 from "DEMO"."C" A , "DEMO"."P" B 
 where( "A"."PARENT_ID" is not null) and( "B"."ID" (+)= "A"."PARENT_ID") and( "B"."ID" is null)
END OF STMT
.
.
.
FETCH #386313832:c=906250,e=3396711,p=142857,cr=142883,cu=0,mis=0,r=0,dep=1,og=1,plh=1351743368,tim=747185726213
STAT #386313832 id=1 cnt=0 pid=0 pos=1 obj=0 op='HASH JOIN ANTI (cr=142883 pr=142857 pw=0 time=3396711 us cost=40650 size=38000000 card=1000000)'
STAT #386313832 id=2 cnt=1000000 pid=1 pos=1 obj=98387 op='TABLE ACCESS FULL C (cr=142882 pr=142857 pw=0 time=616429 us cost=38895 size=25000000 card=1000000)'
STAT #386313832 id=3 cnt=1 pid=1 pos=2 obj=98386 op='INDEX FULL SCAN P_PK (cr=1 pr=0 pw=0 time=8 us cost=0 size=13 card=1)'
CLOSE #386313832:c=0,e=3,dep=1,type=0,tim=747185726320

COLLECT DISTINCT in PL/SQL Works in Oracle 12.2

About a year ago I wrote the post Subtleties – Part 1 (SQL and PL/SQL). I wrote there:

Almost every valid SQL statement (i.e., that is executed successfully by the SQL engine) can be embedded successfully as a static SQL in PL/SQL. Almost, but not every statement.
One example is the COLLECT aggregate function with the DISTINCT option.

And I showed an example that was executed in 11.2.0.4 and in 12.1.0.2.

Today I tried it in 12.2.0.1, and I was very pleased to see that now COLLECT DISTINCT is working also in PL/SQL.
Continue reading “COLLECT DISTINCT in PL/SQL Works in Oracle 12.2”

(Lack of) Optimization of Unique Constraint Creation

The Constraint Optimization series:


In the previous parts of this series I showed that Oracle does a nice optimization – that may save plenty of time – when we add in a single ALTER TABLE statement a new (nullable with no default value) column and a corresponding:
– inline (a.k.a. “column-level”) check constraint
or
– foreign key constraint (either inline or out-of-line)

What about unique constraints?

When we create a unique constraint on some column, Oracle creates by default a corresponding unique B*Tree index on that column, in order to enforce the constraint.
When we add a new column with no default value to a non-empty table, and in the same ALTER TABLE statement we create a unique constraint on that column, then the corresponding index will be created, by definition, with no entries (as B*Tree indexes do not contain entirely NULL keys). It would be nice if Oracle would simply create an empty index in this case, without scanning all the table rows, but unfortunately this kind of optimization is not done.
Continue reading “(Lack of) Optimization of Unique Constraint Creation”

Optimization of Foreign Key Constraint Creation

The Constraint Optimization series:


In a recent post I showed that Oracle does a nice optimization when we add a new (nullable with no default value) column with an inline (a.k.a. “column-level”) check constraint in a single ALTER TABLE statement. This optimization does not apply for out-of-line (“table-level”) check constraints.

So, what about foreign key constraints?

Clearly, when adding a new nullable with no default value column to a table which contains records, then, by definition, the new column has NULL in all the existing records. If in the same ALTER TABLE statement we also create a foreign key constraint for the new column, then, by definition, the constraint is valid for all the existing records, because NULLs always pass the foreign key constraint check. So in this case it is unnecessary to access each and every record of the table for the validation phase of the foreign key constraint creation, and indeed Oracle applies this optimization, in both inline and out-of-line foreign key constraints. Let’s see it.
Continue reading “Optimization of Foreign Key Constraint Creation”

Optimization that Violates Data Integrity

The Constraint Optimization series:


In the previous post I showed that Oracle does a nice optimization when we add a new (nullable with no default value) column with an inline check constraint in a single ALTER TABLE statement.
However, there is one case where this optimization allows for data integrity violation instead of forbidding it (which makes it a bug, in this specific case, rather than an optimization). It happens when the check constraint is “column IS NOT NULL”.

I’ll setup a table with a few records for demonstrating this:

SQL> create table t (x number);

Table created.

SQL> insert into t select rownum from dual connect by level<=10;

10 rows created.

SQL> commit;

Commit complete.

First, let's try to add a column with an out-of-line constraint, so the optimization will not apply:

SQL>
SQL> ALTER TABLE T ADD (
  2    C NUMBER,
  3    CONSTRAINT C_CHK CHECK (C IS NOT NULL)
  4  );
  CONSTRAINT C_CHK CHECK (C IS NOT NULL)
             *
ERROR at line 3:
ORA-02293: cannot validate (TRANZMATE_PRODUCTION.C_CHK) - check constraint violated

It failed, and that's good.

Now, let's try to add the same column with the same check constraint, but this time we'll use an inline constraint:

SQL> ALTER TABLE T ADD (
  2    C NUMBER CONSTRAINT C_CHK CHECK (C IS NOT NULL)
  3  );

Table altered.

And it succeeded, although it should have failed.
So now we have a validated constraint, stating that C IS NOT NULL, and at the same time the table contains records, and in all of them C IS NULL:

SQL> select table_name,search_condition,status,validated,rely
  2  from user_constraints where constraint_name='C_CHK';

TABLE_NAME SEARCH_CONDITION     STATUS     VALIDATED  RELY
---------- -------------------- ---------- ---------- ------------
T          C IS NOT NULL        ENABLED    VALIDATED

SQL> select * from t;

         X          C
---------- ----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

Since we're talking about adding a nullable column with no default value to an existing table that already contains rows, specifying a check constraint that states that this column is not nullable doesn't make much sense, and if someone tries to do it it's probably their bug. Still, it would be nice to get an error in this case, rather than violating the data integrity.

Why does it happen?

When Oracle validates some check constraint, it checks if there are records for which the constraint condition is FALSE. When the column is NULL, then usually the condition evaluates to NULL, not to FALSE, so the constraint is valid. For example, when C is NULL, then "C>0" is NULL (neither FALSE nor TRUE).
I guess this is why Oracle skips the validation step completely as part of this optimization.

What should happen?

It's good that Oracle doesn't validate each and every record, as it's really unnecessary, but to make this optimization foolproof it should validate a single value – NULL.