Write Less with More – Part 5 (Row Limiting)

This post is part 5 of the Write Less with More series of blog posts, focusing on new features of Oracle 12c that allow us developers to write less than in previous versions in order to achieve the same functionality. Each part is dedicated for one new feature that solves some common development task. For more details, including the setup of the examples used throughout the series and the list of development tasks that drive the series, please see part 1.

All the Series Parts

Published

Part 1 – write less configuration with SQL*Loader Express Mode
Part 2 – write less application code with Identity Columns
Part 3 – write less application code with In-Database Archiving
Part 4 – write less application code with Temporal Validity
Part 5 – write less code in SQL with Row Limiting
Part 6 – write less “inappropriately located” code with SELECT FROM Package-Level Collection Types
Part 7 – write less code in SQL with Lateral Inline Views
Part 8 – write less “inappropriately located” code with PL/SQL in the WITH Clause

Task #5

todo5

Pagination is the ability to retrieve an ordered result set one “page” at a time; for example, getting the first 20 records in one query execution, getting the second page (records 21-40) in another execution, or even jumping to page 9 and getting records 161-180 in a third query execution.

A Pre-12c Solution

Before 12c we can implement pagination using inline views and the ROWNUM pseudo column (or analytic functions like ROW_NUMBER), and the result is quite cumbersome. Let’s see such a solution:

We start with the complete result set, with no pagination:

select project_id,
       person_id,
       assignment_id,
       assignment_period_start,
       assignment_period_end
from project_assignments
order by project_id,person_id;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            39 28/09/2015 30/09/2015
         1        101            38 22/09/2015 24/09/2015
         1        101            37 16/09/2015 18/09/2015
         1        101            40 04/10/2015 06/10/2015
         1        101            41 10/10/2015 12/10/2015
         1        102            44 06/10/2015 08/10/2015
         1        102            43 28/09/2015 30/09/2015
         1        102            42 20/09/2015 22/09/2015
         1        102            45 14/10/2015 16/10/2015
         1        102            46 22/10/2015 24/10/2015
         1        103            47            03/10/2015
         2        101            48 27/09/2015 30/09/2015
         2        102            49 27/09/2015 30/09/2015
         2        103            50 28/09/2015 30/09/2015
         2        104            51 28/09/2015 30/09/2015
         2        201            52 29/09/2015 30/09/2015
         3        101            53 26/09/2015 27/09/2015
         3        102            54 29/09/2015

18 rows selected.

If we want to get only the top 8 records, we can put the previous query in an inline view, and add a condition on ROWNUM:

select x.*
from (
  select project_id,
         person_id,
         assignment_id,
         assignment_period_start,
         assignment_period_end
  from project_assignments
  order by project_id,person_id
) x
where rownum <= 8;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            37 16/09/2015 18/09/2015
         1        101            38 22/09/2015 24/09/2015
         1        101            39 28/09/2015 30/09/2015
         1        101            40 04/10/2015 06/10/2015
         1        101            41 10/10/2015 12/10/2015
         1        102            42 20/09/2015 22/09/2015
         1        102            43 28/09/2015 30/09/2015
         1        102            44 06/10/2015 08/10/2015

8 rows selected.

If we want to get only records 5-8 (or, in other words, the second page where each page contains 4 records), we’ll add yet one more inline view:

select project_id,
       person_id,
       assignment_id,
       assignment_period_start,
       assignment_period_end 
from (
  select x.*
         ,rownum row_num 
  from (
    select project_id,
           person_id,
           assignment_id,
           assignment_period_start,
           assignment_period_end
    from project_assignments
    order by project_id,person_id
  ) x
  where rownum <= 8
)
where row_num > 4;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            41 10/10/2015 12/10/2015
         1        102            42 20/09/2015 22/09/2015
         1        102            43 28/09/2015 30/09/2015
         1        102            44 06/10/2015 08/10/2015

4 rows selected.

A 12c Solution

The Row Limiting Clause

In Oracle 12c this can be achieved using the new Row Limiting clause, which can be added to the end of SELECT statements. The size of the page is defined by either an absolute number of records or a specific percent of records out of the complete result set. You can define whether the page starts from the first record of the original result set or from some offset. It can also be defined how to treat “ties” (i.e., when several records with the same value are on the borders of the page).

If you want to learn about many more new features of Oracle 12c, you may be interested in one of my in-house training sessions, particularly the Oracle 12c New Features for Developers full-day seminar.

Top N Queries

To get only the first 4 rows:

select project_id,
       person_id,
       assignment_id,
       assignment_period_start,
       assignment_period_end
from project_assignments
order by project_id,person_id
FETCH FIRST 4 ROWS ONLY;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            37 16/09/2015 18/09/2015
         1        101            38 22/09/2015 24/09/2015
         1        101            39 28/09/2015 30/09/2015
         1        101            40 04/10/2015 06/10/2015

4 rows selected.

Note: the Row Limiting syntax is quite liberal. Wherever I use the ROWS keyword you can use either ROW or ROWS. Wherever I use FIRST or NEXT you can use either FIRST or NEXT.

Handling Ties

Take a look at the complete ordered result set above (the one that returned all the 18 rows of the table). You can see that the first 5 records have the same PROJET_ID (1) and PERSON_ID (101). It means that the previous query – that returned the first 4 rows – is not deterministic. It may return different 4 rows (out of 5) in different executions, because there is a tie between these 5 rows. The clause ORDER BY PROJECT_ID, PERSON_ID dictates that this set of 5 rows will be returned first, but it does not dictate the order within this set.
If we want to get a deterministic result, we can replace the keyword ONLY with WITH TIES:

select project_id,
       person_id,
       assignment_id,
       assignment_period_start,
       assignment_period_end
from project_assignments
order by project_id,person_id
FETCH FIRST 4 ROWS WITH TIES;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            37 16/09/2015 18/09/2015
         1        101            38 22/09/2015 24/09/2015
         1        101            39 28/09/2015 30/09/2015
         1        101            40 04/10/2015 06/10/2015
         1        101            41 10/10/2015 12/10/2015

5 rows selected.

FETCH FIRST 4 ROWS WITH TIES means that we get the first 4 rows, and all the successive rows that have the same values in the “ORDER BY” columns as the 4th record.

Top Percentage

We can ask for a specific percentage of the rows rather than a specific number of rows:

select project_id,
       person_id,
       assignment_id,
       assignment_period_start,
       assignment_period_end
from project_assignments
order by project_id,person_id
FETCH FIRST 50 PERCENT ROWS ONLY;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            39 28/09/2015 30/09/2015
         1        101            38 22/09/2015 24/09/2015
         1        101            37 16/09/2015 18/09/2015
         1        101            40 04/10/2015 06/10/2015
         1        101            41 10/10/2015 12/10/2015
         1        102            44 06/10/2015 08/10/2015
         1        102            43 28/09/2015 30/09/2015
         1        102            42 20/09/2015 22/09/2015
         1        102            45 14/10/2015 16/10/2015

9 rows selected.

Once again, this is not a deterministic result. To make it deterministic we use WITH TIES instead of ONLY, just as in the previous section:

select project_id,
       person_id,
       assignment_id,
       assignment_period_start,
       assignment_period_end
from project_assignments
order by project_id,person_id
FETCH FIRST 50 PERCENT ROWS WITH TIES;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            39 28/09/2015 30/09/2015
         1        101            38 22/09/2015 24/09/2015
         1        101            37 16/09/2015 18/09/2015
         1        101            40 04/10/2015 06/10/2015
         1        101            41 10/10/2015 12/10/2015
         1        102            44 06/10/2015 08/10/2015
         1        102            43 28/09/2015 30/09/2015
         1        102            42 20/09/2015 22/09/2015
         1        102            45 14/10/2015 16/10/2015
         1        102            46 22/10/2015 24/10/2015

10 rows selected.

Note that we got 10 rows, which is more than the requested 50%, because the 10th row has the same PROJECT_ID and PERSON_ID as the 9th row.

Performance Considerations

Note: The percentage option means that Oracle has to count first the total number of records in the underlying result set. This can consume significant resources and time for large data sets.

Recommended Reading: Franck Patchot has recently compared (in its excellent blog) the execution plans of three alternatives of Top N queries: using ROWNUM, using ROW_NUMBER and 12c Row Limiting.

Paging

To get only the 4 rows that follow the first 4 rows (i.e., rows 5-8), we add the OFFSET clause:

select project_id,
       person_id,
       assignment_id,
       assignment_period_start,
       assignment_period_end
from project_assignments
order by project_id,person_id
OFFSET 4 ROWS
FETCH NEXT 4 ROWS ONLY;

PROJECT_ID  PERSON_ID ASSIGNMENT_ID ASSIGNMENT ASSIGNMENT
---------- ---------- ------------- ---------- ----------
         1        101            41 10/10/2015 12/10/2015
         1        102            42 20/09/2015 22/09/2015
         1        102            43 28/09/2015 30/09/2015
         1        102            44 06/10/2015 08/10/2015

4 rows selected.

Behind the Scenes

Using the DBMS_UTILITY.EXPAND_SQL_TEXT procedure (yet another new feature of Oracle 12c), we can see that Oracle actually translates the new Row Limiting clause into conditions with analytic functions – ROW_NUMBER for the ONLY option and RANK for the WITH TIES option. Here are some examples:

var x clob
begin
  dbms_utility.expand_sql_text(
    input_sql_text => '
      select project_id,
             person_id,
             assignment_id,
             assignment_period_start,
             assignment_period_end
      from project_assignments
      order by project_id,person_id
      OFFSET 100 ROWS 
      FETCH NEXT 4 ROWS ONLY',
    output_sql_text => :x);
end;
/
 
print x

-- I formatted the output to make it more readable
SELECT "A1"."PROJECT_ID"              "PROJECT_ID",
       "A1"."PERSON_ID"               "PERSON_ID",
       "A1"."ASSIGNMENT_ID"           "ASSIGNMENT_ID",
       "A1"."ASSIGNMENT_PERIOD_START" "ASSIGNMENT_PERIOD_START",
       "A1"."ASSIGNMENT_PERIOD_END"   "ASSIGNMENT_PERIOD_END"
FROM   (SELECT "A2"."PROJECT_ID" "PROJECT_ID",
               "A2"."PERSON_ID" "PERSON_ID",
               "A2"."ASSIGNMENT_ID" "ASSIGNMENT_ID",
               "A2"."ASSIGNMENT_PERIOD_START" "ASSIGNMENT_PERIOD_START",
               "A2"."ASSIGNMENT_PERIOD_END" "ASSIGNMENT_PERIOD_END",
               "A2"."PROJECT_ID" "rowlimit_$_0",
               "A2"."PERSON_ID" "rowlimit_$_1",
               row_number() over(ORDER BY "A2"."PROJECT_ID", "A2"."PERSON_ID") "rowlimit_$$_rownumber"
        FROM   "DEMO5"."PROJECT_ASSIGNMENTS""A2") "A1"
WHERE  "A1"."rowlimit_$$_rownumber" <= CASE WHEN (100 >= 0) THEN floor(to_number(100)) ELSE 0 END + 4
AND    "A1"."rowlimit_$$_rownumber" > 100
ORDER  BY "A1"."rowlimit_$_0",
          "A1"."rowlimit_$_1"
begin
  dbms_utility.expand_sql_text(
    input_sql_text  => '
      select project_id,
             person_id,
             assignment_id,
             assignment_period_start,
             assignment_period_end
      from project_assignments
      order by project_id,person_id
      OFFSET 100 ROWS 
      FETCH NEXT 4 ROWS WITH TIES',
    output_sql_text => :x);
end;
/
 
print x

-- I formatted the output to make it more readable
SELECT "A1"."PROJECT_ID"              "PROJECT_ID",
       "A1"."PERSON_ID"               "PERSON_ID",
       "A1"."ASSIGNMENT_ID"           "ASSIGNMENT_ID",
       "A1"."ASSIGNMENT_PERIOD_START" "ASSIGNMENT_PERIOD_START",
       "A1"."ASSIGNMENT_PERIOD_END"   "ASSIGNMENT_PERIOD_END"
FROM   (SELECT "A2"."PROJECT_ID" "PROJECT_ID",
               "A2"."PERSON_ID" "PERSON_ID",
               "A2"."ASSIGNMENT_ID" "ASSIGNMENT_ID",
               "A2"."ASSIGNMENT_PERIOD_START" "ASSIGNMENT_PERIOD_START",
               "A2"."ASSIGNMENT_PERIOD_END" "ASSIGNMENT_PERIOD_END",
               "A2"."PROJECT_ID" "rowlimit_$_0",
               "A2"."PERSON_ID" "rowlimit_$_1",
               row_number() over(ORDER BY "A2"."PROJECT_ID", "A2"."PERSON_ID") "rowlimit_$$_rownumber",
               rank() over(ORDER BY "A2"."PROJECT_ID", "A2"."PERSON_ID") "rowlimit_$$_rank"
        FROM   "DEMO5"."PROJECT_ASSIGNMENTS" "A2") "A1"
WHERE  "A1"."rowlimit_$$_rank" <= CASE WHEN (100 >= 0) THEN floor(to_number(100)) ELSE 0 END + 4
AND    "A1"."rowlimit_$$_rownumber" > 100
ORDER  BY "A1"."rowlimit_$_0",
          "A1"."rowlimit_$_1"
begin
  dbms_utility.expand_sql_text(
    input_sql_text  => '
      select project_id,
             person_id,
             assignment_id,
             assignment_period_start,
             assignment_period_end
      from project_assignments
      order by project_id,person_id
      FETCH FIRST 10 percent ROWS only',
    output_sql_text => :x);
end;
/
 
print x
-- I formatted the output to make it more readable

SELECT "A1"."PROJECT_ID"              "PROJECT_ID",
       "A1"."PERSON_ID"               "PERSON_ID",
       "A1"."ASSIGNMENT_ID"           "ASSIGNMENT_ID",
       "A1"."ASSIGNMENT_PERIOD_START" "ASSIGNMENT_PERIOD_START",
       "A1"."ASSIGNMENT_PERIOD_END"   "ASSIGNMENT_PERIOD_END"
FROM   (SELECT "A2"."PROJECT_ID" "PROJECT_ID",
               "A2"."PERSON_ID" "PERSON_ID",
               "A2"."ASSIGNMENT_ID" "ASSIGNMENT_ID",
               "A2"."ASSIGNMENT_PERIOD_START" "ASSIGNMENT_PERIOD_START",
               "A2"."ASSIGNMENT_PERIOD_END" "ASSIGNMENT_PERIOD_END",
               "A2"."PROJECT_ID" "rowlimit_$_0",
               "A2"."PERSON_ID" "rowlimit_$_1",
               row_number() over(ORDER BY "A2"."PROJECT_ID", "A2"."PERSON_ID") "rowlimit_$$_rownumber",
               COUNT(*) over() "rowlimit_$$_total"
        FROM   "DEMO5"."PROJECT_ASSIGNMENTS" "A2") "A1"
WHERE  "A1"."rowlimit_$$_rownumber" <= ceil("A1"."rowlimit_$$_total" * 10 / 100)
ORDER  BY "A1"."rowlimit_$_0",
          "A1"."rowlimit_$_1"

Conclusion

We saw in this part of the Write Less with More series that Row Limiting allows us to write less code in SQL.
The next post will be about another new Oracle 12c feature – SELECT FROM Package-Level Collection Types.

Write Less with More – Part 4 (Temporal Validity)

This post is part 4 of the Write Less with More series of blog posts, focusing on new features of Oracle 12c that allow us developers to write less than in previous versions in order to achieve the same functionality. Each part is dedicated for one new feature that solves some common development task. For more details, including the setup of the examples used throughout the series and the list of development tasks that drive the series, please see part 1.

All the Series Parts

Published

Part 1 – write less configuration with SQL*Loader Express Mode
Part 2 – write less application code with Identity Columns
Part 3 – write less application code with In-Database Archiving
Part 4 – write less application code with Temporal Validity
Part 5 – write less code in SQL with Row Limiting
Part 6 – write less “inappropriately located” code with SELECT FROM Package-Level Collection Types
Part 7 – write less code in SQL with Lateral Inline Views
Part 8 – write less “inappropriately located” code with PL/SQL in the WITH Clause

Task #4

todo4

PROJECT_ASSIGNMENTS is too simplistic. At this point (after it was defined in part 1 and enhanced in Part 2) it only says which people are assigned to which projects, not when or for how long. Now we want to add this missing information.

A Pre-12c Solution

Before 12c we could add a pair of date/time columns to represent the active period boundaries, e.g. PERIOD_START and PERIOD_END:

alter table project_assignments add (
  period_start date,
  period_end   date
);

When dealing with time periods there are three important decisions to make:
One decision is how to represent periods that their start point or end point (or even both) are unlimited. I will use NULL for this.

I recommend reading Stew Ashton’s post about his preference of not using NULL to represent unlimited period boundaries. It doesn’t matter if you agree or not – reading about the reasons is the important part. Actually, I recommend reading everything Stew writes 🙂


Another decision is whether the period boundaries are inclusive or exclusive. I like to include the start point in the period and exclude the end point. In this way, if we want to assign someone to a project for the whole month of February 2015 (for example) we will use quite simple values:

insert into project_assignments (person_id,project_id,period_start,period_end)
values (101,1,date'2015-02-01',date'2015-03-01');

If I’d choose to include both the start point and the end point in the period, I would have to use a more complex value:

insert into project_assignments (person_id,project_id,period_start,period_end)
values (101,1,date'2015-02-01',timestamp'2015-02-28 23:59:59');

And a third decision is which data type to use for the start/end columns – DATE, TIMESTAMP or TIMESTAMP WITH TIME ZONE. This decision is derived from the functional requirements for period boundary granularity (and your data modeling standards, if you have any [I hope you do]).

After we added the columns to the table, we need to add conditions to all the relevant SQL statements in the application (or applications).
For example, if we want to get all the assignments that are active on a specific date:

select * from project_assignments a
where (a.period_start is null or a.period_start <= :my_date)
  and (a.period_end is null or a.period_end > :my_date);

And if we want to get all the assignments that are active sometime within a specific time range:

select * from project_assignments a
where (a.period_start is null or a.period_start < :my_end_date)
  and (a.period_end is null or a.period_end > :my_start_date);

A 12c Solution

Temporal Validity

In Oracle 12c a table can be defined with the PERIOD FOR clause, to associate it with one or more valid time dimensions. Each such dimension consists of a pair of date/time columns. These columns can be created either explicitly or implicitly. As a result, the data can become visible or invisible, based on statement- or session-level definitions.
In our case we can simply add a period called ASSIGNMENT_PERIOD to the table:
 

alter table project_assignments 
       add PERIOD FOR assignment_period;

This statement added two hidden TIMESTAMP WITH TIME ZONE columns – ASSIGNMENT_PERIOD_START and ASSIGNMENT_PERIOD_END – which can (and should) be set explicitly when records of PROJECT_ASSIGNMENTS are inserted and updated.
When we DESCRIBE the table, we see only the columns we explicitly defined:

> desc project_assignments
 Name                                         Null?    Type
 -------------------------------------------- -------- ----------------------
 ASSIGNMENT_ID                                NOT NULL NUMBER(38)
 PERSON_ID                                    NOT NULL NUMBER(38)
 PROJECT_ID                                   NOT NULL NUMBER(38)

But if we select from USER_TAB_COLS we discover the new columns – ASSIGNMENT_PERIOD_START and ASSIGNMENT_PERIOD_END (and one virtual column – ASSIGNMENT_PERIOD – that I will not discuss here):

> select column_name,data_type,nullable,hidden_column,virtual_column,user_generated
    from user_tab_cols
    where table_name = 'PROJECT_ASSIGNMENTS';

COLUMN_NAME                    DATA_TYPE                   NUL HIDDEN_CO VIRTUAL_C USER_GENE
------------------------------ --------------------------- --- --------- --------- ---------
ASSIGNMENT_ID                  NUMBER                      N   NO        NO        YES
PERSON_ID                      NUMBER                      N   NO        NO        YES
PROJECT_ID                     NUMBER                      N   NO        NO        YES
ASSIGNMENT_PERIOD_START        TIMESTAMP(6) WITH TIME ZONE Y   YES       NO        NO
ASSIGNMENT_PERIOD_END          TIMESTAMP(6) WITH TIME ZONE Y   YES       NO        NO
ASSIGNMENT_PERIOD              NUMBER                      Y   YES       YES       NO

Note: there is a major difference between the data dictionary views USER_TAB_COLUMNS and USER_TAB_COLS: while the former exposes only the explicitly defined columns, the latter includes also system-generated columns.

Update [17-Sep-2021] Connor McDonald shows how the period information is stored in the data dictionary – https://connor-mcdonald.com/2021/09/17/is-my-table-temporal/

In addition to the columns, Oracle also implicitly created a check constraint, to make sure the start time is always before the end time:

> select constraint_name,constraint_type,search_condition
  from user_constraints
  where table_name='PROJECT_ASSIGNMENTS'
    and constraint_name like 'ASSIGNMENT_PERIOD%';

CONSTRAINT_NAME                CON SEARCH_CONDITION
------------------------------ --- ------------------------------------------------------
ASSIGNMENT_PERIOD3366E0        C   ASSIGNMENT_PERIOD_START < ASSIGNMENT_PERIOD_END

Note that the condition is “less than” and not “less than or equal to”. The Temporal Validity feature uses the same rule I used in the pre-12c solution – the period includes the start time and excludes the end time.

Now, let’s fill the table with some data. We need to explicitly set values to the system-generated columns:

insert into project_assignments
    (person_id,
     project_id,
     assignment_period_start,
     assignment_period_end)
values
    (101,
     1,
     date'2015-09-16',
     date'2015-09-18');

I added some more records as a preparation for the next examples. This is the content of the table at this point:

> select person_id,
         project_id,
         assignment_period_start,
         assignment_period_end
  from   project_assignments
  order  by assignment_period_start,
            assignment_period_end;

                      PERIOD     PERIOD
 PERSON_ID PROJECT_ID START      END
---------- ---------- ---------- ----------
       101          1 16/09/2015 18/09/2015
       102          1 20/09/2015 22/09/2015
       101          1 22/09/2015 24/09/2015
       101          3 26/09/2015 27/09/2015
       101          2 27/09/2015 30/09/2015
       102          2 27/09/2015 30/09/2015
       101          1 28/09/2015 30/09/2015
       104          2 28/09/2015 30/09/2015
       103          2 28/09/2015 30/09/2015
       102          1 28/09/2015 30/09/2015
       201          2 29/09/2015 30/09/2015
       102          3 29/09/2015
       101          1 04/10/2015 06/10/2015
       102          1 06/10/2015 08/10/2015
       101          1 10/10/2015 12/10/2015
       102          1 14/10/2015 16/10/2015
       102          1 22/10/2015 24/10/2015
       103          1            03/10/2015

18 rows selected.

Nulls represent unlimited start time or end time.

Statement-Level Control

To get, in a specific statement, only records that are active on a specific date, we add the AS OF PERIOD clause.
For example, to get only the project assignments that are active today (when today is September 29th, 2015):

> select person_id,
         project_id,
         assignment_period_start,
         assignment_period_end
  from   project_assignments
  AS OF PERIOD FOR assignment_period SYSDATE;

                      PERIOD     PERIOD
 PERSON_ID PROJECT_ID START      END
---------- ---------- ---------- ----------
       101          1 28/09/2015 30/09/2015
       102          1 28/09/2015 30/09/2015
       103          1            03/10/2015
       101          2 27/09/2015 30/09/2015
       102          2 27/09/2015 30/09/2015
       103          2 28/09/2015 30/09/2015
       104          2 28/09/2015 30/09/2015
       201          2 29/09/2015 30/09/2015
       102          3 29/09/2015

9 rows selected.

Or to get only the project assignments that will be active on October 10th, 2015:

> select person_id,
         project_id,
         assignment_period_start,
         assignment_period_end
  from   project_assignments
  AS OF PERIOD FOR assignment_period date'2015-10-10';

                      PERIOD     PERIOD
 PERSON_ID PROJECT_ID START      END
---------- ---------- ---------- ----------
       101          1 10/10/2015 12/10/2015
       102          3 29/09/2015

If we want to get only records that are active in a specific time range, we use the VERSIONS PERIOD clause. For example:

> select person_id,
         project_id,
         assignment_period_start,
         assignment_period_end
  from   project_assignments
  VERSIONS PERIOD FOR assignment_period
    between date'2015-09-21' and date'2015-09-23';

                      PERIOD     PERIOD
 PERSON_ID PROJECT_ID START      END
---------- ---------- ---------- ----------
       101          1 22/09/2015 24/09/2015
       102          1 20/09/2015 22/09/2015
       103          1            03/10/2015

Session-Level Control

We can also filter non-active records out in the session-level, affecting all the statements without changing them. We do it by calling the procedure DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME. This procedure has 3 options.
The first option it to pass the value ‘CURRENT’ to the LEVEL parameter (the first parameter), which means that all the subsequent statements in the session will consider only records that are active (or valid) at the execution time, for all the tables that were defined with the PERIOD FOR clause.

> exec dbms_flashback_archive.enable_at_valid_time('CURRENT')

PL/SQL procedure successfully completed.

> select person_id,
         project_id,
         assignment_period_start,
         assignment_period_end
  from   project_assignments;

                      PERIOD     PERIOD
 PERSON_ID PROJECT_ID START      END
---------- ---------- ---------- ----------
       101          1 28/09/2015 30/09/2015
       102          1 28/09/2015 30/09/2015
       103          1            03/10/2015
       101          2 27/09/2015 30/09/2015
       102          2 27/09/2015 30/09/2015
       103          2 28/09/2015 30/09/2015
       104          2 28/09/2015 30/09/2015
       201          2 29/09/2015 30/09/2015
       102          3 29/09/2015

9 rows selected.

The second option is to pass the value ‘ASOF’ (read “as of”) to the LEVEL parameter. In this case we need to pass also a concrete timestamp value to the QUERY_TIME parameter (the second parameter).

> exec dbms_flashback_archive.enable_at_valid_time('ASOF',date'2015-10-10')

PL/SQL procedure successfully completed.

> select person_id,
         project_id,
         assignment_period_start,
         assignment_period_end
  from   project_assignments;

                      PERIOD     PERIOD
 PERSON_ID PROJECT_ID START      END
---------- ---------- ---------- ----------
       101          1 10/10/2015 12/10/2015
       102          3 29/09/2015

And the third option is to pass the value ‘ALL’ to the LEVEL parameter, which means no filtering occurs in subsequent statements.

> exec dbms_flashback_archive.enable_at_valid_time('ALL')

PL/SQL procedure successfully completed.

> select person_id,
         project_id,
         assignment_period_start,
         assignment_period_end
  from   project_assignments;

                      PERIOD     PERIOD
 PERSON_ID PROJECT_ID START      END
---------- ---------- ---------- ----------
       101          1 16/09/2015 18/09/2015
       101          1 22/09/2015 24/09/2015
       101          1 28/09/2015 30/09/2015
       101          1 04/10/2015 06/10/2015
       101          1 10/10/2015 12/10/2015
       102          1 20/09/2015 22/09/2015
       102          1 28/09/2015 30/09/2015
       102          1 06/10/2015 08/10/2015
       102          1 14/10/2015 16/10/2015
       102          1 22/10/2015 24/10/2015
       103          1            03/10/2015
       101          2 27/09/2015 30/09/2015
       102          2 27/09/2015 30/09/2015
       103          2 28/09/2015 30/09/2015
       104          2 28/09/2015 30/09/2015
       201          2 29/09/2015 30/09/2015
       101          3 26/09/2015 27/09/2015
       102          3 29/09/2015

18 rows selected.

Warning:
if the value you pass to the LEVEL parameter of dbms_flashback_archive.enable_at_valid_time is other than ‘CURRENT’, ‘ASOF’ or ‘ALL’, it will NOT change anything and the previous setting will still take effect.
The dangerous part is that the procedure does not throw any exception; it completes successfully (doing nothing).
And these 3 values are case-sensitive. Passing ‘current’ or ‘AsOf’, for example, will silently do nothing.

There are two sides to every coin

We’ve just seen that Temporal Validity adds hidden columns to tables and hidden predicates to SQL statements. This is cool, but also dangerous. When things are hidden, it is easy to forget them, but we shouldn’t. See what I’ve already written about that with regards to In-Database Archiving in part 3.

Conclusion

We saw in this part of the Write Less with More series that Temporal Validity allows us to write less application code.

The next post will be about another new Oracle 12c feature – Row Limiting.

When X+0 and X-0 are not the same

In the old days, when the Rule Based Optimizer (RBO) ruled, there was a very common technique to help the optimizer choose one plan over the other (if they had the same rank) by preventing the use of an index.

Look at the following query, where there are unique indexes on T1.ID and T2.ID and non-unique indexes on T1.A and T2.B:

select *
  from T1,T2
 where T1.ID = T2.ID
   and T1.A = :value1
   and T2.B = :value2;

One possible plan is to use nested loops, starting by accessing T1 via the index on A and then accessing T2 via the index on ID.
A second possible plan is to use nested loops, starting by accessing T2 via the index on B and then accessing T1 via the index on ID.

For the RBO these two plans were similar. If we wanted to convince the RBO to choose the second plan, a common trick was to prevent the use of the index on T1.A by adding “+0” to the condition:

select *
  from T1,T2
 where T1.ID = T2.ID
   and T1.A + 0 = :value1
   and T2.B = :value2;

What happens if we try using the same technique today, with the Cost Based Optimizer (CBO)?
Not that I recommend it!
It’s certainly not the way to influence the CBO, but I admit that I recently used it myself (it was in some weird constrained situation, please don’t ask…) and something surprised me. I’ll show you what in this post.

But first…
What is the cardinality of “column = some_value” (i.e., how many rows will return from this condition)? When the CBO doesn’t have histograms, its best guess is the number of rows in the table / the number of distinct values in the column.
And what is the cardinality of “some_function_of_column = some_value”? Without extended statistics or virtual columns, the CBO simply guesses that the cardinality is 1% of the table.
Keep this in mind.

I’ll start by crafting an example:

-- 10000 rows in T1
-- 5000 distinct values in A
-- A=0 in 5000 rows
create table t1 as 
  select rownum id,
         case when rownum<5000 then rownum else 0 end a, 
         lpad('x',100,'x') x
    from dual
 connect by level<=10000;

alter table t1 
  add constraint t1_pk primary key (id);

create index t1_i1 on t1 (a);

-- 1000 rows in T2
-- 21 distinct values in B
-- B=0 in 1 row only
create table t2 as 
  select rownum id,
         case when rownum<1000 then ceil(rownum/50) else 0 end b,
         lpad('x',10,'x') x 
    from dual 
 connect by level<=1000;

alter table t2 
  add constraint t2_pk primary key (id);

create index t2_i1 on t2 (b);

begin
  dbms_stats.gather_table_stats(user,'T1',method_opt =>'for all columns size 1',cascade => true);
  dbms_stats.gather_table_stats(user,'T2',method_opt =>'for all columns size 1',cascade => true);
end;
/

Note that I gathered statistics, but without histograms.

Now let’s run the following query:

> set autotrace on

> select *
    from T1,T2
   where T1.ID = T2.ID
     and T1.A = 0
     and T2.B = 0;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1001848667

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     2 |   252 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |       |     2 |   252 |     3   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |       |     2 |   252 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |     2 |   216 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |     2 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | T2_PK |     1 |       |     0   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID | T2    |     1 |    18 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."A"=0)
   5 - access("T1"."ID"="T2"."ID")
   6 - filter("T2"."B"=0)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         99  consistent gets
          0  physical reads
          0  redo size
        646  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Without histograms the optimizer assumes uniform distribution, so the estimated cardinality of “T1.A=0” is 2 rows (=10000 rows in the table / 5000 distinct values in A), as we can see in the execution plan. And the estimated cardinality of “T2.B=0” is about 48 rows (=1000 rows in the table / 21 distinct values).
But we know that the data is skewed. The actual cardinality of “T1.A=0” is 5000 rows (there are 5000 rows out of 10000 with the value 0 in T1.A), and the actual cardinality of “T2.B=0” is 1 row. So it will be much better in this case to start the plan by accessing the index on T2.B.
The sensible way would be to gather histograms on the columns with the skewed data, to use the cardinality hint, or other “modern” approach. But what will happen if we use the old trick of preventing using the index on T1.A by adding “+0”? Let’s see:

> select *
    from T1,T2
   where T1.ID = T2.ID
     and T1.A + 0 = 0
     and T2.B = 0;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1837274416

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     2 |   252 |    49   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     2 |   252 |    49   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     2 |   252 |    49   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL         | T1    |     2 |   216 |    48   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | T2_PK |     1 |       |     0   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    18 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."A"+0=0)
   4 - access("T1"."ID"="T2"."ID")
   5 - filter("T2"."B"=0)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        170  consistent gets
          0  physical reads
          0  redo size
        646  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Hurray! we succeeded to prevent the use of the index on T1.A. But, wait, the cardinality estimation was not changed – it is still 2. I would expect it to be 100 rows – 1% of 10000 – the usual guess of the CBO in case of “function_of_some_column = some_value”.
And because of this estimation, the plan still starts from T1, but this time with a full table scan (as we neutralized the index use). And this is a worse plan than the first one – note the 170 consistent gets compared to 99 before.

So the CBO is smart enough to realize that A+0 = A.

What about A-0 or A*1 ? Let’s try:

> select *
    from T1,T2
   where T1.ID = T2.ID
     and T1.A - 0 = 0
     and T2.B = 0;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1272512196

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    48 |  6048 |    50   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |       |    48 |  6048 |    50   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |       |    48 |  6048 |    50   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T2    |    48 |   864 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | T2_I1 |    48 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN          | T1_PK |     1 |       |     0   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID | T1    |     1 |   108 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."B"=0)
   5 - access("T1"."ID"="T2"."ID")
   6 - filter("T1"."A"-0=0)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        646  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

We changed “A+0” to “A-0″ and got a totally different execution plan! Unlike with A+0, the CBO does not realize that A-0=A. It treats A-0 just as any other function, and guesses that the cardinality of “A-0=0” is 1% of T1, which is 100 rows.

Remember that the example I used has no real significance. I just tried to mimic an old technique that was relevant for the RBO and is not appropriate for the CBO.

The important part is that the CBO gives a special treatment to COLUMN + 0 when it calculates cardinality. We can see it more explicitly from the following execution plans:

> select * from T1 where A = :val;

Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     2 |   216 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     2 |   216 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |     2 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"=TO_NUMBER(:VAL))
> select * from T1 where A + 0 = :val;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     2 |   216 |    48   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     2 |   216 |    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("A"+0=TO_NUMBER(:VAL))
> select * from T1 where A - 0 = :val;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 10800 |    48   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10800 |    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("A"-0=TO_NUMBER(:VAL))
> select * from T1 where A * 1 = :val;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 10800 |    48   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10800 |    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("A"*1=TO_NUMBER(:VAL))

One last check: what happens if instead of 0 we’ll use a bind variable, and we’ll bind 0 in the first hard parse (with bind peeking enabled). Let’s see:

> var zero number
> exec :zero := 0
> select * from T1 where A + :zero = 1;
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    48 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10800 |    48   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("A"+:ZERO=1)

So the special treatment happens only with 0 as literal, not as bind variable.

Do you know of other cases like this one?

 

Note: I executed all the examples in 11.2.0.4 and in 12.1.0.2.

Split

Here is a small pipelined table function that gets one string that includes a delimited list of values, and returns these values as a table:

create or replace function split (i_str in varchar2, i_delimiter in varchar2 default ',') 
  return sys.odcivarchar2list
  pipelined
  as
    l_current_string varchar2(4000) := i_str;
    l_pos            binary_integer;
  begin
    if i_str is null then
        return;
    end if;
    loop
      l_pos := nullif(instr(l_current_string, i_delimiter), 0);
      pipe row(substr(l_current_string, 1, nvl(l_pos - 1, length(l_current_string))));
      exit when l_pos is null;
      l_current_string := substr(l_current_string, l_pos + length(i_delimiter));
  end loop;
end split;
/

Once the string is converted to a table, it is easy to manipulate it just like any “regular” table. For example:

select initcap(column_value) as name,
       length(column_value) as name_length
from   table(split('paris,london,rome,madrid'))
order  by column_value;

NAME                           NAME_LENGTH
------------------------------ -----------
London                                   6
Madrid                                   6
Paris                                    5
Rome                                     4

Note that the default delimiter is a comma, but other characters (or sub-strings) may be specified:

select * 
from table(split('Paris, France@@London, UK@@Rome, Italy@@Madrid, Spain',
                '@@'));

COLUMN_VALUE
--------------------
Paris, France
London, UK
Rome, Italy
Madrid, Spain

Write Less with More – Part 3 (In-Database Archiving)

This post is part 3 of the Write Less with More series of blog posts, focusing on new features of Oracle 12c that allow us developers to write less than in previous versions in order to achieve the same functionality. Each part is dedicated for one new feature that solves some common development task. For more details, including the setup of the examples used throughout the series and the list of development tasks that drive the series, please see part 1.

All the Series Parts

Published

Part 1 – write less configuration with SQL*Loader Express Mode
Part 2 – write less application code with Identity Columns
Part 3 – write less application code with In-Database Archiving
Part 4 – write less application code with Temporal Validity
Part 5 – write less code in SQL with Row Limiting
Part 6 – write less “inappropriately located” code with SELECT FROM Package-Level Collection Types
Part 7 – write less code in SQL with Lateral Inline Views
Part 8 – write less “inappropriately located” code with PL/SQL in the WITH Clause

Task #3

todo3

We don’t want to “really” delete obsolete PROJECTS records, as we may have PROJECT_ASSIGNMENTS records that reference them (via a foreign key constraint). We just want to “hide” them, or, as it is often called, “logically delete” them.

A Pre-12c Solution

We can add a column to the PROJECTS table – IS_DELETED – which contains either 0 (representing an active project) or 1 (an obsolete project):

ALTER TABLE projects ADD 
  is_deleted NUMBER(1) DEFAULT 0 NOT NULL 
  CHECK (is_deleted IN (0,1)); 

Now, since usually we want to hide the obsolete projects, we can rename the table to, say, ALL_PROJECTS, and create a view that exposes only the active records:

RENAME projects TO all_projects;
CREATE VIEW projects AS
  SELECT * 
  FROM all_projects 
  WHERE is_deleted=0;

By naming the view PROJECTS, just as the table’s original name, we make all the existing references to PROJECTS throughout the application (or applications) see only active projects. Now we only need to handle the rare cases where obsolete projects should be seen, by using ALL_PROJECTS in the relevant SQL statements.

A 12c Solution

In-Database Archiving

In Oracle 12c tables can be defined as ROW ARCHIVAL. As a result, a hidden column – ORA_ARCHIVE_STATE – is implicitly added to the table, holding an archiving (“logical deletion”) state. The default value is ‘0’, representing the “Non-Archived” state, and any other value means “Archived”. Based on a session-level setting, “Archived” rows are either visible or not.

If you want to learn about many more new features of Oracle 12c, you may be interested in one of my in-house training sessions, particularly the Oracle 12c New Features for Developers full-day seminar.

In our case we can recreate the table like this:

drop table projects cascade constraints;
create table projects (
       project_id integer not null constraint projects_pk primary key,
       project_name varchar2(100) not null,
       status number(1) not null,
       last_days_to_show_in_reports integer not null
)
ROW ARCHIVAL;

When we DESCRIBE the table, we see only the columns we explicitly defined:

> desc projects
 Name                                         Null?    Type
 -------------------------------------------- -------- ------------------------------
 PROJECT_ID                                   NOT NULL NUMBER(38)
 PROJECT_NAME                                 NOT NULL VARCHAR2(100)
 STATUS                                       NOT NULL NUMBER(1)
 LAST_DAYS_TO_SHOW_IN_REPORTS                 NOT NULL NUMBER(38)

But if we select from USER_TAB_COLS we discover one more column – ORA_ARCHIVE_STATE:

> select column_name, data_type, column_id, 
         hidden_column, char_length, data_default
  from user_tab_cols 
  where table_name = 'PROJECTS';

COLUMN_NAME                  DATA_TYPE   COLUMN_ID HIDDEN_CO CHAR_LENGTH DATA_DEFAULT
---------------------------- ---------- ---------- --------- ----------- ------------
PROJECT_ID                   NUMBER              1 NO                  0
PROJECT_NAME                 VARCHAR2            2 NO                100
STATUS                       NUMBER              3 NO                  0
LAST_DAYS_TO_SHOW_IN_REPORTS NUMBER              4 NO                  0
ORA_ARCHIVE_STATE            VARCHAR2              YES              4000 0

Note: there is a major difference between the data dictionary views USER_TAB_COLUMNS and USER_TAB_COLS: while the former exposes only the explicitly defined columns, the latter includes also system-generated columns.

Let’s fill the table with some records, without specifying values to the hidden ORA_ARCHIVE_STATE column:

insert into projects values (1,'Project A',1,2);
insert into projects values (2,'Project B',2,3);
insert into projects values (3,'Project C',1,4);
insert into projects values (4,'Project D',2,3);
commit;

Hidden columns are quite shy. By default, they are not shown:

> select * from projects;

                                      LAST_DAYS_TO
PROJECT_ID PROJECT_NAME     STATUS SHOW_IN_REPORTS
---------- ------------ ---------- ---------------
         1 Project A             1               2
         2 Project B             2               3
         3 Project C             1               4
         4 Project D             2               3

But if we explicitly select ORA_ARCHIVE_STATE we can see its value, which is, at this point, the default – ‘0’:

> select p.*,ORA_ARCHIVE_STATE from projects p;

                                      LAST_DAYS_TO
PROJECT_ID PROJECT_NAME     STATUS SHOW_IN_REPORTS ORA_ARCHIVE_STATE
---------- ------------ ---------- --------------- -----------------
         1 Project A             1               2 0
         2 Project B             2               3 0
         3 Project C             1               4 0
         4 Project D             2               3 0

Now, let’s “delete” two records. Not a “real” deletion, only a logical one. We do it by updating ORA_ARCHIVE_STATE to ‘1’ (or any other non-zero value):

> update projects set ORA_ARCHIVE_STATE='1' where project_id in (1,3);

2 rows updated.

And now, if we select again from the table, with no (explicit) condition, we’ll see only the “remaining” two records. Remember that we didn’t actually DELETE any records, and still:

> select * from projects; 

                                      LAST_DAYS_TO
PROJECT_ID PROJECT_NAME     STATUS SHOW_IN_REPORTS
---------- ------------ ---------- ---------------
         2 Project B             2               3
         4 Project D             2               3 

How did it happen? Let’s check the execution plan:

> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID  dcthaywgmzra7, child number 1
-------------------------------------
select * from projects

Plan hash value: 2188942312

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |     3 (100)|          |
|*  1 |  TABLE ACCESS FULL| PROJECTS |     4 |  8372 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PROJECTS"."ORA_ARCHIVE_STATE"='0')

And here it is. We can see that Oracle added a hidden condition: ORA_ARCHIVE_STATE=’0′.

ROW ARCHIVAL VISIBILITY

We saw the default behavior. If we want to see all the records, including the archived ones, we should change the ROW ARCHIVAL VISIBILITY setting of the session to ALL:

alter session set ROW ARCHIVAL VISIBILITY = ALL;

If we repeat the query now, we’ll see all the records of the table. Oracle will not add the behind-the-scenes condition on ORA_ARCHIVE_STATE this time:

> select p.*,
         case ora_archive_state when '0' then 'Active' else 'Deleted' end is_active
  from projects p;

                                      LAST_DAYS_TO
PROJECT_ID PROJECT_NAME     STATUS SHOW_IN_REPORTS IS_ACTIVE
---------- ------------ ---------- --------------- ---------------------
         1 Project A             1               2 Deleted
         2 Project B             2               3 Active
         3 Project C             1               4 Deleted
         4 Project D             2               3 Active

> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  bwkf66qbwkaqt, child number 0
-------------------------------------
select p.*,        case ora_archive_state when '0' then 'Active' else
'Deleted' end is_active from projects p

Plan hash value: 2188942312

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS FULL| PROJECTS |     4 |  8372 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

To get back to the default behavior we should change the ROW ARCHIVAL VISIBILITY setting of the session to its default value – ACTIVE:

alter session set ROW ARCHIVAL VISIBILITY = ACTIVE;

> select * from projects; 


                                      LAST_DAYS_TO
PROJECT_ID PROJECT_NAME     STATUS SHOW_IN_REPORTS
---------- ------------ ---------- ---------------
         2 Project B             2               3
         4 Project D             2               3 

The Bad News

It seems that In-Database Archiving is not supported in PL/SQL at the moment (12.1.0.2).

> var rc refcursor
> begin
    open :rc for select p.*,ora_archive_state from projects p;
  end;
  /

PL/SQL procedure successfully completed.

> print rc

                                      LAST_DAYS_TO
PROJECT_ID PROJECT_NAME     STATUS SHOW_IN_REPORTS ORA_ARCHIVE_STATE
---------- ------------ ---------- --------------- -----------------
         1 Project A             1               2 1
         2 Project B             2               3 0
         3 Project C             1               4 1
         4 Project D             2               3 0

Update (20-Jun-2017): this bug has been fixed in the one-off patch 23080557, in patchset 12.1.0.2.170418 for Windows, and is marked as fixed in the future (as of now) version 12.2.0.218.1

There are two sides to every coin

We’ve just seen that In-Database Archiving adds hidden columns to tables and hidden predicates to SQL statements. This is cool, but also dangerous. When things are hidden, it is easy to forget them, but we shouldn’t.
For example, suppose that:

  • there are many records in the PROJECTS table
  • most of them are obsolete (i.e., their ORA_ARCHIVE_STATE != ‘0’)
  • ROW ARCHIVAL VISIBILITY = ACTIVE (the default setting)

Since Oracle adds the condition ORA_ARCHIVE_STATE = ‘0’ to every query, and this filters most of the records out, we may want to create an index on ORA_ARCHIVE_STATE, or add ORA_ARCHIVE_STATE as an additional column to other indexes.

Why VARCHAR2(4000)?

As we saw, the column ORA_ARCHIVE_STATE is created as VARCHAR2(4000), which seems a bit extreme for storing Boolean values – representing that the row is either Active or Deleted. Since the condition that Oracle adds to filter “deleted” rows out is ORA_ARCHIVE_STATE = ‘0’, we can use any string other than ‘0’ to represent deleted rows. One benefit it gives is that we can specify some information regarding the deletion of each record (like the “deletion reason”). Still, I would prefer a NUMBER(1) column rather than VARCHAR2(4000).
This can also limit our ability to add ORA_ARCHIVE_STATE to indexes, as the maximum key length of indexes is limited. For example, suppose that the PROJECT_NAME column is defined as VARCHAR2(4000), and that we want to create a composite index on PROJECT_NAME and ORA_ARCHIVE_STATE. Using a standard block size of 8KB, the index creation will fail:

> alter table projects modify (project_name varchar2(4000));

Table altered.

> create index projects_i1 on projects (project_name,ora_archive_state);
create index projects_i1 on projects (project_name,ora_archive_state)
                            *
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded

Conclusion

We saw in this part of the Write Less with More series that In-Database Archiving allows us to write less application code.

The next post will be about another new Oracle 12c feature – Temporal Validity.

Write Less with More – Part 2 (Identity Columns)

This post is part 2 of the Write Less with More series of blog posts, focusing on new features of Oracle 12c that allow us developers to write less than in previous versions in order to achieve the same functionality. Each part is dedicated for one new feature that solves some common development task. For more details, including the setup of the examples used throughout the series and the list of development tasks that drive the series, please see part 1.

All the Series Parts

Published

Part 1 – write less configuration with SQL*Loader Express Mode
Part 2 – write less application code with Identity Columns
Part 3 – write less application code with In-Database Archiving
Part 4 – write less application code with Temporal Validity
Part 5 – write less code in SQL with Row Limiting
Part 6 – write less “inappropriately located” code with SELECT FROM Package-Level Collection Types
Part 7 – write less code in SQL with Lateral Inline Views
Part 8 – write less “inappropriately located” code with PL/SQL in the WITH Clause

Task #2

todo2

The PROJECT_ASSIGNMENTS table has a surrogate* primary key – each record should have a unique number in the ASSIGNMENT_ID column. This number has no business meaning; the only requirement for such a key is that the number will be unique – that there will be no two records in the table with the same number.

* In a recent post about defining primary keys using Data Modeler, HeliFromFinland is nicely walking through the decision making process of choosing between natural and surrogate keys


 

A Pre-12c Solution

This is a very common scenario, and almost always involves a sequence in Oracle. This makes sense, as a sequence is an object that generates unique numbers.

Before version 12c of Oracle there is no built-in relationship between tables and sequences, although logically there is almost always a one-to-one relationship between a table and a sequence.

Note: with the lack of a built-in relationship between tables and sequences, it is a good practice to use naming convention to represent this relationship; for example, naming the sequence that is responsible for assigning numbers to the ABC table ABC_SEQ (this specific convention implies yet another rule – that the length of table names should not exceed 26 characters; otherwise, the related sequence name will exceed the maximum length allowed for Oracle objects – 30 characters). Obviously, if two or more columns in one table are populated from different sequences, a more complex naming convention will be used.

So, the first part of the solution for our task is creating a sequence:

CREATE SEQUENCE project_assignments_seq;

The second part is to use the sequence whenever a new record of PROJECT_ASSIGNMENTS is inserted. If we know for sure about all the places that insert records to the table (preferably a single place) we can use the sequence explicitly in those places. However, if we cannot guarantee this, we should make sure the sequence will be used (as stated specifically in the task requirement) implicitly. And the way to achieve this in a pre-12c solution is using a BEFORE INSERT trigger, like this:

CREATE TRIGGER project_assignments_bir_tr
  BEFORE INSERT ON project_assignments
  FOR EACH ROW
BEGIN
  :new.assignment_id := project_assignments_seq.nextval; -- assuming 11g; in earlier versions use SELECT...
END;
/

Note that with this solution the sequence is used for each and every new record. Even if a value is specified explicitly in the INSERT (or MERGE) statement, this value will be ignored and overridden by the trigger, using the next value generated by the sequence.

If the requirement would be weaker – allowing specifying values explicitly and using the sequence only when values are not specified explicitly – then the solution would be:

CREATE TRIGGER project_assignments_bir_tr
  BEFORE INSERT ON project_assignments
  FOR EACH ROW
  WHEN (new.assignment_id IS NULL)
BEGIN
  :new.assignment_id := project_assignments_seq.nextval;
END;
/

Note that this option may cause problems. If you use it, make sure to allocate different (non-overlapping) ranges of numbers to the sequence and to the explicitly specified values. Otherwise, conflicts may happen.

A 12c Solution

Identity Columns

In Oracle 12c a table column can be created as “identity”. As a result, the column implicitly becomes mandatory, and a sequence is automatically created and associated with the table. Then (depending on how exactly the identity is defined) the sequence is automatically used to produce values for the identity column when new rows are inserted.

If you want to learn about many more new features of Oracle 12c, you may be interested in one of my in-house training sessions, particularly the Oracle 12c New Features for Developers full-day seminar.

The GENERATED ALWAYS Option

An identity column can behave in one of three different ways. In the first one – when the column is defined with the GENERATED ALWAYS AS IDENTITY option (or simply GENERATED AS IDENTITY) – the sequence is always used, and specifying explicit values is forbidden. Let’s use this option to implement our task:

drop table project_assignments purge;
create table project_assignments (
       assignment_id integer GENERATED AS IDENTITY constraint project_assignments_pk primary key,
       person_id integer not null constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);

We’ll insert now two rows into the PROJECT_ASSIGNMENTS table, without specifying the ASSIGNMENT_ID column, and we’ll see that the rows get a unique ASSIGNMENT_ID value:

> insert into project_assignments (person_id,project_id) values (101,1);
1 row created.
> insert into project_assignments (person_id,project_id) values (102,2);
1 row created.
> select * from project_assignments ;

ASSIGNMENT_ID PERSON_ID  PROJECT_ID
------------- ---------- ----------
            1        101          1
            2        102          2

If we try to specify an ASSIGNMENT_ID value explicitly, we’ll get the following exception:

> insert into project_assignments (assignment_id,person_id,project_id) values (3,103,3);

ERROR at line 1:
ORA-32795: cannot insert into a generated always identity column

The GENERATED BY DEFAULT Option

If we do want the ability to specify the ASSIGNMENT_ID explicitly, and let Oracle assign a value automatically only when we do not specify it, we can use instead the GENERATED BY DEFAULT AS IDENTITY option. Let’s change the ASSIGNMENT_ID column to behave in this way:

> alter table project_assignments modify (
>   assignment_id generated BY DEFAULT as identity
> );
Table altered.

Retrying the last INSERT statement will succeed now:

> insert into project_assignments (assignment_id,person_id,project_id) values (3,103,3);
1 row created.

As I noted before, in the “A Pre-12c Solution” section, mixing manual and automatic assignments of values to the same column may cause conflicts, and unless there is a good reason to do it, I would refrain from it. As mentioned above, the implementation of identity columns uses sequences. In our example, this behind-the-scenes sequence has generated already the numbers 1 and 2, so the next time it will be used it will generate the number 3. However, in the third row that we inserted ASSIGNMENT_ID has the value 3, because we specified it explicitly. ASSIGNMENT_ID is the primary key of the table, so it cannot have the same value in two records. I feel an imminent conflict…

And indeed, trying to insert a record without specifying ASSIGNMENT_ID will fail on a unique constraint violation:

> insert into project_assignments (person_id,project_id) values (104,4);

ERROR at line 1:
ORA-00001: unique constraint (DEMO5.PROJECT_ASSIGNMENTS_PK) violated

Another attempt will succeed, as the sequence has already been incremented:

> insert into project_assignments (person_id,project_id) values (104,4);
1 row created.

 

The GENERATED BY DEFAULT ON NULL Option

Now, what will happen if we try to set ASSIGNMENT_ID to NULL during INSERT? An identity column is always mandatory – the column is defined as NOT NULL as part of the identity definition (just like it happens when we define a column as PRIMARY KEY) – and since we defined ASSIGNMENT_ID as GENERATED BY DEFAULT AS IDENTITY, the following statement will simply try to insert NULL to ASSIGNMENT_ID, and will fail:

> insert into project_assignments (assignment_id,person_id,project_id) values (null,201,1);

ERROR at line 1:
ORA-01400: cannot insert NULL into ("DEMO5"."PROJECT_ASSIGNMENTS"."ASSIGNMENT_ID")

We can use the third form of the identity clause – GENERATED BY DEFAULT ON NULL AS IDENTITY – which means that Oracle will generate values from the sequence whenever the identity column is not specified explicitly or it is specified explicitly with NULL. Let’s change the ASSIGNMENT_ID to behave in this way:

> alter table project_assignments modify (
>   assignment_id generated BY DEFAULT ON NULL as identity
> );

Table altered.

Retrying the last INSERT statement will succeed now:

> insert into project_assignments (assignment_id,person_id,project_id) values (null,201,1);
1 row created.

> select * from project_assignments;

ASSIGNMENT_ID PERSON_ID  PROJECT_ID
------------- ---------- ----------
            1        101          1
            2        102          2
            3        103          3
            4        104          4
            5        201          1

5 rows selected.

Note: DEFAULT ON NULL is actually an independent feature (new in 12c) – it can be used in the definition of any column, not only identity columns.

Note: It is also possible (as of 12c) to use a user-defined sequence as the default value of a numeric non-identity column. Identity columns, however, give a more complete solution.

Data Dictionary

Let’s see now what has changed in the data dictionary in order to represent identity columns. I’ll refer to the USER_* views, but the same changes are relevant to the corresponding ALL_* and DBA_* views as well.

USER_TABLES has a new column – HAS_IDENTITY – that contains YES if the table has an identity column and NO if not.

select table_name,HAS_IDENTITY from user_tables;

TABLE_NAME                     HAS_IDENTITY
------------------------------ ------------
PEOPLE                         NO
PROJECTS                       NO
PROJECT_ASSIGNMENTS            YES

The fact that USER_TABLES has such a Boolean attribute may hint that a table cannot have more than one identity column, and this is indeed one of the restrictions of this feature (which seems a very sensible restriction, in my opinion).

USER_TAB_COLUMNS has two new relevant columns: IDENTITY_COLUMN and DEFAULT_ON_NULL:

select column_name,data_type,nullable,
       column_id,IDENTITY_COLUMN,DEFAULT_ON_NULL 
from user_tab_columns
where table_name = 'PROJECT_ASSIGNMENTS'
order by column_id;

COLUMN_NAME   DATA_TYPE  NUL  COLUMN_ID IDENTITY_ DEFAULT_O
------------- ---------- --- ---------- --------- ---------
ASSIGNMENT_ID NUMBER     N            1 YES       YES
PERSON_ID     NUMBER     N            2 NO        NO
PROJECT_ID    NUMBER     N            3 NO        NO

A new view – USER_TAB_IDENTITY_COLS – contains the details of all the identity columns. In 12.1.0.1 this view contained TABLE_NAME, COLUMN_NAME, GENERATION_TYPE (which can be either ALWAYS or BY DEFAULT) and IDENTITY_OPTIONS (the configuration of the associated sequence). In 12.1.0.2 another column was added – SEQUENCE_NAME – which makes the relationship between the table column and the associated sequence quite formal.

select * From USER_TAB_IDENTITY_COLS;

                                  GENERATION
TABLE_NAME          COLUMN_NAME   TYPE       SEQUENCE_NAME IDENTITY_OPTIONS
------------------- ------------- ---------- ------------- -------------------------
PROJECT_ASSIGNMENTS ASSIGNMENT_ID BY DEFAULT ISEQ$$_111567 START WITH: 1, INCREMENT
                                                           BY: 1, MAX_VALUE: 9999999
                                                           999999999999999999999, MI
                                                           N_VALUE: 1, CYCLE_FLAG: N
                                                           , CACHE_SIZE: 20, ORDER_F
                                                           LAG: N

Let’s take a look at the objects we have in our schema at this point:

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
SEQUENCE     ISEQ$$_111567                      111568
TABLE        PEOPLE                             111530
INDEX        PEOPLE_PK                          111531
TABLE        PROJECTS                           111532
INDEX        PROJECTS_PK                        111533
TABLE        PROJECT_ASSIGNMENTS                111567
INDEX        PROJECT_ASSIGNMENTS_PK             111569

We can see the sequence that was created automatically by Oracle for supporting the ASSIGNMENT_ID identity column of the PROJECT_ASSIGNMENTS table. Note that the sequence_name contains the object_id of the associated table.

What happens when we drop the table?

drop table project_assignments;

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
INDEX        PROJECTS_PK                        111533
TABLE        PROJECTS                           111532
INDEX        PEOPLE_PK                          111531
TABLE        PEOPLE                             111530
SEQUENCE     ISEQ$$_111567                      111568

We can see that the table was dropped but the sequence was not. This may seem disappointing at first, but actually this is quite clever. The table was dropped but not purged, so it is still in the recycle bin (I’m using the default setting of the RECYCLEBIN parameter, which is “on”):

select object_name,original_name,operation,type from user_recyclebin;

OBJECT_NAME                    ORIGINAL_NAME          OPERATION TYPE
------------------------------ ---------------------- --------- ----------
BIN$lNxGd9cXTAuYpOa9kda26w==$0 PROJECT_ASSIGNMENTS    DROP      TABLE
BIN$qvScIgi+Rrel7veFMBxXsQ==$0 PROJECT_ASSIGNMENTS_PK DROP      INDEX

If we revive the table using FLASHBACK TABLE

flashback table project_assignments to before drop;

we can resume working with the table, and particularly with the identity column, because the sequence was not dropped:

insert into project_assignments (person_id,project_id) values (201,1);

select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
            1        101          1
            2        102          2
            3        103          3
            4        104          4
            5        201          1
            6        201          1

If we drop the table and purge it (or purge the table from the recyclebin), then the associated sequence is dropped as expected:

drop table project_assignments purge;

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
TABLE        PEOPLE                             111530
INDEX        PEOPLE_PK                          111531
TABLE        PROJECTS                           111532
INDEX        PROJECTS_PK                        111533

select object_name,original_name,operation,type from user_recyclebin;

no rows selected

Sequence Configuration

When we create a sequence explicitly we can configure its attributes – from which number to start, the interval between numbers, the cache size, and more. When we create an identity column, we can similarly configure the implicitly-created sequence. For example, let’s recreate the PROJECT_ASSIGNMENTS table, but this time with a sequence that starts with 1000 and increments by 10:

create table project_assignments (
       assignment_id integer GENERATED AS IDENTITY 
                             (start with 1000 increment by 10)
                             constraint project_assignments_pk primary key,
       person_id integer not null  constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);

insert into project_assignments (person_id,project_id) values (101,1);
insert into project_assignments (person_id,project_id) values (102,2);
insert into project_assignments (person_id,project_id) values (103,3);

select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
         1000        101          1
         1010        102          2
         1020        103          3

DROP IDENTITY

A non-identity column cannot become an identity column. The opposite, however, is possible: an identity column can become a non-identity column, and its associated sequence is automatically dropped:

alter table project_assignments modify assignment_id DROP IDENTITY;
select * from user_tab_identity_cols;

no rows selected

select object_type,object_name,object_id from user_objects;

OBJECT_TYPE  OBJECT_NAME                     OBJECT_ID
------------ ------------------------------ ----------
TABLE        PEOPLE                             111530
INDEX        PEOPLE_PK                          111531
TABLE        PROJECTS                           111532
INDEX        PROJECTS_PK                        111533
TABLE        PROJECT_ASSIGNMENTS                111570
INDEX        PROJECT_ASSIGNMENTS_PK             111572

START WITH LIMIT VALUE

Let’s see one last option that is possible only because there is a real association between the table column and the sequence. I’ll drop the PROJECT_ASSIGNMENTS table and recreate it once more, with the GENERATED BY DEFAULT AS IDENTITY clause, and insert some records with explicit values in ASSIGNMENT_ID:

drop table project_assignments purge;
create table project_assignments (
       assignment_id integer GENERATED BY DEFAULT AS IDENTITY constraint project_assignments_pk primary key,
       person_id integer not null constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);
insert into project_assignments (assignment_id,person_id,project_id) values (18,101,1);
insert into project_assignments (assignment_id,person_id,project_id) values (22,102,2);
insert into project_assignments (assignment_id,person_id,project_id) values (34,103,3);

select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
           18        101          1
           22        102          2
           34        103          3

Suppose that now I want to convert the identity column to be GENERATED ALWAYS AS IDENTITY. Since the table already contains records, I need to prevent collisions between the existing values and the future to-be-automatically-generated values, and therefore I’d like to start the sequence from a number that is higher than all the existing ASSIGNMENT_ID values (in our case, 35).

We can make Oracle doing it automatically, by using the START WITH LIMIT VALUE clause:

alter table project_assignments modify assignment_id 
generated always as identity (START WITH LIMIT VALUE);

insert into project_assignments (person_id,project_id) values (104,4);
select * from project_assignments;

ASSIGNMENT_ID  PERSON_ID PROJECT_ID
------------- ---------- ----------
           18        101          1
           22        102          2
           34        103          3
           35        104          4

Conclusion

We saw in this part of the Write Less with More series that Identity Columns allow us to write less application code.

The next post will be about another new Oracle 12c feature – In-Database Archiving.

Write Less with More – Part 1 (SQL*Loader Express Mode)

Introduction

For some time now I have been presenting a lecture that I call in short “Write Less with More”, or in its full name “Write Less (code) with More (Oracle 12c New Features)”. The idea is to focus on some of the evolutionary – rather than revolutionary – features that were added in version 12c of Oracle. Features that  allow us developers to write less than in previous versions in order to achieve the same functionality.

Instead of starting by presenting the features and then giving some arbitrary examples, I prefer to start with a list of common development tasks, and to suggest for each task a pre-12c solution and a solution that uses a new 12c feature making it possible to write less. I like showing problems and how we can solve them, rather than showing features that look for problems to solve.

I’m starting today a series of blog posts that is based on the “Write Less with More” presentation, and follows its structure. In each post I’ll discuss one new Oracle 12c feature that solves some development task. Each such feature falls under one of the following categories:

  • Write Less Configuration
  • Write Less Application Code
  • Write Less Code in SQL Statements
  • Write Less “Inappropriately-Located” Code

All the Series Parts

Published

Part 1 – write less configuration with SQL*Loader Express Mode
Part 2 – write less application code with Identity Columns
Part 3 – write less application code with In-Database Archiving
Part 4 – write less application code with Temporal Validity
Part 5 – write less code in SQL with Row Limiting
Part 6 – write less “inappropriately located” code with SELECT FROM Package-Level Collection Types
Part 7 – write less code in SQL with Lateral Inline Views
Part 8 – write less “inappropriately located” code with PL/SQL in the WITH Clause

Setup

All the examples in the series are based on this data model:
ERD
We have three tables: PEOPLE, PROJECTS and a join table between them – PROJECT_ASSIGNMENTS. At this point PROJECT_ASSIGNMENTS is very simple – it only contains information about which people are assigned to which projects, not when or for how long. It may also contain duplicate rows for the same person+project combination (with different assignment_id though), which seems quite pointless. We will enhance this naïve design as we progress with the series.

Here is the script that creates the tables and fills a few projects:

create table people (
       person_id integer not null constraint people_pk primary key,
       first_name varchar2(20) not null,
       last_name varchar2(30) not null,
       general_info varchar2(100)
);

create table projects (
       project_id integer not null constraint projects_pk primary key,
       project_name varchar2(100) not null,
       status number(1) not null,
       last_days_to_show_in_reports integer not null
);

create table project_assignments (
       assignment_id integer not null constraint project_assignments_pk primary key,
       person_id integer not null constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);

insert into projects values (1,'Project A',1,2);
insert into projects values (2,'Project B',2,3);
insert into projects values (3,'Project C',1,4);
insert into projects values (4,'Project D',2,3);

commit;

And this is our “To Do” list:
todo_full

So now we are ready to start with the first task.

Task #1todo1

The PEOPLE table is still empty. We have a CSV file – C:\Data\people.dat – that contains data about people and we want to load this data from the file into the table. The file contains one line for each person, the fields are separated by commas, and the order of the fields in the file is the same as the order of the columns in the table. Here is the content of the file:

101,John,Lennon,
102,Paul,McCartney,18/6/1942
103,Ringo,Starr,Drums
104,George,Harisson,
201,Louis,Armstrong,Jazz
202,Ella,Fitzgerald,15/6/1996
203,Etta,James,20/1/2012
317,Julie,Andrews,

You can see that each line contains details about one person, corresponding to the PEOPLE table’s structure: ID, first name, last name, and some (un-modeled, ugh!) general details.

A Pre-12c Solution

Oracle supports loading data from external files for decades now. There is the good old SQL*Loader utility, and since Oracle9i there is also the ability to use External Tables. Both are very powerful, but require quite a complex configuration even for very simple files. For SQL*Loader we need to write a control file with all the configuration. For External Tables we need something similar to the control file – this time as part of the CREATE TABLE statement – in addition to a DIRECTORY object and an INSERT statement that copies the data from the external table to the “real” table.

A 12c Solution

SQL*Loader Express Mode

In Oracle 12c SQL*Loader can be executed in a new mode – the Express Mode. In this mode we do not use a control file, and many defaults are used implicitly. We can override many of these defaults by specifying explicit parameters in the command-line level.

If you want to learn about many more new features of Oracle 12c, you may be interested in one of my in-house training sessions, particularly the Oracle 12c New Features for Developers full-day seminar.

Let’s Load

The only parameter that is mandatory in this mode is the TABLE parameter. In our case, all we need to do is execute SQL*Loader from C:\Data (where the text file is located) and specify TABLE=people.

First, let’s see that the table is really empty.

C:\Data>sqlplus demo5/demo5
> select * from people;

no rows selected

> exit

Now, let’s execute SQL*Loader:

C:\Data>sqlldr demo5/demo5 table=people

SQL*Loader: Release 12.1.0.2.0 - Production on Sat Aug 15 15:58:48 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Express Mode Load, Table: PEOPLE
Path used:      External Table, DEGREE_OF_PARALLELISM=AUTO

Table PEOPLE:
  8 Rows successfully loaded.

Check the log files:
  people.log
  people_%p.log_xt
for more information about the load.

C:\Data>

And we can see now that the table contains the data from the file:

C:\Data>sqlplus demo5/demo5
> select * from people;

PERSON_ID  FIRST_NAME LAST_NAME       GENERAL_INFO
---------- ---------- --------------- --------------------
       101 John       Lennon
       102 Paul       McCartney       18/6/1942
       103 Ringo      Starr           Drums
       104 George     Harisson
       201 Louis      Armstrong       Jazz
       202 Ella       Fitzgerald      15/6/1996
       203 Etta       James           20/1/2012
       317 Julie      Andrews
 
8 rows selected.

> exit

C:\Data>

We specified the table name only. All the rest of the configuration was derived implicitly:

  • The default file name is <tableName>.dat (in our case, people.dat)
  • The default file location is the directory from which we execute sqlldr (in our case, C:\Data)
  • The default record delimiter is newline
  • The default field delimiter is comma
  • By default, the fields have no enclosures
  • Etc.

Let’s take a look at the output we got on the screen.

First of all, it says “Express Mode Load”. Ok, so it really worked in express mode.

Multiple Loading Mechanisms

A more interesting thing to note is “Path used:      External Table”. SQL*Loader actually loaded the data using its “descendant”, more advanced, feature – External Tables. By default, SQL*Loader Express Mode performs the load using External Tables if possible, and, if it cannot use External Tables, it falls back to a direct path load using its “legacy” way.

Beyond Loading

The last lines of the on-screen output  lead us to more output, which may be very valuable:

Check the log files:
  people.log
  people_%p.log_xt
for more information about the load.

SQL*Loader Express Mode not only loads the data, but also prepares for us some items we can re-use later. The log file – people.log – includes, in addition to the standard logging stuff:

A SQL*Loader control file

Generated control file for possible reuse:
OPTIONS(EXTERNAL_TABLE=EXECUTE, TRIM=LRTRIM)
LOAD DATA
INFILE 'people'
APPEND
INTO TABLE PEOPLE
FIELDS TERMINATED BY ","
(
  PERSON_ID,
  FIRST_NAME,
  LAST_NAME,
  GENERAL_INFO
)
End of generated control file for possible reuse.

A statement for creating the external table

CREATE TABLE "SYS_SQLLDR_X_EXT_PEOPLE" 
(
  "PERSON_ID" NUMBER(38),
  "FIRST_NAME" VARCHAR2(20),
  "LAST_NAME" VARCHAR2(30),
  "GENERAL_INFO" VARCHAR2(100)
)
ORGANIZATION external 
(
  TYPE oracle_loader
  DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00010
  ACCESS PARAMETERS 
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET UTF8
    BADFILE 'SYS_SQLLDR_XT_TMPDIR_00010':'people_%p.bad'
    LOGFILE 'people_%p.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "," LRTRIM 
    REJECT ROWS WITH ALL NULL FIELDS 
    (
      "PERSON_ID" CHAR(255),
      "FIRST_NAME" CHAR(255),
      "LAST_NAME" CHAR(255),
      "GENERAL_INFO" CHAR(255)
   )
  )
  location 
  (
    'people.dat'
  )
)REJECT LIMIT UNLIMITED

An INSERT statement for filling the “real” table from the external table

INSERT /*+ append parallel(auto) */ INTO PEOPLE 
(
  PERSON_ID,
  FIRST_NAME,
  LAST_NAME,
  GENERAL_INFO
)
SELECT 
  "PERSON_ID",
  "FIRST_NAME",
  "LAST_NAME",
  "GENERAL_INFO"
FROM "SYS_SQLLDR_X_EXT_PEOPLE"

Even if we need to do some complex load which is not supported by the Express Mode, we can still use this generated code as a starting point instead of writing everything from scratch.

Command-Line Level Configuration

SQL*Loader Express Mode allows us to override many attributes of the default configuration using optional command-line parameters.

For example, let’s say we want to load another file, which has the following differences with respect to the original people.dat file:

  • The file name is my_people.txt
  • The field delimiter is semicolon
  • The fields may contain semicolons in their content, and are enclosed by double quotes in these cases

We can still use the Express Mode, like this:

C:\Data>sqlldr demo5/demo5 table=people data=my_people.txt terminated_by=';' optionally_enclosed_by='\"'

You can find the complete list of supported parameters in the SQL*Loader Express Mode Documentation.

Loading into Older Versions

One last note: SQL*Loader Express Mode is a new feature of Oracle 12c, but we can use it also for loading data into tables in pre-12c databases (using an Oracle 12c client).

Conclusion

We saw in this part of the Write Less with More series that SQL*Loader Express Mode allows us to write less configuration.

The next post will be about another new Oracle 12c feature – Identity Columns.

Indexes and Indexing in Oracle 12c

One of my presentations in ilOUG Tech Days 2015 was “Indexes and Indexing in Oracle 12c”. I talked about new features added in release 12c that are related to indexing – not necessarily new keywords in the CREATE INDEX statement, but also changes that may affect the way we index (or not).

Here is a summary of the features I talked about, or at least had planned to talk about and then ran out of time 🙁

Note that some of these features are available only in the Enterprise Edition or require a licensed option.

If you want to read about these features in detail, there are many good resources. The best, in my opinion, is Richard Foote’s blog – the “bible” of Oracle indexing  – http://richardfoote.wordpress.com/category/12c/

Multiple indexes on the same column list

In pre-12c versions it is impossible to define several indexes on the same column list.

In 12c it is possible, as long as:

    • Only one of the indexes is visible (the ability to make an index invisible was added in 11g)
    • And the indexes are different somehow (unique vs. non-unique, reverse vs. regular, partitioned vs. non-partitioned, etc.)

What is it good for?

It allows to change the type of an index with (almost) no downtime. Prior to 12c we have to drop the existing index and then create the new one, and during this period we have no available index. In 12c we can create the new index as invisible, and then, very quickly, make the old one invisible and the new one visible.

This feature also allows to validate (or disproof) a hypothesis that another type of an index is better than the existing one. We create a new invisible index of the “other” type, make the necessary comparisons, and based on the conclusion we either drop the new index and keep the old one, or drop the old index and make the new one visible.

If we need this functionality before 12c we can use an old trick and create the new index on almost the same column list – adding one “artificial” column – for example, if we have an index on T (COL1, COL2) we can create another index on T (COL1, COL2, NULL).

Online operations

In 12c more DDL operations can be done online (which roughly means less blocking issues in a multi-user environment). Focusing on indexes, these are the relevant changes:

  • The following statements are offline before 12c and online in 12c:
    • ALTER INDEX VISIBLE
    • ALTER INDEX INVISIBLE
  • The following statements are offline by default, but in 12c they can be online by adding the ONLINE keyword:
    • DROP INDEX [ONLINE]
    • ALTER INDEX UNUSABLE [ONLINE]
    • ALTER TABLE MOVE SUB/PARTITION [ONLINE]

Asynchronous global index maintenance

Suppose we have a partitioned table with global indexes.

Before 12c, dropping or truncating a partition of this table is either a fast operation (that makes the global indexes unusable) or an operation that keeps the global indexes usable (but takes longer to execute).

In 12c, it can be both. When specifying UPDATE GLOBAL INDEXES, the partition is dropped/truncated and the indexes remain usable although the index entries are not deleted.

However, there is a trade-off. When accessing the index entries, extra work is done to check whether the entries belong to an existing partition or to a deleted partition. Also, index blocks containing such “orphaned” entries are usually not reused.

The orphaned entries are cleaned out asynchronously (by a maintenance job, by ALTER INDEX REBUILD, by ALTER INDEX COALESCE CLEANUP, or by calling explicitly to DBMS_PART.CLEANUP_GIDX).

Partial indexes

In 12c it is possible to create an index (either LOCAL or GLOBAL) on a subset of partitions (as long as it is non-unique and not used for enforcing primary/unique keys).

Each table partition is defined as either INDEXING ON or INDEXING OFF (explicitly at the partition level or implicitly by inheriting the definition from the table level).

Then, if the index is created with the new INDEXING PARTIAL clause, then only records from “INDEXING ON” partitions are indexed.

This can be useful, for example, for indexing only recent partitions, where the older partitions are rarely accessed. Another example: loading data into a new partition that is defined as “INDEXING OFF” while older partitions are defined as “INDEXING ON” (for making the loading faster while allowing efficient access to the older partitions), and once the loading is done changing the new partition to “INDEXING ON”.

Advanced index compression

Prefix Index Compression is available since Oracle8i. It is implemented in the leaf block level, and may reduce the size of the index. However, this compression is not optimal, and may even increase the size of the index. Why? One reason is that we need to define the “prefix length” (i.e., how many of the leading index key columns are used for the compression) and while the prefix length may be optimal when it is defined, it may become sub-optimal over time. The second reason is that we define the prefix length at the index level, so the same value is used for all the index leafs; obviously one size does not necessarily fit all, and while this value may be optimal for some of the leafs it may be sub-optimal for others.

In 12c we can use advanced index compression, which uses the same mechanism as the prefix index compression but with some enhancements that address the aforementioned drawbacks: each block  may be compressed or not, the prefix length is determined for each block individually based on its content, and it’s all done automatically.

 

The presentation is available in the Presentations page.

 

ilOUG Tech Days

השבוע השתתפתי ב “ilOUG Tech Days 2015” – כנס בן יומיים של ארגון משתמשי אורקל בישראל. הכנס התקיים במלון בחיפה, ואירח למעלה ממאה משתתפים ונבחרת מרשימה של מרצים מחו”ל ומהארץ.
לדעתי הארוע היה מצויין, מאורגן היטב ומבוצע כהלכה על ידי הנהלת ilOUG, ובפרט עמי אהרונוביץ’ ולירון אמיצי (אתם יכולים לקרוא את הסיכום של לירון כאן).

אני העברתי שתי הרצאות והאזנתי להרבה הרצאות טובות של Bryn Llewellyn, Keith Laker, Heli Helskyaho, Jonathan Lewis, Tom Kyte, Joze Senegacnik ועמי אהרונוביץ’. הייתי רוצה לשמוע הרצאות נוספות, אבל לא יכולתי להיות בכמה מקומות באותו זמן…

ההרצאות שלי היו “Indexes and Indexing in Oracle 12c” ו “Deep Dive into Oracle 12c Pattern Matching”. בהרצאה השניה היה לי אורח מיוחד בקהל – Keith Laker מאורקל, מנהל המוצר של הפיצ’ר שהצגתי. זה היה מגניב שקית’ היה שם, וגם מועיל, כי יכולתי להיעזר בו ולהפנות אליו שאלות, כמו:

  • האם MATCH_RECOGNIZE מוגדר ב- ANSI SQL? (תשובה: עדיין לא)
  • האם נדרש רשיון מיוחד לשימוש בו? (תשובה: לא, הפיצ’ר נתמך גם בגרסת ה-Standard Edition ללא עלות נוספת)
  • האם יש הבדלים בפיצ’ר בין גרסאות 12.1.0.1 ו- 12.1.0.2? (תשובה: לא)

את שתי המצגות שלי הכנתי במיוחד לקראת הכנס, ואני מרגיש שהן עוד צריכות לעבור קצת “כוונון” לעתיד, במיוחד ההרצאה על האינדקסים, שכוללת יותר תוכן ממה שאפשר להעביר ב-45 דקות.
ניתן להוריד את המצגות שלי מעמוד המצגות.
אכתוב יותר על המצגות שלי מאוחר יותר.

עוד נקודות משמעותיות מבחינתי בכנס:

ilOUG Tech Days – My Experience

Last week I participated in “ilOUG Tech Days 2015” – a two day conference of the Israeli Oracle User Group. It took place in a hotel in Haifa, and hosted more than 100 attendees and an impressive league of international and local speakers.
It was a really great event in my opinion, well organized and executed by ilOUG management, especially Ami Aharonovich and Liron Amitzi (you can read Liron’s post about the conference here).
I gave two presentations and attended many good lectures and keynotes presented by Bryn Llewellyn, Keith Laker, Heli Helskyaho, Jonathan Lewis, Tom Kyte, Joze Senegacnik and Ami Aharonovich. I wish I could attend more, but I couldn’t be in multiple places at the same time…

My presentations were about “Indexes and Indexing in Oracle 12c” and “Deep Dive into Oracle 12c Pattern Matching”. In the latter I had a very special guest in the room – Keith Laker from Oracle, the product manager of the feature I was just presenting. It was very cool Keith was there, and also beneficial, as I could get his help answering some questions, like:

  • Is the MATCH_RECOGNIZE clause part of ANSI SQL? (answer: not yet)
  • Does it require a specific Oracle edition or option? (answer: no, it is part of all Oracle editions with no extra cost)
  • Are there any differences in the feature between 12.1.0.1 and 12.1.0.2? (answer: no)

Both my presentations were brand new for this conference, and I feel they still need some “tuning” for the future, especially the one about indexing in 12c, which contains more content than can be delivered in just 45 minutes.
My presentations are available to download from the Presentations page.
I will write more about them later.

Other highlights: