Write Less with More – Part 1 (SQL*Loader Express Mode)

Oren Nakdimon Leave a Reply

Introduction

For some time now I have been presenting a lecture that I call in short “Write Less with More”, or in its full name “Write Less (code) with More (Oracle 12c New Features)”. The idea is to focus on some of the evolutionary – rather than revolutionary – features that were added in version 12c of Oracle. Features that  allow us developers to write less than in previous versions in order to achieve the same functionality.

Instead of starting by presenting the features and then giving some arbitrary examples, I prefer to start with a list of common development tasks, and to suggest for each task a pre-12c solution and a solution that uses a new 12c feature making it possible to write less. I like showing problems and how we can solve them, rather than showing features that look for problems to solve.

I’m starting today a series of blog posts that is based on the “Write Less with More” presentation, and follows its structure. In each post I’ll discuss one new Oracle 12c feature that solves some development task. Each such feature falls under one of the following categories:

  • Write Less Configuration
  • Write Less Application Code
  • Write Less Code in SQL Statements
  • Write Less “Inappropriately-Located” Code

All the Series Parts

Published

Part 1 – write less configuration with SQL*Loader Express Mode
Part 2 – write less application code with Identity Columns
Part 3 – write less application code with In-Database Archiving
Part 4 – write less application code with Temporal Validity
Part 5 – write less code in SQL with Row Limiting
Part 6 – write less “inappropriately located” code with SELECT FROM Package-Level Collection Types
Part 7 – write less code in SQL with Lateral Inline Views
Part 8 – write less “inappropriately located” code with PL/SQL in the WITH Clause

Setup

All the examples in the series are based on this data model:
ERD
We have three tables: PEOPLE, PROJECTS and a join table between them – PROJECT_ASSIGNMENTS. At this point PROJECT_ASSIGNMENTS is very simple – it only contains information about which people are assigned to which projects, not when or for how long. It may also contain duplicate rows for the same person+project combination (with different assignment_id though), which seems quite pointless. We will enhance this naïve design as we progress with the series.

Here is the script that creates the tables and fills a few projects:

create table people (
       person_id integer not null constraint people_pk primary key,
       first_name varchar2(20) not null,
       last_name varchar2(30) not null,
       general_info varchar2(100)
);

create table projects (
       project_id integer not null constraint projects_pk primary key,
       project_name varchar2(100) not null,
       status number(1) not null,
       last_days_to_show_in_reports integer not null
);

create table project_assignments (
       assignment_id integer not null constraint project_assignments_pk primary key,
       person_id integer not null constraint assignments_fk_people references people,
       project_id integer not null constraint assignments_fk_projects references projects
);

insert into projects values (1,'Project A',1,2);
insert into projects values (2,'Project B',2,3);
insert into projects values (3,'Project C',1,4);
insert into projects values (4,'Project D',2,3);

commit;

And this is our “To Do” list:
todo_full

So now we are ready to start with the first task.

Task #1todo1

The PEOPLE table is still empty. We have a CSV file – C:\Data\people.dat – that contains data about people and we want to load this data from the file into the table. The file contains one line for each person, the fields are separated by commas, and the order of the fields in the file is the same as the order of the columns in the table. Here is the content of the file:

101,John,Lennon,
102,Paul,McCartney,18/6/1942
103,Ringo,Starr,Drums
104,George,Harisson,
201,Louis,Armstrong,Jazz
202,Ella,Fitzgerald,15/6/1996
203,Etta,James,20/1/2012
317,Julie,Andrews,

You can see that each line contains details about one person, corresponding to the PEOPLE table’s structure: ID, first name, last name, and some (un-modeled, ugh!) general details.

A Pre-12c Solution

Oracle supports loading data from external files for decades now. There is the good old SQL*Loader utility, and since Oracle9i there is also the ability to use External Tables. Both are very powerful, but require quite a complex configuration even for very simple files. For SQL*Loader we need to write a control file with all the configuration. For External Tables we need something similar to the control file – this time as part of the CREATE TABLE statement – in addition to a DIRECTORY object and an INSERT statement that copies the data from the external table to the “real” table.

A 12c Solution

SQL*Loader Express Mode

In Oracle 12c SQL*Loader can be executed in a new mode – the Express Mode. In this mode we do not use a control file, and many defaults are used implicitly. We can override many of these defaults by specifying explicit parameters in the command-line level.

If you want to learn about many more new features of Oracle 12c, you may be interested in one of my in-house training sessions, particularly the Oracle 12c New Features for Developers full-day seminar.

Let’s Load

The only parameter that is mandatory in this mode is the TABLE parameter. In our case, all we need to do is execute SQL*Loader from C:\Data (where the text file is located) and specify TABLE=people.

First, let’s see that the table is really empty.

C:\Data>sqlplus demo5/demo5
> select * from people;

no rows selected

> exit

Now, let’s execute SQL*Loader:

C:\Data>sqlldr demo5/demo5 table=people

SQL*Loader: Release 12.1.0.2.0 - Production on Sat Aug 15 15:58:48 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Express Mode Load, Table: PEOPLE
Path used:      External Table, DEGREE_OF_PARALLELISM=AUTO

Table PEOPLE:
  8 Rows successfully loaded.

Check the log files:
  people.log
  people_%p.log_xt
for more information about the load.

C:\Data>

And we can see now that the table contains the data from the file:

C:\Data>sqlplus demo5/demo5
> select * from people;

PERSON_ID  FIRST_NAME LAST_NAME       GENERAL_INFO
---------- ---------- --------------- --------------------
       101 John       Lennon
       102 Paul       McCartney       18/6/1942
       103 Ringo      Starr           Drums
       104 George     Harisson
       201 Louis      Armstrong       Jazz
       202 Ella       Fitzgerald      15/6/1996
       203 Etta       James           20/1/2012
       317 Julie      Andrews
 
8 rows selected.

> exit

C:\Data>

We specified the table name only. All the rest of the configuration was derived implicitly:

  • The default file name is <tableName>.dat (in our case, people.dat)
  • The default file location is the directory from which we execute sqlldr (in our case, C:\Data)
  • The default record delimiter is newline
  • The default field delimiter is comma
  • By default, the fields have no enclosures
  • Etc.

Let’s take a look at the output we got on the screen.

First of all, it says “Express Mode Load”. Ok, so it really worked in express mode.

Multiple Loading Mechanisms

A more interesting thing to note is “Path used:      External Table”. SQL*Loader actually loaded the data using its “descendant”, more advanced, feature – External Tables. By default, SQL*Loader Express Mode performs the load using External Tables if possible, and, if it cannot use External Tables, it falls back to a direct path load using its “legacy” way.

Beyond Loading

The last lines of the on-screen output  lead us to more output, which may be very valuable:

Check the log files:
  people.log
  people_%p.log_xt
for more information about the load.

SQL*Loader Express Mode not only loads the data, but also prepares for us some items we can re-use later. The log file – people.log – includes, in addition to the standard logging stuff:

A SQL*Loader control file

Generated control file for possible reuse:
OPTIONS(EXTERNAL_TABLE=EXECUTE, TRIM=LRTRIM)
LOAD DATA
INFILE 'people'
APPEND
INTO TABLE PEOPLE
FIELDS TERMINATED BY ","
(
  PERSON_ID,
  FIRST_NAME,
  LAST_NAME,
  GENERAL_INFO
)
End of generated control file for possible reuse.

A statement for creating the external table

CREATE TABLE "SYS_SQLLDR_X_EXT_PEOPLE" 
(
  "PERSON_ID" NUMBER(38),
  "FIRST_NAME" VARCHAR2(20),
  "LAST_NAME" VARCHAR2(30),
  "GENERAL_INFO" VARCHAR2(100)
)
ORGANIZATION external 
(
  TYPE oracle_loader
  DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00010
  ACCESS PARAMETERS 
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET UTF8
    BADFILE 'SYS_SQLLDR_XT_TMPDIR_00010':'people_%p.bad'
    LOGFILE 'people_%p.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "," LRTRIM 
    REJECT ROWS WITH ALL NULL FIELDS 
    (
      "PERSON_ID" CHAR(255),
      "FIRST_NAME" CHAR(255),
      "LAST_NAME" CHAR(255),
      "GENERAL_INFO" CHAR(255)
   )
  )
  location 
  (
    'people.dat'
  )
)REJECT LIMIT UNLIMITED

An INSERT statement for filling the “real” table from the external table

INSERT /*+ append parallel(auto) */ INTO PEOPLE 
(
  PERSON_ID,
  FIRST_NAME,
  LAST_NAME,
  GENERAL_INFO
)
SELECT 
  "PERSON_ID",
  "FIRST_NAME",
  "LAST_NAME",
  "GENERAL_INFO"
FROM "SYS_SQLLDR_X_EXT_PEOPLE"

Even if we need to do some complex load which is not supported by the Express Mode, we can still use this generated code as a starting point instead of writing everything from scratch.

Command-Line Level Configuration

SQL*Loader Express Mode allows us to override many attributes of the default configuration using optional command-line parameters.

For example, let’s say we want to load another file, which has the following differences with respect to the original people.dat file:

  • The file name is my_people.txt
  • The field delimiter is semicolon
  • The fields may contain semicolons in their content, and are enclosed by double quotes in these cases

We can still use the Express Mode, like this:

C:\Data>sqlldr demo5/demo5 table=people data=my_people.txt terminated_by=';' optionally_enclosed_by='\"'

You can find the complete list of supported parameters in the SQL*Loader Express Mode Documentation.

Loading into Older Versions

One last note: SQL*Loader Express Mode is a new feature of Oracle 12c, but we can use it also for loading data into tables in pre-12c databases (using an Oracle 12c client).

Conclusion

We saw in this part of the Write Less with More series that SQL*Loader Express Mode allows us to write less configuration.

The next post will be about another new Oracle 12c feature – Identity Columns.

Indexes and Indexing in Oracle 12c

One of my presentations in ilOUG Tech Days 2015 was “Indexes and Indexing in Oracle 12c”. I talked about new features added in release 12c that are related to indexing – not necessarily new keywords in the CREATE INDEX statement, but also changes that may affect the way we index (or not).

Here is a summary of the features I talked about, or at least had planned to talk about and then ran out of time 🙁

Note that some of these features are available only in the Enterprise Edition or require a licensed option.

If you want to read about these features in detail, there are many good resources. The best, in my opinion, is Richard Foote’s blog – the “bible” of Oracle indexing  – http://richardfoote.wordpress.com/category/12c/

Multiple indexes on the same column list

In pre-12c versions it is impossible to define several indexes on the same column list.

In 12c it is possible, as long as:

    • Only one of the indexes is visible (the ability to make an index invisible was added in 11g)
    • And the indexes are different somehow (unique vs. non-unique, reverse vs. regular, partitioned vs. non-partitioned, etc.)

What is it good for?

It allows to change the type of an index with (almost) no downtime. Prior to 12c we have to drop the existing index and then create the new one, and during this period we have no available index. In 12c we can create the new index as invisible, and then, very quickly, make the old one invisible and the new one visible.

This feature also allows to validate (or disproof) a hypothesis that another type of an index is better than the existing one. We create a new invisible index of the “other” type, make the necessary comparisons, and based on the conclusion we either drop the new index and keep the old one, or drop the old index and make the new one visible.

If we need this functionality before 12c we can use an old trick and create the new index on almost the same column list – adding one “artificial” column – for example, if we have an index on T (COL1, COL2) we can create another index on T (COL1, COL2, NULL).

Online operations

In 12c more DDL operations can be done online (which roughly means less blocking issues in a multi-user environment). Focusing on indexes, these are the relevant changes:

  • The following statements are offline before 12c and online in 12c:
    • ALTER INDEX VISIBLE
    • ALTER INDEX INVISIBLE
  • The following statements are offline by default, but in 12c they can be online by adding the ONLINE keyword:
    • DROP INDEX [ONLINE]
    • ALTER INDEX UNUSABLE [ONLINE]
    • ALTER TABLE MOVE SUB/PARTITION [ONLINE]

Asynchronous global index maintenance

Suppose we have a partitioned table with global indexes.

Before 12c, dropping or truncating a partition of this table is either a fast operation (that makes the global indexes unusable) or an operation that keeps the global indexes usable (but takes longer to execute).

In 12c, it can be both. When specifying UPDATE GLOBAL INDEXES, the partition is dropped/truncated and the indexes remain usable although the index entries are not deleted.

However, there is a trade-off. When accessing the index entries, extra work is done to check whether the entries belong to an existing partition or to a deleted partition. Also, index blocks containing such “orphaned” entries are usually not reused.

The orphaned entries are cleaned out asynchronously (by a maintenance job, by ALTER INDEX REBUILD, by ALTER INDEX COALESCE CLEANUP, or by calling explicitly to DBMS_PART.CLEANUP_GIDX).

Partial indexes

In 12c it is possible to create an index (either LOCAL or GLOBAL) on a subset of partitions (as long as it is non-unique and not used for enforcing primary/unique keys).

Each table partition is defined as either INDEXING ON or INDEXING OFF (explicitly at the partition level or implicitly by inheriting the definition from the table level).

Then, if the index is created with the new INDEXING PARTIAL clause, then only records from “INDEXING ON” partitions are indexed.

This can be useful, for example, for indexing only recent partitions, where the older partitions are rarely accessed. Another example: loading data into a new partition that is defined as “INDEXING OFF” while older partitions are defined as “INDEXING ON” (for making the loading faster while allowing efficient access to the older partitions), and once the loading is done changing the new partition to “INDEXING ON”.

Advanced index compression

Prefix Index Compression is available since Oracle8i. It is implemented in the leaf block level, and may reduce the size of the index. However, this compression is not optimal, and may even increase the size of the index. Why? One reason is that we need to define the “prefix length” (i.e., how many of the leading index key columns are used for the compression) and while the prefix length may be optimal when it is defined, it may become sub-optimal over time. The second reason is that we define the prefix length at the index level, so the same value is used for all the index leafs; obviously one size does not necessarily fit all, and while this value may be optimal for some of the leafs it may be sub-optimal for others.

In 12c we can use advanced index compression, which uses the same mechanism as the prefix index compression but with some enhancements that address the aforementioned drawbacks: each block  may be compressed or not, the prefix length is determined for each block individually based on its content, and it’s all done automatically.

 

The presentation is available in the Presentations page.

 

ilOUG Tech Days – My Experience

Oren Nakdimon 1 Reply

Last week I participated in “ilOUG Tech Days 2015” – a two day conference of the Israeli Oracle User Group. It took place in a hotel in Haifa, and hosted more than 100 attendees and an impressive league of international and local speakers.
It was a really great event in my opinion, well organized and executed by ilOUG management, especially Ami Aharonovich and Liron Amitzi (you can read Liron’s post about the conference here).
I gave two presentations and attended many good lectures and keynotes presented by Bryn Llewellyn, Keith Laker, Heli Helskyaho, Jonathan Lewis, Tom Kyte, Joze Senegacnik and Ami Aharonovich. I wish I could attend more, but I couldn’t be in multiple places at the same time…

My presentations were about “Indexes and Indexing in Oracle 12c” and “Deep Dive into Oracle 12c Pattern Matching”. In the latter I had a very special guest in the room – Keith Laker from Oracle, the product manager of the feature I was just presenting. It was very cool Keith was there, and also beneficial, as I could get his help answering some questions, like:

  • Is the MATCH_RECOGNIZE clause part of ANSI SQL? (answer: not yet)
  • Does it require a specific Oracle edition or option? (answer: no, it is part of all Oracle editions with no extra cost)
  • Are there any differences in the feature between 12.1.0.1 and 12.1.0.2? (answer: no)

Both my presentations were brand new for this conference, and I feel they still need some “tuning” for the future, especially the one about indexing in 12c, which contains more content than can be delivered in just 45 minutes.
My presentations are available to download from the Presentations page.
I will write more about them later.

Other highlights:

SET and CARDINALITY

Oren Nakdimon Leave a Reply

In the last issue of Oracle Magazine, Tom Kyte shows cool solutions to some questions he got on asktom.oracle.com, using pure SQL. One of the nice things in SQL is that one problem may have many different solutions (I wrote about it once here).
One of the questions there is about counting distinct values across several columns within each row. You can see the question, Tom’s answer and several alternative answers suggested by other people here. I added a suggested alternative of my own, using two SQL functions that are not very known, so I thought it’s worth mentioning them here.

These two functions are SET and CARDINALITY, and both get a nested table as their input parameter.

SET returns a nested table of the same type as the input parameter, after removing duplicate elements from it.
CARDINALITY returns the number of elements in a nested table.

Both functions exist since Oracle 10g Release 1.

Let’s create a nested table type:

create type string_ntt as table of varchar2(4000)
/

and construct an object of this type, using the default constructor, containing 6 elements, out of which 4 are unique:

select string_ntt('A','B','C','B','D','A') as my_list from dual;

MY_LIST
------------------------------------------
STRING_NTT('A', 'B', 'C', 'B', 'D', 'A')

If we apply the CARDINALITY function on this object, we will get the number of elements it contains:

select CARDINALITY(string_ntt('A','B','C','B','D','A')) as num_of_elements from dual;

NUM_OF_ELEMENTS
---------------
              6

If we apply the SET function on this object, we will get a new object of the same type, with only the 4 unique values:

select SET(string_ntt('A','B','C','B','D','A')) as my_set from dual;

MY_SET
----------------------------------------
STRING_NTT('A', 'B', 'C', 'D')

And by combining both CARDINALITY and SET, we will get the number of unique elements in the nested table object:

select CARDINALITY(SET(string_ntt('A','B','C','B','D','A'))) as num_of_unique_elements from dual;

NUM_OF_UNIQUE_ELEMENTS
----------------------
                     4

ilOUG Day

Oren Nakdimon Leave a Reply

ilOUG, the Israeli Oracle User Group, is introducing a new and refreshing concept for its SIG meetings. On Monday, March 23rd 2015, there will be for the first time a meeting for the entire technological community, unlike the past meetings that were focused each time on a specific interest group (and therefore also small in many cases). In the upcoming event there will be 15 lectures in 5 parallel tracks:

  • Database Administration
  • Database Development
  • Engineered Systems, Hardware & OS
  • Development Technologies and Tools
  • Big Data & BI

There is great variety, and event participants can freely switch tracks at any point, so I’m sure anyone can find something of interest (some of you may even find it hard to choose between simultaneous presentations). In addition, such a community wide event allows more mingling and networking than previous smaller events.

I will present the lecture Write Less (Code) with More (Oracle 12c New Features) in the Database Development track. Here is the abstract:

Oracle 12c introduced many new features that allow us developers to write less code than in previous releases, and become more efficient and productive.
Some features, such as Row Limiting and Lateral Inline Views, enhance the SQL language, so SQL statements can become much shorter and more readable than before.
Other features, such as Temporal Validity and In-Database Archiving, offer built-in functionality that previously had to be implemented by the application.
Attend this session to learn about several of these new features, and see many useful examples.

The meeting will take place at the Dan Panorama hotel in Tel Aviv, from 2pm. Participation is free of charge, but requires registration in advance.

For full details see: http://www.iloug.org.il/ilOUGDay/?page=Agenda2

 

Optimizer bug leads to wrong results

Oren Nakdimon 1 Reply

A few days ago a query that I wrote did not return the expected results. After some investigation I came to conclusion that the reason is a bug of the optimizer. Here is a very simplified example (and quite silly functionality-wise, but I guarantee you that the original, more complex, query does make sense).
I tried it with Oracle 11.2.0.2, 11.2.0.4 and 12.1.0.2, and got the same (wrong) behavior in all three cases.

Update [August 2, 2017]
I tested it with 12.2.0.1, and it’s working fine!

The query is based on the following table:

create table t(
  id   number not null constraint t_pk primary key,
  num1 number not null
);

Let’s fill it with some records:

insert into t values (1,74);
insert into t values (2,96);
insert into t values (3,41);

We start with this query (that works just fine):

select (select max(level) from dual connect by level <= t.num1) 
from   t
where  t.id = :id;

For our sample data we would expect the query to return 74 for :id=1, 96 for :id=2 and 41 for :id=3, and indeed these are the results that we get.
This is the execution plan:

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |    26 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |      |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
|   4 |  TABLE ACCESS BY INDEX ROWID  | T    |     1 |    26 |     1   (0)| 00:00:01 |
|*  5 |   INDEX UNIQUE SCAN           | T_PK |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   5 - access("T"."ID"=TO_NUMBER(:ID))

Now, I will only add an innocent inline view to the query:

select (select max(level) from dual connect by level <= t.num1) 
from   t,
       (select distinct 1 from dual)
where  t.id = :id;

Obviously the existence of the inline view - (select distinct 1 from dual) - should make no difference. It returns a single record, we don't use its result set, and we have no join conditions in the query (so we actually have a cartesian product between 1 record and [0 or 1] record).

But now the results are wrong. The query returns 1 in all the cases, and the reason is that the scalar subquery - (select max(level) from dual connect by level <= t.num1) - is always executed with t.num1 = null.

Let's look at the execution plan for this case:

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    13 |     6  (17)| 00:00:01 |
|   1 |  SORT AGGREGATE               |           |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|           |       |       |            |          |
|   3 |    FAST DUAL                  |           |     1 |       |     2   (0)| 00:00:01 |
|   4 |  VIEW                         | VM_NWVW_0 |     1 |    13 |     4  (25)| 00:00:01 |
|   5 |   NESTED LOOPS                |           |     1 |    38 |     3   (0)| 00:00:01 |
|   6 |    FAST DUAL                  |           |     1 |       |     2   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN          | T_PK      |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   7 - access("T"."ID"=TO_NUMBER(:ID))

For some reason, the optimizer decided that it is unnecessary to go to the table T. Oracle accesses only the index T_PK, which contains only the ID column. With this execution plan it is impossible to get the value of the NUM1 column, and indeed the results show that NUM1 is allegedly NULL.

Step 4 in the execution plan refers to VM_NWVW_0, which is usually related to Complex View Merging. So let's try to disable view merging using the NO_MERGE hint:

select (select max(level) from dual connect by level <= t.num1) 
from   t,
       (select /*+ NO_MERGE */ distinct 1 from dual)
where  t.id = :id;

And indeed we get the correct results, with this execution plan:

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |    26 |     6  (17)| 00:00:01 |
|   1 |  SORT AGGREGATE               |      |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
|   4 |  NESTED LOOPS                 |      |     1 |    26 |     4  (25)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID | T    |     1 |    26 |     1   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN          | T_PK |     1 |       |     1   (0)| 00:00:01 |
|   7 |   VIEW                        |      |     1 |       |     3  (34)| 00:00:01 |
|   8 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   6 - access("T"."ID"=TO_NUMBER(:ID))

Also, if we remove the DISTINCT from the inline view, there will be no complex view merging anymore, and the execution plan (and therefore the results) is fine:

select (select max(level) from dual connect by level <= t.num1) 
from   t,
       (select 1 from dual)
where  t.id = :id;

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |    26 |     5   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |      |     1 |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
|   4 |  NESTED LOOPS                 |      |     1 |    26 |     3   (0)| 00:00:01 |
|   5 |   FAST DUAL                   |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T    |     1 |    26 |     1   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN          | T_PK |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<=:B1)
   7 - access("T"."ID"=TO_NUMBER(:ID))

(However, in my original query the DISTINCT was required...)

You are most welcome to comment or to ask questions in this page, or to write me at oren@db-oriented.com.

DOAG 2014

Oren Nakdimon Leave a Reply

I’ll present two lectures in DOAG 2014 – the annual conference of the German Oracle user group.
The conference will be on November 18th-20th 2014 in Nuremberg.
Here is the full program of the conference.

The first presentation is Edition Based Redefinition best practices. Here is the abstract:
Edition-Based Redefinition (EBR) is a powerful and fascinating feature of Oracle (added in version 11.2), that enables application upgrades with zero downtime, while the application is actively used and operational. Join this session to learn how to use EBR, see many live examples, and get tips from real-life experience in a production site using EBR extensively.

The second presentation is Write Less (Code) with More (Oracle 12c New Features). Here is the abstract:
Oracle 12c introduced many new features that allow us developers to write less code than in previous releases, and become more efficient and productive.
Some features, such as Row Limiting and Lateral Inline Views, enhance the SQL language, so SQL statements can become much shorter and more readable than before.
Other features, such as Temporal Validity and In-Database Archiving, offer built-in functionality that previously had to be implemented by the application.
Attend this session to learn about several of these new features, and see many useful examples.

impdp – which rows failed?

Oren Nakdimon 2 Replies

Since version 11.1, Data Pump Import supports the SKIP_CONSTRAINT_ERRORS option, which specifies that you want the import operation to proceed even if (non-deferred) constraint violations are encountered. It logs any rows that cause constraint violations, but does not stop, and does load the other rows. If SKIP_CONSTRAINT_ERRORS is not set, then the default behavior is to roll back the entire load of the data object on which constraint violations are encountered.

For example:

The table presidents was created by the following statement:

create table presidents (
  id         number       primary key,
  first_name varchar2(20),
  last_name  varchar2(20)
);

and is currently containing the following data:

        ID FIRST_NAME           LAST_NAME
---------- -------------------- --------------------
         1 George               Washington
         2 John                 Adams
         3 Thomas               Jefferson
         4 James                Madison
         5 James                Monroe
         6 John Quincy          Adams
         7 Andrew               Jackson
         8 Martin               Van Buren

We have a dump file that contains an export of the presidents table (from somewhere else or from some previous point in time), and we’d like to append the data from the dump file to the table.
The dump file contains the following data:

        ID FIRST_NAME           LAST_NAME
---------- -------------------- --------------------
         7 William              Harrison
         8 John                 Tyler
         9 James                Polk
        10 Zachary              Taylor
        11 Millard              Fillmore
        12 Franklin             Pierce

Obviously, the records with ID 7 and 8 cannot be added to the table, because a primary key constraint is defined on the ID column, and the table already contains records with these ID values.

If we try to execute the following import, without the SKIP_CONSTRAINT_ERRORS option, we will get an error and the whole operation will be rolled back:

impdp oren/oren tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp1.log

Import: Release 11.2.0.4.0 - Production on Fri Jul 18 19:33:33 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
 Master table "OREN"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
 Starting "OREN"."SYS_IMPORT_TABLE_01": oren/******** tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp1.log
 Processing object type TABLE_EXPORT/TABLE/TABLE
 ORA-39152: Table "OREN"."PRESIDENTS" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
 Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
 ORA-31693: Table data object "OREN"."PRESIDENTS" failed to load/unload and is being skipped due to error:
 ORA-00001: unique constraint (OREN.SYS_C0023857) violated
 Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
 Job "OREN"."SYS_IMPORT_TABLE_01" completed with 2 error(s) at 16:33:35

As expected, if we check the contents of the table we see that no records have been added from the dump file:

select * from presidents;

        ID FIRST_NAME           LAST_NAME
---------- -------------------- --------------------
         1 George               Washington
         2 John                 Adams
         3 Thomas               Jefferson
         4 James                Madison
         5 James                Monroe
         6 John Quincy          Adams
         7 Andrew               Jackson
         8 Martin               Van Buren

Now let’s execute the import with the SKIP_CONSTRAINT_ERRORS option:

impdp oren/oren tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp2.log data_options=SKIP_CONSTRAINT_ERRORS

Import: Release 11.2.0.4.0 - Production on Fri Jul 18 19:38:00 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
Master table "OREN"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "OREN"."SYS_IMPORT_TABLE_01": oren/******** tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp2.log data_options=SKIP_CONSTRAINT_ERRORS
Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39152: Table "OREN"."PRESIDENTS" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "OREN"."PRESIDENTS" 5.945 KB 4 out of 6 rows
2 row(s) were rejected with the following error:
ORA-00001: unique constraint (OREN.SYS_C0023857) violated

Rejected rows with the primary keys are:
Rejected row #1:
column ID: 7
Rejected row #2:
column ID: 8
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Job "OREN"."SYS_IMPORT_TABLE_01" completed with 1 error(s) at 16:38:06

We see that 4 rows were imported and that 2 rows were rejected due to unique constraint violation. We can also see from the log which rows were rejected, or more precisely, what are the primary key values of the rejected rows.

But what if we want to know the values of the other columns in the rejected rows?
I don’t know about such a feature, but I made some “digging” and succeeded to come out with a solution.

First I looked at the statements that Data Pump did behind the scenes (by querying V$SQL), and one of the statements I found was this:

INSERT INTO "OREN"."ERR$DP004BC9640001" (
  ORA_ERR_NUMBER$, ORA_ERR_MESG$, ORA_ERR_ROWID$, ORA_ERR_OPTYP$, ORA_ERR_TAG$,
  "ID", "FIRST_NAME", "LAST_NAME")
VALUES (:1, :2, :3, :4, :5, :6, :7, :8)

So apparently Data Pump simply uses DML Error Logging to support the SKIP_CONSTRAINT_ERRORS option. And this means that the values of all the columns (except for LOBs etc.) of the rejected rows are basically available. All we need to do is to select from this ERR$DP004BC9640001 table, right?
Well, there are two problems with this:

  • The name of the table is not consistent. I ran it several times and got a different name in each run (but it always started with ERR$DP).
  • The ERR$DP… table is dropped as soon as the import is done, so by the time we want to select from it, it is already gone.

But we can overcome these issues, with the following flow:

1. Before starting the import process, open a SQL*Plus session and execute:

lock table presidents in share mode;

and keep this session (and transaction) open. This will block the import process (as well as any other process!) when it tries to insert records into the table, and will give us the opportunity to check the name of the error logging table, which, as we’ll see, will already exist at that point in time.

2. Start the import:

impdp oren/oren tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp.log data_options=SKIP_CONSTRAINT_ERRORS

Import: Release 11.2.0.4.0 - Production on Fri Jul 18 23:13:14 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
Master table "OREN"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "OREN"."SYS_IMPORT_TABLE_01": oren/******** tables=presidents dumpfile=presidents.dmp table_exists_action=append logfile=presidents_imp.log data_options=SKIP_CONSTRAINT_ERRORS
Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39152: Table "OREN"."PRESIDENTS" exists. Data will be appended to existing table but all dependent metadata will be skipped due to table_exists_action of append
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

3. The import is blocked now. Going back to the SQL*Plus session, let’s find out the error logging table name:

select table_name from user_tables where table_name like 'ERR$DP%';

TABLE_NAME
-----------------------
ERR$DP0047ABBF0001

4. Start a second SQL*Plus session and lock the error logging table in a mode that will allow the import to fill it but not to drop it:

lock table ERR$DP0047ABBF0001 in row share mode;

5. In the first SQL*Plus session, release the lock on the presidents table so the import can continue:

Rollback;

And the import continues:

. . imported "OREN"."PRESIDENTS" 5.945 KB 4 out of 6 rows
2 row(s) were rejected with the following error:
ORA-00001: unique constraint (OREN.SYS_C0023889) violated

Rejected rows with the primary keys are:
Rejected row #1:
column ID: 7
Rejected row #2:
column ID: 8
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Job "OREN"."SYS_IMPORT_TABLE_01" completed with 1 error(s) at 20:35:21

6. When the import is done, we can see all the rejected rows from one of the SQL*Plus sessions:

select id,first_name,last_name from ERR$DP0047ABBF0001;

ID  FIRST_NAME           LAST_NAME
--- -------------------- --------------------
7   William              Harrison
8   John                 Tyler

7. And finally, to cleanup, we can drop the error logging table from the second SQL*Plus session:

drop table ERR$DP0047ABBF0001;

You are most welcome to comment or to ask questions in this page, or to write me at oren@db-oriented.com.

3 Points about Indexes and Order

Oren Nakdimon 2 Replies

When dealing with indexes, order may be very significant, in several aspects.
Perhaps it’s not surprising after all when talking about a data structure that its purpose is to keep data in order
We’ll refer to three different points:
1. Is the order of columns in a composite index important?
2. Which order is better: filling a table with data and then creating indexes, or creating indexes and then filling the table with data?
3. When creating several indexes, is the order of creation important?
Not always there is one correct answer that covers all the different cases, but it is always worth asking the questions before rushing to execute.

Is the order of columns in a composite index important?
Certainly yes. Let’s take a look at the following two indexes – they both contain the same two columns, but in a different order:

CREATE INDEX T_IDX_1 ON T (COL_A,COL_B);
CREATE INDEX T_IDX_2 ON T (COL_B,COL_A);

Let’s assume that the table T contains many records and that the following queries are highly selective (i.e., they return a relatively small number of records from the table), so it is likely that using an index is better than full scanning the table.

For the following query both indexes are equally good:

SELECT *
FROM T
WHERE COL_A = :VAL1
  AND COL_B = :VAL2;

But for the following query, the index T_IDX_1 is good, while T_IDX_2 is not:

SELECT *
FROM T
WHERE COL_A = :VAL1;

Although the index T_IDX_2 contains the column COL_A, it is not enough, as it does not contain it in its leading part. The order of columns in the index matters.
It’s easy to understand the difference when thinking of the following example: in the phone book the names are ordered first by last name, and then by first name. We can find very quickly all the subscribers whose last name is “Nakdimon”, but we must scan the whole phone book for finding all the subscribers whose first name is “Oren”.

Note: if the table contains a relatively small number of different unique values in the column COL_B, Oracle will still be able to use the index T_IDX_2 for answering the last query by using the Index Skip Scan access path, but still using T_IDX_1 with Index Range Scan will be better.

For the next two questions we’ll consider the following case: we need to create a new table with the following two indexes, and fill it with plenty of data:

CREATE TABLE T (
   COL_A NUMBER,
   COL_B DATE,
   COL_C VARCHAR2(4000),
   …
);
CREATE INDEX T_IDX_A ON T (COL_A);
CREATE INDEX T_IDX_BA ON T (COL_B,COL_A);

Which order is better: filling a table with data and then creating indexes, or creating indexes and then filling the table with data?
Usually working in the former order (creating the indexes when the table is already filled) will take less time than working in the latter order (filling the table when the indexes already exist), since in the latter case the indexes need to be maintained with the insertion of each and every record.

When creating several indexes, is the order of creation important?
Here the answer is positive in certain cases.
Suppose that we created the table T and filled it with many rows, and now it’s time to create the indexes. We can create T_IDX_A first and T_IDX_BA second, or vice versa. Let’s examine both options:

Option 1:

  • We’ll create T_IDX_A first. For that, Oracle will do a Full Table Scan of T (and will take the value of COL_A from every record, and of course the ROWID of every record to know where to point to from the index)
  • Now we’ll create T_IDX_BA. Once again, Oracle will do a Full Table Scan of T (and will take the values of COL_B and COL_A and the ROWID from every record)

Option 2:

  • We’ll create T_IDX_BA first. For that, Oracle will do a Full Table Scan of T (and will take the values of COL_B and COL_A and the ROWID from every record)
  • Now we’ll create T_IDX_A, and this is where the plot changes. Oracle can do a Full Table Scan of T here as well, but in this case it has another alternative, a better one in most cases. The only details that are needed in order to build the index are the values of COL_A and the ROWID of all the records in the table (where COL_A is not null), and these details already exist in the index T_IDX_BA. Therefore, Oracle can do an Index Fast Full Scan of T_IDX_BA, instead of Full Table Scan of the table.

So, if all the columns of one index are included in a second index, it is recommended to create the second index first, and only then the first index, and enable Oracle to consider more alternatives. The more the number of columns in the table that do not exist in the indexes, the more significant the improvement in the creation time of the second index (by doing Index Fast Full Scan instead of Full Table Scan) is.

You are most welcome to comment or to ask questions in this page, or to write me at oren@db-oriented.com.

One Problem – Many Solutions

Oren Nakdimon 1 Reply

One of the things I love in the SQL language is that one problem may have many different solutions, or more specifically, one functional question may be solved by different SQL queries.

It doesn’t mean that all the solutions are equivalent in all aspects. If all the solutions solve the functional question correctly, then all the result sets must be the same. But they may be different in performance, for example (it’s so satisfying reducing the execution time of a query from hours to seconds just by rewriting it).

Another example: one query may be short and elegant while another “equivalent” query may be long and cumbersome.

And not all the solutions are even legitimate in all versions of the database – many times a new feature may make a query much simpler (analytic functions are a great example for that – I still remember how their appearance [in Oracle 8.1.6 if I remember correctly] had a huge effect on my query writing – making them shorter, and many times faster – and actually I see the same effect on people today when I rewrite their queries and at the same time introduce them to analytic functions).

Oracle 12c introduced some new features that let us write less for getting the same functionality, like Lateral Inline Views or the Row Limiting clause.

Here is a nice example for one problem with several solutions.

A few days ago, Lucas Jellema from AMIS raised in this blog post the question of “packing” multiple rows that represent adjacent time periods into a single row that represents the unified period, and suggested the following solution, that demonstrates well some important features such as the LAG and RANK analytic functions, Subquery Factoring (supported since Oracle 9i), and Recursive Subquery Factoring (introduced in Oracle 11g):

with chairpeople as
( select chairperson
  ,      date_from
  ,      date_to
  ,      case date_from - lag(date_to) over (partition by chairperson order by date_from asc)
         when 1 then 'PACK'
         end action
  ,      rank()  over (partition by chairperson order by date_from asc) rnk
  from   chairmanships
)
, packing_chairs (chair, date_from, date_to, lvl) as
( select chairperson, date_from, date_to, 1
  from   chairpeople
  where  action is null
  union all
  select p.chair, p.date_from, c.date_to, lvl+1
  from   chairpeople c
         join
         packing_chairs p
         on (c.chairperson = p.chair and c.rnk = p.lvl+1)
  where  c.action='PACK'
  )
, packed_chairs as
( select chair, date_from, nullif(max(nvl(date_to,date'9999-12-31')),date'9999-12-31') date_to
  from   packing_chairs
  group
  by     chair, date_from
)
select *
from   packed_chairs
order
by     date_from;

-- note: this is a slightly revised version of the query from the original post

I suggested another solution, based on the LAST_VALUE analytic function:

select chairperson,
       date_from,
       max(date_to) keep(dense_rank last order by date_to) date_to
from   (select chairperson,
               last_value(new_period_date_from ignore nulls) over(partition by chairperson order by date_from) date_from,
               date_to
        from   (select chairperson,
                       date_from,
                       date_to,
                       case when lnnvl(date_from – lag(date_to) over(partition by chairperson order by date_from) = 1) then date_from end new_period_date_from
                from   chairmanships))
group  by chairperson,
          date_from
order  by date_from;

In another comment to Lucas’ post, Sayan Malakshinov suggested an even simpler and shorter solution:

select 
  chairperson
 ,min(date_from) keep (dense_rank first order by date_from,date_to) as date_from
 ,max(date_to  ) keep (dense_rank last  order by date_from,date_to) as date_to
from (
      select
           chairperson 
         , date_from 
         , date_to 
         , sum(flag) over(partition by chairperson order by date_from,date_to) grp
      from (
            select 
                 chairperson 
               , date_from 
               , date_to 
               , decode( 1 + lag(date_to)over(partition by chairperson order by date_from,date_to), date_from, 0, 1) flag
            from chairmanships
           )
     )
group by chairperson, grp
order by chairperson, grp;

Finally (for now at least), I suggested yet another solution, using a new Oracle 12c feature – Pattern Matching:

select * from chairmanships 
  match_recognize (
     partition by chairperson
     order by date_from
     measures frst.date_from as date_from,
              date_to as date_to
     one row per match
     pattern (frst nxt*)
     define nxt as nxt.date_from = prev(nxt.date_to)+1) 
order by chairperson,date_from;

So, there you go – one question, four very different solutions.