One Problem – Many Solutions

One of the things I love in the SQL language is that one problem may have many different solutions, or more specifically, one functional question may be solved by different SQL queries.

It doesn’t mean that all the solutions are equivalent in all aspects. If all the solutions solve the functional question correctly, then all the result sets must be the same. But they may be different in performance, for example (it’s so satisfying reducing the execution time of a query from hours to seconds just by rewriting it).

Another example: one query may be short and elegant while another “equivalent” query may be long and cumbersome.

And not all the solutions are even legitimate in all versions of the database – many times a new feature may make a query much simpler (analytic functions are a great example for that – I still remember how their appearance [in Oracle 8.1.6 if I remember correctly] had a huge effect on my query writing – making them shorter, and many times faster – and actually I see the same effect on people today when I rewrite their queries and at the same time introduce them to analytic functions).

Oracle 12c introduced some new features that let us write less for getting the same functionality, like Lateral Inline Views or the Row Limiting clause.

Here is a nice example for one problem with several solutions.

A few days ago, Lucas Jellema from AMIS raised in this blog post the question of “packing” multiple rows that represent adjacent time periods into a single row that represents the unified period, and suggested the following solution, that demonstrates well some important features such as the LAG and RANK analytic functions, Subquery Factoring (supported since Oracle 9i), and Recursive Subquery Factoring (introduced in Oracle 11g):

with chairpeople as
( select chairperson
  ,      date_from
  ,      date_to
  ,      case date_from - lag(date_to) over (partition by chairperson order by date_from asc)
         when 1 then 'PACK'
         end action
  ,      rank()  over (partition by chairperson order by date_from asc) rnk
  from   chairmanships
)
, packing_chairs (chair, date_from, date_to, lvl) as
( select chairperson, date_from, date_to, 1
  from   chairpeople
  where  action is null
  union all
  select p.chair, p.date_from, c.date_to, lvl+1
  from   chairpeople c
         join
         packing_chairs p
         on (c.chairperson = p.chair and c.rnk = p.lvl+1)
  where  c.action='PACK'
  )
, packed_chairs as
( select chair, date_from, nullif(max(nvl(date_to,date'9999-12-31')),date'9999-12-31') date_to
  from   packing_chairs
  group
  by     chair, date_from
)
select *
from   packed_chairs
order
by     date_from;

-- note: this is a slightly revised version of the query from the original post

I suggested another solution, based on the LAST_VALUE analytic function:

select chairperson,
       date_from,
       max(date_to) keep(dense_rank last order by date_to) date_to
from   (select chairperson,
               last_value(new_period_date_from ignore nulls) over(partition by chairperson order by date_from) date_from,
               date_to
        from   (select chairperson,
                       date_from,
                       date_to,
                       case when lnnvl(date_from – lag(date_to) over(partition by chairperson order by date_from) = 1) then date_from end new_period_date_from
                from   chairmanships))
group  by chairperson,
          date_from
order  by date_from;

In another comment to Lucas’ post, Sayan Malakshinov suggested an even simpler and shorter solution:

select 
  chairperson
 ,min(date_from) keep (dense_rank first order by date_from,date_to) as date_from
 ,max(date_to  ) keep (dense_rank last  order by date_from,date_to) as date_to
from (
      select
           chairperson 
         , date_from 
         , date_to 
         , sum(flag) over(partition by chairperson order by date_from,date_to) grp
      from (
            select 
                 chairperson 
               , date_from 
               , date_to 
               , decode( 1 + lag(date_to)over(partition by chairperson order by date_from,date_to), date_from, 0, 1) flag
            from chairmanships
           )
     )
group by chairperson, grp
order by chairperson, grp;

Finally (for now at least), I suggested yet another solution, using a new Oracle 12c feature – Pattern Matching:

select * from chairmanships 
  match_recognize (
     partition by chairperson
     order by date_from
     measures frst.date_from as date_from,
              date_to as date_to
     one row per match
     pattern (frst nxt*)
     define nxt as nxt.date_from = prev(nxt.date_to)+1) 
order by chairperson,date_from;

So, there you go – one question, four very different solutions.

The COALESCE Function

Everybody knows the NVL function, and use it frequently. On the other hand, the COALESCE function, which belongs to the same family of functions, is less known, although it exists since version 9i and has more capabilities than the good old NVL.

Let’s start with the similarities between these functions:

What?

Both functions have a similar purpose – to return the first non-null element from a list of elements. If all the elements in the list are null, the result will be null as well.

Where?

Both functions can be called from SQL and from PL/SQL.

 

Regarding the differences between the functions – there are two major and significant ones:

How Many?

The first difference is in the number of arguments the functions get.

NVL supports in exactly two arguments. The expression

NVL(a,b)

returns the value of a if it is not null, and the value of b if a is null.

The COALESCE function supports two or more arguments.

Let’s look at the expression

COALESCE(X1, X2, X3,..., Xn)
  • If X1 is not null its value is returned
  • Otherwise, if X2 is not null its value is returned
  • Otherwise, if X3 is not null its value is returned
  • And so on…

So if you write sometimes something like

NVL(a,NVL(b,c))

or alternatively

CASE WHEN a IS NOT NULL THEN a WHEN b IS NOT NULL THEN b ELSE c END

then you can write instead

COALESCE(a,b,c)

When?

The second difference is less evident, but very significant in some cases.

For many years I assumed that if I write

NVL(expression1,expression2)

then expression2 will be evaluated only if expression1 is null. It seems logic, doesn’t it? But this is not how it works. in NVL both expressions are always evaluated. In COALESCE, on the other hand, the expressions are evaluated only when needed (and so is the case with DECODE and CASE, by the way).

Let’s see an example where this behavior of NVL is problematic:

In the following query we’d like to return the value of the DESCRIPTION column if it is not null. If the column is empty, we’d like to return the result of some function – get_description – that gets as input the columns A_ID and B_ID.

SELECT NVL(DESCRIPTION, get_description(A_ID,B_ID))
FROM TEST_TABLE;

It doesn’t matter what get_description does exactly. What matters is that every call to the function when DESCRIPTION contains a non-null value is unnecessary, but this is exactly what happens here, and the performance may be hurt significantly.

But if we use COALESCE instead of NVL, we’ll achieve the same functionality, while removing the unnecessary calls. get_description will be called only for records in which the DESCRIPTION column is empty:

SELECT COALESCE(DESCRIPTION, get_description(A_ID,B_ID))
FROM TEST_TABLE;

One advantage NVL does have over COALESCE is the number of characters we need to type.

So if we have exactly two expressions, and both of them are already evaluated, we can continue using NVL. In the other cases, using COALESCE is probably a better choice.

Multiple indexes on the same column list

Zohar Elkayam wrote about the new feature added in Oracle 12c that enables creation of multiple indexes on the same column list (as long as they are different somehow [unique vs. non-unique, b*tree vs. bitmap, etc.] and only one of them is visible).

If the same capability is needed in pre-12c versions: create the new index with the addition of an extra column containing a constant. For example, if the original index is based on (X, Y), then the new index can be based on (X, Y, 1).

Oracle 12c New Features for Developers

One of the seminars I gave in Oracle Week 2013 was “Oracle Database 12c New Features for Developers”. I covered quite a lot of features (well, it was a full day seminar), and I tried to present them in a rational order, by dividing them to categories and talking about one category at a time. Of course, there are features that fall into more than one category, so I had to choose where to talk about such features deeply and where to only mention them.

Following is the complete list of features I covered in the seminar, with a very very short description for each one of them. I plan to write dedicated posts about some of them. In the meantime, I’m adding links to some good references.

The presentation is available here.

 

For starters (probably not the most important new feature 😉 )

  • Last login
C:\Users\orenn>sqlplus

SQL*Plus: Release 12.1.0.1.0 Production on Sat Nov 9 08:24:09 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Enter user-name: hr
Enter password:
Last Successful login time: Fri Nov 08 2013 17:26:27 +02:00

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL>

Schema-level new features

  • Invisible columns: Invisible columns are invisible, unless explicitly specified (for example, they will not appear in the result set of “select * from table”) [a post by Tom Kyte]
  • Column defaults [an article by Tim Hall]
    • Sequence: a column can be associated with a user-defined sequence
    • On null: a default can be defined for explicit Null values
    • Identity: a column can be defined as “identity”, which implicitly associates it with a system-generated sequence
    • New optional column: adding an optional column with default is a meta-data only operation
  • Extended strings: the maximum length of RAW and N/VARCHAR2 columns can be extended to 32KB [an article by Tim Hall]
  • Multiple indexes on the same column list: it is possible to define several indexes on the same column list, as long as they are different somehow (unique vs. non-unique, b*tree vs. bitmap, etc.) and only one of them is visible [a post by Richard Foote]
  • Session level sequences: non-persistent sequences, return a range of numbers which is unique within a session
  • Temporary undo: undo for temporary tables can be stored in the temporary tablespace

Optimizer new features

  • Histograms: new types of histograms – “top frequency” and “hybrid” – can give much better information than the legacy “frequency” and “height balanced” ones [posts by Jonathan Lewis: part 1, part 2, part 3 (with a very clear example of hybrid histograms)]
  • Online statistics gathering: table and column (but not index and histogram) statistics are gathered automatically during direct path load into empty tables/partitions
  • Dynamic statistics: a new level of dynamic statistics (formerly known as dynamic sampling)  – 11 – enables the optimizer to automatically decide to use dynamic statistics for any SQL statement, and the results are persisted in the cache
  • Global temporary tables: statistics gathered on global temporary tables are kept and used either in session-level (the default) or shared across sessions (as in pre-12c versions)
  • Concurrent execution of UNION/ALL: different branches of UNION operations may be executed in parallel (if at least one of the branches is considered being processed in parallel)

PL/SQL new features

  • White lists: a new clause – ACCESSIBLE BY – can be added to packages, procedures, functions and types to specify which objects are able to reference the PL/SQL object directly [an article by Tim Hall]
  • Invoker’s rights enhancements
    • Granting roles to program units: roles can be granted to program units [a post by Tom Kyte]
    • BEQUEATH: a new clause in view definition – BEQUEATH – specifies whether functions referenced in the view are executed using the view invoker’s rights or the view definer’s rights
    • INHERIT PRIVILEGES: when executing an invoker’s rights procedure the owner gets access to the invoker’s privileges; in 12c the owner must be granted INHERIT PRIVILEGES on the invoker to achieve this
    • Function result cache: invoker’s rights functions can be RESULT_CACHE-ed too (not only definer’s rights functions)
  • SQL text expansion: DBMS_UTILITY.EXPAND_SQL_TEXT reveals the actual SQL executed for a given query [a post by Tom Kyte]
  • Introspection: the new package UTL_CALL_STACK gives a modeled access to call stacks and error backtrace, and a fine-grained resolution within the program unit [a post by Tom Kyte]
  • Binding: it is possible to bind PL/SQL-only data types to SQL statements
  • SELECT from collections: it is possible to select from package-level collections

Easier migration to Oracle

  • Implicit result sets: it is possible to return result sets not only explicitly through cursor variables, but also implicitly [an article by Tim Hall]
  • SQL translation framework: SQL Translation Framework translates SQL statements of a client program from a non-Oracle dialect into the Oracle dialect (and can also be used to substitute an Oracle SQL statement with another Oracle statement to address a semantic or a performance issue) [a post by Kerry Osborne]

SQL new features

  • Row limiting: a new clause of the SELECT statement allows to fetch a specific chunk of X rows or X% of rows [an article by Tim Hall]
  • PL/SQL in the WITH clause: the WITH clause can include not only subquery factoring but also PL/SQL declarations of functions that can be used in the query (and procedures that can be used in the functions) [an article by Tim Hall]
  • Pattern matching: the SELECT syntax allows enhanced analysis of row sequences
  • TRUNCATE CASCADE: it is possible to truncate the parent table of an enabled foreign key constraint, and to truncate the parent table of a reference-partitioned table (assuming the foreign key is defined with ON DELETE CASCADE)
  • New join syntax: now supported – CROSS APPLY, OUTER APPLY, LATERAL inline views, multi-table “left outer join” with Oracle syntax (+)
  • Online operations: more operations can be executed online (DROP INDEX, DROP CONSTRAINT, SET UNUSED COLUMN, ALTER INDEX UNUSABLE, ALTER INDEX VISIBLE|INVISIBLE, ALTER TABLE MOVE (SUB)PARTITION)

Utilities new features

  • Data Pump enhancements: added options to disable redo logging in impdp, to change table compression in impdp, to export views as tables in expdp, to add timestamps to the log file
  • SQL*Loader express mode: SQL*Loader can be executed with no control file and with many defaults, and it generates a log file for future use including control file, CREATE EXTERNAL TABLE statement and a corresponding INSERT statement

Transaction Guard

  • Transaction Guard: a generic tool for applications to use for at-most-once execution in case of outages

Information lifecycle management

  • Temporal validity: tables can have pairs of columns defining “time periods”, and data can be selected by a specific valid time (or range)
  • In-database archiving: tables can be defined as ROW ARCHIVAL, which implicitly adds a hidden column to the table, holding an archiving (“logical deletion”) state;  “archived” rows become invisible  (or visible, when we want)
  • Heat map: the Heat Map feature provides data access tracking (last read time, last update time, last full table scan, last index scan)
  • Automatic data optimization: policies can be defined to automatically move data between storage tiers and to automatically compress data [an Oracle Magazine article by Arup Nanda]

Partitioning new features

  • Partitioning schemes: the combination of reference partitioning and interval partitioning is possible
  • Asynchronous global index maintenance: DROP/TRUNCATE PARTITION of a table with global indexes can be now a fast operation while keeping the global indexes usable; the “orphaned” index entries are deleted asynchronously [posts by Richard Foote: part 1, part 2, part 3]
  • TRUNCATE/EXCHANGE CASCADE: it is possible to TRUNCATE/EXCHNAGE partitions of the parent table of a reference-partitioned table (assuming the foreign key is defined with ON DELETE CASCADE)
  • Multiple (sub)partitions in single DDL operations: it is possible to handle multiple partitions in a single statement (depending on the partitioning scheme) – ADD | DROP | MERGE | SPLIT | TRUNCATE
  • Partial indexes: it is possible to create an index on a subset of partitions [posts by Richard Foote: part 1, part 2]

Lecture on EBR

In the next meeting of Oracle and ilOUG DBA Forum, on December 10th, I will talk about Edition-Based Redefinition, a powerful and fascinating feature of Oracle (added in version 11.2), that enables application upgrades with zero downtime, while the application is actively used and operational.
I will explain how to use it, give some tips from my hands-on experience with EBR, and present examples.

External Tables

This is a feature I use a lot…

An external table in Oracle is a table that its structure is defined in the data dictionary, but its content is taken from a standalone file, and is not stored inside the database. This feature is very useful as part of an ETL process, where you need to load the content of some files into the database.

The external file must be located in the file system of the database server machine (or at least in a directory that it can access). This is the major drawback I see in this feature with respect to its predecessor, SQL*Loader (which loads files located at the client machine). But the advantages are significantly superior to the disadvantages.

The definition of the external table describes how to locate, parse and process the file: Continue reading “External Tables”

Literals

It is possible to specify a date value as a literal, like that:

DATE '2013-08-29'

This instantiates a date constant, which is equivalent to the result of

to_date('29/08/2013','dd/mm/yyyy')

The date format in this case is always yyyy-mm-dd, and is not influenced by any parameter (the default format of the to_date function is defined by the NLS_DATE_FORMAT parameter).

For example, to find all the employees that were hired during August 2013 we can use the following query:

select * 
from employees 
where hire_date >= date'2013-08-01' 
  and hire_date < date'2013-09-01';

To specify a value that contain also a time part as a literal, you can use:

TIMESTAMP '2013-08-29 13:45:00'

The format is always yyyy-mm-dd hh24:mi:ss (fractions of seconds may also be specified).

The MERGE statement

The SQL statement MERGE (available from version 9i) provides the functionality of “INSERT or UPDATE” in a single statement.

The basic syntax is:

MERGE INTO <target>
USING <source>
ON (<join conditions>)
WHEN MATCHED THEN UPDATE <SET clause>
WHEN NOT MATCHED THEN INSERT (<column list>) VALUES (<expression list>);

where:

  • <target> is the table that we’d like to insert new records into and/or update existing records in
  • <source> is usually some SQL query (it may also be simply a table)
  • <join conditions> specify the relationship between the source and the target – these conditions determine whether a row from the source exists in the target or not

Each row from the source is looked up in the target table, using the conditions specified in the ON clause. If the row exists in the target table, then the WHEN MATCHED clause is executed – the target row is updated. If the row does not exist in the target table, then the WHEN NOT MATCHED clause is executed – a new row is inserted into the target table.

Let’s take an example. Continue reading “The MERGE statement”

Course Addendum

I lectured about Oracle SQL and PL/SQL in a course lately. As always, some issues that had not been part of the planned scope were raised and discussed, so I thought to write short posts about some of them, for the benefit of the participants and everyone else that may find it relevant.

These issues are: