Not-Null impact on performance : Another special case (Null-Aware Anti-Join “HASH JOIN ANTI NA”)

I know that there is many blogs explaining why we should explicitly declare a column “NOT NULL” if it will not hold null data (Like the impact on INDEX usage or join method),but here i will only focus on the join method implemented in 11G that’s Null-Aware Anti-Join and specifically on some ‘limitation’ of the “HASH JOIN ANTI NA” that i have observed.

For more information on “Null-Aware Anti-Join” and it’s algorithm take a look at : http://www.vldb.org/pvldb/2/vldb09-423.pdf

Here is a quick test case :

  • Oracle 11.2.0.3.9
  • OS : HP-UX V3

We will create 4 tables (2 tables with not null constraint and 2 without)

drop table t1;
drop table t2;
drop table t3;
drop table t4;
create table t1 as select rownum a,rownum + 1 b,rownum +2 c,rownum + 3 d from dual connect by level < 900000;
create table t2 as select rownum a,rownum + 1 b,rownum +2 c,rownum + 3 d from dual connect by level < 900000;
create table t3 as select rownum a,rownum + 1 b,rownum +2 c,rownum + 3 d from dual connect by level < 900000;
create table t4 as select rownum a,rownum + 1 b,rownum +2 c,rownum + 3 d from dual connect by level < 900000;

alter table t3 modify (a not null,b not null,c not null,d not null);
alter table t4 modify (a not null,b not null,c not null,d not null);

exec DBMS_STATS.GATHER_TABLE_STATS(null,’T1′);
exec DBMS_STATS.GATHER_TABLE_STATS(null,’T2′);
exec DBMS_STATS.GATHER_TABLE_STATS(null,’T3′);
exec DBMS_STATS.GATHER_TABLE_STATS(null,’T4′);

First test with only one null-able column : (set timing on ; set autotrace on)

select count(*) from t1  where a not in  ( select b from t2 ); (Executed many times)

Capture

select  count(*) from t3  where a not in  ( select b from t4 );

Capture

Although the extra steps added in the Null-Aware algorithm,  there is no big differences on execution performance between the two query (the one using “HASH JOIN ANTI ” and “HASH JOIN ANTI NA”).

Second test with two null-able columns :

select count(*) from t1  where (a,b) not in  ( select b,c from t2 );

Capture

select  count(*) from t3  where (a,b) not in  ( select b,c from t4 );

Capture

Wow the query is two times faster with not null constraints.Also watch the execution plan switched from “HASH JOIN ANTI NA” to “MERGE JOIN ANTI NA”. Let’s try to hint a “HASH JOIN ANTI NA” and see what happen.

select     /*+  USE_HASH(@”SEL$5DA710D3″ “T2″@”SEL$2”) */  count(*) from t1  where (a,b) not in  ( select b,c from t2 );

Capture

The query still use  a “MERGE JOIN ANTI NA” let’s look to the optimizer trace file :

Capture

We can see that the “HASH JOIN” method is not evaluated , may be there is some restriction in it’s use (I will investigate that later but there is nothing in the doc or My oracle support) but the important thing is that this affected seriously our execution time (two times slower).

Obviously the Null-Aware Anti-Join drastically  improved the performance of the not in sub-query compared to the filter row source used in previous release (the optimizer was unable to unnest the sub-query in version priory to 11g in this case) but we have also seen that adding a not-null constraint when permitted is always the best solution.

2 thoughts on “Not-Null impact on performance : Another special case (Null-Aware Anti-Join “HASH JOIN ANTI NA”)

  1. Thanks for posting interesting article… (Y)

    Just an observation.. !!

    Also though we mark both columns of table T2 as NOT NULL.

    alter table t2 modify (c not null);
    alter table t2 modify (b not null);

    it still go for MERGE JOIN ANTI SNA , still no Hash join SNA..!!

    ————————————————————————————-
    | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
    ————————————————————————————-
    | 0 | SELECT STATEMENT | | | | | 8907 (100)| |
    | 1 | SORT AGGREGATE | | 1 | 20 | | | |
    | 2 | MERGE JOIN ANTI SNA| | 1 | 20 | | 8907 (1)| 00:00:01 |
    | 3 | SORT JOIN | | 899K| 8789K| 34M| 4453 (1)| 00:00:01 |
    | 4 | TABLE ACCESS FULL| T11 | 899K| 8789K| | 863 (1)| 00:00:01 |
    |* 5 | SORT UNIQUE | | 899K| 8789K| 34M| 4454 (1)| 00:00:01 |
    | 6 | TABLE ACCESS FULL| T2 | 899K| 8789K| | 864 (1)| 00:00:01 |
    ————————————————————————————-

    Predicate Information (identified by operation id):
    —————————————————

    5 – access(INTERNAL_FUNCTION(“A”)=INTERNAL_FUNCTION(“B”) AND
    INTERNAL_FUNCTION(“B”)=INTERNAL_FUNCTION(“C”))
    filter((INTERNAL_FUNCTION(“B”)=INTERNAL_FUNCTION(“C”) AND
    INTERNAL_FUNCTION(“A”)=INTERNAL_FUNCTION(“B”)))

    Also difference observe , between not in constraint columns and Nullable columns for multiple columns..

    is addition of “INTERNAL_FUNCTION” in access predicate.
    which was not observe for single column nullable comparison.

  2. Thanks for your interest 😀 . Like you observed there is other differences between the two executions plan like the use of an internal function. I haven’t run many tests but only one similar to what we faced in our production system to demonstrate some restriction that i observed when choosing the join method “HASH JOIN ANTI NA” or like you observed “HASH JOIN ANTI SNA” and it’s impact on the performance.

    Still there is a question.Why we can no more use a “HASH JOIN ANTI S/NA” in this case ?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s