In-Memory : Optimizer cost calculation some anomaly ! (IM Quotient, IM Journal Rows)

When investigating how the optimizer calculate the cost of In-memory operation i hit some anomaly. Let’s begin with a big overview of how the cost of in-memory operation is calculated (For more info please check http://www.vldb.org/pvldb/vol8/p1770-das.pdf ). The cost of in-memory operation is based on this new keys statistics which are maintained in real time (as stated by the previous doc and demonstrated by Riyaj Shamsudeen https://t.co/tp0Q1sNZrB for example function “ktmpjUpdInMemExtStat”  update In-memory extent statistics in the same session ) :

  • Number of in-memory blocks
  • Number of in-memory rows
  • Number of in-memory transaction journal rows
  • In-memory quotient : is the ratio of in-memory blocks to the number of on disk blocks

The cost of in-memory operation is based on a combination of the standard cost model formula and the new enhanced in-memory cost formula (Storage  index  pruning  cost,Decompression   cost,Predicate  evaluation  cost,Row  stitching  cost,Transaction   journal scan cost,etc) depending on the in-memory  quotient . As stated by the previous document : “The in-memory statistics will reflect  the  partial  population  status:  the  in-memory  quotient  will be q (< 1) and number of in-memory blocks will be Dq (< B). A scan of this table will require reading both the in-memory and on-disk portions.  The in-memory scan cost is computed as described above  using  the  in-memory  statistics.    The  on-disk  scan  cost  is computed using the standard cost model for row  major tables but with  prorated  statistics  of D(1-q)  blocks  and  N(1-q)  rows  (this cost  will  include  both  I/O  and  CPU). These  costs  are  then combined to get the total table scan cost”

It’s clear in this formula how important is the In-memory quotient for determining the final cost and here come the first anomaly :

Note : All test are done in 12.1.0.2.5 (Linux).

First Anomaly (In-memory quotient) :

Let’s create two tables TEST_COST1 and TEST_COST2 of different size that cannot both feet in the in-memory store at the same time.TEST_COST1 is more bigger than the in-memory store so it will consume all the space and there will be no space left for TEST_COST2;

Capture 10

Partially populated TEST_COST1:

Lets begin by checking the cost of a table partially populated in the in-memory store :

check trace file after activating optimizer tracing :

alter session set events ‘trace [IM_optimizer|SQL_optimizer] disk=medium’;
alter session set events ‘10053 TRACE NAME CONTEXT FOREVER, LEVEL 1’;

select count(*) from test_cost1;

kdmoInitSegStats(): objn: 45818
kdmoInitSegStatsInt(): IMC objn: 45818 loopInit: 1
kdmoInitSegStatsInt(): DISTRIBUTE mode ON
kdmoEstStatsRacSeg(): nNodes: 1  nRowsCurr: 0
kdmoDumpSegStats(): IM Quotient: 0.637530  IMCUs: 5  IM Rows: 523293  IM Journal Rows: 13082

 

As expected the statistics IM Quotient is less than 1 (0.637530)  because the table is partially populated   let’s check the cost calculation :

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TEST_COST1  Alias: TEST_COST1
#Rows: 820814  SSZ: 0  LGR: 0  #Blks:  12241  AvgRowLen:  99.00  NEB: 0  ChainCnt:  0.00  SPC: 0  RFL: 0  RNF: 0
#IMCUs: 5  IMCRowCnt: 523293  IMCJournalRowCnt: 13082  #IMCBlocks: 7804  IMCQuotient: 0.637530

We can verify IMCQuotient = #IMCBlocks /  #Blks = 7804 / 12241 = 0.637530

Access path analysis for TEST_COST1
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for TEST_COST1[TEST_COST1]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
Table: TEST_COST1  Alias: TEST_COST1
Card: Original: 820814.000000  Rounded: 820814  Computed: 820814.000000  Non Adjusted: 820814.000000
Scan IO  Cost (Disk) =   1203.000000
Scan CPU Cost (Disk) =   86691839.280000
Scan IO  Cost (IMC)  =   78.040000 (read invalid rows)
+ 2.500000 (read extent map)
=   80.540000
kdmoCostPredEvalPerRow(): CostPredEvalPerRow: 0
Scan CPU Cost (IMC)  =   45000.000000 (overhead)
+ 50000.000000 (traverse IMCU)
+ 0.000000 (min-max eval) (= 200.000000 (min-max eval per cu) * 0 (#min-max CUs))
+ 0.000000 (decompression) (= 0.000000 (per IMCU) * 5 (#IMCUs) * 1.000000 (prune ratio))
+ 0.000000 (pred eval) (= 0.000000 (per row) * 523293 (#IMCrows) * 1.000000 (prune ratio))
+ 0.000000 (row stitch) (= 100.000000 (per col) * 0 (#cols) * 523293 (#IMCUs) * 1.000000 (prune ratio))
+ 47096390.902704 (scan journal) (= 3600.000000 (per row) * 13082 (#journal rows))
=   47191390.902704
Total Scan IO  Cost  =   1203.000000 (scan (Disk))
+ 80.540000 (scan (IMC))
=   1283.540000
Total Scan CPU  Cost =   86691839.280000 (scan (Disk))
+ 47191390.902704 (scan (IMC))
=   133883230.182704
Access Path: TableScan
Cost:  1289.062459  Resp: 1289.062459  Degree: 0
Cost_io: 1283.540000  Cost_cpu: 133883230
Resp_io: 1283.540000  Resp_cpu: 133883230
Best:: AccessPath: TableScan
Cost: 1289.062459  Degree: 1  Resp: 1289.062459  Card: 820814.000000  Bytes: 0.000000

 

The optimizer combined the cost of accessing some rows from the rows store ( Scan IO  Cost (Disk)  and Scan CPU Cost (Disk)) with the cost of accessing the column store (Scan IO  Cost (IMC) and Scan CPU Cost (IMC)).The final cost of the plan is 1289.

Not yet populated table (Out of memory) :

Let’s now check the case of no populated table :

select count(*) from test_cost2;

kdmoInitSegStats(): objn: 45819
kdmoInitSegStatsInt(): IMC objn: 45819 loopInit: 1
kdmoInitSegStatsInt(): DISTRIBUTE mode ON
kdmoDumpSegStats(): IM Quotient: 1.000000  IMCUs: 1  IM Rows: 21178  IM Journal Rows: 529

What ? although the table is no populated in the in-memory store the IM Quotient indicate that it’s fully populated.Let’s check the cost calculation :

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TEST_COST2  Alias: TEST_COST2
  #Rows: 21178  SSZ: 0  LGR: 0  #Blks:  326  AvgRowLen:  100.00  NEB: 0  ChainCnt:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK:
  #IMCUs: 1  IMCRowCnt: 21178  IMCJournalRowCnt: 529  #IMCBlocks: 0  IMCQuotient: 1.000000

#IMCBlocks is 0  . so the correct value for IMCQuotient is 0 / 326 = 0  (It seem buggy)

Access path analysis for TEST_COST2
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TEST_COST2[TEST_COST2]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Table: TEST_COST2  Alias: TEST_COST2
    Card: Original: 21178.000000  Rounded: 21178  Computed: 21178.000000  Non Adjusted: 21178.000000
  Scan IO  Cost (Disk) =   0.000000
  Scan CPU Cost (Disk) =   0.000000
  Scan IO  Cost (IMC)  =   3.260000 (read invalid rows)
                         + 0.500000 (read extent map)
                       =   3.760000
kdmoCostPredEvalPerRow(): CostPredEvalPerRow: 0
  Scan CPU Cost (IMC)  =   45000.000000 (overhead)
                         + 10000.000000 (traverse IMCU)
                         + 0.000000 (min-max eval) (= 200.000000 (min-max eval per cu) * 0 (#min-max CUs))
                         + 0.000000 (decompression) (= 0.000000 (per IMCU) * 1 (#IMCUs) * 1.000000 (prune ratio))
                         + 0.000000 (pred eval) (= 0.000000 (per row) * 21178 (#IMCrows) * 1.000000 (prune ratio))
                         + 0.000000 (row stitch) (= 100.000000 (per col) * 0 (#cols) * 21178 (#IMCUs) * 1.000000 (prune ratio))
                         + 1906020.000000 (scan journal) (= 3600.000000 (per row) * 529 (#journal rows))
                       =   1961020.000000
  Total Scan IO  Cost  =   0.000000 (scan (Disk))
                         + 3.760000 (scan (IMC))
                       =   3.760000
  Total Scan CPU  Cost =   0.000000 (scan (Disk))
                         + 1961020.000000 (scan (IMC))
                       =   1961020.000000
  Access Path: TableScan
    Cost:  3.840889  Resp: 3.840889  Degree: 0
      Cost_io: 3.760000  Cost_cpu: 1961020
      Resp_io: 3.760000  Resp_cpu: 1961020
  Best:: AccessPath: TableScan
         Cost: 3.840889  Degree: 1  Resp: 3.840889  Card: 21178.000000  Bytes: 0.000000

Scan IO  Cost (Disk) and Scan CPU Cost (Disk)  are equal to 0 the optimizer suppose that the table will be read fully from in-memory store !!!

Observation :

It seem that the optimizer will use a value of 1 for IMCQuotient (Fully populated) whenever the table is not populated at all (#IMCBlocks= 0 ), it can be in the first execution when no priority is set (Like the example i demonstrated in this blog post ) or like this case when in-memory store is full.This seems very buggy !!

Second Anomaly (Number of in-memory transaction journal rows) :

You may have noticed that although there is no modification on the previous tables the optimizer indicate that there is some invalid rows indicated by “IMCJournalRowCnt” which is equal  aprox to 0.025 * IMCRowCnt .This statistics will impact the cost calculation.

Let’s populate test_cost2 in-memory and update some rows then check the cost calculation :

alter system set inmemory_max_populate_servers= 0;

update test_cost2 set object_name =’t’  where rownum < 10000;

commit;

SQL>  select OBJD,sum(TOTAL_ROWS),sum(INVALID_ROWS),sum(INVALID_BLOCKS),(sum(INVALID_ROWS)/sum(TOTAL_ROWS))*100 pct_invalid from V$IM_SMU_HEAD group by OBJD order by 5 desc;

       OBJD SUM(TOTAL_ROWS) SUM(INVALID_ROWS) SUM(INVALID_BLOCKS) PCT_INVALID
———- ————— —————– ——————- ———–
     45819           21178              9999                 143  47.2140901

Before update :

kdmoInitSegStats(): objn: 45819
kdmoInitSegStatsInt(): IMC objn: 45819 loopInit: 1
kdmoInitSegStatsInt(): DISTRIBUTE mode ON
kdmoEstStatsRacSeg(): nNodes: 1  nRowsCurr: 0
kdmoDumpSegStats(): IM Quotient: 0.957055  IMCUs: 1  IM Rows: 20269  IM Journal Rows: 507

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TEST_COST2  Alias: TEST_COST2
  #Rows: 21178  SSZ: 0  LGR: 0  #Blks:  326  AvgRowLen:  86.00  NEB: 0  ChainCnt:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0 
  #IMCUs: 1  IMCRowCnt: 20269  IMCJournalRowCnt: 507  #IMCBlocks: 312  IMCQuotient: 0.957055

After update :

kdmoInitSegStats(): objn: 45819
kdmoInitSegStatsInt(): IMC objn: 45819 loopInit: 1
kdmoInitSegStatsInt(): DISTRIBUTE mode ON
kdmoEstStatsRacSeg(): nNodes: 1  nRowsCurr: 0
kdmoDumpSegStats(): IM Quotient: 0.957055  IMCUs: 1  IM Rows: 20269  IM Journal Rows: 507

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TEST_COST2  Alias: TEST_COST2
  #Rows: 21178  SSZ: 0  LGR: 0  #Blks:  326  AvgRowLen:  100.00  NEB: 0  ChainCnt:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK:
  #IMCUs: 1  IMCRowCnt: 20269  IMCJournalRowCnt: 507  #IMCBlocks: 312  IMCQuotient: 0.957055

Observation :

It seems that although we have updated many rows the IMCJournalRowCnt that will influence the final cost is still equal to 0.025 * IMCRowCnt.The final cost seem independent of the number of rows invalidated and so the optimizer can not estimate the real work that have to be done.

That’s it 😀

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s