Mapping Oracle SGA components to NUMA nodes using the NUMA API

How is every SGA component (buffer cache, shared pool, large pool, etc.) distributed across NUMA nodes after instance startup? And what does the distribution look like after dynamically shrinking or growing a memory area? In this post I will show a way to display the distribution of memory components across the different NUMA nodes using the NUMA API.

First of all, we need a NUMA server! Sadly I don't have one in my possession right now, but no problem: we can fake one! Yves Colin already described how in this blog post, and you can find other interesting info here as well. But fake NUMA has one flaw:

“Fake NUMA has one flaw however and that is the CPU mapping to nodes. There would exist nodes that do not show up as having any CPUs (under the cpumap file in the node dir of the above mentioned sysfs file). As per the semantics, a CPU must uniquely belong to a NUMA node. However, inside the kernel, the CPU is mapped to all the fake nodes.”
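For completeness, NUMA emulation is typically enabled by appending the numa=fake parameter to the kernel boot line (this assumes a kernel built with NUMA emulation support; the exact syntax can vary with the kernel version). For example, to fake 4 nodes, add the following to the kernel line in grub.conf and reboot:

numa=fake=4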

TEST SERVER:

  • OEL 6 UEK R4
  • ORACLE 12.1.0.2.6
  • HugePages_Total:    4001
  • Hugepagesize:       2048 kB
  • *.pga_aggregate_target=400M
  • *.sga_target=8000M

Step 1: Building the fake NUMA server

This is my fake NUMA server with 4 nodes:

[Screenshots: the fake NUMA server showing 4 nodes]

As already indicated, we can see that nodes 1, 2 and 3 do not show up as having any CPUs. But this is not very important for the purpose of this post, as we are mainly interested in writing a mini script to display the memory distribution across NUMA nodes.
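If you want to check the layout yourself, the standard numactl utility and sysfs expose it (the output will of course differ per system):

numactl --hardware
ls /sys/devices/system/node/
cat /sys/devices/system/node/node1/cpumap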

Step 2: Mapping shared memory segment pages to NUMA nodes using the NUMA API

Begin by downloading the development package needed for building applications that use NUMA.

“The libnuma library offers a simple programming interface to the NUMA (Non Uniform Memory Access) policy supported by the Linux kernel” ref

Using yum, download the package:

[Screenshot: downloading the NUMA development package with yum]
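On OEL 6 the libnuma headers and library ship in the numactl-devel package (package name taken from the standard OEL/RHEL repositories):

yum install numactl-devel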

Our program will use the get_mempolicy() function to retrieve the node where a specified page address resides. For that, we first need to attach to the shared memory segment using shmat().

This C program lists all the pages mapped to a given shared memory segment together with their corresponding NUMA nodes. It takes as parameters the shared memory identifier and the number of pages (calculated as the shared memory segment size divided by the page size).

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <numa.h>       /* numa_available */
#include <numaif.h>     /* get_mempolicy, MPOL_F_NODE, MPOL_F_ADDR */
#include <sys/shm.h>    /* shmat, shmdt */

/* SGA base address and huge page size are hardcoded:
   adjust them to match your instance (see the note below). */
#define SGA_BASE_ADDR   ((void *) 0x00000061000000)
#define HUGE_PAGE_SIZE  2097152UL   /* 2 MB */

int main(int argc, char **argv)
{
        if (argc != 3) {
                fprintf(stderr, "usage: %s <shmid> <nb_pages>\n", argv[0]);
                return 1;
        }

        if (numa_available() < 0) {
                fprintf(stderr, "System does not support the NUMA API!\n");
                return 1;
        }

        int shmid   = atoi(argv[1]);
        int nb_page = atoi(argv[2]);

        /* attach read-only to the shared memory segment at the SGA base address */
        void *addr = shmat(shmid, SGA_BASE_ADDR, SHM_RDONLY);
        if (addr == (void *) -1) {
                perror("shmop: shmat failed");
                return 1;
        }

        /* align the starting address to a huge page boundary */
        unsigned long a = (unsigned long) addr;
        a = a - (a % HUGE_PAGE_SIZE);

        int i;
        for (i = 0; i < nb_page; i++) {
                void *pa = (void *) a;
                int numa_node = -1;

                /* MPOL_F_NODE | MPOL_F_ADDR makes get_mempolicy() return
                   the node on which the page containing pa resides */
                if (get_mempolicy(&numa_node, NULL, 0, pa,
                                  MPOL_F_NODE | MPOL_F_ADDR) == 0) {
                        printf("%p|%d\n", pa, numa_node);
                } else {
                        fprintf(stderr, "Error %d %p\n", errno, pa);
                        break;
                }

                a = a + HUGE_PAGE_SIZE;   /* move to the next huge page */
        }

        shmdt(addr);
        return 0;
}

NOTE: the shared memory base address and the page size are not passed as parameters, so they have to be changed in the code if your setup differs.

Let's compile the program and test it:

gcc sga_page_to_node.c -o sga_page_to_node -lnuma

Check the shared memory info:

[Screenshots: shared memory segment information]

  • Shared memory base address: 0x00000061000000
  • Shared memory id: 1225097220
  • Shared memory segment size: 7680 MB; divided by the 2 MB huge page size, this gives us 3992 pages
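These values come from standard tools: ipcs -m lists the System V shared memory segments (id and size), and Oracle's sysresv utility prints the IDs owned by the instance:

ipcs -m
$ORACLE_HOME/bin/sysresv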

Run the program:

./sga_page_to_node 1225097220 3992 > page_to_node.txt

The output format is page address|NUMA node:

[Screenshot: huge page addresses and their NUMA nodes]

We can clearly see how the huge memory pages are interleaved across the NUMA nodes.

Step 3: Mapping the SGA to granules

We will use the following two fixed tables for mapping SGA components to granules: X$KSMGE and X$KMGSCT (the granule size is 16 MB).
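As a quick sanity check, the granule size can also be read from V$SGAINFO:

SELECT name, bytes FROM v$sgainfo WHERE name = 'Granule Size';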

[Screenshot: SGA components mapped to granules]

Step 4: Mapping SGA components to NUMA nodes

Create an external table to read the file generated in step 2, then join everything together.

External table definition:


CREATE OR REPLACE DIRECTORY ext AS '/home/oracle/scripts';

CREATE TABLE ext_tab (
page_address  CHAR(18),
numa_node  CHAR(2)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY ext
    ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE 'bad_%a_%p.bad'
    LOGFILE 'log_%a_%p.log'
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (page_address, numa_node))
    LOCATION ('page_to_node.txt')
  )
PARALLEL
REJECT LIMIT 0
NOMONITORING;
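A quick sanity check on the external table (the expected row count assumes the 3992-page file generated earlier):

SELECT COUNT (*) FROM ext_tab;

SELECT * FROM ext_tab WHERE ROWNUM <= 5;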

Every granule is composed of 8 huge pages (16 MB / 2 MB) interleaved between the NUMA nodes:


WITH sga_granules
     AS (SELECT t.COMPONENT,
                g.BASEADDR,
                TO_NUMBER (g.BASEADDR, 'XXXXXXXXXXXXXXXX') AS begin,
                TO_NUMBER (g.BASEADDR, 'XXXXXXXXXXXXXXXX') + g.GRANSIZE - 1
                   AS end
           FROM X$KSMGE g, x$kmgsct t
          WHERE t.GRANTYPE = g.GRANTYPE),
     page_to_node
     AS (SELECT PAGE_ADDRESS,
                TO_NUMBER (SUBSTR (PAGE_ADDRESS, 3), 'XXXXXXXXXXXXXXXX')
                   AS page_address_dec,
                NUMA_NODE
           FROM ext_tab),
     granule_to_node
     AS (SELECT *
           FROM sga_granules s, page_to_node p
          WHERE page_address_dec BETWEEN begin AND end)
  SELECT * 
    FROM granule_to_node;

 

[Screenshot: each granule's huge pages and their NUMA nodes]

Map SGA components to NUMA nodes:


WITH sga_granules
     AS (SELECT t.COMPONENT,
                g.BASEADDR,
                TO_NUMBER (g.BASEADDR, 'XXXXXXXXXXXXXXXX') AS begin,
                TO_NUMBER (g.BASEADDR, 'XXXXXXXXXXXXXXXX') + g.GRANSIZE - 1
                   AS end
           FROM X$KSMGE g, x$kmgsct t
          WHERE t.GRANTYPE = g.GRANTYPE),
     page_to_node
     AS (SELECT PAGE_ADDRESS,
                TO_NUMBER (SUBSTR (PAGE_ADDRESS, 3), 'XXXXXXXXXXXXXXXX')
                   AS page_address_dec,
                NUMA_NODE
           FROM ext_tab),
     granule_to_node
     AS (SELECT *
           FROM sga_granules s, page_to_node p
          WHERE page_address_dec BETWEEN begin AND end)
  SELECT COMPONENT, NUMA_NODE, COUNT (*) AS pages_count
    FROM granule_to_node
GROUP BY COMPONENT, NUMA_NODE
ORDER BY COMPONENT, NUMA_NODE;

[Screenshots: huge page count per SGA component and NUMA node]

The memory is evenly distributed across the NUMA nodes. Let's increase the shared pool size and recheck:

alter system set shared_pool_size=3G;
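Growing the shared pool only moves granules between components inside the already allocated SGA, so the segment id and the page count do not change; simply regenerate the mapping file with the same command before re-running the query:

./sga_page_to_node 1225097220 3992 > page_to_node.txt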

[Screenshot: distribution after growing the shared pool]

Great, the memory is still evenly distributed across the NUMA nodes!

You can try different scenarios and use this script to check your memory distribution. Have fun!

That’s it 😀

 
