Huge Pages Speeds Startup of Web Servers

We had issues deploying changes to a large web farm a few weeks back. We had automated the startup of the 25 web servers using a DevOps tool but when it tried to bring them all up at once it took forever and the web servers got timeout errors. We resolved this by putting in huge pages on the metadata database.

I mentioned the unexpectedly large impact of huge pages on login time in a previous post. But, we had not realized that a small metadata database with a 5 gigabyte SGA would need huge pages. I guess I should mention that this is 11.2.0.3 Oracle on Linux. The web farm has 25 servers with 25 database connections each. I think the number of connections from each server should be 5 or less but I have not convinced others to reduce the number. If you have a metadata database with 4 cores and 625 connections from 25 web servers, the web servers cannot use all of those connections. A 4-core database server can only process 10 or 20 SQL statements at once if they are I/O centric. If they are all CPU you are looking at 4 concurrent active SQL statements. If you kick off 625 SQL statements at once the load will go through the roof and everything will timeout. I thought about putting in shared servers on the metadata database to force the 625 sessions to funnel through 12 shared servers so the database server would not be overwhelmed should all 625 connections choose to run at once. But the only problem we had was with logins. All 625 were logging in at the same time when the web farm was started by our DevOps tool. Huge pages resolved this issue by making the login time 10-20 times faster.

The database was in a 16 gigabyte Linux VM with 4 cores and a 5 gigabyte SGA. Without huge pages each login took about 2.5 seconds. With it the logins took about .15 seconds. Without huge pages the load on the database server went over 100 when all 625 sessions started at once. With huge pages the load never exceeded 1. Without huge pages the web farm never came up cleanly and with it the farm came up quickly without error. So, if you have a Linux VM with several gigabytes in your SGA you might want to use huge pages.

Bobby

P.S. This may be an 11.2 bug. Ever since I noticed this behavior the dramatic time difference between with and without huge pages seemed too large to be caused by having to spin up a larger page table with 4K pages. 2.5 seconds of CPU is an eternity just to set up a page table for 5 gigabytes worth of virtual memory. I timed login on an 12.1.0.2 and 18c database with 5 gig SGA and it was around .15 seconds without huge pages. I verified that the problem exists on 11.2.0.4 as well as 11.2.0.3. I think our 11.2 systems are still on RedHat Linux 6 so it may be a combination of database and OS version.

P.P.S This is why blogging about Oracle is such a pain sometimes. It is hard to get things right without extensive testing. But then again, the stuff I said was all true. Switching to huge pages did resolve our issues. I guess as always, your mileage may vary.

P.P.P.S. See the comments below about the pre_page_sga=TRUE. The people who commented are correct. As far back as Oracle 9.2 databases with
pre_page_sga=TRUE and large SGAs can result in logins taking a couple of seconds. See this Oracle bug:

Bug 5072402 : SLOW CONNECT TIME WITH SQLPLUS TO INSTNACE WITH 22GIG SGA AND PRE_PAGE_SGA

According the Oracle, Bug 5072402 is not a bug, but intended behavior. Oracle changed this behavior in Oracle 12 according to this document:

PRE_PAGE_SGA Behaviour Change in Oracle Release 12c (Doc ID 1987975.1)

My own testing proves out the idea that pre_page_sga=TRUE causes slowness in logins 11.2 and does not cause it in 12.1.

My laptop 11.2.0.3

Slow with pre_page_sga=TRUE

real 0m1.039s

pre_page_sga boolean TRUE
sga_max_size big integer 3G
sga_target big integer 3G

Fast with pre_page_sga=FALSE

real 0m0.218s

pre_page_sga boolean FALSE
sga_max_size big integer 3G
sga_target big integer 3G

My laptop 12.1.0.2

real 0m0.084s

pre_page_sga boolean TRUE
sga_max_size big integer 3G
sga_target big integer 3G

No HugePages

PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES
4K Configured 4 786436
2048K 0 1537 0

PPPPS:

I have a production HP-UX Itanium server with a 46 gig SGA and pre_page_sga=TRUE. It takes about 1 second to login. I think it is the same effect, but on Linux the 4K page size may exaggerate the effect of pre_page_sga=TRUE whereas HP-UX may handle large SGA’s better. Anyway, I learned something new. Thank you for those who participated in the comments.

I spun up a Delphix Oracle 11.2.0.3 HP-UX Itanium VDB with 32 gig SGA and pre_page_sga=TRUE and the login time was instantaneous. Odd. I guess your mileage may vary still applies.

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.
This entry was posted in Uncategorized. Bookmark the permalink.

11 Responses to Huge Pages Speeds Startup of Web Servers

  1. zhwsh says:

    I think you set pre_page_sga=true.

    • Bobby says:

      The databases do have pre_page_sga set to True. Do you think the login time would be faster if pre_page_sga was set to False?

      Bobby

      • Anonymous says:

        From the docs:

        PRE_PAGE_SGA can increase the process startup duration, because every process that starts must access every page in the SGA. The cost of this strategy is fixed; however, you might simply determine that 20,000 pages must be touched every time a process starts. This approach can be useful with some applications, but not with all applications. Overhead can be significant if your system frequently creates and destroys processes by, for example, continually logging on and logging off.

        https://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams201.htm#REFRN10174

        You have alleviated this side effect by using huge pages, because smaller number of pages must be touched on process startup.

        • Bobby says:

          I think pre-paging the SGA makes sense. There might be a tiny amount of overhead from having all the pages in memory and having a larger page table. But, it cannot take 2.5 seconds to setup the page table for 5 gigabytes of memory. It does not take that long in 12.1 or 18c. I am pretty sure the docs are referring to a much smaller effect when they talk about increasing the process startup duration. My guess is that there is some bug in 11.2.

          Bobby

          • Bobby says:

            I guess Oracle does not consider this behavior to be a bug. It has been true way back to Oracle 9. I will update the post.

            Bobby

  2. zhwsh says:

    if you set pre_page_sga=true, execute

    $ top -u oracle

    Check that the RES column is very close to the SGA settings, which has an impact.

    If not, there will be no problem.

    • Bobby says:

      Thanks. I will check it out when I get in the office. I wonder why we would want pre_page_sga=TRUE? It seems like more trouble than it is worth at least on 11.2 and earlier dbs.

      Bobby

    • Bobby says:

      I looked around and even on the databases with slow logins RES seemed fine. I don’t really understand why pre_page_sga is causing the login slowness or what it was originally intended to do. Why would you mess with But, in the future I can just turn it off if I need to speed up logins on 11.2 or earlier DBs. One interesting thing about RES in top is that it does not seem to include HugePages.

      Bobby

  3. zhwsh says:

    My friend managed the slow start-up of the database, and I found this problem. When pre_page_sga=true, started database
    and running top-u oracle, found that RES was increasing, and it took 4 minutes to startup( sga=50G).But if you enable
    hugepages + pre_page_sga=true, you can’t see such a phenomenon.

    I think pre_page_sga=true and not using hugepages,Res size equals sga ,It means that each session requires a number of
    pages (size equal to SGA) that need to be touched,This is also the reason why the login is slow, and you can also find
    that cpu is queued(see vmstat output procs column r and b ).

    oddly, if you use 12c, pre_page_sga = true (default setting)) and not using hugepages, You can’t see a lot of RES.

    btw Set pre_page_sga=true by default at 12C. I think PDB is the general trend. If you start database
    (pre_page_sga=false) with very little memory, you may not be able to get memory when you start PDB later.

    However, it is better to use hugeapges + pre_page_sga=true, I think.

    • Bobby says:

      Thanks for all of your comments. I appreciate you and the other person who commented about the pre_page_sga parameter. I learned something new! 🙂

      Bobby

  4. zhwsh says:

    My tests are as follows:

    $ cd /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs
    $ cat initxxxx.ora
    db_name=xxxx
    instance_name=xxxx
    sga_target=20G
    #sga_target=0
    sga_max_size=20G
    pre_page_sga=true

    $ export ORACLE_SID=xxxx

    SYS@xxxx> startup nomount
    ORACLE instance started.
    Total System Global Area 2.1379E+10 bytes
    Fixed Size 2262656 bytes
    Variable Size 2818574720 bytes
    Database Buffers 1.8522E+10 bytes
    Redo Buffers 36073472 bytes

    $ grep -i page /proc/meminfo
    AnonPages: 239336 kB
    PageTables: 701008 kB
    AnonHugePages: 0 kB
    HugePages_Total: 104
    HugePages_Free: 48
    HugePages_Rsvd: 41
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    –//PageTables=701008 kB

    $ time sqlplus -s -l / as sysdba <<<quit
    real 0m4.944s
    user 0m0.024s
    sys 0m0.016s
    –//5 seconds.

    $ cat aa.sql
    quit

    $ time seq 50 | xargs -I{} -P 50 sqlplus -s -l / as sysdba @ aa.sql
    real 0m13.978s
    user 0m1.833s
    sys 0m1.017s
    –//It takes 14 seconds to complete all of it.

    $ strace -fTr -o /tmp/a2.txt sqlplus -s -l / as sysdba <<<quit
    $ awk '{print $2}' /tmp/a2.txt | sort -nr | head
    5.041521
    0.287119
    0.064745
    0.008298
    0.006386
    0.004288
    0.004272
    0.001319
    0.000982
    0.000978

    –//cat /tmp/a2.txt
    3366 0.000067 stat("/u01/app/oracle/product/11.2.0.4/dbhome_1/lib", {st_mode=S_IFDIR|0755, st_size=12288, …}) = 0
    3366 0.000112 chdir(“/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs”) = 0
    3366 5.041521 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c0b3000
    3366 0.000407 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c090000
    3366 0.000127 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c06d000
    3366 0.000218 getcwd(“/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs”…, 256) = 46
    3366 0.000087 access(“/etc/intchg.ora”, F_OK) = -1 ENOENT (No such file or directory)

    –//Which means there are 5 seconds between the chdir call and the mmap call? (chdir only takes 0.000027 seconds). strace doesn’t see it!

    $ cat aa.sql
    host sleep 1
    quit

    $ cat a.sh
    #! /bin/bash
    export ORACLE_SID=xxxx
    sqlplus -s -l / as sysdba @ aa.sql &
    a=$(($!+2))
    top -p $a -d 1 -b

    $ . a.sh | tee /tmp/aa.txt

    $ grep PID /tmp/aa.txt | head -1; egrep ‘oracle’ /tmp/aa.txt
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    8498 oracle 20 0 20.2g 1.5g 1.5g R 93.9 1.2 0:00.47 oracle
    8498 oracle 20 0 20.2g 5.8g 5.8g R 99.7 4.6 0:01.47 oracle
    8498 oracle 20 0 20.2g 10g 10g R 99.8 8.4 0:02.47 oracle
    8498 oracle 20 0 20.2g 15g 15g R 99.8 12.2 0:03.47 oracle
    –//sleep 1
    8498 oracle 20 0 20.2g 19g 19g S 95.9 15.7 0:04.43 oracle
    8498 oracle 20 0 20.2g 19g 19g S 0.0 15.7 0:04.43 oracle
    8498 oracle 20 0 20.0g 19g 19g R 1.0 15.6 0:04.44 oracle

    You can see that RES,SHR is increasing, with CPU usage at 9x%, and it feels like the 11g version is having a problem
    setting up pre_page_sga=true. The approximate positioning time is located in the process of establishing pagetables.

    if using hugepages+pre_page_sga=true.

    $ grep -i page /proc/meminfo
    AnonPages: 244144 kB
    PageTables: 11508 kB
    AnonHugePages: 0 kB
    HugePages_Total: 26000
    HugePages_Free: 15800
    HugePages_Rsvd: 41
    HugePages_Surp: 0
    Hugepagesize: 2048 kB

    $ time sqlplus -s -l / as sysdba <<<quit
    real 0m0.180s
    user 0m0.029s
    sys 0m0.011s

    $ strace -fTr -o /tmp/a3.txt sqlplus -s -l / as sysdba <<<quit
    $ awk '{print $2}' /tmp/a3.txt | sort -nr | head
    0.121929
    0.009182
    0.009144
    0.009047
    0.005081
    0.002050
    0.001984
    0.001932
    0.001321
    0.001243

    –//cat a3.txt
    8716 0.000095 chdir("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs") = 0
    –//The waiting time here is very short.
    8716 0.121929 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f552659d000
    8716 0.000143 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f552657a000
    8716 0.000166 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5526557000
    8716 0.000260 getcwd(“/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs”…, 256) = 46

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.