We had issues deploying changes to a large web farm a few weeks back. We had automated the startup of the 25 web servers using a DevOps tool, but when it tried to bring them all up at once it took forever and the web servers got timeout errors. We resolved this by putting in huge pages on the metadata database.
I mentioned the unexpectedly large impact of huge pages on login time in a previous post. But we had not realized that a small metadata database with a 5 gigabyte SGA would need huge pages. I guess I should mention that this is 11.2.0.3 Oracle on Linux.

The web farm has 25 servers with 25 database connections each. I think the number of connections from each server should be 5 or fewer, but I have not convinced others to reduce the number. If you have a metadata database with 4 cores and 625 connections from 25 web servers, the web servers cannot use all of those connections. A 4-core database server can only process 10 or 20 SQL statements at once if they are I/O centric. If they are all CPU-bound, you are looking at 4 concurrent active SQL statements. If you kick off 625 SQL statements at once the load will go through the roof and everything will time out. I thought about putting in shared servers on the metadata database to force the 625 sessions to funnel through 12 shared servers so the database server would not be overwhelmed should all 625 connections choose to run at once.

But the only problem we had was with logins. All 625 sessions were logging in at the same time when the web farm was started by our DevOps tool. Huge pages resolved this issue by making the login time 10 to 20 times faster.
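If you want to get a feel for the effect on your own system, a crude test along these lines works. This is only a sketch: quit.sql is just a file containing the word quit, and it assumes OS authentication on the database server.

$ echo quit > quit.sql
$ time sqlplus -s / as sysdba @quit.sql
$ time seq 25 | xargs -P 25 -I{} sqlplus -s / as sysdba @quit.sql

The second command fires off 25 logins at once, a small-scale version of what our DevOps tool did with 625.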
The database was in a 16 gigabyte Linux VM with 4 cores and a 5 gigabyte SGA. Without huge pages each login took about 2.5 seconds. With them the logins took about 0.15 seconds. Without huge pages the load on the database server went over 100 when all 625 sessions started at once. With huge pages the load never exceeded 1. Without huge pages the web farm never came up cleanly, and with them the farm came up quickly without error. So, if you have a Linux VM with several gigabytes in your SGA you might want to use huge pages.
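If you want to try huge pages, the setup for a 5 gigabyte SGA with 2 megabyte huge pages looks roughly like this. These are sample numbers, not our exact values; the hugepages_settings.sh script from My Oracle Support is the better way to size it.

# /etc/sysctl.conf - enough 2 MB pages to cover the SGA plus a little slack
vm.nr_hugepages = 2600
# /etc/security/limits.conf - let the oracle user lock that much memory (values in KB)
oracle soft memlock 5324800
oracle hard memlock 5324800

Then reboot (or run sysctl -p and restart the instance) and check /proc/meminfo or the Large Pages section of the alert log to make sure the SGA really ended up in huge pages. On 11.2 you can also set use_large_pages=ONLY so the instance refuses to start unless the whole SGA fits in huge pages.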
Bobby
P.S. This may be an 11.2 bug. Ever since I noticed this behavior, the dramatic time difference between having huge pages and not having them has seemed too large to be caused by having to build a larger page table out of 4K pages. 2.5 seconds of CPU is an eternity just to set up a page table for 5 gigabytes worth of virtual memory. I timed logins on a 12.1.0.2 and an 18c database with a 5 gig SGA and they were around 0.15 seconds without huge pages. I verified that the problem exists on 11.2.0.4 as well as 11.2.0.3. I think our 11.2 systems are still on Red Hat Linux 6, so it may be a combination of database and OS version.
P.P.S. This is why blogging about Oracle is such a pain sometimes. It is hard to get things right without extensive testing. But then again, the stuff I said was all true. Switching to huge pages did resolve our issues. I guess, as always, your mileage may vary.
P.P.P.S. See the comments below about pre_page_sga=TRUE. The people who commented are correct. As far back as Oracle 9.2, databases with pre_page_sga=TRUE and large SGAs can have logins that take a couple of seconds. See this Oracle bug:
Bug 5072402 : SLOW CONNECT TIME WITH SQLPLUS TO INSTNACE WITH 22GIG SGA AND PRE_PAGE_SGA
According to Oracle, Bug 5072402 is not a bug but intended behavior. Oracle changed this behavior in Oracle 12 according to this document:
PRE_PAGE_SGA Behaviour Change in Oracle Release 12c (Doc ID 1987975.1)
My own testing bears out the idea that pre_page_sga=TRUE causes slow logins in 11.2 and does not cause them in 12.1.
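To flip the parameter between runs (it is static, so it takes a restart; these are the generic commands, not a transcript from my laptop):

SQL> alter system set pre_page_sga=false scope=spfile;
SQL> shutdown immediate
SQL> startup
SQL> show parameter pre_page_sga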
My laptop 11.2.0.3
Slow with pre_page_sga=TRUE
real 0m1.039s
pre_page_sga boolean TRUE
sga_max_size big integer 3G
sga_target big integer 3G
Fast with pre_page_sga=FALSE
real 0m0.218s
pre_page_sga boolean FALSE
sga_max_size big integer 3G
sga_target big integer 3G
My laptop 12.1.0.2
real 0m0.084s
pre_page_sga boolean TRUE
sga_max_size big integer 3G
sga_target big integer 3G
No HugePages
PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES
      4K       Configured               4           786436
   2048K                0            1537                0
P.P.P.P.S. I have a production HP-UX Itanium server with a 46 gig SGA and pre_page_sga=TRUE. It takes about 1 second to log in. I think it is the same effect, but on Linux the 4K page size may exaggerate the effect of pre_page_sga=TRUE, whereas HP-UX may handle large SGAs better. Anyway, I learned something new. Thank you to those who participated in the comments.
I spun up a Delphix Oracle 11.2.0.3 HP-UX Itanium VDB with a 32 gig SGA and pre_page_sga=TRUE and the login time was instantaneous. Odd. I guess "your mileage may vary" still applies.
I think you set pre_page_sga=true.
The databases do have pre_page_sga set to True. Do you think the login time would be faster if pre_page_sga was set to False?
Bobby
From the docs:
PRE_PAGE_SGA can increase the process startup duration, because every process that starts must access every page in the SGA. The cost of this strategy is fixed; however, you might simply determine that 20,000 pages must be touched every time a process starts. This approach can be useful with some applications, but not with all applications. Overhead can be significant if your system frequently creates and destroys processes by, for example, continually logging on and logging off.
https://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams201.htm#REFRN10174
You have alleviated this side effect by using huge pages, because a smaller number of pages must be touched on process startup.
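For a 5 gigabyte SGA the difference is roughly:

$ echo $((5 * 1024 * 1024 / 4))   # 4K pages to touch: 1310720
$ echo $((5 * 1024 / 2))          # 2 MB huge pages to touch: 2560

About 1.3 million page touches per login versus about 2,500.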
I think pre-paging the SGA makes sense. There might be a tiny amount of overhead from having all the pages in memory and having a larger page table. But it cannot take 2.5 seconds to set up the page table for 5 gigabytes of memory. It does not take that long in 12.1 or 18c. I am pretty sure the docs are referring to a much smaller effect when they talk about increasing the process startup duration. My guess is that there is some bug in 11.2.
Bobby
I guess Oracle does not consider this behavior to be a bug. It has been this way as far back as Oracle 9. I will update the post.
Bobby
If you set pre_page_sga=true, execute:
$ top -u oracle
Check whether the RES column is very close to the SGA size; if it is, pre_page_sga is having an impact. If not, there will be no problem.
Thanks. I will check it out when I get in the office. I wonder why we would want pre_page_sga=TRUE? It seems like more trouble than it is worth at least on 11.2 and earlier dbs.
Bobby
I looked around and even on the databases with slow logins RES seemed fine. I don't really understand why pre_page_sga is causing the login slowness or what it was originally intended to do. But in the future I can just turn it off if I need to speed up logins on 11.2 or earlier DBs. One interesting thing about RES in top is that it does not seem to include HugePages.
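The huge page usage does show up in /proc/meminfo, so something like this shows it instead:

$ grep -i huge /proc/meminfo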
Bobby
My friend was dealing with a slow database startup, and that is how I found this problem. With pre_page_sga=true, I started the database and ran top -u oracle, and saw RES increasing; it took 4 minutes to start up (sga=50G). But if you enable hugepages + pre_page_sga=true, you do not see that behavior.
I think that with pre_page_sga=true and without hugepages, RES grows until it equals the SGA size. That means each session has to touch a number of pages covering the whole SGA, which is why logins are slow, and you can also see the CPU queuing up (see the r and b columns under procs in the vmstat output).
Oddly, if you use 12c with pre_page_sga=true (the default setting) and without hugepages, you do not see RES grow like that.
By the way, pre_page_sga=true is the default in 12c. I think PDBs are the general trend; if you start a database with pre_page_sga=false and very little memory, you may not be able to get memory when you start a PDB later. However, I think it is better to use hugepages + pre_page_sga=true.
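One way to check whether it is still at the default on a particular database:

SQL> select name, value, isdefault from v$parameter where name = 'pre_page_sga';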
Thanks for all of your comments. I appreciate you and the other person who commented about the pre_page_sga parameter. I learned something new! 🙂
Bobby
My tests are as follows:
$ cd /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs
$ cat initxxxx.ora
db_name=xxxx
instance_name=xxxx
sga_target=20G
#sga_target=0
sga_max_size=20G
pre_page_sga=true
$ export ORACLE_SID=xxxx
SYS@xxxx> startup nomount
ORACLE instance started.
Total System Global Area 2.1379E+10 bytes
Fixed Size 2262656 bytes
Variable Size 2818574720 bytes
Database Buffers 1.8522E+10 bytes
Redo Buffers 36073472 bytes
$ grep -i page /proc/meminfo
AnonPages: 239336 kB
PageTables: 701008 kB
AnonHugePages: 0 kB
HugePages_Total: 104
HugePages_Free: 48
HugePages_Rsvd: 41
HugePages_Surp: 0
Hugepagesize: 2048 kB
--//PageTables=701008 kB
$ time sqlplus -s -l / as sysdba <<<quit
real 0m4.944s
user 0m0.024s
sys 0m0.016s
--//5 seconds.
$ cat aa.sql
quit
$ time seq 50 | xargs -I{} -P 50 sqlplus -s -l / as sysdba @ aa.sql
real 0m13.978s
user 0m1.833s
sys 0m1.017s
--//It takes 14 seconds to complete all of it.
$ strace -fTr -o /tmp/a2.txt sqlplus -s -l / as sysdba <<<quit
$ awk '{print $2}' /tmp/a2.txt | sort -nr | head
5.041521
0.287119
0.064745
0.008298
0.006386
0.004288
0.004272
0.001319
0.000982
0.000978
--//cat /tmp/a2.txt
3366 0.000067 stat("/u01/app/oracle/product/11.2.0.4/dbhome_1/lib", {st_mode=S_IFDIR|0755, st_size=12288, …}) = 0
3366 0.000112 chdir("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs") = 0
3366 5.041521 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c0b3000
3366 0.000407 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c090000
3366 0.000127 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4c2c06d000
3366 0.000218 getcwd("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs"..., 256) = 46
3366 0.000087 access("/etc/intchg.ora", F_OK) = -1 ENOENT (No such file or directory)
…
--//Which means there are 5 seconds between the chdir call and the mmap call (chdir itself only takes 0.000027 seconds). strace does not show where the time goes!
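One way to see where the time actually goes is to look at the page fault counters on the oracle dedicated server process forked for the session, since pre_page_sga makes that process touch every SGA page, for example (substitute the real server PID):

$ ps -o pid,comm,min_flt,maj_flt -p <server pid>

With pre_page_sga=true and 4K pages I would expect min_flt to be on the order of the number of 4K pages in the SGA.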
$ cat aa.sql
host sleep 1
quit
$ cat a.sh
#! /bin/bash
export ORACLE_SID=xxxx
sqlplus -s -l / as sysdba @ aa.sql &
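# the oracle dedicated server gets the PID two above the sqlplus PID ($!) on this box - a fragile guess, but it lets batch-mode top watch that one process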
a=$(($!+2))
top -p $a -d 1 -b
$ . a.sh | tee /tmp/aa.txt
…
$ grep PID /tmp/aa.txt | head -1; egrep 'oracle' /tmp/aa.txt
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8498 oracle 20 0 20.2g 1.5g 1.5g R 93.9 1.2 0:00.47 oracle
8498 oracle 20 0 20.2g 5.8g 5.8g R 99.7 4.6 0:01.47 oracle
8498 oracle 20 0 20.2g 10g 10g R 99.8 8.4 0:02.47 oracle
8498 oracle 20 0 20.2g 15g 15g R 99.8 12.2 0:03.47 oracle
--//sleep 1
8498 oracle 20 0 20.2g 19g 19g S 95.9 15.7 0:04.43 oracle
8498 oracle 20 0 20.2g 19g 19g S 0.0 15.7 0:04.43 oracle
8498 oracle 20 0 20.0g 19g 19g R 1.0 15.6 0:04.44 oracle
You can see that RES and SHR keep increasing while CPU usage sits at 9x%, and it feels like the 11g version has a problem with pre_page_sga=true. Roughly speaking, the time can be traced to the process of building the page tables.
If using hugepages + pre_page_sga=true:
$ grep -i page /proc/meminfo
AnonPages: 244144 kB
PageTables: 11508 kB
AnonHugePages: 0 kB
HugePages_Total: 26000
HugePages_Free: 15800
HugePages_Rsvd: 41
HugePages_Surp: 0
Hugepagesize: 2048 kB
$ time sqlplus -s -l / as sysdba <<<quit
real 0m0.180s
user 0m0.029s
sys 0m0.011s
$ strace -fTr -o /tmp/a3.txt sqlplus -s -l / as sysdba <<<quit
$ awk '{print $2}' /tmp/a3.txt | sort -nr | head
0.121929
0.009182
0.009144
0.009047
0.005081
0.002050
0.001984
0.001932
0.001321
0.001243
--//cat /tmp/a3.txt
8716 0.000095 chdir("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs") = 0
--//The waiting time here is very short.
8716 0.121929 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f552659d000
8716 0.000143 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f552657a000
8716 0.000166 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5526557000
8716 0.000260 getcwd("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs"..., 256) = 46