Check your hints carefully

Back in 2017 I wrote about how I had to disable the result cache after upgrading a database to 11.2.0.4. This week I found one of our top queries, and it looked like removing the result cache hints made it run 10 times faster. That did not make sense, because I had already disabled the result cache. Then I examined the hints more closely. They looked like this:

/*+ RESULT CACHE */

There should be an underscore between the two words. I looked up hints in the manuals and found that CACHE is a real hint. So, I tried the query with these three additional combinations:

/*+ RESULT */

/*+ CACHE */

/*+ RESULT_CACHE */

It ran slowly with the original hint and with the CACHE hint alone, but not with the other two. Apparently Oracle ignored the invalid word RESULT and still applied the valid CACHE hint, since the query behaved the same with /*+ RESULT CACHE */ as with /*+ CACHE */ alone. So, the moral of the story is to check your hints carefully, because they may not be what you think they are.
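Here is a rough sketch of the kind of test I ran; big_table and the sum are hypothetical stand-ins for our real query:

set timing on

select /*+ RESULT CACHE */ sum(b) from big_table;  -- slow: behaved like CACHE
select /*+ CACHE */        sum(b) from big_table;  -- slow
select /*+ RESULT */       sum(b) from big_table;  -- fast: not a valid hint
select /*+ RESULT_CACHE */ sum(b) from big_table;  -- fast: result cache disabled

Comparing the timings of the four versions is what showed that the misspelled hint was acting like a plain CACHE hint.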

Bobby

Posted in Uncategorized | Leave a comment

Huge Pages Speeds Startup of Web Servers

We had issues deploying changes to a large web farm a few weeks back. We had automated the startup of the 25 web servers using a DevOps tool, but when it tried to bring them all up at once the startup took forever and the web servers got timeout errors. We resolved this by configuring huge pages on the metadata database.

I mentioned the unexpectedly large impact of huge pages on login time in a previous post. But we had not realized that a small metadata database with a 5 gigabyte SGA would need huge pages. I should mention that this is 11.2.0.3 Oracle on Linux.

The web farm has 25 servers with 25 database connections each. I think the number of connections from each server should be 5 or less, but I have not convinced others to reduce it. If you have a metadata database with 4 cores and 625 connections from 25 web servers, the web servers cannot make use of all of those connections. A 4-core database server can only process 10 or 20 SQL statements at once if they are I/O-centric. If they are all CPU-bound, you are looking at 4 concurrently active SQL statements. If you kick off 625 SQL statements at once, the load will go through the roof and everything will time out.

I thought about configuring shared servers on the metadata database to force the 625 sessions to funnel through 12 shared servers, so that the database server would not be overwhelmed should all 625 connections choose to run at once; a sketch of that idea is below. But the only problem we actually had was with logins: all 625 sessions were logging in at the same time when the web farm was started by our DevOps tool. Huge pages resolved this issue by making each login 10-20 times faster.
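We never implemented shared servers, but had we gone that route the change might have looked something like this; the parameter values are illustrative, not tested recommendations:

-- cap concurrent work by routing the 625 sessions through a small
-- pool of shared servers instead of dedicated server processes
alter system set shared_servers=12 scope=both;
alter system set max_shared_servers=12 scope=both;
alter system set dispatchers='(PROTOCOL=TCP)(DISPATCHERS=2)' scope=both;

Clients would also need to connect through a shared server connection (SERVER=shared in the TNS entry) rather than a dedicated one.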

The database was in a 16 gigabyte Linux VM with 4 cores and a 5 gigabyte SGA. Without huge pages each login took about 2.5 seconds; with huge pages logins took about .15 seconds. Without huge pages the load on the database server went over 100 when all 625 sessions started at once; with huge pages the load never exceeded 1. Without huge pages the web farm never came up cleanly, and with huge pages the farm came up quickly without error. So, if you have a Linux VM with several gigabytes in your SGA, you might want to use huge pages.
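For reference, here is a rough sketch of sizing huge pages for an SGA like ours. Oracle's support site has a hugepages_settings.sh script that does this properly across all instances on a host; this hypothetical query just shows the idea for one instance, assuming 2 MB Linux huge pages:

-- rough count of 2 MB huge pages needed to hold the SGA, plus a little slack
select ceil(to_number(value) / (2*1024*1024)) + 1 as nr_hugepages
from v$parameter
where name = 'sga_max_size';

You would then set vm.nr_hugepages to at least that number in /etc/sysctl.conf, raise the memlock limit for the oracle user in /etc/security/limits.conf, and restart the instance.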

Bobby

P.S. This may be an 11.2 bug. Ever since I noticed this behavior, the dramatic difference in login time with and without huge pages has seemed too large to be explained by having to build a larger page table out of 4K pages. 2.5 seconds of CPU is an eternity just to set up a page table for 5 gigabytes worth of virtual memory. I timed logins on a 12.1.0.2 and an 18c database with a 5 gig SGA and they were around .15 seconds without huge pages. I verified that the problem exists on 11.2.0.4 as well as 11.2.0.3. I think our 11.2 systems are still on RedHat Linux 6, so it may be a combination of database and OS version.

P.P.S. This is why blogging about Oracle is such a pain sometimes. It is hard to get things right without extensive testing. But then again, what I said was all true: switching to huge pages did resolve our issues. I guess, as always, your mileage may vary.

P.P.P.S. See the comments below about pre_page_sga=TRUE. The people who commented are correct. As far back as Oracle 9.2, setting pre_page_sga=TRUE on a database with a large SGA can make logins take a couple of seconds. See this Oracle bug:

Bug 5072402 : SLOW CONNECT TIME WITH SQLPLUS TO INSTNACE WITH 22GIG SGA AND PRE_PAGE_SGA

According to Oracle, Bug 5072402 is not a bug but intended behavior. Oracle changed this behavior in Oracle 12 according to this document:

PRE_PAGE_SGA Behaviour Change in Oracle Release 12c (Doc ID 1987975.1)

My own testing bears out the idea that pre_page_sga=TRUE causes slow logins in 11.2 and does not cause them in 12.1.

My laptop 11.2.0.3

Slow with pre_page_sga=TRUE

real 0m1.039s

pre_page_sga    boolean     TRUE
sga_max_size    big integer 3G
sga_target      big integer 3G

Fast with pre_page_sga=FALSE

real 0m0.218s

pre_page_sga    boolean     FALSE
sga_max_size    big integer 3G
sga_target      big integer 3G

My laptop 12.1.0.2

Fast even with pre_page_sga=TRUE

real 0m0.084s

pre_page_sga    boolean     TRUE
sga_max_size    big integer 3G
sga_target      big integer 3G

No HugePages

PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES
4K        Configured                    4           786436
2048K     0                          1537                0

P.P.P.P.S.

I have a production HP-UX Itanium server with a 46 gig SGA and pre_page_sga=TRUE. It takes about 1 second to log in. I think it is the same effect, but on Linux the 4K page size may exaggerate the effect of pre_page_sga=TRUE, whereas HP-UX may handle large SGAs better. Anyway, I learned something new. Thank you to those who participated in the comments.

I spun up a Delphix Oracle 11.2.0.3 HP-UX Itanium VDB with 32 gig SGA and pre_page_sga=TRUE and the login time was instantaneous. Odd. I guess your mileage may vary still applies.

Posted in Uncategorized | 11 Comments

High Processes Cause Semaphore Errors

I could not create a new Delphix virtual database because I kept getting semaphore limit errors. I ended up reducing the processes parameter value to resolve the issue.

I got these errors creating the new VDB:

ERROR : SQL> ORA-27154: post/wait create failed 
ERROR : ORA-27300: OS system dependent operation:semget failed with status: 28
ERROR : ORA-27301: OS failure message: No space left on device
ERROR : ORA-27302: failure occurred at: sskgpcreates

The processes parameter was set to 3000 on production, so Delphix tried to create the VDB with the same value. I reduced processes to 100 and the VDB came up cleanly.

I have been living with this problem for a year or two but yesterday I really needed to resolve it and I finally figured it out. I got the idea of reducing processes from this Oracle support document:

Instance Startup Fails With Error ORA-27154,ORA-27300,ORA-27301,ORA-27302 (Doc ID 314179.1)

I looked these errors up on Oracle’s support site several times in the past and never saw the idea of reducing processes so I thought I would mention it here if only to remind myself.
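In case someone hits the same wall, here is a minimal generic sketch of the fix for an instance that will not start; the file name is a hypothetical placeholder:

SQL> create pfile='/tmp/initVDB.ora' from spfile;
-- edit /tmp/initVDB.ora and lower processes, e.g. processes=100
SQL> startup pfile='/tmp/initVDB.ora'
SQL> create spfile from pfile='/tmp/initVDB.ora';

The semget error means the instance asked for more semaphores than the kernel.sem settings allow, and the number requested scales with the processes parameter, so either lowering processes or raising the Linux kernel semaphore limits should clear it.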

Bobby

Posted in Uncategorized | Leave a comment

Python Practice Produced Pretty Pictures

I wrote a Python program that made some fun pictures so I thought I would share them even though this is not really a database post.

I practice Python programming by doing Rosetta Code programming tasks that no one has implemented in Python. This is a fun way of keeping up my Python skills. My most recent contribution made pretty pictures so I thought I would show them here. The code takes a cube and breaks up the faces into smaller and smaller pieces that change the cube into a rounder shape.

Here is the input:

Input to the program, a cube

Here is the output after one subdivision:

After one subdivision. Chunky.

Here is the output after two subdivisions:

Two subdivisions. Pretty round.

Note that it is getting rounder. Lastly, after four subdivisions it is remarkably round considering that it started as a cube:

Four subdivisions. Quite round.

The main point of this post was to show the pretty pictures. But, to be more serious, if you are reading this blog and looking for a programming task to practice on, you can do what I do: find an unimplemented Rosetta Code task for the language you are learning and get some good practice.

Bobby

Posted in Uncategorized | Leave a comment

Speed of Light

Looking at cloud databases has me thinking about the speed of light. Wikipedia says that the speed of light is about 186,000 miles per second. If my calculations are correct, that works out to about 5.37 microseconds per mile. The United States is about 2680 miles wide, so it would take light about 14.4 milliseconds to cross the US. If I ping one of my favorite web sites it takes tens of milliseconds, which kind of makes sense because those sites are in other cities and I am going through various routers.

I did some tests with my company's storage and found that reading from our storage when the data is cached in the storage server takes around 200 microseconds round trip. I'm sure that our database servers and storage are a lot less than a mile apart, so most of that time has nothing to do with the speed of light. I heard about a cloud vendor whose fast network connection took 100 microseconds plus the speed of light. I guess 100 microseconds is the cost of getting your data onto the fiber, and light does the rest.

If your cloud database was on the other side of the country, that would add at least 14 milliseconds each way to each SQL request. If the cloud database was in your own city, say 10 miles away, it would only tack on about 53.7 microseconds each way on top of the 100 microseconds of overhead. Maybe 100 microseconds plus the speed of light is the cost of moving data in the best case?
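The arithmetic is easy to check using SQL as a calculator; the speed of light and the distances are the rounded figures from above:

-- microseconds per mile, time to cross the US, and a 10-mile hop
select round(1000000/186000, 2) as us_per_mile,     -- about 5.38
       round(2680*1000/186000, 1) as ms_across_us,  -- about 14.4
       round(10*1000000/186000, 1) as us_10_miles   -- about 53.8
from dual;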

Bobby

Posted in Uncategorized | 4 Comments

New utility Python scripts for DBAs

I pushed out three new Python scripts that might be helpful to Oracle DBAs. They are in my miscpython repository.

Bobby

Posted in Uncategorized | 2 Comments

Slides from March AZORA meeting

Here are the slides from our March Arizona Oracle User Group (AZORA) meeting:

Daniel Morgan Security Master Class

We really appreciate Daniel Morgan taking the time to share this information about the increasingly important topic of database security.

Also, AZORA is always looking for people to present at future meetings. We have one more meeting in May before the blazing hot Arizona summer, and then we start up again in September. Email me at bobby@bobbydurrettdba.com if you would like to speak at a future meeting.

Bobby

Posted in Uncategorized | Leave a comment

Oracle Database Security Master Class AZORA Meetup Thursday

Next Thursday Daniel Morgan from TidalScale will be giving a free Oracle Database Security Master Class for our March Arizona Oracle User Group meeting. See the details in our Meetup link:

https://www.meetup.com/Arizona-Oracle-User-Group/events/258858868/

Please RSVP so that we know how many people will be there.

Given the ever-increasing number of security threats to our databases, this session on Oracle database security should be valuable to us all.

I hope to see you there!

Bobby

P.S. We are trying out a WebEx this time for remote attendees:

Join Webex meeting

Meeting number (access code): 743 918 204

Join by phone 
+1-855-797-9485 US Toll free 
+1-415-655-0002 US Toll 
Global call-in numbers  |  Toll-free calling restrictions

Posted in Uncategorized | Leave a comment

ORA-30926 on MERGE statement

I was on call last week and I got a ticket about a batch job that was failing on a MERGE statement with an ORA-30926 error. Our support team worked around it by deleting some duplicate rows in a table that was feeding into the MERGE. This week I wanted to go back and try to understand what would cause ORA-30926 errors on MERGE statements.

I read through some blog posts and Oracle support documents relating to ORA-30926 and merge. Then I tried building some simple test scripts to see when you get ORA-30926 on a MERGE. At the end of my search I came back to this simple statement from the 18c SQL Language Reference manual’s description of the MERGE statement:

MERGE is a deterministic statement. You cannot update the same row of the target table multiple times in the same MERGE statement.

I was puzzled by the term “deterministic” but “cannot update the same row…multiple times” seems simple enough. I created test scripts to show whether you can update the same row multiple times with a MERGE: zip

Here are the two test tables:

create table target (a number,b number);

create table source (a number,b number);

Here is data and a MERGE statement that causes the error:

SQL> insert into target values (1,1);
SQL> insert into source values (1,2);
SQL> insert into source values (1,3);

SQL> merge into target
  2  using source
  3  on (target.a = source.a)
  4  when matched then
  5  update set target.b=source.b;
using source
      *
ERROR at line 2:
ORA-30926: unable to get a stable set of rows in the source tables 

This seems very simple. We join the source and target tables on column A with value 1. The merge statement tries to update column B on the single target row twice: it tries to set it to 2 and then to 3. I guess this is where the term “deterministic” comes in. If the merge statement updated B to both 2 and 3, which would it do first and which second? The final value would depend on an arbitrary ordering of the source rows, so Oracle raises an error instead.

The example above with values 2 and 3 for B makes good sense, but I saw some odd behavior when I used source rows with 1 and 2 for the B values, while the target row's B value was already 1. With B values of 1 and 2 the MERGE does not get an error:

SQL> insert into target values (1,1);
SQL> insert into source values (1,1);
SQL> insert into source values (1,2);

SQL> merge into target
  2  using source
  3  on (target.a = source.a)
  4  when matched then
  5  update set target.b=source.b;

2 rows merged.

SQL> select * from target;

         A          B                                                           
---------- ----------                                                           
         1          2                                                           

This seems to contradict the idea that you cannot update the same row multiple times with a MERGE. In this case it seems like B is updated twice: once to 1, which is the value it already had, and then to 2. I guess this works because B was already set to 1, so updating B to the value it already has does not count as an update. It seems like only one real update took place and B ends up set to 2.

This example does not work with a slightly different MERGE statement on Oracle 12.1 or earlier:

SQL> select banner from v$version;

BANNER                                                                          
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production    

SQL> merge into target
  2  using source
  3  on (target.a = source.a)
  4  when matched then
  5  update set target.b=source.b
  6  when not matched then
  7  insert values (source.a,source.b);
using source
      *
ERROR at line 2:
ORA-30926: unable to get a stable set of rows in the source tables 

SQL> select * from source;

         A          B                                                           
---------- ----------                                                           
         1          1                                                           
         1          2                                                           

SQL> 
SQL> select * from target;

         A          B                                                           
---------- ----------                                                           
         1          1                                                           

All that I can say based on these two tests is that sometimes an update of a column to the same value counts as an update and sometimes it does not. The preceding example works on 12.2:

SQL> select banner from v$version;

BANNER                                                                          
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production    

SQL> merge into target
  2  using source
  3  on (target.a = source.a)
  4  when matched then
  5  update set target.b=source.b
  6  when not matched then
  7  insert values (source.a,source.b);

2 rows merged.

SQL> select * from source;

         A          B                                                           
---------- ----------                                                           
         1          1                                                           
         1          2                                                           

SQL> 
SQL> select * from target;

         A          B                                                           
---------- ----------                                                           
         1          2                                                           

It all seemed so simple when I read the documentation. Maybe instead of saying:

You cannot update the same row of the target table multiple times in the same MERGE statement.

it should say:

You may not be able to update the same row of the target table multiple times in the same MERGE statement.

We should not count on being able to update the same row multiple times with a single MERGE statement, but clearly there are some cases in which we can. If you are like me and you get paged on a job that fails with ORA-30926 on a MERGE, it makes sense to get rid of the multiple updates on a single row because you most likely have hit a case where it is not allowed.
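When that happens, a quick way to find the offending rows is to look for source key values that join to the same target row more than once. A sketch using the test tables from this post:

-- source rows whose join key hits the same target row more than once
select a, count(*)
from source
group by a
having count(*) > 1;

Once you know the duplicated keys, you can decide which of the duplicate rows the business logic actually wants and deduplicate the source query accordingly.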

Bobby

Posted in Uncategorized | 6 Comments

One query in MySQL Performance Schema

I am learning about MySQL performance tuning. I read the Performance Schema chapter of the MySQL 5.7 manual and I have a MySQL 5.7.20 database running under Linux. I have a test table and I am running a simple query that does a full scan of the table and returns the sum of a couple of columns. The goal was to look at some Performance Schema tables to see what they would tell me about this simple query. I bounce the MySQL database service before running my test script so that the query pulls data from disk and not memory. A zip of my test script and its output is here: zip

I looked at the tables that match the pattern events_*_history_long. These are the four tables:

  • events_statements_history_long
  • events_transactions_history_long
  • events_stages_history_long
  • events_waits_history_long

There seems to be a hierarchical relationship among these tables something like this:

  • statements->transactions
  • statements->stages->waits->waits

There may be other relationships that my simple test did not capture. If you look at the zip you will see all the levels of the hierarchy that I explored but, in this case, the bottom level seems the most interesting. Here is the query that explores the bottom of the statements->stages->waits->waits path:

select * from events_waits_history_long
where
NESTING_EVENT_ID in 
(select EVENT_ID from events_waits_history_long
where
NESTING_EVENT_ID in 
(select EVENT_ID from events_stages_history_long
where
NESTING_EVENT_ID in 
(select EVENT_ID from events_statements_history_long
where SQL_TEXT like concat('%te','st%')) -- 'test' split in two so this query does not match its own text
and NESTING_EVENT_TYPE = 'STATEMENT')
and NESTING_EVENT_TYPE = 'STAGE')
and NESTING_EVENT_TYPE = 'WAIT'
order by EVENT_ID;

The output reminds me of a trace in Oracle. It shows the individual disk reads and waits for internal locks and mutexes.

Most of the time seems to be taken up on reads from a file, which is what I expected. If you look at the wait/io/file/innodb/innodb_data_file waits, they seem to be 16384-byte reads from the file associated with the table. I could use this information to build a query to show a profile of the time spent by the query based on EVENT_NAME. It would be kind of like a tkprof of an Oracle trace.
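Here is a sketch of that profile idea. For simplicity it rolls up every wait in the history table rather than only the waits nested under one statement; TIMER_WAIT is in picoseconds, so dividing by 1e12 gives seconds:

-- total wait time by event name, most expensive first
select EVENT_NAME,
       round(sum(TIMER_WAIT)/1e12, 4) as total_seconds
from events_waits_history_long
group by EVENT_NAME
order by total_seconds desc;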

This post is just a simple first test. I am new to MySQL tuning so feel free to leave a comment or email me if you have suggestions or corrections related to this post.

Bobby

P.S. Looking at this again, it does not look like the file reads are really physical I/O, because they are too short. The TIMER_WAIT column uses picoseconds, so if you divide 3499928 by 1000000 you get about 3.5 microseconds for the first read, which is tiny compared to a normal disk read of 1000 microseconds or more. Also, I added up all the wait times from the statements->stages->waits->waits query and the total was about 100 times less than the query run time. The higher levels of this hierarchy all had waits in the half-second range, but the lowest level of waits adds up to about one hundredth of that. So, I have more work to do at this point.

P.P.S. Looks like the reads are all from memory. I looked at several things but here is a profile of the waits from the stage level and then at the wait level:

+-------------------------------------------+---------------+
| EVENT_NAME                                | total_seconds |
+-------------------------------------------+---------------+
| wait/io/table/sql/handler                 |        0.4819 |
| wait/io/socket/sql/client_connection      |        0.0001 |
| wait/synch/mutex/sql/THD::LOCK_query_plan |        0.0000 |
| wait/synch/mutex/sql/LOCK_table_cache     |        0.0000 |
| wait/synch/mutex/sql/THD::LOCK_thd_query  |        0.0000 |
| wait/synch/mutex/innodb/trx_mutex         |        0.0000 |
| wait/synch/mutex/sql/THD::LOCK_thd_data   |        0.0000 |
+-------------------------------------------+---------------+

+-------------------------------------------+---------------+
| EVENT_NAME                                | total_seconds |
+-------------------------------------------+---------------+
| wait/io/file/innodb/innodb_data_file      |        0.0041 |
| wait/synch/mutex/innodb/buf_pool_mutex    |        0.0004 |
| wait/synch/sxlock/innodb/hash_table_locks |        0.0002 |
| wait/synch/mutex/innodb/fil_system_mutex  |        0.0001 |
+-------------------------------------------+---------------+

It seems that the wait/io/table/sql/handler event includes the CPU to process the query which takes about half a second whereas the wait/io/file/innodb/innodb_data_file event seems to just cover the time to read the data from the buffer pool. Here is a zip of my script to create these profiles and its output: zip

I got a profile of the query another way and that confirmed that almost all of the half second run time of the query was CPU:

Sending data spends .52 seconds on the CPU, which is most of the query's .54-second run time in this case.

P.P.P.S. This post, https://mysql.wisborg.dk/2018/08/05/what-does-io-latencies-and-bytes-mean-in-the-performance-and-sys-schemas/, says that wait/io/table/sql/handler events can be reads from the buffer pool but that wait/io/file/innodb/innodb_data_file events are actual reads from the file system. If that is true, I can only assume the data is cached in the Linux filesystem cache, because the wait/io/file/innodb/innodb_data_file events are much too short to be reads from disk. By the way, the disk in this case is an external hard drive on my laptop connected through USB, with typical I/O times around 20 milliseconds. So, reads from the filesystem that finish in a few microseconds have to be coming from memory.

P.P.P.P.S. I looked at the source code and it looks like the wait/io/file/innodb/innodb_data_file wait events are physical I/O. But I had a hard time forcing a really slow read from the table. Setting innodb_flush_method=O_DIRECT in my /etc/my.cnf file and bouncing the MySQL service resulted in a slower query the first time I ran it, but then everything was cached in the database’s buffer pool and run time was back to half a second. It seems that the first time I run the query it does a small amount of physical I/O but the data for this small table is cached somewhere. I am running VirtualBox on top of my Windows 7 laptop so maybe the table’s data is cached in Windows even after turning on direct I/O to bypass the filesystem cache at the Linux level within the VM. This is all the same sort of thing you see with Oracle database tuning. You must navigate the many layers of cache between the database and the physical disk when you look at physical I/O times.

Posted in Uncategorized | 5 Comments