I found out that my blog backup script was failing, so I had to rewrite it to handle dropped connections to my remote sftp server. In the process I broke out as much of the code as I could into a module that I could share. The module is backupremote.py in my miscpython repository, and it might be helpful to someone else. It copies the directory tree on a remote Linux server down to a directory on a Windows machine (e.g. a laptop) over sftp.
I was asked to look at a batch SQL query that ran for 12 hours on June 4th. I messed around with some other ideas and then went back to my best method for tuning long-running batch queries: I ripped it apart. My rewritten code ran in 45 minutes.
This only works if you have the code and can change it. I work with a lot of PL/SQL code in Unix shell scripts running SQL*Plus. If someone comes to me with a SQL statement in this kind of code, I do not tear my hair out trying to tune it in some clever way. Also, this assumes that I cannot use a SQL Profile. SQL Profiles are my first choice for production performance problems. But for long-running SQL that I have full control over and cannot fix with a SQL Profile, I tear it apart.
Ripping or tearing a SQL statement apart means that I become the optimizer. In this case the problem query joined 5 tables. It summarized 3 months of data from a large fact table, and the other 4 tables were joined to the fact table. I replaced the one query with 5 queries, each of which saved its results in a table. The first query summarized the fact table, and each of the remaining four joined one more table to the results so far. Something like this:
Summarize 3 months of fact table data – table 1
Join table 2 on surrogate key
Join table 3 on surrogate key
Join table 4 on natural keys
Join table 5 on natural keys
So, I created 5 tables, each holding the results of the joins so far. I dropped each table as soon as I was done with it to save space.
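As a rough sketch of the pattern (these table and column names are made up, not the real ones), it looked something like this:

-- Step 1: summarize 3 months of the fact table into a work table.
CREATE TABLE step1_fact_summary AS
SELECT f.dim2_key, f.dim3_key, f.nat_key_a, f.nat_key_b,
       SUM(f.amount) total_amount
FROM big_fact_table f
WHERE f.snapshot_date >= DATE '2019-03-01'
  AND f.snapshot_date < DATE '2019-06-01'
GROUP BY f.dim2_key, f.dim3_key, f.nat_key_a, f.nat_key_b;

-- Step 2: join one more table to the previous results and save them.
CREATE TABLE step2_results AS
SELECT s.*, t2.some_column
FROM step1_fact_summary s, table2 t2
WHERE s.dim2_key = t2.dim2_key;

-- Drop the previous step's table once it is no longer needed.
DROP TABLE step1_fact_summary;

-- Steps 3 through 5 repeat the same pattern: join table 3 on its surrogate
-- key and tables 4 and 5 on their natural keys, saving each result.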
I have a feeling that I could use some clever hint to force the join order and access methods to match my 5 queries. But my time is short and this works, so I did not bother trying. When you have a query that runs for 12 hours it’s not very motivating to try various hints to get it to run faster. How long do you wait for each try before you give up? Working on one table at a time is nice. I have had this approach work for me time and time again. It is almost a mechanical process without a lot of painful thinking.
Anyway, I pass this on to the internet. People may think that breaking up a 5 table join into 5 queries is ugly, but it works.
I was on call again last week, and Sunday night I used a SQL Profile to fix a performance problem. We rolled out some updates Saturday night and the modified query ran fine in some cases, but in others it ran for hours. When I got on, there were 60 sessions running the bad plan and the load on the system was very high. It took me a while to identify the good plan, but once I found it, I put in a SQL Profile, killed the 60 bad sessions, and the problem jobs ran in 2 minutes each. A similar situation came up Monday morning after I went off call; the coworker who took over ended up applying another SQL Profile to a similar query.
I spent the past couple of hours doing my typical SQL tuning exercise to see if I could figure out why Sunday’s query sometimes chose the bad plan.
The typical scenario includes these elements:
Partitioned table with some nearly empty partitions and a lot of full partitions
Bind variables used to determine partition choice
In our case we have certain tables that partition data by a code number. Some code numbers were only used in the past or are otherwise inactive now, so their partitions are nearly empty. My guess is that the bad plans we see come from bind variable values that point to the empty partitions. The bad plan works fine on the empty partitions, but then the optimizer reuses it on full partitions and the query spins for hours.
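To make the scenario concrete, here is a made-up sketch (the real tables, codes, and partitions are different):

-- A table list-partitioned by a code number, where some codes are retired.
CREATE TABLE example_fact
( code_num  NUMBER,
  post_date DATE,
  amount    NUMBER )
PARTITION BY LIST (code_num)
( PARTITION p_code_1 VALUES (1),  -- retired code, nearly empty
  PARTITION p_code_2 VALUES (2),  -- active code, millions of rows
  PARTITION p_code_3 VALUES (3)   -- active code, millions of rows
);

-- With a bind variable, the value peeked on the first hard parse determines
-- which partition the optimizer costs the plan for. If :code points to a
-- nearly empty partition, the resulting plan can get reused against the full ones.
SELECT SUM(amount)
FROM example_fact
WHERE code_num = :code
AND post_date > SYSDATE - 90;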
I started to research this further to see if I could come up with a better fix than putting in SQL Profiles but did not get very far. I thought I would just pass this post along as is.
Bobby
P.S. I originally wrote this June 4th, but decided to just publish as is today.
I wanted to share my impressions of the two talks and the meeting, as well as share the slides. Of course, these are my opinions, shaped by my own experiences, and not necessarily true for everyone!
This meetup had two very different types of talks. I thought of Stephen’s talk as a career development or soft skills sort of talk. I have been to a number of talks like that at national Oracle user group conferences such as Collaborate. They help balance out the purely technical Oracle database content. Once Stephen got into his talk, I really started to appreciate the quality of the content. To me he was talking about keeping in touch with people in an intentional but sincere way. I like the idea of planning on contacting people a certain number of times per year for example.
Years ago, in my first job I worked for Campus Crusade for Christ (now Cru) and I raised money to support myself and my family. I networked with people that I met through churches and friends and family members. It was different than networking as part of a DBA career because I was directly asking for money instead of making career-oriented connections. But the core idea that I remember from Stephen’s talk applied then. Stephen’s idea was to genuinely seek to help the folks in your network without focusing on what they could do for you. In my CCC days the support raising training told us that we were not “fundraising” but instead “friend raising”. I had some great experiences meeting people and getting to know them and I think it was best when my focus was on how to inspire and encourage the people I met rather than to anxiously think about whether they could give money to support what I did.
The other less serious connection I saw between Stephen’s presentation and my Cru days is that Stephen has a hand-written database setup to keep track of his people. Back in the day I had a Paradox database from Borland running on MS-DOS to do a lot of the same things. So, hearing Stephen talk about his contact database was a blast from the past for me.
I am not really doing much in the way of networking myself these days. I write this blog. I speak at conferences every couple of years or so. I help with the local Oracle user group AZORA. But I am not intentionally making and maintaining relationships with other technical people in the way Stephen described so his talk gave me something to think about.
Doug Hood’s talk was at the other end of the spectrum with some cool technology. Doug spoke on several things and with a lot of detail so I cannot do a good job of summarizing what he said. Check the slides for more details. But I do want to write down my impressions. Listening to Doug’s talk reminded me of some of the computer science study that I have been doing on my own recently. I have sort of gone back to school as an independent learner. When Doug talked about the memory hierarchy and caching it tied right back to the assembly language and algorithms study I have been doing.
Doug presented some cool hardware that puts persistent memory close enough to the CPU that it changes the way we think about the memory hierarchy. What if you replaced your RAM with persistent RAM that does not get cleared when you power off your computer? Evidently in some architectures (maybe all the modern ones these days, I don’t know) the RAM is closely connected to the CPU and does not have to be accessed over the bus the way I/O is. So, persistent RAM would be much faster than a solid-state disk being read over the bus, no matter how fast the SSD is. Anyway, see Doug’s slides. I am sure that I am butchering the details, but I am giving my impression and my memory, so keep that in mind.
In general, database work and database performance have a lot to do with caching. I have heard a little bit about how algorithms can be designed to work well with CPU caches. I recently read a chapter about the B-Tree data structure that is used in databases, and it was a big contrast to the other data structures I had studied because it took disk reads and memory accesses into consideration. Anyway, at a high level I took away from Doug’s talk notions about memory and caching and the different ways people can tweak the memory hierarchy to get better database performance.
I had to leave a little early to head for the mountains for the weekend, but as always, I valued the time I spent at AZORA, my local Oracle user group. I appreciate Stephen and Doug stepping up and giving their presentations. I hope that my links and the way I characterized their talks are accurate enough. I am sure that I made mistakes, but I got something out of my time and appreciate their efforts.
AZORA is taking a break for the hot Arizona summer but coming back strong with our next meeting on September 27th. The details are being finalized so keep an eye on our Meetup page.
I have been doing a lot of Oracle and PeopleSoft work this year, but I am trying to continue to develop my MySQL and Amazon Web Services (AWS) knowledge at the same time. My goal is to learn some new thing about MySQL and AWS each month and then document it either on this blog or on my company’s internal web site.
This month I decided to focus on building a Linux virtual machine on VirtualBox that has the source code for each version of MySQL that we support on AWS through RDS. I already had an Oracle Linux VM with MySQL 5.7.20 installed from source code from the MySQL GitHub site. So, all I had to do was get the source code to the correct release in git and then recompile it and create a test database. Then I could save a VirtualBox snapshot for that release.
I don’t want to spend time here describing how I did the initial MySQL 5.7.20 install except to say that I followed the steps in the 5.7 reference manual section titled “2.9 Installing MySQL from Source”. The GitHub-specific instructions were in the section titled “2.9.3 Installing MySQL Using a Development Source Tree”. I can’t remember why it was a problem, but I could not get Boost installed correctly for CMake to pick it up, so I passed the path to Boost to CMake using the following command:
cmake . -DWITH_BOOST=/home/bobby/boost_1_59_0
instead of what is in the manual. Otherwise I just followed the manual.
I looked at our AWS MySQL RDS databases and found 4 versions: 5.5.46, 5.6.34, 5.7.17, and 5.7.25. So, I wanted to install each of these from source. My idea is that if we hit a bug or unexpected behavior, I can try different versions and see if that behavior is version specific. We could also dive into the source if needed and have the correct version.
Here are the steps that I put together that worked for our 5.7 databases:
cd /home/bobby/mysql-server
make clean
rm CMakeCache.txt
git checkout 5.7
git pull
git checkout mysql-5.7.17
cmake . -DWITH_BOOST=/home/bobby/boost_1_59_0
make
su - root
cd /home/bobby/mysql-server
make install
cd /usr/local/mysql/data
rm -rf *
cd ..
bin/mysqld --initialize --user=mysql
bin/mysql_ssl_rsa_setup
mysqld_safe --user=mysql &
mysql -p
# enter the temporary root password generated by the --initialize step (it is written to the error log)
ALTER USER 'root'@'localhost' IDENTIFIED BY 'My!Password1234';
mysql -p
# log in again with My!Password1234 to verify the new password works
My git repository was /home/bobby/mysql-server and my Linux username was bobby. The database is in /usr/local/mysql/data. The 5.6 and 5.5 databases had a different way to create the database and change the password:
Replace these lines:
bin/mysqld --initialize --user=mysql
bin/mysql_ssl_rsa_setup
mysqld_safe --user=mysql &
mysql -p
# enter the temporary root password generated by the --initialize step
ALTER USER 'root'@'localhost' IDENTIFIED BY 'My!Password1234';
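In 5.5 and 5.6 the initialization goes through mysql_install_db and the root password gets set with mysqladmin instead. Roughly speaking, it looks like this (a sketch of the idea; check the 5.5 or 5.6 reference manual for the exact steps for your version):

cd /usr/local/mysql
scripts/mysql_install_db --user=mysql
bin/mysqld_safe --user=mysql &
bin/mysqladmin -u root password 'My!Password1234'
mysql -u root -p
# log in with My!Password1234 to verify the new password works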
Here are some pictures from the 5.7.17 final working install:
Time will tell if this setup really helps us during some sort of problem, but I like having the source code in case we hit a bug or unexpected behavior. The great thing about open source is that we can see the code, so why not use it?
Bobby
Update: 2/4/2021
I have not used this setup since I published this post in May 2019. But today I finished adding MySQL 8.0.20 to my VirtualBox snapshots because we now have an AWS RDS instance of that version. Maybe someday I will need to look at the source code for a particular version, who knows. Here are some of the commands I had to run to get 8.0.20 to work after starting with my 5.7.17 snapshot.
I followed this page to get a new enough version of gcc installed:
yum -y install oracle-softwarecollection-release-el7
/usr/bin/ol_yum_configure.sh
added sslverify=0 to /etc/yum.conf
yum-config-manager --enable ol7_optional_latest
yum install devtoolset-9
scl enable devtoolset-9 -- gcc --version
source /opt/rh/devtoolset-9/enable
eventually added that source line to .bash_profile for the root and bobby users
I ended up having to blow away the source tree and reload it due to issues.
rm -rf mysql-server
git clone https://github.com/mysql/mysql-server.git
cd mysql-server
git checkout mysql-8.0.20
mkdir bld
cd bld
cmake .. -DDOWNLOAD_BOOST=1 -DWITH_BOOST=/home/bobby/newerboost
This got cmake to run without errors. After this, the steps were the same as for 5.7.
I unzipped the downloaded file LINUX.X64_193000_db_home.zip in my ORACLE_HOME. Then I ran runInstaller from the ORACLE_HOME through MobaXterm. For some reason it decided to put the install window overlapping my two screens so I couldn’t tell what I was typing, and it didn’t allow me to move it. Fun times.
It seemed to hang on the last step, but it eventually finished. The process named ora_mz00_orcl was spinning on the CPU for part of the time that it seemed hung. I guess it was 5 or 10 minutes, but I didn’t time it.
Seems to be up and working:
SQL*Plus: Release 19.0.0.0.0 - Production on Thu May 2 13:00:59 2019
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Last Successful login time: Thu May 02 2019 12:51:54 -07:00
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL>
This is our last Meetup until our Summer break. Come check out two great presentations.
Doug Hood from Oracle will talk about the Oracle In-Memory database feature. We appreciate Oracle providing us with this technical content to support AZORA.
AZORA’s own Stephen Andert will be sharing a non-technical presentation on Networking. He just gave the same talk at the national Oracle user group meeting called Collaborate 19 so it will be great to have him share with his local user group.
Looking forward to seeing you there.
Bobby
P.S. AZORA is the Arizona Oracle User Group, and we meet in the Phoenix, Arizona area.
Monday, I used the coe_xfr_sql_profile.sql script from Oracle Support’s SQLT scripts to resolve a performance issue. I had to set the parameter force_match to TRUE so that the SQL Profile I created would apply to all SQL statements with the same FORCE_MATCHING_SIGNATURE value.
I had just gone off the on-call rotation at 8 am Monday, and around 4 pm on Monday a coworker came up to me with a performance problem. A PeopleSoft Financials job was running longer than it normally did. Since it had run for several hours, I got an AWR report of the last hour, looked at the SQL ordered by Elapsed Time section, and found a number of similar INSERT statements with different SQL_IDs.
The inserts were the same except for certain constant values. So, I used my fmsstat2.sql script with ss.sql_id = '60dp9r760ja88' to get the FORCE_MATCHING_SIGNATURE value for these inserts. Here is the output:
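At its core, fmsstat2.sql boils down to a query of DBA_HIST_SQLSTAT filtered by SQL_ID or FORCE_MATCHING_SIGNATURE. A simplified sketch of that kind of query (not the actual script, and not the output shown here) looks like this:

-- Find the SQL_IDs, plans, and average elapsed seconds per execution in AWR
-- for statements sharing one FORCE_MATCHING_SIGNATURE.
SELECT ss.sql_id,
       ss.plan_hash_value,
       SUM(ss.executions_delta) executions,
       SUM(ss.elapsed_time_delta)/1000000/
           GREATEST(NVL(SUM(ss.executions_delta),0),1) avg_elapsed_seconds
FROM dba_hist_sqlstat ss
WHERE ss.force_matching_signature = 5442820596869317879
GROUP BY ss.sql_id, ss.plan_hash_value
ORDER BY avg_elapsed_seconds DESC;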
Now that I had the FORCE_MATCHING_SIGNATURE value 5442820596869317879, I reran fmsstat2.sql with ss.FORCE_MATCHING_SIGNATURE = 5442820596869317879 instead of ss.sql_id = '60dp9r760ja88' and got all of the insert statements and their PLAN_HASH_VALUE values. I needed these to use coe_xfr_sql_profile.sql to generate a script to create a SQL Profile that forces a better plan onto the insert statements. Here is the beginning of the output of the fmsstat2.sql script:
The first few lines show the good plan that these inserts used in earlier runs. The good plan has PLAN_HASH_VALUE 1314604389 and runs in about 600 milliseconds. The bad plan has PLAN_HASH_VALUE 3334601 and runs in 100 or so seconds. I took a look at the plans before doing the SQL Profile but did not really dig into why the plans changed. It was 4:30 pm or so and I was trying to get out the door, since I was not on call and wanted to get home at a normal time and leave the problems to the on-call DBA. Here is the good plan:
Here is the bad plan:
Notice that in the bad plan the Rows column has 1 in it on many of the lines, but in the good plan it has larger numbers. Something about the statistics and the values in the where clause caused the optimizer to build the bad plan as if no rows would be accessed from these tables even though many rows would be accessed. So, it made a plan based on wrong information. But I had no time to dig further. I did ask my coworker if anything had changed about this job and nothing had.
So, I created a SQL Profile script by going to the utl subdirectory under sqlt where it was installed on the database server. I generated the script by running coe_xfr_sql_profile gwv75p0fyf9ys 1314604389. I edited the generated script, coe_xfr_sql_profile_gwv75p0fyf9ys_1314604389.sql, changing the setting force_match=>FALSE to force_match=>TRUE, and then ran it. The long-running job finished shortly thereafter, and no new incidents have occurred in subsequent runs.
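For reference, the sequence in SQL*Plus looked roughly like this (paths trimmed and prompts omitted):

-- Generate the SQL Profile script for this SQL_ID and the good PLAN_HASH_VALUE,
-- run from the sqlt/utl directory:
@coe_xfr_sql_profile.sql gwv75p0fyf9ys 1314604389

-- Edit the generated coe_xfr_sql_profile_gwv75p0fyf9ys_1314604389.sql and change
--   force_match=>FALSE
-- to
--   force_match=>TRUE
-- so the profile matches every statement with the same FORCE_MATCHING_SIGNATURE.

-- Then run the edited script to create the profile:
@coe_xfr_sql_profile_gwv75p0fyf9ys_1314604389.sql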
The only thing that confuses me is that when I run fmsstat2.sql now with ss.FORCE_MATCHING_SIGNATURE = 5442820596869317879 I do not see any runs with the good plan. Maybe future runs of the job have a different FORCE_MATCHING_SIGNATURE and the SQL Profile only helped the one job. If that is true, the future runs may have had the correct statistics and run the good plan on their own. It could be that the INSERT statements are now running so fast that they are not recorded in the AWR as one of the top SQL statements.
I wanted to post this to give an example of using force_match=>TRUE with coe_xfr_sql_profile. I had an earlier post about this subject, but I thought another example could not hurt. I also wanted to show how I use fmsstat2.sql to find multiple SQL statements by their FORCE_MATCHING_SIGNATURE value. I realize that SQL Profiles are a kind of band aid rather than a solution to the real problem. But I got out of the door by 5 pm on Monday and did not get woken up in the middle of the night so sometimes a quick fix is what you need.
Back in 2017 I wrote about how I had to disable the result cache after upgrading a database to 11.2.0.4. This week I found one of our top queries, and it looked like removing the result cache hints made it run 10 times faster. But this did not make sense because I had disabled the result cache. Then I examined the hints more closely. They looked like this:
/*+ RESULT CACHE */
There should be an underscore between the two words. I looked up hints in the manuals and found that CACHE is a real hint. So, I tried the query with these three additional combinations:
/*+ RESULT */
/*+ CACHE */
/*+ RESULT_CACHE */
It ran slow with the original hint and with just the CACHE hint, but not with the others. So, the moral of the story is to check your hints carefully because they may not be what you think they are.
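In other words, the difference looks like this (using a made-up table, not the real query):

-- Intended: use the server result cache for the query.
SELECT /*+ RESULT_CACHE */ SUM(amount) FROM some_table;

-- What was actually in the code: RESULT is not a valid hint and gets ignored,
-- while CACHE is a real, documented hint that changes how full-scan blocks are
-- treated in the buffer cache, which is something entirely different.
SELECT /*+ RESULT CACHE */ SUM(amount) FROM some_table;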