Slides From January 2019 AZORA Meeting

Here are slides from the January 2019 AZORA Meeting with Charles Kim and Viscosity:

Oracle Database 12.2 New Features

Oracle Database 18c New Features

Thank you to Charles and Viscosity for making our meeting a big success, and thanks for sharing these slides.

Bobby

Posted in Uncategorized | Leave a comment

MIT’s Free Algorithms Track

MIT, the Massachusetts Institute of Technology, has published a free set of three undergraduate level Algorithms classes that I recommend to the readers of this blog. These are links to the three classes in the order that they should be taken.

Mathematics for Computer Science (Spring 2015)

Introduction to Algorithms (Fall 2011)

Design and Analysis of Algorithms (Spring 2015)

I took the 2010 version of the Mathematics for Computer Science class and I am almost done with the Introduction to Algorithms class. I am considering taking the third class next. I want to use this post to explain why I recommend this set of three classes.

All three of these classes have video lectures. These classes are part of a massive collection of free educational information that MIT publishes on its OpenCourseWare (OCW) web site: ocw.mit.edu. I have focused on the Computer Science classes out of the many available. But only certain classes contain video. For example, the Database Systems class does not have video. It does have free lecture notes and other content that has real value, but the Algorithms track has three classes in a row with video and that makes them especially valuable.

In addition to the video content, all three classes have valuable static content such as lecture notes. The Mathematics for Computer Science class has a free online textbook. It looks like the 2015 version has all kinds of interesting extra content that was not in the 2010 class that I took, and I thought the 2010 class was great. The last two classes have solutions to the problem sets, quizzes, and exams, which is great when you are studying this on your own.

The Introduction to Algorithms and Design and Analysis of Algorithms classes have a highly regarded standard Algorithms textbook for their readings. It is Introduction to Algorithms, 3rd Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. This textbook is commonly referred to as CLRS and seems to be the standard textbook for people who are serious about Algorithms.

The price is right. All three of these classes are free. I did buy CLRS, or actually I got it as a present, so my wife’s parents bought it, but $65 is a worthwhile investment especially since it is the text for two of the three classes and a valuable reference afterwards. The first class is completely free since it has a free text book included. The three classes represent thousands of dollars’ worth of educational material for free.

These classes are not easy. I do not claim to have mastered this material. It is hard to really discipline yourself to study for a class by yourself without grades and deadlines and others to talk to about it. But I have learned a lot from the two classes that I have worked through. I am a little afraid to take on the third class which is the intermediate level Algorithms class. Yikes! But even if I struggle to fully grasp every assignment, I am learning something. Some of it is a reminder of things I learned long ago in college, but many things are new. So, I do not mind the classes being hard. Better hard than too easy.

So, if you want to learn about Algorithms this three-class sequence is a great way to go. But, why would readers of this blog want to study this subject? In my opinion, Algorithms is just one area of Computer Science that would be helpful for a database professional to learn. Obviously, it is great to learn about database theory if you work with databases. It is great to learn about other systems topics such as operating systems and compilers since your work uses this type of software. It is great to learn about hardware, software development, etc. I think that many aspects of my Computer Science background from long ago in school have helped me in my database job. So, I do not think that Algorithms is the most important or only thing within CS that a database person needs to know, but I do think that Algorithms is very helpful to database workers.

I think that all of the main topics within CS that a database person would want are covered on the list of OCW EE and CS classes. But it seems that Algorithms is unique in that it has a very nice three class sequence that is so fully developed with video and the other resources that I mentioned. Maybe there are other areas that I missed, but I know that Algorithms is well covered by OCW. So, that is why I am singling out this three-class sequence in this blog post. I believe that it is an especially well covered topic within Computer Science that is useful to database professionals and available for free. So, check it out!

Bobby


January 25th AZORA Meetup with Viscosity

If you are in the Phoenix, Arizona area on Friday, January 25th please come to the AZORA meetup with talks by Viscosity. Please RSVP on the meetup site so we know how much food to order: Meetup link. We have a great lineup of speakers with talks about new features of the latest Oracle database versions. I know that I have not kept up with all the latest features, especially with all the other technical training that is on my plate. So, I am looking forward to setting aside this dedicated time to learn about Oracle database features from highly qualified presenters.

Plus, it will be fun and includes lunch! Come join us for our first meeting of the new year.

Bobby


Long running queries and my DBA_HIST_SQLSTAT scripts

On December 2nd we ran into a problem with one of my scripts that uses the DBA_HIST_SQLSTAT view. It did not show the full run time of a SQL statement that ran for several hours. So, it hid the fact that the plan that the SQL statement was using was much less efficient than another better plan. This post will document the two DBA_HIST_SQLSTAT scripts that I have been using so far and will introduce a new script that should make it clearer that a SQL statement is running for several hours on a plan.

There are two scripts based on DBA_HIST_SQLSTAT that I use most often for tuning: sqlstat.sql and sqlstat2.sql. I use these scripts all the time to look at the performance of queries and the different execution plans that they use. But in this case the output of sqlstat.sql was misleading. Here is part of its output for the problem SQL statement am47wcwn336yj:

PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms
--------------- ------------------------- ---------------- ------------------
     3771021445 04-NOV-18 05.00.15.814 AM                1            2250696
     2117190322 11-NOV-18 03.01.00.265 AM                6         223502.352
     1484468961 18-NOV-18 06.00.22.364 AM                1         2695964.96
     2117190322 25-NOV-18 02.00.34.302 AM                3         198192.968
     1139203276 02-DEC-18 07.00.21.784 AM                1          322110.53

We got a performance ticket at about 9 am on December 2nd about a session with SQL ID am47wcwn336yj running for longer than normal. Based on the sqlstat.sql output above you would think that the current plan is 1139203276 and that it is a little slower than the other plans but not terrible. The problem is that sqlstat.sql only shows you SQL executions that have completed during the hour prior to the END_INTERVAL_TIME. So, one execution of am47wcwn336yj completed between 6 and 7 am on December 2nd, but the output makes it look like it only ran for 322110.53 milliseconds which is about 5 minutes. Really it probably ran for several hours and completed around 6:05 am. So, we thought that 1139203276 was an ok plan when really 2117190322 is the much more efficient plan. But, the sqlstat.sql output confused us because it did not show the hours where am47wcwn336yj was running with plan 1139203276 but had not completed. It is easy to see why sqlstat.sql leaves out these hours if you look at this condition in its where clause:

and executions_delta > 0

This condition in the where clause prevents you from seeing the hours where the SQL statement is still running. But the select clause divides by executions_delta, so I added the condition to avoid a divide-by-zero error on rows where no execution completed.

When I am working on a long running SQL statement that spans two or more hours, I run sqlstat2.sql after I run sqlstat.sql. Sqlstat2.sql shows all the time spent per hour regardless of whether a SQL statement’s execution completed. It does not have the condition requiring executions_delta to be > 0 because it does not divide by that column. Here is the output for am47wcwn336yj:

PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA ELAPSED_TIME_DELTA
--------------- ------------------------- ---------------- ------------------
     3771021445 04-NOV-18 03.00.43.465 AM                0         3584030311
     3771021445 04-NOV-18 04.00.27.516 AM                0         3588264762
     3771021445 04-NOV-18 05.00.15.814 AM                1         2250695998
     2117190322 11-NOV-18 03.01.00.265 AM                6         1341014109
     1484468961 18-NOV-18 02.00.07.772 AM                0         3171408998
     1484468961 18-NOV-18 03.00.58.545 AM                0         3648125526
     1484468961 18-NOV-18 04.00.44.680 AM                0         3586687594
     1484468961 18-NOV-18 05.00.32.832 AM                0         3592026797
     1484468961 18-NOV-18 06.00.22.364 AM                1         2695964960
     2117190322 25-NOV-18 02.00.34.302 AM                3          594578904
     1139203276 02-DEC-18 02.00.14.082 AM                0          480039949
     1139203276 02-DEC-18 03.00.05.255 AM                0         3591464884
     1139203276 02-DEC-18 04.00.50.936 AM                0         3645014119
     1139203276 02-DEC-18 05.00.40.304 AM                0         3591817904
     1139203276 02-DEC-18 06.00.31.820 AM                0         3587861998
     1139203276 02-DEC-18 07.00.21.784 AM                1          322110530
     1139203276 02-DEC-18 09.00.58.127 AM                0         2028205794
     1139203276 02-DEC-18 10.00.37.793 AM                0         3582326580

Here you can see that plan 1139203276 ran between 1 and 7 am on 02-DEC-18 and completed and then ran between 8 and 10 am and never completed. We ended up killing the long running session and we reran the SQL another way so there is no more history of it. The problem with sqlstat2.sql is that it does not show average elapsed time per execution like sqlstat.sql does. It just shows the total elapsed time for all executions, if any. This is because we don’t divide by executions_delta. Also, the units are microseconds instead of milliseconds. I find milliseconds to be easier to use.

To me, looking at the output from sqlstat.sql and sqlstat2.sql together, 2117190322 is clearly the best plan. It is clear to me because I wrote these two scripts and am used to the way they look in this situation. If you go back to the first output, you can see that plan 2117190322 averages less than 224 seconds per execution (223502.352 ms and 198192.968 ms). Plans 3771021445, 1484468961, and 1139203276 ran for multiple hours. You can tell this from the second output because there are multiple consecutive hours with those plans and 0 executions.

I have used sqlstat.sql and sqlstat2.sql a lot as they are, but today I got the idea of combining them into a script that shows average elapsed time per execution for a given hour and includes hours where an execution did not complete. I have not used it yet in a real performance tuning situation but here it is, cleverly named sqlstat3.sql. Here is its output for this situation:

PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms
--------------- ------------------------- ---------------- ------------------
     3771021445 04-NOV-18 03.00.43.465 AM                0         3584030.31
     3771021445 04-NOV-18 04.00.27.516 AM                0         3588264.76
     3771021445 04-NOV-18 05.00.15.814 AM                1            2250696
     2117190322 11-NOV-18 03.01.00.265 AM                6         223502.352
     1484468961 18-NOV-18 02.00.07.772 AM                0            3171409
     1484468961 18-NOV-18 03.00.58.545 AM                0         3648125.53
     1484468961 18-NOV-18 04.00.44.680 AM                0         3586687.59
     1484468961 18-NOV-18 05.00.32.832 AM                0          3592026.8
     1484468961 18-NOV-18 06.00.22.364 AM                1         2695964.96
     2117190322 25-NOV-18 02.00.34.302 AM                3         198192.968
     1139203276 02-DEC-18 02.00.14.082 AM                0         480039.949
     1139203276 02-DEC-18 03.00.05.255 AM                0         3591464.88
     1139203276 02-DEC-18 04.00.50.936 AM                0         3645014.12
     1139203276 02-DEC-18 05.00.40.304 AM                0          3591817.9
     1139203276 02-DEC-18 06.00.31.820 AM                0            3587862
     1139203276 02-DEC-18 07.00.21.784 AM                1          322110.53
     1139203276 02-DEC-18 09.00.58.127 AM                0         2028205.79
     1139203276 02-DEC-18 10.00.37.793 AM                0         3582326.58

The nice thing is that you can see that 2117190322 ran 6 times averaging 223502.352 milliseconds on November 11 and 3 times averaging 198192.968 milliseconds on November 25th. Plus, you can see that the other plans span multiple hours. Note that the plans that span a full hour run for about 3600000 milliseconds which is an hour so that supports the idea that the SQL statement is active throughout the entire hour. I have only shown some of the total output to fit on the screen, but the full output includes more information such as a CPU and IO breakdown of the run time.
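The arithmetic behind the Elapsed Average ms column can be sketched in Python. This is only an illustration, not the SQL in sqlstat3.sql: the function name is made up, and it assumes that the script divides by 1 instead of 0 when no execution completed in the hour, which is what the output above suggests.

```python
def avg_elapsed_ms(elapsed_time_delta_us, executions_delta):
    """Average elapsed milliseconds per execution for one snapshot row.

    elapsed_time_delta_us is in microseconds, as in the sqlstat2.sql output.
    When no execution completed in the interval (executions_delta == 0),
    divide by 1 so the row still shows the time spent in that hour.
    This zero handling is an assumption based on the output shown above.
    """
    divisor = executions_delta if executions_delta > 0 else 1
    return elapsed_time_delta_us / divisor / 1000


# Rows taken from the sqlstat2.sql output above.
print(avg_elapsed_ms(594578904, 3))    # 3 completed executions on 25-NOV-18
print(avg_elapsed_ms(3584030311, 0))   # an hour with no completed execution
```

An hour with 0 executions then shows close to 3600000 milliseconds instead of disappearing from the output.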

Anyway, the point is that the main DBA_HIST_SQLSTAT query that I use, sqlstat.sql does not show SQL statements that run for several hours very clearly. You can figure it out using sqlstat2.sql in addition to sqlstat.sql. The new sqlstat3.sql script may make it easier to diagnose multi-hour SQL statement executions in the future without having to use two scripts.

Bobby

P.S. See this post about how I consolidated sqlstat.sql, sqlstat2.sql and sqlstat3.sql into one script, sqlstat.sql:

https://www.bobbydurrettdba.com/2020/04/14/fixed-issue-with-dba_hist_sqlstat-script/


To Blog, or Not to Blog?

As 2018 is ending I am thinking about what I spend my time on. Do I have my priorities correct? I know that I should spend some part of my time interacting with people over the Internet. I have gotten so much benefit from talking with other people about Oracle database issues through web-based forums. I have also had some in person interactions. I’m excited about the sort of rebirth of AZORA, my local Arizona Oracle user group. I talk to human beings in person about Oracle at user group meetings and that has great value. But I noticed that it had been a while since my last blog post of any substance. So, I asked myself if I am losing interest in blogging. I have not lost interest in writing blog posts, but I have had many other things tugging for my time, including other ways to interact with people over the Internet. So, I need to think about how much time to allocate to this blog next year.

I blame part of my lack of blog verbosity on Python. I have written a number of Python related posts, but Python has also drawn me into interacting with Rosetta Code and Stack Overflow. I’m trying to keep up my Python skills so that I have a general-purpose programming language available in my tool kit. But any time I might have devoted to my blog may have gotten used up by these Python related sites. Even GitHub is a form of distraction because maintaining Python repositories there takes time. But I don’t regret any time spent on Python because it is such a popular language now and it has really helped me in my work.

I guess the second time sink has been my PeopleSoft work. I don’t really talk much about my PeopleSoft work on this blog, but I have done a lot more of this sort of work in 2018 than I ever would have expected. With the push to move to the cloud and move to non-Oracle databases I have nevertheless been doing a bunch of old fashioned on premises ERP support, PeopleSoft applications on Oracle databases. I’ve been doing PeopleSoft on Oracle database for at least 24 years now so if my employer needs this sort of work, I’m capable of doing it. But PeopleSoft doesn’t excite me as much as database internals and performance tuning so that’s why I don’t blog about it much.

Speaking of the cloud and non-Oracle database, I have done some work in these areas in 2018 but not as much as I would have liked. I probably wouldn’t blog about the basics of using AWS or MySQL RDS, but if I can dig into some MySQL internals and use or build some MySQL performance tools, I can see blogging about that. My experience is mostly with Oracle, but I think open source is neat. I like having the MySQL and PostgreSQL source code even if I am unfamiliar with it. So, I guess I haven’t blogged about the cloud and non-Oracle databases because I just haven’t gotten very far. Maybe next year.

But, why have I not blogged about Oracle performance tuning? That really is the question. There have been several performance issues that I have worked on over the past few months, but in many cases, they were not anything new to me. Still, I think I may have missed out by not documenting the steps I went through in some of these real issues. I think it is easy to feel like what I am writing is not anything new and that there are better Oracle performance tuning bloggers out there. Also, I worry that I will make a mistake and confuse people or mislead them with wrong information. I would not intentionally mislead anyone, but I can certainly be wrong! I think going forward in 2019 I want to make the effort to write blog posts about Oracle performance issues that I have resolved even if they are not especially new territory. I probably can find some interesting angle from most issues. I think it might help people to see how I am using my SQL*Plus and Python scripts in different scenarios. Anyway, I hope to get back to blogging about Oracle performance.

Topics of my 15 blog posts in 2018 to this point:

User Group: 6
Oracle: 4
Python: 3
MySQL: 2

My Shakespeare inspiration (To be, or not to be):


To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take Arms against a Sea of troubles…

Hamlet. Act III, Scene I


Parallel_degree_policy AUTO caused serial plan in QAT

Recently during testing a merge statement was running forever in our QAT environment after a simple change. We had simply added a few columns to an existing table. After looking around we found that our QAT environment had the parameter parallel_degree_policy set to AUTO, but our other environments had it set to MANUAL. I thought I would show how we figured this out.

I had the SQL_ID for the problem merge statement and used my sqlstat.sql script to see if the same merge statement ran in production. I did not find any production executions. I was told that this merge statement was already in our production code and ran all the time without an issue. So, then I used my findsql.sql script to find the SQL_ID of the production version of the merge statement. 

To use findsql.sql you have to edit it to replace MYUSER with a real user on your database. I use my own personal user. Then you have to update the LIKE condition in the where clause to identify the query that you want. I replaced this line in the findsql.sql text on GitHub:

SQL_TEXT like '%a.FISC_WK_OF_Yr < to_number(to_char(sysdate+1, ''iW''))%' and

with this line which is part of the text of the problem merge statement:

upper(SQL_TEXT) like upper('%MERGE INTO DDMADM.FWB_MDL MDL%') and

I easily found the equivalent merge statement in production. The QAT merge statement had SQL_ID 00cmatqz5wymd and the prod one had 7pgqd82bgdah2. My coworker, who had brought me in on this issue, said that on QAT the query was not running in parallel. So, I used my getplans.sql script to look at the two plans. On prod the plan used parallelism:

-----------------------------------------
| Id  | Operation                        
-----------------------------------------
|   0 | MERGE STATEMENT                  
|   1 |  MERGE                           
|   2 |   TEMP TABLE TRANSFORMATION      
|   3 |    PX COORDINATOR                
|   4 |     PX SEND QC (RANDOM)          
|   5 |      LOAD AS SELECT              
|   6 |       UNION-ALL                  
|   7 |        WINDOW SORT               
|   8 |         PX RECEIVE               
|   9 |          PX SEND HASH            
|  10 |           VIEW                   
|  11 |            UNPIVOT               
|  12 |             PX BLOCK ITERATOR    
|  13 |              TABLE ACCESS FULL   

On QAT it did not:

-----------------------------------------------------
| Id  | Operation                                    
-----------------------------------------------------
|   0 | MERGE STATEMENT                              
|   1 |  MERGE                                       
|   2 |   VIEW                                       
|   3 |    NESTED LOOPS OUTER                        
|   4 |     VIEW                                     
|   5 |      TEMP TABLE TRANSFORMATION               
|   6 |       LOAD AS SELECT                         
|   7 |        UNION-ALL                             
|   8 |         WINDOW SORT                          
|   9 |          VIEW                                
|  10 |           UNPIVOT                            
|  11 |            PARTITION RANGE SINGLE            
|  12 |             TABLE ACCESS BY LOCAL INDEX ROWID
|  13 |              INDEX RANGE SCAN                

I found this extra line at the end of the QAT plan:

- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

This triggered a memory of a coworker of mine who was trying out some degree of parallelism parameters a couple of years ago. After searching the internet, I quickly found that the automatic DOP message was related to the parameter parallel_degree_policy. We found that it was set to manual on prod, so we set it to manual on QAT and the merge statement went back to the expected performance.

I guess the point of this post is to keep a close watch on the messages that show up at the end of your execution plan. They could lead you to a parameter setting that is the source of your problem.
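As a small illustration of watching for those messages, here is a Python sketch that pulls the note lines out of saved plan output. The function name and the sample text are hypothetical, and it assumes that the notes are the lines that start with a dash followed by a space, like the automatic DOP line above.

```python
def plan_notes(plan_text):
    """Return the note lines (those starting with "- ") from plan output."""
    notes = []
    for line in plan_text.splitlines():
        stripped = line.strip()
        # Plan rows start with "|" and separators are solid dashes,
        # so "- " at the start of a line marks a note.
        if stripped.startswith("- "):
            notes.append(stripped[2:])
    return notes


# Illustrative sample only: the tail of a plan followed by its note.
sample = """\
|  12 |             TABLE ACCESS BY LOCAL INDEX ROWID
|  13 |              INDEX RANGE SCAN
-----------------------------------------------------

   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
"""

for note in plan_notes(sample):
    print(note)
```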

Lastly, this is my first post using the new WordPress Gutenberg plugin. What do you think? It was pretty easy to put in the output from my scripts in little unformatted boxes by themselves. Much easier than with the previous WordPress editor.

Bobby

P.S. I have found one negative to Gutenberg. I can’t find a way to use the Jetpack Proofreading function to check my grammar. So, if you see that I have used passive voice in a bunch of my sentences you will know why. Hopefully they will add this into a future update. Of course, I could just use my brain to check the grammar but where is the fun in that?

P.P.S. I started working on another blog post and found a simple solution to my grammar checking needs using Gutenberg. I just copy and paste the blog post text into Microsoft Word and use its grammar checker. Pretty neat.


Hear my Python for the Oracle DBA talk in Scottsdale on November 16th

I am going to give a talk about why Python is a good programming language for Oracle DBAs in Scottsdale on November 16th for the Arizona Oracle User Group, AZORA. We may get some other speakers together for the meeting but I will be there for sure. I did this talk for Collaborate and for the RMOUG Training Days earlier this year. I may try to make it a little more interactive if I can get it together in time. I’m thinking of encouraging people to bring their laptops and install Python 3.7.1 so they can play around with a couple of features during the talk. Here is the Meetup information:

https://www.meetup.com/Arizona-Oracle-User-Group/events/255752135/

Let me know if you have any questions about this meeting. It should be fun.

Our meeting sponsor Clarivoyant graciously agreed to provide us with pizza and drinks so don’t worry about lunch!

Search my blog for Python posts if you want to see examples of how I have been using Python for my Oracle DBA work:

https://www.bobbydurrettdba.com/?s=Python

Bobby


Copy Oracle performance reports to SharePoint using Python

I have a couple of Oracle performance reports that I upload from my work Windows 7 laptop to our corporate SharePoint site on Office 365 on a regular basis. I thought that it would save me time to automate the uploading of these reports using Python. This post describes the continuing saga of my attempts to build a working Python script just to upload a file to SharePoint. I have a script that works for me today but there are no guarantees it will work in the future. Still, it might be helpful to other Oracle DBAs and others so I thought it was worth documenting.

The first Python script that I wrote that really helped me in my job was the ashcpu.py script within PythonDBAGraphs. I run this report just about every day at work to monitor the CPU usage on an important database. After I review the report, which is an image of a graph, I manually copy the report to SharePoint in a certain folder. It doesn’t take long to find the bookmark for the SharePoint folder and to copy the files over. But, what if I could just click on an icon and the files would be copied to SharePoint automatically?

Back in 2015 I wanted to start saving some historical Oracle performance data that went back further than the 6 weeks of AWR history that we keep on our Oracle databases. Ideally we would set up some sort of AWR data warehouse that we could mine for useful information based on data from the past. But setting up an AWR data warehouse would require money and approvals and time. Instead I built a simple script to get monthly AWR reports for all of our production databases. If the database had been up for the entire month it would get a report for that month, otherwise it would report on a subset of the month. I forget how I chose the subset. I think I picked the longest number of days. Anyway, the script tars and gzips the html AWR files for all of the databases into a single .tar.gz file. So, each month I use ftp to copy the .tar.gz file down to my laptop. Then I extract all of the html files and zip them up. Then I copy the zip to SharePoint. So, in this case I wanted a Python script that would do the ftp, ungzip and untar, zip, and copy the zip to SharePoint. It is only once a month but it would be nice to just click the button and have it done.

The script to upload the images is pretty simple. All it does is look for .png image files in the right directory and with the right pattern and upload them and then delete them from the source directory.
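The find, upload, and delete loop described above can be sketched like this. It is a minimal illustration and not the script itself: the function name is made up, the file pattern is supplied by the caller, and the upload step is a hypothetical callable that stands in for the SharePoint copy.

```python
from pathlib import Path


def upload_and_remove(source_dir, pattern, upload):
    """Upload each file in source_dir that matches pattern, then delete it.

    upload is any callable that takes a Path and raises an exception on
    failure, so a file is only deleted after its upload succeeded.
    """
    uploaded = []
    for path in sorted(Path(source_dir).glob(pattern)):
        upload(path)      # hypothetical stand-in for the SharePoint copy
        path.unlink()     # remove the local copy once it is uploaded
        uploaded.append(path.name)
    return uploaded
```

Passing the upload step in as a function keeps the file handling separate from whichever SharePoint method happens to be working at the time.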

The script that uploads the zip files with AWR reports is a little more complicated because it does the ftp, tar, gunzip, zip, and copies to SharePoint.

But things like untarring and zipping the files are easy in Python.
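A minimal sketch of that step, using the standard-library tarfile and zipfile modules. The function name and paths are hypothetical, and this is not necessarily how the original script does it.

```python
import tarfile
import zipfile
from pathlib import Path


def targz_to_zip(targz_path, work_dir, zip_path):
    """Extract a .tar.gz file into work_dir, then zip the extracted files."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Mode "r:gz" handles the gunzip and untar steps together.
    # Only use extractall on archives that you created yourself.
    with tarfile.open(targz_path, "r:gz") as tar:
        tar.extractall(work_dir)

    # Add every extracted file to the zip, with paths relative to work_dir.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(work_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(work_dir).as_posix())
    return zip_path
```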

The hard part was getting the file to SharePoint.

I had a fully working script that I was using to upload both of these types of reports, images and awr zips, to SharePoint. It probably worked for a week at most. Here is the code on GitHub that no longer works: url

I tried to find out why this script stopped working by asking on Stack Overflow and opening an issue on the GitHub site for the package that my script depended on. But I didn’t get an immediate answer so I moved on to create the copyfiletosharepoint script. It wasn’t easy getting copyfiletosharepoint to work. It is based on Microsoft Graph which seems to be Microsoft’s preferred way to interact with SharePoint on Office 365 through Python. Anyway, as of today 7/27/2018 it works for the two reports that I want to upload. Hopefully it will keep working for longer than my first script!

It has been quite the saga and has taken much longer than I expected. But hopefully the scripts that I have built will make me more productive by automating some of the mindless tasks that I do on a regular basis.

Bobby


MySQL client slow to display the first row of a large result set

A coworker of mine asked me to look at a MySQL query that was running longer than we want it to. If they added a LIMIT 1000 clause at the end of the query it popped up the results in a couple of minutes but without the LIMIT clause it apparently hung forever without returning a row of the result set. I figured out that the query was returning 4 million rows with a bunch of columns in each row which is why it was slow. But the question remained: why did it pop up the first row faster with the LIMIT clause? I checked the plan with and without the LIMIT clause and didn’t see a difference. Then I did more testing and realized that even a simple query that returns a large result set took a long time to display the first row. This is on MySQL 5.7 running in AWS RDS.

I asked a question about this on two forums that answer MySQL questions:

https://dba.stackexchange.com/questions/208161/mysql-select-from-table-without-conditions-does-not-return-rows-right-away

https://forums.mysql.com/read.php?24,666643

I appreciate people responding to my questions but I didn’t really get the answer I was looking for. I really wanted to know why large result sets made the mysql client take a long time to display the first row. But, I guess people thought that you should just add a LIMIT clause and not worry about it. Why would you want to return the first rows of a large result set quickly? I just want to understand it and I still don’t.

I built a test case and ran it against an Oracle database and a MySQL database and the Oracle database returned the first row immediately and the MySQL version hung for a couple of minutes. I used the Oracle sqlplus utility and MySQL mysql command line client.

Here is the test case:

I ran this from my laptop at home over a VPN so they both had to send results over the internet. But the MySQL database was in AWS so it had to go over the internet twice to get to me. The Oracle database was in our data center. The select * from test statement pops up the first rows instantly on Oracle but took at least a minute to show up on MySQL. A simple Python script against the same MySQL table pops up rows instantly.

So, this leaves me wondering if the slowness in displaying the first row is due to some processing that occurs in the mysql client. I assume, but don’t know, that running the query through Python is no different from running it through the mysql client from the server’s perspective.

I guess the point is that there is a lot that I don’t know about MySQL’s architecture because most of my experience is with Oracle. But it was helpful to me to know that the mysql client is slow to produce the first row of a large result set and to know that a Python script could be used as a workaround to this slowness.

Bobby


Python script to backup remote directory using sftp

This is the sftp version of the ftp script I published in an earlier post. It logs into a remote host using sftp and backs up a remote directory recursively, including all of its subdirectories.

This one uses Paramiko instead of ftplib so you can do encrypted file copies.
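A minimal sketch of the recursive copy, assuming an object that behaves like Paramiko's SFTPClient (it only needs listdir_attr and get). The function names are made up for illustration, password login is just one possible way to connect, and this is not the published script.

```python
import os
import stat


def backup_remote_dir(sftp, remote_dir, local_dir):
    """Recursively copy remote_dir to local_dir using an SFTP client.

    sftp is expected to behave like paramiko.SFTPClient: it needs
    listdir_attr(path) and get(remotepath, localpath).
    """
    os.makedirs(local_dir, exist_ok=True)
    for entry in sftp.listdir_attr(remote_dir):
        remote_path = remote_dir.rstrip("/") + "/" + entry.filename
        local_path = os.path.join(local_dir, entry.filename)
        if stat.S_ISDIR(entry.st_mode):
            # Descend into the subdirectory and mirror it locally.
            backup_remote_dir(sftp, remote_path, local_path)
        else:
            sftp.get(remote_path, local_path)


def open_sftp(host, user, password):
    """Open an SFTP session with Paramiko (a third-party package)."""
    import paramiko  # imported here so backup_remote_dir has no dependency

    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.connect(host, username=user, password=password)
    # Return the SSH client too so the caller can close it when done.
    return client, client.open_sftp()
```

Usage would look something like calling open_sftp with your own host and credentials and then backup_remote_dir(sftp, "/remote/path", "backup"), where both paths are placeholders.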

Bobby
