I have some Python scripts that I use to gather information, such as database size, from a list of about 100 databases. Usually it is not that important how long they take, so I have been running the queries against the databases one at a time, which can take a while depending on what I do. Coworkers of mine have worked on running Unix/Linux shell scripts in parallel to hit a bunch of systems like this more quickly, so I thought I would look at how to do the same in Python. I am sure there are more complicated ways to do it, but I got good results based on the simplest example from this part of the Python documentation:
https://docs.python.org/3.8/library/multiprocessing.html
Here is a simple Python script to get the size of the database files from four databases at the same time:
All I did was take the first example from the multiprocessing documentation and replace f(), which squared a number, with dbspace(), which connects to a database and runs a query to get the total database size. I experimented with different Pool() sizes; it is 4 in the example.
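The original code block did not survive here, so this is a sketch of the shape described above, based on the first Pool example in the multiprocessing docs. The dbspace() body is a placeholder: a real version would open a connection with a driver such as cx_Oracle (an assumption, not stated in the post) and run the size query, but this stub returns a canned value so the structure is runnable as-is.

```python
from multiprocessing import Pool

# Hypothetical connect identifiers, one per database.
DATABASES = ['orcl1db', 'orcl2db', 'orcl3db', 'orcl4db']

def dbspace(database):
    """Connect to one database and return (name, total size in MB).

    Placeholder: a real version would open a connection and sum bytes
    from dba_data_files, dba_temp_files, and v$log.
    """
    return (database, 0)  # stub result so the example runs anywhere

if __name__ == '__main__':
    # Pool(4) starts 4 worker processes; map() fans the list out to them
    # and collects the results in order.
    with Pool(4) as p:
        results = p.map(dbspace, DATABASES)
    for name, mbytes in results:
        print(name, mbytes)
```

Pool.map keeps the output in the same order as the input list, so the results line up with the database list even though the queries run concurrently.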
I picked 82 non-production databases that I have access to and ran the same query sequentially in a loop, which took 375 seconds. With the code above and the Pool increased to 32 processes, it ran in 24 seconds. That is about a 15x speedup.
I am sure that I could get a lot more complicated with all this stuff, but this works well with minimal effort.
Bobby
Parallelizing tasks across server targets or across server => database targets is cool stuff. I started with parallelizing in 2005 using ClusterIt dsh (hard to find on Linux) and GNU Parallel.
I found it takes a well-designed set of helper tools/scripts to scale up a robust, reliable parallelizing infrastructure. Error checking at scale is tough.
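One way to sketch that error-checking point, staying in the Python of Bobby's post: wrap each per-target call so a failure on one database comes back as data instead of killing the whole parallel run. The target names and the simulated failure here are made up for illustration.

```python
from multiprocessing import Pool

def check_target(target):
    """Stand-in for a real per-database query (hypothetical)."""
    if target == 'baddb':  # simulate one unreachable database
        raise ConnectionError('cannot reach ' + target)
    return 'ok'

def safe_check(target):
    """Return (target, result, error) so one failure never aborts the pool."""
    try:
        return (target, check_target(target), None)
    except Exception as exc:
        return (target, None, str(exc))

if __name__ == '__main__':
    targets = ['orcl1db', 'baddb', 'orcl2db']
    with Pool(3) as p:
        for name, result, err in p.map(safe_check, targets):
            print(name, result if err is None else 'ERROR: ' + err)
```

Because the wrapper converts exceptions into ordinary return values, the successful databases still report their results and the failures are listed alongside them for follow-up.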
Your example using GNU Parallel might look like this:
me@myhost:1950 [/home/me/Library/dsh]
$ cat z.sql
set heading off
set pagesize 0
column instance_name format a20
column mbytes format 999,999,999,999
with
my$view as
(
select bytes from dba_data_files union all
select bytes from dba_temp_files union all
select bytes from v$log
)
select max( i.instance_name ) instance_name ,( sum(bytes) / power( 2 ,20 ) ) mbytes
from my$view ,v$instance i
;
me@myhost:1950 [/home/me/Library/dsh]
$ time parallel 'sqlplus -S -L {} @z.sql;' ::: myuser/mypass@orcl1db myuser/mypass@orcl2db myuser/mypass@orcl3db myuser/mypass@orcl4db
orcl1db1 110,830
orcl2db1 6,030
orcl3db1 327,187
orcl4db1 54,037
real 0m4.80s
user 0m1.54s
sys 0m0.34s
me@myhost:1950 [/home/me/Library/dsh]
$
*****
I abbreviated the output; the 4.8s run hit 15 databases on a single server.
Ciao, Tim…
Thanks Tim. I have been meaning to look at what my coworkers are using with Linux shell scripts. It is great to have your example and information as well.
Bobby