I encourage people that I work with to put a small number like 8 as the parallel degree when they want to create tables or indexes to use parallel query. For example:
SQL> create table test parallel 8 as select * from dba_tables; Table created. SQL> SQL> select degree from user_tables where table_name='TEST'; DEGREE ---------- 8
But frequently I find tables that were created with the default degree by leaving out a number on the parallel clause:
SQL> create table test parallel as select * from dba_tables; Table created. SQL> SQL> select degree from user_tables where table_name='TEST'; DEGREE ---------- DEFAULT
The problem is that on a large RAC system with a lot of CPUs per node the default degree can be a large number. A table with a large degree can cause a single query to eat up all of the available parallel query processes. That’s fine if only one query needs to run at a time but if you plan to run multiple queries in parallel you need to divide up the parallel query processes among them. I.e. if you have 100 parallel query processes and need to run 10 queries at a time then you need to be sure each query only gets 10 of them. I guess degree=5 is 10 processes but the point is that you don’t want to start running a bunch of queries with a degree of 50 each when you have 100 parallel processes to divide up.
With the default settings default degree is 2 X number of cpus X number of RAC nodes. I tested this on an Exadata V2 with 2 nodes and 16 cpus per node. The result was as expected, degree=64:
Final cost for query block SEL$1 (#0) - All Rows Plan: Best join order: 1 Cost: 2.1370 Degree: 64 Card: 4203.0000 Bytes: 54639 Resc: 123.0910 Resc_io: 123.0000 Resc_cpu: 2311650 Resp: 2.1370 Resp_io: 2.1354 Resc_cpu: 40133
Just to verify that a query with parallel 8 would really use degree 8 I ran the same test with the same table but parallel 8:
Final cost for query block SEL$1 (#0) - All Rows Plan: Best join order: 1 Cost: 6.5419 Degree: 8 Card: 4715.0000 Bytes: 61295 Resc: 47.1020 Resc_io: 47.0000 Resc_cpu: 2593250 Resp: 6.5419 Resp_io: 6.5278 Resc_cpu: 360174
Also note the lower cost(2) in the plan with default degree:
--------------------------------------------------+-----------------------------------+-------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib | --------------------------------------------------+-----------------------------------+-------------------------+ | 0 | SELECT STATEMENT | | | | 2 | | | | | | 1 | SORT AGGREGATE | | 1 | 13 | | | | | | | 2 | PX COORDINATOR | | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000| 1 | 13 | | |:Q1000| P->S |QC (RANDOM)| | 4 | SORT AGGREGATE | | 1 | 13 | | |:Q1000| PCWP | | | 5 | PX BLOCK ITERATOR | | 4203 | 53K | 2 | 00:00:01 |:Q1000| PCWC | | | 6 | TABLE ACCESS STORAGE FULL | TEST | 4203 | 53K | 2 | 00:00:01 |:Q1000| PCWP | | --------------------------------------------------+-----------------------------------+-------------------------+
Compared to degree 8 (cost=7):
--------------------------------------------------+-----------------------------------+-------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib | --------------------------------------------------+-----------------------------------+-------------------------+ | 0 | SELECT STATEMENT | | | | 7 | | | | | | 1 | SORT AGGREGATE | | 1 | 13 | | | | | | | 2 | PX COORDINATOR | | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000| 1 | 13 | | |:Q1000| P->S |QC (RANDOM)| | 4 | SORT AGGREGATE | | 1 | 13 | | |:Q1000| PCWP | | | 5 | PX BLOCK ITERATOR | | 4715 | 60K | 7 | 00:00:01 |:Q1000| PCWC | | | 6 | TABLE ACCESS STORAGE FULL | TEST | 4715 | 60K | 7 | 00:00:01 |:Q1000| PCWP | | --------------------------------------------------+-----------------------------------+-------------------------+
So, this shows that in this case with 2 node RAC and 16 cpus per node that the optimizer uses degree 64 for default degree and is more likely to choose a full scan over an index scan because the cost of the degree 64 full scan is less than that of a degree 8 full scan.
The key point is to understand that putting the keyword PARALLEL by itself on a table or index creation statement instead of PARALLEL 8 (or 4 or 16) can result in unexpectedly high degree. This high degree can cause performance to degrade by allowing individual queries to eat up the parallel query processes leaving other queries to run inefficiently without the expected parallelism. The high degree also reduces the cost of a full scan potentially causing them to be favored over index scans where the index scan would be more efficient.
– Bobby
p.s. Here is a zip of the scripts and logs that I used to create the 10053 traces: zip
Great post. Many thanks.
You’re welcome. Thank you for the comment.