24.08.2014

Optimizing perfomance by re-writing where clause

Technical Value

During processing of our AS Cubes we found ourselves with a strange behavior we couldn’t explain. In order to get better performance with split the fact partition into 8 partitions. The first division is done over contract ID with modulo. The second division takes place by the column „partition_id“ which is part of a dimension.

In order to process all the partitions we need 8 queries, whereby i only show the part which caused the problem:

  • Below you find the working query. Interesting is the bottom part where the NULL Handling takes place. Sine the join is a left outer we have to make sure we have a value. in case we don't, take a 1 instead.

 

  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11. AND ISNULL(dim1.partition_id,1) = 1
  • Now we show you the query which is causing problems. The only difference between the query is the comparison value for the NULL Handling. Instead of 1 it is 2.
  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11. AND ISNULL(dim1.partition_id,1) = 2

 

Reading the execution plans there was no difference at all. A simple ETL Package shows the difference in performance. Even though the second query returns way less rows then the first one, its still magnificantly slower.

Partition_ID 1 finished in about 5 Minutes, whereby Partition_ID 2 is stopped afar 10 Minutes with only 20k rows read.

After changing the query to the following the performance was the same.

  • Even though the first one was working fine, we changed it in order to be the same and homogeneous.

 

  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11.  AND NOT(dim1.Partition_ID != 1 AND NOT(1 = 2 AND dim1.Partition_ID is null))
  • After changing the second query it is fast as expected.
  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11. AND NOT(dim1.Partition_ID != 2 AND NOT(2 = 2 AND dim1.Partition_ID is null))

 

Since this behavior isn’t correct (at least in my opinion) a ticket at Microsoft has been opened. Till then, this might help other people facing the same situation.

Teilen auf

Newsletter Anmeldung

Abonnieren Sie unseren Newsletter!
Lassen Sie sich regelmäßig über alle Neuigkeiten rundum ORAYLIS und die BI- & Big-Data-Branche informieren.

Jetzt anmelden