24.08.2014

Optimizing perfomance by re-writing where clause

Technical Value

During processing of our AS Cubes we found ourselves with a strange behavior we couldn’t explain. In order to get better performance with split the fact partition into 8 partitions. The first division is done over contract ID with modulo. The second division takes place by the column „partition_id“ which is part of a dimension.

In order to process all the partitions we need 8 queries, whereby i only show the part which caused the problem:

  • Below you find the working query. Interesting is the bottom part where the NULL Handling takes place. Sine the join is a left outer we have to make sure we have a value. in case we don't, take a 1 instead.

 

  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11. AND ISNULL(dim1.partition_id,1) = 1
  • Now we show you the query which is causing problems. The only difference between the query is the comparison value for the NULL Handling. Instead of 1 it is 2.
  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11. AND ISNULL(dim1.partition_id,1) = 2

 

Reading the execution plans there was no difference at all. A simple ETL Package shows the difference in performance. Even though the second query returns way less rows then the first one, its still magnificantly slower.

Partition_ID 1 finished in about 5 Minutes, whereby Partition_ID 2 is stopped afar 10 Minutes with only 20k rows read.

After changing the query to the following the performance was the same.

  • Even though the first one was working fine, we changed it in order to be the same and homogeneous.

 

  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11.  AND NOT(dim1.Partition_ID != 1 AND NOT(1 = 2 AND dim1.Partition_ID is null))
  • After changing the second query it is fast as expected.
  1. SELECT *
  2.  
  3. FROM  fact1
  4.  
  5. .........
  6.  
  7. left outer join dim1 on fact1.random_id = dim1.random_id
  8.  
  9. WHERE  fact1.Month_ID = 201407
  10.  
  11. AND NOT(dim1.Partition_ID != 2 AND NOT(2 = 2 AND dim1.Partition_ID is null))

 

Since this behavior isn’t correct (at least in my opinion) a ticket at Microsoft has been opened. Till then, this might help other people facing the same situation.

Neuen Kommentar schreiben

Der Inhalt dieses Feldes wird nicht öffentlich zugänglich angezeigt.

Klartext

  • Keine HTML-Tags erlaubt.
  • HTML - Zeilenumbrüche und Absätze werden automatisch erzeugt.
  • Web page addresses and email addresses turn into links automatically.
Teilen auf

Newsletter Anmeldung

Abonnieren Sie unseren Newsletter!
Lassen Sie sich regelmäßig über alle Neuigkeiten rundum ORAYLIS und die BI- & Big-Data-Branche informieren.

Jetzt anmelden