SQL Performance Tuning by avoiding not-in-clauses
I have already written about left outer joins. Now I want to show you an example how to use left outer joins in order to write efficient SQL.
At one of our customers we have the following query that runs several hours:
- SELECT DISTINCT [INSERTTIMESTAMP] FROM DATATABLE WHERE [INSERTTIMESTAMP] NOT IN (SELECT [INSERTTIMESTAMP] FROM CTRLTABLE WHERE TARGET_TABLENAME = 'source_01' )
The query gets the distinct insert timestamps of DATATABLE that has not already been used in CTRLTABLE. The query is used for determining in a process flow which data has not already been processed. After processing the data the processed data would be inserted in the CTRLTABLE. The DATATABLE has about 40 million rows. The CTRLTABLE has about 400 rows. The execution plan of this query is shown below: In my opinion it is not advisable to use a not-in-construct in a where clause. If you can avoid it, you should avoid it. In this case you can rewrite the query by using a left outer join:
- SELECT DATATABLE.[INSERTTIMESTAMP] FROM DATATABLE LEFT OUTER JOIN CTRLTABLE ON DATATABLE.[INSERTTIMESTAMP] = CTRLTABLE.[INSERTTIMESTAMP] AND CTRLTABLE.TARGET_TABLENAME = 'source_01' WHERE CTRLTABLE.[INSERTTIMESTAMP] IS NULL GROUP BY DATATABLE.[INSERTTIMESTAMP]
Please note that the filter condition for target_tablename must be used in the on-clause and not in the where-clause. Otherwise we would always get an empty answer set because target_tablename cannot be null and „source_01“ at the same time.
The group by clause is used in order to get distinct values.
The execution plan for the rewritten query looks like this: The new execution plan looks simpler and the query runs only in about 10 seconds instead of several hours.