04.01.2014

T-SQL 2012 Window Functions in ELT (part 2)

Technical Value

SQL Server 2012

In my last post I explained some of the capabilities of the SQL Server 2012 window functions. Window function allow us to do computations involving other rows in a result set. Much like in my last post, I’m using the same sensor source for some more demonstrations of window functions and how they can help us in data transformation. Here is a sample of my sensor data:

image_thumb2

As you can see, the sensors deliver their data approximately every 15 minutes. But imagine you need the data exactly in a 15 minutes interval with data being interpolated based on their distance to the 15 minutes slot. Again, ETL (a script task for example) or cursor come into ones mind but let’s assume we want to do the computation in T-SQL. How could we do that.

Here comes a simple approach for such kinds of problems:

  • Perform a union all of the existing data with a fixed interval time stamp data.
  • Sort by timestamp and compute time difference to the row before/after
  • Perform the interpolation

For the time stamp data I use a table like the following one, which contains the true 15 minutes intervals:

image

The union-operation brings both tables together:

  1. select SensorID      
  2.        , [DateTime]      
  3.        , Temperature      
  4.        , Throughput      
  5.        , FailCount      
  6.        , 'Orig' as [Type]      
  7. from [dbo].[SensorData]      
  8. union all      
  9. (      
  10.   select distinct SD.SensorID, T.[Time], null,null,null, 'Gen' from [dbo].[SensorData] SD      
  11.   cross join [dbo].[Time] T      
  12. )      
  13. order by SensorID, [DateTime]

I included an information if the data is from the original data set (‘Orig’) or from the generated time stamps (‘Gen’). The result looks like this:

image

For a more readable query, I load this into a temporary table (#tmp) by wrapping the statement from above with a

  1. select * into #tmp      
  2. from ( << query from above >>) Mix

Next step is to compute the previous and next values for the generated rows of data. I load this in a temporary table (#tmp2) too and also compute the time difference to the previous and next row in seconds:

  1. select      
  2.     *      
  3.     , LAG([DateTime],1) over (partition by sensorid order by [DateTime]) TimeLag      
  4.     , LEAD([DateTime],1) over (partition by sensorid order by [DateTime]) TimeLead
  5.     , coalesce(DATEDIFF(second, LAG([DateTime],1) over (partition by sensorid order by [DateTime]), [DateTime]),0) SecondsLag      
  6.     , coalesce(DATEDIFF(second, [DateTime], LEAD([DateTime],1) over (partition by sensorid order by [DateTime])),0) SecondsLead
  7.     , LAG([Temperature],1) over (partition by sensorid order by [DateTime]) TemperatureLag      
  8.     , LEAD([Temperature],1) over (partition by sensorid order by [DateTime]) TemperatureLead      
  9.     , LAG([Throughput],1) over (partition by sensorid order by [DateTime]) ThroughputLag      
  10.     , LEAD([Throughput],1) over (partition by sensorid order by [DateTime]) ThroughputLead      
  11.     , LAG([FailCount],1) over (partition by sensorid order by [DateTime]) FailCountLag      
  12.     , LEAD([FailCount],1) over (partition by sensorid order by [DateTime]) FailCountLead      
  13. into #Tmp2      
  14. from #Tmp

The most important columns are shown here (only for the temperature):

image

 

Next task is to do the interpolation. I’m using a simple linear blending based on the time difference by updating the records of Type ‘Gen’ appropriately. On a PDW, this would not be a recommended operation. You would rather CTAS (create table as select, similar to select into) the result to a result table:

  1. update #Tmp2      
  2. set      
  3.     Temperature =      
  4.         case      
  5.             when TemperatureLag IS NULL then TemperatureLead      
  6.             when TemperatureLead IS NULL then TemperatureLag      
  7.             else (TemperatureLag*SecondsLead+TemperatureLead*SecondsLag) / (SecondsLag+SecondsLead)      
  8.         end
  9.     , ThroughPut =      
  10.         case      
  11.             when ThroughputLag IS NULL then ThroughputLead      
  12.             when ThroughputLead IS NULL then ThroughputLag      
  13.             else (ThroughputLag*SecondsLead+ThroughputLead*SecondsLag) / (SecondsLag+SecondsLead)      
  14.         end      
  15.    
  16.     , FailCount =      
  17.         iif (coalesce(FailCountLag,0)>coalesce(FailCountLead,0),coalesce(FailCountLag,0), coalesce(FailCountLead,0))
  18.     from #Tmp2      
  19. where [Type]='Gen'

The result now shows the sensor data perfectly aligned and interpolated on a 15 minutes interval:

image

Teilen auf

Newsletter Anmeldung

Abonnieren Sie unseren Newsletter!
Lassen Sie sich regelmäßig über alle Neuigkeiten rundum ORAYLIS und die BI- & Big-Data-Branche informieren.

Jetzt anmelden