BLOG

T-SQL 2012 Window Functions in ELT (part 2)

19.01.2014 Hilmar Buchta

SQL Server 2012 

In my last post I explained some of the capabilities of the SQL Server 2012 window functions. Window function allow us to do computations involving other rows in a result set. Much like in my last post, I’m using the same sensor source for some more demonstrations of window functions and how they can help us in data transformation. Here is a sample of my sensor data:

image_thumb2

As you can see, the sensors deliver their data approximately every 15 minutes. But imagine you need the data exactly in a 15 minutes interval with data being interpolated based on their distance to the 15 minutes slot. Again, ETL (a script task for example) or cursor come into ones mind but let’s assume we want to do the computation in T-SQL. How could we do that.

Here comes a simple approach for such kinds of problems:

  • Perform a union all of the existing data with a fixed interval time stamp data.
  • Sort by timestamp and compute time difference to the row before/after
  • Perform the interpolation

For the time stamp data I use a table like the following one, which contains the true 15 minutes intervals:

image

The union-operation brings both tables together:

  1. select SensorID
  2.        , [DateTime]
  3.        , Temperature
  4.        , Throughput
  5.        , FailCount
  6.        , ‚Orig‘ as [Type]
  7. from [dbo].[SensorData]
  8. union all
  9. (
  10.   select distinct SD.SensorID, T.[Time], null,null,null, ‚Gen‘ from [dbo].[SensorData] SD
  11.   cross join [dbo].[Time] T
  12. )
  13. order by SensorID, [DateTime]

I included an information if the data is from the original data set (‘Orig’) or from the generated time stamps (‘Gen’). The result looks like this:

image

For a more readable query, I load this into a temporary table (#tmp) by wrapping the statement from above with a

  1. select * into #tmp
  2. from ( << query from above >>) Mix

Next step is to compute the previous and next values for the generated rows of data. I load this in a temporary table (#tmp2) too and also compute the time difference to the previous and next row in seconds:

  1. select
  2.     *
  3.     , LAG([DateTime],1) over (partition by sensorid order by [DateTime]) TimeLag
  4.     , LEAD([DateTime],1) over (partition by sensorid order by [DateTime]) TimeLead
  5.     , coalesce(DATEDIFF(second, LAG([DateTime],1) over (partition by sensorid order by [DateTime]), [DateTime]),0) SecondsLag
  6.     , coalesce(DATEDIFF(second, [DateTime], LEAD([DateTime],1) over (partition by sensorid order by [DateTime])),0) SecondsLead
  7.     , LAG([Temperature],1) over (partition by sensorid order by [DateTime]) TemperatureLag
  8.     , LEAD([Temperature],1) over (partition by sensorid order by [DateTime]) TemperatureLead
  9.     , LAG([Throughput],1) over (partition by sensorid order by [DateTime]) ThroughputLag
  10.     , LEAD([Throughput],1) over (partition by sensorid order by [DateTime]) ThroughputLead
  11.     , LAG([FailCount],1) over (partition by sensorid order by [DateTime]) FailCountLag
  12.     , LEAD([FailCount],1) over (partition by sensorid order by [DateTime]) FailCountLead
  13. into #Tmp2
  14. from #Tmp

The most important columns are shown here (only for the temperature):

image

 

Next task is to do the interpolation. I’m using a simple linear blending based on the time difference by updating the records of Type ‘Gen’ appropriately. On a PDW, this would not be a recommended operation. You would rather CTAS (create table as select, similar to select into) the result to a result table:

  1. update #Tmp2
  2. set
  3.     Temperature =
  4.         case
  5.             when TemperatureLag IS NULL then TemperatureLead
  6.             when TemperatureLead IS NULL then TemperatureLag
  7.             else (TemperatureLag*SecondsLead+TemperatureLead*SecondsLag) / (SecondsLag+SecondsLead)
  8.         end
  9.     , ThroughPut =
  10.         case
  11.             when ThroughputLag IS NULL then ThroughputLead
  12.             when ThroughputLead IS NULL then ThroughputLag
  13.             else (ThroughputLag*SecondsLead+ThroughputLead*SecondsLag) / (SecondsLag+SecondsLead)
  14.         end
  15.     , FailCount =
  16.         iif (coalesce(FailCountLag,0)>coalesce(FailCountLead,0),coalesce(FailCountLag,0), coalesce(FailCountLead,0))
  17.     from #Tmp2
  18. where [Type]=‚Gen‘

The result now shows the sensor data perfectly aligned and interpolated on a 15 minutes interval:

image

jetzt Teilen auf

Your email address will not be published.

Join #teamoraylispeople

Gestalte mit uns
die Welt der Daten