BLOG

Fill down table in T-SQL (last-non-empty value)

11.01.2016 Hilmar Buchta

A common task in data processing is to fill null values in a table with the latest existing value. For example, look at the following inventory table

image

Let’s assume we only get an inventory record, if the inventory changes but we want to fill the missing values (for example to create a snapshot fact table).

The final result should look like this with null values being filled with the last existing value:

image

In Power Query (or “Get and Transform” if you’re using Excel 2016) this is an easy task using the fill-down function, which does exactly what want:

image

So, how can we get the same effect in T-SQL? We could use a sub query, but in this post I’m using a window function. The idea is to generate row groups for which each row group starts with a not-null value and has a unique identifier. This allows us to use the first_value function for each group to get the existing value for that group.

In order to build the row groups I start with a change indicator for the table values. The idendicator is 1 if we have a balance, 0 if not. The resulting query is quite simple:

 

  1. select *, case when UnitsBalance is null then 0 else 1 end ChangeIndicator from [dbo].[Inventory]

image

The next step is to create a running total over the change indicator. In fact, this running total already defines our row groups. I’m using the sum() over window function here. The query from above is converted into a common table expression (cte) as this is easier to read:

  1. with
  2. help1 as (
  3. select *, case when UnitsBalance is null then 0 else 1 end ChangeIndicator
  4. from [dbo].[Inventory]
  5. )
  6.  
  7. select *, Sum(ChangeIndicator) over (order by DateKey) RowGroup from help1
  8. order by Datekey

image

As you see, the number for the row group increments with each existing balance value which means that the first row always contains the number (with the exception of the very first line for which we don’t have a value at all). In order to finalize the query, we just need the first_value function to get the value of the first row per row group:

  1. with
  2. help1 as (
  3. select *, case when UnitsBalance is null then 0 else 1 end ChangeIndicator
  4. from [dbo].[Inventory]
  5. )
  6. , help2 as (
  7. select *, Sum(ChangeIndicator) over (order by DateKey) RowGroup from help1
  8. )
  9. select *,
  10.  
  11.  
  12. case when UnitsBalance is not null then UnitsBalance
  13. else first_value(UnitsBalance) over (partition by RowGroup order by DateKey)
  14. end UnitsBalanceFillDown
  15. from help2
  16. order by datekey

 

image

So this is exactly what we wanted (if you remove the columns that we just needed during the process from the final output, you get exactly the result from above).

 

Conclusion

Windows functions in T-SQL are a good solution to fill gaps in your data set similar to last-non-empty.
As I’m using the first_value function here, just as a reminder to avoid frustration: first_value order by date descending and last_value order by date ascending do not give the same result.

Your email address will not be published. Required fields are marked *

Join #teamoraylispeople

Gestalte mit uns
die Welt der Daten