06.01.2015

Using Open Street Map data in Power Query

Technical Value

This blog post is about using XML data from OpenStreetMap in PowerQuery for example to create a local geocoding database. PowerQuery is just perfect for this purpose as it allows us to interactively shape the input data to the desired format. First, we need to download some data from OpenStreetMap. You may go to the OpenStreetMap webpage, zoom to the desired area and then hit the export button. Alternatively, download links are provided for full country or region files after hitting ‘Export’. image The downloaded OSM file is in XML format and starts like this:

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <osm version="0.6" generator="CGImap 0.3.3 (733 thorn-01.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="
  3. http://www.openstreetmap.org/copyright"
  4. license="
  5. http://opendatacommons.org/licenses/odbl/1-0/"
  6. >

Now, open Excel and create a new PowerQuery from an XML file: image Next, select the downloaded file and hit Enter. PowerQuery loads with the file contents: imageThe data we’re interested in is stored in the nodes-table. So we can now click on the ‘Table’ link in the nodes column: imageWe now have to expand the ‘tag’ information: imageSince we’re not interested in all the data from the file, we limit the information to country, city, postcode, street and housenumber by using the filter function: imageWe can now delete all columns but the following:

  • tag.attribute.k
  • tag.attribute.v
  • attribute:id
  • attribute:lat
  • attribute:long

imageimageThis is pretty much the data we’d like to obtain from the OSM file. However, we still need to pivot to information from the rows into columns to see city, postcode etc. side by side with the geo coordinates. In order to do so, we select column tag.Attribut:k and use the Pivot function from the Transform ribbon as shown below: imageFinally, we can rename the columns and filter for rows where country is not null: imageThe list now contains addresses (city, postcode, street, house number) together with the corresponding geographical coordinates (latitude, longitude). For your reference, this is the script we created so far:

  1. let
  2. Source = Xml.Tables(File.Contents("C:\Temp\download.osm")),
  3. node = Source{0}[node],
  4. #"Changed Type" = Table.TransformColumnTypes(node,{{"Attribute:id", Int64.Type}, {"Attribute:visible", type logical}, {"Attribute:version", Int64.Type}, {"Attribute:changeset", Int64.Type}, {"Attribute:timestamp", type datetime}, {"Attribute:user", type text}, {"Attribute:uid", Int64.Type}, {"Attribute:lat", type number}, {"Attribute:lon", type number}}),
  5. #"Expand tag" = Table.ExpandTableColumn(#"Changed Type", "tag", {"Attribute:k", "Attribute:v"}, {"tag.Attribute:k", "tag.Attribute:v"}),
  6. #"Filtered Rows" = Table.SelectRows(#"Expand tag", each ([#"tag.Attribute:k"] = "addr:city" or [#"tag.Attribute:k"] = "addr:country" or [#"tag.Attribute:k"] = "addr:housenumber" or [#"tag.Attribute:k"] = "addr:postcode" or [#"tag.Attribute:k"] = "addr:street")),
  7. #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Attribute:visible", "Attribute:version", "Attribute:changeset", "Attribute:timestamp", "Attribute:user", "Attribute:uid"}),
  8. #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[#"tag.Attribute:k"]), "tag.Attribute:k", "tag.Attribute:v"),
  9. #"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"Attribute:id", "id"}, {"Attribute:lat", "latitude"}, {"Attribute:lon", "longitude"}, {"addr:city", "city"}, {"addr:postcode", "postcode"}, {"addr:street", "street"}, {"addr:country", "country"}, {"addr:housenumber", "housenumber"}}),
  10. #"Filtered Rows1" = Table.SelectRows(#"Renamed Columns", each ([country] = "DE"))
  11. in
  12. #"Filtered Rows1"

The result can then be loaded to a PowerPivot data model (click on File, then Load to…): imageAfter loading the data into the model, you can for example use PowerMaps to visualize the addresses on a map: imageOf course, you could have received the similar result using the geocoding functionality of PowerMaps. Therefore, you simply need to add another column for the full address like this: imageThe result is pretty much the same but it takes a significant amount of time to geo code many addresses on this level of detail compared to the direct approach from above, which doesn’t need to pass geo coding web service because we already provided latitude and longitude. imageSo, if you have to geo code a lot of addresses, the OpenStreetMap XML format may be a lot of help. Using PowerQuery makes it easy to load this data into a PowerPivot data model. For larger regions make sure you have enough memory available in the local machine (and use the 64bit version of Microsoft Excel).

Teilen auf

Newsletter Anmeldung

Abonnieren Sie unseren Newsletter!
Lassen Sie sich regelmäßig über alle Neuigkeiten rundum ORAYLIS und die BI- & Big-Data-Branche informieren.

Jetzt anmelden