Analyze GeoJSON Recorded During a Commute from Miami to Fort Lauderdale

GeoJSON recorded during an evening commute from Miami to Fort Lauderdale

Interact with this notebook on Binder.


Load the data from a Minio instance I have deployed.

In [58]:
import urllib.request
import json
import itertools as it
from pprint import pprint
from functools import partial, reduce
import operator as op

# Define configured pprint suitable for notebooks
_print = partial(pprint, indent=4)

def dhead(d: dict, n=5):
    """Return the first n items from a dictionary."""
    return {k: v for k, v in it.islice(d.items(), 0, n)}

with urllib.request.urlopen(
) as res:
    data = json.load(res)

for n in range(1, len(data)):
    _print(dhead(data, n))
{'type': 'FeatureCollection'}
{   'crs': {   'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'},
               'type': 'name'},
    'type': 'FeatureCollection'}

Practice laziness in the sense of one of the Three Virtues


  1. I don't like having to retype strings that are dict keys. It's error-prone and taxes my memory. I would prefer variables that I don't have to define by hand.
    Use Enum to create the variables programmatically. A plain dict would probably work, too, but I like the way an Enum is represented in output and its type feature. I am also trying to find use cases for Enum.
In [99]:
FOO = "foo"  # the variable has to exist before it can be displayed
example = dict(FOO="foo")
FOO, example
('foo', {'FOO': 'foo'})
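
As a small illustration of the idea, the sketch below builds a str-mixin Enum from a list of keys and uses a member to index a plain dict. The names `raw_keys`, `Keys` and `record` are made up for this example and are not part of the commute data.

```python
from enum import Enum

# Hypothetical keys, standing in for the keys of a real document.
raw_keys = ["type", "crs"]

# Functional API: each member is named after the upper-cased key and
# carries the original key as its value. type=str makes every member
# a real str, so it hashes and compares like its value.
Keys = Enum("Keys", type=str, names=[(k.upper(), k) for k in raw_keys])

record = {"type": "FeatureCollection", "crs": None}

# A member can be used wherever the raw string key is expected.
looked_up = record[Keys.TYPE]
```

Because the members are real strings, existing dict lookups keep working while the key names become attributes that an editor can complete.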

Walk the data structure to get all the keys.

I wrote this function after being inspired by the Stack Overflow question "Access nested dictionary items via a list of keys?"

In [30]:
def paths_in_data(data: dict, parent=()):
    """Calculate keys and/or indices in a nested dict."""

    if not any(isinstance(data, type_) for type_ in (dict, list, tuple)):
        return (parent,)
    else:
        try:  # Handle dict
            return reduce(
                op.add,
                (paths_in_data(v, op.add(parent, (k,))) for k, v in data.items()),
                (),
            )
        except AttributeError:  # Handle indexable sequences.
            # enumerate, unlike data.index(v), stays correct when values repeat.
            return reduce(
                op.add,
                (paths_in_data(v, op.add(parent, (i,))) for i, v in enumerate(data)),
                (),
            )

Truncated example of the paths generated from paths_in_data.

In [32]:
[path for path in it.takewhile(lambda x: x[-1] != 2, paths_in_data(data))]
[('type',),
 ('crs', 'type'),
 ('crs', 'properties', 'name'),
 ('features', 0, 'type'),
 ('features', 0, 'properties', 'GUID'),
 ('features', 0, 'properties', 'LABEL_EXPR'),
 ('features', 0, 'properties', 'TITLE'),
 ('features', 0, 'properties', 'LABEL_TEXT'),
 ('features', 0, 'properties', 'NOTES'),
 ('features', 0, 'geometry', 'type'),
 ('features', 0, 'geometry', 'coordinates', 0, 0),
 ('features', 0, 'geometry', 'coordinates', 0, 1)]
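
The same walk can also be written as a generator. The sketch below is an alternative rather than the function used in this notebook; `iter_paths` and `sample` are names invented for the example. It relies on `enumerate` for sequences, so repeated values still get distinct indices.

```python
def iter_paths(node, parent=()):
    """Yield one tuple of keys and/or indices per leaf in a nested structure."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, (list, tuple)):
        children = enumerate(node)
    else:
        # Anything that is not a container is treated as a leaf.
        yield parent
        return
    for key, child in children:
        yield from iter_paths(child, parent + (key,))


# Made-up data with two identical coordinate pairs.
sample = {"type": "Feature", "geometry": {"coordinates": [[1.0, 2.0], [1.0, 2.0]]}}
sample_paths = list(iter_paths(sample))
```

A generator avoids building intermediate tuples of paths, which may matter for tracks with thousands of points.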

Get a set of all the keys.

In [44]:
data_key_set = sorted(
    {key for key in it.chain.from_iterable(paths_in_data(data)) if isinstance(key, str)}
)
_print(data_key_set)
[   'GUID',

Cast data_key_set into valid variable names

In [47]:
from string import digits, whitespace, punctuation

# Transform all whitespace and punctuation into underscores
# Not needed but left here as an example
translation = str.maketrans(dict(zip((*whitespace, *punctuation), it.cycle("_"))))

data_key_set_names = [
    key.translate(translation).strip(digits).upper() for key in data_key_set
]
_print(data_key_set_names)
[   'GUID',
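
The keys in this file are already valid identifiers, so the translation has no visible effect here. To show what it would do with messier keys, here is a minimal sketch on made-up input. `to_constant_name` is a hypothetical helper, and unlike the cell above it prefixes an underscore to a leading digit instead of stripping digits.

```python
from string import punctuation, whitespace

# Map every whitespace and punctuation character to an underscore.
# The underscore itself is in string.punctuation and maps to itself.
to_underscore = str.maketrans({ch: "_" for ch in whitespace + punctuation})


def to_constant_name(key: str) -> str:
    """Turn a dict key into an upper-case name usable as a variable."""
    name = key.translate(to_underscore).upper()
    # Identifiers cannot start with a digit, so prefix an underscore.
    if name[:1].isdigit():
        name = "_" + name
    return name


# Made-up keys that are not valid identifiers as they stand.
samples = ["label expr", "max-speed", "2d_point"]
constant_names = [to_constant_name(s) for s in samples]
```

`str.isidentifier()` is a convenient check that the results can be used as variable or Enum member names.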

Define an Enum using the functional API.

In [50]:
from enum import Enum

DataKeys = Enum("DataKeys", type=str, names=zip(data_key_set_names, data_key_set))
_print(DataKeys.__members__)
mappingproxy({   'COORDINATES': <DataKeys.COORDINATES: 'coordinates'>,
                 'CRS': <DataKeys.CRS: 'crs'>,
                 'FEATURES': <DataKeys.FEATURES: 'features'>,
                 'GEOMETRY': <DataKeys.GEOMETRY: 'geometry'>,
                 'GUID': <DataKeys.GUID: 'GUID'>,
                 'LABEL_EXPR': <DataKeys.LABEL_EXPR: 'LABEL_EXPR'>,
                 'LABEL_TEXT': <DataKeys.LABEL_TEXT: 'LABEL_TEXT'>,
                 'NAME': <DataKeys.NAME: 'name'>,
                 'NOTES': <DataKeys.NOTES: 'NOTES'>,
                 'PROPERTIES': <DataKeys.PROPERTIES: 'properties'>,
                 'TITLE': <DataKeys.TITLE: 'TITLE'>,
                 'TYPE': <DataKeys.TYPE: 'type'>})

Add names from DataKeys to global namespace.

In [51]:
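
A minimal sketch of one way to do this, assuming the goal is simply one module-level variable per member: update `globals()` with the Enum's `__members__` mapping. `Demo` is a stand-in Enum so that the block runs on its own; with the real data the same call would take `DataKeys.__members__`.

```python
from enum import Enum

# Stand-in for DataKeys so this block is self-contained.
Demo = Enum("Demo", type=str, names=[("FEATURES", "features"), ("GEOMETRY", "geometry")])

# __members__ maps each member name to the member itself, so this
# creates the module-level variables FEATURES and GEOMETRY.
globals().update(Demo.__members__)

feature_key = FEATURES  # now resolves to Demo.FEATURES
```

Writing into `globals()` is convenient in a notebook, but it can silently overwrite existing names, so it is probably best kept to exploratory work.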

Inspect a variable

In [70]:
_print((FEATURES, type(FEATURES), isinstance(FEATURES, str)))
(<DataKeys.FEATURES: 'features'>, <enum 'DataKeys'>, True)

Get some specific data

In [71]:
def get_from(data, path):
    """Get a leaf from iterable of keys and/or indices.

    :data: Collection where nodes are either a dict or list.
    :path: Collection of keys and/or indices leading to a leaf.
    """
    return reduce(op.getitem, path, data)
In [76]:
paths = [
    (CRS, TYPE),
]

for path in paths:
    _print(get_from(data, path))
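
To make the `reduce(op.getitem, ...)` step concrete, here is a self-contained sketch on a made-up structure shaped like the GeoJSON above. `get_or_default` is a hypothetical addition, not part of this notebook, for paths that may not exist.

```python
import operator as op
from functools import reduce


def get_from(data, path):
    """Follow the keys and/or indices in path down to a value."""
    return reduce(op.getitem, path, data)


def get_or_default(data, path, default=None):
    """Like get_from, but return default when any step is missing."""
    try:
        return get_from(data, path)
    except (KeyError, IndexError, TypeError):
        return default


# Made-up values, not taken from the recorded commute.
sample = {"features": [{"geometry": {"coordinates": [[-80.2, 25.8, 9.7]]}}]}

# Equivalent to sample["features"][0]["geometry"]["coordinates"][0][1]
lat = get_from(sample, ("features", 0, "geometry", "coordinates", 0, 1))

# There is no second feature, so the default is returned instead.
missing = get_or_default(sample, ("features", 1, "geometry"), default="n/a")
```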

View in Pandas DataFrame

In [93]:
names = "lon lat ele".split()

class PandasColumn(Enum):
    """Extend Enum so that when a member is used as a Pandas data frame column its value is displayed."""

    def __str__(self):
        return self.value

CoordinateColumns = PandasColumn(
    "CoordinateColumn", type=str, names=zip((name.upper() for name in names), names)
)
In [94]:
import pandas as pd

df = pd.DataFrame(
    get_from(data, (FEATURES, 0, GEOMETRY, COORDINATES)),
    columns=list(CoordinateColumns),
)
df.head()
lon lat ele
0 -80.203793 25.801538 -0.058535
1 -80.203824 25.801507 10.088560
2 -80.203784 25.801589 11.503721
3 -80.203711 25.801508 9.746153
4 -80.203605 25.801513 9.274504
In [90]:
df[CoordinateColumns.LAT]
0       25.801538
1       25.801507
2       25.801589
3       25.801508
4       25.801513
          ...
1102    26.119918
1103    26.119874
1104    26.119792
1105    26.119739
1106    26.119739
Name: CoordinateColumn.LAT, Length: 1107, dtype: float64


I was hoping that the GeoJSON data would include some time information, but it does not.

After exploring the export options in Map Plus, I discovered an XML format that includes times. This will be more interesting.
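
As a rough sketch of how timestamps could be read from such an export, the block below parses a tiny made-up track with the standard library. It assumes the file follows the usual GPX 1.1 layout, with a `<time>` child on each `<trkpt>` and the namespace `http://www.topografix.com/GPX/1/1`; the actual Map Plus export may differ. `GPX_SAMPLE` and `read_track_points` are names invented for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up two-point track in GPX 1.1 layout; not the recorded commute.
GPX_SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="25.8015" lon="-80.2038"><ele>9.7</ele><time>2020-01-01T22:00:00Z</time></trkpt>
    <trkpt lat="25.8016" lon="-80.2037"><ele>9.3</ele><time>2020-01-01T22:00:05Z</time></trkpt>
  </trkseg></trk>
</gpx>"""

# Assumed namespace; elements must be looked up with this prefix.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_track_points(gpx_text):
    """Return a (lat, lon, time) tuple for every trkpt element."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.iterfind(".//gpx:trkpt", NS):
        stamp = trkpt.findtext("gpx:time", namespaces=NS)
        # Swap the trailing Z for an explicit UTC offset so that
        # fromisoformat also accepts it on older Python versions.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00")) if stamp else None
        points.append((float(trkpt.get("lat")), float(trkpt.get("lon")), when))
    return points


points = read_track_points(GPX_SAMPLE)
elapsed = points[-1][2] - points[0][2]  # timedelta between first and last point
```

With times attached to each point, quantities such as trip duration or speed between points become straightforward to compute.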

The GeoJSON is adequate for longitude, latitude and elevation data. TODO: Display GeoJSON data in a Jupyter notebook.

GPX-formatted data for the same trip.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="" xmlns:xsi="" xmlns:gpx_style="" xsi:schemaLocation="" version="1.1" creator="Map Plus">
    <link href="">
      <text>Map Plus</text>

    <cmt>50 km, 1 h 29 min</cmt>
      <trkpt lat="25.80153849443961" lon="-80.20379332833011">
      <trkpt lat="25.80150727185029" lon="-80.20382425755281">