Analyze GeoJSON Recorded During a Commute from Miami to Fort Lauderdale

GeoJSON recorded during an evening commute from Miami to Fort Lauderdale

Interact with this notebook on Binder.

Resources

Load the data from a Minio instance I have deployed.

In [58]:
import urllib.request
import json
import itertools as it
from pprint import pprint
from functools import partial, reduce
import operator as op

# Define configured pprint suitable for notebooks
_print = partial(pprint, indent=4)


def dhead(d: dict, n=5):
    """Return the first n items from a dictionary."""
    return {k: v for k, v in it.islice(d.items(), 0, n)}


with urllib.request.urlopen(
    "https://minio.apps.selfip.com/mymedia/geojson/MIA-to-FLL-TriRail-2019-10-08.geojson"
) as res:
    data = json.load(res)

for n in range(1, len(data)):
    _print(dhead(data, n))
{'type': 'FeatureCollection'}
{   'crs': {   'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'},
               'type': 'name'},
    'type': 'FeatureCollection'}

Practice laziness in the sense of one of the Three Virtues

Issues

  1. I don't like retyping strings that are dict keys. It's error-prone and taxes my memory. I would prefer variables that I don't have to define manually.
    Use an Enum to create the variables programmatically. A plain dict would probably work, too, but I like the way an Enum is represented in output and its type feature. I am also looking for use cases for Enum.
In [99]:
example = dict(FOO="foo")
globals().update(example)
FOO, example
Out[99]:
('foo', {'FOO': 'foo'})

Walk the data structure to get all the keys.

This function was inspired by the Stack Overflow question "Access nested dictionary items via a list of keys?"

In [30]:
def paths_in_data(data, parent=()):
    """Calculate the path (tuple of keys and/or indices) to every leaf in nested data."""

    if not isinstance(data, (dict, list, tuple)):
        return (parent,)  # Leaf: the accumulated path is complete.
    try:  # Handle dict
        children = data.items()
    except AttributeError:  # Handle indexable sequences.
        # enumerate yields each element's true position; data.index(v) would
        # return the first match and mislabel repeated values.
        children = enumerate(data)
    return reduce(
        op.add,
        (paths_in_data(v, op.add(parent, (k,))) for k, v in children),
        (),
    )

Truncated example of the paths generated from paths_in_data.

In [32]:
[path for path in it.takewhile(lambda x: x[-1] != 2, paths_in_data(data))]
Out[32]:
[('type',),
 ('crs', 'type'),
 ('crs', 'properties', 'name'),
 ('features', 0, 'type'),
 ('features', 0, 'properties', 'GUID'),
 ('features', 0, 'properties', 'LABEL_EXPR'),
 ('features', 0, 'properties', 'TITLE'),
 ('features', 0, 'properties', 'LABEL_TEXT'),
 ('features', 0, 'properties', 'NOTES'),
 ('features', 0, 'geometry', 'type'),
 ('features', 0, 'geometry', 'coordinates', 0, 0),
 ('features', 0, 'geometry', 'coordinates', 0, 1)]
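
To make the walk easier to follow, here is a self-contained sketch of the same idea on a tiny document. The toy dict is hypothetical (it is not the commute data), and this version uses isinstance checks and enumerate so that repeated values in a list still get distinct indices.

```python
from functools import reduce
import operator as op


def paths_in_data(data, parent=()):
    """Return a tuple of paths (tuples of keys/indices), one per leaf."""
    if isinstance(data, dict):
        children = data.items()
    elif isinstance(data, (list, tuple)):
        # Use the position of each element, so equal values stay distinct.
        children = enumerate(data)
    else:
        return (parent,)  # Leaf reached: the path is complete.
    return reduce(op.add, (paths_in_data(v, parent + (k,)) for k, v in children), ())


# Hypothetical miniature of a GeoJSON document with a repeated coordinate value.
toy = {
    "type": "FeatureCollection",
    "features": [{"geometry": {"coordinates": [[1.0, 1.0]]}}],
}
toy_paths = paths_in_data(toy)
for path in toy_paths:
    print(path)
```

Each path can be fed back into a key-by-key lookup to reach the corresponding leaf.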

Get a set of all the keys.

In [44]:
data_key_set = sorted(
    {key for key in it.chain.from_iterable(paths_in_data(data)) if isinstance(key, str)}
)
_print(data_key_set)
[   'GUID',
    'LABEL_EXPR',
    'LABEL_TEXT',
    'NOTES',
    'TITLE',
    'coordinates',
    'crs',
    'features',
    'geometry',
    'name',
    'properties',
    'type']

Cast data_key_set into valid variable names

In [47]:
from string import digits, whitespace, punctuation

# Transform all whitespace and punctuation into underscores.
# Not needed for these keys, but left here as an example.
translation = str.maketrans(dict(zip((*whitespace, *punctuation), it.cycle("_"))))

# Identifiers cannot start with a digit, so only leading digits are stripped.
data_key_set_names = [
    key.translate(translation).lstrip(digits).upper() for key in data_key_set
]
_print(data_key_set_names)
[   'GUID',
    'LABEL_EXPR',
    'LABEL_TEXT',
    'NOTES',
    'TITLE',
    'COORDINATES',
    'CRS',
    'FEATURES',
    'GEOMETRY',
    'NAME',
    'PROPERTIES',
    'TYPE']
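
One way to double-check that generated names really are usable as variables is to test them with str.isidentifier and keyword.iskeyword from the standard library. The following is a minimal sketch; the helper name is_safe_name and the sample names are made up for illustration.

```python
import keyword


def is_safe_name(name: str) -> bool:
    """True if name is a syntactically valid identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical candidates: three from the key list, three deliberately awkward.
candidate_names = ["GUID", "LABEL_EXPR", "TYPE", "2ND_POINT", "CLASS", "class"]
checks = {name: is_safe_name(name) for name in candidate_names}
print(checks)
```

Upper-casing the keys has the side benefit that a key such as "class" stops colliding with a Python keyword.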

Define an Enum using the functional API.

In [50]:
from enum import Enum

DataKeys = Enum("DataKeys", type=str, names=zip(data_key_set_names, data_key_set))
_print(DataKeys.__members__)
mappingproxy({   'COORDINATES': <DataKeys.COORDINATES: 'coordinates'>,
                 'CRS': <DataKeys.CRS: 'crs'>,
                 'FEATURES': <DataKeys.FEATURES: 'features'>,
                 'GEOMETRY': <DataKeys.GEOMETRY: 'geometry'>,
                 'GUID': <DataKeys.GUID: 'GUID'>,
                 'LABEL_EXPR': <DataKeys.LABEL_EXPR: 'LABEL_EXPR'>,
                 'LABEL_TEXT': <DataKeys.LABEL_TEXT: 'LABEL_TEXT'>,
                 'NAME': <DataKeys.NAME: 'name'>,
                 'NOTES': <DataKeys.NOTES: 'NOTES'>,
                 'PROPERTIES': <DataKeys.PROPERTIES: 'properties'>,
                 'TITLE': <DataKeys.TITLE: 'TITLE'>,
                 'TYPE': <DataKeys.TYPE: 'type'>})

Add the names from DataKeys to the global namespace.

In [51]:
globals().update(DataKeys.__members__)

Inspect a variable

In [70]:
_print((FEATURES, type(FEATURES), isinstance(FEATURES, str)))
(<DataKeys.FEATURES: 'features'>, <enum 'DataKeys'>, True)
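
Because the members are also str instances, they hash and compare like their values, so they can stand in for the raw strings as dict keys. A small sketch with a hypothetical two-member enum and a toy document:

```python
from enum import Enum

# Hypothetical enum built the same way as DataKeys, with str mixed in.
Keys = Enum("Keys", type=str, names=[("TYPE", "type"), ("CRS", "crs")])

# Toy document, not the commute data.
doc = {"type": "FeatureCollection", "crs": {"type": "name"}}

top = doc[Keys.TYPE]               # same lookup as doc["type"]
nested = doc[Keys.CRS][Keys.TYPE]  # same lookup as doc["crs"]["type"]
print(top, nested, Keys.TYPE == "type")
```

Without the str mixin, a plain Enum member would not equal its value and these lookups would raise KeyError.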

Get some specific data

In [71]:
def get_from(data, path):
    """Get a leaf from iterable of keys and/or indices.
    
    :data: Collection where nodes are either a dict or list.
    :path: Collection of keys and/or indices leading to a leaf.
    """
    return reduce(op.getitem, path, data)
In [76]:
paths = [
    (TYPE,),
    (CRS, TYPE),
    (CRS, PROPERTIES, NAME),
    (FEATURES, 0, GEOMETRY, COORDINATES, 0, 1),
]

for path in paths:
    _print(get_from(data, path))
'FeatureCollection'
'name'
'urn:ogc:def:crs:OGC:1.3:CRS84'
25.80153849443961
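
get_from raises as soon as one step of the path is missing. If a fallback value is preferable, a thin wrapper can catch the lookup errors. This is only a sketch: get_from_or is a hypothetical helper, and the toy dict below is a cut-down stand-in for the real data.

```python
from functools import reduce
import operator as op


def get_from(data, path):
    """Follow path (keys and/or indices) through nested dicts and lists."""
    return reduce(op.getitem, path, data)


def get_from_or(data, path, default=None):
    """Like get_from, but return default when any step of the path is missing."""
    try:
        return get_from(data, path)
    except (KeyError, IndexError, TypeError):
        return default


# Toy stand-in for the GeoJSON document: one feature with one coordinate.
toy = {
    "type": "FeatureCollection",
    "features": [
        {"geometry": {"coordinates": [[-80.20379332833011, 25.80153849443961]]}}
    ],
}

lat = get_from(toy, ("features", 0, "geometry", "coordinates", 0, 1))
missing = get_from_or(toy, ("features", 5, "geometry"), default="n/a")
print(lat, missing)
```

TypeError is included because indexing into a leaf (for example a string or a float) with the next key fails that way rather than with KeyError.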

View in Pandas DataFrame

In [93]:
names = "lon lat ele".split()


class PandasColumn(Enum):
    """Extend Enum so that when a member is used as a Pandas data frame column its value is displayed."""

    def __str__(self):
        return self.value


CoordinateColumns = PandasColumn(
    "CoordinateColumn", type=str, names=zip((name.upper() for name in names), names)
)
globals().update(CoordinateColumns.__members__)
In [94]:
import pandas as pd

df = pd.DataFrame(
    get_from(data, (FEATURES, 0, GEOMETRY, COORDINATES)),
    columns=CoordinateColumns.__members__.values(),
)
df.head()
Out[94]:
         lon        lat        ele
0 -80.203793  25.801538  -0.058535
1 -80.203824  25.801507  10.088560
2 -80.203784  25.801589  11.503721
3 -80.203711  25.801508   9.746153
4 -80.203605  25.801513   9.274504
In [90]:
df[LAT]
Out[90]:
0       25.801538
1       25.801507
2       25.801589
3       25.801508
4       25.801513
          ...    
1102    26.119918
1103    26.119874
1104    26.119792
1105    26.119739
1106    26.119739
Name: CoordinateColumn.LAT, Length: 1107, dtype: float64

Conclusions

I was hoping that the GeoJSON data would include time information, but it does not.

After exploring the export options in Map Plus, I discovered an XML format (GPX) that includes a timestamp for each point. This will be more interesting.
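
As a rough idea of how those timestamps might be read, here is a sketch using only the standard library. The two track points are copied from the GPX excerpt at the end of this notebook; the trimmed-down wrapper document and the track_points helper are assumptions made for illustration, not part of the Map Plus export.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# GPX 1.1 namespace, as declared on the <gpx> root element.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Trimmed sample: metadata and style extensions are left out.
SAMPLE = """\
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1">
  <trk><trkseg>
    <trkpt lat="25.80153849443961" lon="-80.20379332833011">
      <ele>-0.05853462</ele>
      <time>2019-10-09T00:55:50Z</time>
    </trkpt>
    <trkpt lat="25.80150727185029" lon="-80.20382425755281">
      <ele>10.08856</ele>
      <time>2019-10-09T00:55:54Z</time>
    </trkpt>
  </trkseg></trk>
</gpx>
"""


def track_points(gpx_text):
    """Yield one dict per <trkpt> with lon, lat, ele and a parsed UTC time."""
    root = ET.fromstring(gpx_text)
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        yield {
            "lon": float(pt.get("lon")),
            "lat": float(pt.get("lat")),
            "ele": float(pt.findtext("gpx:ele", namespaces=GPX_NS)),
            "time": datetime.strptime(
                pt.findtext("gpx:time", namespaces=GPX_NS), "%Y-%m-%dT%H:%M:%SZ"
            ),
        }


points = list(track_points(SAMPLE))
elapsed = points[-1]["time"] - points[0]["time"]
print(len(points), elapsed.total_seconds())
```

With times attached to each point, quantities such as elapsed time between fixes become available, which the GeoJSON export cannot provide.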

The GeoJSON is adequate for longitude, latitude and elevation data. TODO: Display GeoJSON data in a Jupyter notebook.

GPX-formatted data for the same trip.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gpx_style="http://www.topografix.com/GPX/gpx_style/0/2" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.topografix.com/GPX/gpx_style/0/2 http://www.topografix.com/GPX/gpx_style/0/2/gpx_style.xsd" version="1.1" creator="Map Plus 2.8.6.2">
  <metadata>
    <link href="http://www.duweis.com">
      <text>Map Plus</text>
    </link>
    <time>2019-10-09T15:18:41Z</time>
  </metadata>

  <trk>
    <name>10/8/19</name>
    <cmt>50 km, 1 h 29 min</cmt>
    <extensions>
      <gpx_style:line>
        <gpx_style:color>ff7a00</gpx_style:color>
        <gpx_style:width>4000</gpx_style:width>
      </gpx_style:line>
    </extensions>
    <trkseg>
      <trkpt lat="25.80153849443961" lon="-80.20379332833011">
        <ele>-0.05853462</ele>
        <time>2019-10-09T00:55:50Z</time>
      </trkpt>
      <trkpt lat="25.80150727185029" lon="-80.20382425755281">
        <ele>10.08856</ele>
        <time>2019-10-09T00:55:54Z</time>
      </trkpt>