Analyze GPX Data Recorded During A Flight from Fort Lauderdale to New Orleans

Currently editing on the plane.

GPX data recorded during an early-morning Southwest flight from Fort Lauderdale to New Orleans.

Interact with this notebook on Binder.

Resources

Load the data from a Minio instance I have deployed.

In [144]:
import urllib.request
import itertools as it
from pprint import pprint
from functools import partial, reduce
import operator as op

# Define configured pprint suitable for notebooks
pprint_ = partial(pprint, indent=4)


def dhead(d: dict, n=5):
    """Return the first n items from a dictionary."""
    return {k: v for k, v in it.islice(d.items(), 0, n)}


with urllib.request.urlopen(
    "https://minio.apps.selfip.com/mymedia/gpx/fort_lauderdale__to__new_orleans.gpx"
) as res:
    data = res.read()

print(data.splitlines()[:10])
[b'<?xml version="1.0" encoding="UTF-8" standalone="no" ?>', b'<gpx xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gpx_style="http://www.topografix.com/GPX/gpx_style/0/2" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.topografix.com/GPX/gpx_style/0/2 http://www.topografix.com/GPX/gpx_style/0/2/gpx_style.xsd" version="1.1" creator="Map Plus 2.8.7.1">', b'  <metadata>', b'    <link href="http://www.duweis.com">', b'      <text>Map Plus</text>', b'    </link>', b'    <time>2019-10-17T12:33:03Z</time>', b'  </metadata>', b'', b'  <trk>']

Parse the GPX file

In [145]:
from lxml import etree
In [146]:
tree = etree.fromstring(data, etree.XMLParser())

Display set of tags

In [147]:
{element.tag for element in tree.iter()}
Out[147]:
{'{http://www.topografix.com/GPX/1/1}cmt',
 '{http://www.topografix.com/GPX/1/1}ele',
 '{http://www.topografix.com/GPX/1/1}extensions',
 '{http://www.topografix.com/GPX/1/1}gpx',
 '{http://www.topografix.com/GPX/1/1}link',
 '{http://www.topografix.com/GPX/1/1}metadata',
 '{http://www.topografix.com/GPX/1/1}name',
 '{http://www.topografix.com/GPX/1/1}text',
 '{http://www.topografix.com/GPX/1/1}time',
 '{http://www.topografix.com/GPX/1/1}trk',
 '{http://www.topografix.com/GPX/1/1}trkpt',
 '{http://www.topografix.com/GPX/1/1}trkseg',
 '{http://www.topografix.com/GPX/gpx_style/0/2}color',
 '{http://www.topografix.com/GPX/gpx_style/0/2}line',
 '{http://www.topografix.com/GPX/gpx_style/0/2}width'}
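
Every tag above carries its namespace in Clark notation (`{namespace}name`). As a minimal sketch, assuming the standard library's xml.etree.ElementTree (swapped in for lxml so the example is self-contained; the tag format is the same) and a made-up two-point GPX snippet, the local names can be recovered by splitting on the closing brace, and a prefix map keeps searches readable. The helper name local_name is hypothetical. With lxml, etree.QName(element).localname should give the same local name.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature GPX document, not the flight recording.
SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk>
    <trkseg>
      <trkpt lat="26.0" lon="-80.1"><ele>5.8</ele><time>2019-10-17T10:16:52Z</time></trkpt>
      <trkpt lat="26.1" lon="-80.2"><time>2019-10-17T10:32:23Z</time></trkpt>
    </trkseg>
  </trk>
</gpx>"""

# Prefix chosen for this example; any prefix works as long as the URI matches.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix from a Clark-notation tag."""
    return tag.rpartition("}")[2]


root = ET.fromstring(SAMPLE)
local_tags = sorted({local_name(element.tag) for element in root.iter()})
print(local_tags)

# A prefix map avoids spelling out the full namespace in each search.
track_points = root.findall("gpx:trk/gpx:trkseg/gpx:trkpt", NS)
print(len(track_points))
```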
In [149]:
set(tree.iterchildren()) == set(tree.iter())
Out[149]:
False
In [153]:
meta, trk = tree.iterchildren()
In [160]:
*_, trkseg = trk.iterchildren()
In [162]:
data_points = list(trkseg.iter())
In [166]:
tags = {item.tag for item in data_points}

There is one trkseg element; it is likely the parent of all the location points.

In [171]:
[(tag, len([element for element in data_points if element.tag == tag])) for tag in tags]
Out[171]:
[('{http://www.topografix.com/GPX/1/1}ele', 1824),
 ('{http://www.topografix.com/GPX/1/1}time', 1829),
 ('{http://www.topografix.com/GPX/1/1}trkpt', 1829),
 ('{http://www.topografix.com/GPX/1/1}trkseg', 1)]
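
The same tally can be produced in a single pass with collections.Counter. A minimal sketch, assuming a hypothetical list of short tag strings standing in for `item.tag for item in data_points`:

```python
from collections import Counter

# Hypothetical stand-in for the tags of the elements under trkseg.
sample_tags = ["trkseg", "trkpt", "ele", "time", "trkpt", "time"]

tag_counts = Counter(sample_tags)
print(tag_counts.most_common())

# Fewer ele tags than trkpt tags means some points lack an elevation.
missing_ele = tag_counts["trkpt"] - tag_counts["ele"]
print(missing_ele)
```

Applied to the counts above, 1829 trkpt elements against 1824 ele elements leaves five points without an elevation.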
In [211]:
trkpnt_children = list(trkseg.iterchildren())
In [226]:
from collections import namedtuple
In [244]:
TrackPoint = namedtuple('TrackPoint', ('coordinate', 'ele', 'time'))
In [247]:
trkpnts_ = (
    ((element.attrib,), tuple(e.text for e in element.iterdescendants()))
    for element in trkpnt_children
)

trkpnts = [tuple(it.chain(*item)) for item in trkpnts_]

Not all items have all three of ('coordinate', 'ele', 'time').

In [248]:
{len(items) for items in trkpnts}
Out[248]:
{2, 3}

See what is missing in those with only 2 parts.

In [251]:
[items for items in trkpnts if len(items) == 2]
Out[251]:
[({'lat': '26.07364077867658', 'lon': '-80.13974719286466'},
  '2019-10-17T10:32:23Z'),
 ({'lat': '26.07330806773481', 'lon': '-80.13861468216476'},
  '2019-10-17T10:32:44Z'),
 ({'lat': '26.07329140328246', 'lon': '-80.13861683228613'},
  '2019-10-17T10:33:09Z'),
 ({'lat': '26.07358034244973', 'lon': '-80.13865072072218'},
  '2019-10-17T10:34:37Z'),
 ({'lat': '26.07370411580322', 'lon': '-80.14069088505309'},
  '2019-10-17T10:34:46Z')]
In [252]:
[items for items in trkpnts if len(items) == 3][:5]
Out[252]:
[({'lat': '26.07408333333334', 'lon': '-80.136275'},
  '14',
  '2019-10-17T10:16:44Z'),
 ({'lat': '26.07371', 'lon': '-80.13643666666667'},
  '5.8',
  '2019-10-17T10:16:52Z'),
 ({'lat': '26.07379666666666', 'lon': '-80.13646999999999'},
  '1.1',
  '2019-10-17T10:17:24Z'),
 ({'lat': '26.07390333333334', 'lon': '-80.13640000000001'},
  '5.3',
  '2019-10-17T10:17:47Z'),
 ({'lat': '26.07400833333334', 'lon': '-80.13633'},
  '5.9',
  '2019-10-17T10:18:32Z')]

Rewrite the comprehensions to account for a lack of ele in a trkpnt.

In [ ]:
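def trkpnt_handler(items):
    """Insert a None if there is no ele data point."""
    # Item at index 1 should be a number; a timestamp there means ele is missing.
    try:
        float(items[1])
    except ValueError:
        coordinate, time = items
        return (coordinate, None, time)
    return items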
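
Another way to avoid guessing by position is to look up each child by tag, so a missing ele becomes None directly. This is a sketch rather than the notebook's own solution: it assumes the standard library's xml.etree.ElementTree in place of lxml, a two-point sample built from values in the output above, and a hypothetical helper named to_track_point.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple
from datetime import datetime, timezone

TrackPoint = namedtuple("TrackPoint", ("coordinate", "ele", "time"))
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Small sample for illustration: the second point has no ele child.
SAMPLE = """<trkseg xmlns="http://www.topografix.com/GPX/1/1">
  <trkpt lat="26.07371" lon="-80.13643666666667">
    <ele>5.8</ele><time>2019-10-17T10:16:52Z</time>
  </trkpt>
  <trkpt lat="26.07364077867658" lon="-80.13974719286466">
    <time>2019-10-17T10:32:23Z</time>
  </trkpt>
</trkseg>"""


def to_track_point(element):
    """Build a TrackPoint from a trkpt element; ele is None when absent."""
    ele = element.findtext("gpx:ele", default=None, namespaces=NS)
    time_text = element.findtext("gpx:time", namespaces=NS)
    return TrackPoint(
        coordinate=(float(element.get("lat")), float(element.get("lon"))),
        ele=None if ele is None else float(ele),
        # The trailing Z marks UTC, so attach the UTC timezone explicitly.
        time=datetime.strptime(time_text, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        ),
    )


points = [to_track_point(e) for e in ET.fromstring(SAMPLE).findall("gpx:trkpt", NS)]
for point in points:
    print(point)
```

Looking up children by name also keeps working if a trkpt gains extra children later, which a purely positional tuple would not.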

Begin copied cells below from another post.

Practice laziness in the sense of one of the Three Virtues: laziness, impatience, and hubris.

Issues

  1. I don't like having to retype strings that are dict keys. It's error-prone and taxes my memory. I would prefer a variable that I didn't have to define manually.
    Use Enum to create variables programmatically. A plain dict would probably work, too. I like the way an Enum is represented in output and its type feature. And I am trying to find use cases for an Enum.
In [99]:
example = dict(FOO="foo")
globals().update(example)
FOO, example
Out[99]:
('foo', {'FOO': 'foo'})

Walk the data structure to get all the keys.

I wrote this function, inspired by the Stack Overflow question "Access nested dictionary items via a list of keys?"

In [30]:
def paths_in_data(data: dict, parent=()):
    """Calculate keys and/or indices in a nested dict."""

    if not any(isinstance(data, type_) for type_ in (dict, list, tuple)):
        return (parent,)
    else:
        try:  # Handle dict
            return reduce(
                op.add,
                (paths_in_data(v, op.add(parent, (k,))) for k, v in data.items()),
                (),
            )
        except AttributeError:  # Handle indexable sequences.
            return reduce(
                op.add,
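                # enumerate gives each position once; list.index would repeat the first match for duplicate values.
                (paths_in_data(v, op.add(parent, (i,))) for i, v in enumerate(data)),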
                (),
            )

Truncated example of the paths generated from paths_in_data.

In [32]:
[path for path in it.takewhile(lambda x: x[-1] != 2, paths_in_data(data))]
Out[32]:
[('type',),
 ('crs', 'type'),
 ('crs', 'properties', 'name'),
 ('features', 0, 'type'),
 ('features', 0, 'properties', 'GUID'),
 ('features', 0, 'properties', 'LABEL_EXPR'),
 ('features', 0, 'properties', 'TITLE'),
 ('features', 0, 'properties', 'LABEL_TEXT'),
 ('features', 0, 'properties', 'NOTES'),
 ('features', 0, 'geometry', 'type'),
 ('features', 0, 'geometry', 'coordinates', 0, 0),
 ('features', 0, 'geometry', 'coordinates', 0, 1)]

Get a set of all the keys.

In [44]:
data_key_set = sorted(
    {key for key in it.chain.from_iterable(paths_in_data(data)) if isinstance(key, str)}
)
pprint_(data_key_set)
[   'GUID',
    'LABEL_EXPR',
    'LABEL_TEXT',
    'NOTES',
    'TITLE',
    'coordinates',
    'crs',
    'features',
    'geometry',
    'name',
    'properties',
    'type']

Cast data_key_set into valid variable names

In [47]:
from string import digits, whitespace, punctuation

# Transform all whitespace and punctuation into underscores
# Not needed but left here as an example
translation = str.maketrans(dict(zip((*whitespace, *punctuation), it.cycle("_"))))

data_key_set_names = [
    key.translate(translation).strip(digits).upper() for key in data_key_set
]
pprint_(data_key_set_names)
[   'GUID',
    'LABEL_EXPR',
    'LABEL_TEXT',
    'NOTES',
    'TITLE',
    'COORDINATES',
    'CRS',
    'FEATURES',
    'GEOMETRY',
    'NAME',
    'PROPERTIES',
    'TYPE']
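
Before generated names are pushed into globals(), it can be worth checking that each one really is usable as a variable name. A small sketch of such a guard, using a hypothetical helper is_valid_name and made-up candidate names:

```python
import keyword


def is_valid_name(name: str) -> bool:
    """True when name can be used as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical candidate names, not taken from the data set.
candidates = ["GUID", "LABEL_EXPR", "", "class", "HAS SPACE"]
checked = {name: is_valid_name(name) for name in candidates}
print(checked)
```

An empty string, a keyword, or a name containing whitespace fails the check; the upper-cased keys above pass it.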

Define an Enum using the functional API.

In [50]:
from enum import Enum

DataKeys = Enum("DataKeys", type=str, names=zip(data_key_set_names, data_key_set))
pprint_(DataKeys.__members__)
mappingproxy({   'COORDINATES': <DataKeys.COORDINATES: 'coordinates'>,
                 'CRS': <DataKeys.CRS: 'crs'>,
                 'FEATURES': <DataKeys.FEATURES: 'features'>,
                 'GEOMETRY': <DataKeys.GEOMETRY: 'geometry'>,
                 'GUID': <DataKeys.GUID: 'GUID'>,
                 'LABEL_EXPR': <DataKeys.LABEL_EXPR: 'LABEL_EXPR'>,
                 'LABEL_TEXT': <DataKeys.LABEL_TEXT: 'LABEL_TEXT'>,
                 'NAME': <DataKeys.NAME: 'name'>,
                 'NOTES': <DataKeys.NOTES: 'NOTES'>,
                 'PROPERTIES': <DataKeys.PROPERTIES: 'properties'>,
                 'TITLE': <DataKeys.TITLE: 'TITLE'>,
                 'TYPE': <DataKeys.TYPE: 'type'>})

Add names from DataKeys to global namespace.

In [51]:
globals().update(DataKeys.__members__)

Inspect a variable

In [70]:
pprint_((FEATURES, type(FEATURES), isinstance(FEATURES, str)))
(<DataKeys.FEATURES: 'features'>, <enum 'DataKeys'>, True)
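
Because the Enum was created with type=str, each member is also a str that compares and hashes like its value, which is what lets a member stand in for the original dict key. A minimal sketch with made-up Enums named Keys and Plain:

```python
from enum import Enum

# Hypothetical Enum mirroring the functional-API call above.
Keys = Enum("Keys", names=[("FEATURES", "features"), ("TYPE", "type")], type=str)

record = {"type": "FeatureCollection", "features": []}

# Members compare and hash like their string values...
print(Keys.TYPE == "type")
# ...so they can index a dict keyed by plain strings.
print(record[Keys.TYPE])

# A plain Enum member (no str mixin) does not match the string key.
Plain = Enum("Plain", names=[("TYPE", "type")])
print(Plain.TYPE in record)
```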

Get some specific data

In [71]:
def get_from(data, path):
    """Get a leaf from iterable of keys and/or indices.
    
    :data: Collection where nodes are either a dict or list.
    :path: Collection of keys and/or indices leading to a leaf.
    """
    return reduce(op.getitem, path, data)
In [76]:
paths = [
    (TYPE,),
    (CRS, TYPE),
    (CRS, PROPERTIES, NAME),
    (FEATURES, 0, GEOMETRY, COORDINATES, 0, 1),
]

for path in paths:
    pprint_(get_from(data, path))
'FeatureCollection'
'name'
'urn:ogc:def:crs:OGC:1.3:CRS84'
25.80153849443961
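
As a sanity check, every path reported for a nested structure should resolve to a leaf through get_from. The following is a sketch on a made-up toy dict; iter_paths is a hypothetical generator-based variant written for this example (it uses enumerate for sequence positions), not the function defined earlier.

```python
import operator as op
from functools import reduce


def iter_paths(data, parent=()):
    """Yield the key/index path to every leaf of nested dicts, lists and tuples."""
    if isinstance(data, dict):
        children = data.items()
    elif isinstance(data, (list, tuple)):
        children = enumerate(data)
    else:
        yield parent
        return
    for key, value in children:
        yield from iter_paths(value, parent + (key,))


def get_from(data, path):
    """Follow a path of keys and/or indices down to a leaf."""
    return reduce(op.getitem, path, data)


# Hypothetical toy structure; the repeated 0.0 values still get distinct paths.
toy = {"type": "Feature", "geometry": {"coordinates": [[0.0, 0.0], [1.5, 2.5]]}}

leaves = {path: get_from(toy, path) for path in iter_paths(toy)}
for path, leaf in leaves.items():
    print(path, leaf)
```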

View in Pandas DataFrame

In [93]:
names = "lon lat ele".split()


class PandasColumn(Enum):
    """Extend Enum so that when a member is used as a Pandas data frame column its value is displayed."""

    def __str__(self):
        return self.value


CoordinateColumns = PandasColumn(
    "CoordinateColumn", type=str, names=zip((name.upper() for name in names), names)
)
globals().update(CoordinateColumns.__members__)
In [94]:
import pandas as pd

df = pd.DataFrame(
    get_from(data, (FEATURES, 0, GEOMETRY, COORDINATES)),
    columns=CoordinateColumns.__members__.values(),
)
df.head()
Out[94]:
         lon        lat        ele
0 -80.203793  25.801538  -0.058535
1 -80.203824  25.801507  10.088560
2 -80.203784  25.801589  11.503721
3 -80.203711  25.801508   9.746153
4 -80.203605  25.801513   9.274504
In [90]:
df[LAT]
Out[90]:
0       25.801538
1       25.801507
2       25.801589
3       25.801508
4       25.801513
          ...    
1102    26.119918
1103    26.119874
1104    26.119792
1105    26.119739
1106    26.119739
Name: CoordinateColumn.LAT, Length: 1107, dtype: float64