Generators in JavaScript: A Lazy Zip

In [60]:
function *zip(...[first, ...iterables]){
    const length = iterables.length;
    let item;    
    while(true){
        item = first.next();
        if (item.done) break;
        const acc = [];
        for(let i = 0; i < length; i++){
            const item_ = iterables[i].next();
            acc.push(item_.value);
        }
        yield [item.value, ...acc];
    }
}
In [61]:
function *iter(item){
    yield* Array.from(item);
}
In [108]:
function *range(start, stop, step=1){
    if(step < 1) throw new Error('step must be 1 or greater.');
    if(stop <= start) throw new Error('stop must be larger than start');
    yield* Array.from({length: ((stop - start) / step)}, (_, i) => start + (i * step));
}

A convenient way to remember where the letters 'A' and 'a' start in the ASCII table.

ASCII uses the last 7 bits of a byte (8 bits): 00000000

Set the first (most significant) bit to 0 and the second bit to 1.

Leave the remaining bits at 0, except the last one.

Set the last bit to 1.

01000001

That makes 65.

Set the third bit to 1 and that makes 97.

01100001

I learned this tidbit here:

Characters, Symbols, and the Unicode Miracle
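
A quick Python sketch of the trick: the third bit (0b00100000, or 32) is the only difference between an uppercase ASCII letter and its lowercase counterpart, so toggling that bit flips the case.

A, a = 0b01000001, 0b01100001
print(A, a)                        # 65 97
print(a - A, bin(a ^ A))           # 32 0b100000
print(chr(ord('G') ^ 0b00100000))  # g
print(chr(ord('g') ^ 0b00100000))  # G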

In [126]:
var [UPPERCASE, LOWERCASE] = [
    0b01000001, // 65
    0b01100001, // 97
].map(n => String.fromCharCode(...range(n, n+26)))
In [127]:
for (let item of zip(range(0, 26, 1), iter(UPPERCASE), iter(LOWERCASE))){
    console.log(item);
}
[ 0, 'A', 'a' ]
[ 1, 'B', 'b' ]
[ 2, 'C', 'c' ]
[ 3, 'D', 'd' ]
[ 4, 'E', 'e' ]
[ 5, 'F', 'f' ]
[ 6, 'G', 'g' ]
[ 7, 'H', 'h' ]
[ 8, 'I', 'i' ]
[ 9, 'J', 'j' ]
[ 10, 'K', 'k' ]
[ 11, 'L', 'l' ]
[ 12, 'M', 'm' ]
[ 13, 'N', 'n' ]
[ 14, 'O', 'o' ]
[ 15, 'P', 'p' ]
[ 16, 'Q', 'q' ]
[ 17, 'R', 'r' ]
[ 18, 'S', 's' ]
[ 19, 'T', 't' ]
[ 20, 'U', 'u' ]
[ 21, 'V', 'v' ]
[ 22, 'W', 'w' ]
[ 23, 'X', 'x' ]
[ 24, 'Y', 'y' ]
[ 25, 'Z', 'z' ]
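
For comparison, the same table falls out of Python's built-in zip, which the generators above mimic; a quick sketch:

from string import ascii_lowercase, ascii_uppercase

for item in zip(range(26), ascii_uppercase, ascii_lowercase):
    print(list(item))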

Explore Python Function Decorators

I finished reading this book on Python function decorators.

Time to explore. I have used a lot of function decorators.

I have never written my own.

In [1]:
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG)
In [2]:
def verbose(func):
    
    def wrapper():
        logger.debug(f'{verbose.__name__} before: "{func.__name__}"')
        result = func()
        logger.debug(f'{verbose.__name__} after: "{func.__name__}"')
        return result
    
    return wrapper
In [3]:
def hello_world():
    print('Hello, world.')
In [4]:
hello_world()
Hello, world.
In [5]:
hello_world = verbose(hello_world)
type(hello_world)
Out[5]:
function
In [6]:
hello_world()
DEBUG:__main__:verbose before: "hello_world"
DEBUG:__main__:verbose after: "hello_world"
Hello, world.
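
Rebinding hello_world by hand, as above, is exactly what the @ syntax does. Here is a small sketch of the equivalent decorator form, with functools.wraps added (not used above) so the wrapped function keeps its original name:

import functools
import logging

logger = logging.getLogger(__name__)  # same logger configured with basicConfig above

def verbose_wraps(func):
    @functools.wraps(func)
    def wrapper():
        logger.debug(f'before: "{func.__name__}"')
        result = func()
        logger.debug(f'after: "{func.__name__}"')
        return result
    return wrapper

@verbose_wraps
def hello_again():
    print('Hello, again.')

hello_again()
print(hello_again.__name__)  # 'hello_again', thanks to functools.wraps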

Add an accumulator to a Fibonacci generator.

Module: itertools: The module itertools is a collection of very powerful—and carefully designed—functions for performing iterator algebra. That is, these allow you to combine iterators in sophisticated ways without having to concretely instantiate anything more than is currently required. …we might simply create a single lazy iterator to generate both the current number and this sum

Excerpt From: David Mertz. “Functional Programming in Python.” iBooks.

In [7]:
from itertools import accumulate, tee
from collections import namedtuple

def include_accumulator(func):
    
    def wraps():
        t, s = tee(func())
        return zip(t, accumulate(s))
    return wraps
In [8]:
@include_accumulator
def fibonacci():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b
In [9]:
Result = namedtuple('Result', 'fib total')
for _, (fib, total) in zip(range(7), fibonacci()):
    print(Result(fib, total))
Result(fib=1, total=1)
Result(fib=1, total=2)
Result(fib=2, total=4)
Result(fib=3, total=7)
Result(fib=5, total=12)
Result(fib=8, total=20)
Result(fib=13, total=33)
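
A small sketch of what include_accumulator does under the hood: tee splits one iterator into two independent copies, and accumulate yields the running sum of one of them.

from itertools import accumulate, tee

fib_values = iter([1, 1, 2, 3, 5, 8, 13])
t, s = tee(fib_values)
print(list(zip(t, accumulate(s))))
# [(1, 1), (1, 2), (2, 4), (3, 7), (5, 12), (8, 20), (13, 33)]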

Use a class to define the decorator.

Almost any callable will work as expected.

In [10]:
class CallCounter:
    def __init__(self, func):
        for message in (
            f'"{self}" initialization.',
            repr(func),
        ):
            logger.debug(message)
        self.count = 0
        self.func = func
    
    def __call__(self):
        self.count += 1
        return self.func()
In [11]:
@CallCounter
def hello_world():
    return 'Hello, world.'
DEBUG:__main__:"<__main__.CallCounter object at 0x7f4fcffe1870>" initialization.
DEBUG:__main__:<function hello_world at 0x7f4fe48562d0>
In [12]:
hello_world
Out[12]:
<__main__.CallCounter at 0x7f4fcffe1870>
In [13]:
Result = namedtuple('Result', 'output count')

for _ in range(10):
    print(Result(hello_world(), hello_world.count))
Result(output='Hello, world.', count=1)
Result(output='Hello, world.', count=2)
Result(output='Hello, world.', count=3)
Result(output='Hello, world.', count=4)
Result(output='Hello, world.', count=5)
Result(output='Hello, world.', count=6)
Result(output='Hello, world.', count=7)
Result(output='Hello, world.', count=8)
Result(output='Hello, world.', count=9)
Result(output='Hello, world.', count=10)

Accept arguments in a decorator.

This is a contrived example just to explore how decorators work.

In [14]:
class apply:
    
    def __init__(self, f=None, *args, **kwargs):
        logger.debug('__init__ called')
        for item in (args, kwargs):
            logger.debug(repr(item))
        self.f = f
    
        
    def __call__(self, *args, **kwargs):
        logger.debug('__call__')
        for item in (self.f, args, kwargs):
            logger.debug(repr(item))
        func, = args
    
        def f():
            return self.f(func())
        return f
    
@apply(lambda x: x.split())
def hello_world():
    return 'Hello, world.'

hello_world()
DEBUG:__main__:__init__ called
DEBUG:__main__:()
DEBUG:__main__:{}
DEBUG:__main__:__call__
DEBUG:__main__:<function <lambda> at 0x7f4fcffdbc30>
DEBUG:__main__:(<function hello_world at 0x7f4fe4856230>,)
DEBUG:__main__:{}
Out[14]:
['Hello,', 'world.']
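
The apply class above routes the decorator argument through __init__ and the decorated function through __call__. The more conventional shape is a three-layer function: a factory that takes the arguments and returns the actual decorator. A sketch, with illustrative names (apply_to_result, transform):

import functools

def apply_to_result(transform):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Call the decorated function, then pass its result through transform.
            return transform(func(*args, **kwargs))
        return wrapper
    return decorator

@apply_to_result(lambda s: s.split())
def hello_world():
    return 'Hello, world.'

print(hello_world())  # ['Hello,', 'world.']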

A Number Guessing Game in JavaScript (Node.js)

Create a simple REPL game using the npm package readline-sync.


This is a rough translation of the REPL game done in this post.

The code below has a readline-sync dependency.

const readline_sync = require("readline-sync"),
  randint = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min,
  [MIN, MAX] = [1, 10],
  ANSWER = randint(MIN, MAX),
  QUIT = "quit",
  ExitWords = new Set([QUIT, "q", "bye", "exit", ANSWER.toString()]),
  shouldExit = item => ExitWords.has(item),
  MAX_GUESSES = 3;
let counter = 0,
  lastInput = "",
  message = "";

const count = () => (counter += 1);

function getMessage(item) {
  const guess = parseInt(item, 10);
  // Check for a correct guess before the exit words, because the answer
  // itself is in ExitWords and would otherwise report "quit".
  if (guess === ANSWER) return `"${guess}" is correct.`;
  if (shouldExit(item)) return QUIT;
  if (Number.isNaN(guess)) return `"${item}" is not a valid number`;
  if (guess < MIN || guess > MAX) return `"${guess}" is out of range`;
  const hint = guess < ANSWER ? "higher" : "lower";
  return `Guess ${hint}`;
}

function identity(item) {
  console.log(getMessage(item));
  count();
  lastInput = item;
  return item;
}

const echo = function echo() {
  return (
    [
      shouldExit(
        identity(
          readline_sync.question(`Type a number between ${MIN} and ${MAX}:\n`)
        )
      ),
      counter >= MAX_GUESSES
    ].some(i => i) || echo()
  );
};
echo();
// counter never exceeds MAX_GUESSES, so check the last input to decide
// which closing message applies.
if (parseInt(lastInput, 10) === ANSWER) {
  message = `The answer is correct: ${ANSWER}`;
} else if (counter >= MAX_GUESSES) {
  message = `Max guesses reached: ${MAX_GUESSES}`;
} else {
  message = "Quit before guessing the answer.";
}
[
  message,
  `The answer is ${ANSWER}.`,
  `${counter} attempts.`,
  "Exiting."
].forEach(item => console.log(item));
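
For reference, here is a rough Python sketch of the same guessing loop using only the standard library. It is an approximation for comparison, not the code from the post the JavaScript above was translated from.

import random

MIN, MAX, MAX_GUESSES = 1, 10, 3
ANSWER = random.randint(MIN, MAX)
EXIT_WORDS = {"quit", "q", "bye", "exit"}

for attempt in range(1, MAX_GUESSES + 1):
    raw = input(f"Type a number between {MIN} and {MAX}:\n")
    if raw in EXIT_WORDS:
        break
    try:
        guess = int(raw)
    except ValueError:
        print(f'"{raw}" is not a valid number')
        continue
    if guess == ANSWER:
        print(f'"{guess}" is correct.')
        break
    if guess < MIN or guess > MAX:
        print(f'"{guess}" is out of range')
        continue
    print("Guess higher" if guess < ANSWER else "Guess lower")
else:
    print(f"Max guesses reached: {MAX_GUESSES}")

print(f"The answer is {ANSWER}. {attempt} attempts. Exiting.")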

Update the Metadata on Every Save in a Jupyter Notebook

I am using Jupyter notebooks to create blog posts in the Nikola static site generator.

I have built a workflow using Makefiles and a Python package called Watchdog.

When a Jupyter notebook is saved, a Watchdog monitor running in a tmux session triggers a build and a push to the server where this site is hosted. Another Watchdog monitor then automatically runs a Bash script that deploys the latest version of this blog using Dokku.

I want to use this same sort of method to monitor the ./posts directory in the Nikola working directory.

When a notebook is saved, the date in the metadata should be updated to the current date.

Get notebook and update the metadata.

This code is using the notebook that is the source of this blog post.
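
The get_item_by helper below leans on reduce with operator.getitem to walk a sequence of keys into nested dictionaries. A tiny standalone sketch of that idea:

import operator as op
from functools import reduce

notebook = {"metadata": {"nikola": {"date": "2019-09-29 13:48:32 UTC"}}}
keys = ("metadata", "nikola", "date")
# reduce applies getitem repeatedly: notebook["metadata"]["nikola"]["date"]
print(reduce(op.getitem, keys, notebook))  # 2019-09-29 13:48:32 UTC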

In [1]:
import json
from pathlib import Path
from pprint import pprint
import operator as op
from functools import reduce
from time import sleep
from copy import deepcopy

from IPython.display import display, HTML

FILENAME = 'update-the-metadata-on-every-save-in-a-jupyter-notebook.ipynb'
this_notebook_path = Path(FILENAME)
assert this_notebook_path.exists(), 'File does not exist.'
KEYS = METADATA, NIKOLA = ('metadata', 'nikola')
DATE = 'date'
ALL_KEYS = (*KEYS, DATE)
H2 = '<h2>{text}</h2>' # for notebook blog

# A Jupyter notebook is JSON.
this_notebook = json.loads(this_notebook_path.read_text())
display(HTML(H2.format(text="Jupyter notebook keys.")))
pprint(list(this_notebook.keys()), width=min(len(key) for key in this_notebook.keys()))
display(HTML(H2.format(text="Jupyter notebook metadata.")))
pprint(this_notebook[METADATA])

prev_notebook = deepcopy(this_notebook)

def get_item_by(keys, dictionary):
    try:
        return reduce(op.getitem, keys, dictionary)
    except KeyError:
        return None


nikola_metadata = get_item_by(KEYS, this_notebook)
if nikola_metadata:
    from datetime import datetime
    from datetime import timezone
    
    sleep(1) # let some time pass
    nikola_metadata[DATE] = datetime.utcnow().replace(
        tzinfo=timezone.utc).strftime('%Y-%m-%d %H:%M:%S %Z')
    this_notebook[METADATA][NIKOLA] = nikola_metadata
    this_notebook_path.write_text(json.dumps(this_notebook))
    this_notebook = json.loads(this_notebook_path.read_text())
    display(HTML(H2.format(text="Updated Jupyter notebook metadata.")))

    pprint(this_notebook[METADATA])
    assert (
        get_item_by(ALL_KEYS, prev_notebook) < get_item_by(ALL_KEYS, this_notebook)
    ), "Date not updated"

Jupyter notebook keys.

['cells',
 'metadata',
 'nbformat',
 'nbformat_minor']

Jupyter notebook metadata.

{'kernelspec': {'display_name': 'Python 3',
                'language': 'python',
                'name': 'python3'},
 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
                   'file_extension': '.py',
                   'mimetype': 'text/x-python',
                   'name': 'python',
                   'nbconvert_exporter': 'python',
                   'pygments_lexer': 'ipython3',
                   'version': '3.8.0b4'},
 'nikola': {'category': '',
            'date': '2019-09-29 13:48:32 UTC',
            'description': '',
            'link': '',
            'slug': 'update-the-metadata-on-every-save-in-a-jupyter-notebook',
            'tags': '',
            'title': 'Update the Metadata on Every Save in a Jupyter Notebook',
            'type': 'text'}}

Updated Jupyter notebook metadata.

{'kernelspec': {'display_name': 'Python 3',
                'language': 'python',
                'name': 'python3'},
 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
                   'file_extension': '.py',
                   'mimetype': 'text/x-python',
                   'name': 'python',
                   'nbconvert_exporter': 'python',
                   'pygments_lexer': 'ipython3',
                   'version': '3.8.0b4'},
 'nikola': {'category': '',
            'date': '2019-09-29 14:07:13 UTC',
            'description': '',
            'link': '',
            'slug': 'update-the-metadata-on-every-save-in-a-jupyter-notebook',
            'tags': '',
            'title': 'Update the Metadata on Every Save in a Jupyter Notebook',
            'type': 'text'}}

Makefile contents

NAMESPACE=dmmmd
SHELL := /bin/bash


update_meta:
    ~/.virtualenvs/seven-notebooks/bin/watchmedo shell-command \
        --patterns="*.ipynb" \
        --command='clear && ../update_meta.py $${watch_src_path}' \
        -w -W \
        --recursive .

Run chmod 755 update_meta.py so that the file is executable.

#!/usr/bin/env python
# contents of update_meta.py


if __name__ == "__main__":
    import sys
    from pathlib import Path
    import json
    from pprint import pprint
    import operator as op
    from functools import reduce
    from copy import deepcopy
    from time import sleep

    this_notebook_path = Path(sys.argv[1])
    try:
        assert this_notebook_path.exists()
    except AssertionError:
        print("File does not exist (yet).")
        sys.exit(0)
    print(f"updating metadata\n {this_notebook_path}")

    KEYS = METADATA, NIKOLA = ("metadata", "nikola")
    DATE = "date"
    ALL_KEYS = (*KEYS, DATE)

    # A Jupyter notebook is JSON.
    this_notebook = json.loads(this_notebook_path.read_text())
    pprint(
        list(this_notebook.keys()), width=min(len(key) for key in this_notebook.keys())
    )
    pprint(this_notebook[METADATA])

    prev_notebook = deepcopy(this_notebook)

    def get_item_by(keys, dictionary):
        try:
            return reduce(op.getitem, keys, dictionary)
        except KeyError:
            return None

    nikola_metadata = get_item_by(KEYS, this_notebook)
    if nikola_metadata:
        from datetime import datetime
        from datetime import timezone

        # Updating the file triggers another save event, which would cause an
        # infinite loop, so skip the update if the date was changed recently.
        WAITTIME = 60
        CURRENT = datetime.utcnow()
        FORMAT = "%Y-%m-%d %H:%M:%S %Z"
        NIKOLA_DATE = nikola_metadata[DATE]
        DELTA = CURRENT - datetime.strptime(NIKOLA_DATE, FORMAT)
        if DELTA.seconds < WAITTIME:
            print("No updated needed.")
            sys.exit(0)

        nikola_metadata[DATE] = (
            datetime.utcnow().replace(tzinfo=timezone.utc).strftime(FORMAT)
        )
        this_notebook[METADATA][NIKOLA] = nikola_metadata
        this_notebook_path.write_text(json.dumps(this_notebook))
        this_notebook = json.loads(this_notebook_path.read_text())

        pprint(this_notebook[METADATA])
        assert (
            get_item_by(ALL_KEYS, prev_notebook) < get_item_by(ALL_KEYS, this_notebook)
        ), "Date not updated"
        sleep(5)  # Attempt to prevent multiple executions.
        sys.exit(0)

Learning the Pandas Library: Series Methods

Explore Pandas series data structure.

I purchased the book Learning the Pandas Library via a Humble Bundle.

A Series object has many attributes and methods that are useful for data analysis. This section will cover a few of them.

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 664-665).

Get some fake data.

In [40]:
import random

import faker
import pandas as pd
from IPython.display import display, HTML
In [41]:
FAKE = faker.Faker()
LENGTH = 5
COUNT = 2
In [42]:
first_name_getter_attrs = tuple(filter(lambda x: 'first' in x, dir(FAKE)))[1:]
first_name_getter_attrs
Out[42]:
('first_name_female', 'first_name_male')
In [43]:
indices_ = set()
while len(indices_) < LENGTH: # Ensure uniqueness.
    indices_.update([getattr(FAKE, random.choice(first_name_getter_attrs))()])
indices = [list(indices_)] * COUNT
indices
Out[43]:
[['Alexandra', 'Raymond', 'Brian', 'Kenneth', 'Timothy'],
 ['Alexandra', 'Raymond', 'Brian', 'Kenneth', 'Timothy']]
In [44]:
series = [pd.Series([random.choice([random.randint(0, 100), None]) for _ in range(LENGTH)], index=index) 
          for index in indices]
series
Out[44]:
[Alexandra    68.0
 Raymond      63.0
 Brian         NaN
 Kenneth      47.0
 Timothy      23.0
 dtype: float64, Alexandra    99
 Raymond      72
 Brian         1
 Kenneth      15
 Timothy       0
 dtype: int64]

Iteration

Iteration over a series iterates over the values:

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Location 679).

In [45]:
HR = '*' * 5
In [46]:
for series_ in series:
    for value in series_:
        print(value)
    print(HR)
68.0
63.0
nan
47.0
23.0
*****
99
72
1
15
0
*****

There is an .iteritems method to loop over the index, value tuples. We can use tuple unpacking to create the idx and value variables in the for statement:

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 681-684).

In [47]:
for series_ in series:
    for index, value in series_.iteritems():
        print(index, value)
    print(HR)
Alexandra 68.0
Raymond 63.0
Brian nan
Kenneth 47.0
Timothy 23.0
*****
Alexandra 99
Raymond 72
Brian 1
Kenneth 15
Timothy 0
*****

Overloaded operations

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Location 698).

I am going to try all the operations using the functional rather than infix versions.

In [48]:
import operator as op
In [49]:
op_attrs = (filter(lambda attr: not attr.startswith('_'), dir(op)))
for attr in op_attrs:
    a, b = (item.copy() for item in series)
    try:
        display(HTML(f'<h3>{attr}</h3>'))
        print(getattr(op, attr)(a, b))
    except (TypeError, ValueError, IndexError) as err:
        print(f"'{attr}' failed: {err}")

abs

'abs' failed: abs() takes exactly one argument (2 given)

add

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64

and_

'and_' failed: unsupported operand type(s) for &: 'float' and 'int'

attrgetter

'attrgetter' failed: attribute name must be a string

concat

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64

contains

'contains' failed: 'Series' objects are mutable, thus they cannot be hashed

countOf

'countOf' failed: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

delitem

'delitem' failed: 'Alexandra    99
Raymond      72
Brian         1
Kenneth      15
Timothy       0
dtype: int64' is an invalid key

eq

Alexandra    False
Raymond      False
Brian        False
Kenneth      False
Timothy      False
dtype: bool

floordiv

Alexandra    0.0
Raymond      0.0
Brian        NaN
Kenneth      3.0
Timothy      inf
dtype: float64

ge

Alexandra    False
Raymond      False
Brian        False
Kenneth       True
Timothy       True
dtype: bool

getitem

'getitem' failed: index 99 is out of bounds for axis 0 with size 5

gt

Alexandra    False
Raymond      False
Brian        False
Kenneth       True
Timothy       True
dtype: bool

iadd

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64

iand

'iand' failed: unsupported operand type(s) for &: 'float' and 'int'

iconcat

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64

ifloordiv

Alexandra    0.0
Raymond      0.0
Brian        NaN
Kenneth      3.0
Timothy      inf
dtype: float64

ilshift

'ilshift' failed: unsupported operand type(s) for <<=: 'Series' and 'Series'

imatmul

nan

imod

Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth       2.0
Timothy       NaN
dtype: float64

imul

Alexandra    6732.0
Raymond      4536.0
Brian           NaN
Kenneth       705.0
Timothy         0.0
dtype: float64

index

'index' failed: index() takes exactly one argument (2 given)

indexOf

'indexOf' failed: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

inv

'inv' failed: inv() takes exactly one argument (2 given)

invert

'invert' failed: invert() takes exactly one argument (2 given)

ior

'ior' failed: unsupported operand type(s) for |: 'float' and 'int'

ipow

Alexandra    2.620489e+181
Raymond      3.568778e+129
Brian                  NaN
Kenneth       1.206335e+25
Timothy       1.000000e+00
dtype: float64

irshift

'irshift' failed: unsupported operand type(s) for >>=: 'Series' and 'Series'

is_

False

is_not

True

isub

Alexandra   -31.0
Raymond      -9.0
Brian         NaN
Kenneth      32.0
Timothy      23.0
dtype: float64

itemgetter

operator.itemgetter(Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth      47.0
Timothy      23.0
dtype: float64, Alexandra    99
Raymond      72
Brian         1
Kenneth      15
Timothy       0
dtype: int64)

itruediv

Alexandra    0.686869
Raymond      0.875000
Brian             NaN
Kenneth      3.133333
Timothy           inf
dtype: float64

ixor

'ixor' failed: unsupported operand type(s) for ^: 'float' and 'int'

le

Alexandra     True
Raymond       True
Brian        False
Kenneth      False
Timothy      False
dtype: bool

length_hint

'length_hint' failed: 'Series' object cannot be interpreted as an integer

lshift

'lshift' failed: unsupported operand type(s) for <<: 'Series' and 'Series'

lt

Alexandra     True
Raymond       True
Brian        False
Kenneth      False
Timothy      False
dtype: bool

matmul

nan

methodcaller

'methodcaller' failed: method name must be a string

mod

Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth       2.0
Timothy       NaN
dtype: float64

mul

Alexandra    6732.0
Raymond      4536.0
Brian           NaN
Kenneth       705.0
Timothy         0.0
dtype: float64

ne

Alexandra    True
Raymond      True
Brian        True
Kenneth      True
Timothy      True
dtype: bool

neg

'neg' failed: neg() takes exactly one argument (2 given)

not_

'not_' failed: not_() takes exactly one argument (2 given)

or_

'or_' failed: unsupported operand type(s) for |: 'float' and 'int'

pos

'pos' failed: pos() takes exactly one argument (2 given)

pow

Alexandra    2.620489e+181
Raymond      3.568778e+129
Brian                  NaN
Kenneth       1.206335e+25
Timothy       1.000000e+00
dtype: float64

rshift

'rshift' failed: unsupported operand type(s) for >>: 'Series' and 'Series'

setitem

'setitem' failed: setitem expected 3 arguments, got 2

sub

Alexandra   -31.0
Raymond      -9.0
Brian         NaN
Kenneth      32.0
Timothy      23.0
dtype: float64

truediv

Alexandra    0.686869
Raymond      0.875000
Brian             NaN
Kenneth      3.133333
Timothy           inf
dtype: float64

truth

'truth' failed: truth() takes exactly one argument (2 given)

xor

'xor' failed: unsupported operand type(s) for ^: 'float' and 'int'

There is an .eq ( == ) method and an .equals method. They have slightly different behavior. The latter treats NaN as equal, while the former does not. If you were writing unit tests to compare dataframes, this distinction is important.

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 746-749).

In [63]:
a, b = series
a
Out[63]:
Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth      47.0
Timothy      23.0
dtype: float64
In [64]:
a.equals(a)
Out[64]:
True
In [65]:
a.eq(a)
Out[65]:
Alexandra     True
Raymond       True
Brian        False
Kenneth       True
Timothy       True
dtype: bool

Learning the Pandas Library: Series

Explore Pandas series data structure.

I purchased the book Learning the Pandas Library via a Humble Bundle.

In [19]:
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Dive deeper into Python learning with our latest bundle filled with ebooks, software, and videos!<a href="https://t.co/ACULekLqBx">https://t.co/ACULekLqBx</a></p>&mdash; Humble Bundle (@humble) <a href="https://twitter.com/humble/status/1171483506167255046?ref_src=twsrc%5Etfw">September 10, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> 

The book breaks the concepts down into digestible chunks. It's one of the best I have found so far about Pandas.

In [2]:
import operator as op
from pprint import pprint
In [3]:
import faker
import pandas as pd
from IPython.display import display
In [4]:
fake = faker.Faker()

Create a Series.

In [5]:
list(filter(lambda x: 'year' in x, dir(fake)))
Out[5]:
['date_this_year', 'date_time_this_year', 'year']
In [8]:
import random

YEAR_COUNT = 5

unique_years = set()
while len(unique_years) < YEAR_COUNT:
    unique_years.update([fake.year(), ])  
UNIQUE_YEARS = tuple(unique_years)
In [12]:
DATA_LENGTH = 20
years, names = zip(*((random.choice(UNIQUE_YEARS), fake.name()) 
                   for _ in range(DATA_LENGTH)))
assert len(set(years)) < len(names), 'expected "years" to contain duplicate values.'

Create

In [17]:
birth_years = pd.Series(names, index=years, name="birth_years").sort_index()
In [18]:
display(birth_years)
1970     Rebecca Richardson
1970            Keith Smith
1970         Dawn Gutierrez
1974    Dr. Louis Hernandez
1976           Brandi Glenn
1981           Paul Meadows
1982          Hannah Turner
1984    Natalie Fitzpatrick
1987        Patty Schneider
1996           Brett Jacobs
2004       Matthew Fletcher
2004           Karen Holden
2006            Miguel Lynn
2006           Mary Harrell
2006            Roy Johnson
2007           Tami Higgins
2007            Susan Lopez
2012        Karina Reynolds
2012          Andrea Hughes
2014            Laura Hicks
Name: birth_years, dtype: object

Read

In [20]:
birth_years['2004']
Out[20]:
2004    Matthew Fletcher
2004        Karen Holden
Name: birth_years, dtype: object

Update

Because an index operation either updates or appends, one must be aware of the data they are dealing with. Be careful if you intend to add a value with an index entry that already exists in the series. Assignment via an index operation to an existing index entry will overwrite previous entries.

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 407-409).

In [22]:
pprint(birth_years['2004'])
new_name = None
while new_name not in names:
    new_name = fake.name()

birth_years['2004'] = f'ad hoc update: {new_name}'
birth_years['2004']
2004    Matthew Fletcher
2004        Karen Holden
Name: birth_years, dtype: object
Out[22]:
2004    ad hoc update: Susan Lopez
2004    ad hoc update: Susan Lopez
Name: birth_years, dtype: object

If you had to deal with data such as this… We can update values, based purely on position, by performing an index assignment on the .iloc attribute…

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 414-417).

In [27]:
for index in (i for i, year in enumerate(birth_years.index) if year == '2004'):
    birth_years.iloc[index] = fake.name()
birth_years
Out[27]:
1970     Rebecca Richardson
1970            Keith Smith
1970         Dawn Gutierrez
1974    Dr. Louis Hernandez
1976           Brandi Glenn
1981           Paul Meadows
1982          Hannah Turner
1984    Natalie Fitzpatrick
1987        Patty Schneider
1996           Brett Jacobs
2004            Carol Wolfe
2004          Donna Roberts
2006            Miguel Lynn
2006           Mary Harrell
2006            Roy Johnson
2007           Tami Higgins
2007            Susan Lopez
2012        Karina Reynolds
2012          Andrea Hughes
2014            Laura Hicks
Name: birth_years, dtype: object

Delete

In [41]:
del birth_years['2004']
pprint(birth_years)
1970     Rebecca Richardson
1970            Keith Smith
1970         Dawn Gutierrez
1974    Dr. Louis Hernandez
1976           Brandi Glenn
1981           Paul Meadows
1982          Hannah Turner
1984    Natalie Fitzpatrick
1987        Patty Schneider
1996           Brett Jacobs
2006            Miguel Lynn
2006           Mary Harrell
2006            Roy Johnson
2007           Tami Higgins
2007            Susan Lopez
2012        Karina Reynolds
2012          Andrea Hughes
2014            Laura Hicks
Name: birth_years, dtype: object
In [33]:
people = pd.Series(
    [fake.name() for _ in range(20)],
    name='people'
)
In [34]:
display(people)
0     Stephanie Jacobson
1          Scott Fleming
2            Gary Castro
3          Anna Gonzales
4            Joel Hayden
5          Tamara Torres
6           Sarah Santos
7            Mary Rivera
8          Corey Estrada
9           Emily Harris
10            Amy Newman
11     Rebecca Contreras
12         Steven Turner
13          Jimmy Nguyen
14         Darren Sparks
15             Mark Koch
16         Chelsea Avila
17           Sarah Moore
18            Gary White
19          Nicole Short
Name: people, dtype: object

Search for all names that contain either a 'D' or a 'J'.

Use the functional version of the bitwise OR operator: |.

In [35]:
LETTERS = 'DJ'
masks = [pd.Series([letter in item for item in people]) 
         for letter in LETTERS]
people[op.or_(*masks)]
Out[35]:
0     Stephanie Jacobson
4            Joel Hayden
13          Jimmy Nguyen
14         Darren Sparks
Name: people, dtype: object

Search for all names that contain any letter from the first half of the uppercase alphabet.

Use the functional version of the bitwise OR operator (|) along with reduce from the functools module.

In [36]:
from string import ascii_uppercase
from functools import reduce
In [37]:
LETTERS = ascii_uppercase[:len(ascii_uppercase)//2]
LETTERS
Out[37]:
'ABCDEFGHIJKLM'
In [38]:
mask = reduce(op.or_, (pd.Series([letter in item for item in people]) 
                        for letter in LETTERS))
pprint(mask)
pprint(people[mask])
0      True
1      True
2      True
3      True
4      True
5     False
6     False
7      True
8      True
9      True
10     True
11     True
12    False
13     True
14     True
15     True
16     True
17     True
18     True
19    False
dtype: bool
0     Stephanie Jacobson
1          Scott Fleming
2            Gary Castro
3          Anna Gonzales
4            Joel Hayden
7            Mary Rivera
8          Corey Estrada
9           Emily Harris
10            Amy Newman
11     Rebecca Contreras
13          Jimmy Nguyen
14         Darren Sparks
15             Mark Koch
16         Chelsea Avila
17           Sarah Moore
18            Gary White
Name: people, dtype: object
In [39]:
neg_mask = [bool(item ^ 1) for item in mask]
In [40]:
people[neg_mask]
Out[40]:
5     Tamara Torres
6      Sarah Santos
12    Steven Turner
19     Nicole Short
Name: people, dtype: object
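
Negating the mask with bool(item ^ 1) works, but a pandas boolean Series can also be flipped with operator.inv (the ~ operator), which is the idiomatic negation. A small standalone sketch with its own made-up data:

import operator as op
import pandas as pd

people = pd.Series(['Joel Hayden', 'Tamara Torres', 'Mark Koch'])
mask = pd.Series([('J' in item) or ('D' in item) for item in people])
print(people[mask])          # Joel Hayden
print(people[op.inv(mask)])  # Tamara Torres, Mark Koch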

Fizz Buzz with Bitwise Operations

I first saw the computer programmer interview screening device FizzBuzz while I was a coding bootcamp student at Wyncode Academy. The Wikipedia entry describes the problem as follows:

a trivial problem for any would-be computer programmer

After I became an instructor at Wyncode Academy, I had to repeatedly revisit grading and explaining the FizzBuzz problem at the start of each cohort of students.

To date that means explaining FizzBuzz to roughly 24 cohorts of approximately 20 people each, around 480 people in total, many of them more than once!

I wondered whether there was a way to turn it into something I like to call a "two-for-one learning experience": learning additional useful skills while working on something otherwise trivial.

I did some Google searches and found something that caught my eye: bitwise FizzBuzz

I had only glancing experience with bitwise operators, and I decided this would be my chance to go a little deeper.

The most useful example I found was this JavaScript version that references this post: Which FizzBuzz solution is the most efficient?

Solve FizzBuzz with a canonical solution.

In [1]:
x, y = 3, 5
xy = 3 * 5
WORDS = (
    FIZZ, 
    BUZZ, 
) = (
    "Fizz", 
    "Buzz", 
)
FIZZBUZZ = f"{FIZZ}{BUZZ}"

for i in range(1, 16):
    if i % xy == 0:
        print(FIZZBUZZ)
    elif i % x == 0:
        print(FIZZ)
    elif i % y == 0:
        print(BUZZ)
    else:
        print(i)
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz

Approach the problem in a different way.

Note that there are some patterns to help with a different approach.

x maps to FIZZ, y maps to BUZZ, and xy maps to FIZZBUZZ.

The same pattern of [i, i, Fizz, i, Buzz, Fizz, i, i, Fizz, Buzz, i, Fizz, i, i, FizzBuzz] repeats over and over across the positive integers.

Imagine an array like the following: [i + 1, FIZZ, BUZZ, FIZZBUZZ].

If we could get the pattern 0 0 1 0 2 1 0 0 1 2 0 1 0 0 3 to cycle, then we could do FizzBuzz indefinitely without repeatedly calculating the modulo.

In [2]:
import itertools as it
from pprint import pprint
from collections import namedtuple

INDENT, WIDTH = 4, 1

x, y = 3, 5
xy = 3 * 5
(
    FIZZ, 
    BUZZ, 
) = (
    "Fizz", 
    "Buzz", 
)
FIZZBUZZ = f"{FIZZ}{BUZZ}"
WORDS = (FIZZ, BUZZ, FIZZBUZZ)
REPEATING_PATTERN = '0 0 1 0 2 1 0 0 1 2 0 1 0 0 3'.split()
keys = it.cycle(REPEATING_PATTERN)
words_lookup = dict(zip('123', WORDS))
pprint(words_lookup, indent=INDENT, width=min(len(item) for item in WORDS))

pprint(REPEATING_PATTERN, indent=INDENT, width=WIDTH)

# Define a named tuple for prettier display.
Result = namedtuple('Result', 'count key word') 

CYCLES = 2
STOP = xy * CYCLES
for count, key in zip(it.count(start=1), keys):
    result = words_lookup.get(key, count)
    pprint(Result(count, key, result), indent=INDENT)
    if count == STOP:
        break
{   '1': 'Fizz',
    '2': 'Buzz',
    '3': 'FizzBuzz'}
[   '0',
    '0',
    '1',
    '0',
    '2',
    '1',
    '0',
    '0',
    '1',
    '2',
    '0',
    '1',
    '0',
    '0',
    '3']
Result(count=1, key='0', word=1)
Result(count=2, key='0', word=2)
Result(count=3, key='1', word='Fizz')
Result(count=4, key='0', word=4)
Result(count=5, key='2', word='Buzz')
Result(count=6, key='1', word='Fizz')
Result(count=7, key='0', word=7)
Result(count=8, key='0', word=8)
Result(count=9, key='1', word='Fizz')
Result(count=10, key='2', word='Buzz')
Result(count=11, key='0', word=11)
Result(count=12, key='1', word='Fizz')
Result(count=13, key='0', word=13)
Result(count=14, key='0', word=14)
Result(count=15, key='3', word='FizzBuzz')
Result(count=16, key='0', word=16)
Result(count=17, key='0', word=17)
Result(count=18, key='1', word='Fizz')
Result(count=19, key='0', word=19)
Result(count=20, key='2', word='Buzz')
Result(count=21, key='1', word='Fizz')
Result(count=22, key='0', word=22)
Result(count=23, key='0', word=23)
Result(count=24, key='1', word='Fizz')
Result(count=25, key='2', word='Buzz')
Result(count=26, key='0', word=26)
Result(count=27, key='1', word='Fizz')
Result(count=28, key='0', word=28)
Result(count=29, key='0', word=29)
Result(count=30, key='3', word='FizzBuzz')

Use bitwise operations to achieve REPEATING_PATTERN = '0 0 1 0 2 1 0 0 1 2 0 1 0 0 3'

Instead of using base-10 digits, use 2-bit binary numbers.

This code uses the Format Specification Mini-Language to convert the int types to their binary string equivalents.

In [3]:
['{:02b}'.format(int(item)) 
 for item in '0 0 1 0 2 1 0 0 1 2 0 1 0 0 3'.split()]
Out[3]:
['00',
 '00',
 '01',
 '00',
 '10',
 '01',
 '00',
 '00',
 '01',
 '10',
 '00',
 '01',
 '00',
 '00',
 '11']

This array of string representations of binary numbers can be joined into one string.

In [4]:
''.join('{:02b}'.format(int(item)) for item in '0 0 1 0 2 1 0 0 1 2 0 1 0 0 3'.split())
Out[4]:
'000001001001000001100001000011'

And that string can be turned into an integer after reversing it. It has to be reversed because the right shift reads the pattern starting from the least significant bits.

In [5]:
RADIX = 2 # A radix is the base of a system of numeration
START_NUMBER = int(''.join(
    '{:02b}'.format(int(item)) 
    for item in reversed('0 0 1 0 2 1 0 0 1 2 0 1 0 0 3'.split())), RADIX)
START_NUMBER
Out[5]:
810092048
In [6]:
x, y = 3, 5
xy = 3 * 5
BITCOUNT = 2
for i in range(xy):
    print(START_NUMBER>>(i%xy*BITCOUNT))
    # >> Right shift bits by 15 mod count+1 times BITCOUNT of 2
    # >> Returns a number with the bits shifted to the right by a number of places. 
810092048
202523012
50630753
12657688
3164422
791105
197776
49444
12361
3090
772
193
48
12
3
In [7]:
x, y = 3, 5
xy = 3 * 5
BITCOUNT = 2
for i in range(xy):
    print(START_NUMBER>>((i%xy)*BITCOUNT)&3) 
    # >> Right shift bits by 15 mod count+1 times BITCOUNT of 2
    # >> Returns a number with the bits shifted to the right by a number of places. 
    # Then perform bitwise AND "bitwise multiplcation" by 3 to get 0, 1, 2, or 3
0
0
1
0
2
1
0
0
1
2
0
1
0
0
3

The above integers can be used as an index to retrieve a value from an array:

In [8]:
x, y = 3, 5
xy = 3 * 5
(
    FIZZ, 
    BUZZ, 
) = (
    "Fizz", 
    "Buzz", 
)
FIZZBUZZ = f"{FIZZ}{BUZZ}"
BITCOUNT = 2
CYCLES = 2
for i in range(xy*CYCLES):
    index = START_NUMBER>>((i%xy)*BITCOUNT)&3
    choices = [i+1, FIZZ, BUZZ, FIZZBUZZ]
    print(choices[index])
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz

The above can be refactored like so:

In [9]:
x, y = 3, 5
xy = 3 * 5
(
    FIZZ, 
    BUZZ, 
) = (
    "Fizz", 
    "Buzz", 
)
FIZZBUZZ = f"{FIZZ}{BUZZ}"
BITCOUNT = 2
CYCLES = 2
for i in range(xy*CYCLES): print([i+1, FIZZ, BUZZ, FIZZBUZZ][START_NUMBER>>i%xy*BITCOUNT&3])
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz

Memorizing the start number.

810092048 is probably harder to memorize than its hexadecimal representation:

In [10]:
hex(810092048)
Out[10]:
'0x30490610'
In [11]:
for i in range(xy*CYCLES): print([i+1, FIZZ, BUZZ, FIZZBUZZ][0x30490610>>i%xy*BITCOUNT&3])
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
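
The start number does not have to be typed in from the pattern string at all; it can be derived from the modulo rule itself. A short sketch that packs the 2-bit codes and confirms it reproduces 810092048:

x, y = 3, 5
xy = x * y
BITCOUNT = 2

number = 0
for i in range(1, xy + 1):
    # 1 if divisible by 3, plus 2 if divisible by 5: 0, 1, 2, or 3.
    code = (i % x == 0) | ((i % y == 0) << 1)
    # Pack each 2-bit code at its position, least significant bits first.
    number |= code << ((i - 1) * BITCOUNT)

assert number == 810092048 == 0x30490610
print(number, hex(number))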

Pipe Function in Python

In [1]:
from functools import reduce

This pipe function was inspired by this JavaScript-related tweet:

In [2]:
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Reduce is versatile:<br>const pipe = (...fns) =&gt; x =&gt; fns.reduce((v, f) =&gt; f(v), x);</p>&mdash; Eric Elliott (@_ericelliott) <a href="https://twitter.com/_ericelliott/status/1111737323518156802?ref_src=twsrc%5Etfw">March 29, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> 
In [3]:
def pipe(item, *funcs):
    r"""Combine the outputs an arbitrary number of functions.
    >>> attrs = (
    ...           'strip',
    ...           'upper',
    ...         )
    >>> funcs = (get_func(attr) for attr in attrs)
    >>> pipe('   hello, world   \t\n ', *(*funcs, exclaim))
    'HELLO, WORLD!'
    """
    
    def handler(value, f):
        return f(value)
    
    return reduce(handler, funcs, item)
In [4]:
def get_func(attr: str):
    """Return a function that returns the result of calling a callable attribute on an item.
    >>> get_func('upper')('hello')
    'HELLO'
    """
    
    def func(item):
        return getattr(item, attr)()
    
    return func
In [5]:
def exclaim(s: str):
    """Append an exclaimation point to the end of a string.
    >>> exclaim('hello')
    'hello!'
    """
    return f"{s}!"
In [6]:
if __name__ == "__main__":
    import doctest
    import sys
   
    sys.argv[-1] = "-v" # Make doctest.testmod verbose
    doctest.testmod()
Trying:
    exclaim('hello')
Expecting:
    'hello!'
ok
Trying:
    get_func('upper')('hello')
Expecting:
    'HELLO'
ok
Trying:
    attrs = (
              'strip',
              'upper',
            )
Expecting nothing
ok
Trying:
    funcs = (get_func(attr) for attr in attrs)
Expecting nothing
ok
Trying:
    pipe('   hello, world   \t\n ', *(*funcs, exclaim))
Expecting:
    'HELLO, WORLD!'
ok
1 items had no tests:
    __main__
3 items passed all tests:
   1 tests in __main__.exclaim
   1 tests in __main__.get_func
   3 tests in __main__.pipe
5 tests in 4 items.
5 passed and 0 failed.
Test passed.

Zip in JavaScript

Define a function that zips arrays in a way similar to Python's itertools.zip_longest.

itertools.zip_longest documentation

Answer on Stack Overflow

var _zip = (arr, ...arrs) => {
  return arr.map((val, i) => arrs.reduce((a, arr) => [...a, arr[i]], [val]));
}

I made small changes so that missing values are filled with null, similar to zip_longest's default fillvalue.

In [5]:
var zip = (...[arr, ...arrs]) => {
  return arr.map((val, i) => arrs.reduce((a, arr) => [...a, arr[i] || null], [val]));
}
In [6]:
var data = [
    (_, i) => i + 65,
    (_, i) => i + 97,
    (_, i) => 33 + 15 + i,
]
    .map(f => String.fromCharCode(...Array.from({length: 26}, f)))
    .map(item => Array.from(item))
In [7]:
_zip(data[0], ...data.slice(1))
Out[7]:
[
  [ 'A', 'a', '0' ], [ 'B', 'b', '1' ],
  [ 'C', 'c', '2' ], [ 'D', 'd', '3' ],
  [ 'E', 'e', '4' ], [ 'F', 'f', '5' ],
  [ 'G', 'g', '6' ], [ 'H', 'h', '7' ],
  [ 'I', 'i', '8' ], [ 'J', 'j', '9' ],
  [ 'K', 'k', ':' ], [ 'L', 'l', ';' ],
  [ 'M', 'm', '<' ], [ 'N', 'n', '=' ],
  [ 'O', 'o', '>' ], [ 'P', 'p', '?' ],
  [ 'Q', 'q', '@' ], [ 'R', 'r', 'A' ],
  [ 'S', 's', 'B' ], [ 'T', 't', 'C' ],
  [ 'U', 'u', 'D' ], [ 'V', 'v', 'E' ],
  [ 'W', 'w', 'F' ], [ 'X', 'x', 'G' ],
  [ 'Y', 'y', 'H' ], [ 'Z', 'z', 'I' ]
]
In [12]:
zip(...data)
Out[12]:
[
  [ 'A', 'a', '0' ], [ 'B', 'b', '1' ],
  [ 'C', 'c', '2' ], [ 'D', 'd', '3' ],
  [ 'E', 'e', '4' ], [ 'F', 'f', '5' ],
  [ 'G', 'g', '6' ], [ 'H', 'h', '7' ],
  [ 'I', 'i', '8' ], [ 'J', 'j', '9' ],
  [ 'K', 'k', ':' ], [ 'L', 'l', ';' ],
  [ 'M', 'm', '<' ], [ 'N', 'n', '=' ],
  [ 'O', 'o', '>' ], [ 'P', 'p', '?' ],
  [ 'Q', 'q', '@' ], [ 'R', 'r', 'A' ],
  [ 'S', 's', 'B' ], [ 'T', 't', 'C' ],
  [ 'U', 'u', 'D' ], [ 'V', 'v', 'E' ],
  [ 'W', 'w', 'F' ], [ 'X', 'x', 'G' ],
  [ 'Y', 'y', 'H' ], [ 'Z', 'z', 'I' ]
]
In [13]:
zip(Array.from('asdf'), Array.from('tyu'))
Out[13]:
[ [ 'a', 't' ], [ 's', 'y' ], [ 'd', 'u' ], [ 'f', null ] ]
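
For comparison, Python's itertools.zip_longest with its default fillvalue of None produces the same shape as the zip above:

from itertools import zip_longest

print(list(zip_longest('asdf', 'tyu')))
# [('a', 't'), ('s', 'y'), ('d', 'u'), ('f', None)]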

Hello, world.

The computer does not exist.

Build one…

figuratively speaking, of course.

My remote workspace:

  • Linode instances.

    • Ubuntu 18

      This server I consider my personal computer.

      It is the environment in which I do all my coding mainly in Python and Node.js.

      It is headless until I start a TigerVNC instance. The connection is secured using a private IP address and a virtual private network (VPN).

      Then I have an Xfce desktop to use if needed.

      Xfce is a lightweight desktop environment for UNIX-like operating systems. It aims to be fast and low on system resources, while still being visually appealing and user friendly.

    • Ubuntu 16

      This Linode instance has Dokku on it. It is where I host applications that I build.

      The smallest PaaS implementation you've ever seen

      Dokku helps you build and manage the lifecycle of applications

  • Software

    Vim is a highly configurable text editor for efficiently creating and changing any kind of text.

    tmux is a terminal multiplexer. It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal.

    GoTTY is a simple command line tool that turns your CLI tools into web applications.

My local workspace:

  • Primary device.

    • 2009 MacBook Pro

      • OS X El Capitan v10.11.6

      • 8 GB of RAM

      • 250 GB SSD

      It is old and thus inexpensive. I consider this a feature. If I lose it or drop it, a few hundred rather than a few thousand dollars will replace it.

      I have only had to replace it once. I dropped my current MacBook's predecessor while commuting on a train.

      The device I use only needs the capability to connect to remote servers.

  • Software

    Remote terminal application that allows roaming, supports intermittent connectivity, and provides intelligent local echo and line editing of user keystrokes.

    Mosh is a replacement for interactive SSH terminals. It's more robust and responsive, especially over Wi-Fi, cellular, and long-distance links.

    A terminal replacement.