Learning the Pandas Library: Series Methods

Explore the pandas Series data structure.

I purchased the book Learning the Pandas Library via a Humble Bundle.

A Series object has many attributes and methods that are useful for data analysis. This section will cover a few of them.

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 664-665).

Get some fake data.

In [40]:
import random

import faker
import pandas as pd
from IPython.display import display, HTML
In [41]:
FAKE = faker.Faker()
LENGTH = 5
COUNT = 2
In [42]:
# Skip the first match ('first_name') and keep only the gendered getters.
first_name_getter_attrs = tuple(filter(lambda x: 'first' in x, dir(FAKE)))[1:]
first_name_getter_attrs
Out[42]:
('first_name_female', 'first_name_male')
In [43]:
indices_ = set()
while len(indices_) < LENGTH:  # A set guarantees the names are unique.
    indices_.add(getattr(FAKE, random.choice(first_name_getter_attrs))())
indices = [list(indices_)] * COUNT
indices
Out[43]:
[['Alexandra', 'Raymond', 'Brian', 'Kenneth', 'Timothy'],
 ['Alexandra', 'Raymond', 'Brian', 'Kenneth', 'Timothy']]
In [44]:
# Each value is a random integer in [0, 100] or, about half the time, None.
series = [pd.Series([random.choice([random.randint(0, 100), None]) for _ in range(LENGTH)], index=index)
          for index in indices]
series
Out[44]:
[Alexandra    68.0
 Raymond      63.0
 Brian         NaN
 Kenneth      47.0
 Timothy      23.0
 dtype: float64, Alexandra    99
 Raymond      72
 Brian         1
 Kenneth      15
 Timothy       0
 dtype: int64]
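The two series end up with different dtypes. Any `None` drawn for a series is stored as `NaN`, and because NumPy integer arrays cannot hold `NaN`, pandas stores that series as `float64`; the second series happened to draw no `None`, so it stays `int64`. A minimal check, using fixed values rather than the random data above:

```python
import pandas as pd

# Plain Python ints are stored as int64.
ints = pd.Series([68, 63, 47])

# A None among numbers is converted to NaN, which forces float64.
with_missing = pd.Series([68, 63, None, 47])

print(ints.dtype)                    # int64
print(with_missing.dtype)            # float64
print(with_missing.isna().tolist())  # [False, False, True, False]
```

If you need integers and missing values together, pandas also offers the nullable `"Int64"` dtype, which uses `<NA>` instead of upcasting.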

Iteration

Iteration over a series iterates over the values:

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Location 679).

In [45]:
HR = '*' * 5
In [46]:
for series_ in series:
    for value in series_:
        print(value)
    print(HR)
68.0
63.0
nan
47.0
23.0
*****
99
72
1
15
0
*****
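Because iterating a Series yields only its values, a plain loop loses the index labels. A small sketch with hypothetical labels, contrasting `list(s)` with `s.keys()`:

```python
import pandas as pd

s = pd.Series([68.0, 63.0, 47.0], index=['Alexandra', 'Raymond', 'Kenneth'])

# Iterating (or calling list()) yields the values, not the labels.
values = list(s)
# The labels live on the index; .keys() returns that index.
labels = list(s.keys())

print(values)  # [68.0, 63.0, 47.0]
print(labels)  # ['Alexandra', 'Raymond', 'Kenneth']
```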

There is an .iteritems method to loop over the index, value tuples. We can use tuple unpacking to create the idx and value variables in the for statement:

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 681-684).

In [47]:
for series_ in series:
    # .items() replaces .iteritems(), which was removed in pandas 2.0.
    for index, value in series_.items():
        print(index, value)
    print(HR)
Alexandra 68.0
Raymond 63.0
Brian nan
Kenneth 47.0
Timothy 23.0
*****
Alexandra 99
Raymond 72
Brian 1
Kenneth 15
Timothy 0
*****
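`.items()` yields the same `(label, value)` pairs you would get by zipping the index with the values, and `.to_dict()` collects them into a dictionary. A short sketch with hypothetical data:

```python
import pandas as pd

s = pd.Series([99, 72, 1], index=['Alexandra', 'Raymond', 'Brian'])

# .items() yields (label, value) pairs in index order.
pairs = list(s.items())

# The same pairs can be built by zipping the index with the values.
zipped = list(zip(s.index, s))

print(pairs)            # [('Alexandra', 99), ('Raymond', 72), ('Brian', 1)]
print(pairs == zipped)  # True
print(s.to_dict())      # {'Alexandra': 99, 'Raymond': 72, 'Brian': 1}
```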

Overloaded operations

Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Location 698).

I am going to try every public function in the `operator` module on the two series, using the functional forms (`op.add(a, b)`) rather than the infix operators (`a + b`).

In [48]:
import operator as op
In [49]:
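op_attrs = filter(lambda attr: not attr.startswith('_'), dir(op))
for attr in op_attrs:
    # Fresh copies each time, so in-place operators (iadd, imul, ...) don't mutate the originals.
    a, b = (item.copy() for item in series)
    try:
        display(HTML(f'<h3>{attr}</h3>'))
        print(getattr(op, attr)(a, b))
    except Exception as err:  # Many functions reject two Series; newer pandas may raise its own error types.
        print(f"'{attr}' failed: {err}")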

abs

'abs' failed: abs() takes exactly one argument (2 given)

add

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64
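`op.add(a, b)` is the same as `a + b`, and a missing value on either side produces `NaN`, as the Brian row shows. The method form `Series.add` accepts a `fill_value` that stands in for a value missing on one side before adding. A sketch rebuilding the two series from the values shown above:

```python
import operator as op

import pandas as pd

names = ['Alexandra', 'Raymond', 'Brian', 'Kenneth', 'Timothy']
a = pd.Series([68.0, 63.0, None, 47.0, 23.0], index=names)
b = pd.Series([99, 72, 1, 15, 0], index=names)

# The functional form and the infix operator give identical results.
assert op.add(a, b).equals(a + b)

# fill_value=0 treats Brian's missing value in `a` as 0 before adding.
filled = a.add(b, fill_value=0)
print(filled['Brian'])       # 1.0
print(filled.isna().any())   # False
```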

and_

'and_' failed: unsupported operand type(s) for &: 'float' and 'int'
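The bitwise operators (`&`, `|`, `^`, `~`) fail here because the series hold numbers. On boolean series they act as element-wise logical operators, which is the usual way to combine conditions. A sketch with the same values; note that a comparison involving `NaN` is `False`, so negating it gives `True` for Brian:

```python
import pandas as pd

names = ['Alexandra', 'Raymond', 'Brian', 'Kenneth', 'Timothy']
a = pd.Series([68.0, 63.0, None, 47.0, 23.0], index=names)
b = pd.Series([99, 72, 1, 15, 0], index=names)

# Comparisons produce boolean Series; & combines them element-wise.
both_high = (a > 50) & (b > 50)
# ~ negates element-wise; NaN > 50 is False, so Brian becomes True.
not_high_a = ~(a > 50)

print(both_high.tolist())           # [True, True, False, False, False]
print(not_high_a.tolist())          # [False, False, True, True, True]
print(a[both_high].index.tolist())  # ['Alexandra', 'Raymond']
```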

attrgetter

'attrgetter' failed: attribute name must be a string

concat

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64

contains

'contains' failed: 'Series' objects are mutable, thus they cannot be hashed

countOf

'countOf' failed: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
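`op.contains(a, b)` evaluates `b in a`, and for a Series the `in` operator tests the index labels, not the values; that is why pandas tried to hash `b`. To test values, check against `.values` or use `.isin`. A sketch with hypothetical data:

```python
import pandas as pd

s = pd.Series([99, 72, 1], index=['Alexandra', 'Raymond', 'Brian'])

# `in` looks at the index labels...
print('Raymond' in s)  # True
print(72 in s)         # False: 72 is a value, not a label

# ...so test values explicitly.
print(72 in s.values)           # True
print(s.isin([1, 2]).tolist())  # [False, False, True]
```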

delitem

'delitem' failed: 'Alexandra    99
Raymond      72
Brian         1
Kenneth      15
Timothy       0
dtype: int64' is an invalid key

eq

Alexandra    False
Raymond      False
Brian        False
Kenneth      False
Timothy      False
dtype: bool

floordiv

Alexandra    0.0
Raymond      0.0
Brian        NaN
Kenneth      3.0
Timothy      inf
dtype: float64
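The Timothy row divides 23 by 0. Where plain Python would raise `ZeroDivisionError`, pandas returns `inf` for `/` and `//` and `NaN` for `%`, matching the `truediv`, `floordiv` and `mod` outputs in this section. A sketch reproducing that one row:

```python
import math

import pandas as pd

# The Timothy row from above: 23 divided by 0.
num = pd.Series([23.0])
den = pd.Series([0])

true_div = (num / den).iloc[0]
floor_div = (num // den).iloc[0]
remainder = (num % den).iloc[0]

print(true_div, floor_div, remainder)  # inf inf nan
print(math.isnan(remainder))           # True
```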

ge

Alexandra    False
Raymond      False
Brian        False
Kenneth       True
Timothy       True
dtype: bool

getitem

'getitem' failed: index 99 is out of bounds for axis 0 with size 5
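`op.getitem(a, b)` is `a[b]`; judging by the error, the integers in `b` appear to have been treated as positions, and position 99 does not exist. Lookups are less ambiguous with `.loc` (labels) and `.iloc` (positions). Likewise `op.setitem` needs a third argument, the value to assign. A sketch with hypothetical data:

```python
import operator as op

import pandas as pd

s = pd.Series([68.0, 63.0, 47.0], index=['Alexandra', 'Raymond', 'Kenneth'])

# Label-based and position-based lookups.
print(s.loc['Raymond'])  # 63.0
print(s.iloc[2])         # 47.0

# op.getitem(s, key) is s[key]; op.setitem(s, key, value) is s[key] = value.
print(op.getitem(s, 'Kenneth'))  # 47.0
op.setitem(s, 'Kenneth', 50.0)
print(s['Kenneth'])      # 50.0
```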

gt

Alexandra    False
Raymond      False
Brian        False
Kenneth       True
Timothy       True
dtype: bool

iadd

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64

iand

'iand' failed: unsupported operand type(s) for &: 'float' and 'int'

iconcat

Alexandra    167.0
Raymond      135.0
Brian          NaN
Kenneth       62.0
Timothy       23.0
dtype: float64

ifloordiv

Alexandra    0.0
Raymond      0.0
Brian        NaN
Kenneth      3.0
Timothy      inf
dtype: float64

ilshift

'ilshift' failed: unsupported operand type(s) for <<=: 'Series' and 'Series'

imatmul

nan

imod

Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth       2.0
Timothy       NaN
dtype: float64

imul

Alexandra    6732.0
Raymond      4536.0
Brian           NaN
Kenneth       705.0
Timothy         0.0
dtype: float64

index

'index' failed: index() takes exactly one argument (2 given)

indexOf

'indexOf' failed: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

inv

'inv' failed: inv() takes exactly one argument (2 given)

invert

'invert' failed: invert() takes exactly one argument (2 given)
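`abs`, `neg`, `pos`, `inv`/`invert`, `not_` and `truth` fail above only because the loop passes two arguments. The element-wise ones work on a single Series (`invert` needs booleans), while `not_` and `truth` call `bool()` on the whole Series, which raises the same "ambiguous" error seen for `countOf`. A sketch with hypothetical values:

```python
import operator as op

import pandas as pd

s = pd.Series([68.0, -63.0, None])

negated = op.neg(s)          # same as -s
absolute = op.abs(s)         # same as abs(s)
inverted = op.invert(s > 0)  # same as ~(s > 0); NaN > 0 is False

print(negated.tolist())   # [-68.0, 63.0, nan]
print(absolute.tolist())  # [68.0, 63.0, nan]
print(inverted.tolist())  # [False, True, True]

# not_ and truth call bool() on the whole Series, which pandas refuses.
not_error = None
try:
    op.not_(s)
except ValueError as err:
    not_error = err
    print('not_ failed:', err)
```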

ior

'ior' failed: unsupported operand type(s) for |: 'float' and 'int'

ipow

Alexandra    2.620489e+181
Raymond      3.568778e+129
Brian                  NaN
Kenneth       1.206335e+25
Timothy       1.000000e+00
dtype: float64

irshift

'irshift' failed: unsupported operand type(s) for >>=: 'Series' and 'Series'

is_

False

is_not

True

isub

Alexandra   -31.0
Raymond      -9.0
Brian         NaN
Kenneth      32.0
Timothy      23.0
dtype: float64

itemgetter

operator.itemgetter(Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth      47.0
Timothy      23.0
dtype: float64, Alexandra    99
Raymond      72
Brian         1
Kenneth      15
Timothy       0
dtype: int64)

itruediv

Alexandra    0.686869
Raymond      0.875000
Brian             NaN
Kenneth      3.133333
Timothy           inf
dtype: float64

ixor

'ixor' failed: unsupported operand type(s) for ^: 'float' and 'int'

le

Alexandra     True
Raymond       True
Brian        False
Kenneth      False
Timothy      False
dtype: bool

length_hint

'length_hint' failed: 'Series' object cannot be interpreted as an integer

lshift

'lshift' failed: unsupported operand type(s) for <<: 'Series' and 'Series'

lt

Alexandra     True
Raymond       True
Brian        False
Kenneth      False
Timothy      False
dtype: bool

matmul

nan
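`matmul` (`a @ b`) computes the dot product, the sum of the element-wise products. A single `NaN` makes the whole result `NaN`, whereas `Series.sum` skips missing values by default. A sketch rebuilding the series from the values above (the products are the ones in the `mul` output):

```python
import math

import pandas as pd

names = ['Alexandra', 'Raymond', 'Brian', 'Kenneth', 'Timothy']
a = pd.Series([68.0, 63.0, None, 47.0, 23.0], index=names)
b = pd.Series([99, 72, 1, 15, 0], index=names)

dot = a @ b              # NaN propagates through the dot product
skipped = (a * b).sum()  # sum() skips NaN (skipna=True by default)

print(math.isnan(dot))  # True
print(skipped)          # 11973.0
```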

methodcaller

'methodcaller' failed: method name must be a string

mod

Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth       2.0
Timothy       NaN
dtype: float64

mul

Alexandra    6732.0
Raymond      4536.0
Brian           NaN
Kenneth       705.0
Timothy         0.0
dtype: float64

ne

Alexandra    True
Raymond      True
Brian        True
Kenneth      True
Timothy      True
dtype: bool

neg

'neg' failed: neg() takes exactly one argument (2 given)

not_

'not_' failed: not_() takes exactly one argument (2 given)

or_

'or_' failed: unsupported operand type(s) for |: 'float' and 'int'

pos

'pos' failed: pos() takes exactly one argument (2 given)

pow

Alexandra    2.620489e+181
Raymond      3.568778e+129
Brian                  NaN
Kenneth       1.206335e+25
Timothy       1.000000e+00
dtype: float64

rshift

'rshift' failed: unsupported operand type(s) for >>: 'Series' and 'Series'

setitem

'setitem' failed: setitem expected 3 arguments, got 2

sub

Alexandra   -31.0
Raymond      -9.0
Brian         NaN
Kenneth      32.0
Timothy      23.0
dtype: float64

truediv

Alexandra    0.686869
Raymond      0.875000
Brian             NaN
Kenneth      3.133333
Timothy           inf
dtype: float64

truth

'truth' failed: truth() takes exactly one argument (2 given)

xor

'xor' failed: unsupported operand type(s) for ^: 'float' and 'int'
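In every binary operation above, the two series share the same index, so the rows line up trivially. pandas actually matches rows by index label rather than by position, and a label present in only one series yields `NaN`. A sketch with hypothetical labels:

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
right = pd.Series([10, 20, 30], index=['z', 'y', 'w'])

# Rows are matched by label: y -> 2 + 20, z -> 3 + 10.
# x and w appear in only one series, so they become NaN.
total = left + right
print(total)
```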

There is an .eq ( == ) method and an .equals method. They have slightly different behavior. The latter treats NaN as equal, while the former does not. If you were writing unit tests to compare dataframes, this distinction is important.

— Matt Harrison. Learning the Pandas Library: Python Tools for Data Munging, Data Analysis, and Visualization (Kindle Locations 746-749).

In [63]:
a, b = series
a
Out[63]:
Alexandra    68.0
Raymond      63.0
Brian         NaN
Kenneth      47.0
Timothy      23.0
dtype: float64
In [64]:
a.equals(a)
Out[64]:
True
In [65]:
a.eq(a)
Out[65]:
Alexandra     True
Raymond       True
Brian        False
Kenneth       True
Timothy       True
dtype: bool
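For unit tests there is also `pandas.testing.assert_series_equal`, which raises an `AssertionError` describing the mismatch instead of returning `False`. Both it and `equals` treat `NaN` in matching positions as equal, and both are strict about dtype by default. A sketch with hypothetical values:

```python
import pandas as pd

a = pd.Series([68.0, None, 47.0])

same = a.equals(a.copy())     # NaN in the same position counts as equal
elementwise = a.eq(a).tolist()  # NaN != NaN element-wise
dtype_mismatch = pd.Series([1, 2]).equals(pd.Series([1.0, 2.0]))

print(same)            # True
print(elementwise)     # [True, False, True]
print(dtype_mismatch)  # False: int64 vs float64

# assert_series_equal passes silently or raises with an explanation.
pd.testing.assert_series_equal(a, a.copy())
caught = False
try:
    pd.testing.assert_series_equal(pd.Series([1, 2]), pd.Series([1.0, 2.0]))
except AssertionError as err:
    caught = True
    print('mismatch:', err)
```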