Series accessors

Series.datetime

Series.datetime.add

datetime.__add__(rhs)

Series.datetime.radd

datetime.__radd__(lhs)

Series.datetime.sub

datetime.__sub__(rhs)

Series.datetime.rsub

datetime.__rsub__(lhs)

Series.scaler

Series.scaler.minmax

scaler.minmax(feature_range=(0, 1))

Transform series by scaling to a given range

Parameters:	feature_range (tuple, optional) – Desired range of transformed data, by default (0, 1)

Example

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series([1,2,3,4,5])
>>> sr.scaler.minmax()
0    0.00
1    0.25
2    0.50
3    0.75
4    1.00
dtype: float64

Returns:	A transformed copy of the series.
Return type:	pandas.Series
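
As a rough illustration of the scaling described above, the usual min-max formula can be written in plain pandas. The helper name `minmax_sketch` is hypothetical; this is a sketch of the general technique, not necessarily how pandas_lightning implements it.

```python
import pandas as pd


def minmax_sketch(sr: pd.Series, feature_range=(0, 1)) -> pd.Series:
    """Hypothetical helper: rescale sr linearly into feature_range."""
    low, high = feature_range
    # Map the minimum to 0 and the maximum to 1 ...
    unit = (sr - sr.min()) / (sr.max() - sr.min())
    # ... then stretch and shift into [low, high].
    return unit * (high - low) + low


sr = pd.Series([1, 2, 3, 4, 5])
scaled = minmax_sketch(sr)             # 0.00, 0.25, 0.50, 0.75, 1.00
shifted = minmax_sketch(sr, (-1, 1))   # -1.0, -0.5, 0.0, 0.5, 1.0
```

A constant series has max equal to min, so this sketch would divide by zero and produce NaN; real code would need to decide how to handle that case.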

Series.scaler.standardize

scaler.standardize(ddof=1)

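Standardize features by removing the mean and scaling to unit variance. Similar to scikit-learn’s StandardScaler, which divides by the population standard deviation (equivalent to ddof=0).

Parameters:	ddof (int, optional) – Delta degrees of freedom used for the standard deviation (the divisor is N - ddof), by default 1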

Examples

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series([1,2,3,4,5])
>>> sr.scaler.standardize()
0   -1.264911
1   -0.632456
2    0.000000
3    0.632456
4    1.264911
dtype: float64

Returns:	A transformed copy of the series
Return type:	pandas.Series
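
The effect of `ddof` can be seen with a small sketch in plain pandas. The helper `standardize_sketch` is a hypothetical name used only for illustration, and it assumes the accessor simply passes `ddof` to the standard deviation.

```python
import pandas as pd


def standardize_sketch(sr: pd.Series, ddof: int = 1) -> pd.Series:
    """Hypothetical helper: subtract the mean, divide by the standard deviation."""
    return (sr - sr.mean()) / sr.std(ddof=ddof)


sr = pd.Series([1, 2, 3, 4, 5])
sample = standardize_sketch(sr)              # sample std (ddof=1), first value -1.264911
population = standardize_sketch(sr, ddof=0)  # population std (ddof=0), first value -1.414214
```

With `ddof=1` the result matches the example output above; with `ddof=0` the values are slightly larger in magnitude because the divisor is smaller.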

Series.scaler.log1p

scaler.log1p()

Transform to log(1+x)

Notes

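This transformation is well behaved for small numbers: log(1 + x) is defined at x = 0, where log(x) is not, and log1p avoids the precision loss of computing log(1 + x) directly when x is close to zero.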

Example

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series([1,2,3,4,5])
>>> sr.scaler.log1p()
0    0.693147
1    1.098612
2    1.386294
3    1.609438
4    1.791759
dtype: float64

Returns:	A transformed copy of the series
Return type:	pandas.Series
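
The numerical point in the note can be checked with NumPy directly. This sketch uses `numpy.log1p` rather than the accessor, on the assumption that the accessor behaves like it.

```python
import numpy as np
import pandas as pd

tiny = pd.Series([1e-20, 1e-10, 0.0])

# 1 + 1e-20 rounds to exactly 1.0 in float64, so the naive form loses the value.
naive = np.log(1 + tiny)

# log1p works on x itself and keeps the tiny value.
stable = np.log1p(tiny)
```

Here `naive` starts with 0.0, while `stable` starts with a value very close to 1e-20.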

Series.scaler.expm1

scaler.expm1()

Transform to exp(x)-1

Notes

This transformation is more accurate for small numbers than computing exp(x) - 1 directly, which loses precision when x is close to zero. It is the inverse of the log1p transformation.

Example

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series([1,2,3,4,5])
>>> sr.scaler.expm1()
0      1.718282
1      6.389056
2     19.085537
3     53.598150
4    147.413159
dtype: float64

Returns:	A transformed copy of the series
Return type:	pandas.Series
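
A short sketch with NumPy shows both properties: accuracy near zero and the round trip with log1p. It uses `numpy.expm1` and `numpy.log1p` directly, on the assumption that the accessor methods behave like them.

```python
import numpy as np
import pandas as pd

tiny = pd.Series([1e-20])
naive = np.exp(tiny) - 1    # exp(1e-20) rounds to 1.0, so the result is 0.0
stable = np.expm1(tiny)     # keeps a value very close to 1e-20

# expm1 undoes log1p (up to floating-point rounding).
sr = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
round_trip = np.expm1(np.log1p(sr))
```

This round trip is why the log1p/expm1 pair is convenient for transforming a target before modelling and mapping predictions back afterwards.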

Series.pctg

Provides convenience properties for common percentage calculations, such as the share of zeros, missing values, or unique values in a series.

Series.pctg.zeros

pctg.zeros

Get the percentage of zeros in the Series

Example

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series([1, 2, 0, 8.3, 0])
>>> sr.pctg.zeros
0.4

Returns:	The fraction of values in the series that are zero, between 0 and 1
Return type:	float

Series.pctg.nans

pctg.nans

Get the percentage of missing values in the Series

Example

>>> import pandas as pd
>>> import numpy as np
>>> import pandas_lightning
>>> sr = pd.Series([1, np.nan, np.nan, 8.3, np.nan])
>>> sr.pctg.nans
0.6

Returns:	The fraction of values in the series that are missing, between 0 and 1
Return type:	float

Series.pctg.uniques

pctg.uniques

Get the number of unique values as a fraction of the length of the series.

Notes

This is useful for checking the cardinality of a column relative to its length. If the fraction of uniques is close to 1, the column probably does not follow a categorical distribution.

Example

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series(["hey", "I", "just", "met", "you"])
>>> sr.pctg.uniques
1.0

Returns:	The number of unique values divided by the length of the series
Return type:	float
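
The three quantities in this accessor can be approximated in plain pandas as follows. This is a sketch of the idea only; how pandas_lightning treats missing values when counting zeros or uniques is an assumption here.

```python
import numpy as np
import pandas as pd

sr = pd.Series([1, np.nan, 0, 8.3, 0])

# The mean of a boolean mask is the fraction of True entries.
frac_zeros = (sr == 0).mean()            # 2 of 5 values are zero
frac_nans = sr.isna().mean()             # 1 of 5 values is missing

# nunique() ignores NaN by default, so the uniques are 1, 0 and 8.3.
frac_uniques = sr.nunique() / len(sr)    # 3 of 5
```

All three results are plain floats between 0 and 1, which makes them easy to compare against a threshold when screening columns.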

Series.tests

Series.tests.is_normal

tests.is_normal = <function tests.is_normal>

Series.asciiplot

Series.asciiplot.hist

asciiplot.hist(size: int = 10, hashes: int = 30, len_label: int = 10, max_categories: int = 20)

Plots a horizontal histogram using #

Parameters:	size (int, optional) – Size of bins, by default 10
	hashes (int, optional) – Maximum number of hashes `#` to display on the label with the highest frequency, by default 30
	len_label (int, optional) – Maximum length of the text label, by default 10
	max_categories (int, optional) – Maximum number of categories to display, by default 20

Notes

This is useful when you want a quick sense of the distribution of your data, or when you do not have access to a plotting environment such as a Jupyter notebook. The API is deliberately named after the pandas .hist() API.

Examples

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series(["red", "blue", "red", "red", "orange", "blue"])
>>> sr.ascii.hist()
       red ##############################
      blue ####################
    orange ##########
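
The idea behind the plot can be sketched in a few lines of plain pandas: count each label, then scale the bar lengths so the most frequent label gets `hashes` characters. The helper `ascii_hist_sketch` is hypothetical and much simpler than the accessor (no numeric binning, no `max_categories`).

```python
import pandas as pd


def ascii_hist_sketch(sr: pd.Series, hashes: int = 30, len_label: int = 10) -> str:
    """Hypothetical helper: one right-aligned label and one bar of '#' per category."""
    counts = sr.value_counts()      # most frequent label first
    top = int(counts.max())
    lines = []
    for label, count in counts.items():
        # Integer arithmetic keeps the bar lengths exact.
        bar = "#" * (int(count) * hashes // top)
        text = str(label)[:len_label].rjust(len_label)
        lines.append(f"{text} {bar}")
    return "\n".join(lines)


sr = pd.Series(["red", "blue", "red", "red", "orange", "blue"])
plot = ascii_hist_sketch(sr)
print(plot)
```

For this input the sketch prints bars of 30, 20 and 10 hashes for red, blue and orange, in line with the example above.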

Series.map_numerical_binning

map_numerical_binning.__call__(binning: Union[list, range, dict, int, tuple], by_quantiles: bool = False, ordered: bool = True)

Bin a numerical feature into groups. This is useful to transform a continuous variable to categorical.

Parameters:	binning (Union[list, range, dict, int, tuple]) – Criteria to bin by.
	by_quantiles (bool, optional) – Whether to bin by quantiles, by default False. This is only applicable if `binning` is an integer.
	ordered (bool, optional) – Whether to treat the bins as an ordinal variable, by default True

Notes

The underlying APIs are pandas.cut and pandas.qcut.

Examples

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series([23, 94, 44, 95, 29, 8, 17, 42, 29, 48,
...                 96, 95, 17, 97, 9, 85, 62, 71, 37, 10,
...                 41, 88, 18, 56, 85, 22, 97, 27, 69, 19,
...                 37, 10, 85, 11, 73, 96, 56, 0, 18, 3,
...                 54, 50, 91, 38, 46, 13, 78, 22, 6, 61])

Ranged binning using range. Below is grouping in 10’s.

>>> sr_cat = sr.map_numerical_binning(range(0,110,10))  # or (0,110,10)
>>> sr_cat.ascii.hist()
   (0, 10] ######################
  (10, 20] ##########################
  (20, 30] ######################
  (30, 40] ###########
  (40, 50] ######################
  (50, 60] ###########
  (60, 70] ###########
  (70, 80] ###########
  (80, 90] ###############
 (90, 100] ##############################

Ranged binning using list. Below is grouping using the elements in the array as the bounds.

>>> sr_cat = sr.map_numerical_binning([0, 18, 21, 25, 30, 100])
>>> sr_cat.ascii.hist()
   (0, 18] ############
  (18, 21] #
  (21, 25] ###
  (25, 30] ###
 (30, 100] ##############################

Ranged binning using dictionary. Any number that does not fall in these ranges is considered null.

Kids: 0 < age <= 12

Teens: 12 < age <= 24

Adults: 24 < age <= 60

>>> GROUPS = {
...     "": 0,  # You must define this
...     "kids": 12,
...     "teens": 24,
...     "adults": 60
... }
>>> sr_bin_group = sr.map_numerical_binning(GROUPS)
>>> sr_bin_group.ascii.hist()
      kids ##############
     teens ##################
    adults ##############################

Binning into equal-width ranges using int. Below, the width of each range is about 25.

>>> sr_bin = sr.map_numerical_binning(4)
>>> sr_bin.ascii.hist(len_label=15)
(-0.097, 24.25] ##############################
  (24.25, 48.5] ###################
  (48.5, 72.75] ##############
  (72.75, 97.0] ########################

Binning by quantiles (equal frequencies) using int and the by_quantiles keyword argument. The resulting distribution is close to uniform. Below, each label holds about 13 of the 50 values; the hashes are scaled relative to the most frequent label.

>>> sr_bin_quant = sr.map_numerical_binning(4, by_quantiles=True)
>>> sr_bin_quant.ascii.hist(len_label=15)
(-0.001, 18.25] ##############################
  (18.25, 43.0] ###########################
  (43.0, 76.75] ###########################
  (76.75, 97.0] ##############################

Returns:	A transformed copy of the original series
Return type:	pandas.Series
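
Since the notes name pandas.cut and pandas.qcut as the underlying APIs, the dictionary, equal-width and quantile examples can be approximated with them directly. This is a sketch under that assumption; how the accessor converts its `binning` argument into edges and labels is guessed from the examples above.

```python
import pandas as pd

sr = pd.Series([23, 94, 44, 95, 29, 8, 17, 42, 29, 48,
                96, 95, 17, 97, 9, 85, 62, 71, 37, 10,
                41, 88, 18, 56, 85, 22, 97, 27, 69, 19,
                37, 10, 85, 11, 73, 96, 56, 0, 18, 3,
                54, 50, 91, 38, 46, 13, 78, 22, 6, 61])

# Dictionary-style binning: the values become bin edges, the non-empty keys become labels.
groups = {"": 0, "kids": 12, "teens": 24, "adults": 60}
edges = list(groups.values())                  # [0, 12, 24, 60]
labels = [name for name in groups if name]     # ["kids", "teens", "adults"]
by_dict = pd.cut(sr, bins=edges, labels=labels)

# Equal-width bins versus equal-frequency (quantile) bins.
equal_width = pd.cut(sr, bins=4)
equal_freq = pd.qcut(sr, q=4)

print(by_dict.value_counts())      # adults 15, teens 9, kids 7
print(equal_freq.value_counts())   # 13, 12, 12 and 13 values per bin
```

Values outside the dictionary ranges (0 and everything above 60) come back as NaN from pandas.cut, which corresponds to the "considered null" behaviour described above.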

Series.map_categorical_binning

map_categorical_binning.__call__(binning: dict, ordered: bool = False)

Group categories into another set of categories.

Parameters:	binning (dict) – Mapping where the key is the name of the new category and the value is a list of the current categories.
	ordered (bool, optional) – Whether to use the order in the binning to represent the inherent order in the new categories, by default False

Example

>>> import pandas as pd
>>> import pandas_lightning
>>> sr = pd.Series(["apple", "spinach", "cashew", "pear", "kailan",
...                 "macadamia", "orange"])
>>> sr
0        apple
1      spinach
2       cashew
3         pear
4       kailan
5    macadamia
6       orange
dtype: object

Then create a mapping:

>>> GROUPS = {
...     "fruits": ["apple", "pear", "orange"],
...     "vegetables": ["kailan", "spinach"],
...     "nuts": ["cashew", "macadamia"]}
>>> sr.map_categorical_binning(GROUPS)
0        fruits
1    vegetables
2          nuts
3        fruits
4    vegetables
5          nuts
6        fruits
dtype: category
Categories (3, object): [fruits, vegetables, nuts]

Returns:	A transformed copy of the series
Return type:	pandas.Series
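
One way to reproduce this grouping in plain pandas is to invert the mapping and use Series.map. The sketch below is an assumption about the general approach, not the library's actual implementation; the names `lookup` and `binned` are illustrative.

```python
import pandas as pd

GROUPS = {
    "fruits": ["apple", "pear", "orange"],
    "vegetables": ["kailan", "spinach"],
    "nuts": ["cashew", "macadamia"],
}

# Invert {new_category: [old, ...]} into {old: new_category}.
lookup = {old: new for new, olds in GROUPS.items() for old in olds}

sr = pd.Series(["apple", "spinach", "cashew", "pear", "kailan",
                "macadamia", "orange"])

# Keep the category order of the mapping; ordered=False mirrors the default above.
dtype = pd.CategoricalDtype(categories=list(GROUPS), ordered=False)
binned = sr.map(lookup).astype(dtype)
```

Any value missing from the mapping would become NaN in this sketch, because Series.map returns NaN for keys it cannot find.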
