Series accessors¶
Series.datetime¶
Series.scaler¶
Series.scaler.minmax¶
-
scaler.minmax(feature_range=(0, 1))¶ Transform series by scaling to a given range
Parameters: feature_range (tuple, optional) – Desired range of transformed data, by default (0, 1) Example
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series([1,2,3,4,5]) >>> sr.scaler.minmax() 0 0.00 1 0.25 2 0.50 3 0.75 4 1.00 dtype: float64
Returns: A transformed copy of the series. Return type: pandas.Series
Series.scaler.standardize¶
-
scaler.standardize(ddof=1)¶ Standardize features by removing the mean and scaling to unit variance. Similar to scikit-learn’s StandardScaler.
Parameters: ddof (int, optional) – Degrees of freedom, by default 1 Examples
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series([1,2,3,4,5]) >>> sr.scaler.standardize() 0 -1.264911 1 -0.632456 2 0.000000 3 0.632456 4 1.264911 dtype: float64
Returns: A transformed copy of the series Return type: pandas.Series
Series.scaler.log1p¶
-
scaler.log1p()¶ Transform to log(1+x)
Notes
This transformation is numerically stable for small numbers compared to the log(x) transformation.
Example
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series([1,2,3,4,5]) >>> sr.scaler.log1p() 0 0.693147 1 1.098612 2 1.386294 3 1.609438 4 1.791759 dtype: float64
Returns: A transformed copy of the series Return type: pandas.Series
Series.scaler.expm1¶
-
scaler.expm1()¶ Transform to exp(x)-1
Notes
This transformation is numerically stable for small numbers compared to the exp(x) transformation.
Example
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series([1,2,3,4,5]) >>> sr.scaler.expm1() 0 1.718282 1 6.389056 2 19.085537 3 53.598150 4 147.413159 dtype: float64
Returns: A transformed copy of the series Return type: pandas.Series
Series.pctg¶
Provides convenience functions for common calculations of missing values.
Series.pctg.zeros¶
-
pctg.zeros¶ Get the percentage of zeros in the Series
Example
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series([1, 2, 0, 8.3, 0]) >>> sr.pctg.zeros 0.4
Returns: Return a copy of the Series Return type: Series
Series.pctg.nans¶
-
pctg.nans¶ Get the percentage of missing values in the Series
Example
>>> import pandas as pd >>> import numpy as np >>> import pandas_lightning >>> sr = pd.Series([1, np.nan, np.nan, 8.3, np.nan]) >>> sr.pctg.nans 0.6
Returns: Return a copy of the Series Return type: Series
Series.pctg.uniques¶
-
pctg.uniques¶ Get the percentage of number of uniques divided by the length of the series.
Notes
This is useful to check the cardinality of a column with respect to its length. If percentage of uniques is close to 1, it probably means this column does not follow a categorical distribution.
Example
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series(["hey", "I", "just", "met", "you"]) >>> sr.pctg.uniques 1.0
Returns: Return a copy of the Series Return type: Series
Series.asciiplot¶
Series.asciiplot.hist¶
-
asciiplot.hist(size: int = 10, hashes: int = 30, len_label: int = 10, max_categories: int = 20)¶ Plots a horizontal histogram using
#Parameters: - size (int, optional) – Size of bins, by default 10
- hashes (int, optional) – Maximum number of hashes
#to display on the the label with the highest frequency, by default 30 - len_label (int, optional) – Maximum length of the text label, by default 10
- max_categories (int, optional) – Maximum number of categories to display, by default 50
Notes
This would be useful if you want to get a quick sense of the distribution of your data or if you do not have access to say a Jupyter notebook. The API is deliberately named after the standard library’s
.hist()API.Examples
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series(["red", "blue", "red", "red", "orange", "blue"]) >>> sr.ascii.hist() red ############################## blue #################### orange ##########
Series.map_numerical_binning¶
-
map_numerical_binning.__call__(binning: Union[list, range, dict, int, tuple], by_quantiles: bool = False, ordered: bool = True)¶ Bin a numerical feature into groups. This is useful to transform a continuous variable to categorical.
Parameters: - binning (Union[list, range, dict, int]) – Criteria to bin by.
- by_quantiles (bool, optional) – If the
binningis by quantiles, by default False. This is only applicable ifbinningis an integer. - ordered (bool, optional) – Whether to treat the bins as ordinal variable, by default True
Notes
The underlying APIs are
pandas.cutandpandas.qcut.Examples
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series([23, 94, 44, 95, 29, 8, 17, 42, 29, 48, ... 96, 95, 17, 97, 9, 85, 62, 71, 37, 10, ... 41, 88, 18, 56, 85, 22, 97, 27, 69, 19, ... 37, 10, 85, 11, 73, 96, 56, 0, 18, 3, ... 54, 50, 91, 38, 46, 13, 78, 22, 6, 61])
Ranged binning using
range. Below is grouping in 10’s.>>> sr_cat = sr.map_numerical_binning(range(0,110,10)) # or (0,110,10) >>> sr_cat.ascii.hist() (0, 10] ###################### (10, 20] ########################## (20, 30] ###################### (30, 40] ########### (40, 50] ###################### (50, 60] ########### (60, 70] ########### (70, 80] ########### (80, 90] ############### (90, 100] ##############################
Ranged binning using
list. Below is grouping using the elements in the array as the bounds.>>> sr_cat = sr.map_numerical_binning([0, 18, 21, 25, 30, 100]) >>> sr_cat.ascii.hist() (0, 18] ############ (18, 21] # (21, 25] ### (25, 30] ### (30, 100] ##############################
Ranged binning using
dictionary. Any number that is not in these ranges are considered null.- Kids: 0 < age <= 12
- Teens: 12 < age <= 24
- Adults: 24 < age <= 60
>>> GROUPS = { "": 0, # You must define this "kids": 12, "teens": 24, "adults": 60 } >>> sr_bin_group = sr.map_numerical_binning(GROUPS) >>> sr_bin_group.ascii.hist() kids ############## teens ################## adults ##############################
Binning with equal size range using
int. Below the size of each range label is about 25.>>> sr_bin = sr.map_numerical_binning(4) >>> sr_bin.ascii.hist(len_label=15) (-0.097, 24.25] ############################## (24.25, 48.5] ################### (48.5, 72.75] ############## (72.75, 97.0] ########################
Binning by quantiles (equal frequencies) using
intandby_quantileskeyword argument. The resulting distribution is close to a uniform distribution. Below we see the frequencies of each label (the hashes) is about 13.>>> sr_bin_quant = sr.map_numerical_binning(4, by_quantiles=True) >>> sr_bin_quant.ascii.hist(len_label=15) (-0.001, 18.25] ############################## (18.25, 43.0] ########################### (43.0, 76.75] ########################### (76.75, 97.0] ##############################
Returns: A transformed copy of the original series Return type: pandas.Series
Series.map_categorical_binning¶
-
map_categorical_binning.__call__(binning: dict, ordered: bool = False)¶ Group categories into another set of categories.
Parameters: - binning (dict) – Mapping where the key is the name of the new category and the value is a list of the current categories.
- ordered (bool, optional) – Whether to use the order in the binning to represent the inherent order in the new categories, by default False
Example
>>> import pandas as pd >>> import pandas_lightning >>> sr = pd.Series(["apple", "spinach", "cashew", "pear", "kailan", ... "macadamia", "orange"]) >>> sr 0 apple 1 spinach 2 cashew 3 pear 4 kailan 5 macadamia 6 orange dtype: object
Then create a mapping:
>>> GROUPS = { ... "fruits": ["apple", "pear", "orange"], ... "vegetables": ["kailan", "spinach"], ... "nuts": ["cashew", "macadamia"]} >>> sr.map_categorical_binning(GROUPS) 0 fruits 1 vegetables 2 nuts 3 fruits 4 vegetables 5 nuts 6 fruits dtype: category Categories (3, object): [fruits, vegetables, nuts]
Returns: A transformed copy of the series Return type: pandas.Series