API Reference

This is the class and function reference for skoot. The raw class and function specifications may not fully explain intended usage, so refer to the full user guide for further details.

skoot.base: Base metaclasses and utility functions

Base classes

base.BasePDTransformer([cols, as_df]) The base class for all pandas DataFrame transformers.

Helper functions

base.make_transformer(func, **kwargs) Make a function into a scikit-learn TransformerMixin.

skoot.decorators: Decorator utilities

Decorator methods

decorators.overrides(interface_class) Decorator for methods that override super methods.
decorators.suppress_warnings(func) Force a method to suppress all warnings it may raise.
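To illustrate what a warning-suppressing decorator does, here is a minimal sketch built only on the standard library. It is not skoot's implementation; the name `suppress_warnings_sketch` and the toy function `noisy_add` are hypothetical.

```python
import functools
import warnings


def suppress_warnings_sketch(func):
    """Hypothetical stand-in: run ``func`` with all warnings silenced."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # catch_warnings restores the previous filter state on exit,
        # so the "ignore" filter only applies while func runs.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return func(*args, **kwargs)
    return wrapper


@suppress_warnings_sketch
def noisy_add(a, b):
    # Toy function (an assumption for illustration) that always warns.
    warnings.warn("this warning is swallowed by the decorator")
    return a + b
```

Calling `noisy_add(1, 2)` returns the sum without emitting the warning, and `functools.wraps` keeps the wrapped function's name and docstring intact.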

skoot.balance: Class imbalance remedies

Methods for addressing class imbalance.

User guide: See the section "The balance submodule" for further details.

Balancing functions

balance.over_sample_balance(X, y[, …]) Over sample a minority class to a specified ratio.
balance.smote_balance(X, y[, …]) Balance a dataset using SMOTE.
balance.under_sample_balance(X, y[, …]) Under sample the majority class to a specified ratio.
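The idea behind over-sampling to a ratio can be sketched with the standard library alone. This is an illustration, not skoot's code: the function name `over_sample_sketch`, its parameters, and the interpretation of `ratio` as the target minority-to-majority count ratio (rounded down) are all assumptions.

```python
import random
from collections import Counter


def over_sample_sketch(X, y, ratio=1.0, seed=0):
    """Hypothetical helper: duplicate random minority rows until the
    minority count reaches int(ratio * majority count). Binary labels only."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common(2)

    # Number of extra minority rows needed to hit the target count.
    n_extra = max(0, int(ratio * n_major) - n_minor)

    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]

    # Original rows come first; sampled duplicates are appended.
    X_out = list(X) + [X[i] for i in extra]
    y_out = list(y) + [y[i] for i in extra]
    return X_out, y_out


X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = over_sample_sketch(X, y, ratio=0.5)
```

With eight majority rows and `ratio=0.5`, the sketch appends two duplicated minority rows so that the minority class has four samples. Under-sampling works the other way around, by dropping majority rows instead of duplicating minority rows.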

skoot.datasets: Dataset loaders

User guide: See the section "The datasets submodule" for further details.

Dataset loading functions

datasets.load_adult_df([include_tgt, …]) Load and return the adult dataset (classification).
datasets.load_boston_df([include_tgt, …]) Get the Boston housing dataset.
datasets.load_breast_cancer_df([…]) Get the breast cancer dataset.
datasets.load_iris_df([include_tgt, …]) Get the iris dataset.

skoot.decomposition: Various matrix decompositions

User guide: See the section "The decomposition submodule" for further details.

Decomposition classes

decomposition.SelectiveIncrementalPCA([…]) Incremental principal components analysis (IPCA).
decomposition.SelectiveKernelPCA([cols, …]) Kernel Principal component analysis (KPCA) (applied to selected columns).
decomposition.SelectiveNMF([cols, as_df, …]) Non-Negative Matrix Factorization (NMF) (applied to selected columns).
decomposition.SelectivePCA([cols, as_df, …]) Principal component analysis (PCA) (applied to selected columns).
decomposition.SelectiveTruncatedSVD([cols, …]) Dimensionality reduction using truncated SVD (aka LSA).
decomposition.QRDecomposition(X[, pivot]) Perform the QR decomposition on a matrix.

skoot.exploration: Exploratory data analysis

User guide: See the section "The exploration submodule" for further details.

Exploratory analysis functions

exploration.summarize(X) Summarize a dataframe.

skoot.feature_extraction: Feature extraction methods

User guide: See the section "The feature extraction submodule" for further details.

Feature extraction estimators

feature_extraction.DateFactorizer([cols, …]) Extract new features from datetime features.
feature_extraction.InteractionTermTransformer([…]) Create interaction terms between predictors.
feature_extraction.TimeDeltaFeatures([cols, …]) Compute the time lapse between timestamp events.

skoot.feature_selection: Feature selection methods

User guide: See the section "The feature selection submodule" for further details.

Feature selection estimators

feature_selection.BaseFeatureSelector([…]) Base class for feature selectors.
feature_selection.FeatureFilter([cols, as_df]) A simple feature-dropping transformer class.
feature_selection.LinearCombinationFilter([…]) Filter any perfect linear combinations in a matrix.
feature_selection.MultiCorrFilter([cols, …]) Remove highly correlated features.
feature_selection.NearZeroVarianceFilter([…]) Identify near zero variance predictors.
feature_selection.SparseFeatureFilter([…]) Drop overly sparse features.

skoot.model_validation: Model validation & monitoring

User guide: See the section "The model validation submodule" for further details.

Model validators & monitoring classes

model_validation.CustomValidator([cols, …]) Validate test features given custom functions.
model_validation.DistHypothesisValidator([…]) Validate test distributions using various hypothesis tests.

skoot.preprocessing: Pre-processing transformers

User guide: See the section "The preprocessing submodule" for further details.

Continuous feature binning

preprocessing.BinningTransformer([cols, …]) Bin continuous variables.

Dataframe schema transformers

preprocessing.SchemaNormalizer(schema) Enforce a schema on an input dataframe.

Encoding transformers

preprocessing.DummyEncoder([cols, as_df, …]) Dummy encode categorical data.

Scalers/normalizers

preprocessing.SelectiveMaxAbsScaler([cols, …]) Scale each feature by its maximum absolute value.
preprocessing.SelectiveMinMaxScaler([cols, …]) Transforms features by scaling each feature to a given range.
preprocessing.SelectiveRobustScaler([cols, …]) Scale features using statistics that are robust to outliers.
preprocessing.SelectiveStandardScaler([…]) Standardize features by removing the mean and scaling to unit variance (applied to selected columns).

Skewness transformers

preprocessing.BoxCoxTransformer([cols, …]) Apply the Box-Cox transformation to select features in a dataframe.
preprocessing.YeoJohnsonTransformer([cols, …]) Apply the Yeo-Johnson transformation to a dataset.
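For readers unfamiliar with the two skewness transformations, the sketch below applies the standard Box-Cox and Yeo-Johnson formulas to a single value for a fixed lambda. The function names are hypothetical, and how skoot selects lambda is not covered here; lambda is simply passed in.

```python
import math


def box_cox_sketch(x, lam):
    """Box-Cox transform of a strictly positive value for a fixed lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson_sketch(x, lam):
    """Yeo-Johnson transform of any real value for a fixed lambda."""
    if x >= 0:
        # Non-negative branch: Box-Cox applied to x + 1.
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch: mirrored Box-Cox with exponent 2 - lambda.
    if lam == 2:
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
```

The practical difference is the input domain: Box-Cox is defined only for strictly positive values, while Yeo-Johnson also accepts zero and negative values. With lambda equal to 1, Yeo-Johnson leaves every value unchanged.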

Other transformers

preprocessing.DateTransformer([cols, …]) Cast features to datetime.

skoot.utils: Common utility functions

User guide: See the section "The utils submodule" for further details.

DataFrame utilities

utils.dataframe_or_array(X, as_df) Get a dataframe or numpy array.
utils.get_continuous_columns(X) Get all continuous features from a pandas DataFrame.
utils.get_datetime_columns(X) Get all datetime features from a pandas DataFrame.
utils.get_numeric_columns(X) Get all numeric columns from a pandas DataFrame.
utils.safe_drop_samples(X, drop_samples) Drop samples (rows) from a matrix.
utils.safe_mask_samples(X, mask) Select samples (rows) from a matrix from a mask.
utils.safe_vstack(a, b) Stack two arrays on top of one another.

Iterable utilities

utils.chunk(v, n) Chunk a vector into n roughly equal parts.
utils.ensure_iterable(element) Make an element an iterable.
utils.flatten_all(container) Recursively flattens an arbitrarily nested iterable.
utils.is_iterable(x) Determine whether an element is iterable.
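The iterable helpers above describe small, general techniques. The following standard-library sketch shows one plausible way to implement them; the `_sketch` names are hypothetical, and treating strings and bytes as non-iterable leaves is an assumption of this sketch rather than documented skoot behavior.

```python
def is_iterable_sketch(x):
    """Return True for iterables, treating str and bytes as scalars."""
    if isinstance(x, (str, bytes)):
        return False
    try:
        iter(x)
    except TypeError:
        return False
    return True


def chunk_sketch(v, n):
    """Split sequence ``v`` into ``n`` contiguous parts whose
    lengths differ by at most one."""
    size, extra = divmod(len(v), n)
    parts, start = [], 0
    for i in range(n):
        # The first ``extra`` parts each take one additional element.
        stop = start + size + (1 if i < extra else 0)
        parts.append(v[start:stop])
        start = stop
    return parts


def flatten_all_sketch(container):
    """Lazily yield the leaves of an arbitrarily nested iterable."""
    for item in container:
        if is_iterable_sketch(item):
            yield from flatten_all_sketch(item)
        else:
            yield item


parts = chunk_sketch(list(range(7)), 3)
flat = list(flatten_all_sketch([1, [2, (3, [4, "ab"])], 5]))
```

Here `parts` is `[[0, 1, 2], [3, 4], [5, 6]]` and `flat` is `[1, 2, 3, 4, "ab", 5]`. Because the flattening function is a generator, it has to be wrapped in `list` to materialize the result.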

Metaestimator classes/decorators

utils.timed_instance_method([attribute_name]) Function timer decorator.
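A function timer decorator for instance methods can be sketched as follows. This is an assumption-laden illustration, not skoot's implementation: `timed_method_sketch`, the `fit_time_` attribute, and `ToyEstimator` are hypothetical names, and the sketch requires the attribute name rather than making it optional.

```python
import functools
import time


def timed_method_sketch(attribute_name):
    """Hypothetical decorator factory: store a method's wall-clock
    run time (in seconds) on the instance under ``attribute_name``."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            start = time.perf_counter()
            result = method(self, *args, **kwargs)
            # Record elapsed time on the instance after the call returns.
            setattr(self, attribute_name, time.perf_counter() - start)
            return result
        return wrapper
    return decorator


class ToyEstimator:
    """Toy class (an assumption for illustration) with a timed fit."""

    @timed_method_sketch("fit_time_")
    def fit(self, X):
        self.total_ = sum(X)
        return self


est = ToyEstimator().fit([1, 2, 3])
```

After the call, `est.fit_time_` holds the elapsed time of `fit` as a float. Storing timings on the instance is what would let a separate profiling helper collect them later.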

Profiling utilities

utils.profile_estimator(estimator) Profile the timed functions of an estimator.

Series utilities

utils.is_datetime_type(series) Determine whether a series is a datetime.

Validation utilities

utils.check_dataframe(X[, cols, …]) Check an input dataframe.
utils.type_or_iterable_to_col_mapping(cols, …) Map a parameter to various columns in a dict.
utils.validate_multiple_cols(clsname, cols) Validate that there are at least two columns to evaluate.
utils.validate_multiple_rows(clsname, frame) Validate that there are at least two samples to evaluate.
utils.validate_test_set_columns(fit_columns, …) Validate that the test set columns will work.