API Reference
This is the class and function reference for skoot. Refer to the full user guide for further details, since the raw class and function specifications may not give complete guidance on their use.
skoot.base: Base metaclasses and utility functions
Base classes
base.BasePDTransformer([cols, as_df]) | The base class for all Pandas frame transformers.
Helper functions
base.make_transformer(func, **kwargs) | Make a function into a scikit-learn TransformerMixin.
skoot.decorators: Decorator utilities
Decorator methods
decorators.overrides(interface_class) | Decorator for methods that override super methods.
decorators.suppress_warnings(func) | Force a method to suppress all warnings it may raise.
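As an illustration of what a warning-suppressing decorator does, here is a minimal standalone sketch built only on the standard library. It is not skoot's implementation; the function body and the `noisy_sum` example are assumptions made for illustration.

```python
import functools
import warnings


def suppress_warnings(func):
    """Hypothetical stand-in: call ``func`` with every warning silenced."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # catch_warnings restores the previous filter state on exit,
        # so warnings raised outside the wrapped call are unaffected.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return func(*args, **kwargs)
    return wrapper


@suppress_warnings
def noisy_sum(a, b):
    # This warning is swallowed by the decorator.
    warnings.warn("adding two numbers")
    return a + b
```

Calling `noisy_sum(1, 2)` returns the sum without emitting the warning, and `functools.wraps` keeps the wrapped function's name and docstring intact.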
skoot.balance: Class imbalance remedies
Methods for addressing class imbalance.
User guide: See the balance submodule section for further details.
Balancing functions
balance.over_sample_balance(X, y[, …]) | Over-sample a minority class to a specified ratio.
balance.smote_balance(X, y[, …]) | Balance a dataset using SMOTE.
balance.under_sample_balance(X, y[, …]) | Under-sample the majority class to a specified ratio.
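To make the idea of over-sampling to a ratio concrete, the following is a rough standard-library sketch. It is not the skoot function: the name `over_sample`, the `seed` parameter, and the reading of `ratio` as "target minority count divided by majority count" are all assumptions made for illustration.

```python
import random
from collections import Counter


def over_sample(X, y, ratio=1.0, seed=0):
    """Hypothetical sketch: resample minority rows with replacement.

    Assumption: ``ratio`` is the desired minority/majority count ratio.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    majority_label, majority_n = counts.most_common(1)[0]
    target = int(ratio * majority_n)

    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Skip the majority class and classes already at the target size.
        if label == majority_label or n >= target:
            continue
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw the missing rows at random, with replacement.
        for i in rng.choices(idx, k=target - n):
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = over_sample(X, y, ratio=1.0)
```

With `ratio=1.0`, the minority class is duplicated until both classes have the same number of rows. The original rows are kept and new rows are only appended.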
skoot.datasets: Dataset loaders
User guide: See the datasets submodule section for further details.
Dataset loading functions
datasets.load_adult_df([include_tgt, …]) | Load and return the adult dataset (classification).
datasets.load_boston_df([include_tgt, …]) | Get the Boston housing dataset.
datasets.load_breast_cancer_df([…]) | Get the breast cancer dataset.
datasets.load_iris_df([include_tgt, …]) | Get the iris dataset.
skoot.decomposition: Various matrix decompositions
User guide: See the decomposition submodule section for further details.
Decomposition classes
decomposition.SelectiveIncrementalPCA([…]) | Incremental principal components analysis (IPCA).
decomposition.SelectiveKernelPCA([cols, …]) | Kernel principal component analysis (KPCA) (applied to selected columns).
decomposition.SelectiveNMF([cols, as_df, …]) | Non-negative matrix factorization (NMF) (applied to selected columns).
decomposition.SelectivePCA([cols, as_df, …]) | Principal component analysis (PCA) (applied to selected columns).
decomposition.SelectiveTruncatedSVD([cols, …]) | Dimensionality reduction using truncated SVD (aka LSA).
decomposition.QRDecomposition(X[, pivot]) | Perform the QR decomposition on a matrix.
skoot.exploration: Exploratory data analysis
User guide: See the exploration submodule section for further details.
Exploratory analysis functions
exploration.summarize(X) | Summarize a dataframe.
skoot.feature_extraction: Feature extraction methods
User guide: See the feature extraction submodule section for further details.
Feature extraction estimators
feature_extraction.DateFactorizer([cols, …]) | Extract new features from datetime features.
feature_extraction.InteractionTermTransformer([…]) | Create interaction terms between predictors.
feature_extraction.TimeDeltaFeatures([cols, …]) | Compute the time lapse between timestamp events.
skoot.feature_selection: Feature selection methods
User guide: See the feature selection submodule section for further details.
Feature selection estimators
feature_selection.BaseFeatureSelector([…]) | Base class for feature selectors.
feature_selection.FeatureFilter([cols, as_df]) | A simple feature-dropping transformer class.
feature_selection.LinearCombinationFilter([…]) | Filter any perfect linear combinations in a matrix.
feature_selection.MultiCorrFilter([cols, …]) | Remove highly correlated features.
feature_selection.NearZeroVarianceFilter([…]) | Identify near zero variance predictors.
feature_selection.SparseFeatureFilter([…]) | Drop overly sparse features.
skoot.model_validation: Model validation & monitoring
User guide: See the model validation submodule section for further details.
Model validators & monitoring classes
model_validation.CustomValidator([cols, …]) | Validate test features given custom functions.
model_validation.DistHypothesisValidator([…]) | Validate test distributions using various hypothesis tests.
skoot.preprocessing: Pre-processing transformers
User guide: See the preprocessing submodule section for further details.
Continuous feature binning
preprocessing.BinningTransformer([cols, …]) | Bin continuous variables.
Dataframe schema transformers
preprocessing.SchemaNormalizer(schema) | Enforce a schema on an input dataframe.
Encoding transformers
preprocessing.DummyEncoder([cols, as_df, …]) | Dummy encode categorical data.
Scalers/normalizers
preprocessing.SelectiveMaxAbsScaler([cols, …]) | Scale each feature by its maximum absolute value.
preprocessing.SelectiveMinMaxScaler([cols, …]) | Transform features by scaling each feature to a given range.
preprocessing.SelectiveRobustScaler([cols, …]) | Scale features using statistics that are robust to outliers.
preprocessing.SelectiveStandardScaler([…]) | Standardize features by removing the mean and scaling to unit variance (applied to selected columns).
Skewness transformers
preprocessing.BoxCoxTransformer([cols, …]) | Apply the Box-Cox transformation to select features in a dataframe.
preprocessing.YeoJohnsonTransformer([cols, …]) | Apply the Yeo-Johnson transformation to a dataset.
Other transformers
preprocessing.DateTransformer([cols, …]) | Cast features to datetime.
skoot.utils: Common utility functions
User guide: See the utils submodule section for further details.
DataFrame utilities
utils.dataframe_or_array(X, as_df) | Get a dataframe or numpy array.
utils.get_continuous_columns(X) | Get all continuous features from a pandas DataFrame.
utils.get_datetime_columns(X) | Get all datetime features from a pandas DataFrame.
utils.get_numeric_columns(X) | Get all numeric columns from a pandas DataFrame.
utils.safe_drop_samples(X, drop_samples) | Drop samples (rows) from a matrix.
utils.safe_mask_samples(X, mask) | Select samples (rows) from a matrix from a mask.
utils.safe_vstack(a, b) | Stack two arrays on top of one another.
Iterable utilities
utils.chunk(v, n) | Chunk a vector into n roughly equal parts.
utils.ensure_iterable(element) | Make an element an iterable.
utils.flatten_all(container) | Recursively flatten an arbitrarily nested iterable.
utils.is_iterable(x) | Determine whether an element is iterable.
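The one-line summaries above leave some behavior open, so the following standard-library sketch shows one plausible reading of three of these helpers. These are standalone stand-ins that reuse the documented names, not skoot's code. Treating strings as non-iterable and splitting into contiguous parts are assumptions.

```python
from collections.abc import Iterable


def is_iterable(x):
    """Assumption: strings and bytes count as scalars, not iterables."""
    return isinstance(x, Iterable) and not isinstance(x, (str, bytes))


def flatten_all(container):
    """Lazily yield the leaves of an arbitrarily nested iterable."""
    for item in container:
        if is_iterable(item):
            yield from flatten_all(item)
        else:
            yield item


def chunk(v, n):
    """Split a sequence into n contiguous parts whose sizes differ by at most one."""
    size, extra = divmod(len(v), n)
    parts, start = [], 0
    for i in range(n):
        # The first ``extra`` parts each take one additional element.
        stop = start + size + (1 if i < extra else 0)
        parts.append(v[start:stop])
        start = stop
    return parts


flat = list(flatten_all([1, [2, (3, [4, "ab"])], 5]))
parts = chunk(list(range(7)), 3)
```

Here `flat` is `[1, 2, 3, 4, "ab", 5]` and `parts` is `[[0, 1, 2], [3, 4], [5, 6]]`. Excluding strings in `is_iterable` is what stops `flatten_all` from recursing forever on single characters.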
Metaestimator classes/decorators
utils.timed_instance_method([attribute_name]) | Function timer decorator.
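A timer decorator for instance methods can be sketched as a decorator factory that records the elapsed time on the instance. The version below is a hypothetical illustration, not skoot's implementation. The default attribute name `timing_`, the `Doubler` class, and the choice to store a single float are all assumptions.

```python
import functools
import time


def timed_instance_method(attribute_name="timing_"):
    """Hypothetical sketch: store a method's run time on its instance."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            start = time.perf_counter()
            result = method(self, *args, **kwargs)
            # Record elapsed seconds under the requested attribute name.
            setattr(self, attribute_name, time.perf_counter() - start)
            return result
        return wrapper
    return decorator


class Doubler:
    """Toy estimator-like class used only to exercise the decorator."""

    @timed_instance_method(attribute_name="fit_time_")
    def fit(self, values):
        self.doubled_ = [2 * v for v in values]
        return self


model = Doubler().fit([1, 2, 3])
```

After the call, `model.fit_time_` holds the elapsed time in seconds for the most recent `fit`, which a profiling helper could then read back from the instance.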
Profiling utilities
utils.profile_estimator(estimator) | Profile the timed functions of an estimator.
Series utilities
utils.is_datetime_type(series) | Determine whether a series is a datetime.
Validation utilities
utils.check_dataframe(X[, cols, …]) | Check an input dataframe.
utils.type_or_iterable_to_col_mapping(cols, …) | Map a parameter to various columns in a dict.
utils.validate_multiple_cols(clsname, cols) | Validate that there are at least two columns to evaluate.
utils.validate_multiple_rows(clsname, frame) | Validate that there are at least two samples to evaluate.
utils.validate_test_set_columns(fit_columns, …) | Validate that the test set columns will work.