Anonymous transformers in skoot

Sometimes you have a pre-processing stage that finds itself awkwardly positioned in the middle of your pipeline and you’re left with one of two options:

  1. Write a full transformer class
  2. Break your pipeline up into pieces

The first option is usually preferable. Often, though, your function does not need to learn any parameters from the training set, so a full transformer class feels like overkill.

This tutorial will introduce you to making anonymous, lightweight transformers on the fly that will fit into your modeling pipeline seamlessly.
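
Conceptually, an anonymous transformer is a thin wrapper whose fit step learns nothing and whose transform step simply calls your function. The sketch below illustrates that idea in plain Python. The class name `FunctionWrapper` is hypothetical and exists only for illustration; it is not skoot's implementation of `make_transformer`.

```python
class FunctionWrapper:
    """Hypothetical stand-in for an anonymous transformer (not skoot's code)."""

    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        # Nothing is learned from the training data, so fit is a no-op.
        return self

    def transform(self, X):
        # Apply the wrapped function to whatever the previous step produced.
        return self.func(X)


def make_abs(rows):
    # Element-wise absolute value over a list of rows.
    return [[abs(v) for v in row] for row in rows]


wrapper = FunctionWrapper(make_abs).fit([[1.0, -2.0]])
result = wrapper.transform([[-0.5, 3.0], [2.0, -4.0]])
print(result)  # [[0.5, 3.0], [2.0, 4.0]]
```

Because `fit` stores no state, the wrapper behaves identically on training and test data, which is why a function like this can sit in the middle of a pipeline without a dedicated class.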


Out:

Absolute scaled values:
     StandardScaler1  ...  StandardScaler4
73          0.354517  ...         0.022248
18          0.133071  ...         1.179118
118         2.304867  ...         1.490583
78          0.232620  ...         0.422703
76          1.207795  ...         0.289218
31          0.498762  ...         1.045633
64          0.254968  ...         0.155733
141         1.329692  ...         1.490583
68          0.476414  ...         0.422703
82          0.011174  ...         0.022248
110         0.842104  ...         1.090128
12          1.230143  ...         1.446088
36          0.376865  ...         1.312603
9           1.108246  ...         1.446088
19          0.864452  ...         1.179118
56          0.598311  ...         0.556188
104         0.842104  ...         1.357098
69          0.254968  ...         0.111238
55          0.133071  ...         0.155733
132         0.720208  ...         1.357098
29          1.352040  ...         1.312603
127         0.354517  ...         0.823158
26          0.986349  ...         1.045633
128         0.720208  ...         1.223613
131         2.548661  ...         1.090128
145         1.085898  ...         1.490583
108         1.085898  ...         0.823158
143         1.207795  ...         1.490583
45          1.230143  ...         1.179118
30          1.230143  ...         1.312603

[30 rows x 4 columns]

print(__doc__)

# Author: Taylor Smith <taylor.smith@alkaline-ml.com>

# #############################################################################
# Introduce an interesting scenario
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from skoot.preprocessing import SelectiveStandardScaler
from skoot.base import make_transformer
from skoot.datasets import load_iris_df

X = load_iris_df(tgt_name="target")
y = X.pop('target')
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42,
                                                    test_size=0.2)

# Let's say we want to scale our features with the StandardScaler, but
# for whatever reason we only want the ABSOLUTE value of the scaled values.
# We *could* create a transformer or split our pipeline, but either option is
# clunky and could interrupt our CV process in a grid search.
#
# So we'll instead define a simple stateless function that will be wrapped
# in an "anonymous" transformer
def make_abs(X):
    return X.abs()


pipe = Pipeline([
    ("scale", SelectiveStandardScaler()),
    ("abs", make_transformer(make_abs))
])

pipe.fit(X_train, y_train)
print("Absolute scaled values: ")
print(pipe.transform(X_test))
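
For comparison, scikit-learn itself ships a similar stateless wrapper, `FunctionTransformer`. The sketch below reproduces the scale-then-absolute-value pipeline using only scikit-learn and NumPy. It assumes both are installed, and the small arrays are made-up illustration data rather than the iris set used above. It is not a claim about how skoot's `make_transformer` works internally.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Toy data for illustration only (not the iris data from the example above).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[0.0, 40.0], [2.0, 20.0]])

pipe = Pipeline([
    ("scale", StandardScaler()),
    # StandardScaler returns a NumPy array, so np.abs is used here
    # instead of the DataFrame .abs() method.
    ("abs", FunctionTransformer(np.abs)),
])

pipe.fit(X_train)
abs_scaled = pipe.transform(X_test)
print(abs_scaled)
```

The second test row equals the training mean in both columns, so it scales to zero. The first row lies the same distance from the mean in each column, so both of its absolute scaled values are equal.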
