.. _sphx_glr_auto_examples_preprocessing_ex_one_hot.py:

================
One-hot encoding
================

Demonstrates how to use the DummyEncoder. For a more comprehensive
explanation, take a look at the `demo on alkaline-ml.com `_.

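As a rough illustration of the idea, the sketch below one-hot encodes a
column with plain pandas and leaves a previously unseen level as a row of
zeros. It is not the DummyEncoder implementation: the helper names
``fit_levels`` and ``one_hot`` and the toy data are assumptions made only
for this example.

.. code-block:: python

    import pandas as pd


    def fit_levels(df, cols):
        """Record the sorted distinct levels seen for each categorical column."""
        # Hypothetical helper, not part of skoot.
        return {c: sorted(df[c].dropna().unique()) for c in cols}


    def one_hot(df, levels):
        """Encode using only the fitted levels; unseen levels become all zeros."""
        # Hypothetical helper, not part of skoot.
        out = df.drop(columns=list(levels))
        for col, lvls in levels.items():
            # Casting to a categorical with fixed categories turns unseen
            # values into NaN, which get_dummies encodes as a row of zeros.
            fixed = df[col].astype(pd.CategoricalDtype(categories=lvls))
            dummies = pd.get_dummies(fixed, prefix=col, dtype=float)
            out = pd.concat([out, dummies], axis=1)
        return out


    # Toy training data (made up for illustration)
    train = pd.DataFrame({"age": [27, 45, 29],
                          "country": ["Peru", "Vietnam", "Peru"]})
    levels = fit_levels(train, ["country"])

    # A row whose level was never seen during fitting
    new_row = pd.DataFrame({"age": [30], "country": ["Atlantis"]})
    encoded = one_hot(new_row, levels)
    print(encoded)

The output columns are fixed at fit time, so the encoded frame has the same
shape no matter which levels show up later.
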
.. rst-class:: sphx-glr-script-out

 Out::

    Test transformation:
           age  ...  native-country_Vietnam
    14160   27  ...                     0.0
    27048   45  ...                     0.0
    28868   29  ...                     0.0
    5667    30  ...                     0.0
    7827    29  ...                     0.0

    [5 rows x 99 columns]

    Applied on a row with a new native-country:
           age  ...  native-country_Vietnam
    14160   27  ...                     0.0

    [1 rows x 99 columns]


|


.. code-block:: python

    print(__doc__)

    # Author: Taylor Smith

    from skoot.datasets import load_adult_df
    from skoot.preprocessing import DummyEncoder
    from skoot.utils.dataframe import get_categorical_columns
    from sklearn.model_selection import train_test_split
    import pandas as pd

    # #############################################################################
    # load & split the data
    adult = load_adult_df(tgt_name="target")
    y = adult.pop("target")

    # we don't want this column
    _ = adult.pop("education-num")
    X_train, X_test, y_train, y_test = train_test_split(adult, y, random_state=42,
                                                        test_size=0.2)

    # #############################################################################
    # Fit a dummy encoder
    obj_cols = get_categorical_columns(X_train).columns
    encoder = DummyEncoder(cols=obj_cols, handle_unknown='ignore', n_jobs=4)
    encoder.fit(X_train, y_train)

    # #############################################################################
    # Apply to the test set
    print("Test transformation:")
    print(encoder.transform(X_test).head())

    # #############################################################################
    # Show we can work with levels we've never seen before.
    # Copy the row so the assignment does not touch X_test; item assignment
    # replaces Series.set_value, which was removed in pandas 1.0.
    test_row = X_test.iloc[0].copy()
    test_row["native-country"] = "Atlantis"
    trans = encoder.transform(pd.DataFrame([test_row]))
    print("\nApplied on a row with a new native-country:")
    print(trans)

    # every native-country dummy column should be zero for the unseen level
    nc_mask = trans.columns.str.contains("native-country")
    assert trans[trans.columns[nc_mask]].sum().sum() == 0

**Total running time of the script:** ( 0 minutes  1.526 seconds)



.. only :: html

 .. container:: sphx-glr-footer


  .. container:: sphx-glr-download

     :download:`Download Python source code: ex_one_hot.py `



  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: ex_one_hot.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_