.. _sphx_glr_auto_examples_balance_ex_smote.py: ==================== SMOTE class balancer ==================== This example creates an imbalanced classification dataset, and applies the `Synthetic Minority Observation TEchnique (SMOTE) `_ class balancing technique in order to balance it by supplementing synthetic minority class observations. .. raw:: html
.. image:: /auto_examples/balance/images/sphx_glr_ex_smote_001.png :align: center .. code-block:: python print(__doc__) # Author: Taylor Smith from matplotlib import pyplot as plt from sklearn.datasets import make_blobs from sklearn.utils.validation import check_random_state from skoot.balance import smote_balance plt.figure(figsize=(6, 4)) # ############################################################################# # make blobs and grossly undersample one of them random_state = check_random_state(42) n_samples = 1000 X, y = make_blobs(n_samples=n_samples, centers=2, random_state=random_state, n_features=5, cluster_std=4.5) minority_mask = y == 0 sample_mask = ((~minority_mask) | (minority_mask & (random_state.rand(minority_mask.shape[0]) < 0.05))) X, y = X[sample_mask, :], y[sample_mask] plt.subplot(121) plt.scatter(X[:, 0], X[:, 1], c=y) plt.title("Initial imbalanced dataset") # ############################################################################# # Balance the dataset X_balance, y_balance = smote_balance(X, y, balance_ratio=0.2, random_state=42) plt.subplot(122) plt.scatter(X_balance[:, 0], X_balance[:, 1], c=y_balance) plt.title("Balanced dataset") plt.show() **Total running time of the script:** ( 0 minutes 0.022 seconds) .. only :: html .. container:: sphx-glr-footer .. container:: sphx-glr-download :download:`Download Python source code: ex_smote.py ` .. container:: sphx-glr-download :download:`Download Jupyter notebook: ex_smote.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_