skoot.balance.over_sample_balance

skoot.balance.over_sample_balance(X, y, balance_ratio=0.2, random_state=None, shuffle=True)

Over-sample the minority class to a specified ratio.

One strategy for balancing data is to over-sample the minority class until it is represented at the prescribed balance_ratio. A substantial body of literature shows that this is often not the best technique, and it can lead to over-fitting because the model sees exact copies of the same minority samples. Even so, there are cases where it works well.
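The strategy above can be sketched for a binary problem with NumPy. This is a minimal illustration under stated assumptions, not skoot's implementation: the function name `oversample_minority` is hypothetical, and it assumes the target minority count is `balance_ratio` times the majority count, rounded up.

```python
import math

import numpy as np


def oversample_minority(X, y, balance_ratio=0.2, random_state=None, shuffle=True):
    """Hypothetical sketch of minority over-sampling for a binary problem.

    Duplicates randomly chosen minority rows (with replacement) until
    n_minority / n_majority >= balance_ratio. Not skoot's actual code.
    """
    if not 0 < balance_ratio <= 1:
        raise ValueError("balance_ratio must satisfy 0 < balance_ratio <= 1")
    X = np.asarray(X)
    y = np.asarray(y)

    # Accept a seed, None, or an existing RandomState instance.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)

    # Identify the minority label and the class counts.
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    n_min, n_maj = counts.min(), counts.max()

    # Assumed rule: smallest minority count meeting the ratio. The tiny
    # tolerance guards against float error (0.1 * 30 == 3.0000000000000004).
    target = math.ceil(balance_ratio * n_maj - 1e-9)
    n_extra = max(0, target - n_min)

    # Draw the extra rows from the minority class with replacement.
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(y.shape[0]), extra])

    # Optionally permute so the duplicates are not all at the end.
    if shuffle:
        rng.shuffle(idx)
    return X[idx], y[idx]
```

Because every output row is selected through a single index array, each feature row stays paired with its original label whether or not the result is shuffled.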

Parameters:

X : array-like, shape (n_samples, n_features)

The training array. Minority-class rows are resampled from this array with replacement.

y : array-like, shape (n_samples,)

Training labels corresponding to the samples in X.

balance_ratio : float, optional (default=0.2)

The minimum acceptable ratio of minority-class to majority-class representation (minority : majority), where 0 < balance_ratio <= 1.
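As a worked example of the ratio arithmetic, the hypothetical helper below computes the smallest minority count that satisfies a given balance_ratio. Rounding up is an assumption consistent with the ratio being a minimum; skoot's exact rounding is not stated here.

```python
import math


def required_minority_count(n_majority, balance_ratio):
    """Hypothetical helper: smallest minority count whose ratio to
    n_majority is at least balance_ratio (0 < balance_ratio <= 1)."""
    if not 0 < balance_ratio <= 1:
        raise ValueError("balance_ratio must satisfy 0 < balance_ratio <= 1")
    # Round up so the ratio never falls below balance_ratio; subtract a tiny
    # tolerance so float error (e.g. 0.1 * 30 -> 3.0000000000000004)
    # does not push the count up by one.
    return math.ceil(balance_ratio * n_majority - 1e-9)


# 990 majority samples at balance_ratio=0.2 need at least 198 minority samples.
print(required_minority_count(990, 0.2))  # 198
```

With 10 minority samples to start from, that would mean drawing 188 extra minority rows.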

random_state : int, None or numpy RandomState, optional (default=None)

An integer seed, None, or a numpy RandomState used to generate the random selections.

shuffle : bool, optional (default=True)

Whether to shuffle the output.
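Shuffling a resampled dataset only makes sense if X and y are permuted together. The snippet below is an illustrative assumption about what shuffling the output involves, not skoot's code: one random row permutation is applied to both arrays so each row keeps its label.

```python
import numpy as np

# Illustration: apply one random row permutation to both X and y.
rng = np.random.RandomState(0)
X = np.arange(12).reshape(6, 2)       # row i is [2*i, 2*i + 1]
y = np.array([0, 0, 0, 0, 1, 1])

perm = rng.permutation(y.shape[0])    # a random ordering of 0..5
X_shuf, y_shuf = X[perm], y[perm]     # same permutation keeps pairs aligned
```

In a naive implementation that appends the duplicated minority rows after the originals, skipping this step would leave all of the copies grouped at the end of the output.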

Examples

>>> from sklearn.datasets import make_classification
>>> from skoot.balance import over_sample_balance
>>> X, y = make_classification(n_samples=1000, random_state=42,
...                            n_classes=2, weights=[0.99, 0.01])
>>> X_bal, y_bal = over_sample_balance(X, y, balance_ratio=0.2,
...                                    random_state=42)
>>> ratio = round((y_bal == 1).sum() / float((y_bal == 0).sum()), 1)
>>> assert ratio == 0.2, ratio

Note that the number of samples is now greater than it was initially:

>>> assert X_bal.shape[0] > 1000
