skoot.balance.over_sample_balance
skoot.balance.over_sample_balance(X, y, balance_ratio=0.2, random_state=None, shuffle=True)
Over-sample a minority class to a specified ratio.
One strategy for balancing data is to over-sample the minority class until it is represented at the prescribed balance_ratio. While there is significant literature showing that this is not the best technique and that it can sometimes lead to over-fitting, there are instances where it works well.

Parameters:
X : array-like, shape (n_samples, n_features)
The training array. Samples from this array will be resampled with replacement for the minority class.
y : array-like, shape (n_samples,)
Training labels corresponding to the samples in X.
balance_ratio : float, optional (default=0.2)
The minimum acceptable ratio of $MINORITY_CLASS : $MAJORITY_CLASS representation, where 0 < ratio <= 1.
random_state : int, None or numpy RandomState, optional (default=None)
The seed or random state used to generate the random selections.
shuffle : bool, optional (default=True)
Whether to shuffle the output.
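The ratio described under balance_ratio compares the minority count to the majority count, not to the total sample count. The sketch below illustrates that reading with plain NumPy. The helper name minority_majority_ratio is hypothetical and is not part of skoot.

```python
import numpy as np


def minority_majority_ratio(y):
    """Hypothetical helper: smallest class count divided by largest class count."""
    _, counts = np.unique(np.asarray(y), return_counts=True)
    return counts.min() / counts.max()


# 20 minority samples against 80 majority samples.
y = np.array([0] * 80 + [1] * 20)
ratio = minority_majority_ratio(y)

# The minority makes up 20% of all samples, but the
# minority : majority ratio is 20 / 80.
meets_default = ratio >= 0.2
print(ratio, meets_default)
```

Under this reading, a dataset like the one above would already satisfy balance_ratio=0.2, so no over-sampling should be needed.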
Examples
>>> from skoot.balance import over_sample_balance
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, random_state=42,
...                            n_classes=2, weights=[0.99, 0.01])
>>> X_bal, y_bal = over_sample_balance(X, y, balance_ratio=0.2,
...                                    random_state=42)
>>> ratio = round((y_bal == 1).sum() / float((y_bal == 0).sum()), 1)
>>> assert ratio == 0.2, ratio
Note that the count of samples is now greater than it initially was:
>>> assert X_bal.shape[0] > 1000
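To make the idea of over-sampling with replacement concrete, here is a minimal NumPy sketch for the two-class case. It is an illustration under stated assumptions, not skoot's implementation: the function name naive_over_sample is hypothetical, it assumes exactly one minority and one majority class, and it accepts only an int or None as random_state.

```python
import numpy as np


def naive_over_sample(X, y, balance_ratio=0.2, random_state=None, shuffle=True):
    """Illustrative two-class over-sampler (hypothetical, not skoot's code)."""
    if not 0 < balance_ratio <= 1:
        raise ValueError("balance_ratio must be in (0, 1]")

    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.RandomState(random_state)  # int or None only in this sketch

    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_minority, n_majority = counts.min(), counts.max()

    # Number of minority rows to add so that minority / majority >= balance_ratio.
    n_extra = int(np.ceil(balance_ratio * n_majority)) - int(n_minority)
    if n_extra <= 0:
        return X, y  # already balanced enough

    # Draw the extra rows from the minority class with replacement.
    minority_idx = np.flatnonzero(y == minority)
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)

    # Keep every original row and append the resampled minority rows.
    idx = np.concatenate([np.arange(y.shape[0]), extra_idx])
    if shuffle:
        idx = rng.permutation(idx)
    return X[idx], y[idx]


# Toy data: 95 majority rows, 5 minority rows.
X_demo = np.random.RandomState(0).normal(size=(100, 3))
y_demo = np.array([0] * 95 + [1] * 5)
X_demo_bal, y_demo_bal = naive_over_sample(X_demo, y_demo, balance_ratio=0.2,
                                           random_state=42)
```

Because the added rows are exact copies of existing minority rows, a model can memorize them, which is one reason this approach may over-fit.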