skoot.feature_extraction.DateFactorizer
class skoot.feature_extraction.DateFactorizer(cols=None, as_df=True, drop_original=True, sep='_', features=('year', 'month', 'day', 'hour'))
Extract new features from datetime features.
Automatically extract new features from datetime features. This class operates on datetime series objects and extracts features such as “year”, “month”, etc. These can then be expanded via one-hot encoding or further processed via other pre-processing techniques.
Parameters: cols : array-like, shape=(n_features,), optional (default=None)
The names of the columns on which to apply the transformation. If None, the transformation is applied to all columns. Note that in this class, the columns applied to must be datetime types; otherwise a ValueError is raised.
as_df : bool, optional (default=True)
Whether to return a Pandas DataFrame in the transform method. If False, a NumPy ndarray is returned instead.
drop_original : bool, optional (default=True)
Whether to drop the original features from the dataframe prior to returning from the transform method.
sep : str or unicode, optional (default="_")
The string separator between the existing feature name and the extracted feature. E.g., for a feature named "Transaction" and for features=("year", "month"), the original variable will be split into two new ones: "Transaction_year" and "Transaction_month".
features : iterable, optional (default=("year", "month", "day", "hour"))
The features to extract. Each must be an attribute of the DateTime class; passing an invalid feature raises an AttributeError.
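The naming rule for sep and the attribute lookup behind features can be sketched with the standard library alone. This is only an illustration under assumptions: extract_date_features is a hypothetical helper, not part of skoot, and it operates on a single datetime value rather than a column.

```python
from datetime import datetime


def extract_date_features(name, value, features=("year", "month", "day", "hour"), sep="_"):
    """Map one datetime value to {"<name><sep><feature>": attribute value}.

    Hypothetical helper for illustration; not part of skoot.
    """
    # getattr raises AttributeError when `feature` is not a datetime attribute
    return {f"{name}{sep}{feature}": getattr(value, feature) for feature in features}


stamp = datetime(2018, 6, 1)
result = extract_date_features("Transaction", stamp, features=("year", "month"))
print(result)  # {'Transaction_year': 2018, 'Transaction_month': 6}
```

Passing a name that is not a datetime attribute (for example "fortnight") fails with an AttributeError in this sketch, which mirrors the behavior described for the features parameter.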
Attributes
fit_cols_ (list) The columns the transformer was fit on.
Examples
>>> import pandas as pd
>>> from datetime import datetime as dt
>>> from skoot.feature_extraction import DateFactorizer
>>> strp = dt.strptime
>>> data = [
...     [1, strp("06-01-2018", "%m-%d-%Y")],
...     [2, strp("06-02-2018", "%m-%d-%Y")],
...     [3, strp("06-03-2018", "%m-%d-%Y")],
...     [4, strp("06-04-2018", "%m-%d-%Y")],
...     [5, None]
... ]
>>> df = pd.DataFrame.from_records(data, columns=["a", "b"])
>>> DateFactorizer(cols=['b']).fit_transform(df)
   a  b_year  b_month  b_day  b_hour
0  1  2018.0      6.0    1.0     0.0
1  2  2018.0      6.0    2.0     0.0
2  3  2018.0      6.0    3.0     0.0
3  4  2018.0      6.0    4.0     0.0
4  5     NaN      NaN    NaN     NaN
Methods
fit(X[, y]) Fit the date factorizer.
fit_transform(X[, y]) Fit the estimator and apply the date factorization to a dataframe.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Apply the date transformation to a dataframe.
__init__(cols=None, as_df=True, drop_original=True, sep='_', features=('year', 'month', 'day', 'hour'))
Initialize self. See help(type(self)) for accurate signature.
fit(X, y=None)
Fit the date factorizer.
Fitting is not strictly necessary for this class, but it serves as a validation stage that ensures the specified cols really are datetime columns. That is the only reason any work happens in the fit step.
Parameters: X : pd.DataFrame, shape=(n_samples, n_features)
The Pandas frame to fit. The frame will only be fit on the prescribed cols (see __init__) or all of them if cols is None.
y : array-like or None, shape=(n_samples,), optional (default=None)
Pass-through for sklearn.pipeline.Pipeline.
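The validation role of fit can be approximated in plain pandas. The sketch below is an assumption about what such a check might look like, not skoot's actual implementation: validate_datetime_cols is a hypothetical helper, and it relies on pandas.api.types.is_datetime64_any_dtype to decide what counts as a datetime column.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def validate_datetime_cols(X, cols=None):
    """Return the columns to transform, raising ValueError on non-datetime ones.

    Hypothetical helper for illustration; skoot's internal checks may differ.
    """
    # Fall back to every column when no subset is given
    cols = list(X.columns) if cols is None else list(cols)
    bad = [c for c in cols if not is_datetime64_any_dtype(X[c])]
    if bad:
        raise ValueError(f"Columns are not datetime types: {bad}")
    return cols


df = pd.DataFrame({
    "a": [1, 2],
    "b": pd.to_datetime(["2018-06-01", "2018-06-02"]),
})
fit_cols = validate_datetime_cols(df, cols=["b"])
```

Calling the helper with cols=None on this frame would raise a ValueError, because column "a" holds integers; this is why restricting cols matters when a frame mixes types.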
fit_transform(X, y=None, **kwargs)
Fit the estimator and apply the date factorization to a dataframe.
Fitting is not strictly necessary for this class, but it serves as a validation stage that ensures the specified cols really are datetime types. That is the only reason any work happens in the fit step.
Parameters: X : pd.DataFrame, shape=(n_samples, n_features)
The Pandas frame to fit. The operation will be applied to a copy of the input data, and the result will be returned.
y : array-like or None, shape=(n_samples,), optional (default=None)
Pass-through for sklearn.pipeline.Pipeline.
Returns: X : pd.DataFrame or np.ndarray, shape=(n_samples, n_features)
The operation is applied to a copy of X, and the result set is returned.
get_params(deep=True)
Get parameters for this estimator.
Parameters: deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: self
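To make the <component>__<parameter> convention concrete, the following sketch groups such keys by component using only the standard library. It illustrates the naming scheme and is not how scikit-learn or skoot implement set_params; split_nested_params and the step name "dates" are hypothetical.

```python
def split_nested_params(params):
    """Group {"component__parameter": value} keys by component name.

    Keys without "__" are grouped under None, standing for parameters of the
    top-level object. Hypothetical helper for illustration only.
    """
    grouped = {}
    for key, value in params.items():
        # partition splits on the first "__" only
        component, delim, sub_key = key.partition("__")
        if delim:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault(None, {})[key] = value
    return grouped


nested = split_nested_params({"dates__sep": "-", "dates__as_df": False, "verbose": True})
```

Here "dates__sep" would address the sep parameter of a pipeline step named "dates", while "verbose" belongs to the outer object.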
transform(X)
Apply the date transformation to a dataframe.
This method will extract features from datetime features as specified by the features arg.
Parameters: X : pd.DataFrame, shape=(n_samples, n_features)
The Pandas frame to transform. The operation will be applied to a copy of the input data, and the result will be returned.
Returns: X : pd.DataFrame or np.ndarray, shape=(n_samples, n_features)
The operation is applied to a copy of X, and the result set is returned.
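The behavior described for transform can be approximated with the pandas .dt accessor. This is a minimal sketch under assumptions, not the library's implementation: factorize_dates is a hypothetical function whose parameters simply mirror the constructor arguments documented above.

```python
import pandas as pd


def factorize_dates(X, cols, features=("year", "month", "day", "hour"),
                    sep="_", drop_original=True, as_df=True):
    """Hypothetical stand-in for the documented transform step."""
    X = X.copy()  # work on a copy so the caller's frame is left untouched
    for col in cols:
        for feature in features:
            # Series.dt exposes datetime attributes; missing dates yield NaN
            X[f"{col}{sep}{feature}"] = getattr(X[col].dt, feature)
    if drop_original:
        X = X.drop(columns=list(cols))
    return X if as_df else X.to_numpy()


df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": pd.to_datetime(["2018-06-01", "2018-06-02", None]),
})
out = factorize_dates(df, cols=["b"])
```

Because a missing date produces NaN, the extracted columns come back as floats whenever the source column contains a null, which matches the float output shown in the class example.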