skoot.preprocessing.DateTransformer
class skoot.preprocessing.DateTransformer(cols=None, date_format=None, allowed_types=('object', 'datetime64[ns]'))
Cast features to datetime.
Convert multiple features with potentially differing formats to datetime, either with the specified formats or by inferring them. Note that, unlike most other Skoot transformers, this one always outputs a DataFrame (hence the lack of an as_df constructor argument).

Parameters
cols : array-like, shape=(n_features,), optional (default=None)
    The names of the columns on which to apply the transformation. If None, the transformation is applied to all columns.
date_format : str, iterable or None, optional (default=None)
    The date format. If None, the format is inferred. If a string, it is used to parse every column in cols. If an iterable, it should contain strings or None values positionally corresponding to cols (or be a dict mapping column names to formats).
allowed_types : iterable, optional (default=("object", "datetime64[ns]"))
    Permitted Series dtypes. This prevents accidentally casting Series of unexpected types to datetime. For instance, integer columns can technically be cast to datetime, but the result may not be what you expect.
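To illustrate the allowed_types guard, here is a minimal pandas sketch that compares each column's dtype name against a tuple of permitted names and raises ValueError otherwise. The helper check_permitted_dtypes is hypothetical and only approximates the check described above; it assumes dtypes are compared by their string names, such as "object" or "int64", and the real transformer may implement this differently.

```python
import pandas as pd

# Hypothetical stand-in that copies the documented default of
# DateTransformer.DEFAULT_PERMITTED_DTYPES.
DEFAULT_PERMITTED_DTYPES = ("object", "datetime64[ns]")


def check_permitted_dtypes(df, cols, allowed_types=DEFAULT_PERMITTED_DTYPES):
    """Raise ValueError if any column in cols has a dtype outside allowed_types."""
    allowed = set(allowed_types)
    for col in cols:
        dtype_name = str(df[col].dtype)  # e.g. "object", "int64"
        if dtype_name not in allowed:
            raise ValueError(
                "Column %r has dtype %r, which is not in allowed_types %r"
                % (col, dtype_name, tuple(allowed_types))
            )


# Usage: a string (object) column passes; an int64 column needs an explicit opt-in.
df = pd.DataFrame({
    "when": pd.Series(["06/01/2018", "06/02/2018"], dtype=object),
    "epoch": pd.Series([0, 86400], dtype="int64"),
})
check_permitted_dtypes(df, ["when"])  # no error
check_permitted_dtypes(df, ["epoch"], DEFAULT_PERMITTED_DTYPES + ("int64",))  # no error
```

Calling check_permitted_dtypes(df, ["epoch"]) with the default tuple raises ValueError, which is the behavior the guard exists to provide: integer columns are only cast when the caller asks for it.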
Attributes
DEFAULT_PERMITTED_DTYPES : tuple
    A static attribute holding the default pandas dtypes that are permitted to be cast. If a column's dtype is not one of the permitted types, a ValueError is raised. To cast an int column to datetime, for instance, the allowed_types argument must include "int64":
        allowed_types=DateTransformer.DEFAULT_PERMITTED_DTYPES + ('int64',)
fit_cols_ : list
    The columns the transformer was fit on.
formats_ : dict
    Maps each column name to its date format, which matters when differing formats are passed in the date_format parameter.

Notes
The fit method is used only to validate that the columns can be cast to datetime.

Examples
>>> import pandas as pd
>>> from datetime import datetime as dt
>>> from skoot.preprocessing import DateTransformer
>>> data = [
...     [1, "06/01/2018", dt.strptime("06-01-2018", "%m-%d-%Y")],
...     [2, "06/02/2018", dt.strptime("06-02-2018", "%m-%d-%Y")],
...     [3, "06/03/2018", dt.strptime("06-03-2018", "%m-%d-%Y")],
...     [4, None, dt.strptime("06-04-2018", "%m-%d-%Y")],
...     [5, "06/05/2018", None]
... ]
>>> df = pd.DataFrame.from_records(data, columns=["a", "b", "c"])
>>> converter = DateTransformer(cols=["b", "c"],
...                             date_format=["%m/%d/%Y", None])
>>> converter.fit_transform(df)
   a          b          c
0  1 2018-06-01 2018-06-01
1  2 2018-06-02 2018-06-02
2  3 2018-06-03 2018-06-03
3  4        NaT 2018-06-04
4  5 2018-06-05        NaT
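The date_format argument accepts several shapes (None, a single string, a positional iterable, or a dict), and the per-column result is what the formats_ attribute exposes. The following is a minimal sketch, not Skoot's implementation, of how those shapes might be normalized into a single column-to-format mapping. The helper name resolve_date_formats is hypothetical, and treating columns missing from a dict as "infer" (None) is an assumption.

```python
from collections.abc import Iterable, Mapping
from typing import Dict, List, Optional, Union

# A per-column format: a strptime-style string, or None meaning "infer".
Fmt = Optional[str]


def resolve_date_formats(
    cols: List[str],
    date_format: Union[None, str, Mapping[str, Fmt], Iterable[Fmt]],
) -> Dict[str, Fmt]:
    """Normalize the accepted date_format shapes into {column: format}."""
    if date_format is None or isinstance(date_format, str):
        # One value (a format string, or None to infer) applies to every column.
        return {col: date_format for col in cols}
    if isinstance(date_format, Mapping):
        # Assumption: columns missing from the mapping fall back to inference.
        return {col: date_format.get(col) for col in cols}
    formats = list(date_format)
    if len(formats) != len(cols):
        raise ValueError(
            "date_format has %d entries but cols has %d"
            % (len(formats), len(cols))
        )
    # Otherwise the entries line up with cols by position.
    return dict(zip(cols, formats))


# Usage, matching the example above: "b" uses an explicit format, "c" is inferred.
print(resolve_date_formats(["b", "c"], ["%m/%d/%Y", None]))
# {'b': '%m/%d/%Y', 'c': None}
```

Rejecting a length mismatch up front is a design choice in this sketch: silently truncating with zip would leave some columns without a format and hide a likely configuration error.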
Methods
fit(X[, y])              Fit the date transformer.
fit_transform(X[, y])    Fit the estimator and apply the date transformation to a dataframe.
get_params([deep])       Get parameters for this estimator.
set_params(**params)     Set the parameters of this estimator.
transform(X)             Apply the date transformation to a dataframe.
__init__(cols=None, date_format=None, allowed_types=('object', 'datetime64[ns]'))
Initialize self. See help(type(self)) for accurate signature.
fit(X, y=None)
Fit the date transformer.
Fitting is not strictly necessary for this transformer; fit serves as a validation stage that ensures the specified cols can actually be cast to datetime. This validation is the only reason the class does any work during fit.
Parameters
X : pd.DataFrame, shape=(n_samples, n_features)
    The pandas frame to fit. The frame will only be fit on the prescribed cols (see __init__), or on all columns if cols is None.
y : array-like or None, shape=(n_samples,), optional (default=None)
    Pass-through for sklearn.pipeline.Pipeline.
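The validation-only role of fit can be sketched in plain pandas by attempting the cast and discarding the result. This is a minimal sketch under stated assumptions, not the library's code: the helper validate_castable is hypothetical, it assumes parsing is done with pandas.to_datetime, and it returns the validated column list as a rough analogue of fit_cols_.

```python
import pandas as pd


def validate_castable(X, cols=None, formats=None):
    """Fit-time check: raise ValueError if any column cannot be parsed as datetime.

    cols=None means every column; formats maps column -> format string or
    None (infer). Returns the validated column list, analogous to fit_cols_.
    """
    cols = list(X.columns) if cols is None else list(cols)
    formats = formats or {}
    for col in cols:
        try:
            # The parsed result is discarded; only success or failure matters.
            # pandas' default errors="raise" turns unparseable values into an exception.
            pd.to_datetime(X[col], format=formats.get(col))
        except (ValueError, TypeError) as exc:
            raise ValueError(
                "Column %r cannot be cast to datetime: %s" % (col, exc)
            ) from exc
    return cols


df = pd.DataFrame({
    "b": pd.Series(["06/01/2018", None], dtype=object),
    "bad": pd.Series(["not a date", "06/02/2018"], dtype=object),
})
print(validate_castable(df, ["b"], {"b": "%m/%d/%Y"}))  # ['b']
```

Missing values (None) become NaT rather than failing, so only genuinely unparseable entries, such as "not a date" in column "bad", stop the fit.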
fit_transform(X, y=None, **kwargs)
Fit the estimator and apply the date transformation to a dataframe.
Fitting is not strictly necessary for this transformer; fit serves as a validation stage that ensures the specified cols can actually be cast to datetime. This validation is the only reason the class does any work during fit.
Parameters
X : pd.DataFrame, shape=(n_samples, n_features)
    The pandas frame to fit. The operation will be applied to a copy of the input data, and the result will be returned.
y : array-like or None, shape=(n_samples,), optional (default=None)
    Pass-through for sklearn.pipeline.Pipeline.
Returns
X : pd.DataFrame, shape=(n_samples, n_features)
    The operation is applied to a copy of X, and the resulting frame is returned.
get_params(deep=True)
Get parameters for this estimator.
Parameters
deep : boolean, optional
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
params : mapping of string to any
    Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns
self
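For example, a DateTransformer placed in a scikit-learn Pipeline under a step name such as "dates" (a hypothetical name) could be reconfigured with pipe.set_params(dates__date_format="%m/%d/%Y"). The sketch below shows one way such keys can be routed to their components, roughly how scikit-learn splits keys on the first "__"; the helper split_nested_params is hypothetical and is not part of Skoot.

```python
def split_nested_params(params):
    """Group '<component>__<parameter>' keys by component name.

    Keys without a '__' separator target the outer object and are grouped
    under None. Only the first '__' is split, so 'a__b__c' is routed to
    component 'a' as parameter 'b__c', which 'a' could split again itself.
    """
    grouped = {}
    for key, value in params.items():
        component, sep, name = key.partition("__")
        if not sep:
            # No separator: the parameter belongs to the outer object.
            component, name = None, key
        grouped.setdefault(component, {})[name] = value
    return grouped


# Hypothetical pipeline with a step named "dates" holding a DateTransformer.
print(split_nested_params({
    "dates__date_format": "%m/%d/%Y",
    "dates__cols": ["b"],
    "verbose": True,
}))
# {'dates': {'date_format': '%m/%d/%Y', 'cols': ['b']}, None: {'verbose': True}}
```

Splitting on only the first separator is what makes arbitrary nesting work: each level peels off one component name and hands the remainder down.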
transform(X)
Apply the date transformation to a dataframe.
This method casts string features to datetimes as specified by the date_format argument.
Parameters
X : pd.DataFrame, shape=(n_samples, n_features)
    The pandas frame to transform. The operation will be applied to a copy of the input data, and the result will be returned.
Returns
X : pd.DataFrame, shape=(n_samples, n_features)
    The operation is applied to a copy of X, and the resulting frame is returned.
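The copy-then-cast behavior described for transform can be sketched as follows. This is a minimal illustration that assumes the cast is done with pandas.to_datetime; the helper cast_to_datetime is hypothetical, and its formats argument plays the role of the formats_ attribute.

```python
import pandas as pd


def cast_to_datetime(X, formats):
    """Return a copy of X with each column in formats cast to datetime.

    formats maps column name -> strptime-style format string, or None to let
    pandas infer the format. The input frame is left unmodified.
    """
    X = X.copy()  # work on a copy so the caller's frame is never mutated
    for col, fmt in formats.items():
        X[col] = pd.to_datetime(X[col], format=fmt)
    return X


df = pd.DataFrame({
    "a": [1, 2],
    "b": pd.Series(["06/01/2018", None], dtype=object),
})
out = cast_to_datetime(df, {"b": "%m/%d/%Y"})
print(out)
print(df["b"].dtype)  # object: the original column is untouched
```

Copying first costs memory proportional to the frame, but it keeps transform free of side effects, which is what lets the same fitted transformer be applied repeatedly inside a pipeline.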