skoot.preprocessing.DateTransformer

class skoot.preprocessing.DateTransformer(cols=None, date_format=None, allowed_types=('object', 'datetime64[ns]'))

Cast features to datetime.

Convert multiple features, possibly in differing formats, to datetime, either with specified formats or by inferring the formats. Note that unlike most other Skoot transformers, this one requires the output to be a DataFrame, which is why there is no as_df constructor arg.

Parameters:

cols : array-like, shape=(n_features,), optional (default=None)

The names of the columns on which to apply the transformation. If None, the transformation is applied to all columns.

date_format : str, iterable or None, optional (default=None)

The date format. If None, the format is inferred. If a string, it is used to parse the datetime. If an iterable, it should contain strings or None values that correspond positionally to cols. A dict mapping columns to formats is also accepted.

allowed_types : iterable, optional (default=("object", "datetime64[ns]"))

Permitted Series types. This is used to prevent accidentally casting Series of unexpected types to DateTime. For instance, integer types can be cast to DateTime even though the behavior may be unexpected.
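The guard described above can be pictured with a short sketch. This is not the Skoot implementation: check_castable is a hypothetical helper, and comparing str(series.dtype) against the allowed names is an assumption about how such a check might work.

```python
import pandas as pd

# Default copied from the class signature above.
DEFAULT_PERMITTED_DTYPES = ("object", "datetime64[ns]")


def check_castable(series, allowed_types=DEFAULT_PERMITTED_DTYPES):
    """Raise ValueError if the Series dtype is not permitted (hypothetical helper)."""
    dtype_name = str(series.dtype)
    if dtype_name not in allowed_types:
        raise ValueError(
            "column %r has dtype %s, which is not in allowed_types %r"
            % (series.name, dtype_name, tuple(allowed_types))
        )
    return True


strings = pd.Series(["06/01/2018", "06/02/2018"], dtype="object", name="b")
integers = pd.Series([1, 2, 3], dtype="int64", name="a")

check_castable(strings)  # passes: "object" is permitted by default

try:
    check_castable(integers)
    rejected = False
except ValueError:
    rejected = True  # "int64" is not permitted by default

# Opting in to integer casting by extending the permitted dtypes.
check_castable(integers, DEFAULT_PERMITTED_DTYPES + ("int64",))
```

Rejecting integer columns by default means a column of IDs or counts is not silently reinterpreted as timestamps.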

Attributes

DEFAULT_PERMITTED_DTYPES (tuple) A static attribute holding the pandas dtypes that may be cast by default. If a column is not one of the permitted types, a ValueError will be raised. To cast an int to datetime, for instance, the allowed_types arg will need to include "int64": allowed_types=DateTransformer.DEFAULT_PERMITTED_DTYPES + ('int64',)
fit_cols_ (list) The columns the transformer was fit on.
formats_ (dict) Maps column name to date format, in case of varying date formats passed in the date_format parameter.
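To make the relationship between date_format and formats_ concrete, here is a minimal sketch of how the three accepted shapes of date_format could be normalized into a column-to-format mapping. resolve_formats is a hypothetical helper, not part of Skoot, and the fallback to None for columns missing from a dict is an assumption.

```python
def resolve_formats(cols, date_format):
    """Map each column name to a format string or None (hypothetical helper)."""
    if date_format is None or isinstance(date_format, str):
        # A single format (or inference, for None) shared by every column.
        return {col: date_format for col in cols}
    if isinstance(date_format, dict):
        # Assumption: columns absent from the dict fall back to inference.
        return {col: date_format.get(col) for col in cols}
    formats = list(date_format)
    if len(formats) != len(cols):
        raise ValueError("date_format must have one entry per column")
    # Positional correspondence between cols and formats.
    return dict(zip(cols, formats))


by_position = resolve_formats(["b", "c"], ["%m/%d/%Y", None])
shared = resolve_formats(["b", "c"], "%m/%d/%Y")
```

Here by_position pairs "b" with "%m/%d/%Y" and "c" with None, which matches the arguments used in the Examples section.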

Notes

The fit method here is only used for validation that the columns can be cast to datetime.
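A validation-only fit of this kind might look roughly like the sketch below. It relies on pandas.to_datetime raising on unparseable input. validate_castable is a hypothetical stand-in, not the actual fit method.

```python
import pandas as pd


def validate_castable(df, formats):
    """Check that each column parses as datetime without modifying df."""
    for col, fmt in formats.items():
        try:
            # format=None lets pandas infer the format.
            pd.to_datetime(df[col], format=fmt)
        except (ValueError, TypeError) as exc:
            raise ValueError(
                "column %r cannot be cast to datetime: %s" % (col, exc)
            )
    # Return the validated column names, loosely mirroring fit_cols_.
    return list(formats)


good = pd.DataFrame({"b": ["06/01/2018", None, "06/05/2018"]})
bad = pd.DataFrame({"b": ["06/01/2018", "not a date"]})

fit_cols = validate_castable(good, {"b": "%m/%d/%Y"})

try:
    validate_castable(bad, {"b": "%m/%d/%Y"})
    failed = False
except ValueError:
    failed = True  # "not a date" does not match the format
```

The parsed result is discarded on purpose: nothing is learned from the data, so the check only surfaces bad columns or formats before transform is called.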

Examples

>>> import pandas as pd
>>> from datetime import datetime as dt
>>> data = [
...     [1, "06/01/2018", dt.strptime("06-01-2018", "%m-%d-%Y")],
...     [2, "06/02/2018", dt.strptime("06-02-2018", "%m-%d-%Y")],
...     [3, "06/03/2018", dt.strptime("06-03-2018", "%m-%d-%Y")],
...     [4, None, dt.strptime("06-04-2018", "%m-%d-%Y")],
...     [5, "06/05/2018", None]
... ]
>>> df = pd.DataFrame.from_records(data, columns=["a", "b", "c"])
>>> converter = DateTransformer(cols=["b", "c"],
...                             date_format=["%m/%d/%Y", None])
>>> converter.fit_transform(df)
   a          b          c
0  1 2018-06-01 2018-06-01
1  2 2018-06-02 2018-06-02
2  3 2018-06-03 2018-06-03
3  4        NaT 2018-06-04
4  5 2018-06-05        NaT

Methods

fit(X[, y]) Fit the date transformer.
fit_transform(X[, y]) Fit the estimator and apply the date transformation to a dataframe.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Apply the date transformation to a dataframe.
__init__(cols=None, date_format=None, allowed_types=('object', 'datetime64[ns]'))

fit(X, y=None)

Fit the date transformer.

Fitting is not strictly necessary for this class, but fit is used as a validation stage to ensure that the defined cols can actually be cast to datetime. That is the only reason any work happens in the fit portion.

Parameters:

X : pd.DataFrame, shape=(n_samples, n_features)

The Pandas frame to fit. The frame will only be fit on the prescribed cols (see __init__) or all of them if cols is None.

y : array-like or None, shape=(n_samples,), optional (default=None)

Pass-through for sklearn.pipeline.Pipeline.

fit_transform(X, y=None, **kwargs)

Fit the estimator and apply the date transformation to a dataframe.

Fitting is not strictly necessary for this class, but fit is used as a validation stage to ensure that the defined cols can actually be cast to datetime. That is the only reason any work happens in the fit portion.

Parameters:

X : pd.DataFrame, shape=(n_samples, n_features)

The Pandas frame to fit. The operation will be applied to a copy of the input data, and the result will be returned.

y : array-like or None, shape=(n_samples,), optional (default=None)

Pass-through for sklearn.pipeline.Pipeline.

Returns:

X : pd.DataFrame, shape=(n_samples, n_features)

The operation is applied to a copy of X, and the result set is returned.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
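The <component>__<parameter> naming can be illustrated with a small sketch of how such a name splits into its two parts. split_param is a hypothetical helper and "dates" is an assumed pipeline step name. This shows the convention only, not the estimator's own code.

```python
def split_param(name):
    """Split 'component__parameter' into (component, parameter)."""
    component, sep, parameter = name.partition("__")
    if not sep:
        # No double underscore: the name refers to the estimator itself.
        return None, name
    return component, parameter


nested = split_param("dates__date_format")
simple = split_param("date_format")
```

In this sketch nested is ("dates", "date_format"), meaning "set date_format on the component named dates", while simple is (None, "date_format").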

Returns:

self

transform(X)

Apply the date transformation to a dataframe.

This method will cast string features to datetimes as specified by the date_format arg.

Parameters:

X : pd.DataFrame, shape=(n_samples, n_features)

The Pandas frame to transform. The operation will be applied to a copy of the input data, and the result will be returned.

Returns:

X : pd.DataFrame, shape=(n_samples, n_features)

The operation is applied to a copy of X, and the result set is returned.
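The copy semantics described above can be sketched with plain pandas. cast_dates is a hypothetical stand-in for transform that assumes a ready-made column-to-format mapping. It is not the Skoot implementation.

```python
import pandas as pd


def cast_dates(df, formats):
    """Return a copy of df with the given columns cast to datetime."""
    out = df.copy()  # the caller's frame is left untouched
    for col, fmt in formats.items():
        # format=None lets pandas infer the format.
        out[col] = pd.to_datetime(out[col], format=fmt)
    return out


df = pd.DataFrame({"a": [1, 2], "b": ["06/01/2018", "06/02/2018"]})
result = cast_dates(df, {"b": "%m/%d/%Y"})
```

After the call, df["b"] still holds the original strings and only result["b"] is datetime-typed, so the input frame can safely be reused elsewhere.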
