skoot.feature_extraction.TimeDeltaFeatures

class skoot.feature_extraction.TimeDeltaFeatures(cols=None, as_df=True, units='days', sep='_', astype=<class 'float'>, absolute_difference=False, name_suffix='delta')

Compute the time lapse between timestamp events.

A transformer to compute time deltas between different date features. This can be useful, for instance, when the target is temporally sensitive to the lapse in time between certain events.

This class will combinatorially calculate the deltas between features, expanding the dimensionality by \({N \choose 2}\), where \(N\) is the number of features included in cols. Note that prescribed column order does matter in this transformer, as deltas are computed from left to right:

['a', 'b', 'c'] -> ['a_b_delta', 'a_c_delta', 'b_c_delta']
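
The pairing and naming scheme described above can be sketched with the standard library alone. This is an illustration, not the library's implementation: the helper name delta_names is hypothetical, and it only assumes that pairs are formed in left-to-right order and joined as <left><sep><right><sep><suffix>.

```python
from itertools import combinations


def delta_names(cols, sep="_", name_suffix="delta"):
    """Return one feature name per left-to-right pair of columns.

    Hypothetical helper for illustration; not part of skoot.
    """
    # combinations() preserves input order, so the left column of each
    # pair always appears before the right column in ``cols``.
    return [sep.join((left, right, name_suffix))
            for left, right in combinations(cols, 2)]


names = delta_names(["a", "b", "c"])
print(names)  # ['a_b_delta', 'a_c_delta', 'b_c_delta']
```

Three columns yield three pairs and four columns yield six, matching the \({N \choose 2}\) growth noted above.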

Parameters:

cols : array-like, shape=(n_features,), optional (default=None)

The names of the columns on which to apply the transformation. Will apply to all columns if None specified. Note that in this class, the columns applied-to must be DateTime types or this will raise a ValueError.

as_df : bool, optional (default=True)

Whether to return a Pandas DataFrame in the transform method. If False, will return a Numpy ndarray instead.

units : str or unicode, optional (default='days')

The unit of time to compute between events. One of ('seconds', 'minutes', 'hours', 'days').
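
As a rough sketch of what each unit means, the snippet below converts a difference between two datetimes into the requested unit using only the standard library. The SECONDS_PER_UNIT table and the delta_in_units helper are assumptions made for illustration; how the class performs the conversion internally is not documented here.

```python
from datetime import datetime

# Hypothetical lookup table: number of seconds in each supported unit.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def delta_in_units(left, right, units="days"):
    """Return (left - right) expressed in ``units`` as a float."""
    if units not in SECONDS_PER_UNIT:
        raise ValueError("units must be one of %r" % sorted(SECONDS_PER_UNIT))
    # The subtraction yields a timedelta; total_seconds() keeps the sign.
    return (left - right).total_seconds() / SECONDS_PER_UNIT[units]


b = datetime(2018, 6, 1)
c = datetime(2018, 6, 2)
print(delta_in_units(b, c, units="hours"))  # -24.0
print(delta_in_units(b, c, units="days"))   # -1.0
```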

sep : str or unicode, optional (default='_')

The separator between the new feature names. The names will be in the form of:

<left><sep><right><sep><suffix>

For example, for columns 'a' and 'b' with sep="_" and name_suffix="delta", the new column name would be:

a_b_delta

astype : type, optional (default=float)

The type to which to coerce the time deltas.

absolute_difference : bool, optional (default=False)

Whether to compute the absolute difference between dates. If False, the order of cols matters, since it defines the order of subtraction: for each pair, the column further to the right is subtracted from the column further to the left.
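
The effect of this flag on a single pair of dates can be illustrated as follows. The day_delta helper is hypothetical and only mirrors the behavior described above (left minus right, optionally made absolute); it is not taken from the library's source.

```python
from datetime import datetime


def day_delta(left, right, absolute_difference=False):
    """Return left - right in days, optionally as an absolute value."""
    delta = (left - right).total_seconds() / 86400
    return abs(delta) if absolute_difference else delta


b = datetime(2018, 6, 1)
c = datetime(2018, 6, 2)
print(day_delta(b, c))                            # -1.0
print(day_delta(c, b))                            # 1.0
print(day_delta(b, c, absolute_difference=True))  # 1.0
```

With the absolute difference enabled, swapping the two columns no longer changes the value, only the generated column name.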

name_suffix : str, optional (default='delta')

The suffix to add to the new feature name in the form of:

<feature_x>_<feature_y>_<suffix>

See sep for more details about how new column names are formed.

Notes

  • Unlike the DateFactorizer class, this transformer does not remove the original date features after extracting the new features.
  • Column deltas are computed from left to right. This means that the order in which columns are defined in cols does matter.

Examples

>>> import pandas as pd
>>> from datetime import datetime as dt
>>> from skoot.feature_extraction import TimeDeltaFeatures
>>> stp = dt.strptime
>>> data = [
...     [1, stp("06-01-2018", "%m-%d-%Y"), stp("06-02-2018", "%m-%d-%Y")],
...     [2, stp("06-02-2018", "%m-%d-%Y"), stp("06-03-2018", "%m-%d-%Y")],
...     [3, stp("06-03-2018", "%m-%d-%Y"), stp("06-04-2018", "%m-%d-%Y")],
...     [4, stp("06-04-2018", "%m-%d-%Y"), stp("06-05-2018", "%m-%d-%Y")],
...     [5, None, stp("06-04-2018", "%m-%d-%Y")]
... ]
>>> df = pd.DataFrame.from_records(data, columns=['a', 'b', 'c'])
>>> tdf = TimeDeltaFeatures(cols=['b', 'c'], units='hours')
>>> tdf.fit_transform(df)
   a          b          c  b_c_delta
0  1 2018-06-01 2018-06-02      -24.0
1  2 2018-06-02 2018-06-03      -24.0
2  3 2018-06-03 2018-06-04      -24.0
3  4 2018-06-04 2018-06-05      -24.0
4  5        NaT 2018-06-04        NaN

Notice that column order makes a difference. If 'c' is defined before 'b', the delta is positive:

>>> TimeDeltaFeatures(cols=['c', 'b'], units='hours').fit_transform(df)
   a          b          c  c_b_delta
0  1 2018-06-01 2018-06-02       24.0
1  2 2018-06-02 2018-06-03       24.0
2  3 2018-06-03 2018-06-04       24.0
3  4 2018-06-04 2018-06-05       24.0
4  5        NaT 2018-06-04        NaN

Methods

fit(X[, y])             Fit the time-between transformer.
fit_transform(X[, y])   Fit the estimator and compute the time deltas on a dataframe.
get_params([deep])      Get parameters for this estimator.
set_params(**params)    Set the parameters of this estimator.
transform(X)            Apply the date transformation to a dataframe.

__init__(cols=None, as_df=True, units='days', sep='_', astype=<class 'float'>, absolute_difference=False, name_suffix='delta')

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)

Fit the time-between transformer.

Fitting is not strictly necessary for this transformer, but it serves as a validation stage that ensures the defined cols genuinely are datetime columns. That is the only reason a fit step exists.

Parameters:

X : pd.DataFrame, shape=(n_samples, n_features)

The Pandas frame to fit. The frame will only be fit on the prescribed cols (see __init__) or all of them if cols is None.

y : array-like or None, shape=(n_samples,), optional (default=None)

Pass-through for sklearn.pipeline.Pipeline.

fit_transform(X, y=None, **kwargs)

Fit the estimator and compute the time deltas on a dataframe.

Fitting is not strictly necessary for this transformer, but it serves as a validation stage that ensures the defined cols genuinely are datetime types. That is the only reason a fit step exists.

Parameters:

X : pd.DataFrame, shape=(n_samples, n_features)

The Pandas frame to fit. The operation will be applied to a copy of the input data, and the result will be returned.

y : array-like or None, shape=(n_samples,), optional (default=None)

Pass-through for sklearn.pipeline.Pipeline.

Returns:

X : pd.DataFrame or np.ndarray, shape=(n_samples, n_features)

The operation is applied to a copy of X, and the result set is returned.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self

transform(X)

Apply the date transformation to a dataframe.

This method will compute the deltas between provided datetime features.

Parameters:

X : pd.DataFrame, shape=(n_samples, n_features)

The Pandas frame to transform. The operation will be applied to a copy of the input data, and the result will be returned.

Returns:

X : pd.DataFrame or np.ndarray, shape=(n_samples, n_features)

The operation is applied to a copy of X, and the result set is returned.