Date transformer

Demonstrates how to automatically convert fields that hold dates as strings into datetime-typed fields.


Out:

Applied to cols ['b', 'c'] with inferred format:
   a  ...          c
0  1  ... 2018-06-01
1  2  ... 2018-06-02
2  3  ... 2018-06-03
3  4  ... 2018-06-04
4  5  ...        NaT

[5 rows x 3 columns]

Applied to cols ['b', 'c'] with specified format:
   a  ...          c
0  1  ... 2018-06-01
1  2  ... 2018-06-02
2  3  ... 2018-06-03
3  4  ... 2018-06-04
4  5  ...        NaT

[5 rows x 3 columns]

Applied to cols ['a', 'b', 'c'] with inferred format:
                              a  ...          c
0 1970-01-01 00:00:00.000000001  ... 2018-06-01
1 1970-01-01 00:00:00.000000002  ... 2018-06-02
2 1970-01-01 00:00:00.000000003  ... 2018-06-03
3 1970-01-01 00:00:00.000000004  ... 2018-06-04
4 1970-01-01 00:00:00.000000005  ...        NaT

[5 rows x 3 columns]

print(__doc__)

# Author: Taylor Smith <taylor.smith@alkaline-ml.com>

from skoot.preprocessing import DateTransformer
import pandas as pd
from datetime import datetime as dt

# #############################################################################
# create data
data = [
    [1, "06/01/2018", dt.strptime("06-01-2018", "%m-%d-%Y")],
    [2, "06/02/2018", dt.strptime("06-02-2018", "%m-%d-%Y")],
    [3, "06/03/2018", dt.strptime("06-03-2018", "%m-%d-%Y")],
    [4, None, dt.strptime("06-04-2018", "%m-%d-%Y")],
    [5, "06/05/2018", None]
]

df = pd.DataFrame.from_records(data, columns=["a", "b", "c"])

# the date transformer will automatically handle existing datetime fields
# and infer the format of string datetime fields:
print("Applied to cols ['b', 'c'] with inferred format:")
print(DateTransformer(cols=['b', 'c']).fit_transform(df))

# we can also supply the format, if desired:
print("\nApplied to cols ['b', 'c'] with specified format:")
print(DateTransformer(cols=['b', 'c'],
                      date_format="%m/%d/%Y").fit_transform(df))

# Finally, to apply the transformer to int columns as well, add "int64" to
# the permitted dtypes. As the output shows, the integers are read as
# nanoseconds since the Unix epoch:
allowed = DateTransformer.DEFAULT_PERMITTED_DTYPES + ("int64",)
print("\nApplied to cols ['a', 'b', 'c'] with inferred format:")
print(DateTransformer(cols=['a', 'b', 'c'],
                      allowed_types=allowed).fit_transform(df))
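
The conversions shown above can be approximated with plain pandas. The following is a minimal sketch, not skoot's implementation: it assumes the transformer's results resemble what `pd.to_datetime` returns, and the names `frame`, `parsed_b`, `parsed_a_ns` and `parsed_a_s` are made up for illustration. It shows an explicit format applied to a string column, and how bare integers are read as nanoseconds since the Unix epoch unless a unit is given.

```python
import pandas as pd

# Illustrative data with the same shape as columns 'a' and 'b' above.
frame = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["06/01/2018", "06/02/2018", None],
})

# Strings: an explicit format removes any day/month ambiguity, and
# missing values come back as NaT.
parsed_b = pd.to_datetime(frame["b"], format="%m/%d/%Y")

# Integers: with no unit given, pandas treats them as nanoseconds since
# the Unix epoch, matching the 1970-01-01 timestamps in the output above.
parsed_a_ns = pd.to_datetime(frame["a"])

# If the integers are really epoch seconds, the unit must be stated.
parsed_a_s = pd.to_datetime(frame["a"], unit="s")

print(parsed_b)
print(parsed_a_ns)
print(parsed_a_s)
```

Integer columns are therefore worth converting only when their unit is known; otherwise small identifiers such as those in column `a` all collapse into the first few nanoseconds of 1970.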

Total running time of the script: (0 minutes 0.032 seconds)
