Date transformer
Demonstrates how to automatically convert fields that hold string representations of dates into datetime fields.
Out:
Applied to cols ['b', 'c'] with inferred format:
   a  ...          c
0  1  ... 2018-06-01
1  2  ... 2018-06-02
2  3  ... 2018-06-03
3  4  ... 2018-06-04
4  5  ...        NaT
[5 rows x 3 columns]
Applied to cols ['b', 'c'] with specified format:
   a  ...          c
0  1  ... 2018-06-01
1  2  ... 2018-06-02
2  3  ... 2018-06-03
3  4  ... 2018-06-04
4  5  ...        NaT
[5 rows x 3 columns]
Applied to cols ['a', 'b', 'c'] with inferred format:
                              a  ...          c
0 1970-01-01 00:00:00.000000001  ... 2018-06-01
1 1970-01-01 00:00:00.000000002  ... 2018-06-02
2 1970-01-01 00:00:00.000000003  ... 2018-06-03
3 1970-01-01 00:00:00.000000004  ... 2018-06-04
4 1970-01-01 00:00:00.000000005  ...        NaT
[5 rows x 3 columns]
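The values in column `a` of the last output (`1970-01-01 00:00:00.000000001` and so on) are consistent with how pandas treats bare integers by default: as nanosecond offsets from the Unix epoch. The sketch below reproduces that behaviour with plain pandas only. It is not taken from skoot's source; whether `DateTransformer` delegates to `pd.to_datetime` internally is an assumption here.

```python
import pandas as pd

# Same kind of values as column "a" in the example.
ints = pd.Series([1, 2, 3], dtype="int64")

# Default behaviour: integers are read as nanoseconds since 1970-01-01.
as_ns = pd.to_datetime(ints)

# With an explicit unit, the same integers become day offsets instead.
as_days = pd.to_datetime(ints, unit="D")

print(as_ns.iloc[0])    # one nanosecond after the epoch
print(as_days.iloc[0])  # one day after the epoch
```

If integer columns actually encode seconds, days, or something like `YYYYMMDD`, it is probably safer to convert them explicitly before handing them to a transformer than to rely on the nanosecond default.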
print(__doc__)
# Author: Taylor Smith <taylor.smith@alkaline-ml.com>
from skoot.preprocessing import DateTransformer
import pandas as pd
from datetime import datetime as dt
# #############################################################################
# create data
data = [
    [1, "06/01/2018", dt.strptime("06-01-2018", "%m-%d-%Y")],
    [2, "06/02/2018", dt.strptime("06-02-2018", "%m-%d-%Y")],
    [3, "06/03/2018", dt.strptime("06-03-2018", "%m-%d-%Y")],
    [4, None, dt.strptime("06-04-2018", "%m-%d-%Y")],
    [5, "06/05/2018", None]
]
df = pd.DataFrame.from_records(data, columns=["a", "b", "c"])
# the date transformer will automatically handle existing datetime fields
# and infer the format of string datetime fields:
print("Applied to cols ['b', 'c'] with inferred format:")
print(DateTransformer(cols=['b', 'c']).fit_transform(df))
# we can also supply the format, if desired:
print("\nApplied to cols ['b', 'c'] with specified format:")
print(DateTransformer(cols=['b', 'c'],
                      date_format="%m/%d/%Y").fit_transform(df))
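# Finally, to apply the transformer to int columns, add the int dtype to
# the permitted types. Note that the integers are then interpreted as
# nanoseconds since the Unix epoch (see column 'a' in the output):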
allowed = DateTransformer.DEFAULT_PERMITTED_DTYPES + ("int64",)
print("\nApplied to cols ['a', 'b', 'c'] with inferred format:")
print(DateTransformer(cols=['a', 'b', 'c'],
                      allowed_types=allowed).fit_transform(df))
Total running time of the script: (0 minutes 0.032 seconds)
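For comparison, the effect of supplying an explicit format (as with `date_format="%m/%d/%Y"` above) can be sketched with plain pandas. This is a minimal illustration under the assumption that missing or unparseable entries should become `NaT`, as they do in the outputs above; it does not show how skoot handles these cases internally, and the `"not a date"` entry is a made-up value for illustration.

```python
import pandas as pd

# String dates with one missing entry and one hypothetical malformed entry.
raw = pd.Series(["06/01/2018", "06/02/2018", None, "not a date"])

# An explicit format removes month/day ambiguity; errors="coerce" turns
# entries that cannot be parsed into NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

print(parsed)
```

Passing the format explicitly is mostly useful when the strings are ambiguous (for example `06/01/2018` could be June 1 or January 6), since inference has to guess the field order.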