alkaline-ml

Auto-deploying your Sphinx doc on gh-pages

Automating your documentation deployments

Posted on December 23, 2018

The issue [Read More]

Tags: python tutorials cicd gh-pages deployment bear sphinx

How to ensure model obsolescence (part 2)

Data dredging

Posted on August 23, 2018

In the spirit of my last post, I want to continue talking about some common mistakes I see among machine learning practitioners. Last time, we saw how covariate shift can be accidentally introduced by (seemingly harmlessly) applying a fit_transform to your test data. This time, I want to cover an... [Read More]

Tags: python machine-learning scikit-learn best-practices interview-prep tutorials

How to ensure model obsolescence (part 1)

Fitting your test set and other terrible practices in ML

Posted on July 23, 2018

Throughout my tenure in the field of data science and machine learning, I’ve had the privilege to interview a lot of candidates. Some interviews have gone splendidly; others have gone horrendously. And, to the credit of those who have bombed, sometimes you’re just nervous and trip up; I get it.... [Read More]

Tags: python machine-learning scikit-learn best-practices interview-prep tutorials

Conda envs in Pyspark

3 reasons you should be deploying your Conda environments for your Pyspark jobs

Posted on July 2, 2018

If you’ve only ever tinkered with Hadoop within the context of a sandbox, you may never have encountered one of the inevitabilities of enterprise-scale distributed computing: different machines have different configurations. Even when synchronized with tools such as Puppet, datanodes in a Hadoop cluster may not be a mirror image... [Read More]

Tags: python pyspark tutorials

An intro to dummy encoding with Skoot

Using Skoot to accelerate your ML pre-processing workflow

Posted on June 18, 2018

This post will introduce you to dummy coding in skoot, one of my projects dedicated to helping machine learning practitioners automate as much of their workflow as possible. Those who have worked in the field for a while know that 80-90% of a data scientist’s time is spent... [Read More]

Tags: skoot python machine-learning dummy-encoding tutorials