How to ensure model obsolescence (part 1)

Fitting your test set and other terrible practices in ML

Throughout my tenure in the field of data science and machine learning, I’ve had the privilege of interviewing a lot of candidates. Some interviews have gone splendidly; others have gone horrendously. And, to the credit of those who have bombed, sometimes you’re just nervous and trip up; I get it.... [Read More]

Conda envs in PySpark

3 reasons you should be deploying your Conda environments for your PySpark jobs

If you’ve only ever tinkered with Hadoop within the context of a sandbox, you may never have encountered one of the inevitabilities of enterprise-scale distributed computing: different machines have different configurations. Even when synchronized with tools such as Puppet, datanodes in a Hadoop cluster may not be a mirror image... [Read More]