DevOps for Data Science – Defining DevOps


I’m wading into treacherous waters here. Computing terms often defy explanation, especially newer ones. While “DevOps” (a blend of Development and Operations) has been around for a while, it’s still not as mature a term as, say, “Relational Database Management System (RDBMS)”. That term is well known, understood, and accepted. (It wasn’t when it came out.) Whatever definition I give you here will be contested – and I’m OK with that. Nothing brings out a good flame war like defining a new technical term.

Regardless of the danger, we have to define the terms we’re using. Andrew Shafer and Patrick Debois used the term first, from what I can tell, in 2008 at a conference on Agile – Agile being a newer term as well. In their talk they proposed breaking down the barriers between developers, operations, and other departments. Since then, the term DevOps has come to mean much more.

First, think about getting software in a user’s hands (or another system’s…er, hands). Thinking sequentially, the process looks something like this:

Design -> Infrastructure setup -> Code -> Build -> Test -> Package -> Release -> Monitor

With a few exceptions, that’s how software is done. Data Science usually sits somewhere in the Code phase. And in most cases, there are clearly defined boundaries for what gets done by whom. For instance, developers write the code after the business sends over requirements. The deployment team handles packaging and releasing. And the operations team (Ops) handles monitoring and updating. Maybe it’s a little different in your organization, but in general each team has an area it is responsible for. And that’s mostly all it focuses on.

We’re all busy. I barely have enough time in my day to write code and the commensurate documentation, much less think about other parts of the process.

But we have to. Imagine if the business owners at Equifax, when they requested the software, had said, “And remember, we need to build the right security into the software. And let’s make sure we have a plan for when things go wrong.” Imagine if the developers had included a patch-check for the frameworks they use to ensure everything was up to date. Imagine if the Ops team had cared that proper security testing was done way back in the development stage.
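As a rough illustration of what a developer-side patch-check might look like, here is a minimal Python sketch. It is not how any particular team does it. The function names, the package names, and the minimum versions are all hypothetical, and the version parsing is deliberately simplistic (it only compares the first three numeric components). A real project would more likely lean on a dedicated dependency or vulnerability scanner.

```python
import re
from importlib import metadata


def parse_version(text):
    """Reduce a version string to three integers, e.g. '2.31.0rc1' -> (2, 31, 0).

    Simplifying assumption: only the first three numeric parts matter.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def find_outdated(minimums, installed=None):
    """Return {package: (installed_version, required_version)} for every
    package that is missing or older than its required minimum.

    `minimums` maps package names to minimum version strings (hypothetical
    policy data). `installed` may be passed in for testing; when omitted,
    versions are looked up in the current Python environment.
    """
    outdated = {}
    for name, required in minimums.items():
        if installed is not None:
            current = installed.get(name)
        else:
            try:
                current = metadata.version(name)
            except metadata.PackageNotFoundError:
                current = None  # not installed at all
        if current is None or parse_version(current) < parse_version(required):
            outdated[name] = (current, required)
    return outdated


if __name__ == "__main__":
    # Hypothetical policy: package names and versions are made up.
    policy = {"webframework": "2.0.0", "parserlib": "1.5"}
    env = {"webframework": "1.9.3", "parserlib": "1.5.0"}
    for package, (have, need) in find_outdated(policy, installed=env).items():
        print(f"{package}: installed {have}, need at least {need}")
```

A check like this could run as part of the Build or Test stage, so an out-of-date framework stops the release instead of being discovered in production.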

And that’s my definition of DevOps. At its simplest, DevOps means that everyone involved in getting an application deployed and maintained thinks about all the phases that precede and follow their part of the solution. That means the developer needs to care about monitoring. Business owners need to care about security. Deployment teams need to care about testing. And everyone needs to talk, build the process into their tools, and follow processes that involve all phases of the release and maintenance of software solutions.
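To make that definition a little more concrete, here is a small Python sketch that assumes the sequential stage list from earlier in this post. The function name `phases_around` is hypothetical. For any one stage, it returns the phases that come before and after it, which is the list of things the owner of that stage should also be thinking about.

```python
# The sequential stages listed earlier in the post.
STAGES = [
    "Design",
    "Infrastructure setup",
    "Code",
    "Build",
    "Test",
    "Package",
    "Release",
    "Monitor",
]


def phases_around(stage):
    """Return (preceding, following) stage lists for the given stage."""
    position = STAGES.index(stage)  # raises ValueError for an unknown stage
    return STAGES[:position], STAGES[position + 1:]


if __name__ == "__main__":
    before, after = phases_around("Code")
    print("Someone working in Code should also care about:")
    print("  upstream:  ", ", ".join(before))
    print("  downstream:", ", ".join(after))
```

For a Data Scientist working in the Code phase, the downstream list includes Test, Release, and Monitor, which is exactly the point of the definition above.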

That also means DevOps isn’t a tool, or even a team – it’s a thought process. Sure, there are tools and teams that help implement it, but if only a few people are part of DevOps, then you don’t have DevOps.

In this series I’ll cover more about the intersection of DevOps and Data Science, and in particular the things you need to be careful about in implementing DevOps for Data Science. Use the references below to inform yourself, as a Data Scientist, about what DevOps is. I’ll show you how to integrate it into your projects as we go.

(Or just go with my definition for now - and ready yourself for the flaming.)
