DevOps for Data Science – Defining DevOps

I’m wading into treacherous waters here. Computing terms often defy explanation, especially newer ones. While “DevOps”, a blend of “Development” and “Operations”, has been around for a while, it’s still not as mature a term as, say, “Relational Database Management System (RDBMS)”. That term is well known, understood, and accepted. (It wasn’t when it came out.) Whatever definition I give you here will be contested – and I’m OK with that. Nothing brings out a good flame-war like defining a new technical term.

Regardless of the danger, we have to define the terms we’re using. From what I can tell, Andrew Shafer and Patrick Debois used the term first, in 2008 at a conference on Agile – Agile being a newer term as well. In their talk they argued for breaking down the barriers between developers, operations, and other departments. Since then, the term DevOps has come to mean much more.

First, think about getting software in a user’s hands (or another system’s…er, hands). Thinking sequentially, the process looks something like this:

 Design -> Infrastructure setup -> Code -> Build -> Test -> Package -> Release -> Monitor

With a few exceptions, that’s how software gets delivered. Data Science usually sits somewhere in the Code phase. And in most cases, there are clearly defined boundaries for what gets done by whom. For instance, developers write the code after the business sends over requirements. The deployment team handles packaging and releasing. And the operations team (Ops) handles monitoring and updating. Maybe it’s a little different in your organization, but in general each team has an area it is responsible for. And that’s mostly all it focuses on.

We’re all busy. I barely have enough time in my day to write code and the commensurate documentation, much less think about other parts of the process.

But we have to. Imagine if the business owners at Equifax, when they requested the software, had said “And remember, we need the right security built into the software from the start. And let’s make sure we have a plan for when things go wrong.” Imagine if the developers had included a patch-check for the frameworks they use, to ensure everything was up to date. Imagine if the Ops team had cared that proper security testing was done way back in the development stage.

And that’s my definition of DevOps. At its simplest, DevOps means getting all parties involved in deploying and maintaining an application to think about all the phases that follow and precede their part of the solution. That means the developer needs to care about monitoring. Business owners need to care about security. Deployment teams need to care about testing. And everyone needs to talk, build the process into their tools, and follow processes that involve all phases of the release and maintenance of software solutions.
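
To make the patch-check idea from the Equifax example a little more concrete, here is a minimal sketch in Python of what a developer might wire into a build. Everything in it is an assumption for illustration: the function names are mine, the package names ("webframework", "mathlib") are hypothetical, and the version comparison is deliberately simplistic (numeric, dot-separated versions only). A real project would more likely lean on a dedicated vulnerability scanner fed by a security advisory source.

```python
from importlib import metadata


def parse_version(version):
    """Turn '2.3.31' into (2, 3, 31).

    Keeps only the leading digits of each dot-separated piece, so
    '1.0rc1' becomes (1, 0); stops at a piece with no leading digits.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed, required):
    """True if the installed version is below the required one."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.3' and '2.3.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def installed_versions(names):
    """Look up the installed version of each named package."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed here, so there is nothing to patch
    return found


def find_outdated(installed, minimum_patched):
    """Return (name, installed, required) for every package below its minimum."""
    outdated = []
    for name, required in sorted(minimum_patched.items()):
        current = installed.get(name)
        if current is not None and is_older(current, required):
            outdated.append((name, current, required))
    return outdated


# Hypothetical data: in practice the minimums would come from whoever
# tracks security advisories for your organization.
MINIMUM_PATCHED = {"webframework": "2.3.32", "mathlib": "1.22.0"}
installed = {"webframework": "2.3.31", "mathlib": "1.26.4"}

for name, current, required in find_outdated(installed, MINIMUM_PATCHED):
    print(f"{name} {current} is older than patched release {required}")
```

In a build step you would swap the hard-coded `installed` dictionary for `installed_versions(MINIMUM_PATCHED)` and fail the build when the list of outdated packages is not empty. The code matters less than the fact that the developer, the security folks, and Ops all agreed the check belongs there.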

That also means DevOps isn’t a tool, or even a team – it’s a thought process. Sure, there are tools and teams that help implement it, but if only a few people are part of DevOps, then you don’t have DevOps.

In this series I’ll cover more about the intersection of DevOps and Data Science, and in particular the things you need to be careful about in implementing DevOps for Data Science. Use the references below to inform yourself, as a Data Scientist, about what DevOps is. I’ll show you how to integrate it into your projects as we go.

(Or just go with my definition for now - and ready yourself for the flaming.)

Comments (1)
  1. Terry McCann says:

    Applying DevOps (or if you really want to start a flame war, DataOps) to machine learning is a personal interest of mine and the focus of my Masters thesis. I am interested in how your blog series will develop. I agree that DevOps is as much culture as it is anything else; however, you need the tools to make the DevOps process work. Does Microsoft have those tools today? For traditional software development, absolutely! For machine learning? They have some, but not all in my opinion (that is not to say other vendors do). There are problems in machine learning which might look like traditional software development; however, once you start trying to manage models in production you quickly find machine learning is not traditional software development. I would love to chat more at some point if you’re interested.
