How to implement feature flags and A/B testing

Finally, continuous integration, testing, and delivery are part of your organization’s engineering process. You understand the cloud and big data, and you’ve begun to introduce monitoring into your workflows. Your environments are built on Infrastructure as Code (IaC), and reliability seems to be significantly improved. Your engineering teams are getting comfortable with containers. But you know you’ve only just begun.

You’ve taken the first big leap, but the road ahead offers even more opportunities, such as:

  • Faster innovation
  • Faster delivery
  • Ability to react to feedback or change
  • Continuous delivery of value
  • Ability to ask your customers what they want

One way to keep up the pace is to use feature flags, particularly for A/B testing.

This first part of the series is an introduction. In subsequent parts, we’ll focus on the “what” and “how” of our feature flag and A/B testing implementation.


What are feature flags and A/B testing?

A feature flag (or feature toggle) allows you to turn features (sub-sections) of your application on or off at a moment’s notice. (Read more about feature flags here: Feature flags, Toggles, Controls.)

It’s that simple. With a small change, a feature flag lets you transform your delivery process: you can gather customer feedback, test new mechanisms with less engineering impact, and release software faster, with less risk and greater control over who has access to a feature.
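
As a minimal sketch of the idea, a flag can be as small as a lookup that guards one code path. The flag name `new_navigation`, the `FLAGS` table, and both functions are hypothetical names chosen for illustration; a real system would usually read flag state from configuration or a flag service rather than from a module-level dictionary.

```python
# Hypothetical in-memory flag table. A real system would load this from
# configuration or a flag service so it can change without a redeploy.
FLAGS = {"new_navigation": False}


def is_enabled(flag_name: str) -> bool:
    """Return the flag's state; unknown flags default to off."""
    return FLAGS.get(flag_name, False)


def render_navigation() -> str:
    """Pick a code path based on the flag instead of on a deployment."""
    if is_enabled("new_navigation"):
        return "new navigation pane"
    return "classic navigation pane"


print(render_navigation())      # prints "classic navigation pane"
FLAGS["new_navigation"] = True  # flip the flag at runtime
print(render_navigation())      # prints "new navigation pane"
```

Defaulting unknown flags to off is a deliberate choice in this sketch: a typo in a flag name hides the feature rather than exposing it.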

A/B testing (sometimes called split testing) can be viewed as an experiment led by a hypothesis. You compare different versions by showing variants (let's call them A and B) to different users with common attributes, to determine which one performs better. For example, a stakeholder could ask:

  • Which one of two navigation pages will result in higher user satisfaction?
  • Do users prefer to have the navigation pane on the right (A) or left (B)?

Run A/B tests in production, using feature flags, to test the hypothesis. In this example, 80% of the users prefer the navigation pane on the left, so option (B) is more popular and wins!
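
One common way to split users between variants is to hash a stable user identifier, so that the same user always lands in the same variant. The sketch below assumes that approach; the function names, the experiment name, and the vote counts are all hypothetical, and the counts are made up only to mirror the 80% figure in the example above.

```python
import hashlib
from typing import Dict


def assign_variant(user_id: str, experiment: str, share_b: float = 0.5) -> str:
    """Deterministically place a user in variant "A" or "B"."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 4 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:4], "big") % 10000
    return "B" if bucket < share_b * 10000 else "A"


def preference_share(votes: Dict[str, int]) -> Dict[str, float]:
    """Turn raw per-variant counts into fractions of the total."""
    total = sum(votes.values())
    return {variant: count / total for variant, count in votes.items()}


# The same user gets the same variant on every call.
print(assign_variant("user-42", "nav-pane-position"))

# Illustrative counts only: 80 of 100 users preferred variant B.
print(preference_share({"A": 20, "B": 80}))  # {'A': 0.2, 'B': 0.8}
```

Including the experiment name in the hash keeps a user's bucket in one experiment independent of their bucket in another.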


Why should we care about feature flags (FF)?

Feature flags give you the power to reduce risk, iterate quicker, and get more control by separating feature rollout from code deployment.

Capabilities you can achieve with feature flag driven development:

  • Separation of feature rollout from code deployment
    • Spend less time addressing merge conflicts and refactoring old code
    • Spend more time delivering value to your users
  • A/B Testing
    • Get feedback from your users in production using experiments
  • Mitigate Risk
    • Ability to gradually reveal a feature (10%, 20%, ...)
    • Remove a feature without the need to re-deploy, rollback or hotfix
    • Use canary releases
  • Iterate more quickly
    • Wrap and deploy features, even if they are “half-baked”
  • Segmentation
    • Turn on a feature to a subset of users
    • Plan management, for example, community, normal, or premium
    • Allow users to opt-in to experience the latest features
    • Test in production
    • Control culturally dependent features
  • Collaboration
    • Branching in code enables teams to work on the code mainline instead of creating separate feature branches.
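
Two of the capabilities above, gradual reveal and segmentation, can be sketched as a single rule check. Everything here is an assumption made for illustration: the `FlagRule` fields, the plan names (taken from the plan management example in the list), and the decision that an explicit opt-in overrides the plan restriction.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FlagRule:
    """Hypothetical targeting rule for one feature flag."""
    name: str
    rollout_percent: int = 0  # 0..100, share of eligible users who see it
    allowed_plans: Set[str] = field(default_factory=set)  # empty = any plan
    opted_in_users: Set[str] = field(default_factory=set)


def bucket_for(flag_name: str, user_id: str) -> int:
    """Stable bucket in [0, 100) derived from the flag and user."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % 100


def is_enabled(rule: FlagRule, user_id: str, plan: str) -> bool:
    # Assumption: users who opted in always see the feature.
    if user_id in rule.opted_in_users:
        return True
    # Segmentation: restrict the feature to certain plans.
    if rule.allowed_plans and plan not in rule.allowed_plans:
        return False
    # Gradual reveal: raising rollout_percent only ever adds users.
    return bucket_for(rule.name, user_id) < rule.rollout_percent


rule = FlagRule("new_dashboard", rollout_percent=10, allowed_plans={"premium"})
print(is_enabled(rule, "user-42", "premium"))
rule.rollout_percent = 100  # full rollout, still without a redeploy
print(is_enabled(rule, "user-42", "premium"))  # True
```

Because each user's bucket is stable, moving from 10% to 20% keeps the feature on for everyone who already had it, which avoids features flickering on and off between stages.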

Here are a few solutions to help you implement your first feature flag (listed in no specific order)


Case studies

Microsoft

Have you ever wondered how the Visual Studio Team Services teams are able to roll out tons of new features every 3 weeks? In this detailed article, Buck Hodges (Director of Engineering for Visual Studio Team Services) shares insight into these topics:

  • goals to decouple deployment and exposure
  • implementing feature flags (with lots of code samples)
  • creating a staged roll-out process

Facebook

A couple of years ago, Facebook came up with an in-house implementation of feature flags. Facebook calls their tool “Gatekeeper” because it controls consumer access to each new feature.

Other examples

Flickr, Twitter, and Instagram


Feature flags come at a cost!

  • Technical debt in code
    Feature flags add complexity to your code. You’ll need a robust engineering process and mature life-cycle management, with clear policies, conventions, and flag cleanup.
  • Cultural shift
    Feature flags and A/B testing involve a cultural transition that affects all parties in the application lifecycle management (ALM) process and beyond.
  • FF at scale
    Feature flags become difficult to manage on an enterprise scale. It’s easy to manage one feature flag by modifying a configuration file. Tracking and synchronizing multiple feature flags can be challenging.
  • Performance and scale in mind
    Poor feature flag implementations can introduce a performance penalty. Consider an in-memory store, such as Redis, for your flag state and user data, instead of configuration files and traditional databases.
  • Building vs. buying
    Companies that have built internal feature flagging tools (e.g. Microsoft, Google, Facebook, and Flickr) have dedicated large teams of engineers and DevOps experts to build and maintain the platform. See building vs. buying to make the right choice for your organization.
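
To illustrate the performance point above, here is a minimal sketch of a process-local cache placed in front of a slower flag lookup. It does not use Redis itself: the `CachedFlagStore` class, the `slow_lookup` function, and the 30-second lifetime are hypothetical, and the backing lookup could be any store, including Redis or a database.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedFlagStore:
    """Hypothetical wrapper that remembers flag values for a short time."""

    def __init__(
        self,
        load_flag: Callable[[str], bool],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_flag = load_flag  # the slow, authoritative lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}  # name -> (value, expiry)

    def is_enabled(self, name: str) -> bool:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, skip the slow lookup
        value = self._load_flag(name)
        self._cache[name] = (value, now + self._ttl)
        return value


lookups: List[str] = []


def slow_lookup(name: str) -> bool:
    """Stand-in for a database or network call; records each invocation."""
    lookups.append(name)
    return name == "new_navigation"


store = CachedFlagStore(slow_lookup, ttl_seconds=30.0)
store.is_enabled("new_navigation")
store.is_enabled("new_navigation")
print(len(lookups))  # 1: the second check was served from the cache
```

The trade-off in this sketch is freshness: a flag change can take up to `ttl_seconds` to reach each process, so the lifetime should be short enough that turning a feature off still counts as "a moment’s notice".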

Where do we stand with feature flags and A/B testing?

We’ve come to the conclusion that we need to invest in feature flags and A/B testing to be able to react to feedback and change, continuously deliver value, and determine user preferences through controlled experiments.

However, based on our research and experience of rolling a custom solution, we recommend that you explore a SaaS solution. We, for example, cannot afford to develop and especially maintain a custom solution. It’s not a scenario that is suited to our community and volunteer driven program.

Watch this space for part 2 of our A/B testing investigations, and good luck with your DevOps journey!

Resources

Abhishek Tiwari blog; Buck Hodges blog; DZone; Feature Flags, Toggles, Controls; James Mckay blog; LaunchDarkly; Martin Fowler; and Optimizely.