November 8, 2023

Implementing Feature Flags for Safer Deployments

deployment, feature-flags, risk-management, tooling

Context

Our deployment process was all-or-nothing. New features went live to all users immediately upon deployment, making rollbacks disruptive and limiting our ability to test in production.

Decision

Implement a feature flag system using LaunchDarkly for gradual rollouts and instant kill switches.

Alternatives Considered

Build custom feature flag system

Pros

Full control over implementation
No external dependency
No per-seat licensing costs

Cons

Significant development effort
Need to build targeting, analytics, and a management UI
Maintenance burden on the team

Use LaunchDarkly

Pros

Battle-tested at scale
Rich targeting capabilities
Built-in analytics and experimentation
Good SDK support

Cons

Monthly cost (~$500/month for our scale)
External dependency
Data leaves our infrastructure

Use environment variables

Pros

Simple to implement
No external dependencies

Cons

Requires redeployment to change
No gradual rollout capability
No user targeting

Reasoning

The cost of LaunchDarkly is justified by the development time saved and the risk reduction from gradual rollouts. Building a comparable system in-house would take months and require ongoing maintenance. The ability to instantly disable problematic features without redeployment is invaluable.

The Problem

Our deployment anxiety was high:

Every deploy was a potential incident
Rollbacks required full redeployment (5-10 minutes)
No way to test features with a subset of users
The product team couldn’t run A/B tests

Feature Flag Strategy

We established patterns for flag usage:

Release Flags: Temporary flags for new features

if (flags.isEnabled('new-checkout-flow', user)) {
  return newCheckoutFlow();
}
return legacyCheckoutFlow();

Ops Flags: Permanent flags for operational control

if (flags.isEnabled('enable-cache', { service: 'api' })) {
  return cachedResponse();
}
// Flag off: bypass the cache (freshResponse is a placeholder name).
return freshResponse();

Experiment Flags: For A/B testing

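const variant = flags.getVariant('pricing-test', user);
// Fall back to a default page for an unrecognized variant ('control' is an assumed key).
return pricingPages[variant] ?? pricingPages.control;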
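
The three patterns above assume a `flags` object that exposes `isEnabled` and `getVariant`. One way such a wrapper could look is sketched below. It is not LaunchDarkly's SDK API: `createFlags`, `inMemoryProvider`, the `provider.evaluate` interface, the optional third argument to `getVariant`, and the `'control'` fallback name are all assumptions made for illustration. The idea it shows is that flag evaluation fails closed, so an unknown flag or a provider error leaves the feature off instead of breaking the request.

```javascript
// Hypothetical wrapper around a flag provider. `provider.evaluate` is an
// assumed interface for this sketch, not a LaunchDarkly SDK method.
function createFlags(provider) {
  function evaluate(key, context, fallback) {
    try {
      const value = provider.evaluate(key, context);
      return value === undefined ? fallback : value;
    } catch (err) {
      // Provider failure: use the fallback rather than fail the request.
      return fallback;
    }
  }

  return {
    // Boolean flags default to off.
    isEnabled: (key, context) => evaluate(key, context, false) === true,
    // Multivariate flags fall back to a caller-supplied variant name.
    getVariant: (key, context, fallback = 'control') =>
      evaluate(key, context, fallback),
  };
}

// In-memory stand-in provider for local development and tests.
function inMemoryProvider(values) {
  return {
    evaluate(key) {
      return values[key];
    },
  };
}

const flags = createFlags(inMemoryProvider({
  'new-checkout-flow': true,
  'pricing-test': 'annual-discount',
}));

console.log(flags.isEnabled('new-checkout-flow', { id: 'user-1' })); // true
console.log(flags.isEnabled('unknown-flag', { id: 'user-1' }));      // false
console.log(flags.getVariant('pricing-test', { id: 'user-1' }));     // annual-discount
```

Keeping the provider behind a small interface like this also makes it possible to swap in the in-memory stand-in during tests, so application code never depends on a network call to evaluate a flag.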

Rollout Process

New features now follow this process:

Deploy with flag disabled (0%)
Enable for internal users (dogfooding)
Enable for 1% of users, monitor
Gradually increase: 5% → 25% → 50% → 100%
Remove flag after feature is stable
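
To make the percentage stages concrete, here is a minimal sketch of how a gradual rollout can be decided deterministically. It is not LaunchDarkly's actual bucketing algorithm; the helper names (`fnv1a`, `bucketFor`, `isEnabledFor`) and the rule shape (`percentage`, `allowInternal`) are assumptions for illustration. Each user hashes to a stable bucket between 0 and 100, so a user who is included at 5% remains included at 25%, 50%, and 100%.

```javascript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Map a (flag, user) pair to a stable bucket in [0, 100).
// Including the flag key means each flag gets a different slice of users.
function bucketFor(flagKey, userId) {
  return (fnv1a(`${flagKey}:${userId}`) % 10000) / 100;
}

// rule: { percentage: number, allowInternal: boolean } (assumed shape).
function isEnabledFor(flagKey, user, rule) {
  // Dogfooding stage: internal users can be enabled regardless of percentage.
  if (rule.allowInternal && user.internal) return true;
  return bucketFor(flagKey, user.id) < rule.percentage;
}

// Walk the rollout stages for a sample of external users.
const stages = [0, 1, 5, 25, 50, 100];
for (const percentage of stages) {
  let enabled = 0;
  for (let i = 0; i < 10000; i++) {
    const user = { id: `user-${i}`, internal: false };
    if (isEnabledFor('new-checkout-flow', user, { percentage, allowInternal: true })) {
      enabled++;
    }
  }
  console.log(`${percentage}% stage: ${enabled} of 10000 sample users enabled`);
}
```

Because the bucket depends only on the flag key and user ID, raising the percentage only ever adds users, and setting it back to 0 acts as the kill switch for everyone except internal users.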

Results

Deployment frequency: 3x increase (less fear)
Incident recovery time: 90% reduction (instant kill switch)
A/B tests run: 12 in first quarter (previously 0)
Developer confidence: Significantly improved

The roughly $500/month cost has paid for itself many times over in reduced incident impact and faster iteration.

Lessons Learned

Flag hygiene matters: We schedule flag cleanup to avoid technical debt
Default to off: New flags should be disabled by default
Document flag purpose: Every flag needs an owner and expiration date
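
As an illustration of the hygiene points above, the sketch below keeps a small registry of flags and reports any that are missing an owner, missing an expiration date, or past it. The registry shape, the team names, the dates, and the `findFlagProblems` helper are all hypothetical. The sketch also assumes that permanent ops flags are exempt from the expiration requirement.

```javascript
// Hypothetical flag registry; field names and values are illustrative only.
const flagRegistry = [
  { key: 'new-checkout-flow', type: 'release', owner: 'checkout-team', expires: '2024-01-31' },
  { key: 'enable-cache', type: 'ops', owner: 'platform-team', expires: null },
  { key: 'pricing-test', type: 'experiment', owner: 'growth-team', expires: '2023-12-15' },
];

// Return a list of human-readable problems for flags that need attention.
function findFlagProblems(registry, today) {
  const problems = [];
  for (const flag of registry) {
    if (!flag.owner) {
      problems.push(`${flag.key}: missing owner`);
    }
    // Assumption: ops flags are permanent, so only temporary flags need a date.
    if (flag.type !== 'ops' && !flag.expires) {
      problems.push(`${flag.key}: missing expiration date`);
    }
    if (flag.expires && new Date(flag.expires) < today) {
      problems.push(`${flag.key}: expired on ${flag.expires}`);
    }
  }
  return problems;
}

console.log(findFlagProblems(flagRegistry, new Date('2024-01-01')));
// [ 'pricing-test: expired on 2023-12-15' ]
```

A check like this could run in CI or on a schedule, so that expired flags surface as cleanup tasks before they accumulate.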