Your feature flag system has 347 toggles. Twelve engineers can explain what they do. None of them work here anymore.

Feature flags start as a deployment safety net—ship code dark, enable progressively, roll back instantly. Six months later, you’re debugging why the checkout flow behaves differently for users whose account IDs are divisible by seven, and the flag controlling it references a Jira ticket from 2023 that nobody can access.

The problem isn’t feature flags. The problem is treating them like configuration instead of code with a lifecycle.

The Four Flag Types You Actually Need

Most flag systems become garbage dumps because teams don’t differentiate between fundamentally different toggle types. Each type has different longevity, different ownership, and different cleanup expectations.

Release flags wrap incomplete features. They exist to decouple deployment from release. Expected lifespan: days to weeks. These should be the most common and the most aggressively cleaned up.

Experiment flags support A/B tests and gradual rollouts. They live until statistical significance or full rollout. Expected lifespan: weeks to months. These require telemetry integration—if you can’t measure the flag’s impact, you can’t decide when to remove it.

Ops flags provide circuit breakers for external dependencies or expensive operations. They’re operational levers, not feature gates. Expected lifespan: indefinite, but reviewed quarterly. These are the only flags that should live forever, and there should be fewer than ten of them.

Permission flags gate features by customer tier or entitlement. Expected lifespan: until the business model changes. These belong in your authorization system, not your feature flag platform.

If you can’t categorize a flag into one of these buckets in five seconds, you don’t understand what it does well enough to have created it.

Enforce Expiration at Creation Time

Every flag should have a mandatory expiration date set when it’s created. Not optional metadata. Not a nice-to-have. A hard requirement that blocks flag creation if missing.

Here’s a LaunchDarkly SDK wrapper that enforces this:

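// flag-service.ts
import * as ld from '@launchdarkly/node-server-sdk';

interface FlagMetadata {
  type: 'release' | 'experiment' | 'ops' | 'permission';
  expiresAt: Date;
  owner: string;
  ticket: string;
}

class ManagedFlagService {
  private client: ld.LDClient;
  // In-memory copy of flag metadata; the source of truth is your database.
  private metadata = new Map<string, FlagMetadata>();

  constructor(sdkKey: string) {
    // The server SDK evaluates flags; the flags themselves are created
    // in LaunchDarkly's dashboard or through its REST API.
    this.client = ld.init(sdkKey);
  }

  async createFlag(key: string, meta: FlagMetadata): Promise<void> {
    // Block registration outright if the expiration date is missing or invalid.
    if (!(meta.expiresAt instanceof Date) || Number.isNaN(meta.expiresAt.getTime())) {
      throw new Error(`Flag "${key}" must have a valid expiration date`);
    }

    if (meta.type === 'release' && this.daysUntil(meta.expiresAt) > 90) {
      throw new Error('Release flags cannot exist longer than 90 days');
    }

    if (meta.type === 'experiment' && this.daysUntil(meta.expiresAt) > 180) {
      throw new Error('Experiment flags cannot exist longer than 180 days');
    }

    // Store metadata in your database, not just in LaunchDarkly
    await this.storeMetadata(key, meta);
  }

  // Returns the keys of non-ops flags whose expiration date has passed.
  async checkExpired(): Promise<string[]> {
    const now = Date.now();
    const expired: string[] = [];
    for (const [key, meta] of this.metadata) {
      if (meta.type !== 'ops' && now > meta.expiresAt.getTime()) {
        expired.push(key);
      }
    }
    return expired;
  }

  private async storeMetadata(key: string, meta: FlagMetadata): Promise<void> {
    // Replace with a write to your metadata table; this version only caches it.
    this.metadata.set(key, meta);
  }

  private daysUntil(date: Date): number {
    return Math.ceil((date.getTime() - Date.now()) / (1000 * 60 * 60 * 24));
  }
}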

Run checkExpired() in CI. Fail the build if any non-ops flags are past their expiration date. Make flag cleanup a deployment blocker, not a backlog grooming exercise that never happens.
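Here is a minimal sketch of that CI gate. It is written as a standalone check rather than against the service class, so it can run on its own. `FlagRecord`, `expiredNonOpsFlags`, and `ciExitCode` are hypothetical names. Loading the records from your metadata store is left out.

```typescript
// Hypothetical shape of one row of flag metadata.
type FlagType = 'release' | 'experiment' | 'ops' | 'permission';

interface FlagRecord {
  key: string;
  type: FlagType;
  expiresAt: Date;
}

// Keys of non-ops flags whose expiration date is earlier than `now`.
function expiredNonOpsFlags(flags: FlagRecord[], now: Date): string[] {
  return flags
    .filter((f) => f.type !== 'ops' && now.getTime() > f.expiresAt.getTime())
    .map((f) => f.key);
}

// Returns a process exit code: 0 when clean, 1 when any flag is past expiration.
function ciExitCode(flags: FlagRecord[], now: Date = new Date()): number {
  const expired = expiredNonOpsFlags(flags, now);
  for (const key of expired) {
    console.error(`Flag "${key}" is past its expiration date; remove it before merging.`);
  }
  return expired.length > 0 ? 1 : 0;
}

// In a CI entry point (loadFlags is a hypothetical loader):
//   process.exitCode = ciExitCode(await loadFlags());
```

A non-zero exit code is all most CI systems need to mark the step as failed. Nothing else about the pipeline has to change.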

Stale Flags Break More Than You Think

Unmaintained flags create four categories of pain, and none of them are obvious until you’re bleeding:

Combinatorial explosion. Five boolean flags create 32 possible states. Twenty flags create over a million. You cannot test a million states. You’re shipping code paths that have never executed. When they do execute—in production, naturally—they fail in ways you never imagined because you never imagined they’d run.
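Here is the arithmetic behind those numbers. It assumes every flag is an independent on/off toggle; multivariate flags make the count worse. `flagStateCount` and `stateCountDigits` are illustrative helpers. The second shows that the 347 flags from the opening allow a state count with 105 decimal digits.

```typescript
// Number of on/off combinations for n independent boolean flags: 2^n.
function flagStateCount(flagCount: number): number {
  return 2 ** flagCount;
}

// Decimal digits in 2^n, computed as floor(n * log10(2)) + 1.
// This is useful once 2^n is too large to read.
function stateCountDigits(flagCount: number): number {
  return Math.floor(flagCount * Math.log10(2)) + 1;
}

console.log(flagStateCount(5));     // 32
console.log(flagStateCount(20));    // 1048576
console.log(stateCountDigits(347)); // 105
```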

Performance degradation. Every flag evaluation is a conditional branch. Modern feature flag SDKs are fast, but “fast” times a thousand flags times a million requests is not fast. We’ve seen P95 latency improve by 40ms after removing 200 dead flags from a hot path. That’s more than two full frames in a 60fps interface, where each frame gets about 16.7ms.

Onboarding friction. New engineers spend their first month asking “what does this flag do?” If the answer is “probably nothing, but we’re afraid to remove it,” you’ve announced that your codebase is a minefield and nobody knows where the mines are.

Incident complexity. During outages, every flag is a suspect. Stale flags muddy the water. We’ve watched teams waste hours during SEV-1s investigating flags that hadn’t changed state in eighteen months because nobody knew they were inert.

The Flag Cleanup Pipeline

Cleaning up flags cannot be someone’s side project. It must be automated, visible, and enforced.

Step one: Weekly automated reports of flags approaching expiration, sent to the owner specified in metadata. Not to a team channel where it gets ignored. To the individual who created it.
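One way to build that report is sketched below, under three assumptions: a 30-day lookahead window (the text doesn't fix one), a `FlagRecord` shape with an `owner` field, and delivery handled elsewhere. Grouping by owner means each person gets one message that lists only the flags they are accountable for.

```typescript
// Hypothetical metadata shape; only the fields the report needs.
interface FlagRecord {
  key: string;
  owner: string;
  expiresAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Maps each owner to the keys of their flags that expire within `windowDays`.
// Flags that have already expired are left to the expiration check.
function weeklyReportByOwner(
  flags: FlagRecord[],
  now: Date,
  windowDays = 30,
): Map<string, string[]> {
  const report = new Map<string, string[]>();
  const cutoff = now.getTime() + windowDays * MS_PER_DAY;
  for (const flag of flags) {
    const expires = flag.expiresAt.getTime();
    if (expires > now.getTime() && expires <= cutoff) {
      const keys = report.get(flag.owner) ?? [];
      keys.push(flag.key);
      report.set(flag.owner, keys);
    }
  }
  return report;
}
```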

Step two: Two weeks before expiration, the flag is automatically set to 100% rollout (or 0%, depending on default state). If nothing breaks, the flag is doing nothing. If something breaks, you just discovered untested code paths.

Step three: One week before expiration, a pull request is auto-generated to remove the flag and all associated conditionals. The PR is assigned to the owner. If they don’t respond, it’s assigned to their manager.

Step four: On the expiration date, the flag is deleted from the feature flag platform. The code still references it, but every evaluation now returns the fallback value supplied at the call site. This will break something if the flag was still meaningful. That’s the point.
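The four steps reduce to a function of how many days a flag has left. This sketch uses the thresholds from the text: 14 days, 7 days, and expiration. The `CleanupStage` names are hypothetical. The stages are cumulative, so a flag in the removal-PR stage has already had its rollout forced.

```typescript
// Hypothetical names for the cleanup steps described above.
type CleanupStage = 'report' | 'force-rollout' | 'open-removal-pr' | 'delete';

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Decides which step a scheduled job should apply to a flag today.
function cleanupStage(expiresAt: Date, now: Date): CleanupStage {
  const daysLeft = (expiresAt.getTime() - now.getTime()) / MS_PER_DAY;
  if (daysLeft <= 0) return 'delete';           // step four
  if (daysLeft <= 7) return 'open-removal-pr';  // step three
  if (daysLeft <= 14) return 'force-rollout';   // step two
  return 'report';                              // step one: weekly report only
}
```

Because a job can rerun this every day, the actions it triggers should be idempotent. For example, it should not open a second removal PR for the same flag.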

This sounds aggressive because it is. The alternative is 347 flags and nobody knows what they do.

Ops Flags Are Different: Treat Them Like Infrastructure

Ops flags—circuit breakers, kill switches, rate limit toggles—don’t expire. They’re operational infrastructure, and they should be managed like it.

Ops flags get quarterly reviews, not expiration dates. Each review answers three questions: Is this flag still necessary? Has it been toggled in the last 90 days? Is the runbook for toggling it up to date?

If a flag hasn’t been toggled in a year, it’s either protecting against a risk that no longer exists or it’s not tested. Either way, remove it or prove it works by testing the failure mode in a staging environment.
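Part of that review can be automated. In this sketch the `OpsFlag` fields (`lastToggledAt`, `runbookUpdatedAt`, `lastStagingTestAt`) are hypothetical. Treating a runbook edited within the last 90 days as "up to date" is also an assumption. Whether the flag is still necessary remains a human decision, so the function only produces findings for the reviewer.

```typescript
// Hypothetical review record for one ops flag; null means "never".
interface OpsFlag {
  key: string;
  lastToggledAt: Date | null;
  runbookUpdatedAt: Date | null;
  lastStagingTestAt: Date | null;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysSince(date: Date | null, now: Date): number {
  return date === null ? Infinity : (now.getTime() - date.getTime()) / MS_PER_DAY;
}

// Returns human-readable findings; an empty list means nothing to discuss.
function reviewOpsFlag(flag: OpsFlag, now: Date): string[] {
  const findings: string[] = [];
  const sinceToggle = daysSince(flag.lastToggledAt, now);
  if (sinceToggle > 90) {
    findings.push('not toggled in the last 90 days');
  }
  if (sinceToggle > 365 && daysSince(flag.lastStagingTestAt, now) > 90) {
    findings.push('not toggled in a year and not tested in staging this quarter: remove it or test the failure mode');
  }
  if (daysSince(flag.runbookUpdatedAt, now) > 90) {
    findings.push('runbook not reviewed this quarter');
  }
  return findings;
}
```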

What We Actually Do

At Champlin, our flag metadata lives in PostgreSQL, not just in LaunchDarkly’s dashboard. Every flag has an owner, a type, an expiration date, and a link to the originating PR. We run a nightly job that checks for expired flags and posts to Slack with owner mentions. We run a weekly job that attempts to delete flags that have been expired for more than seven days.
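The selection logic behind those two jobs might look like the sketch below. The `FlagRow` shape is hypothetical: it mirrors the columns just described, with `owner` assumed to hold a Slack user ID. The actual Slack post and LaunchDarkly deletion calls are omitted. `<@USER_ID>` is Slack's standard user-mention syntax. Ops flags are excluded on the assumption that they follow the quarterly review instead.

```typescript
// Hypothetical mirror of the flag metadata table.
interface FlagRow {
  key: string;
  type: 'release' | 'experiment' | 'ops' | 'permission';
  owner: string; // assumed to be a Slack user ID, e.g. "U012ABCDEF"
  expiresAt: Date;
  prUrl: string;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isExpired(row: FlagRow, now: Date): boolean {
  return row.type !== 'ops' && row.expiresAt.getTime() < now.getTime();
}

// Nightly job: one line per expired flag, mentioning its owner; null if none.
function nightlyExpiredMessage(rows: FlagRow[], now: Date): string | null {
  const lines = rows
    .filter((r) => isExpired(r, now))
    .map((r) => `- ${r.key} expired ${r.expiresAt.toISOString().slice(0, 10)}, owner <@${r.owner}>, PR ${r.prUrl}`);
  return lines.length > 0 ? lines.join('\n') : null;
}

// Weekly job: flags that have been expired for more than `graceDays` days.
function dueForDeletion(rows: FlagRow[], now: Date, graceDays = 7): string[] {
  const cutoff = now.getTime() - graceDays * MS_PER_DAY;
  return rows
    .filter((r) => r.type !== 'ops' && r.expiresAt.getTime() < cutoff)
    .map((r) => r.key);
}
```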

We’ve deleted 83 flags in the last six months. We currently have 31 active flags. Four are ops flags. The rest will be gone within 90 days, or we’ll know exactly why they’re not.

Toggle hell isn’t inevitable. It’s a choice you make every time you create a flag without a plan to delete it.

Start Small, Enforce Hard

You don’t need to retrofit this onto 300 existing flags tomorrow. Start with new flags only. Require metadata. Require expiration dates. Enforce them in CI. Let the old flags rot in place if you must, but stop creating new rot.
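Here is a sketch of "new flags only" enforcement. Flags created before a cutoff date are listed as legacy but never fail the build. Newer flags must carry an owner, a type, and an expiration date. The `createdAt` field and the cutoff itself are assumptions about your metadata.

```typescript
// Hypothetical flag record; metadata fields are optional because legacy flags lack them.
interface FlagInfo {
  key: string;
  createdAt: Date;
  owner?: string;
  type?: string;
  expiresAt?: Date;
}

interface PolicyResult {
  errors: string[]; // new flags missing metadata: fail the build
  legacy: string[]; // pre-cutoff flags: report only
}

function checkNewFlagsOnly(flags: FlagInfo[], cutoff: Date): PolicyResult {
  const result: PolicyResult = { errors: [], legacy: [] };
  for (const f of flags) {
    if (f.createdAt.getTime() < cutoff.getTime()) {
      result.legacy.push(f.key);
      continue;
    }
    const missing = (['owner', 'type', 'expiresAt'] as const).filter((field) => f[field] === undefined);
    if (missing.length > 0) {
      result.errors.push(`${f.key}: missing ${missing.join(', ')}`);
    }
  }
  return result;
}
```

The legacy list doubles as the inventory for the cleanup sprint later, so it is worth keeping in the CI output even though it never blocks a merge.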

In six months, your new flags will be cleaned up automatically, and your old flags will be obviously old. At that point, dedicate one sprint to the archaeology. You’ll find that most of those ancient flags can be deleted with zero impact because they’re already effectively dead.

Feature flags are a powerful deployment tool. They’re not a permanent configuration layer. Treat them like temporary scaffolding, not like load-bearing walls, and you’ll never end up in toggle hell wondering why checkout only works on Tuesdays.