If you’ve been in software development for any length of time, you’ve likely heard the mantra “Don’t Repeat Yourself” (DRY) repeated so often that it’s become almost sacred. Coined by Andy Hunt and Dave Thomas in The Pragmatic Programmer, the principle states that “every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” It has guided countless developers toward more maintainable, less error-prone codebases.

But like all principles in software development, applying DRY without context or nuance can lead to problems that are worse than the duplication it was meant to solve. In this article, we’ll explore why code duplication isn’t always the cardinal sin it’s made out to be, and when it might actually be the better choice.

The Traditional Case Against Duplication

Before we dive into when duplication might be acceptable or even preferable, let’s understand why the DRY principle became so fundamental in the first place.

Maintenance Nightmares

When the same code exists in multiple places, any change or bug fix must be applied to all instances. Miss one, and you’ve introduced an inconsistency that can lead to subtle, hard-to-track bugs.

Consider this simple example: the same email-validation logic has been written three times, once in each part of your application that needs it:

// In user registration
function validateEmail(email) {
    const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    return regex.test(email);
}

// In password reset
function checkEmailFormat(email) {
    const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    return regex.test(email);
}

// In contact form
function isValidEmail(email) {
    const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    return regex.test(email);
}

If you discover that your regex doesn’t handle certain valid email formats correctly, you now need to update three different functions. Miss one, and users will have inconsistent experiences depending on which part of your application they’re using.
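For contrast, here is a minimal sketch of the DRY version, in which all three call sites delegate to one shared function. The call-site function names are hypothetical, and the regex is the same illustrative pattern as above, not a complete email validator.

```javascript
// The single, authoritative email check. This is the same illustrative
// pattern used above, not a full implementation of the email grammar.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidEmail(email) {
    return EMAIL_PATTERN.test(email);
}

// Hypothetical call sites: registration, password reset and the contact
// form all delegate to the same function instead of carrying a copy.
function canRegister(form) {
    return isValidEmail(form.email);
}

function canRequestPasswordReset(form) {
    return isValidEmail(form.email);
}

function canSubmitContactForm(form) {
    return isValidEmail(form.email);
}
```

Fixing the regex now means editing a single line, and every flow picks up the change.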

Cognitive Load

Code duplication increases the amount of code a developer needs to understand. More code means more cognitive load, which can slow down development and increase the likelihood of errors.

Increased Testing Burden

Duplicated code means duplicated tests, or worse, untested code paths because developers assumed they were identical to tested paths elsewhere.

When Duplication Might Be Better Than the Alternative

Despite these compelling reasons to avoid duplication, there are situations where duplicating code can be the more pragmatic choice. Let’s explore when and why.

When Premature Abstraction Leads to Complexity

One of the most common problems with overzealous application of DRY is premature abstraction. Developers see two pieces of similar code and immediately try to unify them into a shared abstraction. But what if those two pieces of code, while syntactically similar today, represent different concepts that will evolve differently over time?

Consider this scenario: you have two functions that both retrieve user data, but for different purposes:

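// `database` stands in for your data-access layer. userId is passed as a
// bound parameter ($1; placeholder syntax varies by driver) instead of being
// interpolated into the SQL string, which would allow SQL injection.
function getUserForAuthentication(userId) {
    return database.query(`
        SELECT id, username, password_hash, login_attempts
        FROM users
        WHERE id = $1
    `, [userId]);
}

function getUserForProfileDisplay(userId) {
    return database.query(`
        SELECT id, username, full_name, bio, avatar_url
        FROM users
        WHERE id = $1
    `, [userId]);
}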

A developer focused on DRY might be tempted to combine these into a single function:

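function getUser(userId, purpose) {
    let fields;
    if (purpose === 'authentication') {
        fields = 'id, username, password_hash, login_attempts';
    } else if (purpose === 'profile') {
        fields = 'id, username, full_name, bio, avatar_url';
    } else {
        // Without this, an unknown purpose would produce "SELECT undefined"
        throw new Error(`Unknown purpose: ${purpose}`);
    }

    // fields only ever comes from the fixed strings above; userId stays bound
    return database.query(`
        SELECT ${fields}
        FROM users
        WHERE id = $1
    `, [userId]);
}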

This might seem cleaner at first glance, but it creates several problems:

  1. The function now has multiple responsibilities, violating the Single Responsibility Principle
  2. It introduces a new parameter that makes the function’s behavior less predictable
  3. Future changes to either use case will affect a shared function, increasing the risk of regressions
  4. The abstraction leaks implementation details (the caller needs to know about “purposes”)

In this case, keeping the functions separate with some duplication would be cleaner and safer than forcing an abstraction.

When the Cost of Coupling Exceeds the Benefit

Removing duplication often involves creating shared code that couples previously independent components. This coupling can create dependencies that make the system harder to change over time.

Consider a utility function shared between two modules:

// Shared utility
function formatCurrency(amount, currency) {
    return `${currency}${amount.toFixed(2)}`;
}

// Module A
function generateInvoice(items) {
    const total = items.reduce((sum, item) => sum + item.price, 0);
    return {
        items: items,
        total: formatCurrency(total, '$')
    };
}

// Module B
function displayProductPrice(product) {
    const element = document.createElement('div');
    element.textContent = formatCurrency(product.price, '$');
    return element;
}

This seems reasonable until Module A needs to change how it formats currency:

// Updated requirement for Module A
function generateInvoice(items) {
    const total = items.reduce((sum, item) => sum + item.price, 0);
    // Now needs to include thousand separators
    return {
        items: items,
        total: formatCurrency(total, '$', true)  // Added parameter
    };
}

Now we need to update the shared function:

function formatCurrency(amount, currency, useThousandSeparator = false) {
    if (useThousandSeparator) {
        return `${currency}${amount.toLocaleString('en-US', {minimumFractionDigits: 2, maximumFractionDigits: 2})}`;
    }
    return `${currency}${amount.toFixed(2)}`;
}

Module B never asked for thousand separators, yet it now depends on a function that changed to serve Module A, and the shared function is accumulating conditional logic. In some cases, separate implementations are cleaner:

// Module A
function formatInvoiceCurrency(amount) {
    return `$${amount.toLocaleString('en-US', {minimumFractionDigits: 2, maximumFractionDigits: 2})}`;
}

// Module B
function formatProductPrice(amount) {
    return `$${amount.toFixed(2)}`;
}

Yes, there’s duplication, but each module can evolve independently without affecting the other.
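To make that independence concrete, here is a sketch in which Module A’s formatter changes again, this time (as an assumed new requirement) to the built-in Intl.NumberFormat with the currency style, while Module B’s function is left untouched.

```javascript
// Module A: assumed new requirement, so the invoice formatter switches to
// Intl.NumberFormat, which supplies the symbol, separators and two decimals.
const invoiceFormatter = new Intl.NumberFormat('en-US', {
    style: 'currency',
    currency: 'USD',
});

function formatInvoiceCurrency(amount) {
    return invoiceFormatter.format(amount);
}

// Module B: unchanged, and unaffected by Module A's change.
function formatProductPrice(amount) {
    return `$${amount.toFixed(2)}`;
}

console.log(formatInvoiceCurrency(1234.5)); // "$1,234.50"
console.log(formatProductPrice(1234.5));    // "$1234.50"
```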

When Duplication Improves Readability

Sometimes, a little duplication can make code more readable and self-contained. This is particularly true for small pieces of code where the overhead of abstraction would make the code less clear.

// With duplication
function processUser(user) {
    // Capitalize the name for display
    const displayName = user.name.charAt(0).toUpperCase() + user.name.slice(1);
    
    // Rest of the function
}

function formatComment(comment) {
    // Capitalize the author name
    const authorName = comment.author.charAt(0).toUpperCase() + comment.author.slice(1);
    
    // Rest of the function
}

In this example, extracting the capitalization logic into a separate function might actually make the code less readable:

function capitalize(string) {
    return string.charAt(0).toUpperCase() + string.slice(1);
}

function processUser(user) {
    const displayName = capitalize(user.name);
    // Rest of the function
}

function formatComment(comment) {
    const authorName = capitalize(comment.author);
    // Rest of the function
}

For something as simple as capitalization, having the logic inline makes the function more self-contained and easier to understand at a glance, without needing to jump to another function definition.

When Duplication Is Temporary

Sometimes, duplication is a necessary step in the evolution of a codebase. When you first encounter similar code patterns, it might not be immediately clear what the right abstraction should be. In such cases, it can be better to allow some duplication until a clear pattern emerges.

Kent Beck, creator of Extreme Programming, has a relevant quote: “Make it work, make it right, make it fast.” In the context of duplication, we might say: “Make it work (possibly with duplication), make it right (refactor when the pattern is clear), make it fast (optimize if needed).”

Martin Fowler’s Refactoring describes a related heuristic, the “Rule of Three,” which he credits to Don Roberts: wait until you have three instances of similar code before abstracting, because the third instance gives you more information about what the abstraction should look like.

The WET Principle: Write Everything Twice

As a counterpoint to DRY, some developers advocate for the “Write Everything Twice” (WET) principle. This doesn’t mean you should intentionally duplicate code, but rather that you shouldn’t rush to abstract until you’ve seen a pattern repeated enough times to understand the proper abstraction.

The idea is that the second time you write similar code, you’ll have more context about how it’s used, which will inform a better abstraction if you need to write it a third time.

Practical Guidelines for Managing Duplication

Given that duplication isn’t always bad, how should you decide when to eliminate it and when to tolerate it? Here are some practical guidelines:

Consider the “Three Strikes” Rule

As mentioned earlier, consider waiting until you’ve seen the same pattern three times before abstracting. This gives you more information about what varies and what stays the same across different uses.

Evaluate the Type of Knowledge Being Duplicated

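Not all duplication is created equal. Ask whether two pieces of code encode the same piece of knowledge, such as a single business rule that must always be applied consistently, or merely look alike by coincidence. Duplicated knowledge is worth eliminating; coincidental similarity often is not.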
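A minimal sketch of the distinction, with hypothetical names and values: the password rule is one piece of knowledge, while the two rates merely happen to share a value today.

```javascript
// Same knowledge: a single (hypothetical) password policy used by both the
// signup and password-change flows. They must always agree, so the rule
// lives in exactly one place.
const MIN_PASSWORD_LENGTH = 12;

function meetsPasswordPolicy(password) {
    return password.length >= MIN_PASSWORD_LENGTH;
}

// Coincidental similarity: both rates are 0.2 today, but one is a tax rate
// and the other a marketing decision. Two separate constants let either
// change without silently changing the other.
const SALES_TAX_RATE = 0.2;
const LOYALTY_DISCOUNT_RATE = 0.2;

function salesTaxFor(amount) {
    return amount * SALES_TAX_RATE;
}

function loyaltyDiscountFor(amount) {
    return amount * LOYALTY_DISCOUNT_RATE;
}
```

Merging the two rates into one constant would remove visible duplication while coupling two unrelated decisions.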

Consider Future Change Patterns

Ask yourself: “Are these similar pieces of code likely to change together, or evolve separately?” If they’re likely to change together, eliminating duplication is more important. If they’re likely to evolve independently, keeping them separate might be better.

Balance Abstraction vs. Duplication

Remember that abstractions have costs too. They can introduce indirection, increase cognitive load for understanding the codebase, and create coupling. The benefits of eliminating duplication must outweigh these costs.

Use Comments to Highlight Intentional Duplication

If you decide to keep duplication, consider adding a comment explaining why. This prevents future developers (including your future self) from attempting to “fix” the duplication without understanding the rationale.

// Note: This is intentionally similar to the validation in UserController.
// We're keeping it separate because the validation rules for comments
// may diverge from user input validation in the future.
function validateCommentText(text) {
    return text.length > 0 && text.length <= 1000;
}

Real-World Examples of "Good" Duplication

Let's look at some real-world scenarios where duplication might be the better choice:

Microservices with Similar Functionality

In a microservices architecture, services are meant to be independent and deployable on their own. While you might have similar validation logic in multiple services, extracting this to a shared library creates coupling between services that might outweigh the benefits of deduplication.

For example, both an Order Service and a User Service might need to validate email addresses. Duplicating the validation code keeps the services truly independent, allowing them to evolve at different rates without coordinating library updates.

Feature Branches That Will Diverge

Sometimes you create similar features for different user segments that look almost identical initially but will evolve differently. For instance, checkout flows for retail customers versus business customers might start very similar but accumulate different business rules over time.

Keeping these implementations separate from the beginning, despite initial duplication, can prevent complex conditional logic and make the divergent evolution easier to manage.
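As a sketch of what that separation can look like, here are two checkout-total functions with invented rules (free shipping on retail orders of 50 or more, a 10% discount on business orders of at least 100 units). The subtotal calculation is deliberately duplicated so that each flow can change without the other.

```javascript
// Hypothetical retail rule: flat shipping fee of 5 unless the subtotal is 50 or more.
function retailCheckoutTotal(cart) {
    const subtotal = cart.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
    const shipping = subtotal >= 50 ? 0 : 5;
    return subtotal + shipping;
}

// Hypothetical business rule: 10% volume discount at 100 or more units.
// The subtotal line is intentionally duplicated from the retail flow.
function businessCheckoutTotal(cart) {
    const subtotal = cart.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
    const units = cart.items.reduce((sum, item) => sum + item.quantity, 0);
    const discount = units >= 100 ? subtotal * 0.1 : 0;
    return subtotal - discount;
}
```

A new business-only rule later touches only businessCheckoutTotal, with no `if (isBusiness)` branches appearing in shared code.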

Test Code

Test code often contains duplication in setup and assertions. While test helpers can reduce this, sometimes duplicating setup code in each test makes tests more readable and self-contained, as each test clearly shows all its preconditions without requiring the reader to look elsewhere.

// More duplication, but each test is self-contained
test('user can view their own profile', () => {
    const user = createUser({ name: 'Test User' });
    loginAs(user);
    const response = visitProfilePage(user.id);
    expect(response.status).toBe(200);
});

test('user cannot view another user\'s profile', () => {
    const user1 = createUser({ name: 'Test User 1' });
    const user2 = createUser({ name: 'Test User 2' });
    loginAs(user1);
    const response = visitProfilePage(user2.id);
    expect(response.status).toBe(403);
});

Generated Code

Code that's generated by tools often contains duplication. Since it's not maintained by hand, the usual downsides of duplication don't apply. Examples include code generated by ORMs, Protocol Buffers, or GraphQL type generators.

The Evolution of Thinking on Duplication

Software development thinking on duplication has evolved over time. Early on, when storage and memory were at a premium, eliminating duplication was partly motivated by efficiency concerns. As hardware constraints relaxed, the focus shifted to maintenance benefits.

More recently, nuanced views have emerged, recognizing that sometimes the pursuit of DRY can lead to premature or inappropriate abstractions that cause more harm than good. This evolution reflects a broader trend in software development toward pragmatism over dogmatism.

The Sandi Metz Perspective

Sandi Metz, author of "Practical Object-Oriented Design in Ruby," has a famous quote about duplication: "Duplication is far cheaper than the wrong abstraction."

Her point is that while duplication has a straightforward cost (you need to change code in multiple places), the wrong abstraction creates ongoing costs that compound over time, as developers try to bend and extend an abstraction that doesn't quite fit the problem.

According to Metz, if you're faced with an abstraction that no longer serves its purpose well, it's often better to "inline" the abstraction back to duplicated code, then create a new, more appropriate abstraction based on the current understanding of the problem.
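A minimal sketch of the first step of that refactoring, using a hypothetical formatName helper that has grown one option flag per caller: inline the helper into each caller and keep only the branch that caller actually used. Each inlined function returns exactly what its original call returned.

```javascript
// Before: one hypothetical helper bent to serve two callers via flags.
function formatName(person, { lastNameFirst = false, includeTitle = false } = {}) {
    const base = lastNameFirst
        ? `${person.last}, ${person.first}`
        : `${person.first} ${person.last}`;
    return includeTitle && person.title ? `${person.title} ${base}` : base;
}

// After inlining: each caller owns the logic it actually needs.
function mailingLabelName(person) {
    // Was: formatName(person, { includeTitle: true })
    const base = `${person.first} ${person.last}`;
    return person.title ? `${person.title} ${base}` : base;
}

function directoryEntryName(person) {
    // Was: formatName(person, { lastNameFirst: true })
    return `${person.last}, ${person.first}`;
}
```

Once every caller has been inlined, formatName can be deleted, and a new abstraction introduced only if a genuine common pattern reappears.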

Duplication in Different Programming Paradigms

How we think about duplication can vary based on the programming paradigm:

Object-Oriented Programming

In OOP, duplication is often addressed through inheritance, composition, and polymorphism. The risk is creating deep inheritance hierarchies or complex object graphs that are hard to understand and maintain.

Functional Programming

Functional programming tends to favor composition of small, pure functions. This can sometimes lead to less duplication naturally, as functions are designed to be reusable by default. However, the pursuit of purity can sometimes lead to complex abstractions that are harder to understand than simpler, slightly duplicated code.

Procedural Programming

In procedural programming, duplication is typically addressed through shared procedures and functions. The challenge can be managing global state that these shared procedures might depend on.

Tools and Techniques for Managing Necessary Duplication

When you do decide that some duplication is necessary, there are tools and techniques to help manage it:

Code Generation

For some types of duplication, code generation can be a solution. Rather than writing duplicate code by hand, you generate it from a single source of truth. This approach is common in interface definitions, database schemas, and cross-language compatibility layers.
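As a rough sketch, assuming a hypothetical list of field definitions as the single source of truth, a generator can emit one validator function per field as JavaScript source. In a real project the output would typically be written to a file marked as generated; here it is simply returned as a string and printed.

```javascript
// Hypothetical single source of truth: one entry per form field.
const fieldDefinitions = [
    { name: 'email', maxLength: 254 },
    { name: 'displayName', maxLength: 50 },
];

// Emit one validator per field. The generated functions are near-identical,
// but nobody edits them by hand: changes go into fieldDefinitions instead.
function generateValidators(definitions) {
    return definitions
        .map(({ name, maxLength }) => {
            const functionName = `validate${name.charAt(0).toUpperCase()}${name.slice(1)}`;
            return [
                `function ${functionName}(value) {`,
                `    return typeof value === 'string' && value.length > 0 && value.length <= ${maxLength};`,
                `}`,
            ].join('\n');
        })
        .join('\n\n');
}

const generatedSource = generateValidators(fieldDefinitions);
console.log(generatedSource);
```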

Explicit Documentation

When code is intentionally duplicated, document it. Comments explaining why the duplication exists and which other parts of the code contain similar logic can help future maintainers understand the design decisions.

Code Ownership

In some organizations, making specific teams or individuals responsible for related areas of code can help manage necessary duplication. The owners can coordinate changes across duplicated areas when needed.

Automated Checks

Tools like linters and static analyzers can be configured to detect and flag duplication that exceeds certain thresholds, prompting a review of whether the duplication is justified.

Case Study: React Components

Frontend development with React provides an interesting case study in the balance between duplication and abstraction.

Consider two similar but not identical UI components:

function UserCard({ user }) {
    return (
        <div className="card">
            <img src={user.avatar} alt={user.name} />
            <h3>{user.name}</h3>
            <p>{user.bio}</p>
            <button onClick={() => viewProfile(user.id)}>View Profile</button>
        </div>
    );
}

function ProductCard({ product }) {
    return (
        <div className="card">
            <img src={product.image} alt={product.name} />
            <h3>{product.name}</h3>
            <p>{product.description}</p>
            <button onClick={() => addToCart(product.id)}>Add to Cart</button>
        </div>
    );
}

These components have similar structure but different data and behavior. A DRY-focused approach might try to create a generic Card component:

function Card({ image, imageAlt, title, description, buttonText, onButtonClick }) {
    return (
        <div className="card">
            <img src={image} alt={imageAlt} />
            <h3>{title}</h3>
            <p>{description}</p>
            <button onClick={onButtonClick}>{buttonText}</button>
        </div>
    );
}

function UserCard({ user }) {
    return (
        <Card
            image={user.avatar}
            imageAlt={user.name}
            title={user.name}
            description={user.bio}
            buttonText="View Profile"
            onButtonClick={() => viewProfile(user.id)}
        />
    );
}

function ProductCard({ product }) {
    return (
        <Card
            image={product.image}
            imageAlt={product.name}
            title={product.name}
            description={product.description}
            buttonText="Add to Cart"
            onButtonClick={() => addToCart(product.id)}
        />
    );
}

This looks cleaner initially, but what happens when requirements change? Perhaps the UserCard needs to show the user's role, or the ProductCard needs to display the price and availability. The generic Card component would need to become increasingly complex to accommodate these differences, eventually defeating its purpose.

In React, it's often better to accept some duplication in component structure than to create overly generic components that try to handle too many cases. This is partly why composition is such a common React pattern: a simple card shell that renders whatever children it is given allows reuse of the shared layout without forcing a one-size-fits-all set of props.

Conclusion: Pragmatic Duplication

The DRY principle remains valuable, but like all principles in software development, it needs to be applied with judgment rather than dogmatically. Code duplication isn't always bad; sometimes it's the lesser evil compared to inappropriate abstractions, tight coupling, or overly complex code.

Here's a summary of when duplication might be acceptable or even preferable:

  1. When the duplicated code represents concepts that are likely to evolve differently
  2. When removing duplication would create coupling that limits future changes
  3. When the duplicated code is simple and the abstraction would be more complex
  4. When you don't yet have enough examples to see the right abstraction
  5. When the duplication exists in areas that change infrequently

The next time you encounter duplication in your codebase, resist the immediate urge to abstract it away. Instead, ask yourself whether the duplication is truly harmful, or whether it might actually be serving a purpose in keeping your code flexible, readable, and maintainable.

Remember Sandi Metz's wisdom: "Duplication is far cheaper than the wrong abstraction." Sometimes, a little duplication is the right choice for the long-term health of your codebase.

Further Reading