Find Duplicates in Array in O(n) Time and O(n) Space using C++


Given an array of integers, return a new array containing only the duplicates.

The resulting array's values can be stored in any order.

Example:

Input: [2, 3, 1, 1, 4, 3, 2, 1]
Output: [2, 1, 3]
			

Note:

Your algorithm should run in O(n) time and use O(n) extra space.


Understanding the Problem

The core challenge of this problem is to identify all the duplicate elements in the array efficiently. This problem is significant in various applications such as data cleaning, where duplicates need to be identified and handled. A common pitfall is to use a nested loop approach, which results in O(n^2) time complexity, making it inefficient for large arrays.

Approach

To solve this problem efficiently, we can use a hash map (unordered_map in C++) to keep track of the frequency of each element. This allows us to identify duplicates in O(n) time. Here’s a step-by-step approach:

  1. Initialize an unordered_map to store the frequency of each element.
  2. Traverse the array and update the frequency of each element in the map.
  3. Traverse the map and collect elements that have a frequency greater than 1.

Let's discuss a naive approach first and then move to the optimized solution.

Naive Approach

The naive approach involves using two nested loops to compare each element with every other element, resulting in O(n^2) time complexity. This is not optimal for large arrays.

Optimized Approach

Using a hash map, we can achieve the desired O(n) time complexity. The hash map allows us to count the occurrences of each element in a single pass and then identify duplicates in another pass.

Algorithm

Here is a step-by-step breakdown of the optimized algorithm:

  1. Initialize an unordered_map to store the frequency of each element.
  2. Traverse the array and for each element, increment its count in the map.
  3. Initialize a result vector to store the duplicates.
  4. Traverse the map and add elements with a count greater than 1 to the result vector.

Code Implementation

#include <iostream>
#include <vector>
#include <unordered_map>

std::vector<int> findDuplicates(const std::vector<int>& nums) {
    // Step 1: Initialize an unordered_map to store the frequency of each element
    std::unordered_map<int, int> frequency;
    
    // Step 2: Traverse the array and update the frequency of each element
    for (int num : nums) {
        frequency[num]++;
    }
    
    // Step 3: Initialize a result vector to store the duplicates
    std::vector<int> result;
    
    // Step 4: Traverse the map and collect elements with a frequency greater than 1
    for (const auto& pair : frequency) {
        if (pair.second > 1) {
            result.push_back(pair.first);
        }
    }
    
    return result;
}

int main() {
    std::vector<int> nums = {2, 3, 1, 1, 4, 3, 2, 1};
    std::vector<int> duplicates = findDuplicates(nums);
    
    std::cout << "Duplicates: ";
    for (int num : duplicates) {
        std::cout << num << " ";
    }
    std::cout << std::endl;
    
    return 0;
}

Complexity Analysis

The time complexity of this approach is O(n) because we traverse the array once to build the frequency map and then traverse the map to collect duplicates. The space complexity is also O(n) due to the additional storage used by the hash map and the result vector.

Edge Cases

Consider the following edge cases:

Example:

Input: [1, 1, 1, 1]
Output: [1]

Testing

To test the solution comprehensively, consider the following test cases:

Thinking and Problem-Solving Tips

When approaching such problems, consider the following tips:

Conclusion

In this blog post, we discussed how to find duplicates in an array efficiently using a hash map. We covered the problem definition, approach, algorithm, code implementation, complexity analysis, edge cases, and testing. Understanding and solving such problems is crucial for improving problem-solving skills and preparing for coding interviews.

Additional Resources

For further reading and practice, consider the following resources: