Find Duplicates in Array in O(n) Time and O(n) Space using Python


Given an array of integers, return a new array containing only the duplicates.

The resulting array's values can be stored in any order.

Example:

Input: [2, 3, 1, 1, 4, 3, 2, 1]
Output: [2, 1, 3]
			

Note:

Your algorithm should run in O(n) time and use O(n) extra space.


Problem Definition

Given an array of integers, the task is to return a new array containing only the duplicate elements from the original array. The order of elements in the resulting array does not matter.

Input:

Output:

Constraints:

Example:

Input: [2, 3, 1, 1, 4, 3, 2, 1]
Output: [2, 1, 3]

Understanding the Problem

The core challenge of this problem is to identify and return the duplicate elements from the array efficiently. This problem is significant in various applications such as data cleaning, where duplicates need to be identified and handled.

Potential pitfalls include misunderstanding the requirement to return only duplicates and not all elements, and ensuring the solution meets the time and space complexity constraints.

Approach

To solve this problem, we need to think about how to efficiently track the occurrences of each element in the array. A naive solution might involve nested loops to compare each element with every other element, but this would result in O(n^2) time complexity, which is not optimal.

Naive Solution

The naive approach involves using two nested loops to compare each element with every other element to find duplicates. This approach is not optimal due to its O(n^2) time complexity.

Optimized Solution

An optimized solution involves using a hash map (dictionary in Python) to track the occurrences of each element. This allows us to achieve O(n) time complexity and O(n) space complexity.

Steps:

  1. Initialize an empty dictionary to keep track of the count of each element.
  2. Iterate through the array and update the count of each element in the dictionary.
  3. Iterate through the dictionary and collect elements that have a count greater than 1.
  4. Return the list of duplicates.

Algorithm

Here is a step-by-step breakdown of the optimized algorithm:

  1. Create an empty dictionary called element_count to store the count of each element.
  2. Iterate through each element in the input array.
  3. For each element, update its count in the element_count dictionary.
  4. After counting all elements, create a list called duplicates to store elements that appear more than once.
  5. Iterate through the element_count dictionary and add elements with a count greater than 1 to the duplicates list.
  6. Return the duplicates list.

Code Implementation

def find_duplicates(arr):
    # Dictionary to store the count of each element
    element_count = {}
    
    # Iterate through the array and count each element
    for num in arr:
        if num in element_count:
            element_count[num] += 1
        else:
            element_count[num] = 1
    
    # List to store the duplicates
    duplicates = []
    
    # Collect elements that have a count greater than 1
    for num, count in element_count.items():
        if count > 1:
            duplicates.append(num)
    
    return duplicates

# Example usage
input_array = [2, 3, 1, 1, 4, 3, 2, 1]
print(find_duplicates(input_array))  # Output: [2, 1, 3]

Complexity Analysis

The time complexity of the optimized solution is O(n) because we iterate through the array once to count the elements and then iterate through the dictionary to collect duplicates. The space complexity is O(n) due to the additional space used by the dictionary to store the counts.

Edge Cases

Potential edge cases include:

Examples:

Input: []
Output: []

Input: [1, 2, 3, 4]
Output: []

Input: [1, 1, 1, 1]
Output: [1]

Testing

To test the solution comprehensively, consider the following test cases:

Example Test Cases:

def test_find_duplicates():
    assert find_duplicates([]) == []
    assert find_duplicates([1, 2, 3, 4]) == []
    assert find_duplicates([1, 1, 1, 1]) == [1]
    assert find_duplicates([2, 3, 1, 1, 4, 3, 2, 1]) == [2, 1, 3]
    assert find_duplicates([5, 5, 5, 5, 5]) == [5]
    print("All test cases pass")

test_find_duplicates()

Thinking and Problem-Solving Tips

When approaching such problems, consider the following tips:

Conclusion

In this blog post, we discussed how to find duplicates in an array efficiently using a dictionary to track element counts. We covered the problem definition, approach, algorithm, code implementation, complexity analysis, edge cases, and testing. Understanding and solving such problems is crucial for improving problem-solving skills and preparing for coding interviews.

Additional Resources

For further reading and practice, consider the following resources: