Remove Duplicates from Array II in O(n log n) Time Complexity using Python


Given an array of integers, remove the duplicates in-place such that each unique element appears only once.

Do not allocate extra space for another array. You must do this by modifying the input array in-place with O(1) extra memory.

The resulting array can be in any order.

Example:

Input: [2, 3, 1, 1, 4, 3, -2, 1]
Output: [2, 3, 1, 4, -2]
			

Note:

Your algorithm should run in O(n log n) time and use O(1) extra space.


Understanding the Problem

The core challenge of this problem is to remove duplicates from an array in-place, meaning we cannot use extra space for another array. This is significant in scenarios where memory usage is critical. Common applications include data cleaning and preprocessing in data science.

Potential pitfalls include misunderstanding the in-place requirement and using extra space inadvertently.

Approach

To solve this problem, we can follow these steps:

  1. Sort the array. This ensures that duplicate elements are adjacent.
  2. Use a pointer to track the position of the next unique element.
  3. Traverse the sorted array, and for each unique element, place it at the position indicated by the pointer.
  4. Return the subarray up to the pointer's position.

Sorting the array takes O(n log n) time, and the traversal takes O(n) time, resulting in an overall time complexity of O(n log n).

Algorithm

Here is a step-by-step breakdown of the algorithm:

  1. Sort the array.
  2. Initialize a pointer p to 0.
  3. Traverse the array starting from the second element (index 1).
  4. If the current element is different from the previous element, increment the pointer and update the element at the pointer's position.
  5. Return the subarray from the start to the pointer's position.

Code Implementation

def remove_duplicates(nums):
    # Step 1: Sort the array
    nums.sort()
    
    # Step 2: Initialize the pointer
    p = 0
    
    # Step 3: Traverse the array
    for i in range(1, len(nums)):
        # Step 4: Check if the current element is different from the previous one
        if nums[i] != nums[i - 1]:
            p += 1
            nums[p] = nums[i]
    
    # Step 5: Return the subarray from the start to the pointer's position
    return nums[:p + 1]

# Example usage
input_array = [2, 3, 1, 1, 4, 3, -2, 1]
output_array = remove_duplicates(input_array)
print(output_array)  # Output: [2, 3, 1, 4, -2]

Complexity Analysis

The time complexity of this approach is O(n log n) due to the sorting step. The space complexity is O(1) as we are modifying the array in-place and not using any extra space.

Edge Cases

Consider the following edge cases:

These cases can be tested to ensure the robustness of the solution.

Testing

To test the solution comprehensively, consider the following test cases:

def test_remove_duplicates():
    assert remove_duplicates([]) == []
    assert remove_duplicates([1, 1, 1, 1]) == [1]
    assert remove_duplicates([1, 2, 3, 4]) == [1, 2, 3, 4]
    assert remove_duplicates([2, 3, 1, 1, 4, 3, -2, 1]) == [-2, 1, 2, 3, 4]
    print("All test cases pass")

test_remove_duplicates()

Thinking and Problem-Solving Tips

When approaching such problems, consider the following tips:

Conclusion

In this blog post, we discussed how to remove duplicates from an array in-place with O(n log n) time complexity using Python. We covered the problem definition, approach, algorithm, code implementation, complexity analysis, edge cases, and testing. Understanding and solving such problems is crucial for improving problem-solving skills and preparing for coding interviews.

Additional Resources

For further reading and practice, consider the following resources: