Merge Sort in Python with O(n log n) Time Complexity


In this video lesson we will study how the Merge Sort algorithm works, we will analyze its time and space complexities and then we will implement it:


Problem - Merge Sort:

Given an array of integers nums, sort it in ascending order using Merge Sort

Example 1:

Input: nums = [3, 1, 3, 2, 5, 4]
Output: [1, 2, 3, 3, 4, 5]

Note:

Your algorithm should run in O(n log n) time and use O(n) extra space.


Understanding the Problem

The core challenge of this problem is to sort an array of integers efficiently. Merge Sort is a classic sorting algorithm that uses the divide-and-conquer approach to achieve this. It is significant because it guarantees a time complexity of O(n log n), making it suitable for large datasets. Common applications include sorting data in databases, file systems, and more.

Potential pitfalls include misunderstanding the recursive nature of the algorithm and mishandling the merging process, which can lead to incorrect sorting.

Approach

To solve this problem, we need to understand the Merge Sort algorithm:

Let's start with a naive approach:

Naive Solution: We could use a simple sorting algorithm like Bubble Sort, but it has a time complexity of O(n^2), which is not efficient for large arrays.

Optimized Solution: Merge Sort is more efficient with a time complexity of O(n log n). It divides the array into smaller subarrays, sorts them, and then merges them back together.

Algorithm

Here is a step-by-step breakdown of the Merge Sort algorithm:

  1. If the array has one or zero elements, it is already sorted. Return the array.
  2. Divide the array into two halves.
  3. Recursively sort each half.
  4. Merge the two sorted halves into a single sorted array.

Code Implementation

def merge_sort(nums):
    # Base case: if the array is of length 0 or 1, it is already sorted
    if len(nums) <= 1:
        return nums

    # Step 1: Divide the array into two halves
    mid = len(nums) // 2
    left_half = nums[:mid]
    right_half = nums[mid:]

    # Step 2: Recursively sort each half
    left_sorted = merge_sort(left_half)
    right_sorted = merge_sort(right_half)

    # Step 3: Merge the sorted halves
    return merge(left_sorted, right_sorted)

def merge(left, right):
    sorted_array = []
    i = j = 0

    # Merge the two sorted arrays
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            sorted_array.append(left[i])
            i += 1
        else:
            sorted_array.append(right[j])
            j += 1

    # Append any remaining elements from the left array
    while i < len(left):
        sorted_array.append(left[i])
        i += 1

    # Append any remaining elements from the right array
    while j < len(right):
        sorted_array.append(right[j])
        j += 1

    return sorted_array

# Example usage
nums = [3, 1, 3, 2, 5, 4]
sorted_nums = merge_sort(nums)
print(sorted_nums)  # Output: [1, 2, 3, 3, 4, 5]

Complexity Analysis

The time complexity of Merge Sort is O(n log n) because the array is divided in half log n times, and the merging process takes linear time. The space complexity is O(n) due to the additional arrays used during the merging process.

Edge Cases

Potential edge cases include:

Examples:

merge_sort([])  # Output: []
merge_sort([1])  # Output: [1]
merge_sort([2, 1, 2])  # Output: [1, 2, 2]

Testing

To test the solution comprehensively, consider a variety of test cases:

Using a testing framework like unittest in Python can help automate and organize these tests.

Thinking and Problem-Solving Tips

When approaching such problems, consider the following tips:

Conclusion

Merge Sort is a powerful and efficient sorting algorithm that guarantees a time complexity of O(n log n). Understanding and implementing this algorithm is crucial for solving sorting problems effectively. Practice and exploration of similar problems will help solidify your understanding and improve your problem-solving skills.

Additional Resources

For further reading and practice, consider the following resources: