Hash Sets in Python


Hash Sets are collections of unique items, i.e., no element can be repeated.

Hash Sets were designed to give us a fast way to add unique values to a collection and to look them up.

The values in a Hash Set can be simple values like strings or integers, or more complex ones like tuples, as long as they are hashable. Mutable objects such as lists and dictionaries cannot be stored in a set.

In Python, the Hash Set is implemented as the set object.
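
As a small illustration of which values a set accepts, the sketch below uses only built-in types. Set elements must be hashable, so an immutable tuple is fine, while a mutable list is rejected with a TypeError. The variable names are made up for the example.

```python
# Hashable values (strings, integers, tuples) can be set elements.
valid = {'apple', 42, (1, 2)}
print(len(valid))  # 3

# Mutable values such as lists are unhashable and are rejected.
rejected = False
try:
    invalid = {[1, 2]}
except TypeError as error:
    rejected = True
    print('Cannot store a list in a set:', error)
```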


Creation:

You can create a new empty set by using set():

emptySet = set()

Creating an empty set takes O(1) time.


Constructing from iterable:

The set constructor also accepts an optional iterable object. If you pass an iterable object to the set constructor, all the unique elements from the object will be added to the new set:

elements = [1, '2', 'apple', 1]
mySet = set(elements)

print(mySet) # {1, '2', 'apple'} (the order may vary)

This takes O(n) time, where n is the number of elements in the iterable object.
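
Any iterable works here, not only lists. The following sketch (with illustrative variable names) builds a set from a string and uses a set to drop duplicates from a list. Note that converting back to a list does not guarantee the original order.

```python
# A string is an iterable of characters; repeated letters collapse.
letters = set('hello')
print(len(letters))  # 4 ('l' is stored only once)

# Deduplicating a list: convert to a set, then back to a list.
numbers = [3, 1, 3, 2, 1]
unique_numbers = list(set(numbers))
print(sorted(unique_numbers))  # [1, 2, 3]
```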


Initialization

Sets containing values can also be initialized by using curly braces:

dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

Keep in mind that curly braces can only be used to initialize a set containing values. Empty curly braces ({}) create an empty dictionary, not a set.
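
A quick way to see the difference is to inspect the types. This is a minimal check, with variable names chosen only for the example:

```python
empty_braces = {}    # an empty dictionary
empty_set = set()    # an empty set

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
```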


Adding elements:

To add an element to the set, you use the add() method:

mySet = {'a', 'b', 'c'}

# Adding new element:
mySet.add('d')

# Adding existing element:
mySet.add('a') # nothing happens

print(mySet) # {'a', 'b', 'c', 'd'}

If the element already exists, add() leaves the set unchanged and does not raise an error.

Adding an element to a set takes O(1) time on average.
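
One way to confirm this behavior is to watch the set's length. The sketch below also shows that add() modifies the set in place and returns None, so its result should not be assigned back to the variable. The names are illustrative.

```python
colors = {'red', 'green'}

colors.add('blue')            # new element: the set grows
print(len(colors))            # 3

result = colors.add('red')    # existing element: no change, no error
print(len(colors))            # 3
print(result)                 # None (add() does not return the set)
```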


Checking if a value exists:

To check if a set has a specific element, you use the in operator:

mySet = {'a', 'b', 'c'}

print('a' in mySet) # True

exist = 'z' in mySet
print(exist) # False

Checking if a value exists in a set takes O(1) time on average.
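
Fast membership checks are what make sets useful for tracking values already seen. As a sketch, the hypothetical helper has_duplicates() below (not part of Python or of this lesson's assignment) scans a list once and stops at the first repeated value.

```python
def has_duplicates(values):
    """Return True if any value appears more than once."""
    seen = set()
    for value in values:
        if value in seen:    # average O(1) lookup
            return True
        seen.add(value)      # average O(1) insertion
    return False

print(has_duplicates([1, 2, 3, 2]))  # True
print(has_duplicates(['a', 'b']))    # False
```

Because each lookup and insertion is O(1) on average, the whole scan is O(n) on average.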


Removing elements:

To delete a specified element from a set, you use the remove() method:

mySet = {'a', 'b', 'c', 'd'}

# Removing an existing element:
mySet.remove('a')

print(mySet) # {'b', 'c', 'd'}

# Removing a non-existent element:
mySet.remove('z') # raises KeyError

If the element is not in the set, remove() raises a KeyError. If you want a method that leaves the set unchanged when the element is not present, use discard():

mySet = {'a', 'b', 'c', 'd'}

# Discarding an existing element:
mySet.discard('a') # removes 'a'

# Discarding a non-existent element:
mySet.discard('z') # nothing happens

print(mySet) # {'b', 'c', 'd'}

Removing an element from a set takes O(1) time on average.
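
If a missing element needs special handling, remove() can be wrapped in try/except. The sketch below contrasts that with discard(). The set contents are made up for the example.

```python
tags = {'python', 'sql'}

# remove(): handle the missing element explicitly.
try:
    tags.remove('java')
except KeyError:
    print("'java' was not in the set")

# discard(): silently ignores a missing element.
tags.discard('java')

print(sorted(tags))  # ['python', 'sql']
```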


Iterating over the Set values:

If we want to iterate over values of the set, we can use the for loop along with the in operator:

mySet = {'b', 'c', 'c', 'd'}

for val in mySet:
    print(val)

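# This will print the following (the order may vary):
# b
# d
# c

Notice that the order can differ from the order in which the elements were written, and that the duplicate 'c' appears only once. A set is unordered: it does not remember insertion order, so you should not rely on any particular iteration order.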

Iterating over a set takes O(n) time.
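
When a predictable order matters, one common approach is to iterate over sorted(), which returns a new sorted list of the elements. This is a sketch that assumes the elements can be compared with each other; sorting costs O(n log n) rather than O(n).

```python
my_set = {'b', 'c', 'd'}

# sorted() returns a new list, so the output order is predictable.
for val in sorted(my_set):
    print(val)

# This prints:
# b
# c
# d
```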


Space Complexity

A set uses O(n) space, where n is the number of elements existing in the set.


Assignment
Follow the Coding Tutorial and let's play with some Hash Sets.


Hint
Look at the examples above if you get stuck.


Introduction

In this lesson, we will explore the concept of Hash Sets in Python. Hash Sets are a fundamental data structure that allows for the storage of unique items. They are particularly useful in scenarios where you need to ensure that no duplicate elements are present in a collection. Hash Sets are implemented in Python using the set object, which provides efficient methods for adding, removing, and checking for the existence of elements.

Understanding the Basics

Before diving into the more complex aspects of Hash Sets, it's important to understand their basic properties and operations. A Hash Set is an unordered collection of unique elements. This means that each element in the set must be distinct, and the order in which elements are stored is not guaranteed.

Let's start with a simple example:

# Creating an empty set
emptySet = set()

# Creating a set from an iterable
elements = [1, '2', 'apple', 1]
mySet = set(elements)

print(mySet) # Output: {1, '2', 'apple'}

In this example, we create an empty set and a set from a list of elements. Notice that the duplicate element 1 is only included once in the set.

Main Concepts

Now that we have a basic understanding of Hash Sets, let's delve into some key concepts and techniques:

  • Creation: You can create a set using the set() constructor or by writing initial values inside curly braces, such as {1, 2}. Empty braces {} create a dictionary, not a set.
  • Adding Elements: Use the add() method to add elements to a set. If the element already exists, the set remains unchanged.
  • Checking Existence: Use the in operator to check if an element exists in the set.
  • Removing Elements: Use the remove() method to remove an element. If the element does not exist, a KeyError is raised. Alternatively, use discard() to avoid the error.
  • Iteration: Use a for loop to iterate over the elements of a set.

Examples and Use Cases

Let's look at some examples to see how these concepts are applied in various contexts:

# Example 1: Adding and checking elements
mySet = {'a', 'b', 'c'}
mySet.add('d')
print('a' in mySet) # Output: True
print('z' in mySet) # Output: False

# Example 2: Removing elements
mySet.remove('a')
print(mySet) # Output: {'b', 'c', 'd'}
mySet.discard('z') # No error raised

# Example 3: Iterating over a set
for val in mySet:
    print(val)

These examples demonstrate how to add, check, remove, and iterate over elements in a set.

Common Pitfalls and Best Practices

When working with Hash Sets, there are some common mistakes to avoid and best practices to follow:

  • Don't Expect Duplicates to Be Kept: Sets store each element only once. Adding a duplicate element has no effect.
  • Use discard() for Safe Removal: When it is acceptable for the element to be absent, use discard() instead of remove() to avoid a KeyError.
  • Iterate Carefully: Use a for loop to iterate over set elements, but don't rely on the iteration order, and don't add or remove elements while looping over the same set.
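
One pitfall worth a concrete example: changing the size of a set while looping over it directly can raise a RuntimeError. A common workaround, sketched below with illustrative names, is to loop over a copy and modify the original.

```python
numbers = {1, 2, 3, 4, 5, 6}

# Iterate over a copy so the original set can be modified safely.
for n in numbers.copy():
    if n % 2 == 0:
        numbers.discard(n)

print(sorted(numbers))  # [1, 3, 5]
```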

Advanced Techniques

Once you are comfortable with the basics, you can explore some advanced techniques:

  • Set Operations: Perform union, intersection, difference, and symmetric difference operations using set methods like union(), intersection(), difference(), and symmetric_difference().
  • Set Comprehensions: Use set comprehensions to create sets in a concise and readable manner.

# Advanced Example: Set operations
set1 = {1, 2, 3}
set2 = {3, 4, 5}

union_set = set1.union(set2)
intersection_set = set1.intersection(set2)
difference_set = set1.difference(set2)
symmetric_difference_set = set1.symmetric_difference(set2)

print(union_set) # Output: {1, 2, 3, 4, 5}
print(intersection_set) # Output: {3}
print(difference_set) # Output: {1, 2}
print(symmetric_difference_set) # Output: {1, 2, 4, 5}
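
Set comprehensions are mentioned above without an example, so here is a minimal sketch. It also shows the operator forms (|, &, -, ^) that correspond to the four set methods. The word list is made up for illustration.

```python
words = ['apple', 'kiwi', 'plum', 'pear']

# Set comprehension: collect the distinct word lengths.
lengths = {len(word) for word in words}
print(sorted(lengths))  # [4, 5]

# Operator forms of the set methods.
set1 = {1, 2, 3}
set2 = {3, 4, 5}

print(set1 | set2 == set1.union(set2))                 # True
print(set1 & set2 == set1.intersection(set2))          # True
print(set1 - set2 == set1.difference(set2))            # True
print(set1 ^ set2 == set1.symmetric_difference(set2))  # True
```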

Code Implementation

Here is a comprehensive example that demonstrates the correct use of Hash Sets in Python:

# Comprehensive Example: Working with Hash Sets

# Creating a set
mySet = {'a', 'b', 'c'}

# Adding elements
mySet.add('d')
mySet.add('a') # Duplicate, no effect

# Checking existence
print('a' in mySet) # Output: True
print('z' in mySet) # Output: False

# Removing elements
mySet.remove('a')
mySet.discard('z') # No error raised

# Iterating over the set
for val in mySet:
    print(val)

# Set operations
set1 = {1, 2, 3}
set2 = {3, 4, 5}

union_set = set1.union(set2)
intersection_set = set1.intersection(set2)
difference_set = set1.difference(set2)
symmetric_difference_set = set1.symmetric_difference(set2)

print(union_set) # Output: {1, 2, 3, 4, 5}
print(intersection_set) # Output: {3}
print(difference_set) # Output: {1, 2}
print(symmetric_difference_set) # Output: {1, 2, 4, 5}

Debugging and Testing

When working with Hash Sets, it's important to test your code thoroughly. Here are some tips for debugging and testing:

  • Use Print Statements: Print the set and its elements at various stages to verify the correctness of your operations.
  • Write Test Cases: Create test cases to check the behavior of your set operations. Use Python's unittest module for structured testing.

import unittest

class TestHashSet(unittest.TestCase):
    def test_add_and_check(self):
        mySet = {'a', 'b', 'c'}
        mySet.add('d')
        self.assertTrue('a' in mySet)
        self.assertFalse('z' in mySet)

    def test_remove(self):
        mySet = {'a', 'b', 'c'}
        mySet.remove('a')
        self.assertFalse('a' in mySet)
        mySet.discard('z') # No error raised

    def test_set_operations(self):
        set1 = {1, 2, 3}
        set2 = {3, 4, 5}
        self.assertEqual(set1.union(set2), {1, 2, 3, 4, 5})
        self.assertEqual(set1.intersection(set2), {3})
        self.assertEqual(set1.difference(set2), {1, 2})
        self.assertEqual(set1.symmetric_difference(set2), {1, 2, 4, 5})

if __name__ == '__main__':
    unittest.main()
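
The error path is also worth testing. As a sketch, the hypothetical test class below (its name is not from the lesson) uses assertRaises to check that remove() raises a KeyError for a missing element, and confirms that discard() leaves the set unchanged.

```python
import unittest

class TestSetRemoval(unittest.TestCase):
    def test_remove_missing_raises(self):
        my_set = {'a', 'b'}
        # remove() must raise KeyError when the element is absent.
        with self.assertRaises(KeyError):
            my_set.remove('z')

    def test_discard_missing_is_silent(self):
        my_set = {'a', 'b'}
        my_set.discard('z')  # no error expected
        self.assertEqual(my_set, {'a', 'b'})
```

If the class is saved in a test file, it can be run with python -m unittest.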

Thinking and Problem-Solving Tips

When working with Hash Sets, consider the following strategies:

  • Break Down Problems: Divide complex problems into smaller, manageable parts and solve them step-by-step.
  • Practice Regularly: Solve coding exercises and projects that involve Hash Sets to reinforce your understanding.

Conclusion

In this lesson, we covered the fundamental concepts of Hash Sets in Python, including creation, adding elements, checking existence, removing elements, and iterating over sets. We also explored advanced techniques, common pitfalls, and best practices. By mastering these concepts, you can efficiently work with unique collections of elements in your programs.

Additional Resources

For further reading and practice, consider the following resources: