Course Outline

The values in a Hash Set can be either simple primitives like strings or integers as well as more complex object types like object literals or arrays.

In C++, the Hash Set is implemented as an unordered_set object.

Creation:

To create a new empty unordered_set that holds strings, you use the following syntax:

unordered_set<string> mySet;

Creating an empty Set takes O(1) time.

Constructing from iterable

The Set constructor also accepts an optional iterable object. If you pass an iterable object to the Set constructor, all the unique elements of the object will be added to the new set:

vector<string> elements = {"b", "c", "c", "d"};
unordered_set<string> mySet(elements.begin(), elements.end());

// mySet contains {"b", "c", "d"}

This takes O(n) time, where n is the number of elements in the iterable object.

Adding elements:

To add an element to the set, you use the insert() method:

unordered_set<string> mySet({"b", "c", "c", "d"});

// Adding new elements
mySet.insert("x");

// Adding an existing element
mySet.insert("d"); // nothing happens

// mySet contains {"b", "c", "d", "x"}

The Set first checks if that element already exists and if so, it does nothing and also doesn't raise an error.

Adding an element to a Set takes O(1) time.

Checking if a value exists:

To check if a Set has a specific element, you use the count() method which returns 1 if the set contains the element, otherwise, it returns 0.

unordered_set<string> mySet({"b", "c", "c", "d"});

mySet.count("c"); // returns 1 since 'c' appears once (remember that set keeps unique values)

mySet.count("x"); // returns 0 since it does not exist in the set

Checking if a value exists in a Set takes O(1) time.

Removing elements:

To delete a specified element from a set, you use the erase() method:

unordered_set<string> mySet({"b", "c", "c", "d"});

// Deleting elements
mySet.erase("c");

// Deleting a non-existent element
mySet.erase("z"); // does nothing

// mySet contains {"b", "d"}

The Set first checks if that key exists and if not, it does nothing and also doesn't raise an error, so it's safe to delete non-existing elements.

Removing an element from a Set takes O(1) time.

Iterating over the Set values:

If we want to iterate over values of the unordered_set, we can use the for loop along with the : operator:

unordered_set<string> mySet({"b", "c", "c", "d"});

for (string val : mySet) {
    cout << val << endl;
}

// This will print the following:
// d
// c
// b

Iterating over an unordered_set takes O(n) time.

Notice the order is not the same as we initiated. set keeps the data in random order

Space Complexity

A Set uses O(n) space, where n is the number of elements existing in the Set.

Assignment
Follow the Coding Tutorial and let's play with some Hash Sets.

Hint
Look at the examples above if you get stuck.

Introduction

In this lesson, we will explore the concept of Hash Sets in C++. Hash Sets are a powerful data structure that allows for the storage of unique elements and provides efficient operations for adding, removing, and checking the existence of elements. Understanding Hash Sets is crucial for solving problems that require quick lookups and ensuring uniqueness in collections.

Hash Sets are particularly useful in scenarios where you need to maintain a collection of unique items, such as tracking unique user IDs, filtering duplicate entries, or implementing caching mechanisms.

Understanding the Basics

Before diving into the details of Hash Sets, let's understand some fundamental concepts:

Uniqueness: Hash Sets store only unique elements. If you try to add a duplicate element, it will be ignored.
Hashing: Hash Sets use a hashing function to map elements to specific positions in an internal array, allowing for efficient lookups.
Unordered: The elements in a Hash Set are not stored in any particular order. The order may change as elements are added or removed.

Understanding these basics is essential before moving on to more complex aspects of Hash Sets.

Main Concepts

Let's delve into the key concepts and techniques involved in using Hash Sets in C++:

Creation: You can create an empty unordered_set using the syntax unordered_set<string> mySet;. This operation takes constant time, O(1).
Constructing from Iterable: You can initialize a Hash Set with elements from an iterable object, such as a vector. This operation takes linear time, O(n), where n is the number of elements in the iterable.
Adding Elements: Use the insert() method to add elements to the Hash Set. If the element already exists, it will be ignored. Adding an element takes constant time, O(1).
Checking Existence: Use the count() method to check if an element exists in the Hash Set. This operation also takes constant time, O(1).
Removing Elements: Use the erase() method to remove elements from the Hash Set. If the element does not exist, nothing happens. Removing an element takes constant time, O(1).
Iterating: You can iterate over the elements of the Hash Set using a for loop. Iterating over the set takes linear time, O(n).

Examples and Use Cases

Let's look at some examples to understand how Hash Sets can be used in various contexts:

Example 1: Tracking Unique User IDs

#include <iostream>
#include <unordered_set>
#include <vector>
#include <string>

using namespace std;

int main() {
    vector<string> userIds = {"user1", "user2", "user3", "user1", "user4"};
    unordered_set<string> uniqueUserIds(userIds.begin(), userIds.end());

    for (const string& id : uniqueUserIds) {
        cout << id << endl;
    }

    return 0;
}

In this example, we initialize a Hash Set with user IDs from a vector. The Hash Set automatically removes duplicate entries, ensuring that only unique user IDs are stored.

Example 2: Filtering Duplicate Entries

#include <iostream>
#include <unordered_set>
#include <vector>
#include <string>

using namespace std;

vector<string> filterDuplicates(const vector<string>& input) {
    unordered_set<string> uniqueElements(input.begin(), input.end());
    return vector<string>(uniqueElements.begin(), uniqueElements.end());
}

int main() {
    vector<string> data = {"apple", "banana", "apple", "orange", "banana"};
    vector<string> filteredData = filterDuplicates(data);

    for (const string& item : filteredData) {
        cout << item << endl;
    }

    return 0;
}

In this example, we define a function filterDuplicates that takes a vector of strings and returns a new vector with duplicate entries removed using a Hash Set.

Common Pitfalls and Best Practices

When working with Hash Sets, it's important to be aware of common pitfalls and follow best practices:

Avoiding Duplicates: Remember that Hash Sets automatically handle duplicates, so you don't need to manually check for them.
Efficient Hash Functions: Ensure that the hash function used by the Hash Set is efficient to avoid performance issues.
Memory Usage: Be mindful of the memory usage of Hash Sets, especially when dealing with large datasets.

Advanced Techniques

Let's explore some advanced techniques related to Hash Sets:

Custom Hash Functions: You can define custom hash functions for complex data types to optimize the performance of Hash Sets.
Combining with Other Data Structures: Hash Sets can be combined with other data structures, such as vectors or maps, to solve more complex problems.

Code Implementation

Here is a comprehensive example demonstrating the use of Hash Sets in C++:

#include <iostream>
#include <unordered_set>
#include <vector>
#include <string>

using namespace std;

int main() {
    // Creating an empty unordered_set
    unordered_set<string> mySet;

    // Adding elements to the set
    mySet.insert("apple");
    mySet.insert("banana");
    mySet.insert("orange");

    // Checking if an element exists
    if (mySet.count("banana")) {
        cout << "Banana is in the set" << endl;
    }

    // Removing an element
    mySet.erase("banana");

    // Iterating over the set
    for (const string& fruit : mySet) {
        cout << fruit << endl;
    }

    return 0;
}

This code demonstrates the creation of a Hash Set, adding elements, checking for existence, removing elements, and iterating over the set.

Debugging and Testing

When working with Hash Sets, consider the following tips for debugging and testing:

Print Statements: Use print statements to verify the contents of the Hash Set at different stages of your program.
Test Cases: Write test cases to ensure that your Hash Set operations work as expected. Test edge cases, such as adding duplicate elements or removing non-existent elements.

Thinking and Problem-Solving Tips

Here are some strategies for approaching problems related to Hash Sets:

Break Down Problems: Divide complex problems into smaller, manageable parts that can be solved using Hash Sets.
Practice: Solve coding exercises and projects that involve Hash Sets to improve your understanding and proficiency.

Conclusion

In this lesson, we explored the concept of Hash Sets in C++ and learned how to create, manipulate, and utilize them effectively. Hash Sets are a powerful tool for ensuring uniqueness and performing efficient lookups in collections. By mastering Hash Sets, you can solve a wide range of programming problems more efficiently.

Keep practicing and exploring further applications of Hash Sets to enhance your programming skills.

Additional Resources

For further reading and practice problems related to Hash Sets, consider the following resources:

C++ Reference: unordered_set
GeeksforGeeks: unordered_set in C++ STL
LeetCode - Practice coding problems involving Hash Sets

Course Outline

Welcome To C++

Variables

Calculations With Numbers

Functions I

Calculations - Challenges

Strings I

Conditional Statements

Conditional Statements - Challenges

Arrays I

For Loops - Looping Over Arrays

While Loops & For Loops

Loops & Conditionals

Loops & Global Variables

Looping Over Array Indices

Big O Notation

Loops - Challenges

Strings

Functions & Classes

Nested Loops & Brute Force

Sorting

Partial Sums #1

Partial Sums #2

Two Pointers

Hash Table

Binary Search

Recursion

Divide And Conquer

Backtracking

Memoization

Linked Lists

Sliding Window

Heap

Stack

Binary Trees

Dynamic Programming

Greedy

Graphs

Hash Sets in C++