What is a 'set' in Python?

A foundational guide to the set, Python's data structure for storing unordered collections of unique elements. Learn how to create sets and perform common mathematical set operations like union and intersection.

In Python, a set is a collection that is both unordered and unindexed. But its most important characteristic is that a set can only contain unique elements. Any duplicate items will be automatically removed.

Sets are modeled on the mathematical concept of a set, and they provide powerful and efficient methods for performing standard set operations like union, intersection, and difference.

Creating a Set

You can create a set by placing a comma-separated sequence of items inside curly braces {}.

# Create a set literal; the repeated 4s are stored only once
numbers = {1, 2, 3, 4, 4, 4}

print(numbers) # {1, 2, 3, 4}
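Besides the curly-brace literal, the built-in set() constructor accepts any iterable, which is how you build a set from an existing list or string. A minimal sketch (the variable names are only illustrative):

```python
# Build sets from existing iterables with the set() constructor
numbers_list = [1, 2, 3, 4, 4, 4]
numbers = set(numbers_list)    # duplicates from the list are dropped
letters = set("hello")         # a string is an iterable of characters

print(sorted(numbers))         # [1, 2, 3, 4]
print(len(letters))            # 4, because 'l' appears twice in "hello"
print(sorted(letters))         # ['e', 'h', 'l', 'o']
```

sorted() is used only so the printed output has a predictable order; the sets themselves are unordered.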

To create an empty set, you must use the set() function. Using empty curly braces {} will create an empty dictionary, not an empty set.

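empty_set = set()        # an empty set
empty_dict = {}          # an empty dictionary, NOT a set
print(type(empty_set))   # <class 'set'>
print(type(empty_dict))  # <class 'dict'>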

Key Properties of Sets

  • Unordered: The items in a set have no defined order, so you cannot access them by index, and the order in which they are printed or iterated may differ from the order in which they were added.
  • Unique: A set cannot have two items with the same value.
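  • Mutable: You can add and remove items from a set. The elements themselves, however, must be hashable (for example numbers, strings, or tuples of hashable values), so a list or dictionary cannot be stored in a set.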
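The following sketch exercises each of these properties on a small set of strings. The set contents are arbitrary examples:

```python
# Demonstrate the unordered, unique, and mutable properties of a set
colors = {"red", "green", "blue"}

# Unordered: sets do not support indexing
indexing_failed = False
try:
    colors[0]
except TypeError as exc:
    indexing_failed = True
    print("Indexing failed:", exc)   # 'set' object is not subscriptable

# Unique: adding a value that is already present leaves the set unchanged
colors.add("red")
print(len(colors))                   # 3

# Mutable: the set can grow and shrink in place
colors.add("yellow")
colors.remove("green")
print(sorted(colors))                # ['blue', 'red', 'yellow']
```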

Adding and Removing Items

  • add(): Adds a single element to the set. If the element is already in the set, it does nothing.

    my_set = {1, 2, 3}
    my_set.add(4)
    
  • update(): Adds all the items from another iterable (like a list or another set) to the set.

    my_set.update([4, 5, 6])
    
  • remove(): Removes a specified element. It will raise a KeyError if the item is not found.

    my_set.remove(3)
    
  • discard(): Also removes a specified element, but it will not raise an error if the item is not found.

    my_set.discard(99) # Does nothing, no error
    
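Applying these methods in sequence to the same set shows how its contents change at each step, and what happens when remove() is given a missing element:

```python
# Trace the contents of my_set through each method call
my_set = {1, 2, 3}

my_set.add(4)               # now {1, 2, 3, 4}
my_set.add(4)               # 4 is already present: no change
my_set.update([4, 5, 6])    # adds 5 and 6; 4 is already there
my_set.remove(3)            # now {1, 2, 4, 5, 6}
my_set.discard(99)          # 99 is absent: no error, no change
print(sorted(my_set))       # [1, 2, 4, 5, 6]

# remove() on a missing element raises KeyError, so guard it or use discard()
try:
    my_set.remove(99)
except KeyError:
    print("99 was not in the set")
```

As a rule of thumb, use remove() when a missing element indicates a bug you want to hear about, and discard() when "remove it if it is there" is the intended behavior.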

Common Use Cases

1. Removing Duplicates from a List

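This is one of the most common and elegant use cases for a set. You can quickly remove all duplicate items from a list by converting it to a set and then back to a list. Because sets are unordered, the resulting list is not guaranteed to keep the original order of the items.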

my_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]

# Convert to a set to remove duplicates, then back to a list
unique_list = list(set(my_list))

print(unique_list)  # [1, 2, 3, 4, 5], though the order is not guaranteed
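When the original order matters, a common approach is to walk the list once and use a set only to remember which items have already been seen. A minimal sketch; the helper name unique_in_order is hypothetical:

```python
def unique_in_order(items):
    """Return the unique items of `items`, keeping first-seen order."""
    seen = set()        # set gives fast average-case membership checks
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

words = ["pear", "apple", "pear", "fig", "apple"]
print(unique_in_order(words))   # ['pear', 'apple', 'fig']
```

Since Python 3.7, dictionaries preserve insertion order, so list(dict.fromkeys(words)) produces the same result in a single expression.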

2. Membership Testing

Checking whether an item exists in a set is fast: because sets are implemented as hash tables, a membership test takes O(1) time on average. Checking for an item in a list, by contrast, scans the elements one by one and takes O(n) time.

my_set = {1, 2, 3, 4, 5}

print(3 in my_set)    # True
print(10 in my_set)   # False

If you need to frequently check for the existence of items in a large collection, a set is a much better choice than a list.
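To see the difference in practice, here is a rough timing sketch using the standard-library timeit module. The collection size and lookup count are arbitrary choices, and the exact timings depend on your machine and Python version, so no numbers are shown:

```python
import timeit

# Build the same 10,000 numbers as a list and as a set
data_list = list(range(10_000))
data_set = set(data_list)

# Look up a value at the end of the list, the slowest case for a list scan
target = 9_999

list_time = timeit.timeit(lambda: target in data_list, number=1_000)
set_time = timeit.timeit(lambda: target in data_set, number=1_000)

print(f"list: {list_time:.6f} s for 1,000 lookups")
print(f"set:  {set_time:.6f} s for 1,000 lookups")
```

Converting a list to a set costs O(n) once, so the switch pays off when the same collection is queried many times.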

Set Operations

Sets support standard mathematical operations.

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

  • Union (|): Returns a new set containing all items from both sets.

    a | b  # {1, 2, 3, 4, 5, 6}
    
  • Intersection (&): Returns a new set containing only the items present in both sets.

    a & b  # {3, 4}
    
  • Difference (-): Returns a new set containing items in the first set but not in the second set.

    a - b  # {1, 2}
    
  • Symmetric Difference (^): Returns a new set with items in either set, but not both.

    a ^ b  # {1, 2, 5, 6}
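Each operator also has a named method: union(), intersection(), difference(), and symmetric_difference(). The methods accept any iterable as their argument, while the operators require a set (or frozenset) on both sides. A short sketch using the same a and b:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Named-method equivalents of the four operators
print(a.union(b) == a | b)                   # True
print(a.intersection(b) == a & b)            # True
print(a.difference(b) == a - b)              # True
print(a.symmetric_difference(b) == a ^ b)    # True

# The methods accept any iterable, such as a list...
print(sorted(a.union([7, 8])))               # [1, 2, 3, 4, 7, 8]

# ...but the operators do not
try:
    a | [7, 8]
except TypeError:
    print("a | [7, 8] raises TypeError")
```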
    

Conclusion

While lists and dictionaries are often the go-to collection types, Python's set provides a powerful and efficient tool for specific tasks. Its ability to enforce uniqueness and perform fast membership tests makes it well suited to removing duplicates and checking for existence. Furthermore, its support for mathematical set operations provides a clean and readable syntax for comparing and combining collections.