4.3. Iterable Set

  • Only unique values

  • Mutable - can add and remove items

  • Stores only hashable elements (int, float, bool, None, str, tuple of hashable elements)

  • Set is an unordered data structure and does not record element position or insertion order

  • Does not support indexing (getitem) or slicing

  • Membership test (in) has O(1) average-case complexity [1]
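
A minimal sketch of the membership test, assuming CPython. The variable names are illustrative only. A set finds an element by its hash, while a list compares elements one by one, so for large collections the set lookup is usually much faster. Timings depend on the machine, so no numbers are claimed here.

```python
import timeit

# The same values stored in a set and in a list
numbers_set = {1, 2, 3, 4, 5}
numbers_list = [1, 2, 3, 4, 5]

# `in` and `not in` give the same answers for both containers
print(3 in numbers_set)      # True
print(3 in numbers_list)     # True
print(6 not in numbers_set)  # True

# Rough timing comparison on a larger collection (results vary by machine)
big_list = list(range(100_000))
big_set = set(big_list)
list_time = timeit.timeit(lambda: 99_999 in big_list, number=100)
set_time = timeit.timeit(lambda: 99_999 in big_set, number=100)
print(f'list: {list_time:.5f}s, set: {set_time:.5f}s')
```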

4.3.1. Syntax

  • data = set() - empty set

  • No short syntax for an empty set ({} creates an empty dict)

  • Only unique values

An empty set can be defined only with set() - there is no short syntax:

>>> data = set()
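
The reason there is no short syntax is that empty curly braces already mean an empty dict. A short check, with illustrative variable names:

```python
# Empty curly braces create a dict, not a set
empty_braces = {}
print(type(empty_braces))  # <class 'dict'>

# set() is the only way to create an empty set
empty_set = set()
print(type(empty_set))     # <class 'set'>
print(empty_set)           # set()
```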

A comma after the last element of a one-element set is optional. Curly braces are required:

>>> data = {1}
>>> data = {1, 2, 3}
>>> data = {1.1, 2.2, 3.3}
>>> data = {True, False}
>>> data = {'a', 'b', 'c'}
>>> data = {'a', 1, 2.2, True, None}

Stores only unique values:

>>> {1, 2, 1}
{1, 2}

Compares by value, not by type. Because 1 == 1.0, only the element inserted first is kept:

>>> {1}
{1}
>>> {1.0}
{1.0}
>>> {1, 1.0}
{1}
>>> {1.0, 1}
{1.0}
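
The same rule applies to True, which is equal to 1 and has the same hash. This matters for mixed literals such as {'a', 1, 2.2, True, None} shown earlier: that set holds four elements, not five. A short check, assuming CPython; names are illustrative:

```python
# 1, 1.0 and True compare equal and share the same hash
print(1 == 1.0 == True)                    # True
print(hash(1) == hash(1.0) == hash(True))  # True

# Therefore True is treated as a duplicate of 1
mixed = {'a', 1, 2.2, True, None}
print(len(mixed))     # 4
print(True in mixed)  # True
print(1.0 in mixed)   # True
```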

4.3.2. Hashable

  • Can store elements of any hashable type

Hashable (Immutable):

  • int

  • float

  • bool

  • NoneType

  • str

  • tuple

>>> data = {1, 2, 'a'}
>>> data = {1, 2, (3, 4)}

Non-hashable (Mutable):

  • list

  • set

  • dict

>>> data = {1, 2, [3, 4]}
Traceback (most recent call last):
TypeError: unhashable type: 'list'
>>>
>>> data = {1, 2, {3, 4}}
Traceback (most recent call last):
TypeError: unhashable type: 'set'

"Hashable types are also immutable" is true for builtin types, but it's not a universal truth.

  • More information in OOP Hash.

  • More information in OOP Object Identity.
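
Two illustrations of that caveat, as a minimal sketch. A tuple is immutable, but it is hashable only if all of its elements are hashable. Conversely, instances of a user-defined class are hashable by default (the hash is based on object identity) even though their attributes can change. The class Point below is hypothetical, written only for this example.

```python
# A tuple containing a list cannot be hashed
try:
    data = {1, 2, (3, [4, 5])}
except TypeError as error:
    message = str(error)
print(message)  # the message mentions: unhashable type: 'list'


# Hypothetical class: mutable, yet hashable by default (identity-based hash)
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(1, 2)
points = {point}
point.x = 10            # mutate the object already stored in the set
print(point in points)  # True - the default hash does not depend on x or y
```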

4.3.3. Type Conversion

  • set() converts an iterable argument to a set

>>> data = 'abcd'
>>> set(data) == {'a', 'b', 'c', 'd'}
True
>>> data = ['a', 'b', 'c', 'd']
>>> set(data) == {'a', 'b', 'c', 'd'}
True
>>> data = ('a', 'b', 'c', 'd')
>>> set(data) == {'a', 'b', 'c', 'd'}
True
>>> data = {'a', 'b', 'c', 'd'}
>>> set(data) == {'a', 'b', 'c', 'd'}
True
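
set() accepts any iterable, not only the sequences above. A short sketch with illustrative data: a range converts directly, and a dict contributes only its keys, because iterating over a dict yields its keys.

```python
# range is iterable, so it converts directly
print(set(range(5)) == {0, 1, 2, 3, 4})  # True

# Iterating over a dict yields keys, so only the keys end up in the set
astronaut = {'firstname': 'Mark', 'lastname': 'Watney'}
print(set(astronaut) == {'firstname', 'lastname'})    # True
print(set(astronaut.values()) == {'Mark', 'Watney'})  # True

# A non-iterable argument raises TypeError
try:
    set(1)
except TypeError:
    print('int is not iterable')
```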

4.3.4. Deduplicate

Works with str, list, tuple and other iterables of hashable elements:

>>> data = [1, 2, 3, 1, 1, 2, 4]
>>> set(data)
{1, 2, 3, 4}

Converting to a set deduplicates items:

>>> data = [
...     'Watney',
...     'Lewis',
...     'Martinez',
...     'Watney',
... ]
>>>
>>> set(data) == {'Watney', 'Lewis', 'Martinez'}
True
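
Because a set is unordered, set(data) does not keep the original order of the names. When order matters, one common alternative (a swapped-in technique, not a set operation) is dict.fromkeys(), since dicts preserve insertion order in Python 3.7 and later:

```python
data = ['Watney', 'Lewis', 'Martinez', 'Watney']

# Set: duplicates removed, order not guaranteed
unique = set(data)

# dict.fromkeys keeps the first occurrence of each name, in original order
unique_ordered = list(dict.fromkeys(data))
print(unique_ordered)  # ['Watney', 'Lewis', 'Martinez']
```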

4.3.5. Add

>>> data = {1, 2}
>>>
>>> data.add(3)
>>> data == {1, 2, 3}
True
>>>
>>> data.add(3)
>>> data == {1, 2, 3}
True
>>>
>>> data.add(4)
>>> data == {1, 2, 3, 4}
True
>>> data = {1, 2}
>>> data.add([3, 4])
Traceback (most recent call last):
TypeError: unhashable type: 'list'
>>> data = {1, 2}
>>> data.add((3, 4))
>>> data == {1, 2, (3, 4)}
True
>>> data = {1, 2}
>>> data.add({3, 4})
Traceback (most recent call last):
TypeError: unhashable type: 'set'
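
Note that .add() modifies the set in place and returns None, so its result should not be assigned back to the variable. A short sketch:

```python
data = {1, 2}

result = data.add(3)      # modifies data in place
print(result)             # None
print(data == {1, 2, 3})  # True

# A common mistake: reassigning the return value loses the set
data = data.add(4)
print(data)               # None
```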

4.3.6. Update

>>> data = {1, 2}
>>> data.update({3, 4})
>>> data == {1, 2, 3, 4}
True
>>> data.update([5, 6])
>>> data == {1, 2, 3, 4, 5, 6}
True
>>> data.update((7, 8))
>>> data == {1, 2, 3, 4, 5, 6, 7, 8}
True
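
.update() iterates over its argument and adds each element, while .add() inserts its argument as a single element. The difference is easy to miss with strings. .update() also accepts several iterables in one call. An illustrative sketch:

```python
letters = {'x'}
letters.add('abc')     # the whole string becomes one element
print(letters == {'x', 'abc'})          # True

letters = {'x'}
letters.update('abc')  # the string is iterated character by character
print(letters == {'x', 'a', 'b', 'c'})  # True

numbers = {1}
numbers.update([2, 3], (4,), {5})  # several iterables in one call
print(numbers == {1, 2, 3, 4, 5})  # True
```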

4.3.7. Pop

Removes and returns an arbitrary element. Sets are unordered, so which element is returned is not defined:

>>> data = {1, 2, 3}
>>> value = data.pop()
>>> value in [1, 2, 3]
True
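
Calling .pop() on an empty set raises KeyError. For removing a specific element, sets also provide .remove(), which raises KeyError when the element is missing, and .discard(), which silently does nothing. A short sketch:

```python
data = {1, 2, 3}

# Pop until empty; the order of popped values is not guaranteed
popped = []
while data:
    popped.append(data.pop())
print(sorted(popped))  # [1, 2, 3]

try:
    data.pop()         # data is empty now
except KeyError:
    print('cannot pop from an empty set')

data = {1, 2, 3}
data.remove(2)         # removes an existing element
data.discard(10)       # missing element: no error
print(data == {1, 3})  # True
try:
    data.remove(10)    # missing element: KeyError
except KeyError:
    print('10 is not in the set')
```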

4.3.8. Membership

Is Disjoint?:

  • True - if there are no common elements in data and x

  • False - if any element of x is also in data

>>> data = {1,2}
>>>
>>> data.isdisjoint({1,2})
False
>>> data.isdisjoint({1,3})
False
>>> data.isdisjoint({3,4})
True

Is Subset?:

  • True - if x has all elements from data

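  • False - if x is missing any element of data

  • data <= x is equivalent to data.issubset(x)

  • data < x tests for a proper subset: data <= x and data != x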

>>> data = {1,2}
>>>
>>> data.issubset({1})
False
>>> data.issubset({1,2})
True
>>> data.issubset({1,2,3})
True
>>> data.issubset({1,3,4})
False
>>> {1,2} < {3,4}
False
>>> {1,2} < {1,2}
False
>>> {1,2} < {1,2,3}
True
>>> {1,2,3} < {1,2}
False
>>> {1,2} <= {3,4}
False
>>> {1,2} <= {1,2}
True
>>> {1,2} <= {1,2,3}
True
>>> {1,2,3} <= {1,2}
False
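
One practical difference between the method and the operator forms: .issubset(), like the other non-operator set methods, accepts any iterable, whereas the <= and < operators require both operands to be sets. A minimal sketch:

```python
data = {1, 2}

# The method accepts a list (or any other iterable)
print(data.issubset([1, 2, 3]))  # True

# The operator does not: comparing a set with a list raises TypeError
try:
    data <= [1, 2, 3]
    operator_works = True
except TypeError:
    operator_works = False
print(operator_works)  # False

# Converting the list to a set first makes the operator usable
print(data <= set([1, 2, 3]))  # True
```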

Is Superset?:

  • True - if data has all elements from x

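  • False - if data is missing any element of x

  • data >= x is equivalent to data.issuperset(x)

  • data > x tests for a proper superset: data >= x and data != x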

>>> data = {1,2}
>>>
>>> data.issuperset({1})
True
>>> data.issuperset({1,2})
True
>>> data.issuperset({1,2,3})
False
>>> data.issuperset({1,3})
False
>>> data.issuperset({2,1})
True
>>> {1,2} > {1,2}
False
>>> {1,2} > {1,2,3}
False
>>> {1,2,3} > {1,2}
True
>>> {1,2} >= {1,2}
True
>>> {1,2} >= {1,2,3}
False
>>> {1,2,3} >= {1,2}
True
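
The subset and superset tests mirror each other: a >= b holds exactly when b <= a, and a > b holds exactly when a >= b and a != b. A small sketch that checks these identities on the example sets used above:

```python
pairs = [
    ({1, 2}, {1, 2}),
    ({1, 2}, {1, 2, 3}),
    ({1, 2, 3}, {1, 2}),
    ({1, 2}, {3, 4}),
]

for a, b in pairs:
    # Superset is subset with the operands swapped
    assert (a >= b) == (b <= a)
    # Proper superset: superset and not equal
    assert (a > b) == (a >= b and a != b)

print('all identities hold')
```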

4.3.9. Basic Operations

Union (returns elements that are in data, in x, or in both):

>>> data = {1,2}
>>>
>>> data.union({1,2})
{1, 2}
>>> data.union({1,2,3})
{1, 2, 3}
>>> data.union({1,2,4})
{1, 2, 4}
>>> data.union({1,3}, {2,4})
{1, 2, 3, 4}
>>> {1,2} | {1,2}
{1, 2}
>>> {1,2,3} | {1,2}
{1, 2, 3}
>>> {1,2,3} | {1,2,4}
{1, 2, 3, 4}
>>> {1,2} | {1,3} | {2,4}
{1, 2, 3, 4}
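
When the sets to combine are stored in a list, unpacking them into .union() merges them all in one call. A minimal sketch with illustrative data; calling the method on an empty set() also works when the list is empty:

```python
groups = [{1, 2}, {2, 3}, {3, 4}]

# Unpack the list so each set becomes a separate argument
everything = set().union(*groups)
print(everything == {1, 2, 3, 4})  # True

# With no sets at all the result is simply an empty set
print(set().union(*[]) == set())   # True
```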

Difference (returns elements from data which are not in x):

>>> data = {1,2}
>>>
>>> data.difference({1,2})
set()
>>> data.difference({1,2,3})
set()
>>> data.difference({1,4})
{2}
>>> data.difference({1,3}, {2,4})
set()
>>> data.difference({3,4})
{1, 2}
>>> {1,2} - {2,3}
{1}
>>> {1,2} - {2,3} - {3}
{1}
>>> {1,2} - {1,2,3}
set()
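
Unlike union and intersection, difference is not commutative: swapping the operands changes the result. A quick check:

```python
a = {1, 2}
b = {2, 3}
print(a - b)           # {1}
print(b - a)           # {3}
print(a - b == b - a)  # False
```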

Symmetric Difference (returns elements that are in either data or x, but not in both):

>>> data = {1,2}
>>>
>>> data.symmetric_difference({1,2})
set()
>>> data.symmetric_difference({1,2,3})
{3}
>>> data.symmetric_difference({1,4})
{2, 4}
>>> data.symmetric_difference({1,3}, {2,4})
Traceback (most recent call last):
TypeError: set.symmetric_difference() takes exactly one argument (2 given)
>>> data.symmetric_difference({3,4})
{1, 2, 3, 4}
>>> {1,2} ^ {1,2}
set()
>>> {1,2} ^ {2,3}
{1, 3}
>>> {1,2} ^ {1,3}
{2, 3}
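
Symmetric difference can be expressed with the other operations from this section as (a | b) - (a & b). Although the method accepts only one argument, the ^ operator can be chained; for more than two sets the result contains the elements that appear in an odd number of them. A sketch:

```python
a = {1, 2}
b = {2, 3}

# Elements in either set, minus elements in both
print((a | b) - (a & b) == a ^ b)  # True

# Chained: 1 and 4 appear in one set each, 2 and 3 appear in two sets each
print({1, 2} ^ {2, 3} ^ {3, 4} == {1, 4})  # True
```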

Intersection (returns elements common to data and x):

>>> data = {1,2}
>>>
>>> data.intersection({1,2})
{1, 2}
>>> data.intersection({1,2,3})
{1, 2}
>>> data.intersection({1,4})
{1}
>>> data.intersection({1,3}, {2,4})
set()
>>> data.intersection({1,3}, {1,4})
{1}
>>> data.intersection({3,4})
set()
>>> {1,2} & {2,3}
{2}
>>> {1,2} & {2,3} & {2,4}
{2}
>>> {1,2} & {2,3} & {3}
set()
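
Each operation above also has an in-place form that modifies the left-hand set instead of returning a new one: |= (like .update()), &= (like .intersection_update()), -= (like .difference_update()) and ^= (like .symmetric_difference_update()). A short sketch:

```python
data = {1, 2}

data |= {3, 4}          # union in place
print(data == {1, 2, 3, 4})  # True

data &= {2, 3, 4, 5}    # keep only common elements
print(data == {2, 3, 4})     # True

data -= {4}             # remove elements found in the right operand
print(data == {2, 3})        # True

data ^= {3, 5}          # keep elements found in exactly one of the two sets
print(data == {2, 5})        # True

# Method equivalent of &=
data.intersection_update({2, 7})
print(data == {2})           # True
```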

4.3.10. Cardinality

>>> data = {1, 2, 3}
>>> len(data)
3
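
Since a set stores each value only once, comparing len() of a collection with len() of its set is a quick way to check for duplicates. The helper has_duplicates below is hypothetical, written only for this illustration:

```python
def has_duplicates(items):
    """Return True if any element appears more than once (elements must be hashable)."""
    return len(set(items)) != len(items)


print(has_duplicates([1, 2, 3]))                      # False
print(has_duplicates([1, 2, 3, 1]))                   # True
print(has_duplicates(['Watney', 'Lewis', 'Watney']))  # True
```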

4.3.11. References

4.3.12. Assignments

Code 4.7. Solution
"""
* Assignment: Iterable Set Create
* Type: class assignment
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min

English:
    1. Create sets:
        a. `result_a` without elements
        b. `result_b` with elements: 1, 2, 3
        c. `result_c` with elements: 1.1, 2.2, 3.3
        d. `result_d` with elements: 'a', 'b', 'c'
        e. `result_e` with elements: True, False
        f. `result_f` with elements: 1, 2.2, True, 'a'
    2. Run doctests - all must succeed

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result_a is not Ellipsis, \
    'Assign your result to variable `result_a`'
    >>> assert result_b is not Ellipsis, \
    'Assign your result to variable `result_b`'
    >>> assert result_c is not Ellipsis, \
    'Assign your result to variable `result_c`'
    >>> assert result_d is not Ellipsis, \
    'Assign your result to variable `result_d`'
    >>> assert result_e is not Ellipsis, \
    'Assign your result to variable `result_e`'
    >>> assert result_f is not Ellipsis, \
    'Assign your result to variable `result_f`'

    >>> assert type(result_a) is set, \
    'Variable `result_a` has invalid type, should be set'
    >>> assert type(result_b) is set, \
    'Variable `result_b` has invalid type, should be set'
    >>> assert type(result_c) is set, \
    'Variable `result_c` has invalid type, should be set'
    >>> assert type(result_d) is set, \
    'Variable `result_d` has invalid type, should be set'
    >>> assert type(result_e) is set, \
    'Variable `result_e` has invalid type, should be set'
    >>> assert type(result_f) is set, \
    'Variable `result_f` has invalid type, should be set'

    >>> assert result_a == set(), \
    'Variable `result_a` has invalid value, should be set()'
    >>> assert result_b == {1, 2, 3}, \
    'Variable `result_b` has invalid value, should be {1, 2, 3}'
    >>> assert result_c == {1.1, 2.2, 3.3}, \
    'Variable `result_c` has invalid value, should be {1.1, 2.2, 3.3}'
    >>> assert result_d == {'a', 'b', 'c'}, \
    'Variable `result_d` has invalid value, should be {"a", "b", "c"}'
    >>> assert result_e == {True, False}, \
    'Variable `result_e` has invalid value, should be {True, False}'
    >>> assert result_f == {1, 2.2, True, 'a'}, \
    'Variable `result_f` has invalid value, should be {1, 2.2, True, "a"}'
"""

# Set without elements
# type: set
result_a = ...

# Set with elements: 1, 2, 3
# type: set[int]
result_b = ...

# Set with elements: 1.1, 2.2, 3.3
# type: set[float]
result_c = ...

# Set with elements: 'a', 'b', 'c'
# type: set[str]
result_d = ...

# Set with elements: True, False
# type: set[bool]
result_e = ...

# Set with elements: 1, 2.2, True, 'a'
# type: set[int|float|bool|str]
result_f = ...

Code 4.8. Solution
"""
* Assignment: Iterable Set Many
* Type: class assignment
* Complexity: easy
* Lines of code: 9 lines
* Time: 8 min

English:
    1. Non-functional requirements:
        a. Assignment verifies creation of `set()` and usage of the methods
           `.add()` and `.update()`
        b. For simplicity, type numerical values as `float`, not `str`
        c. Example: instead of '5.8' just type 5.8
        d. Do not use `str.split()`, `slice`, `getitem`, `for`, `while` or
           any other control-flow statement
    2. Create set `result` representing the row at index 1
    3. Add values from the row at index 2 to `result` using `.add()`
       (five calls)
    4. From the row at index 3 create a `set` and add it to `result` using
       `.update()` (one call)
    5. From the row at index 4 create a `tuple` and add it to `result` using
       `.update()` (one call)
    6. From the row at index 5 create a `list` and add it to `result` using
       `.update()` (one call)
    7. Run doctests - all must succeed

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is set, \
    'Variable `result` has invalid type, should be set'
    >>> assert len(result) == 22, \
    'Variable `result` length should be 22'

    >>> assert ('sepal_length' not in result
    ...     and 'sepal_width' not in result
    ...     and 'petal_length' not in result
    ...     and 'petal_width' not in result
    ...     and 'species' not in result)

    >>> assert result >= {5.8, 2.7, 5.1, 1.9, 'virginica'}
    >>> assert result >= {5.1, 3.5, 1.4, 0.2, 'setosa'}
    >>> assert result >= {5.7, 2.8, 4.1, 1.3, 'versicolor'}
    >>> assert result >= {6.3, 2.9, 5.6, 1.8, 'virginica'}
    >>> assert result >= {6.4, 3.2, 4.5, 1.5, 'versicolor'}
"""

DATA = [
    'sepal_length,sepal_width,petal_length,petal_width,species',
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
    '6.3,2.9,5.6,1.8,virginica',
    '6.4,3.2,4.5,1.5,versicolor',
]

# Set with row at DATA[1] (manually converted to float and str)
# type: set[float|str]
result = ...

# Add to result float 5.1
...

# Add to result float 3.5
...

# Add to result float 1.4
...

# Add to result float 0.2
...

# Add to result str setosa
...

# Update result with set 5.7, 2.8, 4.1, 1.3, 'versicolor'
...

# Update result with tuple 6.3, 2.9, 5.6, 1.8, 'virginica'
...

# Update result with list 6.4, 3.2, 4.5, 1.5, 'versicolor'
...