4.2. Iterable List

  • Mutable - can add, remove, and modify items

  • Stores elements of any type

CPython's lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array's length in a list head structure.

This makes indexing a list data[i] an operation whose cost is independent of the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don't require an actual resize.

4.2.1. Syntax

  • data = [] - empty list

  • data = [1, 2.2, 'abc'] - list with values

  • data = [] is faster than data = list()

Defining empty list with [] is used more often, but list() is more explicit:

>>> data = list()
>>> data = []

Comma after last element is optional:

>>> data = [1]
>>> data = [1,]

Can store elements of any types:

>>> data = [1, 2, 3]
>>> data = [1.1, 2.2, 3.3]
>>> data = [True, False]
>>> data = ['a', 'b', 'c']
>>> data = ['a', 1, 2.2, True, None]

Brackets are required

>>> data = [1, 2, 3]

Performance:

>>> %%timeit -r 10_000 -n 10_000  
... data = list()
...
53.8 ns ± 8.15 ns per loop (mean ± std. dev. of 10000 runs, 10,000 loops each)
>>>
>>>
>>> %%timeit -r 10_000 -n 10_000  
... data = []
...
23.9 ns ± 4.23 ns per loop (mean ± std. dev. of 10000 runs, 10,000 loops each)

4.2.2. Type Conversion

  • list() converts argument to list

  • Takes one iterable as an argument

  • Multiple arguments are not allowed

Builtin function list() converts argument to list

>>> text = 'hello'
>>> list(text)
['h', 'e', 'l', 'l', 'o']
>>> colors = ['red', 'green', 'blue']
>>> list(colors)
['red', 'green', 'blue']
>>> colors = ('red', 'green', 'blue')
>>> list(colors)
['red', 'green', 'blue']
>>> list('red', 'green', 'blue')
Traceback (most recent call last):
TypeError: list expected at most 1 argument, got 3

4.2.3. Get Item

  • Returns a value at given index

  • Note, that Python start counting at zero (zero based indexing)

  • Raises IndexError if the index is out of range

  • More information in Iterable GetItem

  • More information in Iterable Slice

>>> colors = ['red', 'green', 'blue']
>>>
>>> colors[0]
'red'
>>> colors[1]
'green'
>>> colors[2]
'blue'

4.2.4. Set Item

>>> colors = ['red', 'green', 'blue']
>>> colors[0] = 'black'
>>>
>>> print(colors)
['black', 'green', 'blue']
>>> colors = ['red', 'green', 'blue']
>>> colors[4] = 'black'
Traceback (most recent call last):
IndexError: list assignment index out of range

4.2.5. Del Item

>>> colors = ['red', 'green', 'blue']
>>> del colors[2]
>>>
>>> print(colors)
['red', 'green']
>>> colors = ['red', 'green', 'blue']
>>> result = colors.pop()
>>>
>>> colors
['red', 'green']
>>> result
'blue'

4.2.6. Append

  • list + list - add

  • list += list - increment add

  • list.extend() - extend

  • list.append() - append

  • O(1) complexity

Add:

>>> colors = ['red', 'green', 'blue']
>>> result = colors + ['black']
>>>
>>> print(colors)
['red', 'green', 'blue']
>>>
>>> print(result)
['red', 'green', 'blue', 'black']

Increment Add:

>>> colors = ['red', 'green', 'blue']
>>> colors += ['black']
>>>
>>> print(colors)
['red', 'green', 'blue', 'black']

Extend:

>>> colors = ['red', 'green', 'blue']
>>> colors.extend(['black', 'white'])
>>>
>>> print(colors)
['red', 'green', 'blue', 'black', 'white']

Append: >>> colors = ['red', 'green', 'blue'] >>> colors.append(['black', 'white']) >>> >>> print(colors) ['red', 'green', 'blue', ['black', 'white']]

Errors:

>>> colors + 'black'
Traceback (most recent call last):
TypeError: can only concatenate list (not "str") to list
>>> colors = ['red', 'green', 'blue']
>>> colors += 'black'
>>>
>>> print(colors)
['red', 'green', 'blue', 'b', 'l', 'a', 'c', 'k']

4.2.7. Insert

  • list.insert(idx, object)

  • Insert object at specific position

  • O(n) complexity

>>> colors = ['red', 'green', 'blue']
>>> colors.insert(0, 'black')
>>>
>>> print(colors)
['black', 'red', 'green', 'blue']
>>> colors = ['red', 'green', 'blue']
>>> colors.insert(1, 'black')
>>>
>>> print(colors)
['red', 'black', 'green', 'blue']

4.2.8. Sort

  • sorted() - returns new sorted list, but does not modify the original

  • list.sort() - sorts list and returns None

Why doesn't list.sort() return the sorted list? [3]

In situations where performance matters, making a copy of the list just to sort it would be wasteful. Therefore, list.sort() sorts the list in place. In order to remind you of that fact, it does not return the sorted list. This way, you won't be fooled into accidentally overwriting a list when you need a sorted copy but also need to keep the unsorted version around.

If you want to return a new list, use the built-in sorted() function instead. This function creates a new list from a provided iterable, sorts it and returns it. For example, here's how to iterate over the keys of a dictionary in sorted order

Timsort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. It was implemented by Tim Peters in 2002 for use in the Python programming language. The algorithm finds subsequences of the data that are already ordered (runs) and uses them to sort the remainder more efficiently. This is done by merging runs until certain criteria are fulfilled. Timsort has been Python's standard sorting algorithm since version 2.3. It is also used to sort arrays of non-primitive type in Java SE 7, on the Android platform, in GNU Octave, on V8, Swift, and Rust. [1]

  • Worst-case performance: \(O(n\log{n})\)

  • Best-case performance: \(O(n)\)

  • Average performance: \(O(n\log{n})\)

  • Worst-case space complexity: \(O(n)\)

  • sorted() - Returns sorted list, do not modify the original

  • list.sort() - Changes object permanently, returns None

Return sorted values without modifying a list:

>>> values = [3, 1, 2]
>>>
>>> sorted(values)
[1, 2, 3]
>>>
>>> sorted(values, reverse=True)
[3, 2, 1]

Permanent sorting with list modification (note that list.sort() modifies values, and returns None, not values):

>>> values = [3, 1, 2]
>>>
>>> values.sort()
>>> values
[1, 2, 3]
>>>
>>> values.sort(reverse=True)
>>> values
[3, 2, 1]

You can also use list.sort() and/or sorted() with str. It will sort strings according to Unicode (UTF-8) value, that is ASCII table for latin alphabet and Unicode for extended encoding. This kind of sorting is called lexicographic order.

>>> colors = ['red', 'green', 'blue']
>>>
>>> sorted(colors)
['blue', 'green', 'red']

4.2.9. Reverse

  • reversed()

  • list.reverse()

>>> colors = ['red', 'green', 'blue']
>>> colors.reverse()
>>> colors
['blue', 'green', 'red']
>>> colors = ['red', 'green', 'blue']
>>> result = reversed(colors)
>>>
>>> list(result)
['blue', 'green', 'red']

Why?:

>>> colors = ['red', 'green', 'blue']
>>> result = reversed(colors)
>>>
>>> result  
<list_reverseiterator object at 0x...>
>>>
>>> next(result)
'blue'
>>> next(result)
'green'
>>> next(result)
'red'
>>> next(result)
Traceback (most recent call last):
StopIteration

4.2.10. Index

  • list.index() - position at which something is in the list

  • Note, that Python start counting at zero (zero based indexing)

  • Raises ValueError if the value is not present

>>> colors = ['red', 'green', 'blue']
>>> result = colors.index('blue')
>>>
>>> print(result)
2

4.2.11. Count

  • list.count() - number of occurrences of value

>>> colors = ['red', 'green', 'blue', 'red', 'blue', 'red']
>>> result = colors.count('red')
>>>
>>> print(result)
3

4.2.12. Method Chaining

>>> colors = ['red', 'green', 'blue']
>>> colors.sort()
>>> colors.append('black')
>>>
>>> print(colors)
['blue', 'green', 'red', 'black']
>>> colors = ['red', 'green', 'blue']
>>>
>>> colors.sort().append('black')
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'append'

4.2.13. Built-in Functions

  • min() - Minimal value

  • max() - Maximal value

  • sum() - Sum of elements

  • len() - Length of a list

  • all() - All values are True

  • any() - Any values is True

List with numeric values:

>>> data = [3, 1, 2]
>>>
>>> len(data)
3
>>> min(data)
1
>>> max(data)
3
>>> sum(data)
6

List with string values:

>>> data = ['a', 'c', 'b']
>>>
>>> len(data)
3
>>> min(data)
'a'
>>> max(data)
'c'
>>> sum(data)
Traceback (most recent call last):
TypeError: unsupported operand type(s) for +: 'int' and 'str'

List with boolean values:

>>> data = [True, False, True]
>>>
>>> any(data)
True
>>> all(data)
False

4.2.14. Memory

../../_images/memory-list.png

Figure 4.2. Memory representation for list

4.2.15. Shallow Copy vs Deep Copy

  • Shallow Copy (by reference) - identifiers are pointing to the same object in memory

  • Deep Copy - identifiers are pointing to distinct objects

  • Shallow Copy is faster and requires less memory (no duplicated objects)

  • Deep Copy is slower and requires twice sa much memory, but is safe for modification

Shallow Copy:

>>> a = ['red', 'green', 'blue']
>>> b = a
>>>
>>> a.append('black')
>>>
>>> a
['red', 'green', 'blue', 'black']
>>> b
['red', 'green', 'blue', 'black']
>>>
>>> id(a)  
4417433984
>>> id(b)  
4417433984

Deep Copy:

>>> a = ['red', 'green', 'blue']
>>> b = a.copy()
>>>
>>> a.append('black')
>>>
>>> a
['red', 'green', 'blue', 'black']
>>> b
['red', 'green', 'blue']
>>>
>>> id(first)  
4391796976
>>> id(second)  
4391797008

4.2.16. Recap

  • Mutable - can add, remove, and modify items

  • Stores elements of any type

  • Extensible and flexible

4.2.17. References

4.2.18. Assignments

Code 4.4. Solution
"""
* Assignment: Iterable List Create
* Type: class assignment
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min

English:
    1. Create lists:
        a. `result_a` without elements
        b. `result_a` with elements: 1, 2, 3
        c. `result_b` with elements: 1.1, 2.2, 3.3
        d. `result_c` with elements: 'a', 'b', 'c'
        e. `result_d` with elements: True, False
        f. `result_e` with elements: 1, 2.2, True, 'a'
    2. Run doctests - all must succeed

Polish:
    1. Stwórz listy:
        a. `result_a` bez elementów
        b. `result_a` z elementami: 1, 2, 3
        c. `result_b` z elementami: 1.1, 2.2, 3.3
        d. `result_c` z elementami: 'a', 'b', 'c'
        e. `result_d` z elementami: True, False, True
        f. `result_e` z elementami: 1, 2.2, True, 'a'
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result_a is not Ellipsis, \
    'Assign your result to variable `result_a`'
    >>> assert result_b is not Ellipsis, \
    'Assign your result to variable `result_b`'
    >>> assert result_c is not Ellipsis, \
    'Assign your result to variable `result_c`'
    >>> assert result_d is not Ellipsis, \
    'Assign your result to variable `result_d`'
    >>> assert result_e is not Ellipsis, \
    'Assign your result to variable `result_e`'
    >>> assert result_f is not Ellipsis, \
    'Assign your result to variable `result_f`'

    >>> assert type(result_a) is list, \
    'Variable `result_a` has invalid type, should be list'
    >>> assert type(result_b) is list, \
    'Variable `result_b` has invalid type, should be list'
    >>> assert type(result_c) is list, \
    'Variable `result_c` has invalid type, should be list'
    >>> assert type(result_d) is list, \
    'Variable `result_d` has invalid type, should be list'
    >>> assert type(result_e) is list, \
    'Variable `result_e` has invalid type, should be list'
    >>> assert type(result_f) is list, \
    'Variable `result_f` has invalid type, should be list'

    >>> assert result_a == [], \
    'Variable `result_a` has invalid value, should be []'
    >>> assert result_b == [1, 2, 3], \
    'Variable `result_b` has invalid value, should be [1, 2, 3]'
    >>> assert result_c == [1.1, 2.2, 3.3], \
    'Variable `result_c` has invalid value, should be [1.1, 2.2, 3.3]'
    >>> assert result_d == ['a', 'b', 'c'], \
    'Variable `result_d` has invalid value, should be ["a", "b", "c"]'
    >>> assert result_e == [True, False], \
    'Variable `result_e` has invalid value, should be [True, False]'
    >>> assert result_f == [1, 2.2, True, 'a'], \
    'Variable `result_f` has invalid value, should be [1, 2.2, True, "a"]'
"""

# List without elements
# type: list
result_a = ...

# List with elements: 1, 2, 3
# type: list[int]
result_b = ...

# List with elements: 1.1, 2.2, 3.3
# type: list[float]
result_c = ...

# List with elements: 'a', 'b', 'c'
# type: list[str]
result_d = ...

# List with elements: True, False
# type: list[bool]
result_e = ...

# List with elements: 1, 2.2, True, 'a'
# type: list[int|float|bool|str]
result_f = ...

Code 4.5. Solution
"""
* Assignment: Iterable List Many
* Type: class assignment
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Create list `a` with data from row 1
    2. Create list `b` with data from row 2
    3. Create list `c` with data from row 3
    4. Rewrite data manually:
        a. Do not automate by writing code
        b. Do not use `str.split()`, `slice`, `getitem`, `for`, `while`
           or any other control-flow statement
        c. Objective is to learn the syntax, not automation
        d. Convert numerical values to float (manually)
    5. Run doctests - all must succeed

Polish:
    1. Stwórz listę `a` z danymi z wiersza 1
    2. Stwórz listę `b` z danymi z wiersza 2
    3. Stwórz listę `c` z danymi z wiersza 3
    4. Przepisz dane ręcznie:
        a. Nie automatyzuj pisząc kod
        b. Nie używaj `str.split()`, `slice`, `getitem`, `for`, `while`
           lub jakiejkolwiek innej instrukcji sterującej
        c. Celem jest nauka składni, a nie automatyzacja
        d. Przekonwertuj wartości numeryczne do float (ręcznie)
    5. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert a is not Ellipsis, \
    'Assign your result to variable `a`'
    >>> assert b is not Ellipsis, \
    'Assign your result to variable `b`'
    >>> assert c is not Ellipsis, \
    'Assign your result to variable `c`'
    >>> assert type(a) is list, \
    'Variable `a` has invalid type, should be list'

    >>> assert type(b) is list, \
    'Variable `b` has invalid type, should be list'
    >>> assert type(c) is list, \
    'Variable `c` has invalid type, should be list'
    >>> assert len(a) == 5, \
    'Variable `a` length should be 5'
    >>> assert len(b) == 5, \
    'Variable `b` length should be 5'
    >>> assert len(c) == 5, \
    'Variable `c` length should be 5'

    >>> assert (5.8 in a
    ...     and 2.7 in a
    ...     and 5.1 in a
    ...     and 1.9 in a
    ...     and 'virginica' in a)

    >>> assert (5.1 in b
    ...     and 3.5 in b
    ...     and 1.4 in b
    ...     and 0.2 in b
    ...     and 'setosa' in b)

    >>> assert (5.7 in c
    ...     and 2.8 in c
    ...     and 4.1 in c
    ...     and 1.3 in c
    ...     and 'versicolor' in c)
"""

DATA = [
    'sepal_length,sepal_width,petal_length,petal_width,species',
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
    '6.3,2.9,5.6,1.8,virginica',
    '6.4,3.2,4.5,1.5,versicolor',
]

# With data from row[1]: 5.8, 2.7, 5.1, 1.9 and virginica
# type: list[float|str]
a = ...

# With data from row[2]: 5.1, 3.5, 1.4, 0.2 and setosa
# type: list[float|str]
b = ...

# With data from row[3]: 5.7, 2.8, 4.1, 1.3 and versicolor
# type: list[float|str]
c = ...

Code 4.6. Solution
"""
* Assignment: Iterable List Modify
* Type: homework
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Insert at the beginning of `a` letter 'x'
    2. Append to the `b` last element popped from `a`
    3. For getting elements use `list.pop()`
    4. From list `c` using `del` delete last element
    5. Run doctests - all must succeed

Polish:
    1. Na początek `a` wstaw literę 'x'
    2. Na koniec `b` wstaw ostatni element wyciągnięty z `a`
    3. Do wyciągnięcia używaj `list.pop()`
    4. Z listy `c` za pomocą `del` usuń ostatni element
    5. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert a is not Ellipsis, \
    'Assign your result to variable `a`'
    >>> assert b is not Ellipsis, \
    'Assign your result to variable `b`'
    >>> assert c is not Ellipsis, \
    'Assign your result to variable `c`'
    >>> assert type(a) is list, \
    'Variable `a` has invalid type, should be list'
    >>> assert type(b) is list, \
    'Variable `b` has invalid type, should be list'
    >>> assert type(c) is list, \
    'Variable `c` has invalid type, should be list'

    >>> a
    ['x', 4.7, 3.2, 1.3, 0.2]
    >>> b
    [7.0, 3.2, 4.7, 1.4, 'versicolor', 'setosa']
    >>> c
    [7.6, 3.0, 6.6, 2.1]
"""

a = [4.7, 3.2, 1.3, 0.2, 'setosa']
b = [7.0, 3.2, 4.7, 1.4, 'versicolor']
c = [7.6, 3.0, 6.6, 2.1, 'virginica']

# insert at the beginning of `a` letter 'x'
...

# append to the `b` last element popped from `a`
# for getting elements use `list.pop()`
...

# from list `c` using `del` delete last element
...