Understanding Attributes, Dicts and Slots in Python

Understanding Attributes in Python

Python is a very dynamic language by nature. Variables do not need to be declared and can be added as attributes almost everywhere. Consider an empty class like this:

class MyClass:
    pass

This is a complete definition of a class in Python. Granted, it does nothing, but it’s still valid.

At any later point in time, we can “patch” attributes onto our class like this:

MyClass.class_attribute = 42

The class has this new class_attribute value from that point on.

If we instantiate this class with my_object = MyClass(), we can verify that the instance sees the class_attribute value of 42:

>>> my_object.class_attribute
42

Of course, we can add attributes to our instances as well:

>>> my_object.instance_attribute = 21
>>> my_object.instance_attribute
21

Have you ever wondered where these attributes are stored?

Explicit is better than implicit. (from the Zen of Python)

Python wouldn’t be Python without a well-defined and customizable behaviour for attributes. The attributes of a “thing” in Python are stored in a magic attribute called __dict__. We can access it like so:

class MyClass:
    class_attribute = "Class"

    def __init__(self):
        self.instance_attribute = "Instance"

my_object = MyClass()

print(my_object.__dict__)
print(MyClass.__dict__)

As you can see, the class_attribute is stored in the __dict__ of MyClass itself, whereas the instance_attribute is stored within the __dict__ of my_object.

That means that whenever you access my_object.instance_attribute, Python first looks in my_object.__dict__ and then in MyClass.__dict__ (followed by the __dict__ of any base classes). If the attribute is found in none of these dictionaries, an AttributeError is raised.
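
The lookup order described above can be tried out directly. The following is a minimal sketch that reuses MyClass from the article; the name missing_attribute and the flag missing_raises are only there for illustration.

```python
class MyClass:
    class_attribute = "Class"

    def __init__(self):
        self.instance_attribute = "Instance"


my_object = MyClass()

# The instance dictionary holds only what was assigned on the instance.
print(my_object.__dict__)  # {'instance_attribute': 'Instance'}

# The class attribute lives in the class dictionary instead.
print("class_attribute" in MyClass.__dict__)    # True
print("class_attribute" in my_object.__dict__)  # False

# Assigning on the instance shadows the class attribute without changing it.
my_object.class_attribute = "Shadowed"
print(my_object.class_attribute)  # Shadowed
print(MyClass.class_attribute)    # Class

# A name found in neither dictionary raises AttributeError.
missing_raises = False
try:
    my_object.missing_attribute
except AttributeError:
    missing_raises = True
print(missing_raises)  # True
```

Note that shadowing only adds an entry to my_object.__dict__; other instances of MyClass still see the original class attribute.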

Side Note

What is a “thing” in Python? You see that almost every “thing” in Python has a __dict__ attribute, even a class itself. Logically, a class like MyClass is of type type, meaning that the class itself is an object, namely an instance of type. Since this might sound confusing, I use the colloquial term “thing” instead.
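
A quick way to see that a class is itself an object is to ask for its type. This is a small sketch using only built-ins; the empty MyClass is the one from the beginning of the article.

```python
class MyClass:
    pass


my_object = MyClass()

# The instance is of type MyClass, and MyClass itself is of type `type`.
print(type(my_object) is MyClass)  # True
print(type(MyClass) is type)       # True

# A class is an ordinary object as well.
print(isinstance(MyClass, object))  # True

# The class-level __dict__ is exposed as a read-only mappingproxy.
print(type(MyClass.__dict__).__name__)  # mappingproxy

# Not every "thing" has a __dict__: plain integers, for example, do not.
print(hasattr(42, "__dict__"))  # False
```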

“Hacking” the __dict__ attribute

As usual in Python, the __dict__ attribute behaves like any other attribute. Because names and attributes in Python hold references to objects rather than copies, a mutable object can easily end up being shared by accident, which leads to a bug that occurs quite frequently. Consider a class AddressBook:

class AddressBook:
    addresses = []

Now, let’s create some address books and add some addresses:

alices_address_book = AddressBook()
alices_address_book.addresses.append(("Sherlock Holmes", "221B Baker St., London"))
alices_address_book.addresses.append(("Al Bundy", "9764 Jeopardy Lane, Chicago, Illinois"))


bobs_address_book = AddressBook()
bobs_address_book.addresses.append(("Bart Simpson", "742 Evergreen Terrace, Springfield, USA"))
bobs_address_book.addresses.append(("Hercule Poirot", "Apt. 56B, Whitehaven Mansions, Sandhurst Square, London W1"))

Interestingly, Alice and Bob now share one address book:

>>> alices_address_book.addresses
[('Sherlock Holmes', '221B Baker St., London'),
 ('Al Bundy', '9764 Jeopardy Lane, Chicago, Illinois'),
 ('Bart Simpson', '742 Evergreen Terrace, Springfield, USA'),
 ('Hercule Poirot', 'Apt. 56B, Whitehaven Mansions, Sandhurst Square, London W1')]
>>> bobs_address_book.addresses
[('Sherlock Holmes', '221B Baker St., London'),
 ('Al Bundy', '9764 Jeopardy Lane, Chicago, Illinois'),
 ('Bart Simpson', '742 Evergreen Terrace, Springfield, USA'),
 ('Hercule Poirot', 'Apt. 56B, Whitehaven Mansions, Sandhurst Square, London W1')]

This is because the addresses attribute is defined at the class level. The empty list is created only once (addresses = []), namely, when the Python interpreter creates the class. Thus, for any subsequent instance of the AddressBook class, the same list is referenced by addresses. We can fix this bug by moving the creation of the empty list to the instance level like so:

class AddressBook:
    def __init__(self):
        self.addresses = []

By moving the creation of the empty list to the constructor (__init__ method), a new list is created whenever a new instance of AddressBook is created. Therefore, the instances do not unintentionally share the same list anymore.
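
To make the difference visible, the two variants can be compared side by side. In this sketch, SharedAddressBook is a hypothetical name for the original, buggy class, chosen only so that both versions can live in the same file.

```python
class SharedAddressBook:
    # Created once, when the class body is executed.
    addresses = []


class AddressBook:
    def __init__(self):
        # Created anew for every instance.
        self.addresses = []


shared_alice = SharedAddressBook()
shared_bob = SharedAddressBook()
print(shared_alice.addresses is shared_bob.addresses)  # True

alice = AddressBook()
bob = AddressBook()
alice.addresses.append(("Sherlock Holmes", "221B Baker St., London"))

print(alice.addresses is bob.addresses)  # False
print(bob.addresses)                     # []

# In the fixed version, the list lives in the instance __dict__, not in the class.
print("addresses" in alice.__dict__)        # True
print("addresses" in AddressBook.__dict__)  # False
```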

Introducing the Borg

Can we leverage this behaviour intentionally? Is there a use case where we want all instances to share the same storage? It turns out there is! There is a design pattern called Singleton, which ensures that there is only one instance of a class during the program’s runtime. This can be useful, for example, for a database connection class or a configuration store.

Note that you should use singleton classes only occasionally because they introduce some global state in your program, which makes it hard to test individual components of your program in isolation.

What would be a Pythonic way to implement a singleton-ish pattern?

Consider this class:

class Borg:
    _shared = {}
    def __init__(self):
        self.__dict__ = self._shared

This class has a _shared attribute initialized as an empty dictionary. We know from the previous section that this dictionary is created only once, when the class is defined, so every instance refers to the same object. Inside the constructor (__init__), we then set the __dict__ of the instance to this shared dictionary. As a result, all dynamically added attributes are shared amongst all instances of that class.

Let’s check:

>>> borg_1 = Borg()
>>> borg_2 = Borg()
>>> 
>>> borg_1.value = 42
>>> borg_2.value 
42
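
The same check can be extended to show what is and is not shared. The sketch below uses the Borg class from above: the instances remain distinct objects, which is the difference from a strict Singleton, but they share one attribute dictionary.

```python
class Borg:
    _shared = {}

    def __init__(self):
        # Every instance uses the one class-level dictionary as its __dict__.
        self.__dict__ = self._shared


borg_1 = Borg()
borg_2 = Borg()
borg_1.value = 42

# Two separate objects ...
print(borg_1 is borg_2)  # False

# ... backed by the very same attribute dictionary.
print(borg_1.__dict__ is borg_2.__dict__)  # True
print(Borg._shared)                        # {'value': 42}
```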

Why can’t we simply set __dict__ = {} on the class directly, like so?

class Borg:
    __dict__ = {}

>>> borg_1 = Borg()
>>> borg_2 = Borg()
>>> 
>>> borg_1.value = 42
>>> borg_2.value 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Borg' object has no attribute 'value'

This is because, in the latter case, we only create an attribute named __dict__ on the class itself. The attributes of each instance are still stored per instance, so setting borg_1.value does not make value available on borg_2. Only when the __dict__ attribute is set on the instance level can we make use of our Borg pattern. The way to achieve this is by using the constructor to change the __dict__ attribute on the instance level.
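
The following sketch illustrates this under the assumption of CPython's usual attribute lookup. BrokenBorg is a hypothetical name for the class-level variant, used here only to keep it apart from the working Borg.

```python
class BrokenBorg:
    # This creates a plain class attribute that happens to be named __dict__.
    __dict__ = {}


borg_1 = BrokenBorg()
borg_2 = BrokenBorg()
borg_1.value = 42

# The attribute was stored on borg_1 only.
print(borg_1.value)              # 42
print(hasattr(borg_2, "value"))  # False

# Reading __dict__ on an instance now returns the class-level dictionary,
# which never received the new attribute.
class_level_dict = BrokenBorg.__dict__["__dict__"]
print(borg_1.__dict__ is class_level_dict)
print(class_level_dict)
```

In other words, the class-level name merely hides the real per-instance storage instead of replacing it.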

Memory usage of Attributes

Dynamically adding attributes at runtime on instance level or class level comes with a cost. The dictionary structure is quite memory intensive in Python’s internals. In situations where you create a large number (thousands or more) of instances, this might become a bottleneck, and slots are one way to reduce it.

However, first things first: What are slots? While you can dynamically add attributes to “things” in Python, slots restrict this functionality. When you add a __slots__ attribute to a class, you pre-define which member attributes you allow. Let’s have a look:

class SlottedClass:
    __slots__ = ['value']
    def __init__(self, i):
        self.value = i

With this definition, an instance of SlottedClass can only have the attribute value. Assigning any other (dynamic) attribute will raise an AttributeError:

>>> slotted = SlottedClass(42)
>>> slotted.value
42
>>> slotted.forbidden_value = 21
AttributeError: 'SlottedClass' object has no attribute 'forbidden_value'
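
The restriction can also be checked programmatically. This sketch reuses SlottedClass from above; the flag assignment_rejected is an illustrative helper and not part of the original example.

```python
class SlottedClass:
    __slots__ = ['value']

    def __init__(self, i):
        self.value = i


slotted = SlottedClass(42)

# A slotted instance has no per-instance __dict__ at all.
print(hasattr(slotted, "__dict__"))  # False

# Assigning an attribute that is not listed in __slots__ fails.
assignment_rejected = False
try:
    slotted.forbidden_value = 21
except AttributeError:
    assignment_rejected = True
print(assignment_rejected)  # True

# Attributes listed in __slots__ can still be reassigned freely.
slotted.value = 43
print(slotted.value)  # 43
```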

Restricting the ability to add attributes dynamically is useful for reducing runtime errors that might occur because of typos in attribute names. More importantly, this restriction will reduce the memory usage of your code, in some cases significantly. Let’s try to check this.

We create two classes, one slotted and one unslotted. Both classes set an attribute called value inside their __init__ method, and in the case of the slotted class, that is the only attribute in __slots__.

We create a million instances of each class and store these instances in a list. After that, we look at each list’s size with sys.getsizeof. One might expect the list of slotted instances to be smaller.

import sys

class SlottedClass:
    __slots__ = ['value']
    def __init__(self, i):
        self.value = i

class UnSlottedClass:
    def __init__(self, i):
        self.value = i

slotted = []
for i in range(1_000_000):
    slotted.append(SlottedClass(i))
print(sys.getsizeof(slotted))

unslotted = []
for i in range(1_000_000):
    unslotted.append(UnSlottedClass(i))
print(sys.getsizeof(unslotted))

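However, we get back a value of 8448728 for each list. This is because sys.getsizeof only reports the size of the list object itself, that is, its array of references, and not the size of the instances those references point to. So how do we save memory using slots then?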
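
A rough way to see where the difference hides is to measure a single instance instead of the list. The sketch below is only an approximation: sys.getsizeof does not follow references, so for the unslotted instance the size of its __dict__ is added by hand, and the exact numbers depend on the Python version. A smaller list of 1,000 instances is used to keep it quick.

```python
import sys


class SlottedClass:
    __slots__ = ['value']

    def __init__(self, i):
        self.value = i


class UnSlottedClass:
    def __init__(self, i):
        self.value = i


slotted = [SlottedClass(i) for i in range(1000)]
unslotted = [UnSlottedClass(i) for i in range(1000)]

# Both lists hold the same number of references, so they have the same size.
print(sys.getsizeof(slotted) == sys.getsizeof(unslotted))  # True

# Per instance, the unslotted object also needs its attribute dictionary.
slotted_size = sys.getsizeof(slotted[0])
unslotted_size = sys.getsizeof(unslotted[0]) + sys.getsizeof(unslotted[0].__dict__)
print(slotted_size < unslotted_size)  # True
```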

Let’s use the ipython-memory-usage module to check how much memory is consumed during the runtime of our test programme.

In [1]: def slotted_fn():
   ...:     class SlottedClass:
   ...:         __slots__ = ["value"]
   ...:
   ...:         def __init__(self, i):
   ...:             self.value = i
   ...:
   ...:     slotted = []
   ...:     for i in range(1_000_000):
   ...:         slotted.append(SlottedClass(i))
   ...:     return slotted
   ...:
   ...:
   ...: def unslotted_fn():
   ...:     class UnSlottedClass:
   ...:         def __init__(self, i):
   ...:             self.value = i
   ...:
   ...:     unslotted = []
   ...:     for i in range(1_000_000):
   ...:         unslotted.append(UnSlottedClass(i))
   ...:     return unslotted
   ...:
   ...:
   ...: import ipython_memory_usage.ipython_memory_usage as imu
   ...:
   ...: imu.start_watching_memory()
In [1] used 0.0000 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 52.48 MiB

In [2]: slotted_fn()
Out[2]: ...
In [2] used 84.9766 MiB RAM in 0.73s, peaked 0.00 MiB above current, total RAM usage 139.00 MiB

In [3]: unslotted_fn()
Out[3]: ...
In [3] used 200.1562 MiB RAM in 0.84s, peaked 0.00 MiB above current, total RAM usage 339.16 MiB

As you can see, the slotted version took only roughly 85 MiB of RAM, while the unslotted version needed more than 200 MiB, although the resulting size of the lists is the same.
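
If ipython-memory-usage is not at hand, a similar comparison can be sketched with the tracemalloc module from the standard library. This is an alternative measurement technique, not the one used above, so the absolute numbers will differ from the figures in the IPython session. The helper traced_bytes and the instance count of 100,000 are assumptions made for this example.

```python
import tracemalloc


class SlottedClass:
    __slots__ = ['value']

    def __init__(self, i):
        self.value = i


class UnSlottedClass:
    def __init__(self, i):
        self.value = i


def traced_bytes(factory, count=100_000):
    """Return the bytes still allocated after building `count` instances."""
    tracemalloc.start()
    instances = [factory(i) for i in range(count)]  # kept alive while measuring
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


slotted_bytes = traced_bytes(SlottedClass)
unslotted_bytes = traced_bytes(UnSlottedClass)

print(f"slotted:   {slotted_bytes / 2**20:.1f} MiB")
print(f"unslotted: {unslotted_bytes / 2**20:.1f} MiB")
print(slotted_bytes < unslotted_bytes)  # True
```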

The reason for this is the way Python handles dicts internally. When not specifying __slots__, Python uses a dictionary by default to store attributes. This dictionary is dynamic in nature, can be resized, needs to be organized by keys, etc. That’s why Python needs a lot of memory to manage the dictionary.

In the slotted version of the class, the key features of a dict are no longer needed as there is no dynamic resizing allowed anymore. Thus, Python allocates memory upfront for the attributes mentioned in the __slots__.
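
In CPython, this upfront allocation is visible on the class: each name in __slots__ becomes a descriptor that reads and writes a fixed location inside the instance. The following sketch inspects that descriptor; treat it as an illustration of CPython's behaviour rather than a guarantee for every Python implementation.

```python
class SlottedClass:
    __slots__ = ['value']

    def __init__(self, i):
        self.value = i


class UnSlottedClass:
    def __init__(self, i):
        self.value = i


# The slot name shows up in the class dictionary as a member descriptor.
descriptor = SlottedClass.__dict__["value"]
print(type(descriptor).__name__)  # member_descriptor

# Attribute access on an instance goes through that descriptor.
slotted = SlottedClass(42)
print(descriptor.__get__(slotted, SlottedClass))  # 42

# Only the unslotted class provides a per-instance __dict__.
print("__dict__" in SlottedClass.__dict__)    # False
print("__dict__" in UnSlottedClass.__dict__)  # True
```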