Understanding Attributes in Python
Python is a very dynamic language by nature. Variables do not need to be declared, and attributes can be added to almost any object at any time. Consider an empty class like this:
class MyClass:
    pass
This is a complete definition of a class in Python. Granted, it does nothing, but it’s still valid.
At any later point in time, we can “patch” attributes to our class like this:
MyClass.class_attribute = 42
The class has this new class_attribute value from that point on.
If we instantiate this class with my_object = MyClass(), we can verify that the class_attribute value is 42:
>>> my_object.class_attribute
42
Of course, we can add attributes to our instances as well:
>>> my_object.instance_attribute = 21
>>> my_object.instance_attribute
21
Have you ever wondered where these attributes are stored?
Explicit is better than implicit. (from the Zen of Python)
Python wouldn’t be Python without a well-defined and customizable behaviour for attributes. The attributes of a “thing” in Python are stored in a magic attribute called __dict__. We can access it like so:
class MyClass:
    class_attribute = "Class"

    def __init__(self):
        self.instance_attribute = "Instance"

my_object = MyClass()
print(my_object.__dict__)
print(MyClass.__dict__)
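On a recent CPython, the output looks roughly like this (the class __dict__ contains a few extra entries that Python adds automatically, trimmed here for readability):
{'instance_attribute': 'Instance'}
mappingproxy({'__module__': '__main__', 'class_attribute': 'Class', '__init__': <function MyClass.__init__ at 0x...>, ...})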
As you can see, the class_attribute is stored in the __dict__ of MyClass itself, whereas the instance_attribute is stored within the __dict__ of my_object.
That means whenever you access my_object.instance_attribute, Python first looks in my_object.__dict__ and then in MyClass.__dict__ (and in the __dict__ of its base classes). If the attribute is found in neither dictionary, an AttributeError is raised.
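We can see this lookup order in an interactive session, continuing with my_object from above (the exact wording of the error message varies slightly between Python versions):
>>> my_object.instance_attribute    # found in my_object.__dict__
'Instance'
>>> my_object.class_attribute       # not in my_object.__dict__, found in MyClass.__dict__
'Class'
>>> my_object.missing_attribute     # in neither __dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'missing_attribute'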
Side Note
What is a “thing” in Python? Almost every “thing” in Python has a __dict__ attribute, even a class itself. Logically, a class like MyClass is itself an object, and its type is type, meaning that classes are instances of type just like my_object is an instance of MyClass. Since this might sound confusing, I use the colloquial term “thing” instead.
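We can check this in an interactive session:
>>> type(my_object)
<class '__main__.MyClass'>
>>> type(MyClass)
<class 'type'>
>>> isinstance(MyClass, object)
True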
“Hacking” the __dict__ attribute
The __dict__ attribute behaves like any other attribute in Python. Because Python variables are references to objects rather than copies, mutable objects are easily shared by accident, which leads to a bug that occurs quite frequently. Consider a class AddressBook:
class AddressBook:
    addresses = []
Now, let’s create some address books and add some addresses:
alices_address_book = AddressBook()
alices_address_book.addresses.append(("Sherlock Holmes", "221B Baker St., London"))
alices_address_book.addresses.append(("Al Bundy", "9764 Jeopardy Lane, Chicago, Illinois"))
bobs_address_book = AddressBook()
bobs_address_book.addresses.append(("Bart Simpson", "742 Evergreen Terrace, Springfield, USA"))
bobs_address_book.addresses.append(("Hercule Poirot", "Apt. 56B, Whitehaven Mansions, Sandhurst Square, London W1"))
Interestingly, Alice and Bob now share one address book:
>>> alices_address_book.addresses
[('Sherlock Holmes', '221B Baker St., London'),
('Al Bundy', '9764 Jeopardy Lane, Chicago, Illinois'),
('Bart Simpson', '742 Evergreen Terrace, Springfield, USA'),
('Hercule Poirot', 'Apt. 56B, Whitehaven Mansions, Sandhurst Square, London W1')]
>>> bobs_address_book.addresses
[('Sherlock Holmes', '221B Baker St., London'),
('Al Bundy', '9764 Jeopardy Lane, Chicago, Illinois'),
('Bart Simpson', '742 Evergreen Terrace, Springfield, USA'),
('Hercule Poirot', 'Apt. 56B, Whitehaven Mansions, Sandhurst Square, London W1')]
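In fact, both attributes refer to the very same list object, and neither instance has stored anything in its own __dict__:
>>> alices_address_book.addresses is bobs_address_book.addresses
True
>>> "addresses" in alices_address_book.__dict__
False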
This is because the addresses attribute is defined at the class level. The empty list is created only once (addresses = []), namely when the Python interpreter creates the class. Thus, for any subsequent instance of the AddressBook class, the same list is referenced by addresses. We can fix this bug by moving the creation of the empty list to the instance level like so:
class AddressBook:
    def __init__(self):
        self.addresses = []
By moving the creation of the empty list to the constructor (the __init__ method), a new list is created whenever a new instance of AddressBook is created. Therefore, the instances do not unintentionally share the same list anymore.
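With this fix in place, we can repeat the experiment and confirm that Alice and Bob now have separate address books:
>>> alices_address_book = AddressBook()
>>> alices_address_book.addresses.append(("Sherlock Holmes", "221B Baker St., London"))
>>> bobs_address_book = AddressBook()
>>> bobs_address_book.addresses
[]
>>> alices_address_book.addresses is bobs_address_book.addresses
False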
Introducing the Borg
Can we leverage this behaviour intentionally? Is there a use case where we want all instances to share the same storage? It turns out there is! The Singleton design pattern ensures that there is only one instance of a class during the program’s runtime. This can be useful, for example, for a database connection class or a configuration store.
Note that you should use singleton classes sparingly, because they introduce global state into your program, which makes it hard to test individual components of your program in isolation.
What would be a Pythonic way to implement a singleton-ish pattern?
Consider this class:
class Borg:
    _shared = {}

    def __init__(self):
        self.__dict__ = self._shared
This class has a _shared class attribute initialized as an empty dictionary. We know from the previous paragraphs that, as a class attribute, this dictionary is the same object for all instances of the class. Inside the constructor (__init__), we then set the __dict__ of the instance to this shared dictionary. As a result, all dynamically added attributes are shared amongst all instances of that class.
Let’s check:
>>> borg_1 = Borg()
>>> borg_2 = Borg()
>>>
>>> borg_1.value = 42
>>> borg_2.value
42
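Note that, unlike a classic Singleton, the Borg pattern does not give you a single instance; it gives you many instances that all share the same state:
>>> borg_1 is borg_2
False
>>> borg_1.__dict__ is borg_2.__dict__
True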
Why can’t we set __dict__ = {} on the class directly, like so?
class Borg:
    __dict__ = {}
>>> borg_1 = Borg()
>>> borg_2 = Borg()
>>>
>>> borg_1.value = 42
>>> borg_2.value
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Borg' object has no attribute 'value'
This is because in the latter case, we set the __dict__ attribute on the class itself. However, we access the attribute of the instance by typing borg_2.value. Only when the __dict__ attribute is set at the instance level can we make use of our Borg pattern, and the way to achieve this is by using the constructor to change the __dict__ attribute of each instance.
Memory Usage of Attributes
Dynamically adding attributes at runtime at the instance or class level comes with a cost. The dictionary structure is quite memory intensive in Python’s internals. In situations where you create a lot of instances (thousands or more), this might become a bottleneck.
However, first things first: what are slots? While you can normally add attributes to “things” in Python dynamically, slots restrict this functionality. When you add a __slots__ attribute to a class, you pre-define which member attributes you allow. Let’s have a look:
class SlottedClass:
    __slots__ = ['value']

    def __init__(self, i):
        self.value = i
With this definition, any instance of SlottedClass can only hold the attribute value. Assigning or accessing other (dynamic) attributes will raise an AttributeError:
>>> slotted = SlottedClass(42)
>>> slotted.value
42
>>> slotted.forbidden_value = 21
AttributeError: 'SlottedClass' object has no attribute 'forbidden_value'
Restricting the ability to add attributes dynamically is useful for reducing runtime errors that might occur because of typos in attribute names. More importantly, though, this restriction reduces the memory usage of your code, in some cases significantly. Let’s try to verify this.
We create two classes, one slotted and one unslotted. Both classes set an attribute called value inside their __init__ method, and in the case of the slotted class, that is the only attribute in __slots__. We create a million instances of each class and store them in a list. After that, we look at each list’s size. The list of slotted class instances should be smaller.
import sys

class SlottedClass:
    __slots__ = ['value']

    def __init__(self, i):
        self.value = i

class UnSlottedClass:
    def __init__(self, i):
        self.value = i

slotted = []
for i in range(1_000_000):
    slotted.append(SlottedClass(i))
print(sys.getsizeof(slotted))

unslotted = []
for i in range(1_000_000):
    unslotted.append(UnSlottedClass(i))
print(sys.getsizeof(unslotted))
However, we get back a value of 8448728 for each list. This is because sys.getsizeof only measures the size of the list object itself, essentially the array of references it holds, not the objects those references point to. So how do we save memory using slots, then?
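To get a feel for the per-instance difference, we can ask sys.getsizeof about a single instance directly. For the unslotted class we have to add the size of its separate per-instance __dict__ ourselves; the exact numbers depend on the Python version and platform, but the unslotted instance plus its dictionary comes out considerably larger:
s = SlottedClass(1)
u = UnSlottedClass(1)
print(sys.getsizeof(s))                              # slotted instance, attributes stored inline
print(sys.getsizeof(u) + sys.getsizeof(u.__dict__))  # unslotted instance plus its attribute dictionary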
Let’s use the ipython-memory-usage module to check how much memory is consumed during the runtime of our test programme.
In [1]: def slotted_fn():
...: class SlottedClass:
...: __slots__ = ["value"]
...:
...: def __init__(self, i):
...: self.value = i
...:
...: slotted = []
...: for i in range(1_000_000):
...: slotted.append(SlottedClass(i))
...: return slotted
...:
...:
...: def unslotted_fn():
...: class UnSlottedClass:
...: def __init__(self, i):
...: self.value = i
...:
...: unslotted = []
...: for i in range(1_000_000):
...: unslotted.append(UnSlottedClass(i))
...: return unslotted
...:
...:
...: import ipython_memory_usage.ipython_memory_usage as imu
...:
...: imu.start_watching_memory()
In [1] used 0.0000 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 52.48 MiB
In [2]: slotted_fn()
Out[2]: ...
In [2] used 84.9766 MiB RAM in 0.73s, peaked 0.00 MiB above current, total RAM usage 139.00 MiB
In [3]: unslotted_fn()
Out[3]: ...
In [3] used 200.1562 MiB RAM in 0.84s, peaked 0.00 MiB above current, total RAM usage 339.16 MiB
As you can see, the slotted version took only roughly 85 MiB of RAM, while the unslotted version needed more than 200 MiB, although the resulting size of the lists is the same.
The reason for this is the way Python handles dicts internally. When you do not specify __slots__, Python uses a per-instance dictionary to store attributes. This dictionary is dynamic in nature: it can be resized, needs to be organized by keys, and so on. That’s why Python needs a fair amount of memory per instance to manage it.
In the slotted version of the class, these features of a dict are no longer needed, since the set of attributes is fixed. Thus, Python can allocate the memory for the attributes listed in __slots__ upfront, at fixed offsets within each instance.
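As a final sanity check, instances of a slotted class really don’t carry a per-instance dictionary at all; on a recent CPython the session looks roughly like this:
>>> SlottedClass(42).__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'SlottedClass' object has no attribute '__dict__'
>>> UnSlottedClass(42).__dict__
{'value': 42}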