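Python is often described as a pass-by-reference language. More precisely, variables and function arguments hold references to objects rather than copies of them (sometimes called pass by object reference). What does that mean, and what do you need to look out for?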
Pass by Reference
In Python, a variable that points to an object doesn’t actually hold a copy of that object’s value. It holds a reference to the object.
Let’s see what this means.
my_pizza_toppings = your_pizza_toppings = []
my_pizza_toppings.append('Anchovies')
my_pizza_toppings.append('Olives')
your_pizza_toppings.append('Pineapple')
your_pizza_toppings.append('Ham')
I’m not judging you for ordering pineapple on pizza, but what we end up with is a pizza order that probably neither of us would like to eat:
print(my_pizza_toppings)
print(your_pizza_toppings)
Interestingly, we end up having both your pizza and my pizza topped with ['Anchovies', 'Olives', 'Pineapple', 'Ham'], which again probably isn’t what we wanted in the first place.
The reason for this misguided pizza order is that we created only one object (in our case, a list, by using = []). Both variables (my_pizza_toppings and your_pizza_toppings) point to that one object, so whatever we do through one name is visible through the other. It is as if we both write on the same sheet of paper when accessing either of these variables.
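A quick way to confirm that both names refer to one object is Python's `is` operator, which compares identity rather than equality. The snippet below is a minimal sketch; the helper name `same_object` is only introduced for illustration.

```python
# Chained assignment creates ONE list and binds both names to it.
my_pizza_toppings = your_pizza_toppings = []

# `is` compares identity: True means both names refer to the same object.
same_object = my_pizza_toppings is your_pizza_toppings
print(same_object)  # True

# Appending through one name is visible through the other.
my_pizza_toppings.append('Anchovies')
print(your_pizza_toppings)  # ['Anchovies']

# id() returns the same identity for both names.
print(id(my_pizza_toppings) == id(your_pizza_toppings))  # True
```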
How to fix that problem?
Luckily, it is straightforward to fix the problem by intentionally creating two independent lists. Think of this as a sheet of paper for each of us to write down our pizza order. Our adapted code looks like this:
my_pizza_toppings = []
your_pizza_toppings = []
my_pizza_toppings.append('Anchovies')
my_pizza_toppings.append('Olives')
your_pizza_toppings.append('Pineapple')
your_pizza_toppings.append('Ham')
print(my_pizza_toppings)
print(your_pizza_toppings)
Now, we have two distinct orders:
['Anchovies', 'Olives']
['Pineapple', 'Ham']
What to look out for?
While the above might seem simple and easy to understand, there are some common pitfalls I see Python developers (including myself) fall into from time to time.
The object-oriented pizza shop
Let’s say you have modelled your pizza shop with an object-oriented approach in mind. Chances are, you have a class Pizza somewhere in your code, which might look like this:
class Pizza:
    toppings = []
    def __init__(self):
        ...  # other setup elided
    ...
    def add_topping(self, topping):
        ...
        self.toppings.append(topping)
        ...
Let’s see what happens if the two of us each order a pizza:
my_pizza = Pizza()
my_pizza.add_topping('Anchovies')
my_pizza.add_topping('Olives')
your_pizza = Pizza()
your_pizza.add_topping('Pineapple')
your_pizza.add_topping('Ham')
Interestingly, we run into the same problem:
print(my_pizza.toppings)
print(your_pizza.toppings)
Again, the two of us get the same disgusting pizza: ['Anchovies', 'Olives', 'Pineapple', 'Ham']. Why is that?
Note that we created two instances of Pizza this time, and we didn’t assign two variables to the same object. But still, our pizzas get seriously messed up.
The reason for this is that we created the empty list in the body of our class Pizza. This creates, as you would expect, an empty list. But the Python interpreter creates that list only once, namely when the class definition is executed. So, in the end, every instance uses the same class attribute toppings, and therefore the same list.
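The sharing can be made visible with a few identity checks. This is a stripped-down sketch of the Pizza class from above (the elided parts are left out), not the full shop model:

```python
class Pizza:
    toppings = []  # class attribute: created once, when the class body runs

    def add_topping(self, topping):
        # The instance has no attribute of its own called `toppings`,
        # so this lookup falls back to the class attribute Pizza.toppings.
        self.toppings.append(topping)


my_pizza = Pizza()
your_pizza = Pizza()
my_pizza.add_topping('Anchovies')

print(my_pizza.toppings is your_pizza.toppings)  # True
print(my_pizza.toppings is Pizza.toppings)       # True
print(my_pizza.__dict__)                         # {}
print(your_pizza.toppings)                       # ['Anchovies']
```

The empty `__dict__` shows that the instance itself stores nothing; the list lives on the class.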
How to fix that?
We could rewrite our Pizza class like so:
class Pizza:
    def __init__(self):
        self.toppings = []
        ...
    def add_topping(self, topping):
        ...
        self.toppings.append(topping)
        ...
We moved the initial list creation to the __init__ method of our class. What’s the difference, you may ask? Well, the __init__ method is called every time a new instance is created, and thus it creates a new instance attribute for each new object rather than each instance relying on the same class attribute.
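To check that the rewritten class behaves as intended, here is a runnable sketch of the same order as before. It assumes an `__init__` that takes no extra arguments, since the original signature is elided.

```python
class Pizza:
    def __init__(self):
        # Runs once per instance, so every pizza gets its own list.
        self.toppings = []

    def add_topping(self, topping):
        self.toppings.append(topping)


my_pizza = Pizza()
my_pizza.add_topping('Anchovies')
my_pizza.add_topping('Olives')

your_pizza = Pizza()
your_pizza.add_topping('Pineapple')
your_pizza.add_topping('Ham')

print(my_pizza.toppings)                         # ['Anchovies', 'Olives']
print(your_pizza.toppings)                       # ['Pineapple', 'Ham']
print(my_pizza.toppings is your_pizza.toppings)  # False
```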
Messing things up in functions
Let’s say, for whatever reason, you have a function which adds something to a list. For convenience, if you do not already have a list, the function will kindly create one for you. Consider this code:
def add_topping(topping_name, toppings=[]):
    toppings.append(topping_name)
    return toppings
First, let’s check if that function works as expected:
>>> add_topping('Anchovies')
['Anchovies']
That looks good, so then let’s go and order our two pizzas again.
my_pizza_toppings = add_topping('Anchovies')
my_pizza_toppings = add_topping('Olives', my_pizza_toppings)
your_pizza_toppings = add_topping('Pineapple')
your_pizza_toppings = add_topping('Ham', your_pizza_toppings)
Oh no! Again, my_pizza_toppings and your_pizza_toppings are the same:
['Anchovies', 'Olives', 'Pineapple', 'Ham']
What happened here? Again, it looks like we have done everything correctly, but still, it all got messed up.
The reason here is the function’s definition. Just as was the case for the class attribute in our Pizza class, the default value (toppings=[]) is evaluated only once by Python, namely when the function is defined. So any call to that function which omits the toppings argument will use, and return, that one list, which is only empty until the first such call.
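Python stores default values on the function object itself, in its `__defaults__` attribute, which makes the shared list easy to observe. A small sketch, where `first` and `second` are names introduced only for this illustration:

```python
def add_topping(topping_name, toppings=[]):
    toppings.append(topping_name)
    return toppings


# The default list lives on the function object and starts out empty.
print(add_topping.__defaults__)  # ([],)

first = add_topping('Anchovies')
# The same list has now been mutated in place.
print(add_topping.__defaults__)  # (['Anchovies'],)

second = add_topping('Pineapple')
print(first is second)  # True
print(second)           # ['Anchovies', 'Pineapple']
```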
How to fix that?
We can change the default value of the toppings parameter to None and check for None inside the function. If we see a None value, we can create the list right there.
def add_topping(topping_name, toppings=None):
    if toppings is None:
        toppings = []
    toppings.append(topping_name)
    return toppings
As opposed to the empty list in the function’s definition, this time a new empty list gets created every time the function is called without that optional parameter.
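Running the earlier order against the corrected function shows the two lists staying separate. This sketch simply repeats the calls from above:

```python
def add_topping(topping_name, toppings=None):
    if toppings is None:
        toppings = []  # a fresh list for every call that omits `toppings`
    toppings.append(topping_name)
    return toppings


my_pizza_toppings = add_topping('Anchovies')
my_pizza_toppings = add_topping('Olives', my_pizza_toppings)
your_pizza_toppings = add_topping('Pineapple')
your_pizza_toppings = add_topping('Ham', your_pizza_toppings)

print(my_pizza_toppings)    # ['Anchovies', 'Olives']
print(your_pizza_toppings)  # ['Pineapple', 'Ham']
```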
A brain teaser
Now that we have learned about the caveats of passing references around, we can have a look at this brain teaser I stumbled upon in the Twitter feed of Reuven Lerner:
with open('some_file.txt') as f:
    for one_line in f:
        f = 6
        print(one_line)
It looks like we overwrite the f variable so that this little script should somehow stop in the next iteration of the loop. However, it runs just fine and prints the whole file from its first to the last line.
See Reuven’s explanation of why this is the case in the original Tweet.
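In short, and as my own reading rather than a quotation of the tweet: the for statement asks the object for an iterator once, before the first iteration, and then keeps its own reference to that iterator. Assigning to f afterwards only rebinds the name. The sketch below reproduces the effect with an in-memory `io.StringIO` object standing in for the file, so it runs without `some_file.txt`; the list `printed` is a helper introduced for illustration.

```python
import io

# An in-memory stand-in for the opened file.
f = io.StringIO("first\nsecond\nthird\n")

printed = []
for one_line in f:  # the iterator is obtained once, before the first iteration
    f = 6           # rebinds the name `f`; the loop keeps its own reference
    printed.append(one_line.rstrip("\n"))

print(printed)  # ['first', 'second', 'third']
print(f)        # 6
```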
Borgs, Borgs, Borgs
Sometimes, sharing the same object within your code is exactly what you want.
For example, you can create a class that behaves de facto like a singleton by using the Borg pattern. It’s named after the Borg in Star Trek, who are linked in a hive mind called the Collective.
I explained the Borg pattern in detail on my blog here.
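For readers who have not seen it, a commonly used formulation of the pattern looks roughly like this. It is a minimal sketch and not necessarily the version from the blog post; the attribute name `designation` is made up for the example.

```python
class Borg:
    _shared_state = {}  # one dictionary shared by every instance

    def __init__(self):
        # Point this instance's attribute dictionary at the shared one,
        # so all instances see the same attributes.
        self.__dict__ = self._shared_state


first = Borg()
second = Borg()
first.designation = 'Seven of Nine'

print(second.designation)  # Seven of Nine
print(first is second)     # False
```

Unlike a classic singleton, the instances are distinct objects; only their state is shared, and here the sharing is deliberate.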
In this article, however, we have learned how to avoid such sharing when we do not want supposedly different variables to hold the same information. I sometimes refer to the examples in this article as involuntary borgs.