Unexpected side effects in Python classes
http://nvie.com/posts/unexpected-side-effects-in-python-classes/
Published: March 03, 2010
Today, I lost several hours while debugging a language implementation detail in Python that I did not know of and that really feels counterintuitive and dangerous to me.
I was writing unit tests for a Python class that I was implementing, when one of the tests that had repeatedly been passing suddenly failed. Moreover, the failing test case covered a completely unrelated piece of functionality. This simply could not be broken!
After at least an hour of scrutinizing the code, I was able to distill the real problem, which I think is summarized here in the most compact way:
class Foo:
    x = {}

    def __init__(self, id):
        self.x[id] = id

f1 = Foo(5)
print(f1.x)  # {5: 5}, as expected
f2 = Foo(6)
print(f2.x)  # {5: 5, 6: 6} ?!
Creating a simple Foo instance twice exposes the ugly side effect: the
second Foo instance already has a populated x "instance variable" when
its constructor is entered! Yuck! Worse, the first instance has changed as well:
print(f1.x)  # {5: 5, 6: 6}, too!
Apparently, the x “instance variable” is a shared object, much like a global
or class variable.
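A quick way to check that suspicion is to compare object identities. The following is a minimal sketch that reuses the Foo class from above; the identity checks with the is operator are my addition and not part of the original debugging session.

```python
class Foo:
    x = {}  # created once and stored on the class

    def __init__(self, id):
        self.x[id] = id  # modifies the dictionary found through the class


f1 = Foo(5)
f2 = Foo(6)

# All three expressions refer to one and the same dictionary object.
print(f1.x is f2.x)   # True
print(f1.x is Foo.x)  # True
print(Foo.x)          # {5: 5, 6: 6}
```

If the two instances each had their own dictionary, both identity checks would print False.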
To make matters even more confusing, this doesn’t seem to hold for basic data types. For example, change the dictionary to an integer (and the item assignment to a plain assignment), and the example behaves as expected:
class Foo:
    x = 3

    def __init__(self, id):
        self.x = id

f1 = Foo(5)
print(f1.x)  # 5, as expected
f2 = Foo(6)
print(f2.x)  # 6, as expected
The behaviour demystified
The real source of my confusion was that I thought I was declaring “instance variables”, the way you would in C++ or Java. As the Python documentation mentions:
“data attributes correspond to […] to data members in C++. Data attributes need not be declared; like local variables, they spring into existence when they are first assigned to.”
Yes, I knew that, but nonetheless my real-world class is much bigger than Foo
and I wanted an explicit overview of which instance variables the class
has. Hence the data member in the class body.
However, this is not how the Python interpreter processes Python code. In fact,
upon class definition, the statement x = {} is executed within the scope of
the newly defined class. To prove this:
class Bar:
    x = {}

print(Bar.x)  # {}
Even without a constructor or instance variable, we can access the data member
x. Of course. Now this suddenly seems obvious.
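To see that the class body really runs once, at definition time, and not once per instance, the sketch below logs every evaluation of the default value. The names make_default and calls are hypothetical helpers introduced only for this illustration.

```python
calls = []  # hypothetical log of how often the default value is built


def make_default():
    """Hypothetical helper: records each call and returns a fresh dict."""
    calls.append("called")
    return {}


class Bar:
    x = make_default()  # runs right here, while the class is being defined


print(len(calls))  # 1: the class body has run, although no instance exists yet

b1 = Bar()
b2 = Bar()
print(len(calls))    # still 1: creating instances does not re-run the class body
print(b1.x is b2.x)  # True: both instances see the one dictionary on the class
```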
But what about our instance variables? Apparently, when we create a new
instance of Foo, looking up x on the instance initially yields the same
object as the class data member x: the instance has no x of its own yet, so
the lookup falls back to the class. The following example illustrates this:
class Foo:
    x = {}

    def __init__(self, id):
        self.x = {id: id}

f1 = Foo(5)
print(f1.x)   # {5: 5}, as expected
f2 = Foo(6)
print(f2.x)   # {6: 6}, as expected
print(Foo.x)  # {}, didn't intend this!
This example also demonstrates the subtlety of the accidentally discovered
side-effect. Remember how we were changing the dictionary in our initial
example?
    self.x[id] = id
The instance data member was pointing to the same object as the class data
member. By updating the dictionary, the single dictionary object was changed,
causing unwanted side effects in other class instances.
In the listing above, x is forced to point to a new dictionary by the
assignment self.x = {id: id}. In other words, the instance's x now points to
a new object, and the class's x is left untouched!
This also perfectly explains why the integer example worked: it’s the same kind
of assignment.
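The difference between the two statements can be made visible by looking at the instance's own namespace with the built-in vars(). This is only a sketch of the lookup behaviour as I understand it; the instance names a and b and the dictionary keys are made up for the illustration.

```python
class Foo:
    x = {}


a = Foo()
print("x" in vars(a))  # False: a has no x of its own, a.x is found on the class
a.x["k"] = 1           # item assignment: looks up a.x, then changes that dict
print("x" in vars(a))  # False: still nothing stored on the instance
print(Foo.x)           # {'k': 1}: the class-level dictionary was modified

b = Foo()
b.x = {"other": 2}     # attribute assignment: stores a new dict on b itself
print("x" in vars(b))  # True
print(b.x is Foo.x)    # False
print(Foo.x)           # {'k': 1}: unaffected by the assignment on b
```

Item assignment only reads the attribute and then changes the object it finds, whereas attribute assignment creates a new entry on the instance that hides the class attribute from then on.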
Conclusion
To summarize, I learned some important lessons today:
- All this time, I have been creating class data members in all my classes, without knowing it.
- I initialized those members to default values, effectively creating useless objects that are never accessed and just claiming memory.
- Although it can be explained, a seemingly innocent statement like x = {} can have very ugly side effects. Be warned!
- Never underestimate the power of unit tests. It is absolutely worth the investment.
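For completeness, here is one way the original Foo could be written so that every instance gets its own dictionary. This is a sketch of the common approach of creating mutable state inside the constructor, not necessarily the change that went into my real-world class.

```python
class Foo:
    def __init__(self, id):
        # A fresh dictionary is created on every call, so nothing is shared.
        self.x = {}
        self.x[id] = id


f1 = Foo(5)
f2 = Foo(6)
print(f1.x)               # {5: 5}
print(f2.x)               # {6: 6}
print(f1.x is f2.x)       # False
print(hasattr(Foo, "x"))  # False: no class-level x exists anymore
```

The overview of instance variables that I was after now lives at the top of the constructor instead of in the class body.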