Unexpected side effects in Python classes

Today, I lost several hours while debugging a language implementation detail in Python that I did not know of and that really feels counterintuitive and dangerous to me.

I was writing unit tests for a Python class that I was implementing, when one of the tests that had repeatedly been passing suddenly failed. Moreover, the failing test case was really for testing some completely unrelated piece of functionaly. This simply could not be broken!

After at least an hour of scrutinizing the code, I was able to distill the real problem, which I think is summarized here in the most compact way:

class Foo:
  x = {}
  
  def __init__(self, id):
    self.x[id] = id

f1 = Foo(5)
print f1.x     # {5:5}, as expected
f2 = Foo(6)
print f2.x     # {5:5,6:6} ?!

Creating a simple Foo instance twice exposes the ugly side effect: the second Foo instance has an already initialized x instance variable when the constructor enters! Yuck! Moreover, now, too:

print f1.x     # {5:5,6:6}, too!

Apparently, the x “instance variable” is a shared object, much like a global or class variable.

To be even more confusing, this doesn’t seem to hold for basic data types. For example, change the dictionary to an integer, and the example behaves as expected:

class Foo:
  x = 3

  def __init__(self, id):
    self.x = id

f1 = Foo(5)
print f1.x     # 5, as expected
f2 = Foo(6)
print f2.x     # 6, as expected

The behaviour demystified

The real confusion here is that I was thinking that I was creating “instance variables”, like you would in C++ or Java. As the Python documentation mentions:

“data attributes correspond to […] to data members in C++. Data attributes need not be declared; like local variables, they spring into existence when they are first assigned to.”

Yes, I knew that, but nonetheless my real-world class is much bigger than Foo and I wanted an explicit overview on which instance variables are in this class. Hence the data member.

However, this is not how the Python interpreter processes Python code. In fact, upon class definition, the statement x = {} is executed within the scope of the newly defined class. To prove this:

class Bar:
  x = {}

print Bar.x     # {}

Even without a constructor or instance variable, we can access the data member x. Of course. Now this suddenly seems obvious.

But what about our instance variables? Apparently, when we create a new instance of Bar, the instance data member x is initially pointing to the same object as the class data member x. The following example proves this:

class Foo:
  x = {}
  
  def __init__(self, id):
    self.x = { id: id }

f1 = Foo(5)
print f1.x     # { 5:5 }, as expected
f2 = Foo(6)
print f2.x     # { 6:6 }, as expected
print Foo.x    # {}, didn't intend this!

This example also demonstrates the subtlety of the accidentally discovered side-effect. Remember how we were changing the dictionary in our initial example? self.x[id] = id
The instance data member was pointing to the same object as the class data member. By updating the dictionary, the single dictionary object was changed, causing unwanted side effects in other class instances.

In the listing above, x is forced to point to a new dictionary by the assignment self.x = { id:id }. In other words, x points to a new object! This also perfectly explains why the integer example worked—it’s the same kind of assignment.

Conclusion

To summarize, I learned some important lessons today:

If you want to get in touch, I'm @nvie on Twitter.