In the previous article in this
series
we looked at a seemingly simple class graph with some surprising
behavior. The central mystery was how a class with two bases can seem to
invoke two different method implementations with just a single
invocation of super(). In order to understand how that works, we
need to delve into the details of how super() works, and this
involves understanding some design details of the Python language
itself.
Method Resolution Order
The first detail we need to understand is the notion of method
resolution order or simply MRO. Put simply a method resolution order is
the ordering of an inheritance graph for the purposes of deciding which
implementation to use when a method is invoked on an object. Let's look
at that definition a bit more closely.
First, we said that an MRO is an "ordering of an inheritance graph".
Consider a simple diamond class structure like this:
>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(A): pass
...
>>> class D(B, C): pass
...
The MRO for these classes could be, in principle, any ordering of the
classes A, B, C, and D (and object, the ultimate
base class of all classes in Python.) Python, of course, doesn't just
pick the order randomly, and we'll cover how it picks the order in a
later section. For now, let's examine the MROs for our classes using the
mro() class method:
>>> A.mro()
[<class '__main__.A'>,
<class 'object'>]
>>> B.mro()
[<class '__main__.B'>,
<class '__main__.A'>,
<class 'object'>]
>>> C.mro()
[<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]
>>> D.mro()
[<class '__main__.D'>,
<class '__main__.B'>,
<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]
We can see that all of our classes have an MRO. But what is it used for?
The second half of our definition said "for the purposes of deciding
which implementation to use when a method is invoked on an object". What
this means is that Python looks at a class's MRO when a method is
invoked on an instance of that class. Starting at the head of the MRO,
Python examines each class in order looking for the first one which
implements the invoked method. That implementation is the one that gets
used.
For example, let's augment our simple example with a method implemented
in multiple locations:
>>> class A:
... def foo(self):
... print('A.foo')
...
>>> class B(A):
... def foo(self):
... print('B.foo')
...
>>> class C(A):
... def foo(self):
... print('C.foo')
...
>>> class D(B, C):
... pass
...
What will happen if we invoke foo() on an instance of D?
Remember that the MRO of D was [D, B, C, A, object]. Since the
first class in that sequence to support foo() is B, we would
expect to see "B.foo" printed, and indeed that is exactly what happens:
What if remove the implementation in B? We would expect to see "C.foo",
which again is what happens:
>>> class A:
... def foo(self):
... print('A.foo')
...
>>> class B(A):
... pass
...
>>> class C(A):
... def foo(self):
... print('C.foo')
...
>>> class D(B, C):
... pass
...
>>> D().foo()
C.foo
To reiterate, method resolution order is nothing more than some ordering
of the inheritance graph that Python uses to find method
implementations. It's a relatively simple concept, but it's one that
many developers understand only intuitively and partially. But how does
Python calculate an MRO? We hinted earlier – and you probably suspected
– that it's not just any random ordering, and in the next section we'll
look at precisely how Python does this.
C3 superclass linearization
The short answer to the question of how Python determines MRO is "C3 superclass
linearization", or simply C3. C3 is an algorithm initially developed for the
Dylan programming language , and it has since been adopted by several
prominent programming languages including Perl, Parrot, and of course Python.
We won't go into great detail on how C3 works, though there is plenty of
information on the web that can tell you everything you need to know.
What's important to know about C3 is that it guarantees three important
features:
- Subclasses appear before base classes
- Base class declaration order is preserved
- For all classes in an inheritance graph, the relative orderings
guaranteed by 1 and 2 are preserved at all points in the graph.
In other words, by rule 1, you will never see an MRO where a class is
preceded by one of its base classes. If you have this:
>>> class Foo(Fred, Jim, Shiela):
... pass
...
you will never see an MRO where Foo comes after Fred, Jim,
or Shiela. This, again, is because Fred, Jim, and Shiela
are all base classes of Foo, and C3 puts base classes after
subclasses.
Likewise, by rule 2, you will never see an MRO where the base classes
specified to the class keyword are in a different relative order
than that definition. Given the same code above, this means that you
will never see and MRO with Fred after either Jim or Shiela.
Nor will you see an MRO with Jim after Shiela. This is because
the base class declaration order is preserved by C3.
The third constraint guaranteed by C3 simply means that the relative
orderings determined by one class in an inheritance graph – i.e. the
ordering constraints based on one class's base class declarations – will
not be violated in any MRO for any class in that graph.
C3 limits your inheritance options
One interesting side-effect of the use of C3 is that not all inheritance
graphs are legal. It's possible to construct inheritance graphs which
make it impossible to meet all of the constraints of C3. When this
happens, Python raises an exception and prevents the creation of the
invalid class:
>>> class A:
... pass
...
>>> class B(A):
... pass
...
>>> class C(A):
... pass
...
>>> class D(B, A, C):
... pass
...
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: Cannot create a consistent method resolution
order (MRO) for bases A, C
In this case, we've asked for D to inherit from B, A, and
C, in that order. Unfortunately, C3 wants to enforce two
incompatible constraints in this case:
- It wants to put C before A because A is a base class of
C
- It wants to put A before C because of D's base class
ordering
Since these are obviously mutually exclusive states, C3 rejects the
inheritance graph and Python raises a TypeError.
That's about it, really. These rules provide a consistent, predictable
basis for calculating MROs. Understanding C3, or even just knowing that
it exists, is perhaps not important for day-to-day Python development,
but it's an interesting tidbit for those interested in the details of
language design.
Super proxies
The third detail we need to understand in order to resolve our mystery
is the notion of a "super proxy". When you invoke super() in Python
, what actually happens is that you construct an object of type
super. In other words, super is a class, not a keyword or some
other construct. You can see this in a REPL:
>>> s = super(C)
>>> type(s)
<class 'super'>
>>> dir(s)
['__class__', '__delattr__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__',
'__get__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__le__', '__lt__',
'__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__self__',
'__self_class__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__thisclass__']
In most cases, super objects have two important pieces of
information: an MRO and a class in that MRO. When you use an invocation like this:
then the MRO is that of the type of obj, and the class within that
MRO is a_type.
Likewise, when you use an invocation like this:
the MRO is that of type2 and the class within that MRO is type1.
Given that, what exactly does super() do? It's hard to put it in a
succinct, pithy, or memorable form, but here's my best attempt so far.
Given a method resolution order and a class C in that MRO,
super() gives you an object which resolves methods using only the
part of the MRO which comes after C.
In other words, rather than resolving method invocation using the full
MRO like normal, super uses only the tail of an MRO. In all other
ways, though, method resolution is occurring exactly as it normally
would.
For example, suppose I have an MRO like this:
and further suppose that I have a super object using this MRO and
the class C in this MRO. In that case, the super instance would
only resolve to methods implemented in D, E, or object (in
that order.) In other words, a call like this:
would only resolve to an implementation of foo() in D or E.
Why the name super-proxy
You might wonder why we've been using the name super-proxy when
discussing super instances. The reason is that instances of
super are designed to respond to any method name, resolving the
actual method implementation based on their MRO and class configuration.
In this way, super instances act as proxies for all objects. They
simply pass arguments through to an underlying implementation.
The mystery is almost resolved!
We now know everything we need to know to resolve the mystery described
in the first article in this
series.
You can (and probably should) see if you can figure it out for yourself
at this point. By applying the concepts we discussed in this article -
method resolution order, C3, and super proxies - you should be able to
see how SortedIntList is able to enforce the constraints of
IntList and SortedList even though it only makes a single call
to super().
If you'd rather wait, though, the third article in this series will lay
it all out for you. Stay tuned!