I know that default arguments are evaluated once, at function definition time, and not every time the function is called. See the following code:
def ook(item, lst=[]):
    lst.append(item)
    print 'ook', lst
def eek(item, lst=None):
    if lst is None: lst = []
    lst.append(item)
    print 'eek', lst
max = 3
for x in xrange(max):
    ook(x)
for x in xrange(max):
    eek(x)
What I do not get is why it was implemented this way. What benefit does this behaviour offer over initialisation at each call?
I think the reason is implementation simplicity. Let me elaborate.
A default value is an expression that needs to be evaluated. In your case it is a simple expression that does not depend on the enclosing scope, but it can be something that contains free variables, such as def ook(item, lst=something.defaultList()). If you were designing Python, you would have a choice: do you evaluate the expression once, when the function is defined, or every time the function is called? Python chooses the first (unlike Ruby, which goes with the second option).
There are some benefits for this.
First, you get some speed and memory gains. In most cases default arguments are immutable, and Python can construct them just once instead of on every function call. This saves (some) memory and time. Of course, it doesn't work as nicely with mutable values, but you know how to work around that.
Another benefit is the simplicity. It’s quite easy to understand how the expression is evaluated – it uses the lexical scope when the function is defined. If they went the other way, the lexical scope might change between the definition and the invocation and make it a bit harder to debug. Python goes a long way to be extremely straightforward in those cases.
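To make the definition-time evaluation concrete, here is a minimal sketch (make_default is a hypothetical helper, not from the question) showing that the default expression runs exactly once, when the def statement executes:
def make_default():
    print('building default')   # printed once, when the def below is executed
    return []
def ook(item, lst=make_default()):   # 'building default' appears here, at definition time
    lst.append(item)
    return lst
print(ook(1))   # [1]
print(ook(2))   # [1, 2] - no further 'building default' output; the same list is reused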
One way to put it is that lst.append(item) doesn't rebind the lst parameter: lst still references the same list, it's just that the contents of that list have been mutated.
Basically, Python doesn’t have (that I recall) any constant or immutable variables at all – but it does have some constant, immutable types. You can’t modify an integer value, you can only replace it. But you can modify the content of a list without replacing it.
Like an integer, you can’t modify a reference, you can only replace it. But you can modify the content of the object being referenced.
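A quick way to see the mutate-versus-rebind distinction is to watch id(), which identifies the underlying object (a small sketch; the printed ids will of course differ per run):
lst = [1, 2]
print(id(lst))
lst.append(3)        # mutates the list in place
print(id(lst))       # same id: still the same object
lst = [1, 2, 3]      # rebinds the name to a brand-new list
print(id(lst))       # different id: a new object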
As for creating the default object once, I imagine that’s mostly as an optimisation, to save on object-creation and garbage collection overheads.
What benefit does this behaviour offer over initialisation at each call?
It lets you select the behaviour you want, as you demonstrated in your example. If you want the default argument to be immutable, you use an immutable value such as None or 1. If you want the default argument to be mutable, you use something mutable such as []. It's just flexibility, although admittedly it can bite if you don't know about it.
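As a rough sketch of choosing the mutable behaviour on purpose (the record/log names here are illustrative, not from the question), a shared default can act as a simple accumulator across calls:
def record(event, log=[]):   # one list, created at definition time, shared by every call
    log.append(event)
    return log
record('start')
record('stop')
print(record('done'))   # ['start', 'stop', 'done'] - state kept between calls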
I think the real answer is: Python was written as a procedural language and only adopted functional aspects after-the-fact. What you’re looking for is for parameter defaulting to be done as a closure, and closures in Python are really only half-baked. For evidence of this try:
a = []
for i in range(3):
    a.append(lambda: i)
print [ f() for f in a ]
which gives [2, 2, 2], where you would expect a true closure to produce [0, 1, 2].
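Ironically, the usual workaround for that lambda example leans on the very behaviour being discussed: a default argument freezes the current value of i at definition time. A minimal sketch:
a = []
for i in range(3):
    a.append(lambda i=i: i)   # the default is evaluated now, so each lambda keeps its own i
print([f() for f in a])       # [0, 1, 2]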
There are quite a lot of things that I’d like if Python had the ability to wrap parameter defaulting in closures. For example:
def foo(a, b=a.b):
    ...
Would be extremely handy, but “a” isn’t in scope at function definition time, so you can’t do that and instead have to do the clunky:
def foo(a, b=None):
    if b is None:
        b = a.b
Which is almost the same thing… almost.
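When None is itself a legitimate value for b, a common variant of the same workaround (just a sketch, not something the answer above prescribes) is a private sentinel object:
_MISSING = object()   # unique sentinel that no caller can accidentally pass
def foo(a, b=_MISSING):
    if b is _MISSING:
        b = a.b       # evaluated per call, with 'a' in scope
    return b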
A huge benefit is memoization. This is a standard example:
def fibmem(a, cache={0: 1, 1: 1}):
    if a in cache: return cache[a]
    res = fibmem(a-1, cache) + fibmem(a-2, cache)
    cache[a] = res
    return res
and for comparison:
def fib(a):
    if a == 0 or a == 1: return 1
    return fib(a-1) + fib(a-2)
Time measurements in ipython:
In [43]: %time print(fibmem(33))
5702887
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 200 µs
In [43]: %time print(fib(33))
5702887
CPU times: user 1.44 s, sys: 15.6 ms, total: 1.45 s
Wall time: 1.43 s
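For comparison, the standard library can give the same memoisation without relying on a mutable default at all; a sketch using functools.lru_cache (Python 3):
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache keyed by the argument
def fib_cached(a):
    if a == 0 or a == 1:
        return 1
    return fib_cached(a - 1) + fib_cached(a - 2)

print(fib_cached(33))   # 5702887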
This happens because in Python the def statement is itself executed: the function, including its default values, is built by running the code that describes it.
If one said
def f(x={}):
    ...
it would be pretty clear that you wanted a fresh, empty dict each time.
But what if I say:
list_of_all = {}
def create(stuff, x=list_of_all):
    ...
Here I would guess that I want to create stuff into various containers, with a single global catch-all for when I do not specify one.
But how would the compiler guess that? So why try? We could base the decision on whether the default expression is a bare literal or a named variable, and that might help sometimes, but really it would just be guessing. At the same time, there is a good reason not to try: consistency.
As it is, Python just executes the code. The name list_of_all is already bound to an object, so that object becomes the default for x, in exactly the same way that any ordinary function call would receive a reference to it.
Distinguishing the literal case from the named case would require assignment at definition time to behave significantly differently from assignment at run time, so no special case is made.
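You can confirm that the default really is the same object as the global by inspecting the function's __defaults__ tuple (a small sketch reusing the names above, with a trivial body filled in for illustration):
list_of_all = {}
def create(stuff, x=list_of_all):
    x[stuff] = True
print(create.__defaults__[0] is list_of_all)   # True: the default *is* that object
create('spam')
print(list_of_all)   # {'spam': True}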
This happens because functions in Python are first-class objects:
Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call.
It goes on to explain that mutating the default value changes it for subsequent calls, and that using None as the default, with an explicit test in the function body, is all that is needed to avoid surprises.
Which means that the default in def foo(l=[]) is created once, when the function object is created, and is then reused for every later call. Think of default parameter values as becoming part of the function object's attributes.
Pros could include leveraging this to get C-like static variables. Otherwise it's best to declare default values as None and initialize them as needed:
class Foo(object):
    def bar(self, l=None):
        if l is None:
            l = []
        l.append(5)
        return l
f = Foo()
print(f.bar())
print(f.bar())
g = Foo()
print(g.bar())
print(g.bar())
yields:
[5] [5] [5] [5]
instead of the unexpected:
[5] [5, 5] [5, 5, 5] [5, 5, 5, 5]
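For comparison, the version that produces that unexpected output is the one with the shared mutable default (sketched here with the same usage as above):
class Foo(object):
    def bar(self, l=[]):   # one list, created when the def runs, shared by every call and every instance
        l.append(5)
        return l
f = Foo()
print(f.bar())   # [5]
print(f.bar())   # [5, 5]
g = Foo()
print(g.bar())   # [5, 5, 5] - even a new instance shares the same list
print(g.bar())   # [5, 5, 5, 5]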