Discussion:
memory leak with large list??
Tim Peters
2003-01-30 03:00:43 UTC
Permalink
[Stanley]
...
1. Since several people suggested it, I've been looking at Numeric
python, which looks really great. The question is, are there any
disadvantages to using Numeric over using the array module. It seems
that array is just a poor cousin of Numeric. So I assume that either
the array module has some advantage (speed or memory wise) over
Numeric, or it exists simply because Numeric is not a part of the base
python distribution. So is there any reason not to use Numeric??
The array module existed long before Numeric did, and I doubt their
audiences even overlap. "array" is just a storage-efficiency gimmick, while
Numeric aims at fast numeric computation. Computation with array.array
objects is usually *slower* than using Python lists, since the raw bytes in
an array.array have to be wrapped in a Python object every time they're
sucked out, and the wrapper stripped again whenever they're stored.
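
A quick sketch of the effect (illustrative only -- the helper and sizes
here are invented for the example, and exact timings vary by platform):

import array
import time

lst = [float(i) for i in xrange(1000000)]
arr = array.array('d', lst)      # same values, stored as raw 8-byte doubles

def total(seq):
    # Indexing an array.array builds a brand-new float object for each
    # element; a list just hands back a reference it already holds.
    t = 0.0
    for x in seq:
        t = t + x
    return t

start = time.clock(); total(lst); print 'list :', time.clock() - start
start = time.clock(); total(arr); print 'array:', time.clock() - start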

The only disadvantage to using Numeric would be if you wanted to distribute
your app to people who balk at installing Numeric. It may also cut
back on the number of people willing to help you develop your app, if it's
Open Source, simply because many Python users don't know anything about
Numeric.
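
For a sense of the style difference (a hypothetical sketch; assumes
Numeric is installed and uses only its basic array API):

import Numeric

# One contiguous block of 12e6 raw 8-byte doubles -- no per-element
# Python float objects at all.
big = Numeric.zeros(12000000, 'd')
big = big + 1.0                  # one C-level loop over the whole array
print big[:5]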
2. Is there any nice way to profile memory in python?
No. Python isn't the operating system, and only the OS knows "for real" how
the OS passes out memory. Python is several layers removed from those
truths.
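The closest you can get from inside the process is to ask the OS
yourself. A crude Linux-only helper (a sketch that assumes a /proc
layout like the OP's 2.4 kernel provides):

import os

def vm_rss_kb():
    # Scrape this process's resident set size, in Kbytes, out of /proc.
    for line in open('/proc/%d/status' % os.getpid()):
        if line.startswith('VmRSS'):
            return int(line.split()[1])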
Stanley
2003-01-29 17:12:25 UTC
Permalink
Thanks for all the good pointers. I did try using the array module,
initializing with the 'd' type, and based on my 'ps' measurements, it
was taking up just about twice as much memory as a list with the same
number of elements, which made me think that the array module wasn't
what I understood it to be. In retrospect, I realize that I was
comparing the size of a list of 12e6 pointers to 0.0 to an array of 8
byte floats, which I guess is like comparing apples to apple seeds.
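(Working that out, assuming 4-byte pointers: the preallocated list is
12e6 pointers * 4 bytes = ~48 MB, all aimed at one shared 0.0 object,
while array('d') stores 12e6 * 8 bytes = ~96 MB of actual doubles --
hence the factor of two.)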


This brings up two questions for me:

1. Since several people suggested it, I've been looking at Numeric
python, which looks really great. The question is, are there any
disadvantages to using Numeric over using the array module. It seems
that array is just a poor cousin of Numeric. So I assume that either
the array module has some advantage (speed or memory wise) over
Numeric, or it exists simply because Numeric is not a part of the base
python distribution. So is there any reason not to use Numeric??

2. Is there any nice way to profile memory in python? Using ps seems
like a very crude tool, but the gc module only seems to help with
leaks, and the profiler doesn't seem to deal with memory.


I do understand that it's easiest to buy more memory, but since I'm
eventually going to want several data structures with 12e6 floats each
(and possibly a larger number), it seems to make sense to economize a
bit before I run out of addressable memory.
Stanley
2003-01-24 22:35:55 UTC
Permalink
I should start out by saying that I am using the linux command 'ps' to
check memory usage, and I am not entirely sure of the implications of
this.
import os

# Report this process's memory statistics via ps.
command_text = 'ps -o pmem,rss,rssize,sz,vsz ' + str(os.getpid())
os.system(command_text)
First I check the memory footprint of the process.
%MEM RSS RSS SZ VSZ
0.3 1824 1824 734 2936

Then I generate a large list (12 million floats), either using map, a
list comprehension, or preallocating a list, then filling it using a
for loop.
big_list = [float(index) for index in xrange(12000000)]
or
big_list = map(float, xrange(12000000))
or
big_list = [0.0] * 12000000
for index in xrange(12000000):
    big_list[index] = float(index)


When I check the memory footprint again, it is larger than I would
expect, but I'm not sure what the per element overhead of a list is.

%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760

So then delete it.
del(big_list)
But the memory footprint of the process remains very large:

%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884


Is Python leaking memory, or is there something else going on here?

If instead I just do
big_list = [0.0] * 12000000
and then delete it, I don't see the leak, but then the memory footprint is
smaller than you would expect from 12 million floats:

%MEM RSS RSS SZ VSZ
9.4 48700 48700 12453 49812

So I think python is using a shortcut in this case.


I am using Python 2.2.1 on Debian 3.0, kernel 2.4.17-686.

Thanks for the help!
Skip Montanaro
2003-01-24 23:33:28 UTC
Permalink
Post by Stanley
%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760
So then delete it.
del(big_list)
%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884
Is Python leaking memory, or is there something else going on here?
No, this is your platform's malloc() library not returning memory to the
operating system.
Post by Stanley
big_list = [0.0] * 12000000
and then delete it, I don't see the leak, but then memory footprint is
smaller than you would expect
%MEM RSS RSS SZ VSZ
9.4 48700 48700 12453 49812
So I think python is using a shortcut in this case.
Yes. A float object for 0.0 is allocated once, then referred to from all
those list slots:

>>> list1 = [0.0] * 5
>>> list2 = [float(0) for f in xrange(5)]
>>> list1
[0.0, 0.0, 0.0, 0.0, 0.0]
>>> list2
[0.0, 0.0, 0.0, 0.0, 0.0]
>>> list1 == list2
True
>>> map(id, list1)
[6355924, 6355924, 6355924, 6355924, 6355924]
>>> map(id, list2)
[6355860, 6355908, 6355892, 6355876, 6355844]

Skip
Terry Reedy
2003-01-25 00:10:25 UTC
Permalink
Post by Stanley
Then I generate a large list (12 million floats), either using map, a
list comprehension, or preallocating a list, then filling it using a
for loop.
12 million different floats use 12 million * sizeof(float object)
bytes. A float object is at least 8 bytes for the float value plus
object overhead (about 4 bytes?). The list has an array of 12 million
pointers (4, or possibly 8, bytes each). Thus at least 16 * 12 million,
roughly 200 million bytes, regardless of method.

...
Post by Stanley
When I check the memory footprint again, it is larger than I would
expect, but I'm not sure what the per element overhead of a list is.
%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760
If 237648 is a count of Kbytes, it is roughly what one should expect.
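(Working it through, with 4-byte pointers assumed: 12e6 floats * 16
bytes = 192 million bytes for the float objects, plus 12e6 * 4 bytes =
48 million bytes for the list's pointer array, about 240 million bytes
in all. 237648 Kbytes is about 243 million bytes, so the estimate fits.)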
Post by Stanley
So then delete it.
del(big_list)
%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884
Deleting frees mem for reuse by Python but does not necessarily return
memory to system for reuse by other processes. (Exact behavior is
*very* system specific, according to Tim Peters' war stories.)
Post by Stanley
Is Python leaking memory, or is there something else going on here?
Pretty sure no, yes.
Post by Stanley
big_list = [0.0] * 12000000
(I presume you mean, by preallocate, that you subsequently overwrote
with floats as before.) This is 12 million duplicate pointers to
*one* object. The 48 megabytes of memory for the pointer array are
allocated just once. The other methods start with a small list (perhaps
with room for 8 pointers) and reallocate and copy as the list grows.
This can chop up the heap. Preallocation is much better for large arrays.
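
A sketch of the difference (illustrative only; actual timings depend on
the platform malloc):

import time

N = 1000000

start = time.clock()
grown = []
for i in xrange(N):
    grown.append(0.0)        # list must repeatedly reallocate and copy
print 'grown   :', time.clock() - start

start = time.clock()
prealloc = [0.0] * N         # pointer array allocated once, up front
for i in xrange(N):
    prealloc[i] = 0.0
print 'prealloc:', time.clock() - start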
Post by Stanley
and then delete it, I don't see the leak, but then memory footprint is
smaller than you would expect
%MEM RSS RSS SZ VSZ
9.4 48700 48700 12453 49812
So I think python is using a shortcut in this case.
I am using Python 2.2.1 on Debian 3.0, kernel 2.4.17-686.
Maybe someone else is familiar with this particular system.

Terry J. Reedy

Bengt Richter
2003-01-26 00:29:22 UTC
Permalink
[someone]
Post by Stanley
Then I generate a large list (12 million floats), either using map,
a list comprehension, or preallocating a list, then filling it using a
for loop.
[Terry Reedy]
12 million different floats use 12 million * sizeof float object
bytes. Float object is at least 8 bytes for float + object overhead
(about 4 bytes?)
[Tim Peters]
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations without
type pointer, and just reserved a header slot in front of the array, and
allocated the whole so array headers are aligned on, say, 128k addresses,
then you could look up the same type info with just an address mask and gain
4 bytes per float. Ditto for any homogeneous allocation arenas. IWT you
could allocate big blocks from a memory mapped file area if there's no other
way to get aligned large blocks. Or you could allocate really large virtual
space and sacrifice space at the front if necessary to start arena allocations
aligned as desired. Or if some platform is stubborn, virtualize id as a
composite of arena-selector and offset fields, and use zero offset to get
to the header and the type info. Anyway, you know what I'm saying.

Of course you could have a space (it wouldn't be an array per se) with
current-style objects with individual type slots too, just by making the
arena header type info say "see object-specific type slot". It's just
that freeing/allocating space would be more complicated.

I see from below[1] that allocation is already from a pool of type-specific
free lists for float, so IWT those lists could be linked within the
arrays just mentioned (you just can't use the type slot, because there wouldn't
be any individual ones, so maybe use the value slot instead).

[...]
Deleting frees mem for reuse by Python but does not necessarily return
memory to system for reuse by other processes. (Exact behavior is
*very* system specific, according to Tim Peters' war stories.)
It's worse in this case: int and float objects come out of special internal
type-specific "free lists", and there's no bound on how large those can get.

static void
float_dealloc(PyFloatObject *op)
{
    if (PyFloat_CheckExact(op)) {
        op->ob_type = (struct _typeobject *)free_list;
        free_list = op;
    }
    else
        op->ob_type->tp_free((PyObject *)op);
}
IOW, memory once allocated for a float can never be reused for any other
kind of object, and isn't returned to the platform C library until Python
shuts down. The list memory did get returned to the platform C library, and
in this case it looks like the latter did return that chunk to the OS. If
the OP had allocated some other "large object" after allocating the list,
chances are good that the platform C would not have returned the list memory
to the OS. Even then, it doesn't much matter -- the OS will simply page out
that part of the address space if it's not used again. The VM highwater
mark doesn't have a primary effect on performance.
You'd have to change the above to use the ref count slot or the first 4 bytes
of the value slot instead of op->ob_type to link the free list though,
since ob_type would be coming from, e.g., OB_TYPE(op) with the appropriate
address masking etc in an OB_TYPE macro.

I wonder what the performance hit would be, given processor speeds and caching etc.
Or has this been tried and rejected for Python?

Regards,
Bengt Richter
Bengt Richter
2003-01-26 21:11:40 UTC
Permalink
Post by Bengt Richter
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes
at least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations
without type pointer, and just reserved a header slot in front of the
array, [snip]
Post by Terry Reedy
Because in Python, each object is a separate object, and because lists
are heterogeneous. Anyone who wants to work with 12 mil floats should
almost certainly be using Numerical Python, which does have homogeneous
arrays of native floats (8 bytes per) along with C functions to do
most common 'base' computations.
I agree about using Numerical Python and an array of native floats for
12 mil floats, as you describe, but I don't think the fact that lists
"are heterogenous" is a real reason to need per-object type pointers
for all types of objects.

A list doesn't care where the objects it refers to are stored, so e.g.,
floats could be put in floats-only areas, and others could be put in
mixed-areas, and you'd figure type by figuring out from the address first
what kind of area the thing is in, and then if it's a float-only area,
you know right then you've got a float, and if its mixed, you go to the
object representation for type info.

Small lists and tuples could themselves be allocated in specialized areas
with some space savings, since they are homogeneous in themselves -- i.e.,
they don't actually have objects embedded, just references to objects.

But Tim is right, all in all it's usually easier to buy memory. And a simple
consistent way of doing things is worth a _lot_, which today's memory prices
allow. (I just got my space-saving reflexes when 4k of 300ns memory cost
more than all the PC's I've owned put together ;-)

Regards,
Bengt Richter
John Machin
2003-01-26 07:39:27 UTC
Permalink
Post by Bengt Richter
[someone]
Post by Stanley
Then I generate a large list (12 million floats), either using map,
a list comprehension, or preallocating a list, then filling it using a
for loop.
[Terry Reedy]
12 million different floats use 12 million * sizeof float object
bytes. Float object is at least 8 bytes for float + object overhead
(about 4 bytes?)
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations without
type pointer, and just reserved a header slot in front of the array, [snip]
You'll have to pardon me if my brain appears fried (possible; 40
degrees (Celsius) here yesterday) but I don't understand Bengt's
question, nor Tim's answer.

Shouldn't the answer be: (a) You can't do that with the Python "list"
container, which can contain objects of *any* type (b) You can do that
with a different container, if you restrict all the objects in the
container to being of the one type (c) That has been done, for many
types; see the "array" module.
Terry Reedy
2003-01-26 07:39:43 UTC
Permalink
Post by Bengt Richter
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations
without type pointer, and just reserved a header slot in front of the
array, [snip]
Because in Python, each object is a separate object, and because lists
are heterogeneous. Anyone who wants to work with 12 mil floats should
almost certainly be using Numerical Python, which does have homogeneous
arrays of native floats (8 bytes per) along with C functions to do
most common 'base' computations.

Terry J. Reedy
Tim Peters
2003-01-26 02:14:59 UTC
Permalink
[Bengt Richter]
Post by Bengt Richter
Why do you need to duplicate the type pointer for all those?
That's the way it is, and countless lines of code-- both in the core and in
extension modules --rely on being able to get at the type object from a
PyObject* p via p->ob_type.
Post by Bengt Richter
I.e., if you allocated space in decent-size arrays of object
representations without type pointer, and just reserved a header
slot in front of the array, ... [etc] ...
This can't be changed now.
Post by Bengt Richter
...
I wonder what the performance hit would be, given processor
speeds and caching etc.
Or has this been tried and rejected for Python?
Tried no, rejected yes <wink>.
Tim Peters
2003-01-25 12:41:42 UTC
Permalink
[someone]
Post by Stanley
Then I generate a large list (12 million floats), either using map,
a list comprehension, or preallocating a list, then filling it using a
for loop.
[Terry Reedy]
12 million different floats use 12 million * sizeof float object
bytes. Float object is at least 8 bytes for float + object overhead
(about 4 bytes?)
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
The list has an array of 12 million pointers (4, or possibly 8, bytes
each). Thus at least 16 * 12 million, roughly 200 million bytes,
regardless of method.
At least 192 million for the float objects, and at least 48 million for the
list.
Post by Stanley
When I check the memory footprint again, it is larger than I would
expect, but I'm not sure what the per element overhead of a list is.
%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760
If 237648 is a count of Kbytes, it is roughly what one should expect.
Yup.
Post by Stanley
So then delete it.
del(big_list)
%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884
Deleting frees mem for reuse by Python but does not necessarily return
memory to system for reuse by other processes. (Exact behavior is
*very* system specific, according to Tim Peters' war stories.)
It's worse in this case: int and float objects come out of special internal
type-specific "free lists", and there's no bound on how large those can get.
Here's the deallocator for floats:

static void
float_dealloc(PyFloatObject *op)
{
    if (PyFloat_CheckExact(op)) {
        op->ob_type = (struct _typeobject *)free_list;
        free_list = op;
    }
    else
        op->ob_type->tp_free((PyObject *)op);
}

IOW, memory once allocated for a float can never be reused for any other
kind of object, and isn't returned to the platform C library until Python
shuts down. The list memory did get returned to the platform C library, and
in this case it looks like the latter did return that chunk to the OS. If
the OP had allocated some other "large object" after allocating the list,
chances are good that the platform C would not have returned the list memory
to the OS. Even then, it doesn't much matter -- the OS will simply page out
that part of the address space if it's not used again. The VM highwater
mark doesn't have a primary effect on performance.
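
A crude way to watch this from the outside (a Linux-only sketch; the
rss() helper and the sizes are invented for illustration):

import os

def rss():
    # Ask ps for this process's resident set size, in Kbytes.
    return os.popen('ps -o rss= -p %d' % os.getpid()).read().strip()

print 'baseline    :', rss()
floats = [float(i) for i in xrange(2000000)]
print 'floats alive:', rss()
del floats                    # the float objects move to the float
print 'floats freed:', rss()  # free list; RSS barely drops
strs = [str(i) for i in xrange(2000000)]
print 'strs alive  :', rss()  # this memory can't come from the float free list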
Tim Peters
2003-01-26 17:05:01 UTC
Permalink
[Bengt Richter]
Post by Bengt Richter
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations
without type pointer, and just reserved a header slot in front of the
array, ...
[John Machin]
Post by Bengt Richter
You'll have to pardon me if my brain appears fried (possible; 40
degrees (Celsius) here yesterday) but I don't understand Bengt's
question, nor Tim's answer.
Shouldn't the answer be: (a) You can't do that with the Python "list"
container, which can contain objects of *any* type (b) You can do that
with a different container, if you restrict all the objects in the
container to being of the one type (c) That has been done, for many
types; see the "array" module.
Bengt is describing a space-saving trick that relies on address calculation
to deduce the type of an object. Suppose instead of having a pointer to a
type object at the start of every Python object, we had no per-object type
pointers at all. Instead all of memory is carved into (say) 128KB chunks,
each aligned (this part is crucial) at an address evenly divisible by 128K.
Only the first 4 bytes of a chunk contain a type pointer. Each chunk can
hold objects of only a single type, and all objects in a chunk share the
4-byte type pointer at the start of the chunk. To get from an address p to
its type, then, you mask off the low bits to get to the closest-preceding
128KB boundary, and read up the 4 bytes at that address to get a pointer to
the type object.

Python doesn't do this, and it can't be done sticking to standard C: the
address calculations required rely on architecture assumptions that aren't
always true (for example, word-addressed machines screw it up a little, and
tagged architectures using low-order address bits for metadata screw it
royally). Python's small-object allocator nevertheless does "something like
it", carving large chunks of storage into aligned "pools", and storing
common pool info at pool-aligned addresses. The pool info for an object is
later gotten via address calculation.
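
The address arithmetic itself is easy to picture (a toy model of the
trick, not CPython's actual layout; the chunk size and address are
invented for the example):

CHUNK = 128 * 1024               # chunks assumed aligned on 128K boundaries

def chunk_base(addr):
    # Clear the low 17 bits: every object inside a chunk maps back to
    # the same chunk header, where the shared type pointer would live.
    return addr & ~(CHUNK - 1)

addr = 0x20018a3c                # hypothetical object address
print hex(chunk_base(addr))      # -> 0x20000000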

If Python is ported to a box where that fails, the Python small-object
allocator can be disabled via config fiddling. It's also the case that,
even when it's enabled, Python doesn't insist that anyone (meaning 3rd-party
extension modules) use Python's small-object allocator. If they want to
allocate extension objects with the platform malloc, or their own malloc, or
whatever, Python doesn't care. Forcing everyone to use Python's allocator
wouldn't fly, and if getting a pointer to a type object *required* address
calculation, Python would have to insist that everyone use Python's memory
layer.

All in all, it's a lot easier if people just buy more RAM <0.9 wink>.