Discussion:
memory leak with large list??
Tim Peters
2003-01-30 03:00:43 UTC
Permalink
[Stanley]
...
1. Since several people suggested it, I've been looking at Numeric
python, which looks really great. The question is, are there any
disadvantages to using Numeric over using the array module. It seems
that array is just a poor cousin of Numeric. So I assume that either
the array module has some advantage (speed or memory wise) over
Numeric, or it exists simply because Numeric is not a part of the base
python distribution. So is there any reason not to use Numeric??
The array module existed long before Numeric did, and I doubt their
audiences even overlap. "array" is just a storage-efficiency gimmick, while
Numeric aims at fast numeric computation. Computation with array.array
objects is usually *slower* than using Python lists, since the raw bytes in
an array.array have to be wrapped in a Python object every time they're
sucked out, and the wrapper stripped again whenever they're stored.
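
A quick sketch of the effect (illustrative only -- the helper and sizes
here are invented for the example, and exact timings vary by platform):

import array
import time

lst = [float(i) for i in xrange(1000000)]
arr = array.array('d', lst)      # same values, stored as raw 8-byte doubles

def total(seq):
    # Indexing an array.array builds a brand-new float object for each
    # element; a list just hands back a reference it already holds.
    t = 0.0
    for x in seq:
        t = t + x
    return t

start = time.clock(); total(lst); print 'list :', time.clock() - start
start = time.clock(); total(arr); print 'array:', time.clock() - start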

The only disadvantage to using Numeric would be if you wanted to distribute
your app to people who balk at installing Numeric. It may also cut
back on the number of people willing to help you develop your app, if it's
Open Source, simply because many Python users don't know anything about
Numeric.
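
For a sense of the style difference (a hypothetical sketch; assumes
Numeric is installed and uses only its basic array API):

import Numeric

# One contiguous block of 12e6 raw 8-byte doubles -- no per-element
# Python float objects at all.
big = Numeric.zeros(12000000, 'd')
big = big + 1.0                  # one C-level loop over the whole array
print big[:5]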
2. Is there any nice way to profile memory in python?
No. Python isn't the operating system, and only the OS knows "for real" how
the OS passes out memory. Python is several layers removed from those
truths.
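The closest you can get from inside the process is to ask the OS
yourself. A crude Linux-only helper (a sketch that assumes a /proc
layout like the OP's 2.4 kernel provides):

import os

def vm_rss_kb():
    # Scrape this process's resident set size, in Kbytes, out of /proc.
    for line in open('/proc/%d/status' % os.getpid()):
        if line.startswith('VmRSS'):
            return int(line.split()[1])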
Stanley
2003-01-29 17:12:25 UTC
Permalink
Thanks for all the good pointers. I did try using the array module,
initializing with the 'd' type, and based on my 'ps' measurements, it
was taking up just about twice as much memory as a list with the same
number of elements, which made me think that the array module wasn't
what I understood it to be. In retrospect, I realize that I was
comparing the size of a list of 12e6 pointers to 0.0 to an array of 8
byte floats, which I guess is like comparing apples to apple seeds.
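(Working that out, assuming 4-byte pointers: the preallocated list is
12e6 pointers * 4 bytes = ~48 MB, all aimed at one shared 0.0 object,
while array('d') stores 12e6 * 8 bytes = ~96 MB of actual doubles --
hence the factor of two.)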


This brings up two questions for me:

1. Since several people suggested it, I've been looking at Numeric
python, which looks really great. The question is, are there any
disadvantages to using Numeric over using the array module. It seems
that array is just a poor cousin of Numeric. So I assume that either
the array module has some advantage (speed or memory wise) over
Numeric, or it exists simply because Numeric is not a part of the base
python distribution. So is there any reason not to use Numeric??

2. Is there any nice way to profile memory in python? Using ps seems
like a very crude tool, but the gc module only seems to help with
leaks, and the profiler doesn't seem to deal with memory.


I do understand that it's easiest to buy more memory, but since I'm
eventually going to want several data structures with 12e6 floats each
(and possibly a larger number), it seems to make sense to economize a
bit before I run out of addressable memory.
Stanley
2003-01-24 22:35:55 UTC
Permalink
I should start out by saying that I am using the linux command 'ps' to
check memory usage, and I am not entirely sure of the implications of
this.
import os

# Report this process's memory statistics via ps.
command_text = 'ps -o pmem,rss,rssize,sz,vsz ' + str(os.getpid())
os.system(command_text)
First I check the memory footprint of the process.
%MEM RSS RSS SZ VSZ
0.3 1824 1824 734 2936

Then I generate a large list (12 million floats), either using map, a
list comprehension, or preallocating a list, then filling it using a
for loop.
big_list = [float(index) for index in xrange(12000000)]
or
big_list = map(float, xrange(12000000))
or
big_list = [0.0] * 12000000
for index in xrange(12000000):
    big_list[index] = float(index)


When I check the memory footprint again, it is larger than I would
expect, but I'm not sure what the per element overhead of a list is.

%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760

So then delete it.
del(big_list)
But the memory footprint of the process remains very large:

%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884


Is Python leaking memory, or is there something else going on here?

If instead I just do
big_list = [0.0] * 12000000
and then delete it, I don't see the leak, but then the memory footprint is
smaller than you would expect from 12 million floats:

%MEM RSS RSS SZ VSZ
9.4 48700 48700 12453 49812

So I think python is using a shortcut in this case.


I am using Python 2.2.1 on Debian 3.0, kernel 2.4.17-686.

Thanks for the help!
Skip Montanaro
2003-01-24 23:33:28 UTC
Permalink
Post by Stanley
%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760
So then delete it.
del(big_list)
%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884
Is Python leaking memory, or is there something else going on here?
No, this is your platform's malloc() library not returning memory to the
operating system.
Post by Stanley
big_list = [0.0] * 12000000
and then delete it, I don't see the leak, but then memory footprint is
smaller than you would expect
%MEM RSS RSS SZ VSZ
9.4 48700 48700 12453 49812
So I think python is using a shortcut in this case.
Yes. A float object for 0.0 is allocated once, then referred to from all
those list slots:

>>> list1 = [0.0] * 5
>>> list2 = [float(0) for f in xrange(5)]
>>> list1
[0.0, 0.0, 0.0, 0.0, 0.0]
>>> list2
[0.0, 0.0, 0.0, 0.0, 0.0]
>>> list1 == list2
True
>>> map(id, list1)
[6355924, 6355924, 6355924, 6355924, 6355924]
>>> map(id, list2)
[6355860, 6355908, 6355892, 6355876, 6355844]

Skip
Terry Reedy
2003-01-25 00:10:25 UTC
Permalink
Post by Stanley
Then I generate a large list (12 million floats), either using map, a
list comprehension, or preallocating a list, then filling it using a
for loop.
12 million different floats use 12 million * sizeof(float object)
bytes. A float object is at least 8 bytes for the float value plus
object overhead (about 4 bytes?). The list has an array of 12 million
pointers (4, or possibly 8, bytes each). Thus at least 16 * 12 million,
roughly 200 million bytes, regardless of method.

...
Post by Stanley
When I check the memory footprint again, it is larger than I would
expect, but I'm not sure what the per element overhead of a list is.
%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760
If 237648 is a count of Kbytes, it is roughly what one should expect.
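(Working it through, with 4-byte pointers assumed: 12e6 floats * 16
bytes = 192 million bytes for the float objects, plus 12e6 * 4 bytes =
48 million bytes for the list's pointer array, about 240 million bytes
in all. 237648 Kbytes is about 243 million bytes, so the estimate fits.)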
Post by Stanley
So then delete it.
del(big_list)
%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884
Deleting frees mem for reuse by Python but does not necessarily return
memory to system for reuse by other processes. (Exact behavior is
*very* system specific, according to Tim Peters' war stories.)
Post by Stanley
Is Python leaking memory, or is there something else going on here?
Pretty sure no, yes.
Post by Stanley
big_list = [0.0] * 12000000
(I presume you mean, by preallocate, that you subsequently overwrote
with floats as before.) This is 12 million duplicate pointers to
*one* object. The 48 megabytes of memory for the pointer array are
allocated just once. The other methods start with a small list (perhaps
with room for 8 pointers) and reallocate and copy as the list grows.
This can chop up the heap. Preallocation is much better for large arrays.
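
A sketch of the difference (illustrative only; actual timings depend on
the platform malloc):

import time

N = 1000000

start = time.clock()
grown = []
for i in xrange(N):
    grown.append(0.0)        # list must repeatedly reallocate and copy
print 'grown   :', time.clock() - start

start = time.clock()
prealloc = [0.0] * N         # pointer array allocated once, up front
for i in xrange(N):
    prealloc[i] = 0.0
print 'prealloc:', time.clock() - start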
Post by Stanley
and then delete it, I don't see the leak, but then memory footprint is
smaller than you would expect
%MEM RSS RSS SZ VSZ
9.4 48700 48700 12453 49812
So I think python is using a shortcut in this case.
I am using Python 2.2.1 on Debian 3.0, kernel 2.4.17-686.
Maybe someone else is familiar with this particular system.

Terry J. Reedy

Bengt Richter
2003-01-26 00:29:22 UTC
Permalink
[someone]
Post by Stanley
Then I generate a large list (12 million floats), either using map,
a list comprehension, or preallocating a list, then filling it using a
for loop.
[Terry Reedy]
12 million different floats use 12 million * sizeof float object
bytes. Float object is at least 8 bytes for float + object overhead
(about 4 bytes?)
[Tim Peters]
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations without
type pointer, and just reserved a header slot in front of the array, and
allocated the whole so array headers are aligned on, say, 128k addresses,
then you could look up the same type info with just an address mask and gain
4 bytes per float. Ditto for any homogeneous allocation arenas. IWT you
could allocate big blocks from a memory mapped file area if there's no other
way to get aligned large blocks. Or you could allocate really large virtual
space and sacrifice space at the front if necessary to start arena allocations
aligned as desired. Or if some platform is stubborn, virtualize id as a
composite of arena-selector and offset fields, and use zero offset to get
to the header and the type info. Anyway, you know what I'm saying.

Of course you could have a space (it wouldn't be an array per se) with
current-style objects with individual type slots too, just by making the
arena header type info say "see object-specific type slot". It's just
that freeing/allocating space would be more complicated.

I see from below[1] that allocation is already from a pool of type-specific
free lists for float, so IWT those lists could be linked within the
arrays just mentioned (you just can't use the type slot, because there wouldn't
be any individual ones, so maybe use the value slot instead).

[...]
Deleting frees mem for reuse by Python but does not necessarily return
memory to system for reuse by other processes. (Exact behavior is
*very* system specific, according to Tim Peters' war stories.)
It's worse in this case: int and float objects come out of special internal
type-specific "free lists", and there's no bound on how large those can get.

static void
float_dealloc(PyFloatObject *op)
{
    if (PyFloat_CheckExact(op)) {
        op->ob_type = (struct _typeobject *)free_list;
        free_list = op;
    }
    else
        op->ob_type->tp_free((PyObject *)op);
}
IOW, memory once allocated for a float can never be reused for any other
kind of object, and isn't returned to the platform C library until Python
shuts down. The list memory did get returned to the platform C library, and
in this case it looks like the latter did return that chunk to the OS. If
the OP had allocated some other "large object" after allocating the list,
chances are good that the platform C would not have returned the list memory
to the OS. Even then, it doesn't much matter -- the OS will simply page out
that part of the address space if it's not used again. The VM highwater
mark doesn't have a primary effect on performance.
You'd have to change the above to use the ref count slot or the first 4 bytes
of the value slot instead of op->ob_type to link the free list though,
since ob_type would be coming from, e.g., OB_TYPE(op) with the appropriate
address masking etc in an OB_TYPE macro.

I wonder what the performance hit would be, given processor speeds and caching etc.
Or has this been tried and rejected for Python?

Regards,
Bengt Richter
Bengt Richter
2003-01-26 21:11:40 UTC
Permalink
Post by Bengt Richter
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes
at least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations
without type pointer, and just reserved a header slot in front of the
array, [snip]
Post by Terry Reedy
Because in Python, each object is a separate object, and because lists
are heterogeneous. Anyone who wants to work with 12 mil floats should
almost certainly be using Numerical Python, which does have homogeneous
arrays of native floats (8 bytes per) along with C functions to do
most common 'base' computations.
I agree about using Numerical Python and an array of native floats for
12 mil floats, as you describe, but I don't think the fact that lists
"are heterogenous" is a real reason to need per-object type pointers
for all types of objects.

A list doesn't care where the objects it refers to are stored, so e.g.,
floats could be put in floats-only areas, and others could be put in
mixed-areas, and you'd figure type by figuring out from the address first
what kind of area the thing is in, and then if it's a float-only area,
you know right then you've got a float, and if its mixed, you go to the
object representation for type info.

Small lists and tuples could themselves be allocated in specialized areas
with some space savings, since they are homogeneous in themselves -- i.e.,
they don't actually have objects embedded, just references to objects.

But Tim is right, all in all it's usually easier to buy memory. And a simple
consistent way of doing things is worth a _lot_, which today's memory prices
allow. (I just got my space-saving reflexes when 4k of 300ns memory cost
more than all the PC's I've owned put together ;-)

Regards,
Bengt Richter
John Machin
2003-01-26 07:39:27 UTC
Permalink
Post by Bengt Richter
[someone]
Post by Stanley
Then I generate a large list (12 million floats), either using map,
a list comprehension, or preallocating a list, then filling it using a
for loop.
[Terry Reedy]
12 million different floats use 12 million * sizeof float object
bytes. Float object is at least 8 bytes for float + object overhead
(about 4 bytes?)
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations without
type pointer, and just reserved a header slot in front of the array, [snip]
You'll have to pardon me if my brain appears fried (possible; 40
degrees (Celsius) here yesterday) but I don't understand Bengt's
question, nor Tim's answer.

Shouldn't the answer be: (a) You can't do that with the Python "list"
container, which can contain objects of *any* type (b) You can do that
with a different container, if you restrict all the objects in the
container to being of the one type (c) That has been done, for many
types; see the "array" module.
Terry Reedy
2003-01-26 07:39:43 UTC
Permalink
Post by Bengt Richter
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations
without type pointer, and just reserved a header slot in front of the
array, [snip]
Because in Python, each object is a separate object, and because lists
are heterogeneous. Anyone who wants to work with 12 mil floats should
almost certainly be using Numerical Python, which does have homogeneous
arrays of native floats (8 bytes per) along with C functions to do
most common 'base' computations.

Terry J. Reedy
Tim Peters
2003-01-26 02:14:59 UTC
Permalink
[Bengt Richter]
Post by Bengt Richter
Why do you need to duplicate the type pointer for all those?
That's the way it is, and countless lines of code-- both in the core and in
extension modules --rely on being able to get at the type object from a
PyObject* p via p->ob_type.
Post by Bengt Richter
I.e., if you allocated space in decent-size arrays of object
representations without type pointer, and just reserved a header
slot in front of the array, ... [etc] ...
This can't be changed now.
Post by Bengt Richter
...
I wonder what the performance hit would be, given processor
speeds and caching etc.
Or has this been tried and rejected for Python?
Tried no, rejected yes <wink>.
Tim Peters
2003-01-25 12:41:42 UTC
Permalink
[someone]
Post by Stanley
Then I generate a large list (12 million floats), either using map,
a list comprehension, or preallocating a list, then filling it using a
for loop.
[Terry Reedy]
12 million different floats use 12 million * sizeof float object
bytes. Float object is at least 8 bytes for float + object overhead
(about 4 bytes?)
Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead. So a float object consumes at
least 16 bytes.
The list has an array of 12 million pointers (4, or possibly 8, bytes
each). Thus at least 16 * 12 million, roughly 200 million bytes,
regardless of method.
At least 192 million for the float objects, and at least 48 million for the
list.
Post by Stanley
When I check the memory footprint again, it is larger than I would
expect, but I'm not sure what the per element overhead of a list is.
%MEM RSS RSS SZ VSZ
46.2 237648 237648 59690 238760
If 237648 is a count of Kbytes, it is roughly what one should expect.
Yup.
Post by Stanley
So then delete it.
del(big_list)
%MEM RSS RSS SZ VSZ
37.1 190772 190772 47971 191884
Deleting frees mem for reuse by Python but does not necessarily return
memory to system for reuse by other processes. (Exact behavior is
*very* system specific, according to Tim Peters' war stories.)
It's worse in this case: int and float objects come out of special internal
type-specific "free lists", and there's no bound on how large those can get.
Here's the deallocator for floats:

static void
float_dealloc(PyFloatObject *op)
{
    if (PyFloat_CheckExact(op)) {
        op->ob_type = (struct _typeobject *)free_list;
        free_list = op;
    }
    else
        op->ob_type->tp_free((PyObject *)op);
}

IOW, memory once allocated for a float can never be reused for any other
kind of object, and isn't returned to the platform C library until Python
shuts down. The list memory did get returned to the platform C library, and
in this case it looks like the latter did return that chunk to the OS. If
the OP had allocated some other "large object" after allocating the list,
chances are good that the platform C would not have returned the list memory
to the OS. Even then, it doesn't much matter -- the OS will simply page out
that part of the address space if it's not used again. The VM highwater
mark doesn't have a primary effect on performance.
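
A crude way to watch this from the outside (a Linux-only sketch; the
rss() helper and the sizes are invented for illustration):

import os

def rss():
    # Ask ps for this process's resident set size, in Kbytes.
    return os.popen('ps -o rss= -p %d' % os.getpid()).read().strip()

print 'baseline    :', rss()
floats = [float(i) for i in xrange(2000000)]
print 'floats alive:', rss()
del floats                    # the float objects move to the float
print 'floats freed:', rss()  # free list; RSS barely drops
strs = [str(i) for i in xrange(2000000)]
print 'strs alive  :', rss()  # this memory can't come from the float free list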
Tim Peters
2003-01-26 17:05:01 UTC
Permalink
[Bengt Richter]
Post by Bengt Richter
Why do you need to duplicate the type pointer for all those? I.e., if
you allocated space in decent-size arrays of object representations
without type pointer, and just reserved a header slot in front of the
array, ...
[John Machin]
Post by Bengt Richter
You'll have to pardon me if my brain appears fried (possible; 40
degrees (Celsius) here yesterday) but I don't understand Bengt's
question, nor Tim's answer.
Shouldn't the answer be: (a) You can't do that with the Python "list"
container, which can contain objects of *any* type (b) You can do that
with a different container, if you restrict all the objects in the
container to being of the one type (c) That has been done, for many
types; see the "array" module.
Bengt is describing a space-saving trick that relies on address calculation
to deduce the type of an object. Suppose instead of having a pointer to a
type object at the start of every Python object, we had no per-object type
pointers at all. Instead all of memory is carved into (say) 128KB chunks,
each aligned (this part is crucial) at an address evenly divisible by 128K.
Only the first 4 bytes of a chunk contain a type pointer. Each chunk can
hold objects of only a single type, and all objects in a chunk share the
4-byte type pointer at the start of the chunk. To get from an address p to
its type, then, you mask off the low bits to get to the closest-preceding
128KB boundary, and read up the 4 bytes at that address to get a pointer to
the type object.

Python doesn't do this, and it can't be done sticking to standard C: the
address calculations required rely on architecture assumptions that aren't
always true (for example, word-addressed machines screw it up a little, and
tagged architectures using low-order address bits for metadata screw it
royally). Python's small-object allocator nevertheless does "something like
it", carving large chunks of storage into aligned "pools", and storing
common pool info at pool-aligned addresses. The pool info for an object is
later gotten via address calculation.
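
The address arithmetic itself is easy to picture (a toy model of the
trick, not CPython's actual layout; the chunk size and address are
invented for the example):

CHUNK = 128 * 1024               # chunks assumed aligned on 128K boundaries

def chunk_base(addr):
    # Clear the low 17 bits: every object inside a chunk maps back to
    # the same chunk header, where the shared type pointer would live.
    return addr & ~(CHUNK - 1)

addr = 0x20018a3c                # hypothetical object address
print hex(chunk_base(addr))      # -> 0x20000000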

If Python is ported to a box where that fails, the Python small-object
allocator can be disabled via config fiddling. It's also the case that,
even when it's enabled, Python doesn't insist that anyone (meaning 3rd-party
extension modules) use Python's small-object allocator. If they want to
allocate extension objects with the platform malloc, or their own malloc, or
whatever, Python doesn't care. Forcing everyone to use Python's allocator
wouldn't fly, and if getting a pointer to a type object *required* address
calculation, Python would have to insist that everyone use Python's memory
layer.

All in all, it's a lot easier if people just buy more RAM <0.9 wink>.