Discussion:
Multiprocessing, shared memory vs. pickled copies
John Ladasky
2011-04-04 20:20:16 UTC
Permalink
Hi folks,

I'm developing some custom neural network code. I'm using Python 2.6,
Numpy 1.5, and Ubuntu Linux 10.10. I have an AMD 1090T six-core CPU,
and I want to take full advantage of it. I love to hear my CPU fan
running, and watch my results come back faster.

When I'm training a neural network, I pass two numpy.ndarray objects
to a function called evaluate. One array contains the weights for the
neural network, and the other array contains the input data. The
evaluate function returns an array of output data.

I have been playing with multiprocessing for a while now, and I have
some familiarity with Pool. Apparently, arguments passed to a Pool
subprocess must be able to be pickled. Pickling is still a pretty
vague process to me, but I can see that you have to write custom
__reduce__ and __setstate__ methods for your objects. An example of
code which creates a pickle-friendly ndarray subclass is here:

http://www.mail-archive.com/numpy-discussion at scipy.org/msg02446.html

Now, I don't know that I actually HAVE to pass my neural network and
input data as copies -- they're both READ-ONLY objects for the
duration of an evaluate function (which can go on for quite a while).
So, I have also started to investigate shared-memory approaches. I
don't know how a shared-memory object is referenced by a subprocess
yet, but presumably you pass a reference to the object, rather than
the whole object. Also, it appears that subprocesses also acquire a
temporary lock over a shared memory object, and thus one process may
well spend time waiting for another (individual CPU caches may
sidestep this problem?) Anyway, an implementation of a shared-memory
ndarray is here:

https://bitbucket.org/cleemesser/numpy-sharedmem/src/3fa526d11578/shmarray.py

I've added a few lines to this code which allows subclassing the
shared memory array, which I need (because my neural net objects are
more than just the array, they also contain meta-data). But I've run
into some trouble doing the actual sharing part. The shmarray class
CANNOT be pickled. I think that my understanding of multiprocessing
needs to evolve beyond the use of Pool, but I'm not sure yet. This
post suggests as much.

http://mail.scipy.org/pipermail/scipy-user/2009-February/019696.html

I don't believe that my questions are specific to numpy, which is why
I'm posting here, in a more general Python forum.

When should one pickle and copy? When to implement an object in
shared memory? Why is pickling apparently such a non-trivial process
anyway? And, given that multi-core CPU's are apparently here to stay,
should it be so difficult to make use of them?
Dan Stromberg
2011-04-04 22:10:51 UTC
Permalink
Post by John Ladasky
When should one pickle and copy? When to implement an object in
shared memory? Why is pickling apparently such a non-trivial process
anyway? And, given that multi-core CPU's are apparently here to stay,
should it be so difficult to make use of them?
Pickle and copy when your distinct processes will benefit from having
multiple distinct copies of the data - that is, copies that can diverge
from each other if written to in one or more of the processes.

Use shared memory if your distinct processes need to be able to see a single
copy of the same data - EG, because one needs to see the result of the
changes made in another.

Pickling's not a big deal - it's just turning structured data (from an int
to a nested series of objects) into a stream of bytes. Not all extension
types support it without extra work though.
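For example, a round trip is just two calls (a minimal illustration, nothing
numpy-specific about it):

import pickle

state = {"weights": [0.1, 0.2, 0.3], "epoch": 7}
blob = pickle.dumps(state)       # structured data -> a byte string
print(repr(pickle.loads(blob)))  # ... and back to an equal object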

Yes, multicore is increasingly important, and Python needs to support it
well. multiprocessing is one good way of doing so. Also, Python 3.2 has a
new module facilitating this, and an improved GIL situation. Then there
are things like greenlets and stackless python. Oh, and if you move to
jython or ironpython, you get better multithreading.

Multithreading is generally a bit lighter-weight than multiprocessing, but
it also gives a little tighter coupling between different parts of the
program and a greater probability of race conditions.
Philip Semanchuk
2011-04-04 23:34:29 UTC
Permalink
Post by John Ladasky
I have been playing with multiprocessing for a while now, and I have
some familiarity with Pool. Apparently, arguments passed to a Pool
subprocess must be able to be pickled.
Hi John,
multiprocessing's use of pickle is not limited to Pool. For instance, objects put into a multiprocessing.Queue are also pickled, as are the args to a multiprocessing.Process. So if you're going to use multiprocessing, you're going to use pickle, and you need pickleable objects.
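A tiny sketch of both paths at once (the array is pickled as a Process argument,
and the result is pickled again on its way back through the Queue):

import multiprocessing as mp
import numpy as np

def worker(arr, q):
    # 'arr' arrived here via pickle: Process args are pickled, just like
    # anything put on a Queue.
    q.put(arr.sum())

if __name__ == '__main__':
    q = mp.Queue()
    data = np.arange(10.0)
    p = mp.Process(target=worker, args=(data, q))
    p.start()
    print(q.get())   # 45.0, pickled back through the Queue
    p.join()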
Post by John Ladasky
Pickling is still a pretty
vague process to me, but I can see that you have to write custom
__reduce__ and __setstate__ methods for your objects.
Well, that's only if one's objects don't support pickle by default. A lot of classes do so without any need for custom __reduce__ and __setstate__ methods. Since you're apparently not too familiar with pickle, I don't want you to get the false impression that it's a lot of trouble. I've used pickle a number of times and never had to write custom methods for it.
Post by John Ladasky
Now, I don't know that I actually HAVE to pass my neural network and
input data as copies -- they're both READ-ONLY objects for the
duration of an evaluate function (which can go on for quite a while).
So, I have also started to investigate shared-memory approaches. I
don't know how a shared-memory object is referenced by a subprocess
yet, but presumably you pass a reference to the object, rather than
the whole object. Also, it appears that subprocesses also acquire a
temporary lock over a shared memory object, and thus one process may
well spend time waiting for another (individual CPU caches may
sidestep this problem?) Anyway, an implementation of a shared-memory
There's no standard shared memory implementation for Python. The mmap module is as close as you get. I wrote & support the posix_ipc and sysv_ipc modules which give you IPC primitives (shared memory and semaphores) in Python. They work well (IMHO) but they're *nix-only and much lower level than multiprocessing. If multiprocessing is like a kitchen well stocked with appliances, posix_ipc (and sysv_ipc) is like a box of sharp knives.

Note that mmap and my IPC modules don't expose Python objects. They expose raw bytes in memory. You're still going to have to jump through some hoops (...like pickle) to turn your Python objects into a bytestream and vice versa.
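For instance, here's a rough sketch of what "raw bytes" means in practice -- an
anonymous mmap viewed through numpy. No pickling, but also no Python objects
living in the buffer:

import mmap
import numpy as np

n = 10
buf = mmap.mmap(-1, n * 8)                  # anonymous mapping, 8 bytes per float64
arr = np.frombuffer(buf, dtype=np.float64)  # a numpy view onto the raw bytes
arr[:] = np.arange(n)                       # writes land directly in the mapping
print(arr)

On *nix, a fork()ed child inherits that mapping, which is how the sharing
actually happens.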


What might be easier than fooling around with boxes of sharp knives is to convert your ndarray objects to Python lists. Lists are pickle-friendly and easy to turn back into ndarray objects once they've crossed the pickle boundary.
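Something like (a quick sketch):

import pickle
import numpy as np

arr = np.arange(6.0).reshape(2, 3)
blob = pickle.dumps(arr.tolist())         # nested Python lists pickle trivially
restored = np.array(pickle.loads(blob))   # back to an ndarray on the other side
assert (restored == arr).all()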
Post by John Ladasky
When should one pickle and copy? When to implement an object in
shared memory? Why is pickling apparently such a non-trivial process
anyway? And, given that multi-core CPU's are apparently here to stay,
should it be so difficult to make use of them?
My answers to these questions:

1) Depends
2) In Python, almost never unless you're using a nice wrapper like shmarray.py
3) I don't think it's non-trivial =)
4) No, definitely not. Python will only get better at working with multiple cores/CPUs, but there's plenty of room for improvement on the status quo.

Hope this helps
Philip
Dan Stromberg
2011-04-05 01:03:21 UTC
Permalink
Post by Philip Semanchuk
So if you're going to use multiprocessing, you're going to use pickle, and
you need pickleable objects.
http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes
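The example in that section boils down to something like this (lightly adapted
from the docs):

from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)        # shared double
    arr = Array('i', range(10))  # shared int array
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)             # the parent sees the child's writes
    print(arr[:])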
Philip Semanchuk
2011-04-05 01:16:28 UTC
Permalink
Post by Dan Stromberg
Post by Philip Semanchuk
So if you're going to use multiprocessing, you're going to use pickle, and
you need pickleable objects.
http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes
Thank you, Dan. My reading comprehension skills need work.

Cheers
Philip
Robert Kern
2011-04-05 00:05:35 UTC
Permalink
Post by John Ladasky
Hi folks,
I'm developing some custom neural network code. I'm using Python 2.6,
Numpy 1.5, and Ubuntu Linux 10.10. I have an AMD 1090T six-core CPU,
and I want to take full advantage of it. I love to hear my CPU fan
running, and watch my results come back faster.
You will want to ask numpy questions on the numpy mailing list.

http://www.scipy.org/Mailing_Lists
Post by John Ladasky
When I'm training a neural network, I pass two numpy.ndarray objects
to a function called evaluate. One array contains the weights for the
neural network, and the other array contains the input data. The
evaluate function returns an array of output data.
I have been playing with multiprocessing for a while now, and I have
some familiarity with Pool. Apparently, arguments passed to a Pool
subprocess must be able to be pickled. Pickling is still a pretty
vague process to me, but I can see that you have to write custom
__reduce__ and __setstate__ methods for your objects. An example of
http://www.mail-archive.com/numpy-discussion at scipy.org/msg02446.html
Note that numpy arrays are already pickle-friendly. This message is telling you
how, *if* you are already subclassing, to make your subclass pickle the
extra information it holds.
Post by John Ladasky
Now, I don't know that I actually HAVE to pass my neural network and
input data as copies -- they're both READ-ONLY objects for the
duration of an evaluate function (which can go on for quite a while).
So, I have also started to investigate shared-memory approaches. I
don't know how a shared-memory object is referenced by a subprocess
yet, but presumably you pass a reference to the object, rather than
the whole object. Also, it appears that subprocesses also acquire a
temporary lock over a shared memory object, and thus one process may
well spend time waiting for another (individual CPU caches may
sidestep this problem?) Anyway, an implementation of a shared-memory
https://bitbucket.org/cleemesser/numpy-sharedmem/src/3fa526d11578/shmarray.py
I've added a few lines to this code which allows subclassing the
shared memory array, which I need (because my neural net objects are
more than just the array, they also contain meta-data).
Honestly, you should avoid subclassing ndarray just to add metadata. It never
works well. Make a plain class, and keep the arrays as attributes.
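Something like this, roughly (class and attribute names here are just
placeholders):

import numpy as np

class NeuralNet(object):
    """Plain container: the weights live in an ndarray attribute, the
    metadata in ordinary attributes.  Instances pickle with no extra work."""
    def __init__(self, weights, learning_rate=0.1, name="net"):
        self.weights = np.asarray(weights, dtype=np.float64)
        self.learning_rate = learning_rate
        self.name = name

    def evaluate(self, inputs):
        # trivial placeholder for the real evaluation
        return np.dot(inputs, self.weights)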
Post by John Ladasky
But I've run
into some trouble doing the actual sharing part. The shmarray class
CANNOT be pickled.
Please never just *say* that something doesn't work. Show us what you tried, and
show us exactly what output you got. I assume you tried something like this:

[Downloads]$ cat runmp.py
from multiprocessing import Pool
import shmarray


def f(z):
    return z.sum()

y = shmarray.zeros(10)
z = shmarray.ones(10)

p = Pool(2)
print p.map(f, [y, z])


And got output like this:

[Downloads]$ python runmp.py
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/threading.py", line 483, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/multiprocessing/pool.py", line 287, in _handle_tasks
    put(task)
PicklingError: Can't pickle <class 'multiprocessing.sharedctypes.c_double_Array_10'>: attribute lookup multiprocessing.sharedctypes.c_double_Array_10 failed


Now, the sharedctypes module is supposed to implement shared arrays. Underneath, they
have some dynamically created types like this c_double_Array_10 type.
multiprocessing has a custom pickler which has a registry of reduction functions
for types that do not implement a __reduce_ex__() method. For these dynamically
created types that cannot be imported from a module, this dynamic registry is
the only way to do it. At least at one point, the Connection objects which
communicate between processes would use this custom pickler to serialize objects
to bytes to transmit them.

However, at least in Python 2.7, multiprocessing seems to have a C extension
module defining the Connection objects. Unfortunately, it looks like this C
extension just imports the regular pickler that is not aware of these custom
types. That's why you get this error. I believe this is a bug in Python.

So what did you try, and what output did you get? What version of Python are you
using?
Post by John Ladasky
I think that my understanding of multiprocessing
needs to evolve beyond the use of Pool, but I'm not sure yet. This
post suggests as much.
http://mail.scipy.org/pipermail/scipy-user/2009-February/019696.html
Maybe. If the __reduce_ex__() method is implemented properly (and
multiprocessing bugs aren't getting in the way), you ought to be able to pass
them to a Pool just fine. You just need to make sure that the shared arrays are
allocated before the Pool is started. And this only works on UNIX machines. The
shared memory objects that shmarray uses can only be inherited. I believe that's
what Sturla was getting at.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Robert Kern
2011-04-05 15:47:17 UTC
Permalink
Post by Robert Kern
However, at least in Python 2.7, multiprocessing seems to have a C extension
module defining the Connection objects. Unfortunately, it looks like this C
extension just imports the regular pickler that is not aware of these custom
types. That's why you get this error. I believe this is a bug in Python.
So what did you try, and what output did you get? What version of Python are you
using?
Post by John Ladasky
I think that my understanding of multiprocessing
needs to evolve beyond the use of Pool, but I'm not sure yet. This
post suggests as much.
http://mail.scipy.org/pipermail/scipy-user/2009-February/019696.html
Maybe. If the __reduce_ex__() method is implemented properly (and
multiprocessing bugs aren't getting in the way), you ought to be able to pass
them to a Pool just fine. You just need to make sure that the shared arrays are
allocated before the Pool is started. And this only works on UNIX machines. The
shared memory objects that shmarray uses can only be inherited. I believe that's
what Sturla was getting at.
I'll take that back a little bit. Since the underlying shared memory types can
only be shared by inheritance, the ForkingPickler is only used for the arguments
passed to Process(), and not used for things sent through Queues (which Pool
uses). Since Pool cannot guarantee that the data exists before the Pool starts
its subprocesses, it must use the general mechanism.

So in short, if you pass the shmarrays as arguments to the target function in
Process(), it should work fine.
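Something along these lines (a rough sketch that assumes the zeros()/ones()
constructors in the shmarray module linked above; allocate the arrays before
starting the child so they are inherited):

from multiprocessing import Process
import shmarray  # the module from the bitbucket link above (API assumed)

def evaluate(weights, inputs):
    # both arguments refer to the inherited shared buffers, not pickled copies
    print((weights + inputs).sum())

if __name__ == '__main__':
    weights = shmarray.zeros(10)  # allocated *before* the Process is started
    inputs = shmarray.ones(10)
    p = Process(target=evaluate, args=(weights, inputs))
    p.start()
    p.join()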
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
John Ladasky
2011-04-05 16:58:05 UTC
Permalink
Hi Philip,

Thanks for the reply.
Post by Philip Semanchuk
So if you're going to use multiprocessing, you're going to use pickle, and you
need pickleable objects.
OK, that's good to know.
Post by Philip Semanchuk
Post by John Ladasky
Pickling is still a pretty
vague process to me, but I can see that you have to write custom
__reduce__ and __setstate__ methods for your objects.
Well, that's only if one's objects don't support pickle by default. A lot of
classes do without any need for custom __reduce__ and __setstate__ methods.
Since you're apparently not too familiar with pickle, I don't want you to get
the false impression that it's a lot of trouble. I've used pickle a number of
times and never had to write custom methods for it.
I used __reduce__ and __setstate__ once before with success, but I
just hacked at code I found on the Net, without fully understanding
it.

All right, since my last post, I've made an attempt to understand a
bit more about pickle. In doing so, I'm getting into the guts of the
Python class model, in a way that I did not expect. I believe that I
need to understand the following:

1) What kinds of objects have a __dict__,

2) Why the default pickling methods do not simply look for a __dict__
and serialize its contents, if it is present, and

3) Why objects can exist that do not support pickle by default, and
cannot be correctly pickled without custom __reduce__ and __setstate__
methods.

I can only furnish a (possibly incomplete) answer to question #1.
>>> class Foo:
...     pass # It doesn't get any simpler
...
>>> x = Foo()
>>> x
<__builtin__.Foo instance at 0x9b916cc>
>>> x.__dict__
{}
>>> x.spam = "eggs"
>>> x.__dict__
{'spam': 'eggs'}
>>> setattr(x, "parrot", "dead")
>>> x
<__builtin__.Bar instance at 0x9b916cc>
>>> dir(x)
['__doc__', '__module__', 'parrot', 'spam']
>>> x.__dict__
{'parrot': 'dead', 'spam': 'eggs'}
>>> x.__dict__ = {"a":"b", "c":"d"}
>>> dir(x)
['__doc__', '__module__', 'a', 'c']
>>> x.__dict__
{'a': 'b', 'c': 'd'}

Aside: this finding makes me wonder what exactly dir() is showing me
about an object (all objects in the namespace, including methods, but
not __dict__ itself?), and why it is distinct from __dict__.

In contrast, it appears that BUILT-IN classes (numbers, lists,
dictionaries, etc.) do not have a __dict__. And that you can't force
a __dict__ into built-in objects, no matter how you may try.
>>> n = 1
>>> dir(n)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
'__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__',
'__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__',
'__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__',
'__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__',
'__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__',
'__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__',
'__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__',
'__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__',
'__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__',
'conjugate', 'denominator', 'imag', 'numerator', 'real']
>>> n.__dict__
File "<console>", line 1, in <module>
''' <type 'exceptions.AttributeError'> : 'int' object has no attribute '__dict__' '''
>>> n.__dict__ = {"a":"b", "c":"d"}
File "<console>", line 1, in <module>
''' <type 'exceptions.AttributeError'> : 'int' object has no attribute '__dict__' '''
>>> setattr(n, "__dict__", {"a":"b", "c":"d"})
File "<console>", line 1, in <module>
''' <type 'exceptions.AttributeError'> : 'int' object has no attribute '__dict__' '''


This leads straight into my second question. I THINK, without knowing
for sure, that most user classes would pickle correctly by simply
iterating through __dict__. So, why isn't this the default behavior
for Python? Was the assumption that programmers would only want to
pickle built-in classes? Can anyone show me an example of a common
object where my suggested approach would fail?
Post by Philip Semanchuk
What might be easier than fooling around with boxes of sharp knives is to convert
your ndarray objects
... with their attendant meta-data, of course...
Post by Philip Semanchuk
to Python lists. Lists are pickle-friendly and easy to turn
back into ndarray objects once they've crossed the pickle boundary.
That's a cute trick. I may try it, but I would rather do it the way
that Python recommends. I always prefer to understand what I'm doing,
and why.
Post by Philip Semanchuk
Python will only get better at working with multiple cores/CPUs, but there's plenty of room for improvement on the status quo.
Thank you for agreeing with me. I'm not formally trained in computer
science, and there was always the chance that my opinion as a non-
professional might come from simply not understanding the issues well
enough. But I chose Python as a programming language (after Applesoft
Basic, 6502 assembler, Pascal, and C -- yeah, I've been at this a
while) because of its ease of entry. And while Python has carried me
far, I don't think that an issue that is as central to modern
computing as using multiple CPU's concurrently is so arcane that we
should declare it black magic -- and therefore, it's OK if Python's
philosophical goal of achieving clarity and simplicity is not met in
this case.
Philip Semanchuk
2011-04-05 17:43:08 UTC
Permalink
Post by John Ladasky
Hi Philip,
Thanks for the reply.
Post by Philip Semanchuk
So if you're going to use multiprocessing, you're going to use pickle, and you
need pickleable objects.
OK, that's good to know.
But as Dan Stromberg pointed out, there are some pickle-free ways to communicate between processes using multiprocessing.
Post by John Ladasky
This leads straight into my second question. I THINK, without knowing
for sure, that most user classes would pickle correctly by simply
iterating through __dict__. So, why isn't this the default behavior
for Python? Was the assumption that programmers would only want to
pickle built-in classes?
>>> class Foo(object):
...     pass
...
>>> import pickle
>>> foo_instance = Foo()
>>> pickle.dumps(foo_instance)
'ccopy_reg\n_reconstructor\np0\n(c__main__\nFoo\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n.'


And as Robert Kern pointed out, numpy arrays are also pickle-able.
>>> import numpy
>>> pickle.dumps(numpy.zeros(3))
"cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'f8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\ntp14\nb."

As a side note, you should always use "new style" classes, particularly since you're exploring the details of Python class construction. "New" is a bit of a misnomer now, as "new" style classes were introduced in Python 2.2. They have been the status quo in Python 2.x for a while now and are the only choice in Python 3.x.

Subclassing object gives you a new style class:
class Foo(object):

Not subclassing object (as you did in your example) gives you an old style class:
class Foo:



Cheers
Philip
John Ladasky
2011-04-07 06:40:53 UTC
Permalink
Hello again, Philip,

I really appreciate you sticking with me. Hopefully this will help
someone else, too. I've done some more reading, and will offer some
minimal code below.

I've known about this page for a while, and it describes some of the
unconventional things one needs to consider when subclassing
numpy.ndarray:

http://www.scipy.org/Subclasses

Now, section 11.1 of the Python documentation says the following
concerning pickling: "Classes can further influence how their
instances are pickled; if the class defines the method __getstate__(),
it is called and the return state is pickled as the contents for the
instance, instead of the contents of the instance's dictionary. If
there is no __getstate__() method, the instance's __dict__ is
pickled."

http://docs.python.org/release/2.6.6/library/pickle.html

That being said, I'm having problems there! I've found this minimal
example of pickling with overridden __getstate__ and __setstate__
methods:

http://stackoverflow.com/questions/1939058/please-give-me-a-simple-example-about-setstate-and-getstate-thanks

I'll put all of the thoughts generated by these links together after
responding to a few things you wrote.
Post by Philip Semanchuk
But as Dan Stromberg pointed out, there are some pickle-free ways to communicate between processes using multiprocessing.
I only see your reply to Dan Stromberg in this thread, but not Dan
Stromberg's original post. I am reading this through Google Groups.
Perhaps Dan's post failed to make it through a gateway for some
reason?
Post by Philip Semanchuk
As a side note, you should always use "new style" classes, particularly since you're exploring the details of Python class construction. "New" is a bit a of misnomer now, as "new" style classes were introduced in Python 2.2. They have been the status quo in Python 2.x for a while now and are the only choice in Python 3.x.
Sorry, that was an oversight on my part. Normally I do remember that,
but I've been doing a lot of subclassing rather than defining top-
level classes, and therefore it slipped my mind.
OK, I got that working, I'll show an example farther down. (I never
tried to pickle a minimal class until now, I went straight for a hard
one.)
Post by Philip Semanchuk
And as Robert Kern pointed out, numpy arrays are also pickle-able.
OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
I would expect. I already have lots of code that is based on such
subclasses, and they do everything I want EXCEPT pickle correctly. I
may have to direct this line of questions to the numpy discussion
group after all.

This is going to be a longer exposition. So let me thank you in
advance for your patience, and ask you to get comfortable.


========== begin code ==========

from numpy import array, ndarray
from pickle import dumps, loads

##===

class Foo(object):

    def __init__(self, contents):
        self.contents = contents

    def __str__(self):
        return str(type(self)) + "\n" + str(self.__dict__)

##===

class SuperFoo(Foo):

    def __getstate__(self):
        print "__getstate__"
        duplicate = dict(self.__dict__)
        duplicate["bonus"] = "added during pickling"
        return duplicate

    def __setstate__(self, d):
        print "__setstate__"
        self.__dict__ = d
        self.finale = "added during unpickling"

##===

class ArraySubclass(ndarray):
    """
    See http://www.scipy.org/Subclasses, this class is very similar.
    """
    def __new__(subclass, data, extra=None, dtype=None, copy=False):
        print " __new__"
        arr = array(data, dtype=dtype, copy=copy)
        arr = arr.view(subclass)
        if extra is not None:
            arr.extra = extra
        elif hasattr(data, "extra"):
            arr.extra = data.extra
        return arr

    def __array_finalize__(self, other):
        print " __array_finalize__"
        self.__dict__ = getattr(other, "__dict__", {})

    def __str__(self):
        return str(type(self)) + "\n" + ndarray.__str__(self) + \
            "\n__dict__ : " + str(self.__dict__)

##===

class PicklableArraySubclass(ArraySubclass):

    def __getstate__(self):
        print "__getstate__"
        return self.__dict__

    def __setstate__(self, d):
        print "__setstate__"
        self.__dict__ = d

##===

print "\n\n** Create a Foo object, then create a copy via pickling."
original = Foo("pickle me please")
print original
clone = loads(dumps(original))
print clone

print "\n\n** Create a SuperFoo object, just to watch __setstate__ and __getstate__."
original = SuperFoo("pickle me too, please")
print original
clone = loads(dumps(original))
print clone

print "\n\n** Create a numpy ndarray, then create a copy via pickling."
original = array(((1,2,3),(4,5,6)))
print original
clone = loads(dumps(original))
print clone

print "\n\n** Create an ArraySubclass object, with meta-data..."
original = ArraySubclass(((9,8,7),(6,5,4)), extra = "pickle me PLEASE!")
print original
print "\n...now attempt to create a copy via pickling."
clone = loads(dumps(original))
print clone

print "\n\n** That failed, try a PicklableArraySubclass..."
original = PicklableArraySubclass(((1,2),(3,4)), extra = "pickle, dangit!!!")
print original
print "\n...now try to create a copy of the PicklableArraySubclass via pickling."
clone = loads(dumps(original))
print clone


========== end code, begin output ==========


** Create a Foo object, then create a copy via pickling.
<class '__main__.Foo'>
{'contents': 'pickle me please'}
<class '__main__.Foo'>
{'contents': 'pickle me please'}


** Create a SuperFoo object, just to watch __setstate__ and __getstate__.
<class '__main__.SuperFoo'>
{'contents': 'pickle me too, please'}
__getstate__
__setstate__
<class '__main__.SuperFoo'>
{'bonus': 'added during pickling', 'finale': 'added during unpickling', 'contents': 'pickle me too, please'}


** Create a numpy ndarray, then create a copy via pickling.
[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]


** Create an ArraySubclass object, with meta-data...
__new__
__array_finalize__
__array_finalize__
__array_finalize__
<class '__main__.ArraySubclass'>
[[9 8 7]
[6 5 4]]
__dict__ : {'extra': 'pickle me PLEASE!'}

...now attempt to create a copy via pickling.
__array_finalize__
__array_finalize__
__array_finalize__
<class '__main__.ArraySubclass'>
[[9 8 7]
[6 5 4]]
__dict__ : {}


** That failed, try a PicklableArraySubclass...
__new__
__array_finalize__
__array_finalize__
__array_finalize__
<class '__main__.PicklableArraySubclass'>
[[1 2]
[3 4]]
__dict__ : {'extra': 'pickle, dangit!!!'}

...now try to create a copy of the PicklableArraySubclass via pickling.
__array_finalize__
__setstate__
Traceback (most recent call last):
  File "minimal ndarray subclass example for posting.py", line 109, in <module>
    clone = loads(dumps(original))
  File "/usr/lib/python2.6/pickle.py", line 1374, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.6/pickle.py", line 1217, in load_build
    setstate(state)
  File "minimal ndarray subclass example for posting.py", line 75, in __setstate__
    self.__dict__ = d
TypeError: __dict__ must be set to a dictionary, not a 'tuple'
Exit code: 1
========== end output ==========

Observations:

My first three examples work fine. The second example, my SuperFoo
class, shows that I can intercept pickling attempts and add data to
__dict__.

Problems appear in the fourth example, when I try to pickle an
ArraySubclass object. It gets the array itself, but __dict__ is not
copied, despite the fact that that is supposed to be Python's default
behavior, and my first example did indeed show this default behavior.

So in my fifth and final example, the PicklableArraySubclass object,
I tried to override __getstate__ and __setstate__ just as I did with
SuperFoo. Here's what I see: 1) __getstate__ is NEVER called. 2)
__setstate__ is called, but it's expecting a dictionary as an
argument, and that dictionary is absent.

What's up with that?
John Ladasky
2011-04-07 07:41:13 UTC
Permalink
Following up to my own post...
Post by John Ladasky
What's up with that?
Apparently, "what's up" is that I will need to implement a third
method in my ndarray subclass -- namely, __reduce__.

http://www.mail-archive.com/numpy-discussion at scipy.org/msg02446.html

I'm burned out for tonight, I'll attempt to grasp what __reduce__ does
tomorrow.

Again, I'm going to point out that, given the extent that
multiprocessing depends upon pickling, pickling should be made
easier. This is Python, for goodness' sake! I'm still surprised at
the hoops I've got to jump through.
Philip Semanchuk
2011-04-07 13:23:27 UTC
Permalink
Post by John Ladasky
Following up to my own post...
Post by John Ladasky
What's up with that?
Apparently, "what's up" is that I will need to implement a third
method in my ndarray subclass -- namely, __reduce__.
http://www.mail-archive.com/numpy-discussion at scipy.org/msg02446.html
I'm burned out for tonight, I'll attempt to grasp what __reduce__ does
tomorrow.
Again, I'm going to point out that, given the extent that
multiprocessing depends upon pickling, pickling should be made
easier. This is Python, for goodness' sake! I'm still surprised at
the hoops I've got to jump through.
Hi John,
My own experience has been that when I reach a surprising level of hoop jumping, it usually means there's an easier path somewhere else that I'm neglecting.

But if pickling subclasses of numpy.ndarray objects is what you really feel you need to do, then yes, I think asking on the numpy list is the best idea.


Good luck
Philip
Robert Kern
2011-04-07 17:44:05 UTC
Permalink
Post by John Ladasky
Post by Philip Semanchuk
And as Robert Kern pointed out, numpy arrays are also pickle-able.
OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
I would expect. I already have lots of code that is based on such
subclasses, and they do everything I want EXCEPT pickle correctly. I
may have to direct this line of questions to the numpy discussion
group after all.
Define the __reduce_ex__() method, not __getstate__(), __setstate__().

http://docs.python.org/release/2.6.6/library/pickle.html#pickling-and-unpickling-extension-types

ndarrays are extension types, so they use that mechanism.
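For an ndarray subclass carrying extra attributes, the usual recipe is to
extend the tuple that ndarray's own reduce hook produces -- roughly like this
(a sketch; MetaArray and its 'extra' attribute are just placeholders for your
own class and metadata):

import numpy as np

class MetaArray(np.ndarray):
    def __new__(cls, input_array, extra=None):
        obj = np.asarray(input_array).view(cls)
        obj.extra = extra
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.extra = getattr(obj, 'extra', None)

    def __reduce__(self):
        # Let ndarray build its own reconstruction tuple, then append our state.
        constructor, args, state = np.ndarray.__reduce__(self)
        return (constructor, args, state + (self.extra,))

    def __setstate__(self, state):
        # Peel our attribute off the end, hand the rest back to ndarray.
        self.extra = state[-1]
        np.ndarray.__setstate__(self, state[:-1])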
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
John Ladasky
2011-04-07 18:39:35 UTC
Permalink
Post by Robert Kern
Post by John Ladasky
Post by Philip Semanchuk
And as Robert Kern pointed out, numpy arrays are also pickle-able.
OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
I would expect. I already have lots of code that is based on such
subclasses, and they do everything I want EXCEPT pickle correctly. I
may have to direct this line of questions to the numpy discussion
group after all.
Define the __reduce_ex__() method, not __getstate__(), __setstate__().
http://docs.python.org/release/2.6.6/library/pickle.html#pickling-and...
ndarrays are extension types, so they use that mechanism.
Thanks, Robert, as you can see, I got on that track shortly after I
posted my code example. This is apparently NOT a numpy issue, it's an
issue for pickling all C extension types.

Is there a way to examine a Python object, and determine whether it's
a C extension type or not? Or do you have to deduce that from the
documentation and/or the source code?

I started hunting through the numpy source code last night. It's
complicated. I haven't found the ndarray object definition yet.
Perhaps because I was looking through .py files, when I actually
should have been looking through .c files?
Robert Kern
2011-04-07 20:01:18 UTC
Permalink
Post by John Ladasky
Post by Robert Kern
Post by John Ladasky
Post by Philip Semanchuk
And as Robert Kern pointed out, numpy arrays are also pickle-able.
OK, but SUBCLASSES of numpy.ndarray are not, in my hands, pickling as
I would expect. I already have lots of code that is based on such
subclasses, and they do everything I want EXCEPT pickle correctly. I
may have to direct this line of questions to the numpy discussion
group after all.
Define the __reduce_ex__() method, not __getstate__(), __setstate__().
http://docs.python.org/release/2.6.6/library/pickle.html#pickling-and...
ndarrays are extension types, so they use that mechanism.
Thanks, Robert, as you can see, I got on that track shortly after I
posted my code example. This is apparently NOT a numpy issue, it's an
issue for pickling all C extension types.
Yes, but seriously, you should ask on the numpy mailing list. You will probably
run into more numpy-specific issues. At least, we'd have been able to tell you
things like "ndarray is an extension type, so look at that part of the
documentation" quicker.
Post by John Ladasky
Is there a way to examine a Python object, and determine whether it's
a C extension type or not?
For sure? No, not really. Not at the Python level, at least. You may be able to
do something at the C level, I don't know.
Post by John Ladasky
Or do you have to deduce that from the
documentation and/or the source code?
I started hunting through the numpy source code last night. It's
complicated. I haven't found the ndarray object definition yet.
Perhaps because I was looking through .py files, when I actually
should have been looking through .c files?
Yes. The implementation for __reduce__ is in numpy/core/src/multiarray/methods.c
as array_reduce(). You may want to look in numpy/ma/core.py for the definition
of MaskedArray. It shows how you would define __reduce__, __getstate__ and
__setstate__ for a subclass of ndarray.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
sturlamolden
2011-04-07 23:39:27 UTC
Permalink
Post by Robert Kern
PicklingError: Can't pickle <class
'multiprocessing.sharedctypes.c_double_Array_10'>: attribute lookup
multiprocessing.sharedctypes.c_double_Array_10 failed
Hehe :D

That is why programmers should not mess with code they don't
understand!

Gaël and I wrote shmem to avoid multiprocessing.sharedctypes, because
they cannot be pickled (they are shared by handle inheritance)! To do
this we used raw Windows API and Unix System V IPC instead of
multiprocessing.Array, and the buffer is pickled by giving it a name
in the file system. Please be informed that the code on bitbucket has
been "fixed" by someone who doesn't understand my code. "If it ain't
broke don't fix it."

http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip

Known issues/bugs: 64-bit support is lacking, and os._exit in
multiprocessing causes a memory leak on Linux.
Post by Robert Kern
Maybe. If the __reduce_ex__() method is implemented properly (and
multiprocessing bugs aren't getting in the way), you ought to be able to pass
them to a Pool just fine. You just need to make sure that the shared arrays are
allocated before the Pool is started. And this only works on UNIX machines. The
shared memory objects that shmarray uses can only be inherited. I believe that's
what Sturla was getting at.
It's a C extension that gives a buffer to NumPy. Then YOU changed how
NumPy pickles arrays referencing these buffers, using
pickle.copy_reg :)

Sturla
sturlamolden
2011-04-08 00:03:01 UTC
Permalink
https://bitbucket.org/cleemesser/numpy-sharedmem/src/3fa526d11578/shm...
I've added a few lines to this code which allows subclassing the
shared memory array, which I need (because my neural net objects are
more than just the array, they also contain meta-data). ?But I've run
into some trouble doing the actual sharing part. ?The shmarray class
CANNOT be pickled.
That is hilarious :-)

I see that the bitbucket page has my and Gaël's names on it, but the
code is changed and broken beyond repair! I don't want to be
associated with that crap!

Their "shmarray.py" will not work -- ever. It fails in two ways:

1. multiprocessing.Array cannot be pickled (as you noticed). It is
shared by handle inheritance. Thus we (that is Gaël and I) made a
shmem buffer object that could be pickled by giving it a name in the
file system, instead of sharing it anonymously by inheriting the
handle. Obviously those behind the bitbucket page don't understand the
difference between named and anonymous shared memory (that is, System
V IPC and BSD mmap, respectively.)

2. By subclassing numpy.ndarray a pickle dump would encode a copy of
the buffer. But that is what we want to avoid! We want to share the
buffer itself, not make a copy of it! So we changed how numpy pickles
arrays pointing to shared memory, instead of subclassing ndarray. I
did that by slightly modifying some code written by Robert Kern.

http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip
Known issues/bugs: 64-bit support is lacking, and os._exit in
multiprocessing causes a memory leak on Linux.


Sturla
sturlamolden
2011-04-08 00:38:34 UTC
Permalink
Post by sturlamolden
http://folk.uio.no/sturlamo/python/sharedmem-feb13-2009.zip
Known issues/bugs: 64-bit support is lacking, and os._exit in
multiprocessing causes a memory leak on Linux.
I should probably fix it for 64-bit now. Just recompiling with 64-bit
integers will not work, because I intentionally hardcoded the higher
32 bits to 0. It doesn't help to represent the lower 32 bits with a 64
bit integer (which it seems someone actually has tried :-D). The
memory leak on Linux is pesky. os._exit prevents clean-up code from
executing, but unlike Windows the Linux kernel does no reference
counting. I am worried we actually need to make a small kernel driver
to make this work properly for Linux, since os._exit makes the current
user-space reference counting fail on child process exit.


Sturla
sturlamolden
2011-04-08 01:10:25 UTC
Permalink
Post by sturlamolden
I should probably fix it for 64-bit now. Just recompiling with 64-bit
integers will not work, because I intentionally hardcoded the higher
32 bits to 0.
That was easy, 64-bit support for Windows is done :-)

Now I'll just have to fix the Linux code, and figure out what to do
with os._exit preventing clean-up on exit... :-(

Sturla
John Ladasky
2011-04-09 07:36:51 UTC
Permalink
Post by sturlamolden
Post by sturlamolden
I should probably fix it for 64-bit now. Just recompiliong with 64-bit
integers will not work, because I intentionally hardcoded the higher
32 bits to 0.
That was easy, 64-bit support for Windows is done :-)
Now I'll just have to fix the Linux code, and figure out what to do
with os._exit preventing clean-up on exit... :-(
Sturla
Hi Sturla,

Thanks for finding my discussion! Yes, it's about passing numpy
arrays to multiple processors. I'll accomplish that any way that I
can.

AND thanks to the discussion provided here by Philip and Robert, I've
become a bit less confused about pickling. I have working code which
subclasses ndarray, pickles using __reduce_ex__, and unpickles using
__setstate__. Just seven lines of additional code do what I want.

I almost understand it, too. :^) Seriously, I'm not altogether sure
about the purpose and structure of the tuple generated by
__reduce_ex__. It looks very easy to abuse and break. Referencing
tuple elements by number seems rather unPythonic. Has Python 3 made
any improvements to pickle, I wonder?

At the moment, my arrays are small enough that pickling them should
not be a very expensive process -- provided that I don't have to keep
doing it over and over again! I'm returning my attention to the
multiprocessing module now. Processes created by Pool do not appear
to persist. They seem to disappear after they are called. So I can't
call them once with the neural net array, and then once again (or even
repeatedly) with input data. Perhaps I need to look at Queue.
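One thing I still need to try is Pool's initializer argument, which (if I
understand it) would let each worker unpickle the read-only network just once
and then reuse it for every task. A rough, untested sketch of what I have in
mind (all names here are mine, purely for illustration):

import numpy as np
from multiprocessing import Pool

_weights = None  # filled in once per worker process

def init_worker(weights):
    global _weights
    _weights = weights            # pickled once per worker, then reused

def evaluate(input_vector):
    # stand-in for the real evaluate function
    return float(np.dot(_weights, input_vector).sum())

if __name__ == '__main__':
    weights = np.random.rand(50, 50)
    inputs = [np.random.rand(50) for _ in range(200)]
    pool = Pool(initializer=init_worker, initargs=(weights,))
    outputs = pool.map(evaluate, inputs)
    print(len(outputs))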

I will retain a copy of YOUR shmarray code (not the Bitbucket code)
for some time in the future. I anticipate that my arrays might get
really large, and then copying them might not be practical in terms of
time and memory usage.
sturlamolden
2011-04-09 17:15:49 UTC
Permalink
Thanks for finding my discussion! Yes, it's about passing numpy
arrays to multiple processors. I'll accomplish that any way that I
can.
My preferred ways of doing this are:

1. Most cases for parallel processing are covered by libraries, even
for neural nets. This particularly involves linear algebra solvers and
FFTs, or calling certain expensive functions (sin, cos, exp) over and
over again. The solution here is optimised LAPACK and BLAS (Intel MKL,
AMD ACML, GotoBLAS, ATLAS, Cray libsci), optimised FFTs (FFTW, Intel
MKL, ACML), and fast vector math libraries (Intel VML, ACML). For
example, if you want to make multiple calls to the function "exp",
there is a good chance you want to use a vector math library. Despite
of this, most Python programmers' instinct seems to be to use multiple
processes with numpy.exp or math.exp, or use multiple threads in C
with exp from libm (cf. math.h). Why go through this pain when a
single function call to Intel VML or AMD ACML (acml-vm) will be much
better? It is common to see scholars argue that "yes but my needs are
so special that I need to customise everything myself." Usually this
translates to "I don't know these libraries (not even that they exist)
and are happy to reinvent the wheel." Thus, if you think you need to
use manually managed threads or processes for parallel technical
computing, and even contemplate that the GIL might get in your way,
there is a 99% chance you are wrong. You will almost ALWAYS want to
use a fast library, either directly in Python or linked to your own
serial C or Fortran code. You have probably heard that "premature
optimisation is the root of all evil in computer programming." It
particularly applies here. Learn to use the available performance
libraries, and it does not matter from which language they are called
(C or Fortran will not be faster than Python!) This is one of the
major reasons Python can be used for HPC (high-performance computing)
even though the Python part itself is "slow". Most of these libraries
are available for free (GotoBLAS, ATLAS, FFTW, ACML), but Intel MKL
and VML require a license fee.

Also note that there are comprehensive numerical libraries you can
use, which can be linked with multi-threaded performance libraries
under the hood. Of particular interest are the Fortran libraries from
NAG and IMSL, which are the two gold standards of technical computing.
Also note that the linear algebra solvers of NumPy and SciPy in
Enthought Python Distribution are linked with Intel MKL. Enthought's
license is cheaper than Intel's, and you don't need to build NumPy or
SciPy against MKL yourself. Using scipy.linalg from EPD is likely to
cover your need for parallel computing with neural nets.


2. Use Cython, threading.Thread, and release the GIL. Perhaps we
should have a cookbook example in scipy.org on this. In the "nogil"
block you can call a library or do certain things that Cython allows
without holding the GIL.


3. Use C, C++ or Fortran with OpenMP, and call these using Cython,
ctypes or f2py. (I prefer Cython and Fortran 95, but any combination
will do.)


4. Use mpi4py with any MPI implementation.


Note that 1-4 are not mutually exclusive, you can always use a certain
combination.
I will retain a copy of YOUR shmarray code (not the Bitbucket code)
for some time in the future. I anticipate that my arrays might get
really large, and then copying them might not be practical in terms of
time and memory usage.
The expensive overhead in passing a NumPy array to
multiprocessing.Queue is related to pickle/cPickle, not IPC or making
a copy of the buffer.

For any NumPy array you can afford to copy in terms of memory, just
work with copies.

The shared memory arrays I made are only useful for large arrays. They
are just as expensive to pickle in terms of time, but can be
inexpensive in terms of memory.

Also beware that the buffer is not copied to the pickle, so you need
to call .copy() to pickle the contents of the buffer.

But again, I'd urge you to consider a library or threads
(threading.Thread in Cython or OpenMP) before you consider multiple
processes. The reason I have not updated the sharedmem arrays for two
years is that I have come to the conclusion that there are better ways
to do this (particularly vendor tuned libraries). But since they are
mostly useful with 64-bit (i.e. large arrays), I'll post an update
soon.

If you decide to use a multithreaded solution (or shared memory as
IPC), beware of "false sharing". If multiple processors write to the
same cache line (they can be up to 512K depending on hardware), you'll
create an invisible "GIL" that will kill any scalability. That is
because dirty cache lines need to be synchronized with RAM. "False
sharing" is one of the major reasons that "home-brewed" compute-
intensive code will not scale.

It is not uncommon to see Java programmers complain about Python's
GIL, and then they go on to write i/o bound or false shared code. Rest
assured that multi-threaded Java will not scale better than Python in
these cases :-)


Regards,
Sturla
John Ladasky
2011-04-09 20:18:47 UTC
Permalink
Post by sturlamolden
Thanks for finding my discussion! Yes, it's about passing numpy
arrays to multiple processors. I'll accomplish that any way that I
can.
1. Most cases for parallel processing are covered by libraries, even
for neural nets. This particularly involves linear algebra solvers and
FFTs, or calling certain expensive functions (sin, cos, exp) over and
over again. The solution here is optimised LAPACK and BLAS (Intel MKL,
AMD ACML, GotoBLAS, ATLAS, Cray libsci), optimised FFTs (FFTW, Intel
MKL, ACML), and fast vector math libraries (Intel VML, ACML). For
example, if you want to make multiple calls to the function "exp",
there is a good chance you want to use a vector math library. Despite
this, most Python programmers' instinct seems to be to use multiple
processes with numpy.exp or math.exp, or use multiple threads in C
with exp from libm (cf. math.h). Why go through this pain when a
single function call to Intel VML or AMD ACML (acml-vm) will be much
better? It is common to see scholars argue that "yes but my needs are
so special that I need to customise everything myself." Usually this
translates to "I don't know these libraries (not even that they exist)
and are happy to reinvent the wheel."
Whoa, Sturla. That was a proper core dump!

You're right, I'm unfamiliar with the VAST array of libraries that you
have just described. I will have to look at them. It's true, I
probably only know of the largest and most widely-used Python
libraries. There are so many, who can keep track?

Now, I do have a special need. I've implemented a modified version of
the Fahlman Cascade Correlation algorithm that will not be found in
any existing libraries, and which I think should be superior for
certain types of problems. (I might even be able to publish this
algorithm, if I can get it working and show some examples?)

That doesn't mean that I can't use the vector math libraries that
you've recommended. As long as those libraries can take advantage of
my extra computing power, I'm interested. Note, however, that the
cascade evaluation does have a strong sequential requirement. It's
not a traditional three-layer network. In fact, describing a cascade
network according to the number of "layers" it has is not very
meaningful, because each hidden node is essentially its own layer.

So, there are limited advantages to trying to parallelize the
evaluation of ONE cascade network's weights against ONE input vector.
However, evaluating several copies of one cascade network's output,
against several different test inputs simultaneously, should scale up
nicely. Evaluating many possible test inputs is exactly what you do
when training a network to a data set, and so this is how my program
is being designed.
Post by sturlamolden
Thus, if you think you need to
use manually managed threads or processes for parallel technical
computing, and even contemplate that the GIL might get in your way,
there is a 99% chance you are wrong. You will almost ALWAYS want to
use a fast library, either directly in Python or linked to your own
serial C or Fortran code. You have probably heard that "premature
optimisation is the root of all evil in computer programming." It
particularly applies here.
Well, I thought that NUMPY was that fast library...

Funny how this works, though -- I built my neural net class in Python,
rather than avoiding numpy and going straight to wrapping code in C,
precisely because I wanted to AVOID premature optimization (for
unknown, and questionable gains in performance). I started on this
project when I had only a single-core CPU, though. Now that multi-
core CPU's are apparently here to stay, and I've seen just how long my
program takes to run, I want to make full use of multiple cores. I've
even looked at MPI. I'm considering networking to another multi-CPU
machine down the hall, once I have my program working.
Post by sturlamolden
But again, I'd urge you to consider a library or threads
(threading.Thread in Cython or OpenMP) before you consider multiple
processes.
My single-CPU neural net training program had two threads, one for the
GUI and one for the neural network computations. Correct me if I'm
wrong here, but -- since the two threads share a single Python
interpreter, this means that only a single CPU is used, right? I'm
looking at multiprocessing for this reason.
Post by sturlamolden
The reason I have not updated the sharedmem arrays for two
years is that I have come to the conclusion that there are better ways
to do this (particularly vendor tuned libraries). But since they are
mostly useful with 64-bit (i.e. large arrays), I'll post an update
soon.
If you decide to use a multithreaded solution (or shared memory as
IPC), beware of "false sharing". If multiple processors write to the
same cache line (they can be up to 512K depending on hardware), you'll
create an invisible "GIL" that will kill any scalability. That is
because dirty cache lines need to be synchronized with RAM. "False
sharing" is one of the major reasons that "home-brewed" compute-
intensive code will not scale.
Even though I'm not formally trained in computer science, I am very
conscious of the fact that WRITING to shared memory is a problem,
cache or otherwise. At the very top of this thread, I pointed out
that my neural network training function would need READ-ONLY access
to two items -- the network weights, and the input data. Given that,
and my (temporary) struggles with pickling, I considered the shared-
memory approach as an alternative.
Post by sturlamolden
It is not uncommon to see Java programmers complain about Python's
GIL, and then they go on to write i/o bound or false shared code. Rest
assured that multi-threaded Java will not scale better than Python in
these cases :-)
I've never been a Java programmer, and I hope it stays that way!
sturlamolden
2011-04-10 15:01:41 UTC
Permalink
Post by John Ladasky
So, there are limited advantages to trying to parallelize the
evaluation of ONE cascade network's weights against ONE input vector.
However, evaluating several copies of one cascade network's output,
against several different test inputs simultaneously, should scale up
nicely.
You get a 1 by n Jacobian for the errors, for which a parallel vector
math library could help. You will also need to train the input layer
(errors with an m by n Jacobian), with QuickProp or LM, to which an
optimized LAPACK can give you a parallel QR or SVD. So all the
inherent parallelism in this case can be solved by libraries.
Post by John Ladasky
Well, I thought that NUMPY was that fast library...
NumPy is a convenient container (data buffer object), and often an
acceptably fast library (comparable to Matlab). But compared to raw C
or Fortran it can be slow (NumPy is memory bound, not compute bound),
and compared to many 'performance libraries' NumPy code will run like
a turtle.
Post by John Ladasky
My single-CPU neural net training program had two threads, one for the
GUI and one for the neural network computations. Correct me if I'm
wrong here, but -- since the two threads share a single Python
interpreter, this means that only a single CPU is used, right? I'm
looking at multiprocessing for this reason.
Only access to the Python interpreter is serialised. It is a common
misunderstanding that everything is serialized.

Two important points:

1. The functions you call can spawn threads in C or Fortran. These
threads can run freely even though one of them is holding the GIL. This
is e.g. what happens when you use OpenMP with Fortran or call a
performance library. This is also how Twisted can implement
asynchronous i/o on Windows using i/o completion ports (Windows will
set up a pool of background threads that run freely.)

2. The GIL can be released while calling into C or Fortran, and
reacquired afterwards, so multiple Python threads can execute in
parallel. If you call a function you know to be re-entrant, you can
release the GIL when calling it. This is also how Python threads can
be used to implement non-blocking i/o with functions like file.read
(they release the GIL before blocking on i/o).

The extent to which these two situations can happen depends on the
libraries you use, and how you decide to call them.

Also note that NumPy/SciPy can be compiled against different BLAS/
LAPACK libraries, some of which will use multithreading internally.
Note that NumPy and SciPy do not release the GIL as often as they
could, which is why I often prefer to use libraries like LAPACK
directly.

In your setup you should release the GIL around computations to prevent
the GUI from freezing.


Sturla
sturlamolden
2011-04-10 22:35:32 UTC
Permalink
Post by sturlamolden
That was easy, 64-bit support for Windows is done :-)
Now I'll just have to fix the Linux code, and figure out what to do
with os._exit preventing clean-up on exit... :-(
Now I feel dumb: it's no worse than monkey patching os._exit, which
I should have realised, since the machinery already depends on monkey
patching NumPy. I must have forgotten we're working with Python here,
or I have been overthinking it. Ok, 64-bit support for Linux is done too,
and the memory leak is gone :-)

Sturla
Miki Tebeka
2011-04-10 16:11:35 UTC
Permalink
Post by John Ladasky
Now, I don't know that I actually HAVE to pass my neural network and
input data as copies -- they're both READ-ONLY objects for the
duration of an evaluate function (which can go on for quite a while).
One option in that case is to use "fork" (if you're on a *nix machine).
See http://pythonwise.blogspot.com/2009/04/pmap.html for example ;)
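In other words, something roughly like this (a sketch that relies on fork()
semantics, so *nix only; the arrays and shapes are just for illustration):

import numpy as np
from multiprocessing import Pool

# Module-level, read-only data: on *nix, forked workers inherit these
# arrays copy-on-write, so nothing here is pickled.
weights = np.random.rand(100, 100)
inputs = np.random.rand(1000, 100)

def evaluate(i):
    # only the small index and the scalar result cross the process boundary
    return float(np.dot(inputs[i], weights).sum())

if __name__ == '__main__':
    pool = Pool()
    results = pool.map(evaluate, range(len(inputs)))
    print(results[:3])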

HTH
--
Miki Tebeka <miki.tebeka at gmail.com>
http://pythonwise.blogspot.com
John Nagle
2011-04-10 16:27:31 UTC
Permalink
Post by Miki Tebeka
Post by John Ladasky
Now, I don't know that I actually HAVE to pass my neural network and
input data as copies -- they're both READ-ONLY objects for the
duration of an evaluate function (which can go on for quite a while).
One option in that case is to use "fork" (if you're on a *nix machine).
See http://pythonwise.blogspot.com/2009/04/pmap.html for example ;)
Unless you have a performance problem, don't bother with shared
memory.

If you have a performance problem, Python is probably the wrong
tool for the job anyway.

John Nagle
sturlamolden
2011-04-10 22:29:44 UTC
Permalink
Unless you have a performance problem, don't bother with shared
memory.
If you have a performance problem, Python is probably the wrong
tool for the job anyway.
Then why does Python have a multiprocessing module?

In my opinion, if Python has a multiprocessing module in the standard
library, it should also be possible to use it with NumPy.

Sturla
John Nagle
2011-04-11 07:21:42 UTC
Permalink
Post by sturlamolden
Post by John Nagle
Unless you have a performance problem, don't bother with shared
memory.
If you have a performance problem, Python is probably the wrong
tool for the job anyway.
Then why does Python have a multiprocessing module?
Because nobody can fix the Global Interpreter Lock problem in CPython.

The multiprocessing module is a hack to get around the fact
that Python threads don't run concurrently, and thus, threaded
programs don't effectively use multi-core CPUs.

John Nagle
sturlamolden
2011-04-11 19:11:47 UTC
Permalink
Because nobody can fix the Global Interpreter Lock problem in CPython.
The multiprocessing module is a hack to get around the fact
that Python threads don't run concurrently, and thus, threaded
programs don't effectively use multi-core CPUs.
Then why not let NumPy use shared memory in this case? It is as simple
as this:

import numpy as np
import sharedmem as sm
private_array = np.zeros((10,10))
shared_array = sm.zeros((10,10))

I.e. the code is identical (more or less).

Also, do you think shared memory has other uses besides parallel
computing, such as IPC? It is e.g. an easy way to share data between
Python and an external program. Do you e.g. need to share (not send)
large amounts of data between Python and Java or Matlab?

The difference here from multiprocessing.Array is that IPC with
multiprocessing.Queue will actually work, because I am not using
anonymous shared memory. Instead of inheriting handles, the segment has
a name in the file system, which means it can be memory mapped from
anywhere.

I posted the code to the NumPy mailing list yesterday.

Sturla
sturlamolden
2011-04-11 19:25:37 UTC
Permalink
Post by sturlamolden
import numpy as np
import sharedmem as sm
private_array = np.zeros((10,10))
shared_array = sm.zeros((10,10))
I am also using this to implement synchronization primitives (barrier,
lock, semaphore, event) that can be sent over an instance of
multiprocessing.Queue. The synchronization primitives in
multiprocessing cannot be communicated this way.

Sturla
