Discussion:
[Python-Dev] Making python C-API thread safe (try 2)
Harri Pesonen
2003-09-15 19:39:24 UTC
On Sat, 13 Sep 2003 13:55:59 GMT,
It's not useless, but it is not optimal. That's a fact. If you have a
multiprocessor machine, you just waste it because Python can't run two
threads at the same time. So if you want to create a high performance
web site, then Python is out of the question for the scripting language.
You're making an awful lot of assumptions there. I'm going to
concentrate on one of them.
First, you assume that this hypothetical high-performance website
really is CPU-bound. I've never had one of those to deal with, but I
won't argue.
Second, I'll assume that SMP really is the most cost-effective way
available to scale up your CPU power. Probably depends on the rack
space you have available and maybe somebody already provided an SMP
box for the job.
Third, I'll assume that caching is already being used to the extent
that it's applicable. Every website I've worked on is much more
read-intensive than write-intensive. For these sorts of sites,
appropriate caching buys you a LOT.
Disk space and RAM are cheap. But maybe you have an app where this
isn't the case and you really can't avoid heavy CPU usage.
Fourth, you seem to think that threading is the only / best-practice
way to handle a lot of requests. This is the point I'm going to argue
against.
I'll give one counter-example because it's the one I'm most familiar
with.
In the Zope world, it's common to run a load balancer in front of one
or more ZEO clients (processes) per processor, all connected to a
single ZEO server. This is a matter of very simple configuration. It's
*completely* transparent to the application, which has no idea whether
you're running ZEO or have a direct connection to a dedicated ZODB
database. ZEO has the added benefit of being equally applicable
whether the CPUs are on one box or not. You can't say that about
threading. This approach could certainly be used outside the Zope
world. ZEO is not rocket science, it's about 6500 lines of code plus
3100 lines of unit tests. Note also that ZEO's primary purpose is to
remove one of the single points of failure; handling more requests is
almost a side effect. Free threading won't help you with the
point-of-failure issue.
I will also mention that it's quite possible to handle a lot of
requests without using threads *at all*, and the resulting code can
look very nice. See twistedmatrix.com.
Sure, it would be nice if a single python process could take advantage
of SMP, but it's demonstrably false that python is currently "out of
the question" for "high performance" web use. In fact python performs
very well on the web. To say otherwise is nothing but FUD.
I think the primary problem with the GIL is the bad publicity that
results from faulty assumptions like this.
Okay, so it's not out of the question, but it is still not optimal! :-)

The point I was trying to make (not in this message but in general) is
that it would be simple (trivial but tedious) to create a version of
Python that is thread-safe, and the only reason it is not done is
because it would break old code. So we are in this GIL world just
because of that old code... it's a shame. It's like Visual Basic 6, it
can't multitask properly either (but for other reasons). All other
modern languages are free-threaded. Before I learned Python I assumed
that it is as well. From the response I got from python-dev it seems
that the GIL is here to stay forever, unless someone does something to
it. I only wish I could spend a few weeks full time with Python C API,
and make a free-threading version.

"Therefore, the rule exists that only the thread that has acquired the
global interpreter lock may operate on Python objects or call Python/C
API functions. In order to support multi-threaded Python programs, the
interpreter regularly releases and reacquires the lock -- by default,
every 100 bytecode instructions (this can be changed with
sys.setcheckinterval()). The lock is also released and reacquired around
potentially blocking I/O operations like reading or writing a file, so
that other threads can run while the thread that requests the I/O is
waiting for the I/O operation to complete."

Yuck. But then:

"The Python interpreter needs to keep some bookkeeping information
separate per thread -- for this it uses a data structure called
PyThreadState . This is new in Python 1.5; in earlier versions, such
state was stored in global variables, and switching threads could cause
problems. In particular, exception handling is now thread safe, when the
application uses sys.exc_info() to access the exception last raised in
the current thread.

There's one global variable left, however: the pointer to the current
PyThreadState structure. While most thread packages have a way to store
``per-thread global data,'' Python's internal platform independent
thread abstraction doesn't support this yet. Therefore, the current
thread state must be manipulated explicitly."

Only one global variable left (in fact there is Py_None as well). Why
not get rid of it, then??

Harri
Brian Quinlan
2003-09-15 20:55:32 UTC
Post by Harri Pesonen
The point I was trying to make (not in this message but in general) is
that it would be simple (trivial but tedious) to create a version of
Python that is thread-safe, and the only reason it is not done is
because it would break old code.
Actually, it would be reasonably difficult. If you don't agree why not
spend a weekend implementing it and see how robust your implementation
is? Also, adding object-level locking would involve a huge performance
penalty.
Post by Harri Pesonen
All other modern languages are free-threaded.
Ruby is not free-threaded in the sense that you mean.
Post by Harri Pesonen
"Therefore, the rule exists that only the thread that has acquired the
global interpreter lock may operate on Python objects or call Python/C
API functions. In order to support multi-threaded Python programs, the
interpreter regularly releases and reacquires the lock -- by default,
every 100 bytecode instructions (this can be changed with
sys.setcheckinterval()). The lock is also released and reacquired
around
Post by Harri Pesonen
potentially blocking I/O operations like reading or writing a file, so
that other threads can run while the thread that requests the I/O is
waiting for the I/O operation to complete."
Yuck.
You can release the lock at other times too. If you were doing a
long-running calculation that doesn't require use of the Python API, for
example, it would be appropriate to release the GIL.
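The effect is visible from pure Python, too; a minimal sketch, assuming a
CPython where blocking calls such as time.sleep() release the GIL:

    import threading
    import time

    def blocker():
        time.sleep(1.0)         # sleeping releases the GIL, so these overlap

    def spinner():
        for _ in range(10**7):  # a pure-Python loop mostly holds the GIL
            pass

    def timed(target, n=4):
        threads = [threading.Thread(target=target) for _ in range(n)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    print("4 sleeping threads:", timed(blocker))   # about 1 second, not 4
    print("4 spinning threads:", timed(spinner))   # roughly 4x one spinner

In a C extension the same thing is done explicitly with the
Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros around the section that
doesn't touch the Python API.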
Post by Harri Pesonen
Only one global variable left (in fact there is Py_None as well). Why
not get rid of it, then??
I think that you must be missing something... There is nothing special
about Py_None. The number 5 is globally shared just like Py_None is.
This is a performance optimization used to prevent lots of small numbers
from being allocated and destroyed all the time.
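You can see the cache from Python itself; a small sketch (the exact cached
range is a CPython implementation detail, roughly -5 to 256):

    a = 5
    b = int("5")         # built at runtime, yet...
    print(a is b)        # True: CPython hands back the cached small int
    print(None is None)  # True: there is exactly one None object

    x = int("100000")
    y = int("100000")
    print(x is y)        # False on CPython: larger ints are separate objects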

Cheers,
Brian
Christopher A. Craig
2003-09-12 12:46:54 UTC
After sleeping over night, I think that I got it. :-) The simple solution is,
that each thread created in Python gets its own independent interpreter state
as well. And there could be a separate thread-global interpreter state for
shared memory access. Access to this global state would always be
synchronized. There could even be multiple named global states, so that the
thread interlocking could be minimized. The Python syntax for creating objects
in this shared global state could be something like:
synchronize a = "abcd"
Also, when creating a new thread, its arguments would be copied from the
creating state to the new state.
How does that sound? Of course it would be incompatible with the current
threading system in Python, but it would be fully multithreaded, with no global
interpreter lock needed. It would be faster than the current Python: there would
be no need to release or acquire the lock when calling OS functions, and no need
to check how many byte codes have been processed, etc.
Couldn't you do this now with multiple processes and the shm module?
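(The shm module was a third-party package; a rough sketch of the same idea
with the later stdlib module multiprocessing.shared_memory:)

    from multiprocessing import Process, shared_memory

    def worker(name):
        shm = shared_memory.SharedMemory(name=name)  # attach to parent's block
        shm.buf[0] = 42                              # modify the shared bytes
        shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=16)
        p = Process(target=worker, args=(shm.name,))
        p.start()
        p.join()
        print(shm.buf[0])   # 42: both processes saw the same memory
        shm.close()
        shm.unlink()

It shares raw bytes rather than Python objects, which is exactly the
limitation that POSH (mentioned elsewhere in this thread) tries to address.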
--
Christopher A. Craig <list-python at ccraig.org>
"I affirm brethren by the boasting in you which I have in Christ Jesus
our Lord, I die daily" I Cor 15:31 (NASB)
Shane Hathaway
2003-09-11 20:14:51 UTC
This sounds like POSH. There was a paper about it at PyCon.
http://poshmodule.sourceforge.net/posh/html/
Excellent. Thank you for the pointer.

Shane
Jeff Epler
2003-09-17 12:08:48 UTC
But wouldn't it be better if Python had real multitasking? Comments like
the above mean that you accept Python as it is, fine. But usually people
want to make things better when they see that something can be improved.
If my roof is leaking, I have two choices: fix it or accept it. If I am
able to fix it, then I'll probably do it.
The old code base really is the problem here. If Python threads really
run at the same time in the future, how many current applications stop
working because they depend on the fact that only one thread runs at any
given time, and do not acquire and release locks as needed? On the other
hand, my suggestion probably means that we couldn't have a threading
module compatible with thread or threading anyhow (we could have
freethreading, with specific functions for inter-thread communication).
I don't think "the roof is leaking". In the current situation, users
can write threads in the cases where they make sense (and easily share
any data without jumping through hoops) and even derive a performance
benefit when the threads' work is done without the GIL held. Or, they
can use multiple processes with explicit data sharing, and derive a
performance benefit even when each process does its work with the GIL
held.

In your situation, the first paradigm becomes impossible, and the second
paradigm has been modeled with a "shared nothing" approach which adds
complexity to Python to achieve with threads what the operating system
already offers using processes!

Keep in mind that probably more than 99.9% of machines out there
have only one CPU anyway, so all you re-claim is the GIL overhead if you
remove it. Yay, a 2%* performance increase for a huge loss in flexibility.
And processes still win, because you can probably easily hack out the
thread/threading modules, turn the GIL operations into no-ops, and use
fork() to get that 2%* performance increase.

Let me know when you have benchmark figures to the contrary.

Jeff
* Number picked out of the air
Mitch Chapman
2003-09-13 13:16:01 UTC
Mitch> At IPC8 Greg Wilson, then of the Software Carpentry project,
Mitch> noted that the GIL made it hard to write multi-threaded Python
Mitch> apps which could take advantage of multi-processor systems. He
Mitch> argued that this would limit Python's appeal in some segments of
Mitch> the scientific community.
This is a known issue.
For example, it was discussed at IPC8 ;)
Thus far, it hasn't seemed to slow down Python's
acceptance by the scientific community all that much.
Good point.
Mitch> Perhaps those who find the subject important have left the
Mitch> community? Perhaps they've adopted kludgey workarounds?
Or perhaps they are happy to have tools like scipy and MayaVi to make their
jobs easier.
No doubt. I'm among those happy to have access to scipy.weave, for
example. But I also wish for better thread scalability to make my job
easier.

Unlike Andrew I don't think the lack of maintenance for 1.4's free
threading packages is due to any perception that threading performance
is unimportant. It seems more likely that the packages were not updated
because they proved not to solve the performance problems, and that no
alternatives have emerged because the problem is hard to solve.

So, if Harri Pesonen has ideas for achieving better thread scalability,
I want to encourage him to develop them rather than to suggest such
an effort is unimportant to the community.
Nobody has claimed that it isn't a problem for some people.
I read Andrew's comment as meaning that he believed nobody in the
community thought scalable threading was particularly important. In
order for that to be true, something like Greg Wilson's prediction, that
many who would otherwise have found Python appealing would have turned
to other solutions, would have to have come to pass. ("Perhaps those...
have left the community?")

Or else, as I noted, people might have found kludgey workarounds. Or, as
has happened in my company, they might have deferred some performance
problems which could have been addressed by scalable threading. Both of
these have happened to some degree.
It's maybe less
of a problem than it appears at first though.
Having said all of that, it looks like this thread originated on python-
dev, rather than on c.l.py where I found it. So I'm probably missing
some reasons for what appears to be a dismissive attitude toward Harri's
efforts. I'll go read the python-dev archives.

--
Mitch
Skip Montanaro
2003-09-13 15:09:05 UTC
Mitch> Unlike Andrew I don't think the lack of maintenance for 1.4's
Mitch> free threading packages is due to any perception that threading
Mitch> performance is unimportant. It seems more likely that the
Mitch> packages were not updated because they proved not to solve the
Mitch> performance problems, and that no alternatives have emerged
Mitch> because the problem is hard to solve.

One reason (maybe the primary reason) it never went further than a patch to
1.4 was that it was slower in the (common) single-threaded case. Free
threading is no magic bullet. There is a lot of overhead in maintaining all
the fine-grained locks necessary to dispense with the GIL.

Mitch> Having said all of that, it looks like this thread originated on
Mitch> python-dev, rather than on c.l.py where I found it. So I'm
Mitch> probably missing some reasons for what appears to be a dismissive
Mitch> attitude toward Harri's efforts. I'll go read the python-dev
Mitch> archives.

Python-dev is the place to discuss concrete proposals. Python-list is the
place to hash out ideas. Simple as that. You can read the few messages in
python-dev in the archive for September:

http://mail.python.org/pipermail/python-dev/2003-September/thread.html

Search for "thread safe".

Skip
Paul Winkler
2003-09-15 18:24:41 UTC
On Sat, 13 Sep 2003 13:55:59 GMT,
It's not useless, but it is not optimal. That's a fact. If you have a
multiprocessor machine, you just waste it because Python can't run two
threads at the same time. So if you want to create a high performance
web site, then Python is out of the question for the scripting language.
You're making an awful lot of assumptions there. I'm going to
concentrate on one of them.

First, you assume that this hypothetical high-performance website
really is CPU-bound. I've never had one of those to deal with, but I
won't argue.

Second, I'll assume that SMP really is the most cost-effective way
available to scale up your CPU power. Probably depends on the rack
space you have available and maybe somebody already provided an SMP
box for the job.

Third, I'll assume that caching is already being used to the extent
that it's applicable. Every website I've worked on is much more
read-intensive than write-intensive. For these sorts of sites,
appropriate caching buys you a LOT.
Disk space and RAM are cheap. But maybe you have an app where this
isn't the case and you really can't avoid heavy CPU usage.

Fourth, you seem to think that threading is the only / best-practice
way to handle a lot of requests. This is the point I'm going to argue
against.

I'll give one counter-example because it's the one I'm most familiar
with.
In the Zope world, it's common to run a load balancer in front of one
or more ZEO clients (processes) per processor, all connected to a
single ZEO server. This is a matter of very simple configuration. It's
*completely* transparent to the application, which has no idea whether
you're running ZEO or have a direct connection to a dedicated ZODB
database. ZEO has the added benefit of being equally applicable
whether the CPUs are on one box or not. You can't say that about
threading. This approach could certainly be used outside the Zope
world. ZEO is not rocket science, it's about 6500 lines of code plus
3100 lines of unit tests. Note also that ZEO's primary purpose is to
remove one of the single points of failure; handling more requests is
almost a side effect. Free threading won't help you with the
point-of-failure issue.

I will also mention that it's quite possible to handle a lot of
requests without using threads *at all*, and the resulting code can
look very nice. See twistedmatrix.com.
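A minimal sketch of that no-threads, event-driven style, using only the
stdlib selectors module rather than Twisted itself:

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello\r\n")
        sel.unregister(conn)
        conn.close()

    server = socket.socket()
    server.bind(("127.0.0.1", 8080))
    server.listen(128)
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        # one thread, one loop: every ready socket gets its callback invoked
        for key, _ in sel.select():
            key.data(key.fileobj)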

Sure, it would be nice if a single python process could take advantage
of SMP, but it's demonstrably false that python is currently "out of
the question" for "high performance" web use. In fact python performs
very well on the web. To say otherwise is nothing but FUD.

I think the primary problem with the GIL is the bad publicity that
results from faulty assumptions like this.
A.M. Kuchling
2003-09-12 11:40:48 UTC
On Fri, 12 Sep 2003 07:56:55 +0300,
I don't know, I got mail about writing a PEP. It is clear that it would
not be accepted, because it would break the existing API. The change is
so big that I think that it has to be called a different language.
It would just be a different implementation of the same language. Jython
has different garbage collection characteristics from CPython, but they
still implement the same language; Stackless Python is still Python.
because this is too important to be ignored. Python *needs* to be
free-threading...
On the other hand, considering that the last free threading packages were
for 1.4, and no one has bothered to update them, the community doesn't seem
to find the subject as important as you do. :)

--amk
Steve Holden
2003-09-14 16:13:19 UTC
"Harri Pesonen" <fuerte at sci.fi> wrote in message
news:EIL8b.6564$ZB4.1083 at reader1.news.jippii.net...
[...]
In my opinion, free threading
is something that *must* be fixed. It is the biggest flaw in a language
that I fell in love with just a couple of weeks ago.
I've been keeping quiet about this thread for a while now, and have even
deleted a couple of curmudgeonly draft responses to earlier postings of
yours. However, you've now managed to generate a curmudgeon-trigger event.
The key phrase above is "In my opinion". Apparently you hold your own
opinion in high regard.

Since you only fell in love with Python a couple of weeks ago, it behooves
you to do rather more research than you apparently have before telling its
long-term live-in partners what's wrong with the language and how to fix it.

Just as a matter of interest, have you searched the archives for previous
discussions about implementing free-threading in Python? If so, your
opinions seem untouched by what you would have read.
And the flaw is
not in the language, but just in the implementation. I like Python so
much that I plan to do all my future developing with it, as much as is
possible. But when I found out about the C API internals, global
interpreter lock, thread state locking and so on, I was very
disappointed. So disappointed that I decided to do something to it, at
some point, perhaps in six months, when I have more time. Meanwhile, I
hope that someone has heard what I said, and perhaps does something to
it. It is the biggest problem in the best language there is at the moment.
The traditional response to such gushings, which I believe has already come
your way via both python-dev and this newsgroup, is "post a patch". Please
realise that it's only *your* opinion that this is Python's biggest problem.
It may well be a problem for the applications you want to write in it, but
believe me when I tell you, if it were a real problem for the majority of
users it would already have been addressed by some of the better minds in
the software development community.

A little humility wouldn't hurt. comp.lang.python is a very tolerant
newsgroup (when I'm not posting :-) but you are in danger of being perceived
as arrogant and thoughtless.

which-for-all-i-know-you-are-not-ly y'rs - steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/pwp/
Mitch Chapman
2003-09-12 14:07:35 UTC
Post by A.M. Kuchling
On Fri, 12 Sep 2003 07:56:55 +0300,
On the other hand, considering that the last free threading packages were
for 1.4, and no one has bothered to update them, the community doesn't seem
to find the subject as important as you do. :)
--amk
At IPC8 Greg Wilson, then of the Software Carpentry project,
noted that the GIL made it hard to write multi-threaded Python
apps which could take advantage of multi-processor systems. He
argued that this would limit Python's appeal in some segments of
the scientific community.

Perhaps those who find the subject important have left the
community? Perhaps they've adopted kludgey workarounds?

In my company the GIL has meant that some performance problems,
which could most easily be addressed by multiple threads
sharing access to large collections of Python objects, have
been deferred.


--
Mitch
Skip Montanaro
2003-09-12 19:57:03 UTC
Mitch> At IPC8 Greg Wilson, then of the Software Carpentry project,
Mitch> noted that the GIL made it hard to write multi-threaded Python
Mitch> apps which could take advantage of multi-processor systems. He
Mitch> argued that this would limit Python's appeal in some segments of
Mitch> the scientific community.

This is a known issue. Thus far, it hasn't seemed to slow down Python's
acceptance by the scientific community all that much. Look at Scientific
Python, Numeric (and Numarray), as well as work in bioinformatics, 3D
graphics, etc. I think the availability of high-quality tools is just as
important in the scientific community as elsewhere.

Mitch> Perhaps those who find the subject important have left the
Mitch> community? Perhaps they've adopted kludgey workarounds?

Or perhaps they are happy to have tools like scipy and MayaVi to make their
jobs easier.

Mitch> In my company the GIL has meant that some performance problems,
Mitch> which could most easily be addressed by multiple threads sharing
Mitch> access to large collections of Python objects, have been
Mitch> deferred.

Nobody has claimed that it isn't a problem for some people. It's maybe less
of a problem than it appears at first though.

Skip
Gary Feldman
2003-09-14 18:11:05 UTC
* You don't have to add locks around your data structures, and run the risk
of subtle and time-dependent bugs if you forget or get the locking wrong.
* You have to figure out some persistent way to store session information so
that all the processes can access it. This is extra work, but it also
Of course, this point obviates the previous point. And regardless of which
approach you use, you need to spend time to distinguish shared data and
per-thread/process data.
any benefit from using threading in web applications.
The main benefit of threads is performance. Sharing data among the threads
is often faster than sharing data among processes (you don't need to deal
with shmget, etc. for sharing memory on UNIX systems). Locking the shared
data is likewise often faster. Finally, my understanding of the MS Windows
world is that threads as a general rule are more efficient than processes.
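A sketch of the difference in programming model, stdlib only (the relative
timings are left for the reader to measure):

    import threading
    import multiprocessing

    counter = 0
    lock = threading.Lock()

    def thread_work():
        # threads share ordinary Python objects; only a lock is needed
        global counter
        for _ in range(100000):
            with lock:
                counter += 1

    def process_work(shared):
        # processes share nothing by default; shared state must be explicit
        for _ in range(100000):
            with shared.get_lock():
                shared.value += 1

    if __name__ == "__main__":
        threads = [threading.Thread(target=thread_work) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("threads:", counter)            # 400000

        shared = multiprocessing.Value("i", 0)
        procs = [multiprocessing.Process(target=process_work, args=(shared,))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print("processes:", shared.value)     # 400000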

Gary
Aahz
2003-09-15 05:36:50 UTC
In article <pdb9mv42covtudsfnfj02lakbniufabt6t at 4ax.com>,
Post by Gary Feldman
The main benefit of threads is performance. Sharing data among
the threads is often faster than sharing data among processes (you
don't need to deal with shmget, etc. for sharing memory on UNIX
systems). Locking the shared data is likewise often faster. Finally,
my understanding of the MS Windows world is that threads as a general
rule are more efficient than processes.
The main disadvantage of free-threading lies with external libraries
that are not thread-safe (not even talking about thread-hot). It's
*much* easier to work with external libraries that are explicitly
designed to work with Python's threading system.
--
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan
Shane Hathaway
2003-09-11 22:50:51 UTC
I wonder whether others would consider such a thing valuable, or even
feasible. :-)
I consider it valuable, but not feasible. Contrary to your analysis, I
believe the biggest problem with shared memory is pointer/address
management. E.g. if you have a shared list L, and do
L.append(1)
then a reference to the object 1 is put into L[0]. However, the
pointer is only valid in the process writing the reference, not in
other processes. Fixing that is nearly impossible, except *perhaps* by
reimplementing Python from ground up.
There are two ways to solve that:

- Copy the "1" object into shared memory before passing it to
L.append(). This would be nice and transparent, but also a little DWIM.

- Create special object types just for shared memory.

As it turns out, POSH (see Jeremy's link) takes the second strategy,
apparently quite successfully! Anyone who has concerns about the GIL
should take a long, hard look at POSH. It just might solve the problem
for lots of people.
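(POSH's own API isn't reproduced here; as a rough stdlib analogue of the
"special shared object type" strategy, multiprocessing's Manager proxies
behave similarly:)

    from multiprocessing import Manager, Process

    def worker(shared_list):
        # shared_list is a proxy, not a plain list: operations on it are
        # forwarded to a manager process that owns the real list
        shared_list.append(1)

    if __name__ == "__main__":
        with Manager() as manager:
            L = manager.list()
            p = Process(target=worker, args=(L,))
            p.start()
            p.join()
            print(list(L))   # [1], visible in the parent too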

Shane
Pettersen, Bjorn S
2003-09-17 19:32:13 UTC
From: Harri Pesonen [mailto:fuerte at sci.fi]
It is more efficient to have one process and several threads, than
several processes each having one thread.
If it's easier to code "several processes each having one thread", then
I'll sure do it. I've accepted a 2x-100x speed hit by using Python, so
if it's a 25% difference in efficiency, I won't sign up to wait for code
that doesn't exist yet.
But wouldn't it be better if Python had real multitasking? Comments like the
above mean that you accept Python as it is, fine. But usually people want to
make things better when they see that something can be improved. If my roof
is leaking, I have two choices: fix it or accept it. If I am able to fix it,
then I'll probably do it.
[...]

No, comments like the above reflect the realization that there are certain
tradeoffs. Reference counting as a primary garbage collection mechanism is
fundamentally incompatible with multithreading in an object oriented language
-- unless you have mitigating factors like the GIL. The number of locking
operations quickly overwhelms actual computation. (I'm aware that you don't
want shared data, however I completely fail to see the utility..)

There is some research indicating that generational gc is more efficient with
regards to total program run time in heavily mt oo applications, so that's
certainly an interesting area to look into -- it would, however, require a
fundamentally different language implementation (one which didn't have the
nice finalization guarantees CPython currently has).

When it comes to applicability, I'm assuming you're not heading in this
direction to do heavy computational work in pure Python(?) If all you want to
do is watch your Python programs run maximally parallel, I would suggest
Jython as an alternative without a GIL...

-- bjorn
Brian Quinlan
2003-09-16 18:24:31 UTC
There is no object-level locking in my proposal. Just independent
free-threaded interpreters, which don't see the objects of other
interpreters at all.
OK, but this is useless to the average Python programmer. It is only
useful to people embedding Python interpreters in multithreaded
applications. I would imagine that this represents <1% of Python users.
There could be an extra global interpreter state for shared-memory
object access. Accessing this would always be synchronized, but only
this. Python would automatically copy data from this state to
thread-local state and back when needed. This would require a special
syntax for variables in global state.
So now you want to change the language definition for the benefit of a
small minority of users?

Cheers,
Brian
Harri Pesonen
2003-09-16 18:43:37 UTC
Post by Brian Quinlan
There is no object-level locking in my proposal. Just independent
free-threaded interpreters, which don't see the objects of other
interpreters at all.
OK, but this is useless to the average Python programmer. It is only
useful to people embedding Python interpreters in multithreaded
applications. I would imagine that this represents <1% of Python users.
No, all Python developers who create threads would benefit as well. It's
probably another 1%.
Post by Brian Quinlan
There could be an extra global interpreter state for shared-memory
object access. Accessing this would always be synchronized, but only
this. Python would automatically copy data from this state to
thread-local state and back when needed. This would require a special
syntax for variables in global state.
So now you want to change the language definition for the benefit of a
small minority of users?
No, change the language definition (if needed) for the benefit of real
multitasking.

Harri

Martin v. Löwis
2003-09-15 21:39:19 UTC
Post by Harri Pesonen
The point I was trying to make (not in this message but in general) is
that it would be simple (trivial but tedious) to create a version of
Python that is thread-safe, and the only reason it is not done is
because it would break old code. So we are in this GIL world just
because of that old code...
This statement is not true. It is not trivial, and it is not left
undone because of old code.

Your approach to support "multi-threading" (adding an interpreter state
parameter to all functions) would allow using different interpreters across
different threads, and those interpreters could not share a single
object.

I doubt that this is what most users would want as "SMP Python", and
it has no significant difference over a multi-process solution.

IOW, it is a useless approach.
Post by Harri Pesonen
It's like Visual Basic 6, it can't multitask properly either (but
for other reasons). All other modern languages are
free-threaded. Before I learned Python I assumed that it is as well.
You mean, like Tcl, Perl, or Ruby? None of these has free threading
(plus, your proposed implementation strategy would not offer free
threading, either).
Post by Harri Pesonen
Only one global variable left (in fact there is Py_None as well). Why
not get rid of it, then??
Because global variables are not the only problem. Not even the most
important one. At least not if you want free threading.

Regards,
Martin
Harri Pesonen
2003-09-16 17:06:00 UTC
Post by Brian Quinlan
Post by Harri Pesonen
The point I was trying to make (not in this message but in general) is
that it would be simple (trivial but tedious) to create a version of
Python that is thread-safe, and the only reason it is not done is
because it would break old code.
Actually, it would be reasonably difficult. If you don't agree why not
spend a weekend implementing it and see how robust your implementation
is? Also, adding object-level locking would involve a huge performance
penalty.
There is no object-level locking in my proposal. Just independent
free-threaded interpreters, which don't see the objects of other
interpreters at all.

There could be an extra global interpreter state for shared-memory
object access. Accessing this would always be synchronized, but only
this. Python would automatically copy data from this state to
thread-local state and back when needed. This would require a special
syntax for variables in global state:

synchronize a = "asdf"
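No such "synchronize" keyword exists; a rough approximation of the intended
model in plain Python would be thread-local state plus a single explicitly
locked shared area:

    import threading

    local = threading.local()       # per-thread, unshared state
    shared = {}                     # the one shared area
    shared_lock = threading.Lock()  # every access to it is synchronized

    def put_shared(name, value):
        with shared_lock:
            shared[name] = value

    def get_shared(name):
        with shared_lock:
            value = shared[name]
        local.copy = value          # copied into thread-local state before use
        return value

    put_shared("a", "asdf")
    print(get_shared("a"))

Only the shared area ever needs the lock; everything in the thread-local
state stays private to its thread, which is roughly the split described here.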
Post by Brian Quinlan
Post by Harri Pesonen
Only one global variable left (in fact there is Py_None as well). Why
not get rid of it, then??
I think that you must be missing something... There is nothing special
about Py_None. The number 5 is globally shared just like Py_None is.
This is a performance optimization used to prevent lots of small numbers
from being allocated and destroyed all the time.
Py_None is special because it is shared between all interpreters; it is
global. Py_None is defined as:

#define Py_None (&_Py_NoneStruct)

PyObject _Py_NoneStruct = {
    PyObject_HEAD_INIT(&PyNone_Type)
};

static PyTypeObject PyNone_Type = {
    PyObject_HEAD_INIT(&PyType_Type)
    0,
    "NoneType",
    0,
    0,
    (destructor)none_dealloc, /*tp_dealloc*/ /*never called*/
    0,                        /*tp_print*/
    0,                        /*tp_getattr*/
    0,                        /*tp_setattr*/
    0,                        /*tp_compare*/
    (reprfunc)none_repr,      /*tp_repr*/
    0,                        /*tp_as_number*/
    0,                        /*tp_as_sequence*/
    0,                        /*tp_as_mapping*/
    0,                        /*tp_hash */
};

From the above we see that when Py_None's reference count reaches zero, the
destructor none_dealloc is called.

static void
none_dealloc(PyObject* ignore)
{
    /* This should never get called, but we also don't want to SEGV if
     * we accidentally decref None out of existence.
     */
    Py_FatalError("deallocating None");
}

This makes no sense at all. Why Py_FatalError? It would be better to have

static void
none_dealloc(PyObject* ignore)
{
}

so that it is not necessary to call Py_INCREF(Py_None) or
Py_DECREF(Py_None) at all. Guess how many times these appear in the
Python C API source? Py_INCREF(Py_None) appears 2001 times and
Py_DECREF(Py_None) two times.

OK, Python also calls _Py_ForgetReference on the object when it is
freed, so that something else should be changed here as well.
_Py_ForgetReference unlinks the object from double linked list.

By changing the way how Py_None is freed (by doing nothing) Python would
get simpler and faster. And of course you could give Py_None a large
reference count on startup, so that the deallocator is never actually
called (even if it does nothing).

If there are more of these static objects, like number 5 as you said,
then the destructor for these should be changed as well, so that they
are never freed. This way the independent free-threaded interpreters
don't have to worry about these objects.
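For what it's worth, None's reference count is already enormous in practice;
a quick check (the exact numbers are version-dependent):

    import sys

    print(sys.getrefcount(None))  # thousands on older CPythons; on 3.12+ it is
                                  # a fixed sentinel, since None is now immortal
    print(sys.getrefcount(5))     # the cached small ints are shared the same way

CPython eventually adopted essentially this idea as immortal objects
(PEP 683), which makes reference count changes on such singletons no-ops.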

Harri
Skip Montanaro
2003-09-16 20:14:09 UTC
Actually, it makes perfect sense. The reference count of Py_None is
never supposed to reach zero. If that happens it's because you have
a bug (too many Py_DECREFs or not enough Py_INCREFs). Your version
of none_dealloc silently masks the error.
Harri> Yes, and there is no error if the reference count of Py_None
Harri> reaches zero. The Py_None reference count has no meaning.

Yes it does. If it goes to zero it means you have a bug in your code. Note
that there are plenty of situations where you INCREF or DECREF objects not
knowing (or caring) that the actual object you are messing with might be
Py_None. If I happen to incompletely test a function defined (in C)
something like:

def myfunc(arg1, arg2, arg3=None):
blah
blah
blah

and have a reference count error in my code related to arg3 such that I
erroneously DECREF it, but never test the three arg case, your version of
none_dealloc() will silently miss the problem. The current none_dealloc()
will rightfully complain if myfunc() is called enough times, because
Py_None's reference count will go to zero.

Harri> By changing the way how Py_None is freed (by doing nothing)
Harri> Python would get simpler and faster.

I think you're wrong. There are lots of places in the current code base
where the INCREFs and DECREFs happen without concern for the actual object
passed. There are at least some places where a check for Py_None would
probably be warranted. In any case, I suspect that most INCREFs and DECREFs
of Py_None actually happen when the interpreter doesn't realize that Py_None
is being manipulated. By eliminating the few cases where you do know you're
messing with Py_None you probably won't reduce the number of Py_INCREF and
Py_DECREF calls substantially.

Harri> How can you have negative reference counts? Answer: You
Harri> can't.

Yes you can. From Include/object.h:

/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD \
    _PyObject_HEAD_EXTRA \
    int ob_refcnt; \
    struct _typeobject *ob_type;

Note that ob_refcnt is defined as an int, not an unsigned int. I'm not sure
if there are any ramifications to changing the type of that field. I don't
have time to inspect the code at the level necessary to answer
authoritatively.

Harri> I think that you are completely missing the point of Python's
Harri> reference counting. :-) The idea is that when the count reaches
Harri> zero, then the object is deallocated. But if the object was never
Harri> allocated in the first place, why deallocate it then? That's why
Harri> having empty none_dealloc is beautiful.

Except for the case where the reference counting has an error. Let me
restate it this way: The C implementation of Python operationally defines a
reference count of zero for a statically allocated object (not just Py_None)
as a fatal error. That's as it should be, because any statically allocated
object should have a reference count of at least one.

Harri> I think that PyINCREF(Py_None) is ugly, and at least it is
Harri> completely unnecessary.

You're the person who keeps asking for it. Perhaps you should perform the
tests. If it works, submit a patch to SF. If not, dig deeper and figure
out why.

Skip
Andrew Bennetts
2003-09-17 03:11:44 UTC
But if the reference count is negative for allocated objects, then you
already have a bug. I mean that the count is never negative when the
application works correctly. The object is already deallocated when the
count reaches zero, so checking for negative counts is (usually)
superfluous. Python gets slightly faster if you just remove this check
(from release version).
Have you benchmarked whether removing this check makes any significant, or
even measurable, difference to the speed of a Python program?
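A minimal way to check, assuming you have two interpreter builds (with and
without the check) to run the same script under:

    import timeit

    def workload():
        # churn through plenty of Py_None references
        x = None
        for i in range(100000):
            x = None if i % 2 else x
        return x

    # run this identical script under both builds and compare the best times
    print(min(timeit.repeat(workload, number=100, repeat=5)))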

I'm not convinced that the benefits of this outweigh the costs.

-Andrew.
Harri Pesonen
2003-09-17 02:58:55 UTC
Post by Skip Montanaro
Actually, it makes perfect sense. The reference count of Py_None is
never supposed to reach zero. If that happens it's because you have
a bug (too many Py_DECREFs or not enough Py_INCREFs). Your version
of none_dealloc silently masks the error.
Harri> Yes, and there is no error if the reference count of Py_None
Harri> reaches zero. The Py_None reference count has no meaning.
Yes it does. If it goes to zero it means you have a bug in your code. Note
that there are plenty of situations where you INCREF or DECREF objects not
knowing (or caring) that the actual object you are messing with might be
Py_None. If I happen to incompletely test a function defined (in C)
blah
blah
blah
and have a reference count error in my code related to arg3 such that I
erroneously DECREF it, but never test the three arg case, your version of
none_dealloc() will silently miss the problem. The current none_dealloc()
will rightfully complain if myfunc() is called enough times, because
Py_None's reference count will go to zero.
Yeah, but on the other hand, the application crashes less if it is not
checked! :-)
Post by Skip Montanaro
Harri> By changing the way how Py_None is freed (by doing nothing)
Harri> Python would get simpler and faster.
I think you're wrong. There are lots of places in the current code base
where the INCREFs and DECREFs happen without concern for the actual object
passed. There are at least some places where a check for Py_None would
probably be warranted. In any case, I suspect that most INCREFs and DECREFs
of Py_None actually happen when the interpreter doesn't realize that Py_None
is being manipulated. By eliminating the few cases where you do know you're
messing with Py_None you probably won't reduce the number of Py_INCREF and
Py_DECREF calls substantially.
There are 2001 explicit calls to Py_INCREF(Py_None) in the C API source
code. These are unnecessary, but the implicit calls are of course fine.
Removing the explicit calls would make Python slightly faster, just
about enough to compensate for the required changes in the deallocation
mechanism, at least.
Post by Skip Montanaro
Harri> How can you have negative reference counts? Answer: You
Harri> can't.
/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD \
    _PyObject_HEAD_EXTRA \
    int ob_refcnt; \
    struct _typeobject *ob_type;
Note that ob_refcnt is defined as an int, not an unsigned int. I'm not sure
if there are any ramifications to changing the type of that field. I don't
have time to inspect the code at the level necessary to answer
authoritatively.
But if the reference count is negative for allocated objects, then you
already have a bug. I mean that the count is never negative when the
application works correctly. The object is already deallocated when the
count reaches zero, so checking for negative counts is (usually)
superfluous. Python gets slightly faster if you just remove this check
(from release version).
Post by Skip Montanaro
Harri> I think that you are completely missing the point of Python's
Harri> reference counting. :-) The idea is that when the count reaches
Harri> zero, then the object is deallocated. But if the object was never
Harri> allocated in the first place, why deallocate it then? That's why
Harri> having empty none_dealloc is beautiful.
Except for the case where the reference counting has an error. Let me
restate it this way: The C implementation of Python operationally defines a
reference count of zero for a statically allocated object (not just Py_None)
as a fatal error. That's as it should be, because any statically allocated
object should have a reference count of at least one.
Why? It has no meaning, because the object was never allocated, and it
can't be deallocated. And if the deallocation routine does nothing for
statically allocated objects (as it should), we are OK. I agree that
there must be cases when you gain from this check (when you have bugs in
your extension code), but for pure Python source code this is just not
needed.

And by the way, you are at least technically incorrect in saying "any
statically allocated objects should have a reference count of at least
one" because when Python starts, all these reference counts are already
zero. The error is raised only when a count reaches zero for the second
time. :-)
Post by Skip Montanaro
Harri> I think that PyINCREF(Py_None) is ugly, and at least it is
Harri> completely unnecessary.
You're the person who keeps asking for it. Perhaps you should perform the
tests. If it works, submit a patch to SF. If not, dig deeper and figure
out why.
Yeah, getting closer to that I guess. But I should not be doing it just
yet, I have other projects, but this just keeps bugging me.

Harri
Skip Montanaro
2003-09-17 14:23:45 UTC
(I'm tiring rapidly of this topic. This is my last post.)
The current none_dealloc() will rightfully complain if myfunc() is
called enough times, because Py_None's reference count will go to
zero.
Harri> Yeah, but on the other hand, the application crashes less if it
Harri> is not checked! :-)

I know you have a smiley there, however, I want the system to do its best to
crash for me and not wait until my customer has it.

Harri> There are 2001 explicit calls to Py_INCREF(Py_None) in C API
Harri> source code. These are unnecessary, but the implicit calls are of
Harri> course fine. Removing the explicit calls would make Python
Harri> slightly faster, just about enough to compensate for the required
Harri> changes in the deallocation mechanism, at least.

How do you know that? Have you tested anything yet?

Harri> But if the reference count is negative for allocated objects,
Harri> then you already have a bug. I mean that the count is never
Harri> negative when the application works correctly. The object is
Harri> already deallocated when the count reaches zero, so checking for
Harri> negative counts is (usually) superfluous. Python gets slightly
Harri> faster if you just remove this check (from release version).

Much of the stuff in there for reference counting is there to help you when
the application doesn't work correctly.
That's as it should be, because any statically allocated object
should have a reference count of at least one.
Harri> Why? It has no meaning, because the object was never allocated,
Harri> and it can't be deallocated.

Because it is an object. Structurally and semantically it should behave the
same as all the other objects in the system.

Harri> Yeah, getting closer to that I guess. But I should not be doing
Harri> it just yet, I have other projects, but this just keeps bugging
Harri> me.

It's your itch. You have to be the one to scratch it.

Skip
Skip Montanaro
2003-09-16 19:22:13 UTC
Harri> Py_None is special because it is shared between all interpreters, it is
Harri> global. Py_None is defined as:

...


Harri> From the above we see that when Py_None reference count reaches
Harri> zero, a destructor none_dealloc is called.

...

Harri> This makes no sense at all. Why Py_FatalError? It would be better
Harri> to have

Harri> static void
Harri> none_dealloc(PyObject* ignore)
Harri> {
Harri> }

Actually, it makes perfect sense. The reference count of Py_None is never
supposed to reach zero. If that happens it's because you have a bug (too
many Py_DECREFs or not enough Py_INCREFs). Your version of none_dealloc
silently masks the error.

Harri> so that it is not necessary to call Py_INCREF(Py_None)
Harri> Py_DECREF(Py_None) at all. Guess how many times these are called
Harri> in Python C API? Py_INCREF is called 2001 times and Py_DECREF two
Harri> times.

You mean how many places does Py_(IN|DE)CREF(Py_None) appear? Note that in
most situations the interpreter itself calls Py_DECREF(<return value>) for
you. You just can't tell that "<return value>" is Py_None using a static
scan of the C source code.

Harri> By changing the way how Py_None is freed (by doing nothing)
Harri> Python would get simpler and faster.

I think you are completely missing the point of Python's reference
counting. Py_None is nothing special except for the fact that it is not
allocated on the heap. In fact, if you removed all the INCREFs and
DECREFs I think you'd have to special-case the code which detects negative
reference counts. Py_None's reference count would quickly go negative and
instead of the normal reference counting dance you'd always be calling the
error function which handles negative ref counts. It would always have to
compare its argument object with Py_None to make sure it wasn't complaining
about the now-special Py_None object.

Skip
Harri Pesonen
2003-09-16 19:41:50 UTC
Post by Skip Montanaro
Harri> Py_None is special because it is shared between all interpreters, it is
...
Harri> From the above we see that when Py_None reference count reaches
Harri> zero, a destructor none_dealloc is called.
...
Harri> This makes no sense at all. Why Py_FatalError? It would be better
Harri> to have
Harri> static void
Harri> none_dealloc(PyObject* ignore)
Harri> {
Harri> }
Actually, it makes perfect sense. The reference count of Py_None is never
supposed to reach zero. If that happens it's because you have a bug (too
many Py_DECREFs or not enough Py_INCREFs). Your version of none_dealloc
silently masks the error.
Yes, and there is no error if the reference count of Py_None reaches
zero. The Py_None reference count has no meaning.
Post by Skip Montanaro
Harri> so that it is not necessary to call Py_INCREF(Py_None)
Harri> Py_DECREF(Py_None) at all. Guess how many times these are called
Harri> in Python C API? Py_INCREF is called 2001 times and Py_DECREF two
Harri> times.
You mean how many places does Py_(IN|DE)CREF(Py_None) appear? Note that in
most situations the interpreter itself calls Py_DECREF(<return value>) for
you. You just can't tell that "<return value>" is Py_None using a static
scan of the C source code.
Harri> By changing the way how Py_None is freed (by doing nothing)
Harri> Python would get simpler and faster.
I think you are completely missing the point of Python's reference
counting. Py_None is nothing special except for the fact that it is not
allocated on the heap. In fact, if you removed all the INCREFs and
DECREFs I think you'd have to special-case the code which detects negative
reference counts. Py_None's reference count would quickly go negative and
instead of the normal reference counting dance you'd always be calling the
error function which handles negative ref counts. It would always have to
compare its argument object with Py_None to make sure it wasn't complaining
about the now-special Py_None object.
How can you have negative reference counts? Answer: You can't. Just
remove the code that checks for negative reference counts. It is not
needed. The code just gets faster again. OK, you could have it in debug
builds, and in that case it could also check that the object was in fact
allocated on the heap.

I think that you are completely missing the point of Python's reference
counting. :-) The idea is that when the count reaches zero, then the
object is deallocated. But if the object was never allocated in the
first place, why deallocate it then? That's why having empty
none_dealloc is beautiful.

I think that PyINCREF(Py_None) is ugly, and at least it is completely
unnecessary.

Harri
Christos TZOTZIOY Georgiou
2003-09-17 07:24:40 UTC
On Wed, 17 Sep 2003 06:29:36 +0300, rumours say that Harri Pesonen
<fuerte at sci.fi> might have written:

[snip discussion so far because I want to focus on this sentence]
Multiple threads are more efficient: they use fewer system resources, and
they allow fast shared memory access (in one way or another).
"efficient": I asked you in another part of this thread: please define
efficient.

"resources": A program that forks itself on current operating systems
does not consume *much* more memory than a multithreaded one; code is
shared, many data pages won't be written so they will remain common.
Whatever I may say, have you got numbers from your personal experience?

"fast memory access": yes, accessing directly the data memory of the
same process is faster than going through the shmem calls first, but how
much do you believe this overhead is? Or do you believe that accessing
mapped shared memory is slower than accessing the data memory of the
process?

The whole point is that you seem to be relying on the words of others
about "thread efficiency", without experience of your own to weigh against
other possible solutions. Do you have example applications you want to
build? Did you post a description of the problems you encountered so
that the people in c.l.py can help you?
--
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.
Christos TZOTZIOY Georgiou
2003-09-16 21:17:55 UTC
On Tue, 16 Sep 2003 20:06:00 +0300, rumours say that Harri Pesonen
There is no object-level locking in my proposal. Just independent
free-threaded interpreters, which don't see the objects of other
interpreters at all.
What's the point in using threads then? Threading implies easy access
to shared data.
There could be an extra global interpreter state for shared-memory
object access. Accessing this would always be synchronized, but only
this. Python would automatically copy data from this state to
thread-local state and back when needed. This would require a special
syntax for variables in global state:
synchronize a = "asdf"
In *this* case, POSH (or similar shared memory mechanisms) and multiple
processes should be a simpler solution. UNIX systems have managed for
decades to work very well without any threading mechanisms. pipe() and
fork() are quite older than threads and queues! :)

I do know the usefulness of threads, and I do use them [1]. But,
re-read what you wrote above, and please explain: what in your opinion
is the advantage of multiple threads over multiple processes in this
case?



[1] For example, I have a script that collects data over XML-RPC from
other machines in my company's network, does a little processing on the
data (mostly calculating md5 sums) and reads and writes local files;
when running on a 2x500MHz P3 Linux, I've seen it reach up to 134%
(command 'top' shows full utilisation of *one* CPU as 100%); running on
a 2x1GHz P3 W2k machine, task manager shows up to 74% (with a lower
averaging interval, peaks are easier to spot). Think I would do better
without the GIL?
In the few cases where I would love to have "free" threading in Python
(eg an image comparison script of mine), I have already implemented the
heavy calculations in C, allowing python threads in the meanwhile.
Threads are most useful when doing mostly CPU-intensive calculations;
Python is not created for these, but it is an excellent glue language.
--
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.
Harri Pesonen
2003-09-16 17:22:41 UTC
Harri,
I don't understand how your suggested changes, even once carried out,
will let me use threads to increase the speed of a CPU-intensive Python
program like this:

solutions = []

def consider_solution(l):
    # CPU-intensive code here. Let's say it runs for 1 second per call
    if is_solution(l):
        solutions.append(l)

def problem_space():
    # A generator of items in the problem space
    # pretty fast!
    ...
    yield l

for l in problem_space():
    consider_solution(l)

I could thread it, so that N threads each run is_solution on a different
candidate:

queue = worker_tasks(consider_solution, N)
for l in problem_space():
    queue.add(l)
queue.shutdown()
But with your proposed changes, it sounds like each thread becomes an
island, with no access to common objects (like the list "solutions" or
the queue connecting the main thread with the worker threads).
If threading truly worked, then I'd be able to run efficiently on n+1
CPUs, where n is the ratio of the speed of one iteration of is_solution
compared to one iteration of problem_space.
On the other hand, I can make the above work quickly today by using
processes and pipes. I can do this only because I've identified the
parts that need to be shared (the queue of candidate solutions, and the
list of confirmed solutions). I think that's the same level of effort
required under the "thread is an island" approach you're suggesting, but
the processes&pipes code will likely be easier to write.
I mostly agree with what you said.

Another approach to shared memory access is to have a special syntax or
functions that do it. These functions could do internally just what you
said, use pipes, or whatever. Each thread could have a name, a string,
and then we could have a couple of simple built-in functions to send
messages (strings) from thread to thread, peek messages, and wait for
messages. One message would have a special meaning of Quit, so that the
thread knows when to stop. Basically this is all that is needed for
these independent threads.
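A sketch of that message-passing shape with the existing stdlib; the names,
helper functions, and Quit sentinel here are made up for illustration:

    import queue
    import threading

    mailboxes = {}        # thread name -> its message queue
    QUIT = object()       # the special "Quit" message

    def send(name, msg):
        mailboxes[name].put(msg)

    def wait_message(name):
        return mailboxes[name].get()       # blocks until a message arrives

    def peek_message(name):
        return not mailboxes[name].empty()

    def worker(name):
        while True:
            msg = wait_message(name)
            if msg is QUIT:
                break
            print(name, "got", msg)

    mailboxes["a"] = queue.Queue()
    t = threading.Thread(target=worker, args=("a",))
    t.start()
    send("a", "hello")
    send("a", QUIT)
    t.join()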

Another approach would be the separate shared interpreter state, access
to which would be synchronized. This is probably much harder to
implement, but it would be more beautiful: you could have all the usual
kinds of objects there, just like in normal non-threaded Python. So if you
have two threads, you would have three independent interpreter states (one
of them shared).

It is more efficient to have one process and several threads, than
several processes each having one thread.

Harri
Christos TZOTZIOY Georgiou
2003-09-16 21:31:45 UTC
On Tue, 16 Sep 2003 20:22:41 +0300, rumours say that Harri Pesonen
<fuerte at sci.fi> might have written:

[snip description of alternative python multithreading mechanisms
allowing riddance of the GIL]
It is more efficient to have one process and several threads, than
several processes each having one thread.
Please define 'efficient' as used in your statement.
--
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.
Jeff Epler
2003-09-16 21:53:53 UTC
Post by Christos TZOTZIOY Georgiou
On Tue, 16 Sep 2003 20:22:41 +0300, rumours say that Harri Pesonen
[snip description of alternative python multithreading mechanisms
allowing riddance of the GIL]
It is more efficient to have one process and several threads, than
several processes each having one thread.
Please define 'efficient' as used in your statement.
I hear that on Windows, processes are very inefficient compared to
threads. However, I have no idea what this actually means. Does Harri?
Or is he thinking of something else? Is he thinking of something like
the "higher context switching cost in terms of TLB misses"? God help
anybody who lets thoughts about TLB misses guide the way he writes
Python code! (excepting maybe Tim Peters and anybody writing
numpy/numarray)

I suspect that on Windows, stuff like pipe/dup2/fork has been turned
into a hopelessly complicated mess, rather than something I could "roll
my own" Python object-passing system out of inside an hour (using pickle
or the like).
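Roughly the hour-long version being alluded to, sketched with fork, a pipe,
and pickle (Unix only):

    import os
    import pickle

    def compute():
        # stand-in for the CPU-heavy work done in the child process
        return {"answer": sum(range(1000000))}

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                            # child: do the work, send it back
        os.close(r)
        with os.fdopen(w, "wb") as out:
            pickle.dump(compute(), out)
        os._exit(0)
    else:                                   # parent: read the object back
        os.close(w)
        with os.fdopen(r, "rb") as inp:
            result = pickle.load(inp)
        os.waitpid(pid, 0)
        print(result)

The later multiprocessing module packages essentially this pattern portably,
Windows included.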

Another cool thing about the pipe approach is that you can probably
distribute the processing over a network with only a little more work...
wowee, that's cool.

Jeff
Jeff Epler
2003-09-16 21:11:48 UTC
It is more efficient to have one process and several threads, than
several processes each having one thread.
If it's easier to code "several processes each having one thread", then
I'll sure do it. I've accepted a 2x-100x speed hit by using Python, so
if it's a 25% difference in efficiency, I won't sign up to wait for code
that doesn't exist yet.

Jeff
Daniel Dittmar
2003-09-17 13:43:31 UTC
Permalink
Post by Jeff Epler
Keep in mind that probably more than 99.9% of machines out there
have only one CPU anyway
This could very well change in the future and is probably quite different in
the server segment. Although if I had 2 CPUs on my desktop, I'd prefer it if
a CPU hog blocked only one of them, so Python would be perfect for that
scenario.
Post by Jeff Epler
no-ops, and use fork() to get that 2%* performance increase.
Keep in mind that probably more than <wild guess> of machines out there use
Windows, so fork() is not an option.

But the real problem with free threading is the reference counters, as they
have to be protected from concurrent access, which is currently done cheaply
by relying on the GIL.

So my guess is that a free-threading Python will be slower, even when no
threads are used.

Daniel
Christopher A. Craig
2003-09-17 14:08:05 UTC
Permalink
But wouldn't it be better if Python had real multitasking? Comments
like the above mean that you accept Python as it is, fine. But
usually people want to make things better when they see that
something can be improved. If my roof is leaking, I have two
choices: fix it or accept it. If I am able to fix it, then I'll
probably do it.
This is awfully dramatic. Python does have real multitasking; take a
look at os.fork().

Your proposal would require a great deal of work [1] to achieve a very
limited result: it only helps in situations where people want to share
Python objects between multiple, independent interpreters in a single
process. I don't think that happens very often. Note that the average
threaded Python user is probably using a single interpreter and so would
see no noticeable speedup from this.

If you really want to do this, I'd rather see some sort of shared
memory manager that lets you share Python objects between independent
processes. That gives nearly all the benefits without having to break
a bunch of C modules.
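
For what it's worth, a rough sketch of that kind of shared-object manager,
using the multiprocessing module that much later Python versions grew
(nothing like it existed when this was written); the worker and its
divisibility test are invented purely for illustration:

from multiprocessing import Manager, Process

def worker(shared_solutions, candidate):
    # Stand-in for a real CPU-bound test.
    if candidate % 7 == 0:
        shared_solutions.append(candidate)

if __name__ == "__main__":
    with Manager() as manager:
        solutions = manager.list()      # proxy to a list living in the manager
        procs = [Process(target=worker, args=(solutions, n)) for n in range(20)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(sorted(solutions[:]))     # objects were shared across processes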

1) You say it's easy, but I'll believe it when I see a patch.
--
Christopher A. Craig <list-python at ccraig.org>
I develop for Linux for a living, I used to develop for DOS. Going from
DOS to Linux is like trading a glider for an F117. - Lawrence Foard
Harri Pesonen
2003-09-17 03:22:19 UTC
Permalink
Post by Jeff Epler
It is more efficient to have one process and several threads, than
several processes each having one thread.
If it's easier to code "several processes each having one thread", then
I'll sure do it. I've accepted a 2x-100x speed hit by using Python, so
if it's a 25% difference in efficiency, I won't sign up to wait for code
that doesn't exist yet.
But wouldn't it be better if Python had real multitasking? Comments like
the above mean that you accept Python as it is, fine. But usually people
want to make things better when they see that something can be improved.
If my roof is leaking, I have two choices: fix it or accept it. If I am
able to fix it, then I'll probably do it.

The old code base really is the problem here. If Python threads really
ran at the same time in the future, how many current applications would
stop working because they depend on the fact that only one thread runs at
any given time, and do not acquire and release locks as needed? On the other
hand, my suggestion probably means that we couldn't have a threading
module compatible with thread or threading anyhow (we could have
freethreading, with specific functions for inter-thread communication).

Harri

Martin v. Löwis
2003-09-12 05:50:46 UTC
Permalink
Post by Shane Hathaway
- Copy the "1" object into shared memory before passing it to
L.append(). This would be nice and transparent, but also a little
DWIM.
That might work for the object 1, but some objects simply couldn't be
copied, e.g. file objects, _tkinter objects, and so on.
Post by Shane Hathaway
- Create special object types just for shared memory.
That might be feasible, but would not be valuable. You would have to
duplicate entire type hierarchies, and it would not be convenient for
users to work with the duplicated types.
Post by Shane Hathaway
As it turns out, POSH (see Jeremy's link) takes the second strategy,
apparently quite successfully! Anyone who has concerns about the GIL
should take a long, hard look at POSH. It just might solve the
problem for lots of people.
It's an interesting approach, but it remains to be seen whether it
really solves problems in real life.

Regards,
Martin
Shane Hathaway
2003-09-11 18:50:02 UTC
Permalink
[Moved to python-list at python.org, where this thread belongs]
But my basic message is this: Python needs to be made thread safe.
Making the individual interpreters thread safe is trivial, and benefits
many people, and is a necessary first step; making threads within
interpreter thread safe is possible as well, at least if you leave
something for the developer, as you should, as you do in every other
programming language as well.
Lately, I've been considering an alternative to this line of thinking.
I've been wondering whether threads are truly the right direction to
pursue. This would be heresy in the Java world, but maybe Pythonistas
are more open to this thought.

The concept of a thread is composed of two concepts: multiple processes
and shared memory. Supporting multiple simultaneous processes is
relatively simple and has proven value. Shared memory, on the other
hand, results in a great number of complications. Some of the
complications have remained difficult problems for a long time:
preventing deadlocks, knowing exactly what needs to be locked, finding
race conditions, etc. I don't believe we should force the burden of
thread safety on every software engineer. Engineers have better things
to do.

At the same time, shared memory is quite valuable when you're ready to
take on the burden of thread safety. Therefore, I'm looking for a good
way to split a process into multiple processes and share only certain
parts of a program with other processes. I'd like some form of
*explicit* sharing with a Pythonic API.

Imagine the following Python module:


import pseudothreads

data = pseudothreads.shared([])

def data_collection_thread():
    s = get_some_data()
    data.append(s)

for n in range(4):
    pseudothreads.start_new_thread(data_collection_thread)


In this made-up example, nothing is shared between threads except for
the "data" global. The shared() function copies the list to shared
memory and returns a wrapper around the list that prevents access by
multiple threads simultaneously. start_new_thread() is a thin wrapper
around os.fork(). Each pseudothread has its own global interpreter lock.

I wonder whether others would consider such a thing valuable, or even
feasible. :-)

Shane
Harri Pesonen
2003-09-11 20:28:04 UTC
Permalink
Post by Shane Hathaway
[Moved to python-list at python.org, where this thread belongs]
But my basic message is this: Python needs to be made thread safe.
Making the individual interpreters thread safe is trivial, and
benefits many people, and is a necessary first step; making threads
within interpreter thread safe is possible as well, at least if you
leave something for the developer, as you should, as you do in every
other programming language as well.
Lately, I've been considering an alternative to this line of thinking.
I've been wondering whether threads are truly the right direction to
pursue. This would be heresy in the Java world, but maybe Pythonistas
are more open to this thought.
The concept of a thread is composed of two concepts: multiple
processes and shared memory. Supporting multiple simultaneous
processes is relatively simple and has proven value. Shared memory,
on the other hand, results in a great number of complications. Some of
the complications have remained difficult problems for a long time:
preventing deadlocks, knowing exactly what needs to be locked, finding
race conditions, etc. I don't believe we should force the burden of
thread safety on every software engineer. Engineers have better
things to do.
At the same time, shared memory is quite valuable when you're ready to
take on the burden of thread safety. Therefore, I'm looking for a
good way to split a process into multiple processes and share only
certain parts of a program with other processes. I'd like some form
of *explicit* sharing with a Pythonic API.
import pseudothreads

data = pseudothreads.shared([])

def data_collection_thread():
    s = get_some_data()
    data.append(s)

for n in range(4):
    pseudothreads.start_new_thread(data_collection_thread)
In this made-up example, nothing is shared between threads except for
the "data" global. The shared() function copies the list to shared
memory and returns a wrapper around the list that prevents access by
multiple threads simultaneously. start_new_thread() is a thin wrapper
around os.fork(). Each pseudothread has its own global interpreter lock.
I wonder whether others would consider such a thing valuable, or even
feasible. :-)
Sounds great! :-) Basically, you are suggesting that each thread created
in Python has its own separate interpreter, and if the programmer ever
wants to communicate between threads, he has to do it explicitly using
the tools that are available. Sounds good to me.

So my original suggestion of removing all global data and keeping the
interpreter state always on the stack, expanded with this idea, would
make Python completely thread safe. No need for the global interpreter
lock anymore. Of course this would break the existing Python thread code,
but something has to break anyhow. :-)

Harri
Jeremy Hylton
2003-09-11 20:03:07 UTC
Permalink
Post by Shane Hathaway
import pseudothreads

data = pseudothreads.shared([])

def data_collection_thread():
    s = get_some_data()
    data.append(s)

for n in range(4):
    pseudothreads.start_new_thread(data_collection_thread)
In this made-up example, nothing is shared between threads except for
the "data" global. The shared() function copies the list to shared
memory and returns a wrapper around the list that prevents access by
multiple threads simultaneously. start_new_thread() is a thin wrapper
around os.fork(). Each pseudothread has its own global interpreter lock.
I wonder whether others would consider such a thing valuable, or even
feasible. :-)
This sounds like POSH. There was a paper about it at PyCon.

http://poshmodule.sourceforge.net/posh/html/

Python uses a single global lock known as the global interpreter lock
(or GIL) to serialize execution of byte codes. The GIL becomes a
major bottleneck when executing multi-threaded Python applications, in
particular on multi-processor architectures. This paper presents
POSH, which is an extension module to Python that attempts to address
the problems associated with the GIL by enabling placement of Python
objects in shared memory. In particular, POSH allows multiple
processes to share objects in much the same way that threads do with
standard Python objects. We have found that the use of POSH allows
some applications to be structured as if they used threads, but
without the GIL bottleneck.

Jeremy

Aahz
2003-09-15 20:53:48 UTC
Permalink
In article <mwo9b.242$B13.103 at reader1.news.jippii.net>,
Post by Harri Pesonen
The point I was trying to make (not in this message but in general) is
that it would be simple (trivial but tedious) to create a version of
Python that is thread-safe, and the only reason it is not done is
because it would break old code.
You're wrong. You have refused to do your research, and you have been
ignoring information people provide to you.
Post by Harri Pesonen
Only one global variable left (in fact there is Py_None as well).
You're wrong. There are only two global *C* variables -- but all Python
objects are global. Most people want to share information between
threads; as soon as that becomes a requirement, you run into *BIG*
problems with refcounting and garbage collection when you do free
threading. That doesn't even count the problem I mentioned earlier
about interfacing with thread-unsafe libraries.

Now, would you care to learn how Python actually works before making
further pronouncements?
--
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan
Shane Hathaway
2003-09-12 21:41:42 UTC
Permalink
Which leads to problems with reference counting and garbage
collection. These would have to take multiple processes into account:
what happens when a process goes down? Should the objects allocated by
it be destroyed? Or remain persistent? Until when? If a process died
in an unclean fashion, it might not delete its references to objects
cleanly.
And it gets more complex again when you're dealing with users and
permissions.
I think that there are so many questions associated with the approach
of sharing python objects through shared memory that it will probably
remain an "application specific" technique for some time to come.
Though doubtless some more knowledgable person than I will now
contradict me. Please. :-)
I'd just like to point out that Steffen Viken Valvag has just recently
(March 2003) researched this, found solutions, and written an
implementation.

http://poshmodule.sourceforge.net/posh/html/node4.html

That page leaves a few questions about the distributed garbage
collection unanswered, but it's a nice high-level overview.

I don't know how stable Posh is, but if nothing else it is a convincing
proof of concept.

Shane
Alan Kennedy
2003-09-12 16:27:51 UTC
Permalink
Post by Christopher A. Craig
The simple solution is,
that each thread created in Python gets its own independent
interpreter state as well. And there could be a separate
thread-global interpreter state for shared memory access. Access
to this global state would always be
synchronized.
Couldn't you do this now with multiple processes and the shm module?
Yes, you can store representations of python objects in shared memory,
e.g. pickles, etc: an in-memory database, with indices, etc.

So when you want to access a "shared" object, you go to the shared
store, retrieve the pickle, unpickle it and use it, making sure to
repickle any changes and store them back in the shared memory. Which
might be cumbersome and inefficient, particularly if your goal is to
maximise efficiency in a multi-processor situation.
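
A minimal sketch of that round trip, with an ordinary dict standing in for
the shared store (the real thing would sit in a shm segment); put and get
are made-up helpers:

import pickle

shared_store = {}                      # stand-in for a shared-memory segment

def put(key, obj):
    shared_store[key] = pickle.dumps(obj)

def get(key):
    return pickle.loads(shared_store[key])

# Read-modify-write: fetch the pickle, unpickle, change, repickle, store back.
put("solutions", [])
found = get("solutions")
found.append(42)
put("solutions", found)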

But if you want to store actual python objects in shared memory, and
avoid all the pickling etc., then you have to change the interpreter so
that it obtains the memory for new python objects from the shared
memory pool instead of "local" memory.

Which leads to problems with reference counting and garbage
collection. These would have to take multiple processes into account:
what happens when a process goes down? Should the objects allocated by
it be destroyed? Or remain persistent? Until when? If a process died
in an unclean fashion, it might not delete its references to objects
cleanly.

And it gets more complex again when you're dealing with users and
permissions.

I think that there are so many questions associated with the approach
of sharing python objects through shared memory that it will probably
remain an "application specific" technique for some time to come.

Though doubtless some more knowledgable person than I will now
contradict me. Please. :-)
--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Martin v. Löwis
2003-09-11 20:40:10 UTC
Permalink
I wonder whether others would consider such a thing valuable, or even
feasible. :-)
I consider it valuable, but not feasible. Contrary to your analysis, I
believe the biggest problem with shared memory is pointer/address
management. E.g. if you have a shared list L, and do

L.append(1)

then a reference to the object 1 is put into L[0]. However, the
pointer is only valid in the process writing the reference, not in
other processes. Fixing that is nearly impossible, except *perhaps* by
reimplementing Python from the ground up.

Regards,
Martin
Tim Peters
2003-09-17 00:40:33 UTC
Permalink
[Jeff Epler]
Post by Jeff Epler
I hear that on Windows, processes are very inefficient compared to
threads. However, I have no idea what this actually means.
Creating a process is a very heavy operation on Windows compared to creating
a thread (which is cheap), especially under the older Windows flavors.
The various Windows automation APIs (like COM under 20 names) are also very
happy playing with threads.
Post by Jeff Epler
Does Harri? Or is he thinking of something else? Is he thinking of
something like the "higher context switching cost in terms of TLB
misses"? God help anybody who lets thoughts about TLB misses guide
the way he writes Python code! (excepting maybe Tim Peters
Indeed, I think of nothing else!
Post by Jeff Epler
and anybody writing numpy/numarray)
Ya, but they're crazy <wink>.
Post by Jeff Epler
I suspect that on Windows, stuff like pipe/dup2/fork has been turned
into a hopelessly complicated mess, rather than something I could
"roll my own" Python object-passing system out of inside an hour
(using pickle or the like).
fork() doesn't exist on Windows (unless you use Cygwin, and then you can
measure a fork's progress with an egg timer).
Post by Jeff Epler
Another cool thing about the pipe approach is that you can probably
distribute the processing over a network with only a little more
work... wowee, that's cool.
Pipes and sockets work fine on Windows. For some really cool distributed
shenanigans, build on the Python Spread wrapper (which also works fine on
Windows):

http://www.python.org/other/spread/

back-to-tlb-worries-ly y'rs - tim
Harri Pesonen
2003-09-12 04:56:55 UTC
Permalink
Please do not CC: my mail to Python-Dev again; I intentionally did not
include python-dev on my CC: because it was asked that we move this
thread elsewhere.
But my basic message is this: Python needs to be made thread safe.
Making the individual interpreters thread safe is trivial, and
benefits many people, and is a necessary first step;
It's far from trivial - you're talking about invalidating every
piece of C code written for Python over a multi-year period by
dozens upon dozens of extension authors.
The change is trivial in the Python C API. I already said that it would
break everything outside the Python distribution, but the change in
other applications is also trivial.
How do you propose that C code called *from* Python *receive* the
threadstate pointer?
Exactly like that. Is there a problem? I'm suggesting that every
function call gets that pointer, unless the function can get it from
some other argument that contains a pointer to it.
It doesn't benefit many people: only those using isolated
interpreters embedded in a multithreaded C program.
I don't know how many people are writing threads in Python, either. I
guess not that many. In my case I only need a thread-safe
interpreter, I don't create threads in Python code. So just having
what I described would be enough for me: no need for global
interpreter lock, and Python would be really multithreading. It would
benefit many people, I'm sure.
Obviously, it's enough for you, or you wouldn't be proposing it. What
does it do for me? Nothing whatsoever, except add needless overhead
and make me rewrite every C extension I've ever written for Python.
So, by and large, you're not going to get much support for your change
from Python developers, especially those who write C extensions, or
depend on extensions written by others.
Probably. That's why I'm thinking now that the language should be called
something else, like MPython for "multi-threading Python". It would be
99% compatible with the existing Python syntax, but have different
internals.
Yes, I'm aware of the None problem at least (only one instance of
it). Please enlighten me about the other critical sections? Object
allocation/freeing?
Data structure manipulations, e.g. all use of dictionaries. Python
spends most of its time doing dictionary lookups or modifications, all
of which need to be protected.
After sleeping on it overnight, I think I've got it. :-) The simple
solution is that each thread created in Python gets its own independent
interpreter state as well. And there could be a separate thread-global
interpreter state for shared memory access. Access to this global state
would always be synchronized. There could even be multiple named global
states, so that inter-thread locking could be minimized. A Python syntax
for creating objects in this global state would have to be invented:

synchronize a = "abcd"

Also, when creating a new thread, its arguments would be copied from
the creating state to the new state.

How does that sound? Of course it would be incompatible with the current
threading system in Python, but it would be truly multithreaded, with no
global interpreter lock needed. It would be faster than current Python:
there would be no need to release or acquire the lock when calling OS
functions, and no need to check how many byte codes have been processed,
etc.
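
The synchronize statement above is invented syntax, but its effect could be
faked today with a lock-protected namespace; a sketch, with SharedState as a
made-up class standing in for the proposed named global state:

import threading

class SharedState:
    """One named, synchronized global state; every access takes its lock."""
    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data[key]

# Roughly what  synchronize a = "abcd"  would mean:
shared = SharedState("default")
shared.set("a", "abcd")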
I'm guessing you haven't done much writing of C extensions for
Python (or Python core C), or else you'd realize why trying to make
INCREF/DECREF threadsafe would absolutely decimate performance.
Reference count updates happen *way* too often in normal code flow.
I also knew that already. But how else can you do it?
The way it's done now! :)
I understand why the current Python works like it does. But I think that
it's time for the next generation. Even if you don't do it, and I have no
time to do it now, I'm still sure it will be done at some point, sooner
rather than later.
Of course, changing Python to not have a single None would help a
lot. Or perhaps it could have a single None but, in the case of None,
the reference count would have no meaning: it would never be deallocated,
because it would be checked for in code. Maybe it does that already, I
don't know.
I really don't mean to be rude (another reason I'm writing this to you
privately), but this paragraph shows you are *really* new to Python
both at the level of coding in Python and coding with Python's C API.
I wish I could explain in detail why, but there's really far too much
that you don't understand and it would take me too long. I will
attempt to summarize a very few points, however: first, identity
(pointer comparison) is a part of the Python language, so you can't
have multiple None instances any more than you can have more than one
value be NULL in C. Second, at the C level, all Python objects
(including None) have an absolutely uniform API, so having refcount
behavior be different for different kinds of objects is not at all
practical. Third, if you had more than one Py_None at the C level,
you'd either have to make Py_None a macro, or rewrite all the C. If
you don't think that's a problem, you have absolutely no idea how much
C code out there is written to the Python API.
Yes, Py_None would be a macro. All access to interpreter state would go
through the interpreter state pointer that is always on the stack, the
first argument each C API function gets. That pointer should have a fixed
name so that the macros always work ("tState", for example, so that the
Py_None macro would expand to tState->mPy_None).
I'm also wondering why this problem has not been addressed before?
It has; the cure is worse than the disease. A few years ago, somebody
wrote a "free-threading" version of Python, which locked individual
data objects rather than use the global interpreter lock. The
performance for single-threaded programs was abominable, and the
performance gain even on multiprocessor machines was not thought worth
the cost. So the project was scrapped.
There would be no locking in my proposal, except when accessing the
shared memory global thread state.

I don't know, I got mail about writing a PEP. It is clear that it would
not be accepted, because it would break the existing API. The change is
so big that I think that it has to be called a different language.

This is the last message I will make about this matter (before actually
starting to code it), so I'm posting this to python-list as well,
because this is too important to be ignored. Python *needs* to be
free-threading...

Harri
Jeff Epler
2003-09-15 21:10:39 UTC
Permalink
Harri,
I don't understand how your suggested changes, even once carried out,
will let me use threads to increase the speed of a CPU-intensive Python
program. For instance, consider the following code:

solutions = []

def is_solution(l):
    # CPU-intensive code here. Let's say it runs for 1 second per call
    ...

def consider_solution(l):
    if is_solution(l):
        solutions.append(l)

def problem_space():
    # A generator of items in the problem space
    # pretty fast!
    ...
    yield l

def all_solutions():
    for l in problem_space():
        consider_solution(l)


I could thread it, so that N threads each run is_solution on a different
candidate:
def all_solutions():
    queue = worker_tasks(consider_solution, N)
    for l in problem_space():
        queue.add(l)
    queue.shutdown()
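
worker_tasks is hypothetical; a minimal version using the standard threading
and queue modules might look like the sketch below (under the GIL the workers
would not actually run is_solution in parallel, which is the point of the
complaint):

import queue
import threading

_SHUTDOWN = object()                   # sentinel telling the workers to exit

class worker_tasks:
    """Feed queued items to N worker threads, each calling func(item)."""
    def __init__(self, func, n):
        self._queue = queue.Queue()
        self._threads = [threading.Thread(target=self._run, args=(func,))
                         for _ in range(n)]
        for t in self._threads:
            t.start()

    def _run(self, func):
        while True:
            item = self._queue.get()
            if item is _SHUTDOWN:
                break
            func(item)

    def add(self, item):
        self._queue.put(item)

    def shutdown(self):
        for _ in self._threads:
            self._queue.put(_SHUTDOWN)
        for t in self._threads:
            t.join()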

But with your proposed changes, it sounds like each thread becomes an
island, with no access to common objects (like the list "solutions" or
the queue connecting the main thread with the worker threads).
If threading truly worked, then I'd be able to run efficiently on n*1
CPUs, where n is the ratio of the speed of one iteration of is_solution
compared to one iteration of problem_space.

On the other hand, I can make the above work quickly today by using
processes and pipes. I can do this only because I've identified the
parts that need to be shared (the queue of candidate solutions, and the
list of confirmed solutions). I think that's the same level of effort
required under the "thread is an island" approach you're suggesting, but
the processes&pipes code will likely be easier to write.

Jeff
Mitch Chapman
2003-09-13 16:18:42 UTC
Permalink
Post by Skip Montanaro
Mitch> Unlike Andrew I don't think the lack of maintenance for 1.4's
Mitch> free threading packages is due to any perception that threading
Mitch> performance is unimportant. It seems more likely that the
Mitch> packages were not updated because they proved not to solve the
Mitch> performance problems, and that no alternatives have emerged
Mitch> because the problem is hard to solve.
One reason (maybe the primary reason) it never went further than a patch to
1.4 was that it was slower in the (common) single-threaded case.
For more on the patch, see the FAQ:
http://www.python.org/doc/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock

I wish I'd read this entry before my latest reply to Andrew Kuchling.
It explains why somebody (GvR, according to the old FAQ) believes
free-threading would decrease the suitability of Python for existing
users.
Post by Skip Montanaro
You can read the few messages in
http://mail.python.org/pipermail/python-dev/2003-September/thread.html
Search for "thread safe".
Thanks for publishing the pointers. I didn't consider that others
might be interested in the original thread.

--
Mitch