Discussion:
write to the same file from multiple processes at the same time?
gabor
2005-05-27 12:32:53 UTC
Permalink
hi,

what i want to achieve:
i have a cgi file that writes an entry to a text-file..
like a log entry (when was it invoked, when did its work end).
it's one line of text.

the problem is:
what happens if 2 users invoke the cgi at the same time?

and it will happen, because i am now trying to stress-test it, so i will
start 5-10 requests in parallel and so on.

so, how does one synchronize several processes in python?

first idea was that the cgi will create a new temp file every time,
and at the end of the stress-test, i'll collect the content of all those
files. but that seems like a stupid way to do it :(

another idea was to use a simple database (sqlite?) which probably has
this problem solved already...

any better ideas?

thanks,
gabor
unknown
2005-05-27 13:21:21 UTC
Permalink
The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved
this problem.
But they haven't. They depend on messy things like server processes
constantly running, which goes against the idea of a cgi that only
runs when someone calls it.
Jp Calderone
2005-05-27 14:17:58 UTC
Permalink
Post by unknown
But they haven't. They depend on messy things like server processes
constantly running, which goes against the idea of a cgi that only
runs when someone calls it.
SQLite is an in-process dbm.
http://www.sqlite.org/faq.html#q7
(7) Can multiple applications or multiple instances of the same
application access a single database file at the same time?
Multiple processes can have the same database open at the same
time. Multiple processes can be doing a SELECT at the same
time. But only one process can be making changes to the database
at once.
But multiple processes changing the database simultaneously is
precisely what the OP wants to do.
Er, no. The OP precisely wants exactly one process to be able to write at a time. If he was happy with multiple processes writing simultaneously, he wouldn't need any locking mechanism at all >:)

If you keep reading that FAQ entry, you discover that SQLite implements its own locking mechanism internally, allowing different processes to *interleave* writes to the database, and preventing any data corruption which might arise from simultaneous writes.

That said, I think an RDBM is a ridiculously complex solution to this simple problem. A filesystem lock, preferably using the directory or symlink trick (but flock() is fun too, if you're into that sort of thing), is clearly the solution to go with here.

Jp
Peter Hansen
2005-05-27 22:33:44 UTC
Permalink
And PySQLite conveniently wraps the relevant calls with retries when
the database is "locked" by the writing process, making it roughly a
no-brainer to use SQLite databases as nice simple log files where
you're trying to write from multiple CGI processes like the OP wanted.
Oh, ok. But what kind of locks does it use?
I think the FAQ can answer that better than I can, since I'm not sure
whether you're asking about any low-level (OS) locks it might use or
higher-level (e.g. database-level locking) that it might use. In
summary, however, at the database level it provides only coarse-grained
locking on the entire database. It *is* supposed to be a relatively
simple/lightweight solution compared to typical RDBMSes...

(There's also an excruciating level of detail about this whole area in
the page at http://www.sqlite.org/lockingv3.html ).

-Peter
Mike Meyer
2005-05-28 18:47:56 UTC
Permalink
Really, I think the Python library is somewhat lacking in not
providing a simple, unified interface for doing stuff like this.
It's got one. Well, three, actually.

The syslog module solves the problem quite nicely, but only works on
Unix. If the OP is working on Unix systems, that may be a good
solution.

The logging module has a SysLogHandler that talks to syslog on
Unix. It also has an NTEventLogHandler for use on NT. I'm not familiar
with NT's event log, but I presume it has the same kind of
functionality as Unix's syslog facility.
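For example, something like this (an untested sketch; the logger name and
the /dev/log address are illustrative, and on NT you'd swap in
NTEventLogHandler):

import logging
import logging.handlers

# syslogd serializes messages from all processes, so each CGI process
# just sends its lines and no file locking is needed at all.
logger = logging.getLogger("mycgi")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))

logger.info("cgi invoked")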

<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
unknown
2005-05-27 22:52:15 UTC
Permalink
Post by Peter Hansen
I think the FAQ can answer that better than I can, since I'm not sure
whether you're asking about any low-level (OS) locks it might use or
higher-level (e.g. database-level locking) that it might use. In
summary, however, at the database level it provides only
coarse-grained locking on the entire database. It *is* supposed to be
a relatively simple/lightweight solution compared to typical RDBMSes...
Compared to what the OP was asking for, which was a way to synchronize
appending to a serial log file, SQLite is very complex. It's also
much more complex than (say) the dbm module, which is what Python apps
normally use as a lightweight db.
Post by Peter Hansen
(There's also an excruciating level of detail about this whole area in
the page at http://www.sqlite.org/lockingv3.html ).
Oh ok, it says it uses some special locking system calls on Windows.
Since those calls aren't in the Python stdlib, it must be using C
extensions, which again means complexity. But it looks like the
built-in msvcrt module has ways to lock parts of files in Windows.
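Something along these lines might do it (an untested sketch:
msvcrt.locking() locks a byte range starting at the current file
position, and LK_LOCK retries for a while before raising IOError; the
file name is illustrative):

import msvcrt

def append_line(path, text):
    f = open(path, "a")
    f.seek(0)
    msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)   # lock byte 0 as a mutex
    try:
        f.write(text + "\n")   # append mode writes at end of file anyway
        f.flush()
    finally:
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
        f.close()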

Really, I think the Python library is somewhat lacking in not
providing a simple, unified interface for doing stuff like this.
unknown
2005-05-27 12:49:49 UTC
Permalink
Post by gabor
so, how does one synchronizes several processes in python?
first idea was that the cgi will create a new temp file every time,
and at the end of the stress-test, i'll collect the content of all
those files. but that seems as a stupid way to do it :(
There was a thread about this recently ("low-end persistence
strategies") and for Unix the simplest answer seems to be the
fcntl.flock function. For Windows I don't know the answer.
Maybe os.open with O_EXCL works.
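A rough sketch of the O_EXCL idea (untested; the lock file name is
arbitrary, and note that a crashed process leaves the lock file behind):

import os, time, errno

def acquire(lockname="logfile.lock"):
    # O_CREAT | O_EXCL is atomic: exactly one process can create the file,
    # so a successful create doubles as acquiring the lock.
    while True:
        try:
            fd = os.open(lockname, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except OSError, e:
            if e.errno != errno.EEXIST:
                raise
            time.sleep(0.05)   # someone else holds the lock; retry

def release(lockname="logfile.lock"):
    os.remove(lockname)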
Jp Calderone
2005-05-27 22:30:32 UTC
Permalink
Oh, ok. But what kind of locks does it use?
It doesn't really matter, does it?
Huh? Sure, if there's some simple way to accomplish the locking, the
OP's app can do the same thing without SQLite's complexity.
I'm sure the locking mechanisms it uses have changed between
different releases, and may even be selected based on the platform
being used.
Well, yes, but WHAT ARE THEY??????
Beats me, and I'm certainly not going to dig through the code to find out :) For the OP's purposes, the mechanism I mentioned earlier in this thread is almost certainly adequate. To briefly re-summarize, when you want to acquire a lock, attempt to create a directory with a well-known name. When you are done with it, delete the directory. This works across all platforms and filesystems likely to be encountered by a Python program.
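
In code, that would be roughly (a sketch; the directory name is
arbitrary):

import os, time, errno

LOCKDIR = "logfile.lock.d"

def acquire():
    # os.mkdir() is atomic: only one process can create the directory,
    # so whoever succeeds holds the lock.
    while True:
        try:
            os.mkdir(LOCKDIR)
            return
        except OSError, e:
            if e.errno != errno.EEXIST:
                raise
            time.sleep(0.05)   # held by someone else; try again shortly

def release():
    os.rmdir(LOCKDIR)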

Jp
Peter Hansen
2005-05-27 22:02:56 UTC
Permalink
Unfortunately this assumes that the open() call will always succeed,
when in fact it is likely to fail sometimes when another process has
already opened the file but not yet completed writing to it, AFAIK.
Not in my experience. At least under Unix, it's perfectly OK
to open a file while somebody else is writing to it. Perhaps
Windows can't deal with that situation?
Hmm... just tried it: you're right! On the other hand, the results were
unacceptable: each process has a separate file pointer, so it appears
whichever one writes first will have its output overwritten by the
second process.

Change the details, but the heart of my objection is the same.

-Peter
Peter Hansen
2005-05-27 22:35:47 UTC
Permalink
Post by Peter Hansen
Hmm... just tried it: you're right! On the other hand, the results were
unacceptable: each process has a separate file pointer, so it appears
whichever one writes first will have its output overwritten by the
second process.
Did you open the files for 'append' ?
Nope. I suppose that would be a rational thing to do for log files,
wouldn't it? I wonder what happens when one does that...

-Peter
ucntcme
2005-05-29 05:32:41 UTC
Permalink
Well I just tried it on Linux anyway. I opened the file in two python
processes using append mode.

I then wrote a simple function that writes and then flushes whatever it
is passed:

def write(msg):
    # foo is the file object each process opened in append mode
    foo.write("%s\n" % msg)
    foo.flush()

I then opened another terminal and did 'tail -f myfile.txt'.

It worked just fine.

Maybe that will help. Seems simple enough to me for basic logging.

Cheers,
Bill

Christopher Weimann
2005-05-27 22:27:26 UTC
Permalink
Post by Peter Hansen
Hmm... just tried it: you're right! On the other hand, the results were
unacceptable: each process has a separate file pointer, so it appears
whichever one writes first will have its output overwritten by the
second process.
Did you open the files for 'append' ?
Peter Hansen
2005-05-27 22:29:59 UTC
Permalink
Post by Peter Hansen
Not in my experience. At least under Unix, it's perfectly OK
to open a file while somebody else is writing to it. Perhaps
Windows can't deal with that situation?
Hmm... just tried it: you're right!
Umm... the part you were right about was NOT the possibility that
Windows can't deal with the situation, but the suggestion that it might
actually be able to (since apparently it can). Sorry to confuse.

-Peter
Roy Smith
2005-05-27 13:14:25 UTC
Permalink
Post by gabor
so, how does one synchronizes several processes in python?
This is a very hard problem to solve in the general case, and the answer
depends more on the operating system you're running on than on the
programming language you're using.

On the other hand, you said that each process will be writing a single line
of output at a time. If you call flush() after each message is written,
that should be enough to ensure that each line gets written in a single
write system call, which in turn should be good enough to ensure that
individual lines of output are not scrambled in the log file.
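In other words, something as simple as (a sketch; the file name is
arbitrary):

import time

# Open in append mode, write one complete line, flush immediately: a
# short line normally reaches the OS in a single write() call.
log = open("cgi.log", "a")
log.write("invoked at %s\n" % time.ctime())
log.flush()
log.close()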

If you want to do better than that, you need to delve into OS-specific
things like the flock function in the fcntl module on unix.
Roy Smith
2005-05-27 13:27:38 UTC
Permalink
The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved this
problem.
Perhaps, but a relational database seems like a pretty heavy-weight
solution for a log file.
Gerhard Haering
2005-05-27 14:00:56 UTC
Permalink
Post by Roy Smith
The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved this
problem.
Perhaps, but a relational database seems like a pretty heavy-weight
solution for a log file.
On the other hand, it works ;-)

-- Gerhard
--
Gerhard Häring - gh at ghaering.de - Python, web & database development
Steve Holden
2005-05-31 13:57:47 UTC
Permalink
Post by Roy Smith
The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved this
problem.
Perhaps, but a relational database seems like a pretty heavy-weight
solution for a log file.
Excel seems like a pretty heavyweight solution for most of the
applications it's used for, too. Most people are interested in solving a
problem and moving on, and while this may lead to bloatware it can also
lead to the inclusion of functionality that can be hugely useful in
other areas of the application.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Peter Hansen
2005-05-27 13:18:17 UTC
Permalink
Post by Roy Smith
On the other hand, you said that each process will be writing a single line
of output at a time. If you call flush() after each message is written,
that should be enough to ensure that each line gets written in a single
write system call, which in turn should be good enough to ensure that
individual lines of output are not scrambled in the log file.
Unfortunately this assumes that the open() call will always succeed,
when in fact it is likely to fail sometimes when another process has
already opened the file but not yet completed writing to it, AFAIK.
Post by Roy Smith
If you want to do better than that, you need to delve into OS-specific
things like the flock function in the fcntl module on unix.
The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved this
problem.

-Peter
Grant Edwards
2005-05-27 14:50:04 UTC
Permalink
Post by Roy Smith
On the other hand, you said that each process will be writing a single line
of output at a time. If you call flush() after each message is written,
that should be enough to ensure that each line gets written in a single
write system call, which in turn should be good enough to ensure that
individual lines of output are not scrambled in the log file.
Unfortunately this assumes that the open() call will always succeed,
when in fact it is likely to fail sometimes when another process has
already opened the file but not yet completed writing to it, AFAIK.
Not in my experience. At least under Unix, it's perfectly OK
to open a file while somebody else is writing to it. Perhaps
Windows can't deal with that situation?
--
Grant Edwards grante at visi.com
Yow! FOOLED you! Absorb EGO SHATTERING impulse rays, polyester poltroon!!
gabor
2005-05-30 12:28:11 UTC
Permalink
Post by Jp Calderone
To briefly re-summarize, when you want to acquire a lock, attempt to
create a directory with a well-known name. When you are done with it,
delete the directory. This works across all platforms and filesystems
likely to be encountered by a Python program.
thanks...
but the problem now is that the cgi will have to wait for that directory
to be gone, when it is invoked.. and i do not want to code that :)
i'm too lazy..
so basically i want the code to TRY to write to the file, and WAIT if it
is opened for write right now...
something like a mutex-synchronized block of the code...
ok, i ended up with the following code:

import os, fcntl

def syncLog(filename, text):
    f = os.open(filename, os.O_WRONLY | os.O_APPEND)
    fcntl.flock(f, fcntl.LOCK_EX)
    os.write(f, text)
    #FIXME: what about releasing the lock?
    os.close(f)

it seems to do what i need (the flock() call waits until it can get
access).. i just don't know if i have to unlock() the file before i
close it..


gabor
gabor
2005-05-31 07:54:29 UTC
Permalink
Post by gabor
f = os.open(filename,os.O_WRONLY | os.O_APPEND)
fcntl.flock(f,fcntl.LOCK_EX)
os.write(f,text)
#FIXME: what about releasing the lock?
os.close(f)
it seems to do what i need (the flock() call waits until it can get
access).. i just don't know if i have to unlock() the file before i
close it..
The lock should free when you close the file descriptor. Personally,
I'm a great believer in doing things explicitly rather than
implicitly,
and would add the extra fcntl.flock(f, fcntl.LOCK_UN) call
before closing the file.
done :)

gabor
Mike Meyer
2005-05-30 18:12:49 UTC
Permalink
Post by gabor
f = os.open(filename,os.O_WRONLY | os.O_APPEND)
fcntl.flock(f,fcntl.LOCK_EX)
os.write(f,text)
#FIXME: what about releasing the lock?
os.close(f)
it seems to do what i need (the flock() call waits until it can get
access).. i just don't know if i have to unlock() the file before i
close it..
The lock should free when you close the file descriptor. Personally,
I'm a great believer in doing things explicitly rather than
implicitly, and would add the extra fcntl.flock(f, fcntl.LOCK_UN) call
before closing the file.
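Applied to the function above, that would be roughly (the try/finally is
my addition, to make sure the unlock always runs):

import os, fcntl

def syncLog(filename, text):
    f = os.open(filename, os.O_WRONLY | os.O_APPEND)
    fcntl.flock(f, fcntl.LOCK_EX)        # blocks until the lock is granted
    try:
        os.write(f, text)
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)    # explicit unlock before closing
        os.close(f)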

<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
unknown
2005-05-27 22:10:16 UTC
Permalink
And PySQLite conveniently wraps the relevant calls with retries when
the database is "locked" by the writing process, making it roughly a
no-brainer to use SQLite databases as nice simple log files where
you're trying to write from multiple CGI processes like the OP wanted.
Oh, ok. But what kind of locks does it use?
unknown
2005-05-27 13:43:04 UTC
Permalink
Post by unknown
But they haven't. They depend on messy things like server processes
constantly running, which goes against the idea of a cgi that only
runs when someone calls it.
SQLite is an in-process dbm.
http://www.sqlite.org/faq.html#q7

(7) Can multiple applications or multiple instances of the same
application access a single database file at the same time?

Multiple processes can have the same database open at the same
time. Multiple processes can be doing a SELECT at the same
time. But only one process can be making changes to the database
at once.

But multiple processes changing the database simultaneously is
precisely what the OP wants to do.
Peter Hansen
2005-05-27 22:06:41 UTC
Permalink
http://www.sqlite.org/faq.html#q7
[snip]
Multiple processes can have the same database open at the same
time. Multiple processes can be doing a SELECT at the same
time. But only one process can be making changes to the database
at once.
But multiple processes changing the database simultaneously is
precisely what the OP wants to do.
What isn't described in the above quote from the FAQ is how SQLite
*protects* your data from corruption in this case, unlike the "raw"
approach where you just use file handles.

And PySQLite conveniently wraps the relevant calls with retries when the
database is "locked" by the writing process, making it roughly a
no-brainer to use SQLite databases as nice simple log files where you're
trying to write from multiple CGI processes like the OP wanted.
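
Roughly like this, I believe (an untested sketch; the table name and
timeout value are illustrative):

from pysqlite2 import dbapi2 as sqlite

con = sqlite.connect("cgi_log.db", timeout=10.0)  # retries while locked
cur = con.cursor()
try:
    cur.execute("CREATE TABLE log (ts TEXT, msg TEXT)")
except sqlite.OperationalError:
    pass   # table already exists
cur.execute("INSERT INTO log VALUES (datetime('now'), ?)", ("cgi invoked",))
con.commit()
con.close()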

Disclaimer: I haven't actually done that myself, and have only started
playing with pysqlite2 a day ago, but I have spent a fair bit of time
experimenting and reading the relevant docs and I believe I've got this
all correct.

-Peter
unknown
2005-05-27 22:59:43 UTC
Permalink
Post by gabor
what happens if 2 users invoke the cgi at the same time?
Would BerkeleyDB support that?
gabor
2005-05-30 09:12:57 UTC
Permalink
Post by Jp Calderone
To briefly re-summarize, when you
want to acquire a lock, attempt to create a directory with a well-known
name. When you are done with it, delete the directory. This works
across all platforms and filesystems likely to be encountered by a
Python program.
thanks...

but the problem now is that the cgi will have to wait for that directory
to be gone, when it is invoked.. and i do not want to code that :)
i'm too lazy..

so basically i want the code to TRY to write to the file, and WAIT if it
is opened for write right now...

something like a mutex-synchronized block of the code...

gabor
Jp Calderone
2005-05-27 13:31:48 UTC
Permalink
Post by unknown
The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved
this problem.
But they haven't. They depend on messy things like server processes
constantly running, which goes against the idea of a cgi that only
runs when someone calls it.
SQLite is an in-process dbm.

Jp
Jp Calderone
2005-05-27 22:21:04 UTC
Permalink
And PySQLite conveniently wraps the relevant calls with retries when
the database is "locked" by the writing process, making it roughly a
no-brainer to use SQLite databases as nice simple log files where
you're trying to write from multiple CGI processes like the OP wanted.
Oh, ok. But what kind of locks does it use?
It doesn't really matter, does it?

I'm sure the locking mechanisms it uses have changed between different releases, and may even be selected based on the platform being used.

Jp
Piet van Oostrum
2005-05-31 10:37:45 UTC
Permalink
Isn't a write to a file that's opened as append atomic in most operating
systems? At least in modern Unix systems. man open(2) should give more
information about this.

Like:
f = file("filename", "a")
f.write(line)
f.flush()

if line fits into the stdio buffer. Otherwise os.write can be used.
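
With the raw calls, that would be something like (a sketch; O_APPEND
makes the kernel position each write at end-of-file atomically):

import os

fd = os.open("logfile.txt", os.O_WRONLY | os.O_APPEND | os.O_CREAT)
os.write(fd, "one complete log line\n")   # a single write() system call
os.close(fd)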

As this depends on the OS support for append, it is not portable. But
neither is locking. And I am not sure if it works for NFS-mounted files.
--
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: piet at vanoostrum.org
fraca7
2005-05-27 13:34:05 UTC
Permalink
Post by Peter Hansen
[snip]
Try this:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65203
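
That recipe ("portalocker") is essentially a thin cross-platform wrapper
over the two mechanisms discussed in this thread; from memory it looks
roughly like this (a sketch, not the recipe verbatim):

import os

if os.name == "nt":
    import win32con, win32file, pywintypes

    def lock(f):
        hfile = win32file._get_osfhandle(f.fileno())
        win32file.LockFileEx(hfile, win32con.LOCKFILE_EXCLUSIVE_LOCK,
                             0, 0xffff0000, pywintypes.OVERLAPPED())

    def unlock(f):
        hfile = win32file._get_osfhandle(f.fileno())
        win32file.UnlockFileEx(hfile, 0, 0xffff0000, pywintypes.OVERLAPPED())
else:
    import fcntl

    def lock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)

    def unlock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)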
unknown
2005-05-27 22:22:17 UTC
Permalink
Oh, ok. But what kind of locks does it use?
It doesn't really matter, does it?
Huh? Sure, if there's some simple way to accomplish the locking, the
OP's app can do the same thing without SQLite's complexity.
I'm sure the locking mechanisms it uses have changed between
different releases, and may even be selected based on the platform
being used.
Well, yes, but WHAT ARE THEY??????
jean-marc
2005-05-27 14:08:25 UTC
Permalink
Sorry, why is the temp file solution 'stupid'? (not
aesthetic-pythonistic???) - it looks OK: simple and direct, and
certainly less 'heavy' than any db stuff (even embedded)

And collating into an 'official log file' can be done periodically by
another process, on a time-scale that is 'useful' if not
instantaneous...

Just trying to understand here...

JMD
gabor
2005-05-30 09:13:43 UTC
Permalink
Post by jean-marc
Sorry, why is the temp file solution 'stupid'? (not
aesthetic-pythonistic???) - it looks OK: simple and direct, and
certainly less 'heavy' than any db stuff (even embedded)
And collating into an 'official log file' can be done periodically by
another process, on a time-scale that is 'useful' if not
instantaneous...
Just trying to understand here...
actually this is what i implemented after asking the question, and it
works fine :)

i just thought that maybe there is a solution where i don't have to deal
with 4000 files in the temp folder :)

gabor
Do Re Mi chel La Si Do
2005-05-28 07:10:44 UTC
Permalink
Hi !


On Windows, with PyWin32, here is a little code sample:


import time
import win32file, win32con, pywintypes

def flock(file):
    hfile = win32file._get_osfhandle(file.fileno())
    win32file.LockFileEx(hfile, win32con.LOCKFILE_EXCLUSIVE_LOCK, 0, 0xffff,
                         pywintypes.OVERLAPPED())

def funlock(file):
    hfile = win32file._get_osfhandle(file.fileno())
    win32file.UnlockFileEx(hfile, 0, 0xffff, pywintypes.OVERLAPPED())


file = open("FLock.txt", "r+")
flock(file)
file.seek(123)
for i in range(500):
    file.write("AAAAAAAAAA")
    print i
    time.sleep(0.001)

#funlock(file)
file.close()




Michel Claveau