Discussion:
A new and very robust method for doing file locking over NFS?
Douglas Alan
2003-04-18 23:22:34 UTC
Permalink
If the actions between the dashed lines all took place within the same
wall second, client 1 may read its own cached data instead of the data
written by client 2. Our eventual solution was a kernel patch which let
us evict any cached data for a particular file from memory, and we did
it every time we performed a locking operation. You can bet that
performance suffered, but at least it was correct.
Hmmm, why not just fetch the time in between locking and unlocking and
if a second has not passed, sleep for the difference? I suppose this
wouldn't work if you routinely need to lock a file for writing at a
rate of more than once a second, but there probably aren't that many
applications that write to files over NFS that need to be able to
do that.
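The pad-out-the-second idea could be sketched like this (a sketch only; `write_fn` stands in for whatever work happens while the lock is held, and the one-second figure assumes the server's timestamps really have one-second granularity):

```python
import time

def write_with_padding(write_fn):
    """Hold the lock for at least one full wall-clock second, so the
    one-second timestamp granularity cannot hide the update."""
    start = time.time()
    write_fn()                       # perform the locked write
    elapsed = time.time() - start
    if elapsed < 1.0:
        time.sleep(1.0 - elapsed)    # pad out to a full second
```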

Thanks for the info -- it is very helpful.

|>oug
Andy Jewell
2003-04-18 11:30:08 UTC
Permalink
I'd like to do file locking over NFS without using lockd. The reason
I want to avoid using lockd is because many lockd implementations are
too buggy.
It is fairly easy to avoid using lockd -- just avoid using lockf() to
lock a file. Instead of using lockf(), lock a file by creating a lock
file that you open with the O_CREAT | O_EXCL flags. To unlock the
file, you merely unlink the lock file. This method is fairly
reliable, except that there is a small chance with every file lock or
unlock that something will go wrong.
(1) The lock file might be created without the client realizing that
it has been created if the file creation acknowledgement is lost due
to severe network problems. The file being locked would then remain
locked forever (until someone manually deletes the lock) because no
process would take responsibility for having locked the file. This
failure symptom is relatively benign for my purposes and if needed it
can be fixed via the approach described in the Red Hat man page for
the open() system call.
(2) When a process goes to remove its file lock, the acknowledgement
for the unlink() could be lost. If this happens, then the NFS driver
on the client could accidentally unlink a lock file created by another
process when it retries the unlink() request. This failure symptom is
pretty bad for my purposes, since it could cause a structured file to
become corrupt.
I have an idea for a slightly different way of doing file locking that
I think solves problem #2 (and also solves problem #1). What if,
instead of using a lock file to lock a file, we rename the file to
something like "filename.locked.hostname.pid"? If the rename()
acknowledgement gets lost, the client will see the rename() system
call as having failed due to the file not existing. But in this case
it can then check for the existence of "filename.locked.hostname.pid".
If this file exists, then the process knows that the rename() system
call didn't actually fail--the acknowledgement just got lost. Later,
when the process goes to unlock the file, it will rename the file back
to "filename". Again, if the rename system call appears to fail, the
process can check for the existence of "filename.locked.hostname.pid".
If the file no longer exists, then it knows the rename call really did
succeed, and again the acknowledgement just got lost.
How does this sound? Is this close to foolproof, or am I missing
something?
I'm not much of an NFS expert, so I am a bit worried that there are
details of NFS client-side caching that I don't understand that would
prevent this scheme from working without some modification.
|>oug
It depends on how other processes are playing with your files; what if another
program is just scanning directories and opening files at random? If this
program doesn't know your locking scheme, it'll just walk all over your
process...

However, I have used a similar method to this (on windows) when I was playing
with a distributed file migration tool I wrote. I had a control directory
which contained a queue of command-files waiting to be processed, and each
participating process had a directory named <hostname>.<pid>, and a log
directory. When a process 'claimed' a command-file, it would try to move it
to its own subdirectory; if that failed, it would just assume that another
process got there first and then pick another one. Once each process had
completed the specified work, it would delete the command-file and write a
log file, named <hostname>.<pid>.<command-file-name> to the log directory
specifying how everything had gone. This worked fine with about 6 'helpers',
each running two or three processes. The 'engine' ran fine - with no race
conditions evident and no file corruptions or collisions. It was a bit of a
bar-steward to *STOP* (just like any multi-process program), as busy 'remote
threads' would have to complete their work before they would see the 'all
stop' flag. The most difficult thing, however, was recovering from a fatal
error, (such as when I hit Python's recursion limit), where the supervisor
process died, but the workers continued to try to process, or when I'd made
some command error or such and the workers all 'sulked'. It was fun doing it,
though; in the end, the performance boost wasn't enough to warrant all the
hassle, so I reverted to the linear version :-)
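The 'claim by moving' step described above might look like this in Python (a sketch; the function and directory names are mine, and it assumes the queue and worker directories live on the same filesystem so the move is a single rename):

```python
import os

def claim(command_file, work_dir):
    """Try to claim a queued command file by moving it into this
    worker's own directory. Returns the new path on success, or None
    if another worker got there first (i.e. the move failed)."""
    dest = os.path.join(work_dir, os.path.basename(command_file))
    try:
        os.rename(command_file, dest)
    except OSError:
        return None     # someone else claimed it; pick another
    return dest
```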

Anyway, the point I'm making is that, if you have complete control over the
location of the files you are trying to lock, and know that no rogue
processes will interfere, you should be ok. Remember that other programs
will not honour your locks unless you modify them to do so (with windows that
may be difficult...). I would suggest moving the file to another directory,
though, rather than simply renaming it, as it's easier to see what's going
on. It's also a good idea to disambiguate your pids by adding the hostname
to them - pids are only guaranteed to be unique within a host, so two remote
processes on different machines could possibly (read: will eventually) end up
with the same pid.

Windows file locking is just as unpredictable as NFS, particularly when you
are mixing different versions on the same network.

Using this renaming scheme does give you one particular edge - portability, as
this should work equally well on windoze, mac and [u|li]n[i|u]x.

Good luck,

-andyj
Jeff Epler
2003-04-18 19:46:23 UTC
Permalink
The other problem you run into with NFS is that the client might see
stale data.

In old-fashioned NFS client/server setups, a client does an attribute
fetch to determine whether cached data (file contents) is still current.
Time stamps have 1-second granularity. If you have an operation that
does not require very long to complete, a client can see out-of-date
information. Example (lock/unlock correspond to calls to a custom lock
daemon over TCP, just as you suggest):
Client 1                   Client 2
Lock file
--------------------------------------------
Write something
Unlock
                           Lock file
                           Read it
                           Unlock
                           Lock file
                           Write something
--------------------------------------------
                           Unlock
Lock file
Read it
If the actions between the dashed lines all took place within the same
wall second, client 1 may read its own cached data instead of the data
written by client 2. Our eventual solution was a kernel patch which let
us evict any cached data for a particular file from memory, and we did
it every time we performed a locking operation. You can bet that
performance suffered, but at least it was correct.

I don't know if any of these problems have been corrected. I think that
some NFS servers have timestamps that have finer resolution than one
second.

Jeff
Douglas Alan
2003-04-17 19:56:17 UTC
Permalink
I'd like to do file locking over NFS without using lockd. The reason
I want to avoid using lockd is because many lockd implementations are
too buggy.

It is fairly easy to avoid using lockd -- just avoid using lockf() to
lock a file. Instead of using lockf(), lock a file by creating a lock
file that you open with the O_CREAT | O_EXCL flags. To unlock the
file, you merely unlink the lock file. This method is fairly
reliable, except that there is a small chance with every file lock or
unlock that something will go wrong.
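The lock-file technique above can be sketched as follows (the function names are mine; note, too, that later in this thread it is pointed out that O_EXCL itself may not be honoured by NFS clients prior to version 3):

```python
import errno
import os

def lock(path):
    """Try to acquire an exclusive lock on `path` via a lock file.
    Returns True on success, False if someone else holds the lock."""
    try:
        # O_CREAT | O_EXCL makes the open fail if the lock file exists.
        fd = os.open(path + '.lock', os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise
    os.close(fd)
    return True

def unlock(path):
    # Unlocking is just removing the lock file.
    os.unlink(path + '.lock')
```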

The two failure symptoms, as I understand things, are as follows:

(1) The lock file might be created without the client realizing that
it has been created if the file creation acknowledgement is lost due
to severe network problems. The file being locked would then remain
locked forever (until someone manually deletes the lock) because no
process would take responsibility for having locked the file. This
failure symptom is relatively benign for my purposes and if needed it
can be fixed via the approach described in the Red Hat man page for
the open() system call.

(2) When a process goes to remove its file lock, the acknowledgement
for the unlink() could be lost. If this happens, then the NFS driver
on the client could accidentally unlink a lock file created by another
process when it retries the unlink() request. This failure symptom is
pretty bad for my purposes, since it could cause a structured file to
become corrupt.

I have an idea for a slightly different way of doing file locking that
I think solves problem #2 (and also solves problem #1). What if,
instead of using a lock file to lock a file, we rename the file to
something like "filename.locked.hostname.pid"? If the rename()
acknowledgement gets lost, the client will see the rename() system
call as having failed due to the file not existing. But in this case
it can then check for the existence of "filename.locked.hostname.pid".
If this file exists, then the process knows that the rename() system
call didn't actually fail--the acknowledgement just got lost. Later,
when the process goes to unlock the file, it will rename the file back
to "filename". Again, if the rename system call appears to fail, the
process can check for the existence of "filename.locked.hostname.pid".
If the file no longer exists, then it knows the rename call really did
succeed, and again the acknowledgement just got lost.
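A minimal sketch of this rename-based scheme (the helper names are mine; it assumes the existence checks behave over NFS as described above, and that distinct processes always produce distinct hostname/pid pairs):

```python
import os
import socket

def nfs_lock(filename):
    """Lock `filename` by renaming it. Returns the locked name on
    success, or None if the file is not there to be renamed."""
    locked = '%s.locked.%s.%d' % (filename, socket.gethostname(),
                                  os.getpid())
    try:
        os.rename(filename, locked)
    except OSError:
        # The rename may really have succeeded with only the
        # acknowledgement lost; the locked name tells us which.
        if not os.path.exists(locked):
            return None
    return locked

def nfs_unlock(filename, locked):
    """Rename the file back, tolerating a lost acknowledgement."""
    try:
        os.rename(locked, filename)
    except OSError:
        # If the locked name is gone, the rename really succeeded.
        if os.path.exists(locked):
            raise
```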

How does this sound? Is this close to foolproof, or am I missing
something?

I'm not much of an NFS expert, so I am a bit worried that there are
details of NFS client-side caching that I don't understand that would
prevent this scheme from working without some modification.

|>oug
Charlie Reiman
2003-04-18 16:27:08 UTC
Permalink
I'd like to do file locking over NFS without using lockd. The reason
I want to avoid using lockd is because many lockd implementations are
too buggy.
It is fairly easy to avoid using lockd -- just avoid using lockf() to
lock a file. Instead of using lockf(), lock a file by creating a lock
file that you open with the O_CREAT | O_EXCL flags. To unlock the
file, you merely unlink the lock file. This method is fairly
reliable, except that there is a small chance with every file lock or
unlock that something will go wrong.
(1) The lock file might be created without the client realizing that
it has been created if the file creation acknowledgement is lost due
to severe network problems. The file being locked would then remain
locked forever (until someone manually deletes the lock) because no
process would take responsibility for having locked the file. This
failure symptom is relatively benign for my purposes and if needed it
can be fixed via the approach described in the Red Hat man page for
the open() system call.
(2) When a process goes to remove its file lock, the acknowledgement
for the unlink() could be lost. If this happens, then the NFS driver
on the client could accidentally unlink a lock file created by another
process when it retries the unlink() request. This failure symptom is
pretty bad for my purposes, since it could cause a structured file to
become corrupt.
I have an idea for a slightly different way of doing file locking that
I think solves problem #2 (and also solves problem #1). What if,
instead of using a lock file to lock a file, we rename the file to
something like "filename.locked.hostname.pid"? If the rename()
acknowledgement gets lost, the client will see the rename() system
call as having failed due to the file not existing. But in this case
it can then check for the existence of "filename.locked.hostname.pid".
If this file exists, then the process knows that the rename() system
call didn't actually fail--the acknowledgement just got lost. Later,
when the process goes to unlock the file, it will rename the file back
to "filename". Again, if the rename system call appears to fail, the
process can check for the existence of "filename.locked.hostname.pid".
If the file no longer exists, then it knows the rename call really did
succeed, and again the acknowledgement just got lost.
How does this sound? Is this close to foolproof, or am I missing
something?
I'm not much of an NFS expert, so I am a bit worried that there are
details of NFS client-side caching that I don't understand that would
prevent this scheme from working without some modification.
|>oug
I like this idea a lot but it has one drawback. In the traditional
lockfile approach, you can have multiple readers even if someone is
writing. Now that you rename the file, every time someone is writing the
filename is a moving target. Depending on how you intend to use the
file, this might make your solution a no-go from the get-go.

But as long as your locking behavior applies to both reading and
writing, it seems quite foolproof.

FWIW, I've never had any problems with NFS for traditional lock
files. NFS works pretty hard to get messages where they need to
be. If I were that deeply concerned or running across an unreliable
network, I'd probably write my own lock daemon and run it over TCP.

Charlie.
Douglas Alan
2003-04-21 21:44:56 UTC
Permalink
Post by Charlie Reiman
I like this idea a lot but it has one drawback. In the traditional
lockfile approach, you can have multiple readers even if someone is
writing. Now that you rename the file, every time someone is writing the
filename is a moving target. Depending on how you intend to use the
file, this might make your solution a no-go from the get-go.
Yes, that is an issue to consider. In the application I am
considering this for, it is adequate for the locks to be completely
exclusive.

If one wanted to use my approach for non-exclusive locks, I think that
other processes could glob for "filename.locked.*.*", or the locking
process could leave a symbolic link behind. If a process wants to
read a file and it isn't there, it would wait in a sleep loop for
either "filename" or "filename.locked.*.*" to appear (or for the
symbolic link to appear, depending on which method we chose). If
neither appears after some amount of time, then the process would
conclude that the file has gone missing and generate an error.

It is possible, although unlikely, that if many processes were locking
and unlocking the file, one of the processes might see neither
"filename" nor "filename.locked.*.*" every time it looks for them,
even after many attempts, and even though the file is really there.
In that case we would get a spurious "file missing error", but this
would not result in any corrupted data, and if we make the locking
processes persistent and patient enough, this will probably never
occur in our lifetimes.
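The glob-based discovery loop described above could be sketched like this (the function name, timeout, and polling interval are all my own choices, not anything specified in the thread):

```python
import glob
import os
import time

def wait_for_file(filename, timeout=10.0, interval=0.5):
    """Wait for either `filename` or `filename.locked.*.*` to appear.
    Returns whichever path is found; raises OSError if neither shows
    up within `timeout` seconds (the spurious "file missing" case)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(filename):
            return filename
        locked = glob.glob(filename + '.locked.*.*')
        if locked:
            return locked[0]
        time.sleep(interval)
    raise OSError('file missing: %s' % filename)
```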
Post by Charlie Reiman
FWIW, I've never had any problems with NFS for traditional lock
files. NFS works pretty hard to get messages where they need to
be.
This is somewhat encouraging to hear. Does your positive experience
extend to Linux NFS servers and clients?
Post by Charlie Reiman
If I were that deeply concerned or running across an unreliable
network, I'd probably write my own lock daemon and run it over TCP.
That's certainly another reasonable approach. I'd prefer not to have
a centralized daemon, however, unless absolutely necessary, because it
makes the software more difficult to install and administer.

|>oug
Douglas Alan
2003-04-21 21:29:48 UTC
Permalink
Post by Andy Jewell
It depends on how other processes are playing with your files; what
if another program is just scanning directories and opening files at
random? If this program doesn't know your locking scheme, it'll
just walk all over your process...
This is just business as usual for Unix, so I'm used to living with
this worry. File locking on Unix is typically only advisory, even
when locking files exclusively on local filesystems, and I've seen on
occasion, for instance, Linux distributions where the mail delivery
agent and the mail reading agent were compiled to use different
advisory locking methods. When this happens the result, of course, is
not so robust.

In any case, I am glad to hear that you have had success with a
strategy very similar to mine. It gives me some confidence that my
approach will work well.

|>oug
Ben Hutchings
2003-04-28 18:35:31 UTC
Permalink
Yes, this is the method described in the Red Hat man page for the
open() system call, which I mentioned, but it doesn't solve the
unlocking problem, which remains, I believe, as I described.
You're quite right. There probably isn't a solution. Welcome to NFS.
Douglas Alan
2003-04-25 11:46:32 UTC
Permalink
Yes, this is the method described in the Red Hat man page for the
open() system call, which I mentioned, but it doesn't solve the
unlocking problem, which remains, I believe, as I described.

|>oug
Ben Hutchings
2003-04-24 14:47:47 UTC
Permalink
I'd like to do file locking over NFS without using lockd. The reason
I want to avoid using lockd is because many lockd implementations are
too buggy.
It is fairly easy to avoid using lockd -- just avoid using lockf() to
lock a file. Instead of using lockf(), lock a file by creating a lock
file that you open with the O_CREAT | O_EXCL flags.
<snip>

O_EXCL doesn't work with NFS prior to version 3, and maybe not even
then.

The usual solution for creating a lock file in an NFS-safe way is:

1. Create a file in the target directory with a unique name (usually
involving hostname and pid).
2. Attempt to link the lock file name to this file (using the temporary
file name). Ignore whether this succeeds or fails, as the result is
not reliable.
3. stat() the unique file name and record the number of links.
4. Unlink the unique file name.
5. If the number of links recorded in step 3 was 2, continue, else fail.
(Or try again, up to some maximum number of times.)

A cleaner solution would be to use a client-server protocol instead of
synchronising through the file system, but that's not so easy to do.

In Python:

import errno
import os
import socket

# lock_name is the path of the lock file we are trying to acquire.
hostname = socket.gethostname().split('.')[0]
unique_name = '%s.%s-%d' % (lock_name, hostname, os.getpid())
for i in range(5):
    # Step 1: create a uniquely named file in the target directory.
    os.close(os.open(unique_name, os.O_CREAT | os.O_WRONLY))
    try:
        try:
            # Step 2: try to link the lock name to the unique file.
            # The result is not reliable over NFS, so ignore it.
            os.link(unique_name, lock_name)
        except OSError:
            pass
        # Step 3: the unique file's link count is the reliable test.
        if os.stat(unique_name).st_nlink == 2:
            break
    finally:
        # Step 4: unlink the unique name whether or not we succeeded.
        os.unlink(unique_name)
else:
    raise OSError((errno.EEXIST, os.strerror(errno.EEXIST), lock_name))