Post by exarkun
Post by Dan Stromberg
Post by Nathan Huesken
Hi,
I am writing a network application which needs to do file transfers from
time to time (I am writing the server as well as the client). For simple
network messages I use Pyro because it is very convenient. But I suspect
that doing a file transfer over Pyro is very inefficient - am I right
(the files are pretty big)?
I somehow need to ensure that the client requesting a file transfer is
the same client getting the file, so some sort of authentication is
needed.
What library would you use to do the file transfer?
Regards,
Nathan
I've never used Pyro, but for a fast network file transfer in Python,
I'd probably use the socket module directly, with a cache-oblivious
algorithm:
http://en.wikipedia.org/wiki/Cache-oblivious_algorithm
It doesn't use sockets, it uses files, but I recently did a Python
progress meter application that uses a cache-oblivious algorithm and can
get over 5 gigabits/second of throughput on a nearly-modern PC (that's
without the network in the picture, though if it were used on 10 Gig-E
with a suitable transport it could probably do nearly that):
http://stromberg.dnsalias.org/~strombrg/gprog/
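(For the curious, a bare-bones version of the socket approach - minus
the adaptive blocksize - might look something like the sketch below.
This is only an illustration, not the gprog code; the host, port, and
filenames are placeholders.)

import socket

# Sender: push one file over a plain TCP socket.
def send_file(path, host="receiver.example.com", port=9000,
              blocksize=256 * 1024):
    sock = socket.create_connection((host, port))
    try:
        with open(path, "rb") as infile:
            while True:
                chunk = infile.read(blocksize)
                if not chunk:
                    break
                sock.sendall(chunk)
    finally:
        sock.close()

# Receiver: accept one connection and write everything it sends to disk.
def receive_file(path, port=9000, blocksize=256 * 1024):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))
    listener.listen(1)
    conn, _addr = listener.accept()
    try:
        with open(path, "wb") as outfile:
            while True:
                chunk = conn.recv(blocksize)
                if not chunk:
                    break
                outfile.write(chunk)
    finally:
        conn.close()
        listener.close()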
This seems needlessly complicated. Do you have a hard drive that can
deliver 5 gigabits/second to your application? More than likely not.
Most such programs aren't optimized well for one machine, let alone
adapted well to the cache-related specifics of just about any transfer -
so the thing you're using to measure performance instead becomes the
bottleneck itself. I don't think I'd use an oral thermometer that gave a
patient a temporarily higher fever, and it'd be nice if I didn't have to
retune the thermometer for each patient, too.
Besides, it's a _conceptually_ simple algorithm - keep the n
best-performing block sizes, pick the best one historically for
subsequent writes, and try a different, random blocksize once in a while
even if things are going well with the current one. It's actually
something I learned about as an undergrad from a favorite professor, who
was a little insistent that hard-coding a "good" block size for the
specifics of a single machine was short-sighted when you care about
performance, since code almost always moves to a different machine (or a
different disk, or a different network peer) eventually. Come to think
of it, she taught two of my three algorithms classes. Naturally, she
also said that you shouldn't tune for performance unnecessarily.
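Roughly, the blocksize-picking part of that idea could be sketched like
this (my own illustration, not the actual gprog code; the candidate
sizes and the 10% exploration rate are made-up numbers):

import random

# Candidate block sizes: powers of two from 4 KiB to 8 MiB (made up).
CANDIDATES = [2 ** n for n in range(12, 24)]

class BlocksizeTuner(object):
    """Remember the best observed throughput per block size, usually
    pick the historical winner, but try a random size once in a while."""

    def __init__(self, explore_probability=0.1):
        self.explore_probability = explore_probability
        self.best_rates = {}   # blocksize -> best bytes/second seen so far

    def choose(self):
        if not self.best_rates or random.random() < self.explore_probability:
            return random.choice(CANDIDATES)
        return max(self.best_rates, key=self.best_rates.get)

    def record(self, blocksize, nbytes, seconds):
        if seconds <= 0:
            return
        rate = float(nbytes) / seconds
        if rate > self.best_rates.get(blocksize, 0.0):
            self.best_rates[blocksize] = rate

The transfer loop would then time each read/write pair, feed the result
to record(), and ask choose() for the next block size.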
Post by exarkun
A more realistic answer is probably to use something based on HTTP. This
solves a number of real-world problems, like the exact protocol to use
over the network, and detecting network issues which cause the transfer
to fail. It also has the benefit that there's plenty of libraries
already written to help you out.
Didn't the OP request something fast? HTTP code tends to be "optimized"
for small transfers (if that), since most of the web is small files.
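That said, if the OP does go the HTTP route, streaming in big chunks at
least avoids slurping the whole file into memory. A rough client-side
sketch (urllib2 is the Python 2 spelling; the URL, filename, and chunk
size are placeholders):

import urllib2

def http_download(url, path, blocksize=1024 * 1024):
    # Read the response in large chunks rather than calling
    # response.read() with no size, which would pull the whole file
    # into memory at once.
    response = urllib2.urlopen(url)
    try:
        with open(path, "wb") as outfile:
            while True:
                chunk = response.read(blocksize)
                if not chunk:
                    break
                outfile.write(chunk)
    finally:
        response.close()

http_download("http://fileserver.example.com/big.iso", "big.iso")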
OP: I should mention: if you're on gigabit or better, you should
probably speak with your sysadmin about enabling Jumbo Frames and Path
MTU Discovery - otherwise, even a cache-oblivious algorithm likely won't
be able to help much; the CPU would likely get pegged too early. If, on
the other hand, you only care about 10BaseT speeds, or perhaps even
100BaseT speeds, HTTP would probably be fine (a typical CPU today can
keep up with that easily), especially if you're doing a single transfer
at a time.
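And if you want to sanity-check the MTU from Python before bugging your
sysadmin, sysfs makes it easy (Linux-specific, and "eth0" is just an
example interface name):

def interface_mtu(interface="eth0"):
    # Linux-specific: sysfs exposes each interface's configured MTU.
    with open("/sys/class/net/%s/mtu" % interface) as mtu_file:
        return int(mtu_file.read().strip())

# 1500 is the standard Ethernet MTU; jumbo frames are typically 9000.
print(interface_mtu())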