Post by exarkun
Post by Dan Stromberg
Post by Nathan Huesken
Hi,
I am writing a network application which needs to do file transfers from
time to time (I am writing the server as well as the client). For simple
network messages I use Pyro because it is very convenient. But I suspect
that doing a file transfer over Pyro is very inefficient - am I right
(the files are pretty big)?
I somehow need to ensure that the client requesting a file transfer is
the same client getting the file, so some sort of authentication is
needed.
What library would you use to do the file transfer?
Regards,
Nathan
I've never used Pyro, but for a fast network file transfer in Python,
I'd probably use the socket module directly, with a cache-oblivious
algorithm:
http://en.wikipedia.org/wiki/Cache-oblivious_algorithm
It doesn't use sockets, it uses files, but I recently did a Python
progress meter application that uses a cache-oblivious algorithm and can
get over 5 gigabits/second of throughput on a nearly-modern PC (that's
without the network in the picture, though if it were used on 10 Gig-E
with a suitable transport it could probably do nearly that):
http://stromberg.dnsalias.org/~strombrg/gprog/
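(For the curious, a bare-bones version of the socket approach - minus
the adaptive blocksize - might look something like the sketch below.
This is only an illustration, not the gprog code; the host, port, and
filenames are placeholders.)

import socket

# Sender: push one file over a plain TCP socket.
def send_file(path, host="receiver.example.com", port=9000,
              blocksize=256 * 1024):
    sock = socket.create_connection((host, port))
    try:
        with open(path, "rb") as infile:
            while True:
                chunk = infile.read(blocksize)
                if not chunk:
                    break
                sock.sendall(chunk)
    finally:
        sock.close()

# Receiver: accept one connection and write everything it sends to disk.
def receive_file(path, port=9000, blocksize=256 * 1024):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))
    listener.listen(1)
    conn, _addr = listener.accept()
    try:
        with open(path, "wb") as outfile:
            while True:
                chunk = conn.recv(blocksize)
                if not chunk:
                    break
                outfile.write(chunk)
    finally:
        conn.close()
        listener.close()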
This seems needlessly complicated. Do you have a hard drive that can
deliver 5 gigabits/second to your application? More than likely not.
Most such programs aren't optimized well for one machine, let alone
adapted well to the cache-related specifics of just about any transfer -
so the thing you're using to measure performance instead becomes the
bottleneck itself. I don't think I'd use an oral thermometer that gave a
patient a temporarily higher fever, and it'd be nice if I didn't have to
retune the thermometer for each patient, too.
Besides, it's a _conceptually_ simple algorithm - keep the n
best-performing block sizes, pick the best one historically for
subsequent writes, and try a different, random blocksize once in a while
even if things are going well with the current one. It's actually
something I learned about as an undergrad from a favorite professor, who
was a little insistent that hard-coding a "good" block size for the
specifics of a single machine was short-sighted when you care about
performance, since code almost always moves to a different machine (or a
different disk, or a different network peer) eventually. Come to think
of it, she taught two of my three algorithms classes. Naturally, she
also said that you shouldn't tune for performance unnecessarily.
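Roughly, the blocksize-picking part of that idea could be sketched like
this (my own illustration, not the actual gprog code; the candidate
sizes and the 10% exploration rate are made-up numbers):

import random

# Candidate block sizes: powers of two from 4 KiB to 8 MiB (made up).
CANDIDATES = [2 ** n for n in range(12, 24)]

class BlocksizeTuner(object):
    """Remember the best observed throughput per block size, usually
    pick the historical winner, but try a random size once in a while."""

    def __init__(self, explore_probability=0.1):
        self.explore_probability = explore_probability
        self.best_rates = {}   # blocksize -> best bytes/second seen so far

    def choose(self):
        if not self.best_rates or random.random() < self.explore_probability:
            return random.choice(CANDIDATES)
        return max(self.best_rates, key=self.best_rates.get)

    def record(self, blocksize, nbytes, seconds):
        if seconds <= 0:
            return
        rate = float(nbytes) / seconds
        if rate > self.best_rates.get(blocksize, 0.0):
            self.best_rates[blocksize] = rate

The transfer loop would then time each read/write pair, feed the result
to record(), and ask choose() for the next block size.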
Post by exarkun
A more realistic answer is probably to use something based on HTTP. This
solves a number of real-world problems, like the exact protocol to use
over the network, and detecting network issues which cause the transfer
to fail. It also has the benefit that there's plenty of libraries
already written to help you out.
Didn't the OP request something fast? HTTP code tends to be "optimized"
for small transfers (if that), since most of the web is small files.
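That said, if the OP does go the HTTP route, streaming in big chunks at
least avoids slurping the whole file into memory. A rough client-side
sketch (urllib2 is the Python 2 spelling; the URL, filename, and chunk
size are placeholders):

import urllib2

def http_download(url, path, blocksize=1024 * 1024):
    # Read the response in large chunks rather than calling
    # response.read() with no size, which would pull the whole file
    # into memory at once.
    response = urllib2.urlopen(url)
    try:
        with open(path, "wb") as outfile:
            while True:
                chunk = response.read(blocksize)
                if not chunk:
                    break
                outfile.write(chunk)
    finally:
        response.close()

http_download("http://fileserver.example.com/big.iso", "big.iso")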
OP: I should mention: if you're on gigabit or better, you should
probably speak with your sysadmin about enabling Jumbo Frames and Path
MTU Discovery - otherwise, even a cache-oblivious algorithm likely won't
be able to help much; the CPU would likely get pegged too early. If, on
the other hand, you only care about 10BaseT speeds, or perhaps even
100BaseT speeds, HTTP would probably be fine (a typical CPU today can
keep up with that easily), especially if you're doing a single transfer
at a time.
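And if you want to sanity-check the MTU from Python before bugging your
sysadmin, sysfs makes it easy (Linux-specific, and "eth0" is just an
example interface name):

def interface_mtu(interface="eth0"):
    # Linux-specific: sysfs exposes each interface's configured MTU.
    with open("/sys/class/net/%s/mtu" % interface) as mtu_file:
        return int(mtu_file.read().strip())

# 1500 is the standard Ethernet MTU; jumbo frames are typically 9000.
print(interface_mtu())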