Discussion:
Iterating over a binary file
Andrew MacIntyre
2004-01-07 10:02:46 UTC
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
I believe the canonical form is:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if not data:
        break
    someobj.update(data)
f.close()

This was also the canonical form for text files, in the case where
f.readlines() wasn't appropriate, prior to the introduction of file
iterators and xreadlines().
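On Python 3 the file() builtin no longer exists, but the same loop-and-a-half carries over almost unchanged. Below is a minimal sketch. The function name feed_chunks and its update callback are illustrative assumptions standing in for someobj.update, and an in-memory io.BytesIO replaces the real file so the example runs on its own.

```python
import io

def feed_chunks(f, update, size=1024):
    """Call update() with successive chunks read from the binary file f."""
    while True:
        data = f.read(size)
        if not data:  # an empty bytes object signals end of file
            break
        update(data)

# Demo with an in-memory "file" instead of a real one.
chunks = []
feed_chunks(io.BytesIO(b"x" * 2500), chunks.append)
print([len(c) for c in chunks])  # [1024, 1024, 452]
```

With a real file you would write "with open(filename, 'rb') as f: feed_chunks(f, someobj.update)". The with block closes the file even if update raises.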

--
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370
andymac at pcug.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
Sambo
2004-01-07 04:18:12 UTC
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()
There have been proposals around to add an assignment-expression operator
like in C, so you could say something like
f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()
It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C. If there is lots of
stuff in while you have to stare at it a bit more, and it becomes
"idiomatic", something you learn, perhaps even cookbook stuff, instead
of obvious-as-such.
Idioms exist because they're useful, and there's already plenty of
them in Python, like ''.join(stringlist) or "for i in xrange(n)" etc.
Maybe the condition in the while statement makes that statement twice
as hard to read. However, the example as a whole can still be easier,
simply because it's shorter.
Statement                             Reading difficulty
=========                             ==================
f = file(filename, 'rb')              1
while 1:                              1
    data = f.read(1024)               1
    if len(data) <= 0:                1
        break                         1
    someobj.update(data)              1
f.close()                             1
Total reading difficulty: 7

Statement                             Reading difficulty
=========                             ==================
f = file(filename, 'rb')              1
while len(data := f.read(1024)) > 0:  2
    someobj.update(data)              1
f.close()                             1
Total reading difficulty: 5
I got through college on a version of this reasoning. I was a math
major. I had friends studying history and literature who said "that's
a hard subject", but I thought they were crazy. But in a normal math
class, there's one textbook that you use for the whole semester, and
you cover maybe half the chapters in it. I was able to keep up. But
in a literature course, you usually have to read a different entire
book from cover to cover EVERY WEEK. I took a couple classes like
that and barely survived. Yes, it takes a lot more effort to read a
page of a math book than a page of a novel. When you compare the
total reading load though, math was a much easier major than
literature or history.
It's the same with programs. I'd rather read 5 lines of tight code
that each actually does something, than 3 pages of loose code (the
kind that's usually written in Java) that spastically meanders trying
to do the same thing, even if the individual loose lines are easier to
read than the tight lines.
I would say that depends on the person's competency in a given language.
Naturally, once you are writing long/large programs it is better to have tight
code, but for a newbie it is too much to translate at once.
While I consider myself an expert in C, I am still learning C++.

That does not mean a language has to lack the capability.


Then again, how large a program can you, or would you want to, write in Python?

Cheers, Sam.
Derek
2004-01-06 20:25:11 UTC
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
unknown
2004-01-07 00:09:13 UTC
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()
There have been proposals around to add an assignment-expression operator
like in C, so you could say something like
f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()
It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C. If there is lots of
stuff in while you have to stare at it a bit more, and it becomes
"idiomatic", something you learn, perhaps even cookbook stuff, instead
of obvious-as-such.
Idioms exist because they're useful, and there's already plenty of
them in Python, like ''.join(stringlist) or "for i in xrange(n)" etc.

Maybe the condition in the while statement makes that statement twice
as hard to read. However, the example as a whole can still be easier,
simply because it's shorter.

Version 1:

Statement                             Reading difficulty
=========                             ==================

f = file(filename, 'rb')              1
while 1:                              1
    data = f.read(1024)               1
    if len(data) <= 0:                1
        break                         1
    someobj.update(data)              1
f.close()                             1

Total reading difficulty: 7

Now the second version:

Statement                             Reading difficulty
=========                             ==================

f = file(filename, 'rb')              1
while len(data := f.read(1024)) > 0:  2
    someobj.update(data)              1
f.close()                             1

Total reading difficulty: 5

I got through college on a version of this reasoning. I was a math
major. I had friends studying history and literature who said "that's
a hard subject", but I thought they were crazy. But in a normal math
class, there's one textbook that you use for the whole semester, and
you cover maybe half the chapters in it. I was able to keep up. But
in a literature course, you usually have to read a different entire
book from cover to cover EVERY WEEK. I took a couple classes like
that and barely survived. Yes, it takes a lot more effort to read a
page of a math book than a page of a novel. When you compare the
total reading load though, math was a much easier major than
literature or history.

It's the same with programs. I'd rather read 5 lines of tight code
that each actually does something, than 3 pages of loose code (the
kind that's usually written in Java) that spastically meanders trying
to do the same thing, even if the individual loose lines are easier to
read than the tight lines.
Anton Vredegoor
2004-01-07 05:44:29 UTC
Statement                             Reading difficulty
=========                             ==================
f = file(filename, 'rb')              1
while len(data := f.read(1024)) > 0:  2
    someobj.update(data)              1
f.close()                             1
Total reading difficulty: 5
In Python it can be done even simpler than in C, by making the
"someobj.update" method return the length of the data:

#derek.py

class X:

    def update(self,data):
        #print a chunk and a space
        print data,
        return len(data)

def test():
    x = X()
    f = file('derek.py','rb')
    while x.update(f.read(1)):
        pass
    f.close()

if __name__=='__main__':
    test()

IMHO the generator solution proposed earlier is more natural to some
(all?) Python programmers.
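Anton's trick carries over to Python 3 roughly as in the sketch below. It assumes, as he does, that you control the update method. Collector and feed are hypothetical names, and io.BytesIO stands in for a real file.

One consequence of this design is that update is called one final time with the empty chunk b'' at end of file. Its return value of 0 is what stops the loop, so the method has to tolerate empty input.

```python
import io

class Collector:
    """Hypothetical stand-in for someobj: update() returns the chunk length."""

    def __init__(self):
        self.chunks = []

    def update(self, data):
        self.chunks.append(data)
        return len(data)  # 0 at end of file, which ends the while loop

def feed(f, obj, size=1024):
    # The read() call sits inside the loop condition, so the body is empty.
    while obj.update(f.read(size)):
        pass

c = Collector()
feed(io.BytesIO(b"abcdef"), c, size=4)
print(c.chunks)  # [b'abcd', b'ef', b'']
```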

Anton
Ville Vainio
2004-01-06 21:52:30 UTC
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()
There have been proposals around to add an assignment-expression operator
like in C, so you could say something like
f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()
It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C. If there is lots of
stuff in while you have to stare at it a bit more, and it becomes
"idiomatic", something you learn, perhaps even cookbook stuff, instead
of obvious-as-such.
but that's been the subject of holy wars around here too many times ;-). Don't
hold your breath waiting for it.
Probably true. Instead of ":=", I wouldn't mind getting rid of the
expression/statement difference as a whole.
--
Ville Vainio http://www.students.tut.fi/~vainio24
Derrick 'dman' Hudson
2004-01-06 23:50:02 UTC
There have been proposals around to add an assignment-expression operator
like in C, so you could say something like
It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C.
but that's been the subject of holy wars around here too many times ;-). Don't
hold your breath waiting for it.
Probably true. Instead of ":=", I wouldn't mind getting rid of the
expression/statement difference as a whole.
Uh-oh. Don't go there. If there was no difference, then you would be
able to perform assignment, even define a class, in the condition of a
while. I don't think you want that based on what you said above. (I
certainly don't want to have to read code with such complexity!)

-D
--
For society, it's probably a good thing that engineers value function
over appearance. For example, you wouldn't want engineers to build
nuclear power plants that only _look_ like they would keep all the
radiation inside.
(Scott Adams - The Dilbert principle)

www: http://dman13.dyndns.org/~dman/ jabber: dman at dman13.dyndns.org
Daniel Ehrenberg
2004-01-07 02:21:49 UTC
Post by Derrick 'dman' Hudson
Uh-oh. Don't go there. If there was no difference, then you would be
able to perform assignment, even define a class, in the condition of a
while. I don't think you want that based on what you said above. (I
certainly don't want to have to read code with such complexity!)
-D
I was able to create a simple text pager (like Unix's more) in some
nested list comprehensions. Just because I can do that doesn't mean
that real programs will be made like that. IMHO the difference between
statements and expressions doesn't really make sense, and it is one of
the few advantages Lisp/Scheme (and, almost, Lua) have over Python.

Daniel Ehrenberg
unknown
2004-01-06 20:58:38 UTC
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
You can make it even uglier:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()

There have been proposals around to add an assignment-expression operator
like in C, so you could say something like

f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()

but that's been the subject of holy wars around here too many times ;-). Don't
hold your breath waiting for it.
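As it turned out, the operator did eventually arrive: PEP 572 added assignment expressions (:=) in Python 3.8. Here is a minimal sketch of the loop above in that form. The function name feed_chunks is hypothetical. The explicit len(...) > 0 test is dropped because an empty bytes object is already false.

```python
import io

def feed_chunks(f, update, size=1024):
    # Requires Python 3.8+ for the := assignment expression (PEP 572).
    while data := f.read(size):  # b'' at end of file is falsy
        update(data)

chunks = []
feed_chunks(io.BytesIO(b"hello world"), chunks.append, size=4)
print(chunks)  # [b'hell', b'o wo', b'rld']
```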
Jp Calderone
2004-01-07 03:24:16 UTC
Post by Derek
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
f = file(filename, 'rb')
for data in iter(lambda: f.read(1024), ''):
    someobj.update(data)
f.close()
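On Python 3, this two-argument iter() form needs one change. Reads from a file opened in 'rb' mode return bytes, so the sentinel must be b''. With '' the sentinel never compares equal, and the loop would keep yielding empty chunks forever.

The sketch below uses functools.partial in place of the lambda; the two are interchangeable here. iter_chunks is a hypothetical name, and io.BytesIO stands in for a real file.

```python
import io
from functools import partial

def iter_chunks(f, size=1024):
    # iter(callable, sentinel) calls f.read(size) until it returns the sentinel.
    # Binary reads return bytes in Python 3, so the sentinel is b'', not ''.
    return iter(partial(f.read, size), b'')

chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), size=3))
print(chunks)  # [b'abc', b'def', b'ghi', b'j']
```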

Jp
Peter Otten
2004-01-06 20:44:27 UTC
Post by Derek
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
You can tuck away the ugliness in a generator:

def blocks(infile, size=1024):
    while True:
        block = infile.read(size)
        if len(block) == 0:
            break
        yield block

#use it:
for data in blocks(f):
    someobj.update(data)
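The someobj.update(data) call in the question looks like the interface of a hash object. Here, as an illustrative assumption, Peter's generator drives hashlib.sha256 in Python 3. The generator is unchanged apart from testing the block for emptiness with "not block". Feeding the hash block by block should give the same digest as hashing all the bytes at once.

```python
import hashlib
import io

def blocks(infile, size=1024):
    """Yield successive blocks from a binary file until end of file."""
    while True:
        block = infile.read(size)
        if not block:  # b'' signals end of file
            break
        yield block

payload = bytes(range(256)) * 10  # 2560 bytes of sample data
h = hashlib.sha256()
for data in blocks(io.BytesIO(payload)):
    h.update(data)
digest = h.hexdigest()
print(digest == hashlib.sha256(payload).hexdigest())  # True
```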

Peter
Peter Abel
2004-01-07 13:20:50 UTC
Post by Derek
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()
The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
There's an approach to mimic the following C statements in Python:

while (result = f.read(1024))
{
    do_some_thing(result);
}
... global result
... result=val
... return val
...
f=file('README.txt','rb')
... print len(result)
...

121
f.close()
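A Python 3 sketch of the same helper approach follows. The helper name assign is hypothetical, and an io.BytesIO holding 121 bytes stands in for README.txt. The helper stores each value in a module-level global and returns it, so the read happens inside the while condition. That global is the price of the trick; since Python 3.8 the := operator does the same job without one.

```python
import io

result = None

def assign(val):
    """Hypothetical helper: store val in the global `result` and return it."""
    global result
    result = val
    return val

f = io.BytesIO(b"x" * 121)
sizes = []
while assign(f.read(1024)):
    sizes.append(len(result))
f.close()
print(sizes)  # [121]
```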
Regards
Peter
