Discussion:
iterating over lines in a file
Donn Cave
2000-07-22 15:54:05 UTC
Quoth nobody <no at bo.dy>:
| cjc26 at nospam.cornell.edu (Cliff Crawford), in
| <slrn8nhj0s.jr0.cjc26 at synecdoche.sowrong.org>:
| > * nobody <no at bo.dy> menulis:
|>> now, this newbie has run into another perl idiom he'd like to figure
|>> out how to rewrite in python - "while (<FILE>) { print; }" - and, by
|>> extension, how to get python to print the string i hand it, the whole
|>> string i hand it, and nothing but the string i hand it?
|
|> Hmm..not quite sure what you mean..maybe you want to use
|> sys.stdout.write() instead?
|
| perhaps. unfortunately, i haven't had time to work on this since i made
| that posting, so i have got no further; i expect there's some fairly easy
...

He's right. Go ahead and get to work on your program, and use
sys.stdout.write(). Read up on the file object in general, for
more details.

|> print doesn't really try to be clever with its arguments, except for
|> printing a space between each one and a newline at the end.
|
| that's trying to be clever. then, if i'm reading raw lines from a file and
| want to print them verbatim, i have to strip a newline somewhere somehow?
| how do i copy a file to another one, is there a file.copy method in some
| module somewhere?

Not to my knowledge, but you don't need it. To read, I reckon you're
probably using the file object read() or readline() functions. The
file object write() function is the inverse operation and doesn't
strip anything. (Well, note that on some platforms the underlying
C library functions may tamper with the output and input data to
get the CR/LF issues right, so open as binary if that's an issue.)
The "print" statement is not for applications like this.
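
To make the difference between write() and print concrete, here is a minimal sketch in present-day Python 3, where print is a function rather than the statement discussed in this thread. The io.StringIO objects are stand-ins for real files, and the names source, verbatim and printed are made up for the illustration.

```python
import io

source = io.StringIO("first line\nsecond line\n")
verbatim = io.StringIO()   # receives each line through write()
printed = io.StringIO()    # receives each line through print()

while True:
    line = source.readline()
    if not line:               # readline() returns "" at end of file
        break
    verbatim.write(line)       # emits exactly the string it was given
    print(line, file=printed)  # appends a newline of its own after each line
```

The write() copy is byte-for-byte the input, while the print() copy ends up double-spaced because each line already carries its newline.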

The os module also exports the POSIX 1003.1 open(), read() and write()
functions, which in a few exotic situations may be more practical than
a file object.
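
As a rough illustration of those os-level calls, the sketch below copies a file with os.open(), os.read() and os.write() in Python 3. The function name copy_with_os_calls, the chunk size and the temporary file names are assumptions made for the example, not anything from the original post.

```python
import os
import tempfile

def copy_with_os_calls(src_path, dst_path, chunk_size=4096):
    """Copy src_path to dst_path using the os module's low-level calls."""
    binary = getattr(os, "O_BINARY", 0)   # flag only exists on Windows
    src = os.open(src_path, os.O_RDONLY | binary)
    try:
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | binary
        dst = os.open(dst_path, flags, 0o644)
        try:
            while True:
                chunk = os.read(src, chunk_size)
                if not chunk:             # empty bytes object means end of file
                    break
                while chunk:              # os.write() may write only part of it
                    written = os.write(dst, chunk)
                    chunk = chunk[written:]
        finally:
            os.close(dst)
    finally:
        os.close(src)

# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "in.dat")
    dst_path = os.path.join(workdir, "out.dat")
    with open(src_path, "wb") as f:
        f.write(b"one\r\ntwo\n")
    copy_with_os_calls(src_path, dst_path)
    with open(dst_path, "rb") as f:
        copied = f.read()
```

For ordinary file copying the file object is simpler; this form mostly matters when a raw file descriptor is what you already have.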

Donn Cave, donn at oz.net
Alex Martelli
2000-07-20 14:41:38 UTC
"Roger Upole" <rupole at compaq.net> wrote in message
> Using an initial read is a common enough idiom in any language.

It is extremely rare in languages which allow assignment in
expressions, and would normally denote lack of familiarity
with the language's idioms.

> f=open('filename','r')
> fline = f.readline()
> while fline:
>     ....
>     fline = f.readline()

Everything should be done ONCE, and only ONCE. In ONE
place in the code. This expands the abstract operation
"get next thingy if any" in two places, just because of a
language quirk.

>     line = self.source.readline()
>     self.line = line
>     return line # may be empty, thus false

I don't see why not just:

def readline(self):
    self.line = self.source.readline()
    return self.line

Seems to have exactly the same semantics; the local-variable
line does not appear to be playing any role.

> could somebody enlighten me, please? and is there any easier way to
> iterate over lines in a file without resorting to ugly break statements?

The fileinput module does just that in a very elegant way, IMHO:

import fileinput

for line in fileinput.input("myfile.txt"):
    # do whatever you wish with line
    pass
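
For readers on a current interpreter: file objects became directly iterable in Python 2.2, after this thread was written, so the loop no longer needs a helper module. A minimal sketch, with io.StringIO standing in for an opened file and seen as an illustrative name:

```python
import io

source = io.StringIO("alpha\nbeta\ngamma\n")  # stand-in for open("myfile.txt")

seen = []
for line in source:        # yields each line, trailing newline included
    seen.append(line)
```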


Alex
Alex Martelli
2000-07-21 22:25:21 UTC
"Roger Upole" <rupole at compaq.net> wrote in message
> I must take exception to your insistence that
> "Everything should be done ONCE, and only ONCE. In ONE
> place in the code."

Hey, if you must, you must. Maybe I should have said "must
be done" rather than "should be done", too:-).

> Besides sounding didactic,

Uh, is this supposed to be bad? My prose's main defect is
generally that of being long and convoluted. If for once
I've managed to be short and pithy, it was no doubt because
I was subconsciously quoting some Great Master. Or maybe
channeling Him. (Kent Beck...? Ward Cunningham...?)

> it is also completely false. In many cases,
> an initial read is necessary to determine how (or even if) the rest of a
> file will be processed. Also, different code may need to be called if
> the file is empty.

Yes, such special cases do indeed come up (when one just
cannot control the specs/fileformats). In this case, the
general structure tends to be:
    read the first piece if any
    if appropriate, do the rest

The "first piece" (a line, if you're lucky; but, often,
a bunch of them -- e.g., all lines up to the first empty
one, included) gets read *and processed*, *once*. Then,
if appropriate, after the first piece has set up stuff,
"the rest" gets read and processed.

In other words, the read-the-first-piece part does not
leave you with an already-read beginning-of-the-rest
ready to be processed before further reading, in all
such cases. When it does (the read-item that tells you
the first-piece is finished is not a separator/terminator
token, but is already the beginning of the-rest), then
you do have an input structure that may lend itself to
your favourite idiom (although in some cases, pushing
the inappropriate item back to be pseudo-reread next is
also a very useful idiom).
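
The push-back idiom mentioned in the paragraph above can be sketched like this in Python 3. The class name PushbackReader, its push_back() method and the sample data are all hypothetical, chosen only to show the shape of the idea.

```python
import io

class PushbackReader:
    """Wrap a file-like object so that a line can be handed back."""

    def __init__(self, source):
        self.source = source
        self.pending = []          # lines waiting to be re-read

    def readline(self):
        if self.pending:
            return self.pending.pop()
        return self.source.readline()

    def push_back(self, line):
        self.pending.append(line)

reader = PushbackReader(io.StringIO("# comment\n# more\ndata 1\ndata 2\n"))

# Prologue: consume comment lines; the first non-comment line is pushed back.
comments = []
while True:
    line = reader.readline()
    if not line.startswith("#"):
        reader.push_back(line)
        break
    comments.append(line)

# Main body: starts with the line that ended the prologue.
rest = []
while True:
    line = reader.readline()
    if not line:
        break
    rest.append(line)
```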

But then, the semantic role of the reading in the two
places differs. If you pre-test, transform, &c, the
line just read, you will probably do it in different
ways for the prologue and the main-body. E.g.,
consider:

line = f.readline()
while we_are_in_headers(line):
    process_as_headers(line)
    line = f.readline()
while we_are_in_main(line):
    process_as_main(line)
    line = f.readline()

i.e., we're now reading-next-line in *three* places.
While the different things we're doing are *two*.
Our code's structure does not ideally reflect its
internal logic; once again, the initial readline
stands out as an artificial construct.

We can restore the balance, and regain simplicity,
with a little bit of abstraction. Maybe a bit too
extreme, but sort of nice, for example:

headers = Filter(f, we_are_in_headers, process_as_headers)
while headers.more():
    headers.next()

main = Filter(f, we_are_in_main, process_as_main)
while main.more():
    main.next()

Of course, there's no need for the while loops and
the more and next methods of the Filter class, which
in fact could perfectly well be a function and just
do everything itself.

So, our main code can become:

process(f, we_are_in_headers, process_as_headers)
process(f, we_are_in_main, process_as_main)

utterly simple; and the function process loops just
once, readline's in just one place, and has all the
simplicity one could wish...:

def process(file, testfun, procfun):
    while 1:
        line = file.readline()
        if not testfun(line):
            return
        procfun(line)

...well, all the simplicity except one little detail:
we _again_ have the "while 1:" idiom!-)
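
For anyone who wants to try the idea, here is a self-contained Python 3 sketch of such a process function applied to header/body input. The predicates in_headers and in_main and the sample text are assumptions for the demonstration. Note that the line which fails the test is consumed rather than pushed back, which suits a blank separator line.

```python
import io

def process(file, testfun, procfun):
    """Feed lines to procfun for as long as testfun accepts them."""
    while True:
        line = file.readline()
        if not testfun(line):
            return             # the rejected line is consumed, not re-read
        procfun(line)

def in_headers(line):
    # hypothetical test: headers run until the first blank line (or EOF)
    return line.strip() != ""

def in_main(line):
    # hypothetical test: the body runs until readline() returns ""
    return line != ""

headers, body = [], []
source = io.StringIO("Subject: hi\nFrom: me\n\nline one\nline two\n")
process(source, in_headers, headers.append)
process(source, in_main, body.append)
```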

> Additionally, this overly simplistic programming style would be completely
> inadequate for most involved programming tasks.

The most involved tasks have the most ferocious need
for an extremistic pursuit of simplicity. The
unpleasant issue of the "extra initial readline"
should, IMHO, rankle even more if the actual task
looms pretty complicated, because we surely _don't_
need extra complication when the task itself supplies
a lot. Python has just about the right degree of
_sophistication_ to let me refactor things until they
are satisfactorily _simple_...

...and among simplicity's cornerstones, "code each
[different] thing ONCE", i.e., "in ONE place", stands
pretty high indeed.


Alex
nobody
2000-07-24 12:45:43 UTC
David Bolen <db3l at fitlinxx.com>, in <uhf9g4eam.fsf at ctwd0143.fitlinxx.com>:

[many thanksworthy things snipped - thanks!]
> Just to note too - you mention reading "raw lines" - that's sort of
> oxymoronic, since by definition "lines" have format information (end of
> line markers that differ by platform) that have to be processed, and
> thus you aren't really reading anything "raw" :-)

i wasn't really thinking about that, but now that you mention it, you're
right about that. i'm not just new to python, i'm fairly new to programming
in general, and i don't always have the philosophical insights the right
way around yet. they're the hardest, and most important, part to get.

> Oh BTW, if you really just wanted a file copy, then yes, you could also
> use the higher level copy() function in the shutil module.

what i was actually trying to do was a prepend-this-string-to-that-file
routine, and i think i got it. the straight translation from perl got
really, really long due to all the try-except pairs; i like exceptions
better than perl's "do {} or die" idiom as they're clearer and seem more
flexible, but they're sure more verbose, too.
David Bolen
2000-07-24 17:51:30 UTC
Post by nobody
what i was actually trying to do was a prepend-this-string-to-that-file
routine, and i think i got it. the straight translation from perl got
really, really long due to all the try-except pairs; i like exceptions
better than perl's "do {} or die" idiom as they're clearer and seem more
flexible, but they're sure more verbose, too.
You may find that by applying the exception handling more
strategically, that you can both streamline your code and make it more
readable.

If you're used to checking result codes from a function, the
inclination is probably to place every function call in its own
try/except clause, which can lead to pretty verbose code. But if you
look at the problem from the perspective of recovery, you'll probably
find that your exception handling makes more sense at a higher level
than the lowest function calls.

For example, in your function to prepend-this-string-to-that-file, you
probably have several operations that can fail (the open, the I/O, and
so on). But it's the integrity of the overall operation that you are
probably concerned with, so I'd expect that a single try/except clause
around the operation in general would be sufficient for exception
handling. Without seeing the code, I'd bet that the failure course is
pretty similar regardless of how or where within the process of
prepending the line things fail. (Of course, I could be wrong too :-))
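
Since the poster's code was never shown, the following is only a guess at what a prepend routine with one enclosing try/except might look like in Python 3. The function name prepend_to_file, its return convention and the error message are all assumptions.

```python
import os
import sys
import tempfile

def prepend_to_file(path, text):
    """Rewrite path so that text comes first; return True on success."""
    try:
        with open(path, "r") as f:
            original = f.read()
        with open(path, "w") as f:
            f.write(text)
            f.write(original)
    except OSError as err:
        # One handler covers the open, the read and the writes alike.
        sys.stderr.write("could not prepend to %s: %s\n" % (path, err))
        return False
    return True

# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "notes.txt")
    with open(target, "w") as f:
        f.write("world\n")
    ok = prepend_to_file(target, "hello\n")
    with open(target) as f:
        result = f.read()
    missing_ok = prepend_to_file(
        os.path.join(workdir, "no-such-dir", "x.txt"), "hello\n")
```

This sketch is not atomic: a failure during the second open or the writes can leave the file truncated, so a careful version would write to a temporary file and rename it into place.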

--
-- David
--
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: db3l at fitlinxx.com /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/
nobody
2000-07-22 15:46:07 UTC
cjc26 at nospam.cornell.edu (Cliff Crawford), in
>> now, this newbie has run into another perl idiom he'd like to figure
>> out how to rewrite in python - "while (<FILE>) { print; }" - and, by
>> extension, how to get python to print the string i hand it, the whole
>> string i hand it, and nothing but the string i hand it?

> Hmm..not quite sure what you mean..maybe you want to use
> sys.stdout.write() instead?

perhaps. unfortunately, i haven't had time to work on this since i made
that posting, so i have got no further; i expect there's some fairly easy
way to write (copy) all the lines from one file to another one, possibly
(though not necessarily) stdout. perl is nice in that this task is a one-
liner in that language; i don't *need* it that concise, but it would be
nice. i like one-liners.

>> i despise machines trying to second-guess my intentions, and silly
>> little print statements trying to be clever with their arguments are
>> nothing more than that.

> print doesn't really try to be clever with its arguments, except for
> printing a space between each one and a newline at the end.

that's trying to be clever. then, if i'm reading raw lines from a file and
want to print them verbatim, i have to strip a newline somewhere somehow?
how do i copy a file to another one, is there a file.copy method in some
module somewhere?

> Maybe what you're looking for is the format operator?

it's certainly useful, but will it stop the print statement from
outputting things i may or occasionally may not want output?

[...]

>> i like it, i just wish its functions would be more consistent about
>> what sort of regexps they want - either all compiled or all not
>> compiled; i'm seeing some wanting one and some the other, for some
>> reason. might be just my system, i suppose...

> The two should be interchangeable..AFAIK wherever you can use a compiled
> regexp, you can use an uncompiled one, and vice-versa.

that's what the documentation claims, but here's what i get:

Python 1.5.2 (#1, Feb 1 2000, 16:32:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux-i386
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import re
>>> test="this is a test string"
>>> reg=r"est"
>>> reg_c=re.compile(reg)
>>> match=re.match(reg_c,test)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/re.py", line 40, in match
    return _cachecompile(pattern, flags).match(string)
  File "/usr/lib/python1.5/re.py", line 33, in _cachecompile
    value = compile(pattern, flags)
  File "/usr/lib/python1.5/re.py", line 79, in compile
    code=pcre_compile(pattern, flags, groupindex)
TypeError: argument 1: expected string, instance found
>>> match=re.match(reg,test)
>>>

the compiled one works in re.findall, though. i've no idea why this
bites me in this way, but it's not too hard to work around, so...
Cliff Crawford
2000-07-27 18:38:18 UTC
* nobody <no at bo.dy> menulis:
|
| > Also, in your example you'd probably want to use search() rather than
| > match(), which only succeeds if the regexp matches the beginning of the
| > string (i.e. it acts as if there is an implicit '^' at the beginning of
| > the regexp).
|
| hm. any difference in performance, or can i just as well make the caret
| explicit and standardize myself on match?

Er, what I meant was that match() only succeeds if the regexp matches
the beginning of the string, while search() will succeed if the regexp
matches anywhere in the string. I doubt there's any difference in
performance.
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
"But why is it TILED?!?" icq 68165166
nobody
2000-07-24 12:39:22 UTC
cjc26 at nospam.cornell.edu (Cliff Crawford), in
> Oh, right..I should've explained it better :) A compiled regexp is an
> object which has match() and search() methods, so instead of
>     match=re.match(reg_c, test)
> you want to do
>     match=reg_c.match(test)

oy. i knew i was stupid, but now i know just *how* stupid. thanks, i
guess. X-)

> Also, in your example you'd probably want to use search() rather than
> match(), which only succeeds if the regexp matches the beginning of the
> string (i.e. it acts as if there is an implicit '^' at the beginning of
> the regexp).

hm. any difference in performance, or can i just as well make the caret
explicit and standardize myself on match?
David Bolen
2000-07-23 23:28:33 UTC
Post by nobody
perhaps. unfortunately, i haven't had time to work on this since i made
that posting, so i have got no further; i expect there's some fairly easy
way to write (copy) all the lines from one file to another one, possibly
(though not necessarily) stdout. perl is nice in that this task is a one-
liner in that language; i don't *need* it that concise, but it would be
nice. i like one-liners.
As long as you're willing to think in terms of "lines" (which does
technically involve some "cleverness" or intelligence on the I/O
part), then this sort of thing is probably the most straightforward,
assuming stdin->stdout:

import sys

while 1:
    line = sys.stdin.readline()
    if not line: break
    sys.stdout.write(line)

The overhead compared to the Perl example is just about the same as
just processing input (e.g., the "while 1" idiom - the extra operation
to dump to stdout is only one line, just as with Perl), or if you
switch to fileinput:

import fileinput
import sys

for line in fileinput.input():
    sys.stdout.write(line)

which aside from the import, is pretty much as brief as the Perl
approach :-) To be honest though, while I'm as much a fan of brevity
and one-liners as the next guy, they do carry some risk sometimes in
terms of encapsulating implicit behavior that can sometimes make the
code less readable.

Now, this does assume a line formatted setup, which I think is also
assumed by the Perl example. Both the "while <FILE>" in Perl, and the
"readline" in Python read up until the end of line, and include the
newline in the returned string. So there is some non-raw processing
going on. Just as with "print" under Perl, Python's "write()"
file method just dumps the contents of the string to the file, any
newline already in the string included.

But if you really wanted something closer to a raw copy, with no
assumptions about a format of lines, then you could switch to using a
read() method rather than readline(), at least in the first approach
above.
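
A minimal Python 3 sketch of that read()-based variant follows. The helper name copy_stream, the chunk size and the in-memory io.BytesIO objects (stand-ins for files opened in binary mode) are assumptions made for the example.

```python
import io

def copy_stream(src, dst, chunk_size=8192):
    """Copy everything from src to dst without caring about line boundaries."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # read() returns an empty value at end of input
            break
        dst.write(chunk)

src = io.BytesIO(b"no newline at all, then\r\nmixed\nendings")
dst = io.BytesIO()
copy_stream(src, dst, chunk_size=7)   # tiny chunks, just to exercise the loop
```

Because nothing here looks for end-of-line markers, mixed or missing newlines pass through untouched.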
Post by nobody
that's trying to be clever. then, if i'm reading raw lines from a file and
want to print them verbatim, i have to strip a newline somewhere somehow?
how do i copy a file to another one, is there a file.copy method in some
module somewhere?
BTW, I do agree that "print" has a number of clever items to it as a
statement, and that's fine - it's how it is defined, and it's very
convenient interactively and even in scripts. It just doesn't match
the definition of Perl's "print", despite the same name.

The file write() method is the closest match to Perl's "print FILE"
statement - just use "sys.stdout.write()" where you'd leave off FILE
in Perl. If you don't like the length of the name, just assign it to
any name you'd like in your script and use that instead :-)

Where a typical incorporation of variables into Perl's print statement
is just embedding $variable in the string, with Python you would use
the string formatting operations to build up the appropriate string to
send to write(), say something like ("%s is %d" % (var1,var2)). If
you use the dictionary approach ("%(name)s") then it doesn't even read
all that differently from Perl :-)
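
Both forms of the formatting operator can be tried with the short sketch below; the variable names and values are invented for the illustration.

```python
name = "answer"
value = 42

# Positional form: values are taken from a tuple, in order.
positional = "%s is %d" % (name, value)

# Dictionary form: each %(key)s or %(key)d looks its value up by name.
by_key = "%(name)s is %(value)d" % {"name": name, "value": value}
```

Either string can then be handed to sys.stdout.write(), with a "\n" appended if a line break is wanted.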

Just to note too - you mention reading "raw lines" - that's sort of
oxymoronic, since by definition "lines" have format information (end
of line markers that differ by platform) that have to be processed,
and thus you aren't really reading anything "raw" :-)

Oh BTW, if you really just wanted a file copy, then yes, you could
also use the higher level copy() function in the shutil module.
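
For completeness, a small Python 3 sketch of the shutil route; the temporary directory and file names are made up for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.txt")
    duplicate = os.path.join(workdir, "duplicate.txt")
    with open(original, "w") as f:
        f.write("line one\nline two\n")

    shutil.copy(original, duplicate)   # copies contents and permission bits

    with open(duplicate) as f:
        duplicated_text = f.read()
```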

--
-- David
--
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: db3l at fitlinxx.com /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/
Jason F. McBrayer
2000-07-22 17:15:38 UTC
n> cjc26 at nospam.cornell.edu (Cliff Crawford), in
>> i despise machines trying to second-guess my intentions, and silly
>> little print statements trying to be clever with their arguments are
>> nothing more than that.

> print doesn't really try to be clever with its arguments, except for
> printing a space between each one and a newline at the end.

n> that's trying to be clever. then, if i'm reading raw lines from a file and
n> want to print them verbatim, i have to strip a newline somewhere somehow?
n> how do i copy a file to another one, is there a file.copy method in some
n> module somewhere?

There is, but that's not important here. What you really want
is to not use print; use the write method of a file object (such as
sys.stderr or sys.stdout). That doesn't try to be clever with its
arguments like print does (python's print is not perl's print).
--
+-----------------------------------------------------------+
| Jason F. McBrayer jmcbray at carcosa.net |
| A flower falls, even though we love it; and a weed grows, |
| even though we do not love it. -- Dogen |
Cliff Crawford
2000-07-23 14:43:10 UTC
* nobody <no at bo.dy> menulis:
|
| >> i like it, i just wish its functions would be more consistent about
| >> what sort of regexps they want - either all compiled or all not
| >> compiled; i'm seeing some wanting one and some the other, for some
| >> reason. might be just my system, i suppose...
|
| > The two should be interchangeable..AFAIK wherever you can use a compiled
| > regexp, you can use an uncompiled one, and vice-versa.
|
| that's what the documentation claims, but here's what i get:
|
| >>> match=re.match(reg_c, test)
| [...]
| TypeError: argument 1: expected string, instance found
| >>> match=re.match(reg,test)
| >>>
|
| the compiled one works in re.findall, though. i've no idea why this
| bites me in this way, but it's not too hard to work around, so...

Oh, right..I should've explained it better :) A compiled regexp is an
object which has match() and search() methods, so instead of

match=re.match(reg_c, test)

you want to do

match=reg_c.match(test)

Also, in your example you'd probably want to use search() rather than
match(), which only succeeds if the regexp matches the beginning of the
string (i.e. it acts as if there is an implicit '^' at the beginning of
the regexp).
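
The behaviour described above is easy to check. This sketch reuses the test string and pattern from the quoted session in Python 3; the result names at_start and anywhere are illustrative.

```python
import re

test = "this is a test string"
reg_c = re.compile(r"est")

at_start = reg_c.match(test)    # None: "est" is not at the start of the string
anywhere = reg_c.search(test)   # match object for the "est" inside "test"
```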
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
Synaesthesia now! icq 68165166
Donn Cave
2000-07-21 16:36:18 UTC
Quoth "Roger Upole" <rupole at compaq.net>:
| I must take exception to your insistence that
| "Everything should be done ONCE, and only ONCE. In ONE
| place in the code."
| Besides sounding didactic, it is also completely false. In many cases,
| an initial read is necessary to determine how (or even if) the rest of a
| file will be processed. Also, different code may need to be called if
| the file is empty.
| Additionally, this overly simplistic programming style would be completely
| inadequate for most involved programming tasks.

Whatever. You do what you want. But when that means stuff like this,

|>> f=open('filename','r')
|>> fline = f.readline()
|>> while fline:
|>> ....
|>> fline = f.readline()

eventually you're going to have a debugging problem when you change that
function at one site and neglect to make the same change at the other.
That was the context for that statement.

If you have to sort this out here before you can go on to do anything
today, clarity may come if you look at it this way: what is a "thing",
when we say "everything should be done once?" Does that mean f.readline(),
i.e., f.readline() can be called in only one place? If we have two
places in our program that test __name__ == '__main__', should those
two be consolidated into one? What about "import sys"?

As stupid as that would be, to write code like the example above just
to avoid a "while 1" is worse, because at least our "one" philosophy
has an explanation.

If you're looking for a sensible statement that you wouldn't have to
take exception to, what if a "thing" is a section of code that if
duplicated would certainly be subject to the same change issues in
each instance. I.e., if changed at site 1 it would certainly have
to be changed in the same way at site 2. Where that's not the case,
we are evidently talking about a different "thing", something that
looks the same but has a different meaning. No one would want to
argue that two things with different meanings should be forced
together, only when the meaning is the same.

Donn Cave, donn at u.washington.edu
Roger Upole
2000-07-20 22:51:10 UTC
I must take exception to your insistence that
"Everything should be done ONCE, and only ONCE. In ONE
place in the code."
Besides sounding didactic, it is also completely false. In many cases,
an initial read is necessary to determine how (or even if) the rest of a
file will be processed. Also, different code may need to be called if
the file is empty.
Additionally, this overly simplistic programming style would be completely
inadequate for most involved programming tasks.
Roger Upole
nobody
2000-07-21 03:57:00 UTC
"Roger Upole" <rupole at compaq.net>, in
> I must take exception to your insistence that
> "Everything should be done ONCE, and only ONCE. In ONE
> place in the code."
> Besides sounding didactic, it is also completely false. In many cases,
> an initial read is necessary to determine how (or even if) the rest of a
> file will be processed.

true, but i think you will find that most of these cases are reading a
header of (usually) an entirely different format from the rest of the
data. doing things as few times as possible, in as few places as
possible, is nothing more than simple abstraction - or, as i like to
think about it, code hygiene.

now, this newbie has run into another perl idiom he'd like to figure out
how to rewrite in python - "while (<FILE>) { print; }" - and, by extension,
how to get python to print the string i hand it, the whole string i hand it,
and nothing but the string i hand it? i despise machines trying to second-
guess my intentions, and silly little print statements trying to be clever
with their arguments are nothing more than that.

even so, python is still worth trying to learn, IMO. its ways of dealing
with regexps and substitution might not be as handy and convenient as
perl's operators, but the re module seems to me to be more "programmable"
somehow, easier to write code around. i like it, i just wish its functions
would be more consistent about what sort of regexps they want - either all
compiled or all not compiled; i'm seeing some wanting one and some the
other, for some reason. might be just my system, i suppose...
Cliff Crawford
2000-07-21 22:14:55 UTC
* nobody <no at bo.dy> menulis:
|
| now, this newbie has run into another perl idiom he'd like to figure out
| how to rewrite in python - "while (<FILE>) { print; }" - and, by extension,
| how to get python to print the string i hand it, the whole string i hand it,
| and nothing but the string i hand it?

Hmm..not quite sure what you mean..maybe you want to use
sys.stdout.write() instead?


| i despise machines trying to second-
| guess my intentions, and silly little print statements trying to be clever
| with their arguments are nothing more than that.

print doesn't really try to be clever with its arguments, except for
printing a space between each one and a newline at the end. Maybe what
you're looking for is the format operator?

var=14
print "%s=%d" % ("var", var)


| even so, python is still worth trying to learn, IMO. its ways of dealing
| with regexps and substitution might not be as handy and convenient as
| perl's operators, but the re module seems to me to be more "programmable"
| somehow, easier to write code around.

Been a while since I've done perl, but I seem to remember it being
difficult to treat regexps as objects in themselves..for example, it's
hard to store a list of them, and then compare a string to each one in
the list later.
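
In Python that use case might look like the sketch below, written for Python 3. The particular patterns, the helper name matching_patterns and the sample string are invented for the example.

```python
import re

# Compiled regexps are ordinary objects, so they can live in a list.
patterns = [
    re.compile(r"^\d+$"),   # a string made only of digits
    re.compile(r"est"),     # "est" anywhere in the string
    re.compile(r"^this"),   # a string starting with "this"
]

def matching_patterns(text):
    """Return the source text of every stored pattern that matches."""
    return [p.pattern for p in patterns if p.search(text)]

hits = matching_patterns("this is a test string")
```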


| i like it, i just wish its functions
| would be more consistent about what sort of regexps they want - either all
| compiled or all not compiled; i'm seeing some wanting one and some the
| other, for some reason. might be just my system, i suppose...

The two should be interchangeable..AFAIK wherever you can use a
compiled regexp, you can use an uncompiled one, and vice-versa.
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
Synaesthesia now! icq 68165166
Alex Martelli
2000-08-05 16:11:58 UTC
"gbp" <gpepice1 at nycap.rr.com> wrote in message
> That person is coming from a Perl background and in Perl all those
> issues are taken care of automatically. Perl would not crash if given
> an empty file.

I'm not quite sure what this comment is about -- presumably it's

> an initial read is necessary to determine how (or even if) the rest of a
> file will be processed. Also, different code may need to be called if
> the file is empty.

And "that person" is supposed to be me, to which Roger was replying -- I
Post by Alex Martelli
# do whatever you wish with line
Now, it's quite true that I also have Perl in my background (as well as
C++, Java, Fortran, C, Pascal, Rexx, IBM/370 BAL, x86 assembler, Icon,
awk, Scheme, Tcl, Sather, Visual Basic, GPL, APL, APL2, and several
other languages, in different amounts...), but exactly because of this
I'm perplexed by your suggestion that (in any language) "all those issues
are taken care of automatically". If an empty file needs to be handled
differently from a non-empty one, or if the first few lines determine
what is to be done with the rest (e.g., the first few lines are to be
taken as 'headers', as for news, e-mails, responses from an HTTP server,
etc), as Roger suggested, then it's going to have to be your program
that 'takes care' of it.

It's not a matter of "crashing". The above-mentioned Python idiom
will not 'crash' if myfile.txt is empty, just as the Perl equivalent
won't; but each will simply call zero times the "do whatever" part.
If this "natural extrapolation" is not what is desired, then _some_
special-purpose test, and if/else construct (or equivalent thereof),
will have to be coded by the programmer, whatever language is in use!
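
A small Python 3 sketch of such a special-purpose test follows. The function name describe and the report strings are assumptions; io.StringIO stands in for an opened file.

```python
import io

def describe(source):
    """Treat empty input differently from input with at least one line."""
    first = source.readline()
    if not first:                  # "" means there was nothing to read
        return "empty input"
    count = 1
    for _ in source:               # the remaining lines, if any
        count += 1
    return "%d line(s), starting with %r" % (count, first.rstrip("\n"))

empty_report = describe(io.StringIO(""))
full_report = describe(io.StringIO("header\nbody 1\nbody 2\n"))
```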


Alex
gbp
2000-08-05 15:12:07 UTC
That person is coming from a Perl background and in Perl all those
issues are taken care of automatically. Perl would not crash if given
an empty file.
Bjorn Pettersen
2000-07-22 15:35:40 UTC
Preventing world war III, when Python achieves world domination.
If C is allowed to achieve world domination, WWIII will be cause by
if(everything_is_ok = 0) {
launch_missile();
}
No, that will *prevent* WW3 happening when it was supposed to.
Where do I sign up for the campaign to help C dominate the
world? :-)
You probably need to reread the condition.
Probably not... Since 0 (zero) is false, and the result of an assignment
expression is its rhs, you would never execute the block...

-b
Bjorn Pettersen
2000-07-22 16:06:31 UTC
Permalink
Supposing it evaluates to 0 (false). Now, everything_is_ok is equal to
zero, so the next time you evaluate it...
In this code sample, the same thing... but I see your point :-)

-b
Olivier Dagenais
2000-07-22 15:41:49 UTC
Permalink
Supposing it evaluates to 0 (false). Now, everything_is_ok is equal to
zero, so the next time you evaluate it...

--
----------------------------------------------------------------------
Olivier A. Dagenais - Carleton University - Computer Science III


"Bjorn Pettersen" <bjorn at roguewave.com> wrote in message
Post by Bjorn Pettersen
Preventing world war III, when Python achieves world domination.
If C is allowed to achieve world domination, WWIII will be cause by
if(everything_is_ok = 0) {
launch_missile();
}
No, that will *prevent* WW3 happening when it was supposed to.
Where do I sign up for the campaign to help C dominate the
world? :-)
You probably need to reread the condition.
Probably not... Since 0 (zero) is false, and the result of an assignment
expression in the rhs, you would never execute the block...
-b
David Bolen
2000-07-20 01:38:28 UTC
Permalink
could somebody enlighten me, please? and is there any easier way to
iterate over lines in a file without resorting to ugly break statements?
Not without duplicating your actual I/O operation (the only real way
to avoid a break is to compare the current I/O within the expression
of the loop, so you have to load a value to check both prior to the
loop and then within the loop for the next cycle).

The basic Python approach to iterating over a file would be:

while 1:
    line = file.readline()
    if not line: break

    (... operations to perform ...)

Yes, it has a break in it (although I personally don't consider
'break' ugly - break and continue are often the most elegant way to
handle flow), and yes, it seems clumsy to those of us used to
assignments within expressions, but it's the sort of thing you just
acknowledge and move on - it's really not that big a deal.
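
A minimal, self-contained sketch of the loop above, written for a present-day Python 3 interpreter rather than the 1.5.2 discussed in this thread. The io.StringIO object only stands in for an open file, and count_lines is a made-up helper name used for illustration:

```python
import io

def count_lines(source):
    """Read `source` one line at a time; return (line count, total characters)."""
    lines = 0
    chars = 0
    while 1:
        line = source.readline()
        if not line:          # readline() returns '' only at end of file
            break
        lines = lines + 1
        chars = chars + len(line)
    return lines, chars

sample = io.StringIO("first\n\nthird\n")
print(count_lines(sample))    # prints (3, 13)
```

Note that the blank line in the sample arrives as "\n", which is true, so only a genuine end of file stops the loop.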

The class you found wraps things up a bit, and effectively hides the
assignment within the class, so you can use the direct expression
approach in your higher level code, but that's about it.

There's also a FAQ on this common question, available at:

http://www.python.org/doc/FAQ.html#6.30


--
-- David
--
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: db3l at fitlinxx.com /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/
Remco Gerlich
2000-07-20 06:36:51 UTC
Permalink
(fwiw, my preference for this stems mostly from a liking for brevity;
There's another common idiom no one mentioned yet:

import fileinput

for line in fileinput.input("somefile"):
    process(line)

(it opens it for you too)
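
For anyone who wants to try the idiom, here is a small runnable sketch, assuming a current Python 3. The temporary file and the names path and seen are only scaffolding for the example; fileinput.input() and fileinput.close() are the standard-library calls:

```python
import fileinput
import os
import tempfile

# Create a throwaway file so the example has something to read.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("alpha\nbeta\ngamma\n")

seen = []
for line in fileinput.input(path):    # fileinput opens the named file itself
    seen.append(line.rstrip("\n"))    # lines arrive with their newline attached
fileinput.close()                     # reset the module-level reader

os.remove(path)
print(seen)                           # prints ['alpha', 'beta', 'gamma']
```

The lines are read one at a time, so the whole file is never held in memory at once.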
--
Remco Gerlich, scarblac at pino.selwerd.nl
Hi! I'm a .sig virus! Join the fun and copy me into yours!
nobody
2000-07-21 04:04:32 UTC
Permalink
scarblac-spamtrap at pino.selwerd.nl (Remco Gerlich), in
Post by Alex Martelli
import fileinput
for line in fileinput.input("somefile"):
    process(line)
ah, many thanks! i'm still spelunking my way around the standard
library; gems like this seem to be lying around all over it. :-)
Alexander Williams
2000-07-20 06:33:21 UTC
Permalink
(fwiw, my preference for this stems mostly from a liking for brevity; the
one real "flaw" in all the python workarounds is verbosity. i tend to think
If you want brevity, there are freeware APL interpreters. If you
actually have to LOOK at code again 6mo later, you'll be very, very
thankful for Python's verbosity.
--
Alexander Williams (thantos at gw.total-web.net) | In the End,
"I think sex is better than logic, | Oblivion
but I can't prove it." | Always
http://www.chancel.org | Wins
nobody
2000-07-21 04:03:05 UTC
Permalink
thantos at chancel.org (Alexander Williams), in
Post by Alexander Williams
(fwiw, my preference for this stems mostly from a liking for brevity;
the one real "flaw" in all the python workarounds is verbosity. i tend
to think
If you want brevity, there are freeware APL interpreters.
perhaps i was not clear; brevity and clarity need not be mutually
exclusive, and both are to me equally important. i like perl's
little idioms because, correctly used, they let me have both;
python seems to let me be clear (well, mostly - i'm having a hard
time seeing where blocks end, i'm noticing...) but not as concise
as i'd like to be.
Post by Alexander Williams
If you
actually have to LOOK at code again 6mo later, you'll be very, very
thankful for Python's verbosity.
no, i won't. i _will_ be grateful that python's syntax and style is
fairly clear, but i will also be wishing it could be as clear in
about one half as many lines and statements. well written perl can be;
please, don't complain about badly written any-language, as i'm not
talking about badly written code.
nobody
2000-07-20 03:13:12 UTC
Permalink
David Bolen <db3l at fitlinxx.com>, in <upuo9h97v.fsf at ctwd0143.fitlinxx.com>:

[...]
while 1:
    line = file.readline()
    if not line: break
    (... operations to perform ...)
Yes, it has a break in it (although I personally don't consider
'break' ugly - break and continue are often the most elegant way to
handle flow), and yes, it seems clumsy to those of us used to
assignments within expressions, but it's the sort of thing you just
acknowledge and move on - it's really not that big a deal.
true, it just irked me. being a python newbie i'm running into a lot of
little annoyances like this, nothing major or worth complaining much over,
but details that confuse me nonetheless. python doesn't really fit my
personal taste very well, though better than some languages, and there are
more important things to a language than matters of taste.

(fwiw, my preference for this stems mostly from a liking for brevity; the
one real "flaw" in all the python workarounds is verbosity. i tend to think
faster than i type, even faster than i read, and i like to take in the
concepts of a program at as close to the speed of my mind as possible.)
http://www.python.org/doc/FAQ.html#6.30
thank you most kindly, i had been wondering where the FAQ was hidden!
Moshe Zadka
2000-07-24 12:21:08 UTC
Permalink
Post by nobody
what i was actually trying to do was a prepend-this-string-to-that-file
routine, and i think i got it. the straight translation from perl got
really, really long due to all the try-except pairs; i like exceptions
better than perl's "do {} or die" idiom as they're clearer and seem more
flexible, but they're sure more verbose, too.
As a rule, if you find your code littered with try/except's, you're
probably doing something wrong. They're called exceptions for a reason.
--
Moshe Zadka <moshez at math.huji.ac.il>
There is no IGLU cabal.
http://advogato.org/person/moshez
Moshe Zadka
2000-07-20 18:57:41 UTC
Permalink
...
: Preventing world war III, when Python achieves world domination.
: If C is allowed to achieve world domination, WWIII will be caused by
: if(everything_is_ok = 0) {
:     launch_missile();
: }
: Whereas since Python will achieve world domination,
: launch_missile()
: Is a syntax error, and will be detected before it is allowed into the
: automated defense system.
....
return 0
return 1
launch_counterattack()
C won't help you here either:

int check_if_there_is_an_attack()
{
    ...
}

main()
{
    if(check_if_there_is_an_attack)
        launch_counterattack();
}

Is just as valid...a pointer to a valid function is never NULL.

--
Moshe Zadka <moshez at math.huji.ac.il>
There is no GOD but Python, and HTTP is its prophet.
http://advogato.org/person/moshez
Konrad Hinsen
2000-07-21 08:31:36 UTC
Permalink
| > The easiest way:
| >
| > for line in file.readlines():
| >     block
|
| But that would read the whole thing into memory, and he doesn't want that.
ScientificPython has a special class for text files which allows this:

for line in TextFile('foo'):
    ...

It also permits the transparent handling of compressed files and URLs
(for reading only) and, under Unix, expands home directory names
(~user). I use this about everywhere where I read data from a text
file; it's nice to be able to use the most general file name notations
everywhere without much effort.

For those who don't want to install a complete package just for this
minor module: you can use it on its own perfectly well, just put the
file Scientific/IO/TextFile.py somewhere on your PYTHONPATH.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------
Thomas Wouters
2000-07-20 12:39:00 UTC
Permalink
while (line = <FILE>) { block; }
for line in file.readlines():
    block
But that would read the whole thing into memory, and he doesn't want that.
Wouldn't an xreadlines() method on file objects be cool?
There's the 'fileinput' module, which is almost exactly that. (It does some
extra things as well, if you wish ;)
--
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
nobody
2000-07-21 04:13:00 UTC
Permalink
Moshe Zadka <moshez at math.huji.ac.il>, in
<Pine.GSO.4.10.10007200934550.23388-100000 at sundial>:

[...]
Preventing world war III, when Python achieves world domination. If C is
allowed to achieve world domination, WWIII will be cause by
if(everything_is_ok = 0) {
launch_missile();
}
i sort of see the point, when you're speaking of beginner programmers
still confused by the common algebra equal sign and trying hard to
sort out assignment from comparison, but i honestly don't see many
more seasoned programmers making that mistake. maybe i'm just weird
that way. (then again, i keep wondering why native speakers of the
english language misplace the apostrophe. i never make that class of
mistakes either, my brain just isn't prone to it. maybe it's the
learning two other languages before english that inoculated me
somehow... both my other ones deal with that sort of grammar very
differently, so maybe i just don't see the apostrophe as something
confusing because of that. who knows.)

oy. that paragraph made very little sense. time to go to bed, i think.
Donn Cave
2000-07-21 17:07:10 UTC
Permalink
Quoth nobody <no at bo.dy>:
| Moshe Zadka <moshez at math.huji.ac.il>, in
| <Pine.GSO.4.10.10007200934550.23388-100000 at sundial>:
|
| [...]
|> Preventing world war III, when Python achieves world domination. If C is
|> allowed to achieve world domination, WWIII will be cause by
|
|> if(everything_is_ok = 0) {
|> launch_missile();
|> }
|
| i sort of see the point, when you're speaking of beginner programmers
| still confused by the common algebra equal sign and trying hard to
| sort out assignment from comparison, but i honestly don't see many
| more seasoned programmers making that mistake. maybe i'm just weird
| that way. (then again, i keep wondering why native speakers of the
| english language misplace the apostrophe. i never make that class of
| mistakes either, my brain just isn't prone to it. maybe it's the
| learning two other languages before english that inoculated me
| somehow... both my other ones deal with that sort of grammar very
| differently, so maybe i just don't see the apostrophe as something
| confusing because of that. who knows.)

For what it's worth, apostrophe errors stand out like a sore thumb
for me, but I have made many =/== mistakes. Even a few in Python,
believe it or not.

Donn Cave, donn at u.washington.edu
PS.

It's a source of constant amazement to me how well some of you
non-native speakers can manage, but I suppose if my Portuguese
got as much practice it would be pretty good too. For the rest
of us native speakers who are wondering what's all this about
apostrophes -- it's really simple!

90% of it is "it's" vs. "its", and when there's time to think
about it we can easily solve this puzzle by analogy: just
change "it" to "he", and consider whether we would write "he's"
or "his". We have the same apostrophe here, and likewise in
"they're", "that's" and so forth: a contraction of two words.

The source of the confusion is a possessive "s" ending that
does take an apostrophe, e.g., "pig's eye". But "its" is
not "it" + that ending. "Its" is a natural pronoun like "their".

Now I suppose we will have no more of this foolishness! And
while we're improving our English - remember, "loose/loosed"
vs. "lose/lost".
Cliff Crawford
2000-07-21 20:32:29 UTC
Permalink
* David Bolen <db3l at fitlinxx.com> menulis:
|
| > readlines() has an optional argument which specifies the approximate
| > number of bytes to read in at a time, rather than the entire file.
| > So something like
| >
| > for line in file.readlines(8192):
| >     # process line
| >
| > would only use about 8k of memory.
|
| And only fully process files less than 8K in size. The call to
| file.readlines(8192) returns the list of lines contained within the
| first 8K of the file (approximately), and that's all the 'for' is
| going to iterate over. You have to repeatedly call file.readlines()
| again to keep reading the file, which puts you pretty much back in the
| original readline() mode, just with bigger chunks.

That doesn't seem to be true--readlines() reads the whole file whether
you specify a size argument or not. For example:
f=open("file.txt")
size=0
... size=size+len(line)
...
size
3509
f.close()
f=open("file.txt")
size=0
... size=size+len(line)
...
size
3509
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
Synaesthesia now! icq 68165166
David Bolen
2000-07-21 22:58:44 UTC
Permalink
Post by Cliff Crawford
That doesn't seem to be true--readlines() reads the whole file whether
you specify a size argument or not.
You might want to try a larger sample - readlines() is documented that
the size value is a hint - it might be rounded up to some internal
buffer size, which I could easily imagine as being a few K.

At least on Windows, with Python 1.5.2, it appears to work in
multiples of about 8K. That is, size values are rounded up to the
next multiple of 8K, and the total size of information returned will
be somewhat shy of that multiple depending on line length since it
only returns full lines.
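
The "hint" behaviour is easy to observe. The sketch below assumes a current Python 3 and uses an in-memory io.StringIO in place of a real file, so it does not reproduce the 8K rounding described above for 1.5.2 on Windows. It only shows that a single readlines(hint) call returns one batch of whole lines and leaves the rest unread:

```python
import io

text = "".join("line %d\n" % i for i in range(100))
source = io.StringIO(text)

first_batch = source.readlines(16)   # stops once roughly 16 characters are read
rest = source.readlines()            # no hint: everything that remains

# The first call returned only part of the file, and only complete lines.
print(len(first_batch), len(rest))
print("".join(first_batch + rest) == text)
```

How many lines land in the first batch is an implementation detail, which is why the sketch checks only that the two calls together cover the file.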

--
-- David
--
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: db3l at fitlinxx.com /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/
Cliff Crawford
2000-07-23 14:34:57 UTC
Permalink
* David Bolen <db3l at fitlinxx.com> menulis:
|
| > That doesn't seem to be true--readlines() reads the whole file whether
| > you specify a size argument or not. For example:
|
| You might want to try a larger sample - readlines() is documented that
| the size value is a hint - it might be rounded up to some internal
| buffer size, which I could easily imagine as being a few K.
|
| At least on Windows, with Python 1.5.2, it appears to work in
| multiples of about 8K. That is, size values are rounded up to the
| next multiple of 8K, and the total size of information returned will
| be somewhat shy of that multiple depending on line length since it
| only returns full lines.

Ugh, you're right (I checked fileobject.c to make sure). That makes
readlines() a lot less useful than I thought it was..:(

I guess in that case you would want something like xreadlines() (lazy
readlines) then. Below is a post I saved from a little over a year ago,
which describes one way to implement xreadlines() and other lazy
iterators. (I would've provided a link to Deja instead, but it seems
their "archive" doesn't go back far enough to have this post..:P )
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
Synaesthesia now! icq 68165166

----------
Cliff Crawford
2000-07-20 20:03:46 UTC
Permalink
* Remco Gerlich <scarblac-spamtrap at pino.selwerd.nl> menulis:
| >
| > The easiest way:
| >
| > for line in file.readlines():
| >     block
|
| But that would read the whole thing into memory, and he doesn't want that.
|
| Wouldn't an xreadlines() method on file objects be cool?

readlines() has an optional argument which specifies the approximate
number of bytes to read in at a time, rather than the entire file.
So something like

for line in file.readlines(8192):
    # process line

would only use about 8k of memory.
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
Synaesthesia now! icq 68165166
David Bolen
2000-07-21 18:14:04 UTC
Permalink
(...)
Post by Cliff Crawford
| And only fully process files less than 8K in size. The call to
| file.readlines(8192) returns the list of lines contained within the
| first 8K of the file (approximately), and that's all the 'for' is
| going to iterate over. You have to repeatedly call file.readlines()
| again to keep reading the file, which puts you pretty much back in the
| original readline() mode, just with bigger chunks.
Sure, but it's a useful idiom where the expected file is under
ca. 500 bytes in any sane case. The upper limit keeps the potential
insane case from wiping out the program. (Numbers arbitrary.)
Not an approach we would always want to take, but there could be
a place for it.
Oh sure - it's good as a safety valve - I was just pointing out that
it didn't address the underlying question in the thread which was to
iterate over all the lines of a file, regardless of size. Under the
conditions you note (small file, don't want to risk a large one
causing problems and don't mind ignoring the overflow) it's a
perfectly good idiom and safer than just a blind readlines().

--
-- David
--
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: db3l at fitlinxx.com /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/
Donn Cave
2000-07-21 16:44:20 UTC
Permalink
Quoth David Bolen <db3l at fitlinxx.com>:
| cjc26 at nospam.cornell.edu (Cliff Crawford) writes:
|> readlines() has an optional argument which specifies the approximate
|> number of bytes to read in at a time, rather than the entire file.
|> So something like
|>
|> for line in file.readlines(8192):
|>     # process line
|>
|> would only use about 8k of memory.
|
| And only fully process files less than 8K in size. The call to
| file.readlines(8192) returns the list of lines contained within the
| first 8K of the file (approximately), and that's all the 'for' is
| going to iterate over. You have to repeatedly call file.readlines()
| again to keep reading the file, which puts you pretty much back in the
| original readline() mode, just with bigger chunks.

Sure, but it's a useful idiom where the expected file is under
ca. 500 bytes in any sane case. The upper limit keeps the potential
insane case from wiping out the program. (Numbers arbitrary.)
Not an approach we would always want to take, but there could be
a place for it.

Donn Cave, donn at u.washington.edu
David Bolen
2000-07-20 21:49:10 UTC
Permalink
Post by Cliff Crawford
readlines() has an optional argument which specifies the approximate
number of bytes to read in at a time, rather than the entire file.
So something like
for line in file.readlines(8192):
    # process line
would only use about 8k of memory.
And only fully process files less than 8K in size. The call to
file.readlines(8192) returns the list of lines contained within the
first 8K of the file (approximately), and that's all the 'for' is
going to iterate over. You have to repeatedly call file.readlines()
again to keep reading the file, which puts you pretty much back in the
original readline() mode, just with bigger chunks.

--
-- David
--
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: db3l at fitlinxx.com /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/
Gareth McCaughan
2000-07-21 21:43:07 UTC
Permalink
Preventing world war III, when Python achieves world domination.
If C is allowed to achieve world domination, WWIII will be cause by
if(everything_is_ok = 0) {
launch_missile();
}
No, that will *prevent* WW3 happening when it was supposed to.
Where do I sign up for the campaign to help C dominate the
world? :-)
--
Gareth McCaughan Gareth.McCaughan at pobox.com
sig under construction
Moshe Zadka
2000-07-21 04:41:36 UTC
Permalink
Post by nobody
Preventing world war III, when Python achieves world domination. If C is
allowed to achieve world domination, WWIII will be cause by
if(everything_is_ok = 0) {
launch_missile();
}
i sort of see the point, when you're speaking of beginner programmers
still confused by the common algebra equal sign and trying hard to
sort out assignment from comparison, but i honestly don't see many
more seasoned programmers making that mistake.
Huh? No programmer I know never made that mistake. Even seasoned
programmers sometimes forget the second "=" in the midst of coding...
(Note that Java partially solved it too)

--
Moshe Zadka <moshez at math.huji.ac.il>
There is no GOD but Python, and HTTP is its prophet.
http://advogato.org/person/moshez
Kirill Simonov
2000-07-20 09:43:05 UTC
Permalink
while (line = <FILE>) { block; }
The easiest way:

for line in file.readlines():
    block


--
Kirill
Remco Gerlich
2000-07-20 10:16:33 UTC
Permalink
while (line = <FILE>) { block; }
for line in file.readlines():
    block
But that would read the whole thing into memory, and he doesn't want that.

Wouldn't an xreadlines() method on file objects be cool?
--
Remco Gerlich, scarblac at pino.selwerd.nl
12:15pm up 136 days, 24 min, 6 users, load average: 0.35, 0.21, 0.15
nobody
2000-07-20 00:14:15 UTC
Permalink
assume i want to iterate a block of code over every line in a text file,
and that i don't want to snarf the whole thing into memory for fear
of coredumps or whatever. in perl (and many others) there is a simple,
common idiom:

while (line = <FILE>) { block; }

this doesn't seem to have a simple, direct analog in python. searching
around on the web i found a solution at faqts.com (though this newbie
might like to see it explained, but whatever works):

class Reader:
    def __init__(self, source):
        self.source = source
    def readline(self):
        line = self.source.readline()
        self.line = line
        return line # may be empty, thus false

file = Reader(open("filename")) # i might have got this worng...?

while (file.readline()):
    line = file.line

now, this seems to me like an awful lot of typing just to get around
the fact that assignments in python do not seem to be expressions
returning the value assigned. since that is thus in several other
languages, and since it gives rise to several common idioms of this
type, i can only assume that there must be some good reason for
breaking this pattern in python; i just can't see what it could
possibly be.

could somebody enlighten me, please? and is there any easier way to
iterate over lines in a file without resorting to ugly break statements?
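
Since the Reader class above was posted without explanation, here is one way to read it, as a runnable sketch for a current Python 3. An io.StringIO stands in for the open file, and initialising self.line in __init__ is an addition not present in the original. The trick is that the assignment happens inside the method, so the while condition is an ordinary call whose return value is tested:

```python
import io

class Reader:
    """Wrap a file-like object and remember the most recently read line."""
    def __init__(self, source):
        self.source = source
        self.line = ""            # added here so the attribute always exists
    def readline(self):
        self.line = self.source.readline()
        return self.line          # '' at end of file, which tests false

reader = Reader(io.StringIO("one\ntwo\n"))
lines = []
while reader.readline():          # the call both fetches and tests the line
    lines.append(reader.line)     # the stored copy is what the body uses
print(lines)                      # prints ['one\n', 'two\n']
```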
Roger Upole
2000-07-20 00:32:13 UTC
Permalink
Using an initial read is a common enough idiom in any language.

f=open('filename','r')
fline = f.readline()
while fline:
    ....
    fline = f.readline()

Roger Upole
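
As a point of comparison, the initial-read shape can be set beside a form in which readline() is written only once. This is a sketch under assumptions: it relies on the two-argument iter(callable, sentinel) built-in, which later Python releases provide but the interpreters of this thread's era did not, and the function names are invented for the example:

```python
import io

def read_with_initial_read(source):
    """The shape shown above: readline() appears before the loop and inside it."""
    lines = []
    fline = source.readline()
    while fline:
        lines.append(fline)
        fline = source.readline()
    return lines

def read_with_sentinel(source):
    """The same traversal with readline() written once."""
    lines = []
    for fline in iter(source.readline, ""):   # stops when readline() returns ''
        lines.append(fline)
    return lines

text = "a\nb\n\nc\n"
print(read_with_initial_read(io.StringIO(text)) == read_with_sentinel(io.StringIO(text)))
```

Both functions return the same list, and both handle an empty file by returning an empty list.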

"nobody" <no at bo.dy> wrote in message
Post by nobody
assume i want to iterate a block of code over every line in a text file,
and that i don't want to snarf the whole thing into memory for fear
of coredumps or whatever. in perl (and many others) there is a simple,
while (line = <FILE>) { block; }
this doesn't seem to have a simple, direct analog in python. searching
around on the web i found a solution at faqts.com (though this newbie
self.source = source
line = self.source.readline()
self.line = line
return line # may be empty, thus false
file = Reader(open("filename")) # i might have got this worng...?
line = file.line
now, this seems to me like an awful lot of typing just to get around
the fact that assignments in python do not seem to be expressions
returning the value assigned. since that is thus in several other
languages, and since it gives rise to several common idioms of this
type, i can only assume that there must be some good reason for
breaking this pattern in python; i just can't see what it could
possibly be.
could somebody enlighten me, please? and is there any easier way to
iterate over lines in a file without resorting to ugly break statements?
Olivier Dagenais
2000-07-20 00:51:20 UTC
Permalink
You can get rid of the initial read if you have a function that returns
true/false and one of the parameters is passed by reference to be the actual
item. I'm not sure Python does by-reference arguments, though...

For example (I'm not sure this works, think of it as a pseudo-example...):

def enumerate ( retval ):
    if outOfElements():
        return 0
    else:
        retval = nextElement()
        return 1

Then you call it like so:

while enumerate ( currentValue ):
    print currentValue
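
On the by-reference doubt: assigning to a parameter inside a Python function only rebinds the local name, so the pseudo-example would not update currentValue in the caller. A hedged sketch of one workaround, for a current Python 3, passes a mutable one-slot list and stores into it. The names next_item, pending and current are hypothetical:

```python
def rebind(value):
    value = "changed"       # rebinds the local name only

original = "unchanged"
rebind(original)            # the caller's variable is not affected

def next_item(items, out):
    """Move the next element of `items` into out[0]; return 1, or 0 when empty."""
    if not items:
        return 0
    out[0] = items.pop(0)   # mutate the caller's list instead of rebinding a name
    return 1

pending = ["a", "b", "c"]
current = [None]            # one-slot list acting as the "by-reference" argument
collected = []
while next_item(pending, current):
    collected.append(current[0])

print(original, collected)  # prints unchanged ['a', 'b', 'c']
```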


--
----------------------------------------------------------------------
Olivier A. Dagenais - Carleton University - Computer Science III


"Roger Upole" <rupole at compaq.net> wrote in message
Using an initial read is a common enough idiom in any language.
f=open('filename','r')
fline = f.readline()
....
fline = f.readline()
Roger Upole
"nobody" <no at bo.dy> wrote in message
Post by nobody
assume i want to iterate a block of code over every line in a text file,
and that i don't want to snarf the whole thing into memory for fear
of coredumps or whatever. in perl (and many others) there is a simple,
while (line = <FILE>) { block; }
this doesn't seem to have a simple, direct analog in python. searching
around on the web i found a solution at faqts.com (though this newbie
self.source = source
line = self.source.readline()
self.line = line
return line # may be empty, thus false
file = Reader(open("filename")) # i might have got this worng...?
line = file.line
now, this seems to me like an awful lot of typing just to get around
the fact that assignments in python do not seem to be expressions
returning the value assigned. since that is thus in several other
languages, and since it gives rise to several common idioms of this
type, i can only assume that there must be some good reason for
breaking this pattern in python; i just can't see what it could
possibly be.
could somebody enlighten me, please? and is there any easier way to
iterate over lines in a file without resorting to ugly break statements?
robin
2000-07-21 13:21:27 UTC
Permalink
....
return 0
return 1
launch_counterattack()
For those who remember the seventies... and eighties... and... it's
more like:

def is_there_an_attack():
    return launch_attack()

def launch_attack():
    if is_there_an_attack():
        launch_attack()
    return 1

is_there_an_attack()

-----
robin robin at illusionsexeculink.com
media artist / remove illusions to reply
information architect www.execulink.com/~robin/

"I choked on my words but just couldn't spit. It will affect a lot of
physics -- it should have fallen from a bridge, but..." an official
said Wednesday. He was 74.
Grant Edwards
2000-07-22 00:43:41 UTC
Permalink
Isn't the argument for readlines() a memory size, as in 512 bytes? If so,
it is possible to have 3509 lines that are <= 512 bytes.
Last time I counted, a newline took up one byte in a file for
Unix/MacOS and two bytes for MS-DOS/Windows. Therefore, the
upper bound is 512 for number of lines in a 512 byte file.

There may be some degenerate way to put more than 512 lines in
a 512 byte file under one of the eight gazillion file formats
supported by VMS, but off the top of my head, I don't think so.

NB: it looked like to me that the size in question was the sum
of the lengths of all of the lines read, and not the number of
lines read.
--
Grant Edwards grante Yow! I'm sitting on my
at SPEED QUEEN... To me,
visi.com it's ENJOYABLE... I'm
WARM... I'm VIBRATORY...
Tim Peters
2000-07-24 01:43:43 UTC
Permalink
[Cliff Crawford]
No, what I meant was that I thought readlines() always reads the entire
file, whether you specify a size argument or not, and so if you were
worried about "for line in file.readlines():" eating too much memory,
you could just do "for line in file.readlines(8192):" instead, and it
would read the entire file using only an 8k buffer. At least, that's
what I thought the library reference was saying..but the source code
proved me wrong :)
Ah. It *is* normally used to read the entire file, but with another level
of loop:

while 1:
    # read next batch of lines
    lines = file.readlines(8192) # or larger for more speed
    if not lines:
        break
    for line in lines:
        process(line)
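
The two-level loop can also be packaged so that callers see a plain "for line in ..." again, which is roughly what the xreadlines() wished for earlier in the thread would offer. The following is a sketch only, assuming a current Python 3: it uses a generator (yield), which the Python of this thread did not have, and lazy_lines is an invented name:

```python
import io

def lazy_lines(source, hint=8192):
    """Yield lines from `source`, fetching them in batches of roughly `hint` bytes."""
    while 1:
        lines = source.readlines(hint)   # next batch; empty list at end of file
        if not lines:
            break
        for line in lines:
            yield line

text = "".join("row %d\n" % i for i in range(1000))
total = 0
for line in lazy_lines(io.StringIO(text), hint=64):
    total = total + len(line)
print(total == len(text))                # prints True
```

Only one batch of lines is held at a time, so memory use is bounded by the hint rather than by the file size.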
I do agree, though, that the current behavior of readlines() is quite
useful for its >intended< purpose ;)
Well, the "hint" argument is there to let you turn yourself into a compiler
<wink>.
Alex Martelli
2000-07-22 07:56:30 UTC
Permalink
"Daley, Mark W" <mark.w.daley at intel.com> wrote in message
Isn't the argument for readlines() a memory size, as in 512 bytes? If so,
it is possible to have 3509 lines that are <= 512 bytes.
How? Each line occupies at least one byte, its \n...


Alex
Moshe Zadka
2000-07-22 06:29:42 UTC
Permalink
Preventing world war III, when Python achieves world domination.
If C is allowed to achieve world domination, WWIII will be cause by
if(everything_is_ok = 0) {
launch_missile();
}
No, that will *prevent* WW3 happening when it was supposed to.
Where do I sign up for the campaign to help C dominate the
world? :-)
You probably need to reread the condition.
--
Moshe Zadka <moshez at math.huji.ac.il>
There is no GOD but Python, and HTTP is its prophet.
http://advogato.org/person/moshez
Gareth McCaughan
2000-07-23 16:51:06 UTC
Permalink
Preventing world war III, when Python achieves world domination.
If C is allowed to achieve world domination, WWIII will be cause by
if(everything_is_ok = 0) {
launch_missile();
}
No, that will *prevent* WW3 happening when it was supposed to.
Where do I sign up for the campaign to help C dominate the
world? :-)
You probably need to reread the condition.
I don't think so. |everything_is_ok| goes to 0 and
stays there, but that's OK because the missiles
can't get launched on account of the condition
being (apart from the assignment) equivalent to
if (0) { ... }

Of course, if somewhere else there's a bit of code
like

if (!everything_is_ok) {
mobilize_armies();
invade_nearest_enemy();
}

then we might get WW3 anyway. :-)

(If your point was that there might be such a piece of
code elsewhere then of course you're right; but I didn't
misread or misunderstand the condition.)
--
Gareth McCaughan Gareth.McCaughan at pobox.com
sig under construction
Cliff Crawford
2000-07-24 01:07:07 UTC
Permalink
* Tim Peters <tim_one at email.msn.com> menulis:
|
| > Ugh, you're right (I checked fileobject.c to make sure). That makes
| > readlines() a lot less useful than I thought it was..:(
|
| How so? It *can't* take the size argument exactly as given, lest it return
| a partial line at the end of the returned list more often than not. Most
| people would consider it "a lot less useful" then <wink>.

No, what I meant was that I thought readlines() always reads the entire
file, whether you specify a size argument or not, and so if you were
worried about "for line in file.readlines():" eating too much memory,
you could just do "for line in file.readlines(8192):" instead, and it
would read the entire file using only an 8k buffer. At least, that's
what I thought the library reference was saying..but the source code
proved me wrong :)
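
Since each readlines(size) call returns only about one chunk's worth of lines, covering the whole file takes repeated calls until an empty list comes back. A minimal sketch of that loop, assuming the io objects of current Python 3 (where the argument is documented as a hint) rather than the 2000-era file object; the helper name iter_lines_in_chunks is hypothetical:

```python
import io

def iter_lines_in_chunks(f, sizehint=8192):
    """Yield every line of f, fetching roughly sizehint characters per call."""
    while True:
        lines = f.readlines(sizehint)  # whole lines only; the size is a hint
        if not lines:                  # an empty list means end of file
            break
        for line in lines:
            yield line

# In-memory stand-in for a real file, so the sketch runs anywhere.
text = "".join("row %d\n" % i for i in range(100))

first_chunk = io.StringIO(text).readlines(10)   # a single call stops early
all_lines = list(iter_lines_in_chunks(io.StringIO(text), 10))
```

Here first_chunk holds only the first few lines, while all_lines holds all 100, which is the difference between one readlines(size) call and the repeated calls.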

I do agree, though, that the current behavior of readlines() is quite
useful for its >intended< purpose ;)
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
Synaesthesia now! icq 68165166
Daley, Mark W
2000-07-21 22:42:17 UTC
Isn't the argument for readlines() a memory size, as in 512 bytes? If so,
it is possible to have 3509 lines that are <= 512 bytes.

Just my two cents...

- Mark

-----Original Message-----
From: cjc26 at nospam.cornell.edu [mailto:cjc26 at nospam.cornell.edu]
Sent: Friday, July 21, 2000 1:32 PM
To: python-list at python.org
Subject: Re: iterating over lines in a file


* David Bolen <db3l at fitlinxx.com> wrote:
|
| > readlines() has an optional argument which specifies the approximate
| > number of bytes to read in at a time, rather than the entire file.
| > So something like
| >
| > for line in file.readlines(8192):
| > # process line
| >
| > would only use about 8k of memory.
|
| And only fully process files less than 8K in size. The call to
| file.readlines(8192) returns the list of lines contained within the
| first 8K of the file (approximately), and that's all the 'for' is
| going to iterate over. You have to repeatedly call file.readlines()
| again to keep reading the file, which puts you pretty much back in the
| original readline() mode, just with bigger chunks.

That doesn't seem to be true--readlines() reads the whole file whether
f=open("file.txt")
size=0
... size=size+len(line)
...
size
3509
f.close()
f=open("file.txt")
size=0
... size=size+len(line)
...
size
3509
--
cliff crawford -><- http://www.people.cornell.edu/pages/cjc26/
Synaesthesia now! icq 68165166
Tim Peters
2000-07-23 16:07:35 UTC
[Cliff Crawford, reacting to that the optional "size" argument to
readlines() is treated as a hint, & that the implementation rounds
it in platform-specific ways to a "natural" buffer size]
Post by Cliff Crawford
Ugh, you're right (I checked fileobject.c to make sure). That makes
readlines() a lot less useful than I thought it was..:(
How so? It *can't* take the size argument exactly as given, lest it return
a partial line at the end of the returned list more often than not. Most
people would consider it "a lot less useful" then <wink>.
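
The point about partial lines can be checked directly. A small sketch, assuming current Python 3 and an in-memory io.StringIO in place of a real file (the buffer rounding discussed above belongs to the older C file object and is not reproduced here):

```python
import io

f = io.StringIO("abcdefgh\nsecond line\n")

first = f.readlines(4)   # the hint falls in the middle of the first line
rest = f.readlines()     # no hint: everything that is left
```

Even though the hint is smaller than the first line, first still contains that line whole; nothing is split between first and rest.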
Moshe Zadka
2000-07-20 06:39:05 UTC
Post by nobody
this doesn't seem to have a simple, direct analog in python. searching
around on the web i found a solution at faqts.com (though this newbie
self.source = source
line = self.source.readline()
self.line = line
return line # may be empty, thus false
Just put the class in a module you write, and use it everywhere
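
A sketch of what such a wrapper class might look like when written out in full. The class name LineReader and the driver loop are hypothetical, filled in only for illustration; the method body follows the fragment quoted above:

```python
import io

class LineReader:  # hypothetical name, not from the quoted FAQ entry
    def __init__(self, source):
        self.source = source
        self.line = ""

    def readline(self):
        line = self.source.readline()
        self.line = line
        return line  # may be empty, thus false

# Usage: the call sits in the loop condition, the value is read back
# from the attribute, so no assignment expression is needed.
reader = LineReader(io.StringIO("first\nsecond\n"))
collected = []
while reader.readline():
    collected.append(reader.line)
```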
Post by nobody
now, this seems to me like an awful lot of typing just to get around
the fact that assignments in python do not seem to be expressions
returning the value assigned. since that is thus in several other
languages, and since it gives rise to several common idioms of this
type, i can only assume that there must be some good reason for
breaking this pattern in python; i just can't see what it could
possibly be.
Preventing world war III, when Python achieves world domination.
If C is allowed to achieve world domination, WWIII will be caused by

if(everything_is_ok = 0) {
    launch_missile();
}

Whereas since Python will achieve world domination,

if everything_is_ok = 0:
    launch_missile()

is a syntax error, and will be detected before it is allowed into the
automated defense system.
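
The claim is easy to check without launching anything: handing the snippet to the built-in compile() shows it is rejected before any of it runs. A small sketch; the filename string "<defense_system>" is just a label made up for the example:

```python
# The Python snippet from above, as source text.
source = "if everything_is_ok = 0:\n    launch_missile()\n"

try:
    compile(source, "<defense_system>", "exec")
except SyntaxError:
    rejected = True    # refused at compile time; launch_missile is never looked up
else:
    rejected = False
```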
--
Moshe Zadka <moshez at math.huji.ac.il>
There is no GOD but Python, and HTTP is its prophet.
http://advogato.org/person/moshez
Radovan Garabik
2000-07-20 15:24:06 UTC
Moshe Zadka <moshez at math.huji.ac.il> wrote:
...

: Preventing world war III, when Python achieves world domination.
: If C is allowed to achieve world domination, WWIII will be caused by

: if(everything_is_ok = 0) {
:     launch_missile();
: }

: Whereas since Python will achieve world domination,

: if everything_is_ok = 0:
:     launch_missile()

: is a syntax error, and will be detected before it is allowed into the
: automated defense system.

nah... WWIII will be started with this code:

def check_if_there_is_an_attack():
    ....
    if ....:
        return 0
    else:
        return 1

if check_if_there_is_an_attack:
    launch_counterattack()
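
The trap in the code above is the missing parentheses: the condition tests the function object itself, which is always true, instead of the value the function returns. A minimal sketch, with a made-up function body that always reports "no attack":

```python
def check_if_there_is_an_attack():
    return 0  # hypothetical body: report "no attack"

# Without the call, the function object itself is tested.
without_call = bool(check_if_there_is_an_attack)

# With the call, the returned value is tested.
with_call = bool(check_if_there_is_an_attack())
```

without_call is true no matter what the function would return, so the counterattack branch is always taken; with_call reflects the actual result.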
--
-----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ melkor.dnp.fmph.uniba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!