Discussion:
How can I get text of the body (payload) of an email?
andrew blah
2004-10-16 09:00:52 UTC
I'm puzzled. Josiah suggested that this would allow me to get the
payload of an email message.

body = message.split('\r\n\r\n', 1)[1]

As I understand it, the headers of an email are terminated by a blank
line, after which comes the message payload. A blank line being
represented by \r\n\r\n

After trying Josiah's above suggestion on many emails and failing to
get it to work, I found that in fact the following works:

self.raw_data.split('\n\n', 1)[1]

But this doesn't agree with my understanding of the RFC822 email
format, which is that the blank line should be represented by \r\n\r\n

Can anyone suggest where my understanding is wrong?
Thanks

Andrew Stuart
Josiah Carlson
2004-10-16 01:44:19 UTC
Can anyone suggest a convenient way to get access to the raw message
payload?
body = message.split('\r\n\r\n', 1)[1]

- Josiah
Josiah Carlson
2004-10-16 15:31:56 UTC
Post by andrew blah
I'm puzzled. Josiah suggested that this would allow me to get the
payload of an email message.
body = message.split('\r\n\r\n', 1)[1]
As I understand it, the headers of an email are terminated by a blank
line, after which comes the message payload. A blank line being
represented by \r\n\r\n
After trying Josiah's above suggestion on many emails and failing to
get it to work, I found that in fact the following works:
self.raw_data.split('\n\n', 1)[1]
But this doesn't agree with my understanding of the RFC822 email
format, which is that the blank line should be represented by \r\n\r\n
Can anyone suggest where my understanding is wrong?
Thanks
Your understanding isn't wrong, but somehow you are acquiring emails
with only line feed line endings. This may be a case of opening a
file and getting universal line-ending support (which tosses '\r'), or
of some other processing you do stripping it out (I don't use the email
package, so I don't know what it may or may not be doing).
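
To illustrate the point about line endings, here is a minimal sketch (not from the thread) of a split that accepts either separator. The helper name split_headers_body and the sample messages are made up for illustration, and the sketch assumes the raw message is held as bytes.

```python
import re

# Matches the blank line that ends the headers, with CRLF or bare LF endings.
_HEADER_END = re.compile(rb"\r?\n\r?\n")


def split_headers_body(raw):
    """Return (headers, body) for a raw message given as bytes.

    Hypothetical helper: index 0 of the split is the header block,
    index 1 is everything after the first blank line.
    """
    parts = _HEADER_END.split(raw, maxsplit=1)
    if len(parts) == 1:
        # No blank line found: treat the whole input as headers.
        return parts[0], b""
    return parts[0], parts[1]


# Made-up sample messages, one with CRLF endings and one with LF only.
crlf_message = b"From: a@example.com\r\nSubject: hi\r\n\r\nline one\r\nline two\r\n"
lf_message = b"From: a@example.com\nSubject: hi\n\nline one\nline two\n"

headers, body = split_headers_body(crlf_message)
print(body)
```

Note that the body keeps whatever line endings the input had, so the same message stored with CRLF and with LF endings would still hash differently.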

A known method of normalizing line endings for data that could come from
anywhere is through the use of regular expressions:

email = re.sub('(\r\n|\r|\n)', '\r\n', email_with_ambiguous_line_endings)


If you know your data to be good on disk, perhaps it would be better to
open files as 'rb' to make sure that universal line ending support is
not used.
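
As a rough illustration of both suggestions, the sketch below (written for current Python 3, not the Python of this thread) normalizes line endings with a regular expression and compares binary-mode and text-mode reads of the same file. The function name normalize_line_endings and the sample message are assumptions made for the example.

```python
import os
import re
import tempfile


def normalize_line_endings(text):
    """Rewrite every CRLF, lone CR, or lone LF as CRLF."""
    # re.sub takes (pattern, replacement, string), in that order.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


# Write a small CRLF message to a temporary file (made-up sample data).
sample = b"Subject: hi\r\n\r\nbody text\r\n"
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as handle:
        handle.write(sample)

    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as handle:
        raw_bytes = handle.read()

    # Default text mode translates line endings to "\n" while reading.
    with open(path, "r", encoding="ascii") as handle:
        translated_text = handle.read()
finally:
    os.remove(path)

print(repr(raw_bytes))
print(repr(translated_text))
print(repr(normalize_line_endings(translated_text)))
```

The pattern lists '\r\n' first so that a CRLF pair is replaced once rather than being treated as two separate line endings.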

- Josiah
andrew blah
2004-10-16 01:29:48 UTC
Hello,

I need to get the text of the body (the payload) of an email.

As I understand it, an email has headers at the top, then a blank line,
then the body of the message.

I want to get the text of the body - every character from the new line
after the headers until the end of the message.

My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.

Can anyone suggest a convenient way to get access to the raw message
payload?

Thanks in advance for your help.

Andrew Stuart
unknown
2004-10-16 01:55:54 UTC
Can anyone suggest a convenient way to get access to the raw message
payload?
If you're using the mailbox module, the body text is what you get
from message.fp.read() where message is an rfc822 message object
from reading the mailbox. Is that what you wanted to know?
M.E.Farmer
2004-10-16 22:49:05 UTC
Post by andrew blah
I need to get the text of the body (the payload) of an email.
As I understand it, an email has headers at the top, then a blank line,
then the body of the message.
I want to get the text of the body - every character from the new line
after the headers until the end of the message.
[headers]
[blank line]
[body]

You explained how to do it ;)
Post by andrew blah
I want to get the text of the body - every character from the new line
after the headers until the end of the message.
If you just find the first blank line then the next line is the start
of the email body ;)

import poplib
Mail = poplib.POP3('mail.yourserver.net')
Mail.user('username')
Mail.pass_("userpass")
# just get the first message
MyMessage = Mail.retr(1)
FullText = ""
PastHeaders = 0
for MsgLine in MyMessage[1]:
    if PastHeaders == 0:
        # the first empty line marks the end of the headers
        if len(MsgLine) == 0:
            PastHeaders = 1
    else:
        FullText += MsgLine + '\n'
Mail.quit()
print FullText
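
The header-skipping loop above can also be tried without a mail server. The following is only a sketch for Python 3, where poplib returns each line as bytes with the line terminator already stripped; the function name body_from_lines, its eol parameter, and the sample lines are hypothetical.

```python
def body_from_lines(lines, eol=b"\n"):
    """Join the lines that follow the first empty line.

    `lines` is a list of bytes objects without line terminators, such as
    the second item of the tuple returned by poplib.POP3.retr().
    `eol` (a hypothetical parameter) is appended to every body line.
    """
    past_headers = False
    body_lines = []
    for line in lines:
        if not past_headers:
            # The first empty line marks the end of the headers.
            if len(line) == 0:
                past_headers = True
        else:
            body_lines.append(line + eol)
    return b"".join(body_lines)


# Made-up sample data standing in for a retrieved message.
sample_lines = [
    b"From: a@example.com",
    b"Subject: hi",
    b"",
    b"line one",
    b"",
    b"line two",
]
print(body_from_lines(sample_lines))
```

Because the original terminators are gone by the time the lines arrive, the rebuilt body only matches the bytes on the server if eol is set to the line ending the server used, which matters when the goal is a hash of the body.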

This is from Python 2.1 Bible (Dave Brueck, Stephen Tanner) ;)
That book is still an awesome reference today!
Post by andrew blah
My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.
Can anyone suggest a convenient way to get access to the raw message
payload?
Thanks in advance for your help.
HTH,
M.E.Farmer :)
Jeffrey Froman
2004-10-16 15:06:53 UTC
Post by andrew blah
I want to get the text of the body - every character from the new line
after the headers until the end of the message.
My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.
Funny, I recently undertook the same task. Here's my solution:

import email.Iterators
import sha

# foo is the raw message text as a string
msg = email.message_from_string(foo)
x = sha.new()
for line in email.Iterators.body_line_iterator(msg):
    x.update(line)
hash = x.digest()

This very cool iterator returns every body line, but skips all the headers,
including the headers present in each sub-part of the email. If you only
want plain text parts, you might combine this iterator with
email.Iterators.typed_subpart_iterator().
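
For readers on Python 3, where the sha module no longer exists and the iterators live in email.iterators, the same idea might look like the sketch below. The function name body_sha1, the UTF-8 encoding step, and the sample messages are assumptions made for the example, not part of the original solution.

```python
import email
import hashlib
from email.iterators import body_line_iterator


def body_sha1(raw_message):
    """Return the SHA-1 hex digest of the body lines of a message string."""
    msg = email.message_from_string(raw_message)
    digest = hashlib.sha1()
    for line in body_line_iterator(msg):
        # Lines come back as str here; encoding them as UTF-8 is an assumption.
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


# Two made-up messages with different headers but the same body.
raw_a = "From: a@example.com\nSubject: first\n\nline one\nline two\n"
raw_b = "From: b@example.com\nSubject: second\n\nline one\nline two\n"

print(body_sha1(raw_a))
print(body_sha1(raw_a) == body_sha1(raw_b))
```

Since only body lines are fed to the hash, two messages that differ only in their headers produce the same digest.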

Jeffrey
