Content-Length must be the length of original, before compression.
So try
print "Content-Length: %d" % os.path.getsize("test.html")
Thanks again for the suggestion JJ.
However, my reading of RFC2616 tells me otherwise. Though I could be
wrong, since there is much room for interpretation in HTTP specs.
http://www.ietf.org/rfc/rfc2616.txt
From Section: 7.2.2 Entity Length
The entity-length of a message is the length of the message-body
before any transfer-codings have been applied. Section 4.4 defines
how the transfer-length of a message-body is determined.
AK> So the "entity-length" is the uncompressed length of the file
From Section: 4.4 Message Length
The transfer-length of a message is the length of the message-body
as it appears in the message; that is, after any transfer-codings
have been applied. When a message-body is included with a message,
the transfer-length of that body is determined by one of the
following (in order of precedence):
AK> And the "transfer-length" is the compressed length of the file,
AK> i.e. the length as it appears in the message
1.Any response message which "MUST NOT" include a message-body
(such as the 1xx, 204, and 304 responses
[elided, not relevant]
2.If a Transfer-Encoding header field (section 14.41) is present
[elided, not relevant]
3.If a Content-Length header field (section 14.13) is present,
its decimal value in OCTETs represents both the entity-length
and the transfer-length. The Content-Length header field MUST NOT
be sent if these two lengths are different (i.e., if a
Transfer-Encoding header field is present). If a message is
received with both a Transfer-Encoding header field and a
Content-Length header field, the latter MUST be ignored.
AK> So according to this, since the "entity-length" and the
AK> "transfer-length" are different in this case, I shouldn't be
AK> sending a "Content-length" at all! (Which I tried, and it didn't
AK> work)
4.If the message uses the media type "multipart/byteranges",
[elided, not relevant]
5.By the server closing the connection. (Closing the conn...
[elided, not relevant]
For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
containing a message-body MUST include a valid Content-Length
[elided, relevant to requests only, not responses]
All HTTP/1.1 applications that receive entities MUST accept the
"chunked" transfer-coding (section 3.6), thus allowing this
mechanism to be used for messages when the message length cannot
be determined in advance.
Messages MUST NOT include both a Content-Length header field
and a non-identity transfer-coding. If the message does include
a non-identity transfer-coding, the Content-Length MUST be ignored.
AK> This appears to agree with point 3 above about not sending a
AK> Content-length when the "entity-length" and the "transfer-length"
AK> are different.
When a Content-Length is given in a message where a message-body
is allowed, its field value MUST exactly match the number of OCTETs
in the message-body. HTTP/1.1 user agents MUST notify the user when
an invalid length is received and detected.
AK> This seems to me the conclusive statement. If "Content-length"
AK> is present, it MUST represent the length of the message body, i.e.
AK> the compressed length of the file.
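To make that reading concrete, here is a minimal sketch in Python 3 of a
response whose Content-Length counts the compressed octets. The function
name build_gzip_response and the sample HTML are made up for illustration;
this is my interpretation of the quoted text, not a tested fix for the
problem described in this thread.

```python
import gzip


def build_gzip_response(body: bytes, content_type: str = "text/html") -> bytes:
    """Return headers plus a gzip-coded body (hypothetical helper).

    Content-Length is computed from the compressed bytes, i.e. the
    octets that actually appear in the message-body.
    """
    compressed = gzip.compress(body)
    headers = (
        "Content-Type: %s\r\n"
        "Content-Encoding: gzip\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (content_type, len(compressed))
    )
    return headers.encode("ascii") + compressed


if __name__ == "__main__":
    html = b"<html><body>" + b"hello " * 200 + b"</body></html>"
    response = build_gzip_response(html)
    # Split at the first blank line: everything after it is the message-body.
    head, _, payload = response.partition(b"\r\n\r\n")
    print(head.decode("ascii"))
    print("payload octets:", len(payload))
```

In a CGI script the returned bytes would need to go to a binary stream
(sys.stdout.buffer in Python 3) rather than through print.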
I think that there is confusion around the interpretation of
"Content-length" because of lack of clarity on the difference
between "Content-encoding" and "Transfer-encoding".
"Content-encoding" is supposed to represent the inherent encoding
of the entity (i.e. file) being transferred. Most likely it was
intended to communicate that a file was being sent which was
compressed in some way, and which permanently exists in
compressed format.
"Transfer-encoding" is supposed to be a transient thing, lasting
only for the duration of the tx/rx of the HTTP message. That is,
it is a mechanism for temporarily encoding (compressing) a file
purely so that it can be transmitted safely or using less
bandwidth.
The difficulty comes in deciding when to use Transfer-encoding
or Content-encoding. For example, if I dynamically generate an HTML
"file", and want to send it compressed, is the compression
inherent to the nature of the "file", or is it merely a transient
thing which is purely to save bandwidth in transmission of the file?
Of course, the answer to this question is decided by the actual
interpretation that people have made in writing their software.
The general consensus seems to be that "Content-encoding" is the
way to go. "Transfer-encoding" seems not to be used.
Or I could be wrong :-)
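For comparison, the one transfer-coding that the quoted text says every
HTTP/1.1 recipient must accept is "chunked". The sketch below (Python 3)
shows roughly what that framing looks like when applied on top of an
already gzip-coded entity, in which case no Content-length is sent at all.
The helper names encode_chunked and decode_chunked are hypothetical, the
decoder ignores chunk extensions and trailers, and as far as I understand
a CGI script would normally leave this framing to the web server.

```python
def encode_chunked(body: bytes, chunk_size: int = 1024) -> bytes:
    """Frame body using the chunked transfer-coding (simplified sketch)."""
    parts = []
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        parts.append(b"%x\r\n" % len(chunk))  # chunk size in hexadecimal
        parts.append(chunk)
        parts.append(b"\r\n")
    parts.append(b"0\r\n\r\n")  # last-chunk followed by an empty trailer
    return b"".join(parts)


def decode_chunked(data: bytes) -> bytes:
    """Reverse encode_chunked (no chunk extensions, no trailer fields)."""
    body = b""
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk


if __name__ == "__main__":
    import gzip

    entity = gzip.compress(b"<html><body>hi</body></html>")  # Content-encoding
    framed = encode_chunked(entity, chunk_size=16)           # Transfer-encoding
    assert gzip.decompress(decode_chunked(framed)) == b"<html><body>hi</body></html>"
```

The point of the contrast: the gzip step changes the entity itself, while
the chunked step only changes how those same bytes are framed on the wire.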
But of course, none of this helps me with my current problem, since
I have tried all possible combinations of Content-encoding,
Transfer-encoding, Content-length, no Content-length, etc,
etc, etc, etc.
I'm not quite sure what to try next.
I think I am either
1. Sending the wrong compressed data
2. Sending correct compressed data in a way that is resulting in
corruption
3. Missing out on some further processing that must be conducted on
the message.
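One way to narrow down the first two possibilities is to capture the bytes
that were actually sent and check them offline. The sketch below (Python 3)
is only a diagnostic aid under my own assumptions: diagnose_payload is a
hypothetical helper, and the zlib check is there because zlib.compress()
produces a zlib-wrapped stream rather than the gzip file format, which is
one plausible way of "sending the wrong compressed data".

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def diagnose_payload(payload: bytes, original: bytes) -> str:
    """Classify a captured message-body against the expected original."""
    if not payload.startswith(GZIP_MAGIC):
        # Possibility 1: valid compressed data, but in the wrong container.
        try:
            if zlib.decompress(payload) == original:
                return "zlib stream, not gzip"
        except zlib.error:
            pass
        return "not gzip"
    try:
        restored = gzip.decompress(payload)
    except (OSError, EOFError, zlib.error) as exc:
        # Possibility 2: gzip header present, but the data was damaged or
        # truncated somewhere between compression and the wire.
        return "corrupt gzip: %s" % exc
    if restored != original:
        return "gzip decodes, but to different data"
    return "ok"


if __name__ == "__main__":
    page = b"<html><body>" + b"hello " * 50 + b"</body></html>"
    print(diagnose_payload(gzip.compress(page), page))
    print(diagnose_payload(zlib.compress(page), page))
    print(diagnose_payload(gzip.compress(page)[:-10], page))
```

If the payload comes back as corrupt, one thing worth ruling out is
newline translation from writing binary data to a text-mode stdout.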
I will get to the bottom of this.
And I will document it so no-one will have to go through this hassle
again.
Regards,
Alan.