Discussion:
using urllib or httplib to post with ENCTYPE=multipart/form-data
Kevin Carlson
2003-06-05 14:21:02 UTC
I actually only need to send a couple of text inputs to the form, but no
matter what, the host system acts like I input nothing into the form. I
Maybe the form has some Javascript that validates the data and invokes a
secret handshake, like sending extra hidden fields or returning a cookie?
Thanks for help on this one -- I got the first POST working. It was a
problem with formatting the multipart/form-data. Once I got the proper
format it worked fine. Interesting you should bring up cookies, though...

The response from the POST returns a cookie that I need to pass along to
the host on the subsequent request. I don't see any methods in httplib
to deal with cookies directly, nor can I find any in the HTTPResponse.py
source that allow me to extract the cookie from the response. I'm sure
I'm missing something. Any additional pointers?

Kevin
Kevin Carlson
2003-06-04 20:28:20 UTC
Hi,

I am trying to post to a form using httplib or urllib. I have done this
successfully before with httplib, but when the enctype must be
multipart/form-data, things go awry.

I tried using a header {'Content-Type' : 'multipart/form-data', ...}
that I encode with urlencode and then pass to the HTTPConnection.request
method, but the server is returning a code indicating a bad request. I
have done a lot of troubleshooting on this and the encoding type is the
only remaining issue.

Can anyone shed some light on how to accomplish a POST of this type?

Thanks,

Kevin
John J. Lee
2003-06-05 18:06:49 UTC
Kevin Carlson <khcarlso at bellsouth.net> writes:
[...]
Post by Kevin Carlson
I actually only need to send a couple of text inputs to the form,
but no matter what, the host system acts like I input nothing into
[...]
Post by Kevin Carlson
Thanks for help on this one -- I got the first POST working. It was a
problem with formatting the multipart/form-data. Once I got the
proper format it worked fine. Interesting you should bring up
cookies, though...
And as I just replied to Kevin, though he successfully solved the
problem by sniffing browser traffic (certainly often the quickest way
to get something working, though things can get messy that way), I
just realised that, contrary to my own statement, ClientForm *would*
have worked (barring bugs...). I forgot that the whole form gets sent
back as multipart/form-data. You don't need any FILE control for it
to work, and as long as the HTML says to use that encoding (or you ask
for it explicitly), it should work fine (the alpha version, that is).
Post by Kevin Carlson
The response from the POST returns a cookie that I need to pass along
to the host on the subsequent request. I don't see any methods in
httplib to deal with cookies directly, nor can I find any in the
HTTPResponse.py source that allow me to extract the cookie from the
response. I'm sure I'm missing something. Any additional pointers?
http://wwwsearch.sourceforge.net/ClientCookie/

:-)

If this is a public site, please let me know the URL: I haven't found
a public site that doesn't require a signup and that uses both cookies
and forms that would be good for example code (not that I've looked
very hard...).


John
John J. Lee
2003-06-09 20:10:00 UTC
Kevin Carlson <nskhcarlso at bellsouth.net> writes:
[...]
Is there any way to use ClientCookie with httplib rather than urllib
or urllib2?
No reason why not, but IMHO perverse. :-)

You just have to call extract_cookies and add_cookie_header manually,
which means you need to make a couple of objects that satisfy the
request and response interfaces specified in the docstrings for those
methods (easy enough I guess, though I haven't done it myself):

from ClientCookie import Cookies
print Cookies.extract_cookies.__doc__
print Cookies.add_cookie_header.__doc__
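
As a rough illustration of the "couple of objects" idea, here is a minimal sketch in current Python 3, using http.cookiejar from the standard library (the descendant of ClientCookie, which kept the extract_cookies and add_cookie_header method names). It is not ClientCookie 0.4.1a itself, and the FakeResponse class, the URLs, and the cookie value are all made up for the example. The request side is covered by urllib.request.Request, which already has the interface the jar expects; the response only needs an info() method returning the headers.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for a response received some other way
    (for example over a raw HTTP connection). The jar only calls info()."""

    def __init__(self, headers):
        self._msg = email.message.Message()
        for name, value in headers:
            # Message item assignment appends, so repeated headers are kept.
            self._msg[name] = value

    def info(self):
        return self._msg


jar = http.cookiejar.CookieJar()

# First exchange: pretend the server answered with a Set-Cookie header.
first = urllib.request.Request("http://www.example.com/login")
reply = FakeResponse([("Set-Cookie", "session=abc123; Path=/")])
jar.extract_cookies(reply, first)

# Second exchange: let the jar decide which cookies belong on the request.
second = urllib.request.Request("http://www.example.com/next")
jar.add_cookie_header(second)

# A request to an unrelated host should not receive the cookie.
third = urllib.request.Request("http://other.example.org/")
jar.add_cookie_header(third)

print(second.get_header("Cookie"))
```

The header value computed for the second request can then be copied onto whatever low-level connection object is actually sending the request.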

I just released a 0.4.1a version. Despite the alpha status, I
recommend using that if you're writing new code (or even if you're
not, probably).


John
John J. Lee
2003-06-10 12:30:42 UTC
jjl at pobox.com (John J. Lee) writes:
[...]
- req_domain, req_path = urlparse.urlparse(request,get_full_url())
+ req_domain, req_path = urlparse.urlparse(request.get_full_url())[1:3]
[...]


John
Kevin Carlson
2003-06-09 20:42:02 UTC
Post by John J. Lee
[...]
No reason why not, but IMHO perverse. :-)
I agree, it is easier to use urllib, but in this case I have to post
data in multipart/form-data which means MIME. I can build the proper
request and send it using httplib, but haven't been able to figure out
how to do this with urllib.

If I create the MIME data as 'data', and then post the request using
urllib.request(URL, data), it doesn't seem to post the form correctly.
If I do the same with httplib it works fine. Am I missing something?

Thanks for the help!
John J. Lee
2003-06-10 11:33:07 UTC
Post by Kevin Carlson
Post by John J. Lee
[...]
No reason why not, but IMHO perverse. :-)
I agree, it is easier to use urllib, but in this case I have to post
Easier still to use urllib2. AFAICS, urllib2 is better than urllib,
period. I don't think it makes sense to use urllib for new code --
perhaps it should be deprecated.

I suspect there's a perception amongst Python web-scrapers that using
urllib2 means you're stuck when you get down to the nitty gritty of
HTTP headers, but that's not true. Not sure how true that is of
urllib, because I've rarely used it.
Post by Kevin Carlson
data in multipart/form-data which means MIME. I can build the proper
request and send it using httplib, but haven't been able to figure out
how to do this with urllib.
Like I said, ClientForm (0.1.2a) should do this automatically (though
I admit the multipart/form-data stuff is dodgy ATM). Regardless of
presence or absence of file upload controls.

response = urllib2.urlopen("http://www.example.com/")
forms = ClientForm.ParseResponse(response)
form = forms[0]
response2 = urllib2.urlopen(form.click())
Post by Kevin Carlson
If I create the MIME data as 'data', and then post the request using
urllib.request(URL, data), it doesn't seem to post the form
correctly. If I do the same with httplib it works fine. Am I missing
something?
The third argument to the urllib2.Request constructor (or the
urllib2.Request.add_header method)?

request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
print response.read()
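
For readers on current Python 3, where urllib2 became urllib.request, the same idea might look like the sketch below. The URL, the field name, and the helper name build_multipart_request are assumptions made for the example; the point is only that the Content-Type header, including the boundary parameter, travels in the headers argument while the pre-built MIME body goes in data. The request is constructed but not sent.

```python
import urllib.request


def build_multipart_request(url, body, boundary):
    """Wrap an already-encoded multipart body in a POST request.

    Hypothetical helper: 'body' must be bytes that use 'boundary'
    as the part delimiter.
    """
    headers = {
        "Content-Type": "multipart/form-data; boundary=" + boundary,
    }
    # Supplying data makes this a POST; Content-Length is filled in
    # automatically when the request is opened.
    return urllib.request.Request(url, data=body, headers=headers)


body = (
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="field1"\r\n'
    b"\r\n"
    b"somedata\r\n"
    b"--AaB03x--\r\n"
)
request = build_multipart_request("http://www.example.com/form", body, "AaB03x")

# Sending would be: response = urllib.request.urlopen(request)
print(request.get_method(), request.get_header("Content-type"))
```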


Obviously though, I wouldn't do that myself -- I'd use ClientForm :-)

BTW, note that urllib2 in 2.3b1 is broken (whose fault could that be,
I wonder?).


John
Kevin Carlson
2003-06-09 21:17:03 UTC
Never mind. I don't think the ClientCookie is going to help me here.
As someone had alluded to earlier in this thread, the cookie is not coming
back as a header. I was looking through the headers and couldn't see a
Set-Cookie header -- thought I was going crazy.

As it turns out, one of the pages in the frameset contains Javascript
that creates the cookie. I can parse that for the Cookie and send it
back in subsequent requests.
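
A minimal sketch of that kind of parsing, in current Python 3, assuming the page sets the cookie with a literal document.cookie = "..." assignment. The site's actual script is not shown in this thread, so the sample HTML, the cookie name, and both function names are hypothetical; a cookie built by concatenating Javascript variables would need more work than this.

```python
import re

# Hypothetical page fragment standing in for the frameset page.
SAMPLE_HTML = '''
<script type="text/javascript">
document.cookie = "sessionid=abc123; path=/";
</script>
'''

# Matches document.cookie = "..." or document.cookie = '...'
COOKIE_ASSIGNMENT = re.compile(r'document\.cookie\s*=\s*(["\'])(.*?)\1')


def cookies_from_javascript(html):
    """Return (name, value) pairs from literal document.cookie assignments."""
    pairs = []
    for match in COOKIE_ASSIGNMENT.finditer(html):
        # The first "name=value" segment is the cookie itself;
        # anything after the first ";" is an attribute such as path.
        first = match.group(2).split(";", 1)[0]
        name, sep, value = first.partition("=")
        if sep:
            pairs.append((name.strip(), value.strip()))
    return pairs


def cookie_header(pairs):
    """Format the pairs as the value of a Cookie request header."""
    return "; ".join("%s=%s" % pair for pair in pairs)


pairs = cookies_from_javascript(SAMPLE_HTML)
print(cookie_header(pairs))
```

The resulting string can be sent as the Cookie header on later requests.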

Thanks for everyone's help!
Post by John J. Lee
[...]
Is there any way to use ClientCookie with httplib rather than urllib
or urllib2?
No reason why not, but IMHO perverse. :-)
You just have to call extract_cookies and add_cookie_header manually,
which means you need to make a couple of objects that satisfy the
request and response interfaces specified in the docstrings for those
from ClientCookie import Cookies
print Cookies.extract_cookies.__doc__
print Cookies.add_cookie_header.__doc__
I just released a 0.4.1a version. Despite the alpha status, I
recommend using that if you're writing new code (or even if you're
not, probably).
John
John J. Lee
2003-06-10 12:14:21 UTC
Post by Kevin Carlson
Never mind. I don't think the ClientCookie is going to help me
here. As someone had alluded to earlier in this thread, the cookie is not
coming back as a header. I was looking through the headers and
couldn't see a Set-Cookie header -- thought I was going crazy.
As it turns out, one of the pages in the frameset contains Javascript
that creates the cookie. I can parse that for the Cookie and send it
back in subsequent requests.
[...]

Yes, that's a pain. I'm contemplating writing a Javascript/Python
thing that would solve some of these problems for HTTP cookies and
HTML forms. Vapourware.

Anyway, there is still some point in ClientCookie even for people who
have Javascript cookies, though. ATM, it knows which cookies to
accept and return to the server, and can do that for all requests --
so you just have to set a cookie once, then forget about it and just
call urlopen (or OpenerDirector.open). In future, I imagine it will
give you access to a proper constructor for Cookie structs.

import time
import urllib2, urlparse

import ClientCookie # 0.4.1a required

cs = ClientCookie.Cookies()
opener = ClientCookie.build_opener(ClientCookie.HTTPHandler(cs))

request = urllib2.Request("http://www.example.com")
response = opener.open(request)
cookies = your_cookie_parsing_routine(request, response)
for cookie in cookies:
    cs.set_cookie_if_ok(cookie, request)

# go on using opener.open() -- cookies get sent back when they should be
# (and any that arrive via the traditional HTTP route will be set, if
# appropriate)
opener.open("http://www.example.com/blah.html")
opener.open("http://www.example.com/foo.html")


def your_cookie_parsing_routine(request, response):
    data = response.read()

    # Parse name, value, domain, path from data (ie. the HTML) here.

    # If domain and path are not specified, set the domain_specified and
    # path_specified flags accordingly, and use the request values
    # ([1:3] picks the host and path out of the urlparse result):

    req_domain, req_path = urlparse.urlparse(request.get_full_url())[1:3]
    if not domain_specified:
        domain = req_domain
    if not path_specified:
        path = req_path

    # Create a Cookie struct:
    # (ClientCookie will acquire a better way of doing this eventually)
    c = Cookie(0, name, value,
               None, 0,
               domain, domain_specified, domain.startswith("."),
               path, path_specified,
               secure, # true if must only be sent back via https
               time.time()+(3600*24*365), # expires
               0, "", "", {})

    return [c] # you may be parsing out more than one, of course


Hmm, probably ClientCookie's cookie parsers and constructors should be
factored out of Cookies, so you can supply your own parser and not
need to construct Cookie structs by hand... more refactoring ahead
(but no more interface changes, I hope!).
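
For comparison, a hedged sketch of the same hand-built-cookie approach using http.cookiejar from the current Python 3 standard library, whose Cookie constructor takes the same sequence of fields as the Cookie(...) call above. This is not the ClientCookie 0.4.1a API: set_cookie here stores the cookie without a policy check, unlike set_cookie_if_ok. The cookie name, value, and URLs are invented for the example.

```python
import time
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()

# Build the cookie by hand, as if it had been parsed out of Javascript.
cookie = http.cookiejar.Cookie(
    version=0,
    name="session",
    value="abc123",
    port=None,
    port_specified=False,
    domain="www.example.com",
    domain_specified=False,
    domain_initial_dot=False,
    path="/",
    path_specified=False,
    secure=False,  # true if it must only be sent back via https
    expires=int(time.time()) + 3600 * 24 * 365,
    discard=False,
    comment=None,
    comment_url=None,
    rest={},
)
jar.set_cookie(cookie)

# An opener built on the jar returns the cookie whenever it applies.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Without touching the network, the jar can be asked directly what it
# would attach to a request for the same host:
request = urllib.request.Request("http://www.example.com/blah.html")
jar.add_cookie_header(request)
print(request.get_header("Cookie"))
```

After this, opener.open(...) can be used for the following requests and the cookie is returned wherever the jar's policy allows it.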


John
Kevin Carlson
2003-06-08 21:08:55 UTC
Post by John J. Lee
[...]
Post by Kevin Carlson
The response from the POST returns a cookie that I need to pass along
to the host on the subsequent request. I don't see any methods in
httplib to deal with cookies directly, nor can I find any in the
HTTPResponse.py source that allow me to extract the cookie from the
response. I'm sure I'm missing something. Any additional pointers?
http://wwwsearch.sourceforge.net/ClientCookie/
Is there any way to use ClientCookie with httplib rather than urllib or
urllib2?
Ng Pheng Siong
2003-06-05 05:45:50 UTC
I actually only need to send a couple of text inputs to the form, but no
matter what, the host system acts like I input nothing into the form. I
Maybe the form has some Javascript that validates the data and invokes a
secret handshake, like sending extra hidden fields or returning a cookie?
--
Ng Pheng Siong <ngps at netmemetic.com>

http://firewall.rulemaker.net -+- Manage Your Firewall Rulebase Changes
http://www.post1.com/home/ngps -+- Open Source Python Crypto & SSL
John J. Lee
2003-06-04 23:58:09 UTC
Kevin Carlson <khcarlso at bellsouth.net> writes:
[...]
Post by Kevin Carlson
I am trying to post to a form using httplib or urllib. I have done
this successfully before with httplib, but when the enctype must be
multipart/form-data, things go awry.
I tried using a header {'Content-Type' : 'multipart/form-data', ...}
that I encode with urlencode and then pass to the
HTTPConnection.request method, but the server is returning a code
indicating a bad request. I have done a lot of troubleshooting on
this and the encoding type is the only remaining issue.
Can anyone shed some light on how to accomplish a POST of this type?
Erm, I'm hesitant to suggest this module, because the version that
does INPUT TYPE=FILE file upload -- which I'm guessing is what you
want to do -- is currently still in alpha (I think I've only tested it
against the cgi module and it only does single-file upload -- not
multiple files -- and internally it's clunky, too). But if it doesn't
work, tell me and I'll likely fix it quickly.

Anyway (using urllib2 here for convenience, but you can use urllib
too, with minor changes), untested:

from urllib2 import urlopen
from ClientForm import ParseResponse

forms = ParseResponse(urlopen("http://blah.com/"))
form = forms[0]
# do some form filling, eg.
form["uname"] = "bob"
form["pwd"] = "XXX"
# add file to upload
f = open("/some/file.txt")
form.add_file(f, content_type="text/plain", name="file.txt")

# pass name to click method if there's more than one button
response = urlopen(form.click())

# urllib2's responses are file objects with some extra stuff:
print response.geturl() # url
print response.info() # headers
print response.read() # body (readline, readlines also work IIRC)


you'd need this:

http://wwwsearch.sourceforge.net/ClientForm/src/ClientForm-0.1.1a.tar.gz

home page of code:

http://wwwsearch.sourceforge.net/ClientForm/

HTH


John
John J Lee
2003-06-05 13:20:46 UTC
Post by John J. Lee
Erm, I'm hesitant to suggest this module, because the version that
does INPUT TYPE=FILE file upload -- which I'm guessing is what you
want to do --
Thanks for the response, John. Sorry, I should have been more
specific. I don't need to upload a file in this situation. I need to
post to a form that is hosted by a business partner and they require the
multipart/form-data type.
I suppose I should support this anyway (but won't get around to it for a
while, probably).
I actually only need to send a couple of text inputs to the form, but no
matter what, the host system acts like I input nothing into the form. I
formData = "--AaB03x\r\n"
[...]

I can't remember what it's supposed to look like...
When I examine the data object, the HTML that I receive includes an
error message
that I need to include inputs for field1 and field2.
Any additional ideas?
Maybe the old webunit module will do the trick? I don't remember.

The quickest solution ATM is probably just to watch what your browser does
and copy that. Use ethereal, for example. If it's HTTPS, use lynx -trace
and filter out the junk, or one of the browser plugins that let you sniff
HTTPS traffic (livehttpheaders for Mozilla, http://www.simtec.ltd.uk/ for
MSIE).


John
Andrew Clover
2003-06-05 10:21:07 UTC
httpObj.putheader('Content-type', 'multipart/form-data; boundary=--AaB03x')
The boundary parameter should be included in the Content-Type header
without the leading double-hyphen.

(There is no '----AaB03x' in the body, so the submission is malformed.)
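
To make the boundary rule concrete, here is a minimal sketch in current Python 3 of a body and header that agree with each other. The function name encode_multipart is hypothetical, and the sketch assumes plain text fields only (no file parts, no escaping of quotes in field names). The header carries the bare boundary; each delimiter line in the body is the boundary prefixed with "--", and the closing delimiter adds a trailing "--".

```python
def encode_multipart(fields, boundary="AaB03x"):
    """Encode (name, value) text fields as multipart/form-data.

    Returns (content_type, body), where body is bytes.
    """
    lines = []
    for name, value in fields:
        lines.append("--" + boundary)  # delimiter: "--" + boundary
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")  # blank line ends the part headers
        lines.append(value)
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")  # so the body ends with CRLF
    body = "\r\n".join(lines).encode("utf-8")
    # The header names the boundary WITHOUT the leading "--".
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, body


content_type, body = encode_multipart(
    [("field1", "somedata"), ("field2", "somemoredata")]
)
print(content_type)
print(body.decode("utf-8"))
```

The body is sent as-is, and its byte length is what belongs in the Content-Length header.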

HTH!
--
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/
Kevin Carlson
2003-06-05 03:07:05 UTC
Post by John J. Lee
Erm, I'm hesitant to suggest this module, because the version that
does INPUT TYPE=FILE file upload -- which I'm guessing is what you
want to do --
Thanks for the response, John. Sorry, I should have been more
specific. I don't need to upload a file in this situation. I need to
post to a form that is hosted by a business partner and they require the
multipart/form-data type.

I actually only need to send a couple of text inputs to the form, but no
matter what, the host system acts like I input nothing into the form. I
have tried this:

formData = "--AaB03x\r\n"
formData += 'Content-Disposition: form-data; name="field1"'
formData += "\r\n\r\n"
formData += "somedata\r\n"
formData += "--AaB03x\r\n"
formData += 'Content-Disposition: form-data; name="field2"'
formData += "\r\n\r\n"
formData += "somemoredata\r\n"
formData += "--AaB03x--\r\n"
efd = urllib.quote(formData)

httpObj = httplib.HTTPS(hostName)
httpObj.putrequest("POST", postFormName)
httpObj.putheader('Content-type', 'multipart/form-data; boundary=--AaB03x')
httpObj.putheader('Accept', '*/*')
httpObj.putheader('Content-length', str(len(formData)))
httpObj.endheaders()
httpObj.send(efd)
err, msg, hdrs = httpObj.getreply()
data = httpObj.getfile().read()

When I examine the data object, the HTML that I receive includes an
error message that I need to include inputs for field1 and field2.

Any additional ideas?
