Discussion:
How to use win32com to convert an MS Word doc to HTML?
Tim Golden
2008-08-19 15:58:54 UTC
You have broadly two approaches here, both
involving automating Word (i.e. using the
COM object model it exposes, referred to
in another post in this thread).

1) Use the COM model to have Word load your
doc, and SaveAs it in HTML format. Advantage:
it's relatively straightforward. Disadvantage:
you're at the mercy of whatever HTML Word emits.

2) Use the COM model to iterate over the paragraphs
in your document, emitting your own HTML. Advantage:
you get control. Disadvantage: the more complex your
doc, the more work you have to do. (What do you do with
images, for example? Internal links?)

To do the first, just record a macro in Word to
do what you want and then reproduce the macro
in Python. Something like this:

<code>
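import win32com.client

# GetObject on a file path binds to that document, opening it in Word
doc = win32com.client.GetObject("c:/data/temp/songs.doc")
# FileFormat=8 is wdFormatHTML in Word's WdSaveFormat enumeration
doc.SaveAs(FileName="c:/data/temp/songs.html", FileFormat=8)
doc.Close()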

</code>
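For a batch of documents, the first approach can be wrapped in a small function. This is a sketch, not tested against any particular Word version: it assumes Windows with pywin32 and Word installed, it uses `Dispatch("Word.Application")` plus `Documents.Open` instead of `GetObject`, and the helper names `html_path_for` and `convert_with_word` are hypothetical. It also offers Word's filtered HTML format (`wdFormatFilteredHTML`, value 10), which drops much of the Office-specific markup that the plain HTML format (8) keeps.

```python
import os

# WdSaveFormat values from Word's object model
WD_FORMAT_HTML = 8            # full Word HTML, as in the example above
WD_FORMAT_FILTERED_HTML = 10  # HTML with most Office-specific markup removed


def html_path_for(doc_path):
    """Return the .html path next to a .doc/.docx file (hypothetical helper)."""
    root, _ext = os.path.splitext(doc_path)
    return root + ".html"


def convert_with_word(doc_path, filtered=True):
    """Open doc_path in Word via COM and save it as HTML (hypothetical helper).

    Needs Windows, pywin32 and an installed copy of Word.
    """
    import win32com.client  # imported here so html_path_for works without pywin32

    out_path = os.path.abspath(html_path_for(doc_path))
    word = win32com.client.Dispatch("Word.Application")
    # Word generally wants an absolute path when opening files
    doc = word.Documents.Open(os.path.abspath(doc_path))
    try:
        fmt = WD_FORMAT_FILTERED_HTML if filtered else WD_FORMAT_HTML
        doc.SaveAs(FileName=out_path, FileFormat=fmt)
    finally:
        # Close without saving again; SaveAs has already written the HTML file
        doc.Close(False)
    return out_path
```

Looping over `glob.glob(r"c:\data\temp\*.doc")` and calling `convert_with_word` on each name would convert a folder. Word keeps running afterwards; calling `Quit()` on the application object shuts it down, but that also closes any documents a user has open in the same Word instance.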

To do the second, you have to roll your own HTML
document. Crudely, this would do it:

<code>
import html
import win32com.client

doc = win32com.client.GetObject("c:/data/temp/songs.doc")
with open("c:/data/temp/s2.html", "w", encoding="utf-8") as f:
    f.write("<html><body>\n")
    for para in doc.Paragraphs:
        # Range.Text ends with the paragraph mark "\r"; drop it and
        # escape characters such as "<" and "&" before writing them out
        text = html.escape(para.Range.Text.rstrip("\r"))
        style = html.escape(para.Style.NameLocal)
        f.write('<p class="%s">%s</p>\n' % (style, text))
    f.write("</body></html>\n")

doc.Close()

</code>
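The paragraph loop above can be pushed a little further to show the control the second approach buys: the sketch below turns Word's built-in heading styles into `<h1>` through `<h6>` and everything else into `<p>`. It is kept free of COM calls so it can be tried without Word; the function names are hypothetical, and it assumes English style names (`"Heading 1"` and so on), since localized Word installations report different `NameLocal` values. Images, tables and links are still ignored.

```python
import html
import re

# Assumes English built-in style names ("Heading 1" ... "Heading 6");
# localized Word installations report different NameLocal values.
_HEADING = re.compile(r"^Heading ([1-6])$")


def tag_for_style(style_name):
    """Map a Word paragraph style name to an HTML tag (hypothetical helper)."""
    match = _HEADING.match(style_name)
    return "h" + match.group(1) if match else "p"


def paragraphs_to_html(paragraphs):
    """Build an HTML page from (style_name, text) pairs (hypothetical helper)."""
    parts = ["<html><body>"]
    for style, text in paragraphs:
        tag = tag_for_style(style)
        # Word paragraph text ends with the paragraph mark "\r"; drop it
        body = html.escape(text.rstrip("\r"))
        # "Body Text" -> "Body-Text" so the style becomes a single CSS class
        css_class = html.escape(style.replace(" ", "-"))
        parts.append('<%s class="%s">%s</%s>' % (tag, css_class, body, tag))
    parts.append("</body></html>")
    return "\n".join(parts)
```

With a document open as `doc`, it could be called as `paragraphs_to_html((p.Style.NameLocal, p.Range.Text) for p in doc.Paragraphs)` and the result written to a UTF-8 file.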

TJG
Lave
2008-08-19 14:06:13 UTC
Hi, all !

I'm a total newbie :)

I want to convert MS Word docs to HTML. I found that the Python Windows
extension win32com can do this, but I can't find the method, and I
can't find any helpful documentation.

Can anyone help me? Thanks in advance.
Grzegorz Staniak
2008-08-20 07:23:40 UTC
It's solved.
Thank you all! You saved my life! Thank you very much.
I love you! I love Python!
... I love the whole world, and all its languages,
Boom-de-yada, boom-de-yada, boom-de-yada, boom-de-yada...

Very sorry, but I just couldn't resist.

GS
--
Grzegorz Staniak <gstaniak _at_ wp [dot] pl>
Simon Brunning
2008-08-19 16:01:37 UTC
This should be a useful starting point:
<http://code.activestate.com/recipes/279003/>.
--
Cheers,
Simon B.
simon at brunningonline.net
http://www.brunningonline.net/simon/blog/
GTalk: simon.brunning | MSN: small_values | Yahoo: smallvalues | Twitter: brunns
Lave
2008-08-20 05:29:17 UTC
HUH! :)

It's solved.

Thank you all! You saved my life! Thank you very much.

I love you! I love Python!
--
http://mail.python.org/mailman/listinfo/python-list
Reedick, Andrew
2008-08-19 14:15:56 UTC
-----Original Message-----
From: python-list-bounces+jr9445=att.com at python.org [mailto:python-
list-bounces+jr9445=att.com at python.org] On Behalf Of Lave
Sent: Tuesday, August 19, 2008 10:06 AM
To: python-list at python.org
Subject: How to use win32com to convert a MS WORD doc to HTML ?
Hi, all !
I'm a totally newbie huh:)
I want to convert MS WORD docs to HTML, I found python windows
extension win32com can make this. But I can't find the method, and I
can't find any document helpful.
Word Object Model:
http://msdn.microsoft.com/en-us/library/bb244515.aspx

Specifically look at Document's SaveAs method.
