Discussion:
Remove whitespaces and line breaks in a XML file
David Vicente
2011-02-07 17:45:31 UTC
Permalink
Hi,



I?m parsing an xml file with xml.etree. It works correctly, but I have a
problem with the text attribute of the elements which should be empty. For
example, in this case:

<book>

<author>Ken<author>

</book>



The text element of ?book? should be empty, but it returns me some
whitespaces and break lines. I can?t get remove these whitespaces without
remove information.



Some idea?



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20110207/3498316f/attachment.html>
Stefan Behnel
2011-02-07 20:54:30 UTC
Permalink
Post by David Vicente
I?m parsing an xml file with xml.etree. It works correctly, but I have a
problem with the text attribute of the elements which should be empty. For
<book>
<author>Ken<author>
</book>
The text element of ?book? should be empty, but it returns me some
whitespaces and break lines. I can?t get remove these whitespaces without
remove information.
Only a DTD (or schema) can provide the information which whitespace in an
XML document is meaningful and which isn't, so there is no generic way to
"do it right", especially not for something as generic as an XML parser.

What may work for you is to check if an Element has children and only
whitespace as text ("not el.text or not el.text.strip()"), and only then
replace it by None.

Stefan
Josh English
2011-02-07 23:46:17 UTC
Permalink
I found the code posted at

http://infix.se/2007/02/06/gentlemen-indent-your-xml

quite helpful in turning my xml into human-readable structures. It works
best for XML-Data.

Josh
David Vicente
2011-02-09 07:23:39 UTC
Permalink
I?ll try to mix it with my code (xml.etree).
Thanks ;)


-----Mensaje original-----
De: python-list-bounces+dvicente=full-on-net.com at python.org
[mailto:python-list-bounces+dvicente=full-on-net.com at python.org] En nombre
de Josh English
Enviado el: martes, 08 de febrero de 2011 0:46
Para: python-list at python.org
CC: python-list at python.org
Asunto: Re: Remove whitespaces and line breaks in a XML file

I found the code posted at

http://infix.se/2007/02/06/gentlemen-indent-your-xml

quite helpful in turning my xml into human-readable structures. It works
best for XML-Data.

Josh
--
http://mail.python.org/mailman/listinfo/python-list
Jean-Michel Pichavant
2011-02-09 16:19:01 UTC
Permalink
Post by Josh English
I found the code posted at
http://infix.se/2007/02/06/gentlemen-indent-your-xml
quite helpful in turning my xml into human-readable structures. It works
best for XML-Data.
Josh
It's done in one line with

http://docs.python.org/library/xml.dom.minidom.html#xml.dom.minidom.Node.writexml

JM
Sven
2011-02-11 12:40:14 UTC
Permalink
Post by Josh English
I found the code posted at
http://infix.se/2007/02/06/gentlemen-indent-your-xml
quite helpful in turning my xml into human-readable structures. It works
best for XML-Data.
lxml, which the code is using, already has a pretty_print feature
according to this http://codespeak.net/lxml/tutorial.html which would
do the same, would it not?
--
./Sven
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20110211/aaa88486/attachment.html>
Josh English
2011-02-07 23:46:17 UTC
Permalink
I found the code posted at

http://infix.se/2007/02/06/gentlemen-indent-your-xml

quite helpful in turning my xml into human-readable structures. It works
best for XML-Data.

Josh
Loading...