Discussion:
Case-insensitive XML Parsing
C. R. Sandeep
2002-08-05 11:52:31 UTC
Hi,

I am using the xml.dom.minidom module to do some XML parsing. Is
there any way I can set an option to do case-insensitive parsing? I
have a lot of files with inconsistent tag names and I need to parse
them in Python. I checked the Python Global Module Reference but
couldn't find any relevant information. Also, I do not want to use
xmllib at this time (which, I think, supports case-insensitive parsing
- not sure of this though).

Thanks in advance,

Sandeep.
Mark McEahern
2002-08-05 12:28:28 UTC
Sorry for omitting context.

Perhaps the OP is parsing HTML and needs something like Tidy (if you're
wondering why I didn't include a link it's for 2 reasons: I'm lazy and
google exists)?

// mark

-
David LeBlanc
2002-08-05 19:35:25 UTC
If your tag names are inconsistent, then XML probably won't help.

OTOH, if they differ only by case, then do a prepass on the data to
normalize the case of the tags.
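
A minimal sketch of such a prepass, assuming the tags differ only by case. The regular expression and the helper name lowercase_tag_names are illustrative, not from any library. The pattern is deliberately naive: it does not skip comments or CDATA sections, and it leaves attribute names alone.

```python
import re
from xml.dom import minidom

# Matches the element name in a start tag (<Name ...) or end tag (</Name>).
# Declarations (<?xml ...?>) and comments (<!-- -->) do not match because
# the character after "<" is not a name-start character.
TAG_NAME = re.compile(r"<(/?)([A-Za-z_][\w.:\-]*)")

def lowercase_tag_names(text):
    """Return text with every element name folded to lower case."""
    return TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), text)

# Hypothetical input whose tags disagree only in case.
raw = "<Order><ITEM id='A1'>Widget</item></ORDER>"
normalized = lowercase_tag_names(raw)

doc = minidom.parseString(normalized)
items = doc.getElementsByTagName("item")
```

Only the tag names are touched, so text content and attribute values keep their original case.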

David LeBlanc
Seattle, WA USA
Steve Holden
2002-08-05 12:20:57 UTC
"C. R. Sandeep" <sandeep at octetsoft.com> wrote in message
Post by C. R. Sandeep
Hi,
I am using the xml.dom.minidom module to do some XML parsing. Is
there any way I can set an option to do case-insensitive parsing? I
have a lot of files with inconsistent tag names and I need to parse
them in Python. I checked the Python Global Module Reference but
couldn't find any relevant information. Also, I do not want to use
xmllib at this time (which, I think, supports case-insensitive parsing
- not sure of this though).
Thanks in advance,
In that case (no pun intended) it isn't XML you are parsing.
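
To illustrate the point: XML names are case-sensitive, so a start tag and an end tag that differ in case are a well-formedness error. A small sketch (the helper name is_well_formed and the sample strings are made up for illustration):

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def is_well_formed(text):
    """Return True if minidom (via expat) accepts text, False otherwise."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True

# <Item> closed by </item>: expat reports a mismatched tag.
mismatched = is_well_formed("<Item>tea</item>")
# Same document with consistent case parses fine.
matched = is_well_formed("<Item>tea</Item>")
```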

regards
-----------------------------------------------------------------------
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------
Gillou
2002-08-05 16:47:50 UTC
But be warned...

Tidy removes the entity references that are not HTML builtin entities. Even
in XML mode 8((

--Gilles
# Mail...
import base64
base64.decodestring('Z2xlbmZhbnRAYmlnZm9vdC5jb20=\n')

"Mark McEahern" <marklists at mceahern.com> wrote in message news:
mailman.1028550629.10096.python-list at python.org...
Post by Mark McEahern
Sorry for omitting context.
Perhaps the OP is parsing HTML and needs something like Tidy (if you're
wondering why I didn't include a link it's for 2 reasons: I'm lazy and
google exists)?
// mark
-
Martin v. Löwis
2002-08-05 13:06:47 UTC
Post by C. R. Sandeep
I am using the xml.dom.minidom module to do some XML parsing. Is
there any way I can set an option to do case-insensitive parsing?
Not built-in, no. You could do the following things:

- lower-case the entire input before parsing;
- if you know the complete list of tags in advance:
replace all misspellings of these tags before parsing.
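
The second option might look like the sketch below. The tag list ("Invoice", "LineItem") and the helper name canonicalize_tags are assumptions for illustration; substitute the tags your files actually use. Note that the first option, lower-casing the entire input, also lower-cases text content and attribute values, which this approach avoids.

```python
import re
from xml.dom import minidom

# Hypothetical canonical spellings, keyed by their lower-case form.
CANONICAL = {name.lower(): name for name in ("Invoice", "LineItem")}

# Element name in a start or end tag; naive, does not skip comments or CDATA.
TAG_NAME = re.compile(r"<(/?)([A-Za-z_][\w.:\-]*)")

def canonicalize_tags(text):
    """Rewrite known tags to their canonical case; leave unknown tags alone."""
    def fix(match):
        slash, name = match.groups()
        return "<" + slash + CANONICAL.get(name.lower(), name)
    return TAG_NAME.sub(fix, text)

raw = "<INVOICE><lineitem>Tea</LineItem><Note>Keep Me</Note></invoice>"
fixed = canonicalize_tags(raw)
doc = minidom.parseString(fixed)
line_items = doc.getElementsByTagName("LineItem")
```

Tags that are not in the list (here, Note) pass through unchanged, so a remaining case mismatch in an unknown tag still surfaces as a parse error rather than being silently rewritten.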

Another option, if you have PyXML, is to use sgmlop. sgmlop's
SGMLParser is case-insensitive. You need to create an sgmlop SAX
driver (i.e. "xml.sax.drivers2.drv_sgmlop"), and pass this to
minidom.parse.

Notice that the parser will then operate in SGML mode, which may or
may not work for your input.

Regards,
Martin
Peter Hansen
2002-08-05 14:29:49 UTC
Post by C. R. Sandeep
I am using the xml.dom.minidom module to do some XML parsing. Is
there any way I can set an option to do case-insensitive parsing? I
As Steve said, if the tags are case-insensitive, it's not XML.
In any case, I believe the PyRXP parser by http://www.reportlab.com
has a CaseInsensitive flag, though you should know it's not a DOM
parser, but a specialized Python one (very fast and efficient, mind you,
but it's not DOM).
Post by C. R. Sandeep
have a lot of files with inconsistent tag names and I need to parse
them in Python. I checked the Python Global Module Reference but
couldn't find any relevant information. Also, I do not want to use
xmllib at this time (which, I think, supports case-insensitive parsing
- not sure of this though).
-Peter
