Discussion:
Newbie: How to convert "<" to "&lt;" ( encoding? )
Aahz Maruch
2000-10-04 03:03:19 UTC
Actually, it appears that there is in fact no HTMLescape() function.
what about cgi.escape() ?
Ah, there it is. Okay, I wasn't familiar with the cgi module. Thanks!
There probably should still be a generic char-to-entity routine based on
the htmlentitydefs module, but that's lower priority.
--
--- Aahz (Copyright 2000 by aahz at pobox.com)

Androgynous poly kinky vanilla queer het <*> http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6

There's a difference between a person who gets shit zie doesn't
deserve and a person who gets more shit than zie deserves. --Aahz
Eric
2000-10-03 21:31:10 UTC
Hello all,
The subject says it all. I'm sure there's a Python function that will
convert:
"<" to "&lt;"
">" to "&gt;"
etc.
Do you know what the function(s) is/are? Any help or pointers are greatly
appreciated!

Eric.
Aahz Maruch
2000-10-03 23:28:51 UTC
In article <nfsC5.23$fn2.51625 at news.pacbell.net>,
Post by Eric
The subject says it all. I'm sure there's a Python function that will
"<" to "&lt;"
Actually, it appears that there is in fact no HTMLescape() function.
Should be fairly straightforward to use htmlentitydefs to solve this
problem, but I agree that it ought to be built into htmllib. I'll check
into this on python-dev. Do you need help writing code that uses
htmlentitydefs?
lynx
2000-10-04 02:43:25 UTC
Actually, it appears that there is in fact no HTMLescape() function.
what about cgi.escape() ?

Bjorn Pettersen
2000-10-03 22:05:51 UTC
This should get you going:

Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> s = "this <is> a <string> with <nested <brackets>>"
>>> print string.replace.__doc__
replace (str, old, new[, maxsplit]) -> string

Return a copy of string str with all occurrences of substring
old replaced by new. If the optional argument maxsplit is
given, only the first maxsplit occurrences are replaced.

>>> tmp = string.replace(s, '<', '&lt;')
>>> tmp
'this &lt;is> a &lt;string> with &lt;nested &lt;brackets>>'
>>> s = string.replace(tmp, '>', '&gt;')
>>> s
'this &lt;is&gt; a &lt;string&gt; with &lt;nested &lt;brackets&gt;&gt;'
-- bjorn
Post by Eric
Hello all,
The subject says it all. I'm sure there's a Python function that will
"<" to "&lt;"
etc.
Do you know what the function(s) is/are? Any help or pointers are greatly
appreciated!
Eric.
Eric
2000-10-03 22:14:34 UTC
Post by Bjorn Pettersen
replace (str, old, new[, maxsplit]) -> string
Thanks Bjorn. That DOES take care of the two examples I gave. However, I
am trying to implement something that will handle all the encoding needed to
make a string "HTML friendly." I'm sorry, there is a name for the kind of
encoding I am trying to do, but I don't know what that name is (HTML
encoding?).

Thanks,

Eric.
Fredrik Aronsson
2000-10-03 23:54:54 UTC
In article <9UsC5.27$fn2.59024 at news.pacbell.net>,
Post by Eric
Post by Bjorn Pettersen
replace (str, old, new[, maxsplit]) -> string
Thanks Bjorn. That DOES take care of the two examples
I gave. However, I am trying to implement something
that will handle all the encoding needed to make a string
"HTML friendly." I'm sorry, there is a name for the
kind of encoding I am trying to do, but I don't know
what that name is (HTML encoding?).
Here is an example which replaces every character that has a named HTML
entity with that entity.

import string

# Load dictionary of entities (HTML 2.0 only...)
from htmlentitydefs import entitydefs
# Here you could easily add more entities if needed...

def html_encode(s):
    s = string.replace(s,"&","&amp;") # replace "&" first

    # runs one replace for each entity except "&"
    for (ent,char) in entitydefs.items():
        if char != "&":
            s = string.replace(s,char,"&"+ent+";")
    return s
>>> print html_encode("this <is> a <string> with <nested <brackets>>")
this &lt;is&gt; a &lt;string&gt; with &lt;nested &lt;brackets&gt;&gt;
>>> print html_encode("&<>åäöÅÄÖß£éèãñîê")
&amp;&lt;&gt;&aring;&auml;&ouml;&Aring;&Auml;&Ouml;&szlig;&pound;
&eacute;&egrave;&atilde;&ntilde;&icirc;&ecirc;
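
The "replace & first" comment in html_encode matters more than it might look. Here is a minimal sketch of why, written for Python 3 (an assumption, since the thread itself uses Python 1.5.2); the two function names are made up for illustration:

```python
def escape_amp_first(text):
    # Escape "&" before anything else, so the "&" that belongs to
    # "&lt;" and "&gt;" is never escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    return text.replace(">", "&gt;")


def escape_amp_last(text):
    # Deliberately wrong order, shown only for comparison.
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text.replace("&", "&amp;")


print(escape_amp_first("a < b & c"))  # a &lt; b &amp; c
print(escape_amp_last("a < b & c"))   # a &amp;lt; b &amp; c
```

With the wrong order, the "&" introduced by the first replacement is escaped again, so a browser would display the literal text "&lt;" instead of "<".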
Another (probably better) solution is:

import string

from htmlentitydefs import entitydefs

inv_entitydefs = {}
for (ent,char) in entitydefs.items():
    inv_entitydefs[char]="&"+ent+";" # Invert dictionary

def html_encode2(s):
    res=""
    for c in s: # Just loops through the string once
        if inv_entitydefs.has_key(c): # looking for characters
            res=res+inv_entitydefs[c] # to exchange
        else:
            res=res+c
    return res
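
For anyone reading this on Python 3, the same single-pass idea can be sketched roughly as follows. This is not Fredrik's code: it assumes Python 3, where htmlentitydefs became html.entities and already provides the inverted table as codepoint2name. The name html_encode_named is made up for illustration.

```python
from html.entities import codepoint2name


def html_encode_named(text):
    # Walk the string once; swap in "&name;" for any character that
    # has a named entity, and keep every other character unchanged.
    pieces = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append("&" + name + ";")
        else:
            pieces.append(ch)
    # Joining a list avoids rebuilding the result string on every step.
    return "".join(pieces)


print(html_encode_named("<b>\xe5 & \xf6</b>"))
# &lt;b&gt;&aring; &amp; &ouml;&lt;/b&gt;
```

Because each input character is looked at exactly once, "&" needs no special ordering here.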

made-them-two-minutes-ago-so-be-careful-ly' yours
Fredrik
Jon Ribbens
2000-10-04 14:29:25 UTC
Post by Aahz Maruch
what about cgi.escape() ?
Ah, there it is. Okay, I wasn't familiar with the cgi module. Thanks!
There probably should still be a generic char-to-entity routine based on
the htmlentitydefs module, but that's lower priority.
Note that cgi.escape() should be escaping the single-quote character also,
but isn't.
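
As a point of comparison for the single-quote remark, later Python 3 releases ship html.escape(), which escapes both quote characters unless told otherwise (cgi.escape() itself was eventually removed). A small sketch, assuming Python 3.2 or newer; the variable names are made up:

```python
import html

# With the default quote=True, both " and ' are escaped.
with_quotes = html.escape("<a href='x'>Tom & \"Jerry\"</a>")
print(with_quotes)
# &lt;a href=&#x27;x&#x27;&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;

# With quote=False only &, < and > are touched.
without_quotes = html.escape("it's <b>", quote=False)
print(without_quotes)
# it's &lt;b&gt;
```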

I think it would be nice if string.replace could take a map as a parameter,
to do multiple replacements at once ;-).

In my CGI library I did it like this:

import re

_html_encre = re.compile("[&<>\"']")
_html_encodes = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "\"": "&quot;",
                  "'": "&#39;" }

def html_encode(input):
    return re.sub(_html_encre, lambda m: _html_encodes[m.group(0)], str(input))

(Just to prove you can do nasty one-line things in Python just as you can
in Perl ;-) )
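
The wish above for a replace that takes a map is close to what str.translate() does in Python 3, at least when every key is a single character. A rough sketch under that assumption (Python 3 only; _HTML_TABLE and html_encode_map are made-up names, and the mapping is copied from _html_encodes):

```python
# str.maketrans() accepts a dict whose keys are single characters and
# whose values may be replacement strings of any length.
_HTML_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})


def html_encode_map(value):
    # All replacements happen in one pass over the input, so "&"
    # does not need to be handled before the other characters.
    return str(value).translate(_HTML_TABLE)


print(html_encode_map("<a & 'b'>"))  # &lt;a &amp; &#39;b&#39;&gt;
```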

Cheers


Jon