Newbie: How to convert "<" to "&lt;" ( encoding? )

Discussion:

Aahz Maruch

2000-10-04 03:03:19 UTC

Actually, it appears that there is in fact no HTMLescape() function.

what about cgi.escape() ?

Ah, there it is. Okay, I wasn't familiar with the cgi module. Thanks!
There probably should still be a generic char-to-entity routine based on
the htmlentitydefs module, but that's lower priority.

--
--- Aahz (Copyright 2000 by aahz at pobox.com)

Androgynous poly kinky vanilla queer het <*> http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6

There's a difference between a person who gets shit zie doesn't
deserve and a person who gets more shit than zie deserves. --Aahz

Eric

2000-10-03 21:31:10 UTC

Hello all,
The subject says it all. I'm sure there's a Python function that will
convert:
"<" to "&lt;"
">" to "&gt;"
etc.
Do you know what the function(s) is/are? Any help or pointers are greatly
appreciated!

Eric.

Aahz Maruch

2000-10-03 23:28:51 UTC

In article <nfsC5.23$fn2.51625 at news.pacbell.net>,

Post by Eric
The subject says it all. I'm sure there's a Python function that will
"<" to "&lt;"

Actually, it appears that there is in fact no HTMLescape() function.
Should be fairly straightforward to use htmlentitydefs to solve this
problem, but I agree that it ought to be built into htmllib. I'll check
into this on python-dev. Do you need help writing code that uses
htmlentitydefs?

lynx

2000-10-04 02:43:25 UTC

Actually, it appears that there is in fact no HTMLescape() function.

what about cgi.escape() ?

Bjorn Pettersen

2000-10-03 22:05:51 UTC

This should get you going:

Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam

>>> import string
>>> s = "this <is> a <string> with <nested <brackets>>"
>>> print string.replace.__doc__
replace (str, old, new[, maxsplit]) -> string

Return a copy of string str with all occurrences of substring
old replaced by new. If the optional argument maxsplit is
given, only the first maxsplit occurrences are replaced.

>>> tmp = string.replace(s, '<', '&lt;')
>>> tmp
'this &lt;is> a &lt;string> with &lt;nested &lt;brackets>>'

>>> string.replace(tmp, '>', '&gt;')
'this &lt;is&gt; a &lt;string&gt; with &lt;nested &lt;brackets&gt;&gt;'
-- bjorn

Eric

2000-10-03 22:14:34 UTC

Post by Bjorn Pettersen
replace (str, old, new[, maxsplit]) -> string

Thanks Bjorn. That DOES take care of the two examples I gave. However, I
am trying to implement something that will handle all the encoding needed to
make a string "HTML friendly." I'm sorry, there is a name for the kind of
encoding I am trying to do, but I don't know what that name is (HTML
encoding?).

Thanks,

Eric.

Fredrik Aronsson

2000-10-03 23:54:54 UTC

In article <9UsC5.27$fn2.59024 at news.pacbell.net>,

Post by Eric

Post by Bjorn Pettersen
replace (str, old, new[, maxsplit]) -> string

Thanks Bjorn. That DOES take care of the two examples
I gave. However, I am trying to implement something
that will handle all the encoding needed to make a string
"HTML friendly." I'm sorry, there is a name for the
kind of encoding I am trying to do, but I don't know
what that name is (HTML encoding?).

Here is an example which replaces every character that has a named HTML
entity with that entity.

import string

# Load dictionary of entities (HTML 2.0 only...)
from htmlentitydefs import entitydefs
# Here you could easily add more entities if needed...

def html_encode(s):
    s = string.replace(s,"&","&amp;") # replace "&" first

    #runs one replace for each entity except "&"
    for (ent,char) in entitydefs.items():
        if char != "&":
            s = string.replace(s,char,"&"+ent+";")
    return s

>>> print html_encode("this <is> a <string> with <nested <brackets>>")
>>> print html_encode("&<>åäöÅÄÖß£éèãñîê")
&amp;&lt;&gt;&aring;&auml;&ouml;&Aring;&Auml;&Ouml;&szlig;&pound;&eacute;&egrave;&atilde;&ntilde;&icirc;&ecirc;

Another (probably better) solution is:

import string

from htmlentitydefs import entitydefs

inv_entitydefs = {}
for (ent,char) in entitydefs.items():
    inv_entitydefs[char]="&"+ent+";" # Invert dictionary

def html_encode2(s):
    res=""
    for c in s: # Just loops through the string once
        if inv_entitydefs.has_key(c): # looking for characters
            res=res+inv_entitydefs[c] # to exchange
        else:
            res=res+c
    return res
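
For anyone reading this on Python 3, where htmlentitydefs became
html.entities, the same single-pass idea might look roughly like the sketch
below. It assumes Python 3 and the standard codepoint2name table; the name
html_encode3 is made up for illustration and is not part of any library.

```python
from html.entities import codepoint2name  # Python 3 home of the old htmlentitydefs data


def html_encode3(text):
    """Replace each character that has a named HTML entity with that entity.

    html_encode3 is a hypothetical name chosen for this sketch.
    """
    pieces = []
    for ch in text:  # one pass over the string
        name = codepoint2name.get(ord(ch))  # None when no named entity exists
        pieces.append("&%s;" % name if name else ch)
    return "".join(pieces)


print(html_encode3("this <is> a <string> & more"))
```

Because each input character is looked at exactly once, the "&" that starts
an inserted entity is never escaped a second time.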

made-them-two-minutes-ago-so-be-careful-ly' yours
Fredrik

Jon Ribbens

2000-10-04 14:29:25 UTC

Post by Aahz Maruch

what about cgi.escape() ?

Ah, there it is. Okay, I wasn't familiar with the cgi module. Thanks!
There probably should still be a generic char-to-entity routine based on
the htmlentitydefs module, but that's lower priority.

Note that cgi.escape() should be escaping the single-quote character also,
but isn't.
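
As a point of comparison, and assuming a Python 3 interpreter (where
cgi.escape() is no longer available), the standard html.escape() function
does cover both quote characters by default. A small sketch, with a made-up
sample string:

```python
import html

# Hypothetical sample containing both kinds of quotes.
sample = """<a href='x' title="y">Tom & Jerry</a>"""

# quote=True is the default: &, <, >, " and ' are all escaped.
print(html.escape(sample))

# quote=False escapes only &, < and > and leaves the quotes alone.
print(html.escape(sample, quote=False))
```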

I think it would be nice if string.replace could take a map as a parameter,
to do multiple replacements at once ;-).
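
In Python 3 something close to that wish seems to exist already:
str.translate() accepts a mapping from characters to replacement strings. A
minimal sketch, assuming Python 3; HTML_ESCAPES and html_encode_map are
names invented for this example, and "&#39;" is one possible choice for the
single quote:

```python
# Build a translation table: each key is a single character, each value
# is the string that should replace it.
HTML_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})


def html_encode_map(text):
    """Apply all replacements in one pass (hypothetical helper)."""
    return str(text).translate(HTML_ESCAPES)


print(html_encode_map("<a href='x'>Tom & Jerry</a>"))
```

Since translate() works character by character, the order of the entries
does not matter and "&" cannot be escaped twice.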

In my CGI library I did it like this:

import re

_html_encre = re.compile("[&<>\"']")
_html_encodes = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "\"": "&quot;",
                  "'": "&#39;" }

def html_encode(input):
    return re.sub(_html_encre, lambda m: _html_encodes[m.group(0)], str(input))

(Just to prove you can do nasty one-line things in Python just as you can
in Perl ;-) )

Cheers

Jon