Searches, charset, and Words.pm
The input of non-ASCII characters only works as intended
if the reader has the correct charset selected in their browser.
For example, a user who has UTF-8 selected and inputs non-US-ASCII
characters will get strange results.
By default, faq-o-matic sends out documents without a charset specification. This means that they will be displayed using whatever default encoding ("charset") the user has selected in their browser, and anything the user submits will also use that charset. This is a bad idea, especially after CERT advisory CA-2000-02: http://www.cert.org/advisories/CA-2000-02.html
Admins should configure their server to send out .html files
with their selected encoding, e.g. in Apache:

    AddType "text/html; charset=iso-8859-1" html

Care should also be taken that dynamically generated documents send an appropriate HTTP charset attribute (recent versions of CGI.pm will do this). I'm still working on this aspect with my own FAQomatics, but I thought this should be on record, as I hadn't seen it mentioned.
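For dynamically generated pages, the idea can be sketched in plain Perl as follows. This is only an illustration, not code from faq-o-matic: the helper name content_type_header is hypothetical, and iso-8859-1 is assumed to be the site's chosen charset.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of faq-o-matic): builds the HTTP header
# block for an HTML response that names its charset explicitly.
sub content_type_header {
    my ($charset) = @_;
    # A blank line ends the header block of a CGI response.
    return "Content-Type: text/html; charset=$charset\r\n\r\n";
}

# Assumption: the site has settled on iso-8859-1 for all its pages.
print content_type_header('iso-8859-1');
print "<html><body><p>caf\xe9</p></body></html>\n";
```

Whatever charset is chosen, the point is that static and dynamic pages announce the same one, so form submissions come back in a known encoding.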
In Words.pm appears the following code:

    $string =~ s/[()'-]//g;
    $string =~ tr/A-Z/a-z/; # 7-bit ASCII
This (including the introductory s///) can be replaced by a single tr
as follows, which also folds the accented ISO-8859-1 capitals to
lowercase. IMHO much clearer and more compact (and incidentally more
efficient, though I wouldn't brag about efficiency if it
made the code pointlessly inscrutable...):

    $string =~ tr/A-Z\300-\326\330-\336()'-/a-z\340-\366\370-\376/d;

cheers
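To compare the two variants side by side, here is a small test sketch. The sub names are hypothetical wrappers (Words.pm applies the statements inline to $string), and the input is assumed to be a byte string in ISO-8859-1.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two statements currently in Words.pm.
sub normalize_two_step {
    my ($string) = @_;
    $string =~ s/[()'-]//g;     # strip ( ) ' -
    $string =~ tr/A-Z/a-z/;     # lowercase 7-bit ASCII only
    return $string;
}

# Hypothetical wrapper around the proposed single tr.
sub normalize_single_tr {
    my ($string) = @_;
    # The four punctuation characters have no counterpart in the
    # replacement list, so /d deletes them.
    $string =~ tr/A-Z\300-\326\330-\336()'-/a-z\340-\366\370-\376/d;
    return $string;
}

# Show bytes outside printable ASCII as \xNN so the output does not
# depend on the terminal's charset.
sub show {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return $s;
}

for my $input ("Foo-Bar (Baz's)", "\xc9cole (d'\xc9t\xc9)") {
    printf "%s\n  two-step:  %s\n  single tr: %s\n",
        show($input),
        show(normalize_two_step($input)),
        show(normalize_single_tr($input));
}
```

On pure ASCII input the two give the same result. They differ only on the accented capitals: the single tr lowercases them, while the two-step code leaves them untouched. The ranges skip \327 and \367 (the multiplication and division signs), which have no case.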