Chris Notdisclosing
Philip Greenspun's Homepage : Community member
A member of the Philip Greenspun's Homepage community since April 22, 2005
Static Page Comments
- April 22, 2005, on Converting from Microsoft Word to HTML:
There is an open-source CMS called PHPWebSite. In a settings file, you can specify which HTML tags to strip from input (for example, when pasting from Word into a textarea to create content). I disallow the (P) tag.
I have customised this functionality so that, before stripping HTML tags, it replaces the (/P) and (/p) tags with (BR /).
There is also a user comment on the www.php.net page for the strip_tags function that describes a function for stripping attributes from specific tags. (The function is supposed to let you specify an array of attributes not to strip. That part doesn't seem to work, but it will strip all attributes.)
Before stripping the tags, I strip all the attributes from all the allowed tags except anchor tags (A) and the table-related tags (table), (tr), (td), (th), etc.
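The steps described so far can be sketched roughly as follows. This is only an illustration in Python, not the PHP used in PHPWebSite or the function from the php.net comment. The allow-list and the names `ALLOWED_TAGS`, `KEEP_ATTRIBUTES`, and `clean_pasted_html` are made up for the example, and attribute stripping and tag stripping are folded into a single pass for brevity. It relies on regular expressions rather than a real HTML parser, so attribute values containing ">" would trip it up.

```python
import re

# Hypothetical allow-list: the comment only says that (P) is disallowed.
ALLOWED_TAGS = {"a", "b", "i", "br", "table", "tr", "td", "th"}
# Tags that keep their attributes, per the comment: anchors and table tags.
KEEP_ATTRIBUTES = {"a", "table", "tr", "td", "th"}

# Matches </p> and </P>, with optional whitespace before the bracket.
CLOSING_P = re.compile(r"</p\s*>", re.IGNORECASE)
# Matches an opening or closing tag: slash, tag name, everything up to ">".
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>")


def _clean_tag(match):
    """Drop disallowed tags; strip attributes from most allowed ones."""
    slash, name, rest = match.group(1), match.group(2).lower(), match.group(3)
    if name not in ALLOWED_TAGS:
        return ""  # tag removed, surrounding text is untouched
    if slash:
        return f"</{name}>"
    if name in KEEP_ATTRIBUTES:
        return f"<{name}{rest}>"  # attributes preserved as written
    # Attributes dropped; keep a trailing self-closing slash if present.
    closing = " /" if rest.rstrip().endswith("/") else ""
    return f"<{name}{closing}>"


def clean_pasted_html(html):
    # Step 1: turn every closing paragraph tag into a line break.
    html = CLOSING_P.sub("<br />", html)
    # Steps 2 and 3: strip attributes, then drop tags not on the allow-list.
    return TAG.sub(_clean_tag, html)
```

For example, `clean_pasted_html('<P class="MsoNormal">Hello <b style="x">world</b></P>')` gives `Hello <b>world</b><br />`: the opening (P) is dropped, the closing (/P) becomes a line break, and the style attribute is removed from (b).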
The result is that only the tags I allow are kept, all paragraphs are converted to (BR) tags (and because Word seems to insert an empty paragraph bet...