Squeaky Clean
Date: 25/5/2006
I wrote a little utility today that cleans up (X)HTML written by a rich text editor and littered with masses of styles and strange little XML tags, i.e. the sort of HTML that Microsoft Office outputs. It's called Squeaky Clean. It uses an XML parser to read the XHTML into a DOM tree, cleans out all the styles and junk tags, and then lets you do some cleanup by hand. You can then save the result to disk or copy some or all of it to the clipboard using the built-in editor.

It is somewhat configurable via the 'Clean.xml' options file which specifies which attributes and elements should be deleted.
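
To give a rough idea of the approach, here is a minimal Python sketch of an options-driven cleaner. It is not the Squeaky Clean source: the layout of the options file (`<clean>`, `<attribute>`, `<element>`), the function names, and the sample input are all assumptions made up for illustration, since the real 'Clean.xml' format isn't shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical options file; the real Clean.xml schema is an assumption here.
# Namespaced element names use ElementTree's "{namespace-uri}localname" form.
OPTIONS_XML = """
<clean>
  <attribute name="style"/>
  <attribute name="class"/>
  <element name="{urn:schemas-microsoft-com:office:office}p"/>
</clean>
"""


def load_options(xml_text):
    """Return (attribute names, element names) that should be deleted."""
    root = ET.fromstring(xml_text)
    attrs = {a.get("name") for a in root.findall("attribute")}
    elems = {e.get("name") for e in root.findall("element")}
    return attrs, elems


def clean(node, attrs, elems):
    """Strip unwanted attributes and delete unwanted child elements, recursively."""
    for name in attrs:
        node.attrib.pop(name, None)
    prev = None  # last child that was kept
    for child in list(node):
        if child.tag in elems:
            # Keep the text that followed the deleted element (its "tail").
            if child.tail:
                if prev is None:
                    node.text = (node.text or "") + child.tail
                else:
                    prev.tail = (prev.tail or "") + child.tail
            node.remove(child)
        else:
            clean(child, attrs, elems)
            prev = child


def squeaky_clean(xhtml, options_xml=OPTIONS_XML):
    """Parse XHTML into a tree, clean it, and serialize it back to a string."""
    attrs, elems = load_options(options_xml)
    root = ET.fromstring(xhtml)
    clean(root, attrs, elems)
    return ET.tostring(root, encoding="unicode")


# Made-up sample of Office-style markup.
SAMPLE = (
    '<div xmlns:o="urn:schemas-microsoft-com:office:office">'
    '<p class="MsoNormal" style="margin:0">Hello<o:p></o:p> world</p>'
    '</div>'
)

if __name__ == "__main__":
    print(squeaky_clean(SAMPLE))
```

Note that this sketch deletes a listed element together with everything inside it, which suits empty junk tags like Office's `o:p`. Unwrapping an element while keeping its children would need extra handling.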

By default it removes all the styles, so you have to go back and reimplement any that you need. But now that the markup is fixed, that's relatively painless. Future versions may offer finer control over how much styling is removed, but for my needs all the source styling is just junk, hence the alpha simply removes it all.
 