About
Squeaky Clean was written with HTML exported from M$ Office in mind.
It rips out all the classes, styles, strange XML and conditional
comments. As a result the document won't look the same afterwards,
but at least the markup is nice and clean.
This makes it easy to go back in and reimplement the styles
using sensible CSS. Alternatively, you can edit the config file
'Clean.xml' to stop style and class attributes from being removed.
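To make the cleaning step concrete, here is a rough sketch of the same idea in Python. It is not Squeaky Clean's code. It uses the standard library's `xml.etree.ElementTree`, which is stricter than Squeaky Clean's own parser (for example, it needs quoted attributes). The attribute list `STRIP_ATTRIBUTES`, the regex for Office conditional comments, and the rule "drop every namespaced element such as `<o:p>`" are all assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical defaults for illustration; Squeaky Clean reads its own lists from Clean.xml.
STRIP_ATTRIBUTES = {"class", "style"}

# Office conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
CONDITIONAL = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL)


def _drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove child (and its contents) from parent, keeping the text that followed it."""
    if child.tail:
        index = list(parent).index(child)
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def clean(markup: str, strip_attributes=STRIP_ATTRIBUTES) -> str:
    # Conditional comments are removed textually before parsing.
    markup = CONDITIONAL.sub("", markup)
    # ElementTree's default parser silently discards ordinary comments.
    root = ET.fromstring(markup)
    # Snapshot the tree first so removals don't disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            # Namespaced tags such as Word's <o:p> parse as "{uri}p".
            if child.tag.startswith("{"):
                _drop(parent, child)
        for name in list(parent.attrib):
            # Drop listed attributes and any namespaced (Office-specific) ones.
            if name in strip_attributes or name.startswith("{"):
                del parent.attrib[name]
    return ET.tostring(root, encoding="unicode")
```

For example, `clean('<html xmlns:o="urn:schemas-microsoft-com:office:office"><body><p class="MsoNormal" style="margin:0">Hi<o:p></o:p> there</p></body></html>')` returns `'<html><body><p>Hi there</p></body></html>'`. Keeping the styles, as described above, amounts to passing a smaller `strip_attributes` set.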
Documents will be converted into UTF-8 from whatever charset they
started in. By default most single-byte charsets and Unicode are
supported. Installing iconv extends the charset support to include
multi-byte charsets, such as East Asian and Arabic charsets.
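The conversion step can be pictured with a short Python sketch. Here Python's built-in codecs stand in for iconv; this is a swapped-in technique, not how Squeaky Clean does it. The function name `to_utf8`, the regex that pulls the charset out of a `<meta>` declaration, and the `windows-1252` fallback are assumptions for illustration.

```python
import re

# Finds e.g. charset=windows-1252 or charset="shift_jis" in the raw bytes.
CHARSET = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def to_utf8(raw: bytes, default: str = "windows-1252") -> bytes:
    """Re-encode a document into UTF-8 from its declared charset.

    `default` is a hypothetical fallback used when no charset is declared.
    """
    match = CHARSET.search(raw)
    charset = match.group(1).decode("ascii") if match else default
    try:
        # May also raise UnicodeDecodeError if the bytes don't match the charset.
        text = raw.decode(charset)
    except LookupError:
        raise ValueError(f"unsupported charset: {charset}") from None
    return text.encode("utf-8")
```

Note that this sketch leaves the `<meta>` declaration itself untouched; a real converter would also rewrite it to say UTF-8.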
This program uses an XML parser to read the HTML, so if the source
file is far from XML-compliant it will fail to parse. I have no
interest in writing a robust HTML parser, so you'll either have to
fix the file or use some other tool. That said, the parser is not
too strict about attribute quoting and the like. You can also tell
it not to look for child tags by adding tags to the "nochild"
section of the config file 'Clean.xml'.
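A minimal sketch shows why a "nochild" list helps an XML parser. Unclosed HTML tags such as `<br>` make a strict XML parser fail, but if those tags are known to have no children they can be rewritten as `<br/>` before parsing. This is only one way to get a similar effect with Python's stock `xml.etree.ElementTree`; Squeaky Clean handles this inside its own parser. The `NOCHILD` set, `close_nochild_tags` and `parse` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the "nochild" section of Clean.xml.
NOCHILD = {"br", "hr", "img", "meta", "link", "input"}


def close_nochild_tags(markup: str, nochild=NOCHILD) -> str:
    """Rewrite <tag ...> as <tag .../> for tags listed as having no children."""
    names = "|".join(sorted(re.escape(n) for n in nochild))
    pattern = re.compile(r"<(" + names + r")\b([^>]*?)\s*/?>", re.IGNORECASE)
    return pattern.sub(lambda m: f"<{m.group(1)}{m.group(2)}/>", markup)


def parse(markup: str, nochild=NOCHILD) -> ET.Element:
    """Parse HTML with a strict XML parser after closing the no-child tags."""
    try:
        return ET.fromstring(close_nochild_tags(markup, nochild))
    except ET.ParseError as err:
        # Anything still not well-formed has to be fixed by hand.
        raise ValueError(f"not XML-compliant enough to parse: {err}") from None
```

With this, `parse("<p>one<br>two</p>")` succeeds, while a file with a genuinely unclosed `<p>` still fails, which matches the trade-off described above.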
The attributes and elements that get deleted are configurable via
the file 'Clean.xml' distributed with the app. It works for the one
file I needed cleaning, but I expect it will need work to be useful
in the general case. Please read the comments in that file and edit
it if necessary for your own files. Generally useful changes should
be sent back to me for inclusion in future versions.
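The actual layout of 'Clean.xml' is not described here, so the following is only a sketch of how a config of this kind might be read. The `<clean>`, `<attributes>`, `<elements>`, `<nochild>` and `<item>` tags are invented for illustration and should not be taken as the real file format.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CleanConfig:
    attributes: set[str] = field(default_factory=set)  # attributes to delete
    elements: set[str] = field(default_factory=set)    # elements to delete
    nochild: set[str] = field(default_factory=set)     # tags that never have children


def load_config(xml_text: str) -> CleanConfig:
    """Read a hypothetical Clean.xml layout into sets of lower-case names."""
    root = ET.fromstring(xml_text)

    def names(section: str) -> set[str]:
        node = root.find(section)
        if node is None:  # a missing section means "nothing listed"
            return set()
        return {item.text.strip().lower() for item in node.findall("item") if item.text}

    return CleanConfig(names("attributes"), names("elements"), names("nochild"))
```

To read from disk rather than a string, `ET.parse("Clean.xml").getroot()` would replace the `fromstring` call.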
Future versions may dig into the CSS and clean it up instead of
just deleting it all. That would require more invasive parsing,
though, and this is just an alpha release.
Downloads
History
0.10 [Alpha]
Initial Release:
- Basic load, clean and display of HTML.
- Log window for status and error messages.
- XML based parsing, cleans out specified attributes and tags.
- Inline editor for cleaning up by hand.
- Options specified in 'Clean.xml' give the user some control over which attributes and elements get nuked.