HTML Rewrite

The purpose of the HTML Rewrite class is to take arbitrary HTML (typically from a WYSIWYG editor) and turn it into clean, well-formatted HTML code.

It works by parsing the HTML to create a DOM, ignoring anything that shouldn't be there (e.g. onclick attributes). It then uses this clean DOM to create a new HTML document.

By using a white-list approach to create the new document, it should be theoretically impossible for any malicious or malformed code to get through.
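As a rough illustration of the white-list idea, here is a minimal sketch. It is not the class's actual code: the `filter_nodes()` function, the `$whiteList` contents, and the simplified node layout (`tag`, `attributes`, `children`) are all assumptions made for the example. It also assumes that an element with an unknown tag is dropped together with everything inside it.

```php
<?php
// Hypothetical white list: tag name => attribute names allowed on that tag.
// The real class's list is not shown in this document.
$whiteList = [
    'p'      => [],
    'span'   => ['class'],
    'strong' => [],
];

// Returns a copy of $nodes containing only white-listed tags and attributes.
function filter_nodes(array $nodes, array $whiteList): array
{
    $clean = [];
    foreach ($nodes as $id => $node) {
        if (is_string($node)) {
            // Text nodes are plain strings and are always kept.
            $clean[$id] = $node;
            continue;
        }
        if (!array_key_exists($node['tag'], $whiteList)) {
            // Unknown tag: drop the element and all of its children.
            continue;
        }
        // Keep only the attributes named in the white list for this tag.
        $node['attributes'] = array_intersect_key(
            $node['attributes'],
            array_flip($whiteList[$node['tag']])
        );
        $node['children'] = filter_nodes($node['children'], $whiteList);
        $clean[$id] = $node;
    }
    return $clean;
}

$dirty = [
    1 => [
        'tag'        => 'span',
        'attributes' => ['class' => 'note', 'onclick' => 'steal()'],
        'children'   => [
            2 => 'Hello',
            3 => ['tag' => 'script', 'attributes' => [], 'children' => [4 => 'steal()']],
        ],
    ],
];

$clean = filter_nodes($dirty, $whiteList);
// The onclick attribute and the script element (id 3) are no longer present.
```

Because nothing is copied unless it is explicitly listed, a tag or attribute the list has never heard of cannot reach the output, which is the property the white-list approach relies on.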

I should also warn you though, this code is brutal (it's completely unforgiving to code that is not on the white list)... it definitely goes along the lines of using a sledgehammer to crack a walnut.

At the moment I'm just trying out the base functionality, so I suspect there are going to be a few issues - for example, I haven't finished adding all the CSS tests (any rules it does not understand are dropped).
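To show what "any rules it does not understand are dropped" could look like, here is a small sketch for a style attribute. The `filter_style()` function and the `$cssTests` table (property name => regular expression the value must match) are hypothetical and only cover three properties. The split on `;` is deliberately naive and would not cope with semicolons inside quoted strings or `url()` values.

```php
<?php
// Hypothetical CSS tests: property => pattern its value must match.
$cssTests = [
    'color'       => '/^#[0-9a-f]{3}(?:[0-9a-f]{3})?$/i',
    'text-align'  => '/^(?:left|right|center|justify)$/',
    'font-weight' => '/^(?:normal|bold)$/',
];

// Keeps only the declarations whose property is known and whose value passes its test.
function filter_style(string $style, array $tests): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // not in "property: value" form
        }
        $property = strtolower(trim($parts[0]));
        $value    = trim($parts[1]);
        if (isset($tests[$property]) && preg_match($tests[$property], $value) === 1) {
            $kept[] = $property . ': ' . $value;
        }
    }
    return implode('; ', $kept);
}

$style = filter_style(
    'color: #F00; behavior: url(x.htc); text-align: center; font-weight: 900',
    $cssTests
);
echo $style; // prints: color: #F00; text-align: center
```

Here `behavior` is dropped because there is no test for it, and `font-weight: 900` is dropped because the value fails the (assumed) test for that property.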

Example (source HTML, the DOM built from it, and the resulting HTML):

Some  <span> plain <strong> text
array (
  1 => 
  array (
    'parentId' => 0,
    'tag' => 'p',
    'inline' => false,
    'attributes' => 
    array (
    ),
    'children' => 
    array (
      2 => 'Some ',
      3 => 
      array (
        'parentId' => 1,
        'tag' => 'span',
        'inline' => true,
        'attributes' => 
        array (
        ),
        'children' => 
        array (
          4 => ' plain ',
          5 => 
          array (
            'parentId' => 3,
            'tag' => 'strong',
            'inline' => true,
            'attributes' => 
            array (
            ),
            'children' => 
            array (
              6 => ' text',
            ),
          ),
        ),
      ),
    ),
  ),
)
<p>Some <span> plain <strong> text</strong></span></p>
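The last step, going from the DOM array back to HTML, can be sketched as a short recursive function. The `render_nodes()` name is hypothetical and this is not the class's own output routine: it ignores the `inline` flag, so it does no indentation or line breaking, and it assumes text nodes are stored unescaped.

```php
<?php
// Hypothetical renderer for the nested node array shown in the example.
function render_nodes(array $nodes): string
{
    $html = '';
    foreach ($nodes as $node) {
        if (is_string($node)) {
            // Text nodes are plain strings; escape them on the way out.
            $html .= htmlspecialchars($node, ENT_QUOTES, 'UTF-8');
            continue;
        }
        $attributes = '';
        foreach ($node['attributes'] as $name => $value) {
            $attributes .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
        }
        // Every element gets an explicit closing tag, even if the source omitted it.
        $html .= '<' . $node['tag'] . $attributes . '>'
            . render_nodes($node['children'])
            . '</' . $node['tag'] . '>';
    }
    return $html;
}

// The same DOM as in the example, written with short array syntax.
$dom = [
    1 => [
        'parentId' => 0, 'tag' => 'p', 'inline' => false, 'attributes' => [],
        'children' => [
            2 => 'Some ',
            3 => [
                'parentId' => 1, 'tag' => 'span', 'inline' => true, 'attributes' => [],
                'children' => [
                    4 => ' plain ',
                    5 => [
                        'parentId' => 3, 'tag' => 'strong', 'inline' => true, 'attributes' => [],
                        'children' => [6 => ' text'],
                    ],
                ],
            ],
        ],
    ],
];

$result = render_nodes($dom);
echo $result; // prints: <p>Some <span> plain <strong> text</strong></span></p>
```

Since the output is built only from the array, the unclosed tags in the source cannot survive: each element is closed when its children have been rendered.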