The purpose of the HTML Rewrite class is to take arbitrary HTML (typically from a WYSIWYG editor) and turn it into clean, well-formatted HTML.

It works by parsing the HTML into a DOM, ignoring anything that shouldn't be there (e.g. `onclick` attributes). It then uses this clean DOM to build a new HTML document.

Because the new document is built using a white-list approach, it should be theoretically impossible for any malicious or malformed code to get through.
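The parse-then-rebuild idea can be sketched in a few lines of Python with the standard library's `html.parser`. This is not the class's own code (which isn't shown here). `ALLOWED`, `WhitelistParser`, `render` and `sanitize` are hypothetical names, and the tag list is only an example. The input is parsed into a small tree that keeps only whitelisted tags and attributes. Fresh markup is then written from that tree, with all text and attribute values escaped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical white list: tag -> attributes allowed on that tag.
# Anything not listed here (e.g. onclick) is never copied to the output.
ALLOWED = {
    "p": set(),
    "span": {"title"},
    "strong": set(),
    "em": set(),
    "br": set(),
}
VOID = {"br"}                       # elements that have no closing tag
DROP_CONTENT = {"script", "style"}  # discard these elements *and* their text


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = attrs or {}
        self.children = []  # Node objects or plain text strings


class WhitelistParser(HTMLParser):
    """Builds a clean tree; comments, doctypes etc. are ignored by default."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node(None)
        self.stack = [self.root]  # currently open elements
        self.skip = 0             # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if self.skip or tag not in ALLOWED:
            return  # unknown tag: drop the tag itself, keep its text
        kept = {name: value or "" for name, value in attrs if name in ALLOWED[tag]}
        node = Node(tag, kept)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        if self.skip or tag not in ALLOWED or tag in VOID:
            return
        # Close the nearest open element with this tag (and anything still
        # open inside it); a stray closing tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.skip:
            self.stack[-1].children.append(data)


def render(node):
    """Write fresh markup from the tree, escaping all text and attribute values."""
    out = []
    for child in node.children:
        if isinstance(child, str):
            out.append(escape(child, quote=False))
            continue
        attrs = "".join(f' {k}="{escape(v)}"' for k, v in sorted(child.attrs.items()))
        out.append(f"<{child.tag}{attrs}>")
        if child.tag not in VOID:
            out.append(render(child))
            out.append(f"</{child.tag}>")
    return "".join(out)


def sanitize(source):
    parser = WhitelistParser()
    parser.feed(source)
    parser.close()  # flush any trailing text
    return render(parser.root)


print(sanitize('<p onclick="steal()">Hi <b>there</b><script>alert(1)</script></p>'))
# → <p>Hi there</p>
```

Because the output is regenerated from the tree rather than copied from the input, an `onclick` attribute or an unlisted tag never reaches the result. Unwrapping unknown tags (keeping their text) while discarding `script`/`style` content entirely is a design choice of this sketch.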
I should also warn you, though: this code is brutal. It is completely unforgiving towards anything that is not on the white list, and it definitely falls into the category of using a sledgehammer to crack a walnut.

At the moment I'm just trying out the base functionality, so I expect there will be a few issues. For example, I haven't finished adding all the CSS tests, and any rules it does not understand are dropped.
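The "drop any CSS rule it doesn't understand" behaviour can be sketched the same way. Each declaration in a `style` attribute is kept only if its property has a test and the value passes that test, and everything else is discarded. The property list, the regexes and the `clean_style` name below are illustrative assumptions, not the class's actual CSS tests.

```python
import re

# Hypothetical per-property tests; a real list would be much longer.
CSS_TESTS = {
    "color": re.compile(r"#[0-9a-f]{3}(?:[0-9a-f]{3})?|[a-z]+"),
    "font-weight": re.compile(r"normal|bold|[1-9]00"),
    "text-align": re.compile(r"left|right|center|justify"),
}


def clean_style(style):
    """Return only the declarations whose property and value pass a test."""
    kept = []
    # A naive split on ';' is acceptable here only because none of the
    # whitelisted values can legitimately contain a ';'.
    for declaration in style.split(";"):
        prop, sep, value = declaration.partition(":")
        prop, value = prop.strip().lower(), value.strip().lower()
        test = CSS_TESTS.get(prop)
        if sep and test and test.fullmatch(value):
            kept.append(f"{prop}: {value}")
    return "; ".join(kept)


print(clean_style("color: #FF0000; behavior: url(x.htc); font-weight: bold; text-align: middle"))
# → color: #ff0000; font-weight: bold
```

Unknown properties (`behavior`) and known properties with values that fail their test (`text-align: middle`) are dropped alike, so until a property has a test, it simply disappears from the output.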
| Source | DOM | Result |
|---|---|---|
| `Some <span> plain <strong> text` | `array(1 => array('parentId' => 0, 'tag' => 'p', 'inline' => false, 'attributes' => array(), 'children' => array(2 => 'Some ', 3 => array('parentId' => 1, 'tag' => 'span', 'inline' => true, 'attributes' => array(), 'children' => array(4 => ' plain ', 5 => array('parentId' => 3, 'tag' => 'strong', 'inline' => true, 'attributes' => array(), 'children' => array(6 => ' text')))))))` | `<p>Some <span> plain <strong> text</strong></span></p>` |
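The Result column can be produced from the DOM column by a straightforward walk. Visit the nodes in order, write each element's opening tag, recurse into its children, and always write the closing tag. In this sketch, that last step is how the closing tags missing from the source end up in the result. The `<p>` wrapper is already present in the DOM, so it is added while the DOM is built, not during serialization. The code below hand-transcribes the PHP array into Python dicts. `render_dom` is a hypothetical name, and this is not the class's own serializer.

```python
from html import escape

# The DOM from the table above, transcribed from PHP arrays to Python dicts.
# Element nodes are dicts; text nodes are plain strings; keys are node ids.
DOM = {
    1: {"parentId": 0, "tag": "p", "inline": False, "attributes": {}, "children": {
        2: "Some ",
        3: {"parentId": 1, "tag": "span", "inline": True, "attributes": {}, "children": {
            4: " plain ",
            5: {"parentId": 3, "tag": "strong", "inline": True, "attributes": {}, "children": {
                6: " text",
            }},
        }},
    }},
}


def render_dom(nodes):
    """Serialize {id: node} in insertion order, always emitting the closing tag.

    Assumes no void elements (e.g. <br>) and ignores the 'inline' flag.
    """
    html = []
    for node in nodes.values():
        if isinstance(node, str):
            html.append(escape(node, quote=False))
            continue
        attrs = "".join(f' {name}="{escape(value)}"'
                        for name, value in node["attributes"].items())
        html.append(f"<{node['tag']}{attrs}>")
        html.append(render_dom(node["children"]))
        html.append(f"</{node['tag']}>")
    return "".join(html)


print(render_dom(DOM))
# → <p>Some <span> plain <strong> text</strong></span></p>
```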