Html > XHtml conversion?

Apr 27, 2010 at 12:52 PM

Hello there,

Is it possible to convert any kind of HTML to proper XHTML with your library? Or was that left out on purpose? Or maybe I just haven't found out how yet :)



Apr 27, 2010 at 2:59 PM


The library doesn't natively convert bad HTML to XHTML. Conversion to XHTML involves more than fixing tags; there are structural changes as well. What the library can do is ensure that the data from the bad HTML conforms to the basic XHTML premise: HTML structured as well-formed XML. To capture the entire HTML input under a single XML node, the input is wrapped in an HTMLDocument tag. This tag forms the root node of the XML structure (XML allows only one root node). Under that node are the contents of the HTML that was retrieved and parsed. You can save the string from the DocumentElement.InnerXml property to obtain a nearly XHTML-compliant document.
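To illustrate the single-root idea, here is a minimal sketch in Python using only the standard library. It is not this library's API or implementation: the class name `SingleRootBuilder` and the function `inner_xml` are hypothetical, and the tag-recovery rules are deliberately simplistic. It only shows how loose HTML can be collected under one `HTMLDocument` root and then serialized without that root, which is roughly what reading an inner-XML string gives you.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SingleRootBuilder(HTMLParser):
    """Collects loose HTML under one synthetic root element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("HTMLDocument")
        self.open_elements = [self.root]  # stack of currently open elements

    def _add_element(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get their own name as value.
        attrib = {name: (name if value is None else value) for name, value in attrs}
        return ET.SubElement(self.open_elements[-1], tag, attrib)

    def handle_starttag(self, tag, attrs):
        element = self._add_element(tag, attrs)
        if tag not in VOID_TAGS:
            self.open_elements.append(element)

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> are added but never left open.
        self._add_element(tag, attrs)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for index in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[index].tag == tag:
                del self.open_elements[index:]
                return

    def handle_data(self, data):
        parent = self.open_elements[-1]
        if len(parent):  # text after a child element belongs to that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def inner_xml(html):
    """Return the well-formed XML found beneath the synthetic root."""
    builder = SingleRootBuilder()
    builder.feed(html)
    builder.close()
    leading_text = escape(builder.root.text or "")
    children = "".join(ET.tostring(child, encoding="unicode") for child in builder.root)
    return leading_text + children


if __name__ == "__main__":
    print(inner_xml("<p>Hello <b>world<br><p class=x>A & B"))
```

Note that unclosed tags are simply closed at the end of input, so the nesting may differ from what a browser would build. This is the "more than fixing tags" part: the result is well-formed XML, not necessarily valid XHTML.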

Additionally, the GetHtml method will attempt to return the "corrected" HTML in the same structure (including whitespace) as the original HTML.

Hope this helps,