Simply reference the library in your project.
//Create an instance
HtmlDocument _document = new HtmlDocument(); 

//load html from the web
_document.Load(new Uri("")); 

//save the resultant xml data

The transform function HtmlDocument.Transform(string xsltTemplate) takes a text-based XSLT template and returns an HtmlDocument.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
      <xsl:for-each select="/HTMLDocument[1]/body[1]/div[5]/div[4]/div[1]/h3">
            <xsl:value-of select="a[1]/em[1]/text()" />
            <xsl:value-of select="a[1]/@href" />
      </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
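
As a rough sketch of what a text-based XSLT transform of this kind amounts to, the following uses the .NET Framework's built-in XslCompiledTransform instead of this library's Transform method, whose internals are not described here. The sample XML, the XsltSketch class and its Apply helper are made up for illustration; only the HTMLDocument root element and the a/em and a/@href selections are taken from the stylesheet above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltSketch
{
    // Hypothetical sample input: a tiny XML tree shaped like a converted HTML page.
    public const string SampleXml =
        @"<HTMLDocument><body>
            <h3><a href=""/first""><em>First</em></a></h3>
            <h3><a href=""/second""><em>Second</em></a></h3>
          </body></HTMLDocument>";

    // Text-based stylesheet: prints the link text and the href of every h3.
    public const string SampleXslt =
        @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
            <xsl:output method=""text"" />
            <xsl:template match=""/"">
              <xsl:for-each select=""/HTMLDocument/body/h3"">
                <xsl:value-of select=""a[1]/em[1]/text()"" />
                <xsl:text> -> </xsl:text>
                <xsl:value-of select=""a[1]/@href"" />
                <xsl:text>; </xsl:text>
              </xsl:for-each>
            </xsl:template>
          </xsl:stylesheet>";

    // Compiles the stylesheet text and applies it to the XML text.
    public static string Apply(string xml, string xsltTemplate)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xsltTemplate)))
        {
            transform.Load(xsltReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Prints one "text -> href; " entry per h3 element.
        Console.WriteLine(Apply(SampleXml, SampleXslt));
    }
}
```

Relative paths such as /HTMLDocument/body/h3 are used here because the sample tree is tiny; positional paths like div[5]/div[4] in the stylesheet above depend on the exact layout of the page being scraped.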

The HtmlDocument.GetHtml() function returns the HTML text of the document, extracted from the XML. The HTML produced isn't necessarily XHTML, however; it is closer to HTML 4.0.
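
To illustrate the difference between XML-style and HTML 4.0-style output, here is a small sketch that serializes an XML tree with the standard XSLT "html" output method through .NET's XslCompiledTransform. It is a stand-in for illustration only and is not how GetHtml() is known to work; the HtmlOutputSketch class, the ToHtml helper and the sample markup are all hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class HtmlOutputSketch
{
    // Identity stylesheet that copies every node but serializes the result as HTML.
    private const string IdentityAsHtml =
        @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
            <xsl:output method=""html"" indent=""no"" />
            <xsl:template match=""@*|node()"">
              <xsl:copy><xsl:apply-templates select=""@*|node()"" /></xsl:copy>
            </xsl:template>
          </xsl:stylesheet>";

    // Re-serializes well-formed XML markup using HTML output rules.
    public static string ToHtml(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(IdentityAsHtml)))
        {
            transform.Load(xsltReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // The self-closed XML element <br /> is written HTML-style, without the closing slash.
        Console.WriteLine(ToHtml("<body><p>one<br />two</p></body>"));
    }
}
```

Output of this kind is fine for browsers but is no longer well-formed XML, so it may not load back into an XML parser without being converted again.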

Late-breaking news: it seems that Google has changed its page layout, so the above example may not work for everyone. Expect an update to this tutorial with a more reliable "broken" HTML site.

Last edited Mar 24, 2010 at 1:41 PM by kurtnelle, version 14

