
Closed

HtmlParser.Parse method deletes last text

description

Let's say you have a document like this: "<html><body><p>stuff 1</p><p>stuff 2</p></body></html>"
 
After loading it with the htmlDocument.LoadHTML() method, the htmlDocument.GetHtml() method returns only:
<html><body><p>stuff 1</p><p></p></body></html>
 
It seems that the content of the document's last text/CDATA node is dropped.

file attachments

Closed Apr 16, 2010 at 9:38 PM by kurtnelle
Issue resolved

comments

RudolfHenning wrote Apr 9, 2010 at 7:10 AM

Added sample code

wrote Apr 9, 2010 at 7:10 AM

RudolfHenning wrote Apr 9, 2010 at 12:54 PM

Replacing the line
string _attributeText = Chomp(ref html, _match.Length);
with
// Only chomp attribute text when a tag name was actually matched
string _attributeText = string.Empty;
if (_tagName.Length > 0)
{
    _attributeText = Chomp(ref html, _match.Length);
}
appears to resolve the issue.

kurtnelle wrote Apr 9, 2010 at 1:04 PM

I understand. Interesting.

wrote Apr 16, 2010 at 9:38 PM

wrote Feb 13, 2013 at 12:59 AM

wrote May 16, 2013 at 1:26 AM