HTML parsers are software for automated Hypertext Markup Language (HTML) parsing. They have two main purposes:
HTML traversal: offering an interface through which programmers can easily access and modify the HTML source (the "HTML string code"). Canonical example: DOM parsers.
HTML cleaning: fixing invalid HTML and improving the layout and indentation style of the resulting markup. Canonical example: HTML Tidy.
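As a rough illustration of the traversal purpose, the following minimal sketch uses Python's standard-library `html.parser` module to walk a document and collect link targets. Note that `html.parser` is event-driven rather than a DOM parser, and the names `LinkCollector` and `collect_links` are hypothetical helpers introduced only for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> element it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(markup):
    """Return the href values of all anchors in the given HTML string."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links
```

A call such as `collect_links('<a href="a.html">x</a>')` returns a list of the link targets found. A DOM-style parser would instead build a full tree that can be queried and modified after parsing.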
* Date of the latest release that included significant changes.
** Sanitizes HTML code (generating standards-compatible web pages, reducing spam, etc.) and cleans it (stripping out surplus presentational tags, removing XSS code, etc.).
*** Updates HTML 4.x to XHTML or HTML5, converting deprecated tags (e.g. CENTER) to valid ones (e.g. DIV with style="text-align:center;").
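The CENTER-to-DIV conversion mentioned in the last note can be sketched with Python's standard-library `html.parser`. This is only an illustration of the idea, not how HTML Tidy or any listed parser implements it: the hypothetical `CenterRewriter` class and `rewrite_center` function re-emit the markup as it is parsed, replace only CENTER elements, and make no attempt to repair invalid nesting.

```python
from html.parser import HTMLParser


class CenterRewriter(HTMLParser):
    """Re-emits markup, replacing <center> with a styled <div>."""

    def __init__(self):
        # Keep character references unconverted so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "center":
            self.out.append('<div style="text-align:center;">')
        else:
            # Reuse the original text of the start tag unchanged.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</div>" if tag == "center" else f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_center(markup):
    """Return markup with deprecated CENTER elements converted to DIVs."""
    parser = CenterRewriter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Because end tags are re-emitted in lowercase, the output may differ from the input in letter case even where no CENTER element is present.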