We are all thrilled to welcome the exciting new HTML5 features for modern web application development. HTML5, however, does not guarantee seamless migration, which makes the transition to the new standard non-trivial. For instance, HTML5 lowered the precedence of HTML comments compared to HTML4, so developers cannot upgrade an existing HTML4 document simply by changing its doctype. At the same time, while adopting the HTML5 standard for new applications, developers cannot neglect legacy browsers that do not understand HTML5 yet remain substantially popular.
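To make the HTML5 side of the comment discussion concrete, the sketch below follows the comment states of the WHATWG HTML tokenizer to find where a comment ends and what text it carries. It is a simplified, illustrative reimplementation (the function name `html5_comment` is hypothetical), not the authors' utility. It skips the comment less-than-sign states, which in the specification only report a nested `<!--` as a parse error without changing the result, and it does not replace NULL characters.

```python
def html5_comment(s: str, start: int = 0) -> tuple[str, int]:
    """Return (comment_data, end_index) for the comment opening at s[start],
    following a simplified version of the WHATWG comment tokenizer states."""
    if not s.startswith("<!--", start):
        raise ValueError("no '<!--' at start index")
    i = start + 4
    data: list[str] = []
    state = "start"
    while i < len(s):
        c = s[i]
        i += 1
        if state == "start":                 # comment start state
            if c == "-":
                state = "start_dash"
            elif c == ">":                   # "<!-->" is an empty comment
                return "".join(data), i
            else:
                state, i = "comment", i - 1  # reconsume in comment state
        elif state == "start_dash":          # comment start dash state
            if c == "-":
                state = "end"
            elif c == ">":                   # "<!--->" is an empty comment
                return "".join(data), i
            else:
                data.append("-")
                state, i = "comment", i - 1
        elif state == "comment":
            if c == "-":
                state = "end_dash"
            else:
                data.append(c)
        elif state == "end_dash":            # comment end dash state
            if c == "-":
                state = "end"
            else:
                data.append("-")
                state, i = "comment", i - 1
        elif state == "end":                 # comment end state
            if c == ">":
                return "".join(data), i
            elif c == "!":
                state = "end_bang"
            elif c == "-":
                data.append("-")
            else:                            # "--" not followed by ">"
                data.append("--")
                state, i = "comment", i - 1
        elif state == "end_bang":            # comment end bang state
            if c == ">":                     # "--!>" also closes the comment
                return "".join(data), i
            data.append("--!")
            if c == "-":
                state = "end_dash"
            else:
                state, i = "comment", i - 1
    return "".join(data), len(s)             # EOF: comment runs to end of input


print(html5_comment("<!-->x"))
print(html5_comment("<!-- a --!> b"))
```

Under these rules `<!-->` and `--!>` both terminate a comment; whether a particular legacy browser agrees is exactly the kind of discrepancy that has to be tested rather than assumed.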
If these issues are not handled properly, browsers will build different DOM trees from a “misinterpreted” document, breaking not only the visual presentation but also business logic and even security. The traditional validator (or lint) approach can warn about some of the parsing discrepancies between HTML 4 and 5. However, fixing them manually remains unscalable and error-prone for developers, let alone the other compatibility issues arising from typos and browser quirks.
In this talk, we will cover the compatibility issues we have identified, arising from different HTML versions and browser parsers. No automated solution exists, and fixing them manually does not scale. We propose a canonicalisation process that rewrites HTML so that it is parsed consistently across popular browsers, thus preserving the presentation, business logic, and security of web applications. We will introduce how to use the automated utility and show how it resolves the issues automatically, gracefully, and securely. We also present a proven use case in Safe JS Templating, in which canonicalisation is crucial in defending against XSS attacks. The automated solution can help not only developers address compatibility issues, but also the community embrace HTML5 more aggressively.
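As a rough illustration of what canonicalisation can mean for comments alone, here is a minimal sketch, not the authors' utility, with a hypothetical helper name `canonicalize_comment`. It assumes that the safe target is the stricter comment syntax of earlier HTML5 drafts (no `--` in the body, no leading `>` or `->`, no trailing `-`), which we assume SGML-style HTML4 parsing and HTML5 tokenisation read the same way.

```python
import re


def canonicalize_comment(data: str) -> str:
    """Hypothetical helper: wrap arbitrary text in a comment that avoids the
    constructs on which SGML-style and HTML5 comment parsing can diverge."""
    # Collapse every run of two or more hyphens to one, so the body can never
    # contain "--" (and therefore no "-->", "--!>" or "<!--").
    body = re.sub(r"-{2,}", "-", data)
    # A body starting with ">" or "->" would close the comment early in HTML5.
    if body.startswith(">") or body.startswith("->"):
        body = " " + body
    # A trailing "-" would run into the closing "-->".
    if body.endswith("-"):
        body = body + " "
    return "<!--" + body + "-->"


print(canonicalize_comment("a -- b"))
print(canonicalize_comment(">x"))
```

The rewrite changes comment text rather than rejecting the document. This is usually harmless because comments are not rendered, but any script that reads comment nodes would see the altered text.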
Context Parser
This work has a second author, Nera Liu (firstname.lastname@example.org), also from Yahoo Paranoid.