Why it is important to specify a character encoding

Many website designers build scrappy websites that do not follow standards at all. I myself tend to write all my XHTML to be XHTML 1.1 compliant. Since you are reading this blog, I will assume you also attempt to follow standards.

Usually I make sure everything passes XHTML Transitional validation. One thing I usually ignore, however, is the character encoding.

Put simply, a character encoding tells the browser how to turn the bytes of a document into characters, so that the page renders as originally intended. For instance, a site written in a Japanese encoding (e.g. Shift_JIS, which encodes the JIS X 0208 character set) will not display correctly unless the browser knows to decode it that way and your computer has fonts for those characters.
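
To make that concrete, here is a minimal Python sketch of what happens when the same bytes are read with the wrong encoding. The sample text and the variable names are my own illustration, and the "wrong guess" of ISO 8859-1 is just an assumption about what a Western-configured browser might fall back to.

```python
# The same bytes mean different things under different encodings.
text = "日本語"                      # sample Japanese text (illustrative)
raw = text.encode("shift_jis")      # the bytes a Shift_JIS page would send

# Decoded with the encoding the author used: the original text comes back.
as_intended = raw.decode("shift_jis")

# Decoded with a wrong guess: every byte becomes some Latin-1 character,
# so the reader sees gibberish instead of Japanese.
as_guessed = raw.decode("iso-8859-1")

print(as_intended == text)   # the right encoding round-trips
print(as_guessed == text)    # the wrong one does not
print(repr(as_guessed))
```

Nothing is wrong with the bytes themselves. The only thing that changed between the two results is which encoding the reader assumed.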

Without a specified character encoding, the browser falls back to a default or tries to guess one. So specifying a character encoding is a must when developing sites that use non-Latin characters. But there is a more important reason to do it even if you only develop English websites using UTF-8 or ISO 8859-1: leaving the encoding unspecified is a potential security vulnerability.
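
As a rough sketch of what specifying the encoding looks like in practice, the Python below builds a page that declares its charset twice: once in the HTTP Content-Type header and once in a meta tag. The function name render_page and its parameters are hypothetical, and the doctype is left out to keep the example short, so treat this as an outline under those assumptions and not as a complete XHTML template.

```python
import html


def render_page(title, body_html, charset="utf-8"):
    """Return (headers, payload) for a page with an explicit charset.

    Hypothetical helper for illustration only.
    """
    document = (
        "<html>\n<head>\n"
        # Declare the encoding inside the document itself...
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}" />\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body>{body_html}</body>\n"
        "</html>\n"
    )
    # ...encode the bytes with that same encoding...
    payload = document.encode(charset)
    # ...and declare it again in the HTTP header.
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(payload)),
    }
    return headers, payload


headers, payload = render_page("Hello", "<p>Hello, world</p>")
print(headers["Content-Type"])
```

The point of the design is that the declared charset and the encoding actually used for the bytes come from the same variable, so they cannot drift apart.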

Essentially, when a character encoding is not specified, the page may be open to an XSS-style attack. An attacker encodes the JavaScript code using UTF-7, so that it contains no literal angle brackets and slips past filters that only look for them. When a client's web browser (historically, older versions of Internet Explorer) attempts to auto-detect the encoding, it can detect the page as UTF-7, decode the payload back into markup, and execute the JavaScript code.
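
The Python sketch below illustrates the trick with a harmless alert(1) payload. The payload string is my own example, and html.escape stands in for whatever output filter a site might use, which is an assumption on my part.

```python
import html

# A UTF-7 encoded script tag: note there is no "<" or ">" byte in it.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Read as UTF-8 (or ASCII), it is just odd-looking plain text.
as_utf8 = payload.decode("utf-8")

# A filter that escapes angle brackets finds nothing to escape.
filtered = html.escape(as_utf8)

# Read as UTF-7, the same bytes turn back into live markup.
as_utf7 = payload.decode("utf-7")

print(filtered == as_utf8)  # True: the filter changed nothing
print(as_utf7)              # <script>alert(1)</script>
```

If the page declares its encoding explicitly, the browser has no reason to guess, and the payload stays the inert text seen in as_utf8.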