🧰 ToolPilot

HTML Entity Encoder & Decoder

Encode special characters for safe HTML display, or decode HTML entities back to plain text.

Common HTML Entities

Entity    Character
&amp;     &
&lt;      <
&gt;      >
&quot;    "
&#39;     '
&nbsp;    (non-breaking space)
&copy;    ©
&reg;     ®

What Are HTML Entities?

HTML entities are special character sequences that represent reserved or special characters in HTML. They begin with an ampersand (&) and end with a semicolon (;). For example, the less-than sign (<) is written as &lt; in HTML source code. Without this encoding, the browser would interpret the character as the start of an HTML tag rather than displaying it as text.

The HTML specification defines two types of character references: named entities and numeric entities. Named entities use a descriptive word (like &amp; for ampersand or &copy; for the copyright symbol). Numeric entities use the Unicode code point in either decimal (&#169;) or hexadecimal (&#xA9;) form. Numeric entities can represent any Unicode character, while named entities are limited to a predefined set.
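
As a small illustration of the named, decimal, and hexadecimal forms, the sketch below builds numeric references from a character's Unicode code point and decodes all three forms with Python's standard html module. The helper names to_decimal_entity and to_hex_entity are hypothetical and exist only for this example.

```python
import html


def to_decimal_entity(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


def to_hex_entity(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


# The named, decimal, and hexadecimal forms of the copyright sign.
forms = ["&copy;", to_decimal_entity("©"), to_hex_entity("©")]

# All three forms decode to the same character.
decoded = [html.unescape(form) for form in forms]

print(forms)    # ['&copy;', '&#169;', '&#xA9;']
print(decoded)  # ['©', '©', '©']
```

The numeric helpers work for any single character, including ones that have no named entity.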

HTML entities exist because certain characters have special meaning in HTML syntax. The five most critical characters are: & (starts an entity reference), < (starts a tag), > (ends a tag), " (delimits attribute values), and ' (alternative attribute delimiter). If these characters appear in your content without being encoded, the browser may misinterpret your HTML, breaking the page layout or creating security vulnerabilities.

How HTML Entity Encoding Works

Encoding is the process of replacing special characters with their entity equivalents. When the browser renders the page, it converts the entities back to their visual characters. This round-trip ensures that the characters display correctly without being treated as HTML syntax.

Decoding is the reverse process: converting entity references back to their original characters. This is useful when you receive HTML-encoded content from an API or database and need to work with the plain text. For example, a search engine might return snippets with encoded entities that need to be decoded before displaying in a non-HTML context.

The encoding process in this tool scans the input character by character. When it finds one of the reserved characters (the five listed above plus the forward slash: &, <, >, ", ', and /), it replaces it with the corresponding entity. All other characters pass through unchanged. The decoding process uses the browser's built-in HTML parser to interpret entity references, which handles named entities, decimal numeric entities, and hexadecimal numeric entities.
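
The character-by-character scan described above can be sketched roughly as follows. This is not the tool's actual source: the function names and the exact entity chosen for each character (for example &#39; for the apostrophe and &#x2F; for the slash) are assumptions, and Python's html.unescape stands in for the browser's HTML parser on the decoding side.

```python
import html

# Assumed mapping for the six reserved characters named in the text.
# The specific entity used for ' and / is an assumption for illustration.
ENTITY_MAP = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
    "/": "&#x2F;",
}


def encode_entities(text: str) -> str:
    """Replace each reserved character; pass every other character through."""
    return "".join(ENTITY_MAP.get(ch, ch) for ch in text)


def decode_entities(text: str) -> str:
    """Decode named, decimal, and hexadecimal references (stdlib stand-in)."""
    return html.unescape(text)


original = '<a href="/x">Tom & Jerry</a>'
encoded = encode_entities(original)
print(encoded)
# &lt;a href=&quot;&#x2F;x&quot;&gt;Tom &amp; Jerry&lt;&#x2F;a&gt;
print(decode_entities(encoded) == original)  # True
```

Scanning one character at a time means each input character is replaced exactly once, so the ampersands introduced by earlier replacements are never encoded again.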

Common Use Cases

  • Preventing XSS attacks — Cross-Site Scripting (XSS) is one of the most common web vulnerabilities. Encoding user-generated content before inserting it into HTML prevents malicious scripts from executing. If a user submits <script>alert('hacked')</script>, encoding turns it into harmless visible text.
  • Displaying code snippets — When showing HTML, XML, or code examples on a web page, you need to encode the angle brackets and ampersands so the browser displays them as text rather than interpreting them as markup.
  • Email templates — HTML email clients vary widely in how they handle special characters. Encoding helps those characters render consistently across Gmail, Outlook, Apple Mail, and other clients.
  • Database content rendering — Content stored in databases often needs to be encoded when rendered in HTML templates to prevent injection and display errors.
  • RSS and XML feeds — Feed content must be properly encoded to produce valid XML. Unencoded ampersands and angle brackets break XML parsers.
  • SEO and meta tags — Title tags, meta descriptions, and Open Graph properties need properly encoded special characters to avoid breaking the HTML structure of the page head.
  • Form handling — When pre-filling form values that contain quotes or ampersands, encoding prevents the values from breaking out of the HTML attribute and corrupting the form.
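
Two of the cases above, untrusted text inside an element and a pre-filled form value, might look like this in Python. The functions render_comment and render_input are hypothetical helpers for illustration. Note that the standard library's html.escape writes the apostrophe as &#x27;, which is equivalent to &#39;.

```python
import html


def render_comment(user_text: str) -> str:
    """Place untrusted text inside an element as encoded text."""
    return f"<p>{html.escape(user_text)}</p>"


def render_input(value: str) -> str:
    """Pre-fill a form field; quote=True also encodes double and single quotes."""
    return f'<input type="text" value="{html.escape(value, quote=True)}">'


comment_html = render_comment("<script>alert('hacked')</script>")
input_html = render_input('He said "hi" & left')

print(comment_html)
# <p>&lt;script&gt;alert(&#x27;hacked&#x27;)&lt;/script&gt;</p>
print(input_html)
# <input type="text" value="He said &quot;hi&quot; &amp; left">
```

In both outputs the user-supplied characters can no longer open a tag or close the attribute early.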

Tips and Best Practices

  • Encode at output, not input — Store data in its raw form and encode it only when inserting into HTML. This preserves the original data and allows you to encode appropriately for different contexts (HTML, URL, JavaScript, CSS).
  • Use framework-provided escaping — Modern frameworks like React, Vue, and Angular automatically escape content inserted into templates. Features that bypass this escaping, such as React's dangerouslySetInnerHTML or Vue's v-html directive, insert raw HTML, so content passed to them must be encoded or sanitized manually.
  • Do not double-encode — Encoding already-encoded content turns &amp; into &amp;amp;. Always check whether your content is already encoded before applying encoding again.
  • Context matters — HTML entity encoding protects content placed inside HTML elements and quoted attribute values. It does not protect against injection in JavaScript strings, CSS values, or URLs. Use the appropriate encoding for each context.
  • Test with edge cases — Test your encoding with strings that contain multiple consecutive special characters, mixed entities, and Unicode characters outside the ASCII range to ensure your implementation handles them correctly.
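
The double-encoding pitfall and the edge-case advice above can be checked with a few lines of Python. This is a minimal sketch using the standard html module, and the sample strings are arbitrary.

```python
import html

raw = "Fish & Chips <b>"

# Encoding once gives the intended result.
once = html.escape(raw)
# Encoding the already-encoded string encodes its ampersands again.
twice = html.escape(once)

print(once)   # Fish &amp; Chips &lt;b&gt;
print(twice)  # Fish &amp;amp; Chips &amp;lt;b&amp;gt;

# Round-trip check over edge cases: consecutive special characters,
# text that already looks encoded, non-ASCII characters, and the empty string.
edge_cases = ["<<>>&&", "&amp; & &lt;", "café ☕ 😀 <日本語>", ""]
round_trips = [html.unescape(html.escape(s)) == s for s in edge_cases]
print(all(round_trips))  # True
```

Storing the raw string and calling the encoder only at output time keeps the single-encoded form the only one that ever reaches the page.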

HTML Entity Encoding vs Alternatives

HTML encoding vs URL encoding: URL encoding (percent-encoding) uses the %XX format to encode characters that are unsafe in URLs. HTML encoding uses the &name; or &#number; format for characters that are unsafe in HTML. They serve different contexts and are not interchangeable.

HTML encoding vs Base64: Base64 encodes binary data as ASCII text for safe transport. It encodes the entire input, while HTML encoding only replaces specific characters. Base64 is used for embedding images or sending binary data, not for HTML safety.
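
To make the contrast concrete, the sketch below runs the same arbitrary sample string through HTML encoding, URL encoding, and Base64 using Python's standard library. Passing safe="" to quote is a choice made for this example so that every reserved character is percent-encoded.

```python
import base64
import html
from urllib.parse import quote

text = "a&b <c>"

# HTML encoding replaces only the characters that are special in HTML.
as_html = html.escape(text)
# URL encoding replaces characters that are unsafe in URLs with %XX.
as_url = quote(text, safe="")
# Base64 re-encodes the entire input, not just specific characters.
as_b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")

print(as_html)  # a&amp;b &lt;c&gt;
print(as_url)   # a%26b%20%3Cc%3E
print(as_b64)
```

Only the HTML-encoded form displays as the original text when placed in a page. The other two would appear to the reader as their encoded strings.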

HTML encoding vs sanitization: Encoding converts special characters to safe representations. Sanitization (using libraries like DOMPurify) parses HTML and removes dangerous elements and attributes entirely. For rich text content where you want to allow some HTML tags, sanitization is the better approach. For plain text content displayed in HTML, encoding is sufficient.
