Stripping HTML from Text Using JavaScript

When working with user-generated content or processing HTML data, you may need to remove the HTML tags and keep only the plain text.

JavaScript offers several ways to do this. In this article, we'll explore two of them: parsing the markup with DOMParser and removing tags with a regular expression.

Method 1: Using a DOM Parser

One approach is to use DOMParser to parse the HTML string into a new Document and then read the text content of its body. Here's an example:

// Function to strip HTML using DOMParser
function stripHtmlUsingDOMParser(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.textContent || "";
}

// Example usage
const htmlString = '<p>This is <strong>HTML</strong> content.</p>';
const textContent = stripHtmlUsingDOMParser(htmlString);
console.log(textContent);

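In this code, parseFromString(html, 'text/html') parses the string with the browser's HTML parser and returns a new Document. doc.body.textContent then returns the concatenated text of all text nodes inside the body, so for the sample input the example logs "This is HTML content.". Character references are decoded (for example, &amp; becomes &), and scripts in the parsed document are not executed. However, the text inside any <script> or <style> elements in the body is still included in the result. The || "" fallback is purely defensive, since textContent on an element is always a string.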
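Because textContent includes the raw contents of <script> and <style> elements, you may want to drop those elements before reading the text. The sketch below assumes a browser environment; stripHtmlWithoutScripts is a hypothetical name, and treating script and style as the only elements to discard is an assumption you may need to adjust.

```javascript
// Hypothetical variant of stripHtmlUsingDOMParser that discards <script> and <style>
// contents. Browser-only: DOMParser is not built into Node.js.
function stripHtmlWithoutScripts(html) {
  if (typeof DOMParser === 'undefined') {
    throw new Error('DOMParser is not available in this environment');
  }
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Remove elements whose text is code rather than readable content (assumed list).
  doc.querySelectorAll('script, style').forEach((el) => el.remove());
  return doc.body.textContent || '';
}
```

In a browser, stripHtmlWithoutScripts('<p>Hello</p><script>alert(1)</script>') returns "Hello", whereas stripHtmlUsingDOMParser returns "Helloalert(1)" for the same input.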

Method 2: Using a Regular Expression

Another approach involves using a regular expression to replace HTML tags with an empty string. Here's an example:

// Function to strip HTML using a regular expression
function stripHtmlUsingRegex(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Example usage
const htmlString = '<p>This is <strong>HTML</strong> content.</p>';
const textContent = stripHtmlUsingRegex(htmlString);
console.log(textContent);

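In this code, the pattern /<[^>]*>/ matches a < followed by any characters other than >, up to and including the next >, and the g flag makes replace() remove every match instead of only the first. For the sample input, the example logs "This is HTML content.". Because it does not need a DOM, this method also works outside the browser, for example in Node.js. It only removes tags, though: character references such as &amp; are left undecoded, and the contents of <script> and <style> elements remain in the output.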
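If you need to stay with regular expressions (for example, in Node.js without a DOM library), you can reduce some of these problems by removing comments and script/style blocks before stripping tags, and by decoding a handful of common character references afterwards. The following is a minimal sketch; the function name stripHtmlAndDecodeBasicEntities and the fixed entity list are assumptions, and it remains a heuristic rather than an HTML parser.

```javascript
// Hypothetical helper: a more defensive regex-based approach. Still a heuristic,
// not an HTML parser, so unusual markup can still produce wrong results.
const BASIC_ENTITIES = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  '#39': "'",
  nbsp: '\u00A0',
};

function stripHtmlAndDecodeBasicEntities(html) {
  return html
    // 1. Remove comments so tags inside them are not matched piecemeal.
    .replace(/<!--[\s\S]*?-->/g, '')
    // 2. Remove <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, '')
    // 3. Remove all remaining tags, as in stripHtmlUsingRegex.
    .replace(/<[^>]*>/g, '')
    // 4. Decode a small, fixed set of entities in one pass.
    .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (_match, name) => BASIC_ENTITIES[name]);
}

// Logs "Fish & chips"
console.log(
  stripHtmlAndDecodeBasicEntities('<p>Fish &amp; chips</p><!-- note --><script>var x = 1;</script>')
);
```

Decoding all entities in a single replace call is a deliberate choice: it decodes exactly one level, so "&amp;lt;" becomes "&lt;" rather than "<".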

Full Example

Let's create a complete HTML example that demonstrates both methods:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>HTML Stripping Example</title>
  <script>
    // Function to strip HTML using DOMParser
    function stripHtmlUsingDOMParser(html) {
      const doc = new DOMParser().parseFromString(html, 'text/html');
      return doc.body.textContent || "";
    }

    // Function to strip HTML using a regular expression
    function stripHtmlUsingRegex(html) {
      return html.replace(/<[^>]*>/g, '');
    }

    // Example usage
    window.onload = function () {
      const htmlString = '<p>This is <strong>HTML</strong> content.</p>';

      // Using DOMParser
      const textContentDOMParser = stripHtmlUsingDOMParser(htmlString);
      console.log('Using DOMParser:', textContentDOMParser);

      // Using regular expression
      const textContentRegex = stripHtmlUsingRegex(htmlString);
      console.log('Using regular expression:', textContentRegex);
    };
  </script>
</head>
<body>
  <h1>HTML Stripping Example</h1>
  <p>Check the browser console for the stripped HTML content.</p>
</body>
</html>

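In this example, both functions are defined in a <script> element in the page's <head>, and the window.onload handler calls them once the page has finished loading. Open the browser's developer console to see the results: for this input, both methods produce "This is HTML content.".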

Browser Compatibility

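DOMParser is supported in all modern browsers, and String.prototype.replace with a regular expression works in every JavaScript environment. DOMParser is not built into Node.js, however, so server-side code needs either the regular expression or a third-party DOM implementation. The regular expression does not actually parse HTML: it mishandles cases such as a > inside a quoted attribute value, and it leaves character references like &amp; undecoded. DOMParser uses the browser's own HTML parser and handles these cases correctly, which makes it the more robust choice where it is available.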
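The following sketch makes these limitations concrete. It reuses stripHtmlUsingRegex from Method 2 on a few inputs chosen for illustration; the results in the comments follow from how the pattern matches.

```javascript
// The regular expression from Method 2.
function stripHtmlUsingRegex(html) {
  return html.replace(/<[^>]*>/g, '');
}

// A ">" inside a quoted attribute value ends the match too early:
// logs ' b">link' (with a leading space) instead of 'link'.
console.log(stripHtmlUsingRegex('<a title="a > b">link</a>'));

// Character references are not decoded: logs 'Fish &amp; chips'.
console.log(stripHtmlUsingRegex('<p>Fish &amp; chips</p>'));

// Script contents survive as text: logs 'Hialert(1)'.
console.log(stripHtmlUsingRegex('<p>Hi</p><script>alert(1)</script>'));
```

DOMParser handles the first two inputs correctly ("link" and "Fish & chips"), but like the regular expression it keeps the script text unless you remove those elements first.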

Conclusion

Stripping HTML from text is a common requirement in web development, especially when working with user-generated content.

JavaScript provides different methods to achieve this, such as using the DOMParser or regular expressions.

Choose the method that fits your use case: DOMParser gives more accurate results where it is available, while the regular expression is simpler and runs in any JavaScript environment but is only an approximation.
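One way to act on that trade-off is to prefer DOMParser when the environment provides it and fall back to the regular expression otherwise. This is a minimal sketch; stripHtml is a hypothetical name, and the fallback inherits all of the regular expression's limitations.

```javascript
// Hypothetical helper: use the browser's HTML parser when available,
// otherwise fall back to the simple tag-removing regular expression.
function stripHtml(html) {
  if (typeof DOMParser !== 'undefined') {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    return doc.body.textContent || '';
  }
  // Fallback for environments without DOMParser, such as Node.js.
  return html.replace(/<[^>]*>/g, '');
}

// Logs "This is HTML content." in both browsers and Node.js.
console.log(stripHtml('<p>This is <strong>HTML</strong> content.</p>'));
```

Be aware that the two branches can disagree on the same input (for example, one decodes &amp; and the other does not), so code that runs in both environments may see different output.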

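Finally, note that stripping tags is not the same as sanitizing HTML. Treat the result as plain text and insert it into a page with textContent rather than innerHTML; if you need to render user-supplied markup, use a dedicated HTML sanitizer instead.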