How to write a Regex to match nested HTML tags in a Java application?

I am working on a Java application where I need to process HTML content. I need a Regex that can match nested HTML tags correctly. Here is an example of the HTML content I am working with:

<div>
  <p>Some <b>bold</b> text and <i> italic </i> text.</p>
  <span>Another <a href="#">link</a> inside span</span>
</div>

I want to match the entire <div> tag along with its nested content, including all child tags. I’ve tried several regex patterns, but none of them seem to handle the nested tags correctly. Here is what I have tried:

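import java.util.regex.Matcher;
import java.util.regex.Pattern;

// htmlContent holds the HTML shown above
String regex = "<div>(.*?)</div>";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(htmlContent);
while (m.find()) {
    System.out.println(m.group());
}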

This pattern only matches the outer tag and doesn’t correctly capture the nested tags. How can I write a regex pattern that correctly matches and extracts the content of nested HTML tags?
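For reference, here is a minimal sketch of one direction that might work. java.util.regex has no recursive patterns, so a single regex cannot balance arbitrarily nested tags. The sketch uses a regex only to find the opening and closing tags of one element name, then tracks nesting depth with a counter. The class name NestedTagExtractor and the method extract are hypothetical names made up for illustration. It assumes reasonably well-formed input and does not handle self-closing tags, comments, or ">" inside attribute values; a real HTML parser is likely the safer choice for anything beyond simple cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not from any library, just an illustration.
class NestedTagExtractor {

    // Returns every outermost <tagName>...</tagName> element, nested content included.
    static List<String> extract(String html, String tagName) {
        // Group 1 is "/" for a closing tag and empty for an opening tag.
        Pattern tag = Pattern.compile(
                "<(/?)" + Pattern.quote(tagName) + "\\b[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = tag.matcher(html);

        List<String> results = new ArrayList<>();
        int depth = 0;   // how many tagName elements are currently open
        int start = -1;  // index where the current outermost element begins

        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                if (depth == 0) {
                    start = m.start();
                }
                depth++;
            } else if (depth > 0) {
                depth--;
                if (depth == 0) {
                    // The outermost element just closed: keep the whole span.
                    results.add(html.substring(start, m.end()));
                }
            }
            // A closing tag seen at depth 0 is unbalanced and is ignored.
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<div>\n"
                + "  <p>Some <b>bold</b> text and <i> italic </i> text.</p>\n"
                + "  <span>Another <a href=\"#\">link</a> inside span</span>\n"
                + "</div>";
        for (String element : extract(html, "div")) {
            System.out.println(element);
        }
    }
}
```

Calling extract(htmlContent, "div") on the sample above should return the whole outer <div> block as a single string. The same method can then be applied to that string with "p" or "span" to pull out the child elements.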
