Feature #2488

open

HTML Parsing / Buffers

Added by Jason Williams over 3 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee:
Target version:
Effort: high
Difficulty: high
Label:

Description

We write a lot of signatures that match on the HTML content in file_data. It would be useful to do some parsing/buffering of that HTML so that rules do not have to inspect the whole file_data buffer. Alternatively, perhaps this could be implemented as a transform?

Here is a quick example HTML page, written off the top of my head:

<html>
    <head>
        <title>Meerkat HQ</title>
        <!--Meerkat HQ cloned by z001ie -->
        <link rel="stylesheet" href="./z001ie_files/css/meerkats.css">
        <link rel="shortcut icon" href="./z001ie_files/images/favicon.gif" type="image/gif"/>
        <script src="./z001ie_files/jquery_003_002.html"></script>
        <script>
            function IsEmpty() {
                var x = document.forms["login"]["user"].value;
                var y = document.forms["login"]["pass"].value;
                if (x == "") {
                    document.getElementById("ErrorBox").style.display = "block";
                    document.getElementById("ErrorUser").style.display = "block";
                    return false;
                }
            }
        </script>
    </head>
    <body>
        <form id="signon" name="login" action="login.php" method="post" autocomplete="off" onsubmit="return IsEmpty();">
            <input type="text" id="userid" placeholder="Username" class="required" name="user" value="" autocomplete="off">
            <input type="password" placeholder="Password" class="required" id="passwd" name="pass" value="" autocomplete="off">
            <input type="submit" class="signin" value="Sign On" onclick="return IsEmpty();">                                
        </form>
    </body>
</html>

I think the following buffers could be very useful for detection, because they would avoid having to inspect all of file_data (much as rules try to avoid matching against all of http_header):

html_title

literal: <title>Meerkat HQ</title>
rule: html_title; content:"Meerkat HQ"; nocase;
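To make the html_title idea concrete, here is a minimal Rust sketch of how such a buffer could be filled. It uses only the standard library and naive case-insensitive string search, not a real HTML parser. `extract_html_title` and `title_matches` are hypothetical names, not Suricata APIs. Trimming surrounding whitespace from the title is also an assumed design choice.

```rust
/// Hypothetical helper (not a Suricata API): returns the trimmed text of the
/// first <title> element, located case-insensitively.
fn extract_html_title(html: &str) -> Option<&str> {
    // ASCII lowercasing keeps every byte offset unchanged, so positions
    // found in `lower` can be used to slice the original `html`.
    let lower = html.to_ascii_lowercase();
    let open = lower.find("<title")?;
    // Skip to the end of the opening tag, which may carry attributes.
    let content_start = open + lower[open..].find('>')? + 1;
    let content_end = content_start + lower[content_start..].find("</title")?;
    Some(html[content_start..content_end].trim())
}

/// Rough stand-in for `html_title; content:"..."; nocase;`.
fn title_matches(html: &str, needle: &str) -> bool {
    extract_html_title(html)
        .map(|t| t.to_ascii_lowercase().contains(needle.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}
```

Only the title text is searched. A page that merely mentions "Meerkat HQ" in its body would not match, which is the point of a dedicated buffer.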

html_comment

literal comment: <!--Meerkat HQ cloned by z001ie -->
rule: html_comment; content:"cloned by z001ie"; nocase;
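An html_comment buffer could be filled in a similar way. The Rust sketch below collects every comment body using the standard library only. Two points are open design questions rather than settled behaviour: whether a real buffer would expose each comment separately (as here) or concatenate them, and how an unterminated comment is treated. All names are hypothetical.

```rust
/// Hypothetical helper (not a Suricata API): collects the trimmed body of
/// every <!-- ... --> comment, in document order.
fn extract_html_comments(html: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("<!--") {
        let body = &rest[start + 4..];
        match body.find("-->") {
            Some(end) => {
                out.push(body[..end].trim());
                rest = &body[end + 3..];
            }
            // Unterminated comment: assume it runs to the end of the buffer.
            None => {
                out.push(body.trim());
                break;
            }
        }
    }
    out
}

/// Rough stand-in for `html_comment; content:"..."; nocase;`, treating each
/// comment as a separate buffer instance (an assumption).
fn any_comment_matches(html: &str, needle: &str) -> bool {
    let needle = needle.to_ascii_lowercase();
    extract_html_comments(html)
        .iter()
        .any(|c| c.to_ascii_lowercase().contains(needle.as_str()))
}
```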

html_resources

literal resources: (there are a few)

<link rel="stylesheet" href="./z001ie_files/css/meerkats.css">
<link rel="shortcut icon" href="./z001ie_files/images/favicon.gif" type="image/gif"/>
<script src="./z001ie_files/jquery_003_002.html"></script>

rule: html_resources; content:"/z001ie"; nocase;
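One way to fill html_resources is to collect the quoted values of `href=` and `src=` attributes, as in the three example tags. The Rust sketch below does this with naive string scanning. It assumes that only these two attributes count as resources; `srcset`, CSS `url(...)`, and unquoted attribute values are ignored. The function names are hypothetical.

```rust
/// Hypothetical helper (not a Suricata API): returns the quoted values of
/// href= and src= attributes, in document order. Unquoted values are skipped.
fn extract_html_resources(html: &str) -> Vec<&str> {
    // ASCII lowercasing preserves byte offsets, so `lower` indexes `html`.
    let lower = html.to_ascii_lowercase();
    let bytes = html.as_bytes();
    let mut found: Vec<(usize, &str)> = Vec::new();
    for attr in ["href=", "src="] {
        let mut from = 0;
        while let Some(rel) = lower[from..].find(attr) {
            let value_start = from + rel + attr.len();
            from = value_start;
            if value_start >= bytes.len() {
                break;
            }
            let quote = bytes[value_start];
            if quote == b'"' || quote == b'\'' {
                let value = &html[value_start + 1..];
                if let Some(len) = value.find(quote as char) {
                    found.push((value_start, &value[..len]));
                }
            }
        }
    }
    // Merge the href= and src= hits back into document order.
    found.sort_by_key(|&(pos, _)| pos);
    found.into_iter().map(|(_, v)| v).collect()
}

/// Rough stand-in for `html_resources; content:"..."; nocase;`.
fn any_resource_matches(html: &str, needle: &str) -> bool {
    let needle = needle.to_ascii_lowercase();
    extract_html_resources(html)
        .iter()
        .any(|r| r.to_ascii_lowercase().contains(needle.as_str()))
}
```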

html_javascript

literal javascript:

function IsEmpty() {
    var x = document.forms["login"]["user"].value;
    var y = document.forms["login"]["pass"].value;
    if (x == "") {
        document.getElementById("ErrorBox").style.display = "block";
        document.getElementById("ErrorUser").style.display = "block";
        return false;
    }
}

rule: html_javascript; strip_whitespace; content:"varx=document.forms[|22|login|22|][|22|user|22|]";
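The javascript case combines a new buffer with the existing strip_whitespace transform. The Rust sketch below does two things. First, it extracts inline `<script>` bodies, skipping external scripts whose body is empty. Second, it removes whitespace the way C `isspace()` does in the "C" locale, which is how Suricata describes strip_whitespace. The extractor name is hypothetical, and the scanner is naive: a `"</script"` string literal inside a script would cut the body short. The `|22|` escapes in the rule stand for literal `"` characters, as in the test below.

```rust
/// Hypothetical helper (not a Suricata API): returns the bodies of inline
/// <script> elements; external scripts with an empty body are skipped.
fn extract_html_javascript(html: &str) -> Vec<&str> {
    // ASCII lowercasing preserves byte offsets, so `lower` indexes `html`.
    let lower = html.to_ascii_lowercase();
    let mut out = Vec::new();
    let mut from = 0;
    while let Some(rel) = lower[from..].find("<script") {
        let tag_start = from + rel;
        let Some(gt) = lower[tag_start..].find('>') else { break };
        let body_start = tag_start + gt + 1;
        let Some(len) = lower[body_start..].find("</script") else { break };
        let body = &html[body_start..body_start + len];
        if !body.trim().is_empty() {
            out.push(body);
        }
        from = body_start + len;
    }
    out
}

/// Removes whitespace as C isspace() does in the "C" locale:
/// space, \t, \n, \v, \f and \r.
fn strip_whitespace(buf: &str) -> String {
    buf.chars()
        .filter(|&c| !matches!(c, ' ' | '\t' | '\n' | '\x0b' | '\x0c' | '\r'))
        .collect()
}
```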

html_form

literal form:

<form id="signon" name="login" action="login.php" method="post" autocomplete="off" onsubmit="return IsEmpty();">
    <input type="text" id="userid" placeholder="Username" class="required" name="user" value="" autocomplete="off">
    <input type="password" placeholder="Password" class="required" id="passwd" name="pass" value="" autocomplete="off">
    <input type="submit" class="signin" value="Sign On" onclick="return IsEmpty();">
</form>

rule: html_form; content:".php"; content:"method=|22|post|22|"; nocase; content:"onsubmit=|22|return IsEmpty()|3b|"; nocase; content:"user"; nocase; content:"pass"; nocase; distance:0;
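For html_form, the buffer needs to include the `<form>` tag itself, because the rule matches attributes such as `method="post"`. The Rust sketch below extracts each complete `<form ...>...</form>` span. It then approximates the example rule's content chain. This is a rough stand-in, not Suricata's matching engine. `.php` is case-sensitive (it has no nocase), and `distance:0` is modelled as "pass must start at or after the end of the first user match". All names are hypothetical.

```rust
/// Hypothetical helper (not a Suricata API): returns every complete
/// <form ...>...</form> span, including the tags themselves.
fn extract_html_forms(html: &str) -> Vec<&str> {
    // ASCII lowercasing preserves byte offsets, so `lower` indexes `html`.
    let lower = html.to_ascii_lowercase();
    let mut out = Vec::new();
    let mut from = 0;
    while let Some(rel) = lower[from..].find("<form") {
        let start = from + rel;
        let Some(len) = lower[start..].find("</form>") else { break };
        let end = start + len + "</form>".len();
        out.push(&html[start..end]);
        from = end;
    }
    out
}

/// Case-insensitive search starting at `from`; returns the offset just past
/// the match, which is where a distance:0 follow-up match may begin.
fn find_ci(hay: &str, needle: &str, from: usize) -> Option<usize> {
    hay.get(from..)?
        .to_ascii_lowercase()
        .find(needle.to_ascii_lowercase().as_str())
        .map(|i| from + i + needle.len())
}

/// Approximation of the example html_form rule on one extracted form.
fn form_rule_matches(form: &str) -> bool {
    let absolute_ok = form.contains(".php")
        && find_ci(form, "method=\"post\"", 0).is_some()
        && find_ci(form, "onsubmit=\"return IsEmpty();\"", 0).is_some();
    // content:"user"; nocase; content:"pass"; nocase; distance:0;
    let relative_ok = find_ci(form, "user", 0)
        .and_then(|end| find_ci(form, "pass", end))
        .is_some();
    absolute_ok && relative_ok
}
```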

Or maybe as a transform?


file_data; extract_html_title; content:"Meerkat HQ";
file_data; extract_html_comment; content:"cloned by z001ie"; nocase;
file_data; extract_html_resources; content:"/z001ie"; nocase;
file_data; extract_html_javascript; strip_whitespace; content:"varx=document.forms[|22|login|22|][|22|user|22|]";
file_data; extract_html_form; content:".php"; content:"method=|22|post|22|"; nocase; content:"onsubmit=|22|return IsEmpty()|3b|"; nocase; content:"user"; nocase; content:"pass"; nocase; distance:0;
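The transform variant differs mainly in where the extraction happens. file_data stays the sticky buffer, and each transform rewrites the buffer before content matching, in the order the transforms appear in the rule. The Rust sketch below models that chaining. As an illustration, it pairs a hypothetical extract_html_comment transform with strip_whitespace. It also assumes that a transform which finds several comments joins them with newlines into a single buffer, since a transform produces one output buffer.

```rust
/// A transform maps one inspection buffer to a new one.
type Transform = fn(&str) -> String;

/// Hypothetical extract_html_comment transform: joins every comment body
/// with '\n' so the result is a single buffer (an assumed design choice).
fn extract_html_comment(buf: &str) -> String {
    let mut parts = Vec::new();
    let mut rest = buf;
    while let Some(start) = rest.find("<!--") {
        let body = &rest[start + 4..];
        // An unterminated comment is assumed to run to the end of the buffer.
        let end = body.find("-->").unwrap_or(body.len());
        parts.push(&body[..end]);
        rest = &body[(end + 3).min(body.len())..];
    }
    parts.join("\n")
}

/// Removes whitespace as C isspace() does in the "C" locale.
fn strip_whitespace(buf: &str) -> String {
    buf.chars()
        .filter(|&c| !matches!(c, ' ' | '\t' | '\n' | '\x0b' | '\x0c' | '\r'))
        .collect()
}

/// Applies transforms in rule order; each one sees the previous output.
fn apply_transforms(buf: &str, chain: &[Transform]) -> String {
    let mut current = buf.to_string();
    for transform in chain {
        current = transform(current.as_str());
    }
    current
}

/// Rough stand-in for `content:"..."; nocase;` on the final buffer.
fn content_nocase(buf: &str, needle: &str) -> bool {
    buf.to_ascii_lowercase().contains(needle.to_ascii_lowercase().as_str())
}
```

Because strip_whitespace runs last in this chain, a content match has to be written against the stripped text, for example "clonedbyz001ie".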


Related issues

Related to Task #4097: Suricon 2020 brainstorm (New, Victor Julien)
#1

Updated by Jason Ish over 3 years ago

  • Effort set to high
  • Difficulty set to high
#2

Updated by Victor Julien about 2 years ago

Maybe we can use a Rust HTML parsing crate.

#3

Updated by Victor Julien almost 2 years ago

Possible Rust HTML parser: https://github.com/servo/html5ever

#4

Updated by Jeff Lucovsky 11 months ago

  • Related to Task #4097: Suricon 2020 brainstorm added