Project

General

Profile

Actions

Support #6404

open

flow.reason value is misleading?

Added by Sascha Steinbiss 7 months ago. Updated 7 months ago.

Status:
New
Priority:
Normal
Assignee:
Affected Versions:
Label:

Description

Just noticed that we have flow EVE events with flow.reason being timeout for traffic that should not have timeouted (and according to the stats likely didn't), with flow.state@s ranging from @new over closed, established to bypassed. What values would normally be expected here? I can confirm that we can see traffic from both directions on the machines in question.

Maybe I am misunderstanding something. Let's reproduce it: Consider https://storage.googleapis.com/tenzir-datasets/M57/PCAP/data.pcap replayed via a dummy interface:

$ sudo ip link add dummy0 type dummy 

$ sudo tcpreplay -i dummy0 -M500 ~/Downloads/data.pcap
[...]
Actual: 6402440 packets (3816702597 bytes) sent in 61.47 seconds
Rated: 62087462.7 Bps, 496.69 Mbps, 104150.44 pps
Flows: 656212 flows, 10674.78 fps, 6340962 unique flow packets, 55491 unique non-flow packets
Statistics for network device: dummy0
    Successful packets:        6402440
    Failed packets:            4363
    Truncated packets:         0
    Retried packets (ENOBUFS): 0
    Retried packets (EAGAIN):  0

and with default settings (and waiting 600s after finishing the replay) we get (for 6.x branch):
$ rm eve.json; sudo ./src/.libs/suricata -c suricata.yaml -i dummy0 -l .
[...]

6.0.14:
$ cat eve.json | jq 'select(.event_type == "flow") .flow.reason' | sort | uniq -c 
   1134 "shutdown" 
 563737 "timeout" 

6.0.2:
 $ cat eve.json | jq 'select(.event_type == "flow") .flow.reason' | sort | uniq -c
   1290 "shutdown" 
  49865 "timeout" 
 513718 "unknown" 

7.0.1:
$ cat eve.json | jq 'select(.event_type == "flow") .flow.reason' | sort | uniq -c
 455350 "timeout" 

Looking at https://github.com/OISF/suricata/blob/master/src/output-json-flow.c#L260:

    const char *reason = NULL;
    if (f->flow_end_flags & FLOW_END_FLAG_FORCED)
        reason = "forced";
    else if (f->flow_end_flags & FLOW_END_FLAG_SHUTDOWN)
        reason = "shutdown";
    else if (f->flow_end_flags & FLOW_END_FLAG_TIMEOUT)
        reason = "timeout";
    else
        reason = "unknown";

    jb_set_string(jb, "reason", reason);

there are only these four, none of which sound like they describe a flow that ended up, say, in a closed state due to being correctly closed through TCP means. (For UDP, I could understand that such a "flow" would always be closed through timeout, since there is no connection state to consider).

Are reason and state always supposed to be read together, and the value of reason only makes sense for specific state values? Maybe someone could give some background.

Actions #1

Updated by Sascha Steinbiss 7 months ago

  • Subject changed from flow.reason is always "timeout" to flow.reason value is misleading?
Actions #2

Updated by Victor Julien 7 months ago

"timeout" is what is normal, it means the normal flow manager garbage collection determined a flow is timeout (timeout value associated with it's state is expired). "forced" would mean a flow has been forcibly cleared & reused during resource stress. "shutdown" just means the flow was evicted during engine shutdown instead of during normal runtime. Not sure why you're not seeing that, perhaps it is broken.

Actions #3

Updated by Sascha Steinbiss 7 months ago

Victor Julien wrote in #note-2:

"timeout" is what is normal, it means the normal flow manager garbage collection determined a flow is timeout (timeout value associated with it's state is expired). "forced" would mean a flow has been forcibly cleared & reused during resource stress. "shutdown" just means the flow was evicted during engine shutdown instead of during normal runtime. Not sure why you're not seeing that, perhaps it is broken.

So the "timeout" string value here does not refer to the flow "timing out" as in "no TCP termination has taken place and Suricata stops tracking the flow because no packets were seen for it in the last <flow-timeout> seconds" but rather to something internal related to resource use within the flow manager?

I was just worried that Suricata never registers a proper TCP session termination on these machines and we'd have to debug the traffic feed there. But if you confirm that we're seeing normal behaviour, I am relieved. Maybe it makes sense to document the meanings of these values somewhere just in case others also misunderstand that field?

Actions

Also available in: Atom PDF