Documentation #6404
openflow.reason value is misleading?
Description
Just noticed that we have flow EVE events with flow.reason
being timeout
for traffic that should not have timeouted (and according to the stats likely didn't), with flow.state@s ranging from @new
over closed
, established
to bypassed
. What values would normally be expected here? I can confirm that we can see traffic from both directions on the machines in question.
Maybe I am misunderstanding something. Let's reproduce it: Consider https://storage.googleapis.com/tenzir-datasets/M57/PCAP/data.pcap replayed via a dummy interface:
$ sudo ip link add dummy0 type dummy $ sudo tcpreplay -i dummy0 -M500 ~/Downloads/data.pcap [...] Actual: 6402440 packets (3816702597 bytes) sent in 61.47 seconds Rated: 62087462.7 Bps, 496.69 Mbps, 104150.44 pps Flows: 656212 flows, 10674.78 fps, 6340962 unique flow packets, 55491 unique non-flow packets Statistics for network device: dummy0 Successful packets: 6402440 Failed packets: 4363 Truncated packets: 0 Retried packets (ENOBUFS): 0 Retried packets (EAGAIN): 0
and with default settings (and waiting 600s after finishing the replay) we get (for 6.x branch):
$ rm eve.json; sudo ./src/.libs/suricata -c suricata.yaml -i dummy0 -l . [...] 6.0.14: $ cat eve.json | jq 'select(.event_type == "flow") .flow.reason' | sort | uniq -c 1134 "shutdown" 563737 "timeout" 6.0.2: $ cat eve.json | jq 'select(.event_type == "flow") .flow.reason' | sort | uniq -c 1290 "shutdown" 49865 "timeout" 513718 "unknown" 7.0.1: $ cat eve.json | jq 'select(.event_type == "flow") .flow.reason' | sort | uniq -c 455350 "timeout"
Looking at https://github.com/OISF/suricata/blob/master/src/output-json-flow.c#L260:
const char *reason = NULL;
if (f->flow_end_flags & FLOW_END_FLAG_FORCED)
reason = "forced";
else if (f->flow_end_flags & FLOW_END_FLAG_SHUTDOWN)
reason = "shutdown";
else if (f->flow_end_flags & FLOW_END_FLAG_TIMEOUT)
reason = "timeout";
else
reason = "unknown";
jb_set_string(jb, "reason", reason);
there are only these four, none of which sound like they describe a flow that ended up, say, in a
closed
state due to being correctly closed through TCP means. (For UDP, I could understand that such a "flow" would always be closed through timeout, since there is no connection state to consider).
Are reason
and state
always supposed to be read together, and the value of reason
only makes sense for specific state
values? Maybe someone could give some background.
Updated by Sascha Steinbiss about 1 year ago
- Subject changed from flow.reason is always "timeout" to flow.reason value is misleading?
Updated by Victor Julien about 1 year ago
"timeout" is what is normal, it means the normal flow manager garbage collection determined a flow is timeout (timeout value associated with it's state is expired). "forced" would mean a flow has been forcibly cleared & reused during resource stress. "shutdown" just means the flow was evicted during engine shutdown instead of during normal runtime. Not sure why you're not seeing that, perhaps it is broken.
Updated by Sascha Steinbiss about 1 year ago
Victor Julien wrote in #note-2:
"timeout" is what is normal, it means the normal flow manager garbage collection determined a flow is timeout (timeout value associated with it's state is expired). "forced" would mean a flow has been forcibly cleared & reused during resource stress. "shutdown" just means the flow was evicted during engine shutdown instead of during normal runtime. Not sure why you're not seeing that, perhaps it is broken.
So the "timeout" string value here does not refer to the flow "timing out" as in "no TCP termination has taken place and Suricata stops tracking the flow because no packets were seen for it in the last <flow-timeout> seconds" but rather to something internal related to resource use within the flow manager?
I was just worried that Suricata never registers a proper TCP session termination on these machines and we'd have to debug the traffic feed there. But if you confirm that we're seeing normal behaviour, I am relieved. Maybe it makes sense to document the meanings of these values somewhere just in case others also misunderstand that field?
Updated by Philippe Antoine 6 months ago
- Tracker changed from Support to Documentation
- Target version set to TBD
@Sascha Steinbiss this is now a documentation of the json field, right ?
Updated by Sascha Steinbiss 6 months ago
Philippe Antoine wrote in #note-4:
@Sascha Steinbiss this is now a documentation of the json field, right ?
Yes, that was just about understanding that "timeout" here does not mean that the TCP session timeouted, but only describes the internal flow tracking state. This should probably be mentioned in the documentation to avoid misunderstanding.