Bug #5320
closed
Key collisions in HTTP JSON eve-logs
Description
Hi,
During development of issue #2485 (commit bef190f767828f240f1ef9718e72b187faedc5af), the content_range JSON field was added as part of the data exported by JsonHttpLogJSONBasic https://github.com/OISF/suricata/blob/bef190f767828f240f1ef9718e72b187faedc5af/src/output-json-http.c#L204.
In commit 6ba93d905feb1905e38d13ab9335aa3a51f706a4, which converted JSON logging to the JSON builder, the name content_range stuck https://github.com/OISF/suricata/blob/6ba93d905feb1905e38d13ab9335aa3a51f706a4/src/output-json-http.c#L346.
Today, even though JsonHttpLogJSONBasic is no more, we still output the content-range header unconditionally under the name content_range. Unfortunately, this name collides with an entry of http_fields https://github.com/OISF/suricata/blob/master/src/output-json-http.c#L173, which is output by EveHttpLogJSONCustom https://github.com/OISF/suricata/blob/master/src/output-json-http.c#L268.
The end result is that the content_range field is output twice in JSON eve-logs of type http. The first content_range value is a dict, generated by EveHttpLogJSONBasic https://github.com/OISF/suricata/blob/master/src/output-json-http.c#L249, and the second is a string, generated by EveHttpLogJSONCustom https://github.com/OISF/suricata/blob/master/src/output-json-http.c#L268 if content_range is present in the configuration under outputs.eve-log.http.custom.
Here is an example of an eve-log containing the duplicated key.
{"timestamp":"2022-05-02T13:48:57.583006+0000","flow_id":1598560542515831,"in_iface":"mon0","event_type":"http","src_ip":"192.0.2.1","src_port":57118,"dest_ip":"192.0.2.2","dest_port":80,"proto":"TCP","tx_id":0,"ether":{"src_mac":"01:02:03:04:05:06","dest_mac":"FF:02:03:04:05:06"},"community_id":"1:soC8KFwwmPd5FiVB/IAQ4FjZIaI=","http":{"hostname":"someserver.gatewatcher.com","url":"/hello_world","http_user_agent":"curl/7.68.0","http_content_type":"application/octet-stream","content_range":{"raw":"bytes 0-128/14221642647","start":0,"end":128,"size":14221642647},"accept":"*/*","range":"bytes=0-128","connection":"keep-alive","content_length":"129","content_range":"bytes 0-128/14221642647","content_type":"application/octet-stream","date":"Mon, 02 May 2022 13:48:00 GMT","last_modified":"Mon, 02 May 2022 02:47:02 GMT","server":"nginx/9.99.9","http_method":"GET","protocol":"HTTP/1.1","status":206,"length":129},"host":"probe.gatewatcher.com"}
We believe that JSON parameter pollution (named after the equivalent issue, HTTP parameter pollution) may confuse eve-log consumers. Some consumers will consider the first occurrence while others may consider the last one. The data type also differs between the two occurrences, which may lead to deserialisation issues, depending on the order of evaluation.
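As an illustration of the consumer-side ambiguity, here is a minimal sketch using Python's standard json module, which silently keeps the last occurrence of a duplicated name. The record is a reduced version of the "http" object from the example above, and load_with_duplicates is a hypothetical helper written for this illustration; it is not part of Suricata.

```python
import json

# Reduced version of the "http" object from the eve-log example:
# the same key appears first as a dict, then as a string.
raw = (
    '{"content_range":{"raw":"bytes 0-128/14221642647",'
    '"start":0,"end":128,"size":14221642647},'
    '"content_length":"129",'
    '"content_range":"bytes 0-128/14221642647"}'
)


def load_with_duplicates(text):
    """Parse JSON and also return the keys that occur more than once
    within a single object (hypothetical helper, for illustration only)."""
    duplicates = []

    def hook(pairs):
        # Called once per JSON object with its (key, value) pairs in order.
        obj = {}
        for key, value in pairs:
            if key in obj:
                duplicates.append(key)
            obj[key] = value  # last occurrence wins, like the default parser
        return obj

    return json.loads(text, object_pairs_hook=hook), duplicates


# Default behaviour: the dict is dropped without any warning.
plain = json.loads(raw)
print(type(plain["content_range"]).__name__)

# With the hook, the collision becomes visible to the consumer.
parsed, duplicates = load_with_duplicates(raw)
print(duplicates)
```

A consumer written this way only ever sees the string form, so the parsed start, end and size values are lost; a parser that keeps the first occurrence would see the opposite.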
content_range is not the only field that is duplicated. For instance, the content-type is output twice: once under the name content_type as a custom http field https://github.com/OISF/suricata/blob/master/src/output-json-http.c#L174 and once under the name http_content_type via EveHttpLogJSONBasic https://github.com/OISF/suricata/blob/master/src/output-json-http.c#L247. The problem is less important for content-type though, since the names do not collide; we just have the same value output twice under two separate keys.
We believe a "coherent yet redundant" fix would be to have EveHttpLogJSONBasic output the content-range value under a key named http_content_range, as is done for the content-type. This would, at least, prevent the collision.
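To make the resulting layout concrete, the sketch below emulates this naming on the consumer side in Python: when a key is seen twice in one object, the earlier value (the dict, in the example above) is moved to http_<key>. This is only an illustration of the proposed key names; it is not Suricata code, and rename_earlier_duplicate is a hypothetical helper.

```python
import json

# Reduced version of the "http" object from the eve-log example.
raw = (
    '{"content_range":{"raw":"bytes 0-128/14221642647",'
    '"start":0,"end":128,"size":14221642647},'
    '"content_range":"bytes 0-128/14221642647"}'
)


def rename_earlier_duplicate(pairs):
    """object_pairs_hook that keeps both values of a duplicated key.

    The earlier value is stored under "http_<key>", the later one keeps
    the original key. Assumption: no "http_<key>" entry already exists
    in the same object; if one did, it would be overwritten.
    """
    obj = {}
    for key, value in pairs:
        if key in obj:
            obj["http_" + key] = obj[key]
        obj[key] = value
    return obj


record = json.loads(raw, object_pairs_hook=rename_earlier_duplicate)
print(sorted(record))
```

With this layout both values survive parsing, at the cost of carrying the same header twice.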
However, we believe a better fix would be to remove content_type, content_range and all other duplicated information from the custom header output, thus preventing both key collision and information duplication.
What is your preferred approach?
Thank you.
Florian Maury & Tommy Boiret
Gatewatcher Dev Team