Feature #602
Closed
availability for http.log output - identical to apache log format
Added by Peter Manev about 12 years ago. Updated over 11 years ago.
Description
It would be beneficial if http.log could be customized to produce output identical to the apache2 access.log format.
That way, http.log would be immediately parsable by all the log parser programs that already handle apache logs, and there are quite a few.
To quickly check whether the output format is the same, one could use "goaccess" (a top/htop-like tool for apache's access.log):
sudo apt-get install goaccess
Example of use:
goaccess -f /var/log/apache2/access.log
If goaccess can parse the output of http.log (when customized for the apache output format), then I think any apache log parser would be able to parse http.log.
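As a rough stand-in for the goaccess check, the sketch below tests whether a single log line has the shape of Apache's "combined" format. The regular expression and the name parse_combined are illustrative assumptions, not goaccess's actual parser, and the sketch does not handle escaped quotes inside fields.

```python
import re

# Assumed layout of Apache "combined":
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Running every line of http.log through parse_combined and counting the None results gives a quick compatibility estimate before trying a real log analyzer.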
Please find example output of a customized apache log and a customized http.log, both produced with the same config lines:
apache2.conf:
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "\"%{X-Forwarded-For}i\" %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined_xforward
The yaml equivalent:
customformat: "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
customformat: "%h %l %u %t \"%r\" %>s %O"
customformat: "\"%{X-Forwarded-For}i\" %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
and the results/differences attached.
Thank you
Files
apache_http_log.tar.gz (8.5 KB), Peter Manev, 10/15/2012 05:00 AM
Updated by Ignacio Sanchez about 12 years ago
- Assignee set to Ignacio Sanchez
- Target version set to TBD
- Start date changed from 10/15/2012 to 10/29/2012
OK. It will involve changing the meaning of some of the current format strings such as %u (which in suricata means URL including query string, but in apache mod_log it means remote user), and adding the missing ones such as %C.
Suricata: https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Custom_http_logging
Apache mod_log_config module: http://httpd.apache.org/docs/2.2/mod/mod_log_config.html
I will take the implementation of this feature request.
Peter: Could you please attach a pcap file together with the several apache mod_log_config expected outputs, so that I can use it for my tests?
Updated by Peter Manev about 12 years ago
Hi,
First of all - thank you for taking the initiative.
The previously attached tar contains such an output. Is that what you were asking for, Ignacio?
To test/create this I just visited the default http://127.0.0.1 after an Apache install.
thank you
Updated by Ignacio Sanchez about 12 years ago
Hi,
Yes, but there is no pcap file.
With a pcap file the testing process is easier for me. I run suricata against the pcap, and then I diff the http.log output with the one you have provided. The pcap file could be generated by running "tcpdump -s0 -w tests.pcap -i lo -n"
Each test would be made of (1) pcap file (2) customformat (3) expected apache log output, where the customformat is the actual apache log format string which generated the expected log output.
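A test case of this shape could be modelled as below. This is only a sketch: the LogFormatTest class and the diff_logs helper are hypothetical names, and running Suricata against the pcap to obtain the actual http.log text is left out.

```python
import difflib
from dataclasses import dataclass


@dataclass
class LogFormatTest:
    pcap_path: str      # (1) capture to replay through Suricata
    customformat: str   # (2) Apache format string under test
    expected_log: str   # (3) what Apache logged for the same traffic


def diff_logs(expected, actual):
    """Return unified-diff lines; an empty list means the two logs match."""
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="apache access.log", tofile="suricata http.log",
        lineterm=""))
```

A test passes when diff_logs(test.expected_log, produced_http_log) returns an empty list; otherwise the returned lines show exactly which fields differ.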
Updated by Erik C about 12 years ago
We could try to pull this, but I am not sure how we could clean the data up so that it didn't include SBU.... Give me a few, I will get you the config output we would use and what it would look like.
Updated by Charles Smutz about 12 years ago
I'm putting this here because this is the most recent thread on custom HTTP logging. These are also relatively minor things.
The log customization capability is awesome.
The original implementation didn't support cookie parsing. I'd love to see cookie parsing, ex "%{Foobar}C" work.
I'd also like to propose an extension that would allow a maximum length to be specified for a given custom format string. For example, to limit the Referer header to the first 100 characters you would write "%[100]{Referer}i", which behaves like "%{Referer}i" except that the value is truncated to 100 characters if the Referer is longer than that.
This would be helpful for people who want to log data that is usually relatively small but can be very large in some cases (Referer is a good example). While disk size/speed might be an issue for some users, this is more likely to be used to remove clutter from logs, making them easier for humans to read, or to cope with log size limits imposed by machine consumers, such as environments that use syslog.
Note that this is an extension that is not currently in the apache custom log syntax but also doesn't make the format string incompatible with apache style format specifications. If this additional syntax is not used, there is no adverse impact.
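To make the proposed semantics concrete, here is a minimal sketch that expands only "%{Name}i" and "%[N]{Name}i" directives from a dictionary of request headers. The function name render_headers and the choice to render a missing header as an empty string are assumptions of this sketch, not a description of Suricata's code.

```python
import re

# %[N]{Name}i : optional [N] length limit, a header name, then the "i" type letter.
DIRECTIVE_RE = re.compile(r'%(?:\[(?P<limit>\d+)\])?\{(?P<name>[^}]+)\}i')


def render_headers(fmt, headers):
    """Expand %{Name}i and %[N]{Name}i using a dict of request headers."""
    lowered = {key.lower(): value for key, value in headers.items()}

    def expand(match):
        # Header names are matched case-insensitively; missing headers give "".
        value = lowered.get(match.group("name").lower(), "")
        limit = match.group("limit")
        return value[:int(limit)] if limit else value

    return DIRECTIVE_RE.sub(expand, fmt)
```

A format string that never uses the "[N]" part goes through the same code path with no limit applied, which is the backwards-compatibility property described above.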
Both of these seem like relatively small things, but could be very useful.
Updated by Victor Julien about 12 years ago
Charles Smutz wrote:
The original implementation didn't support cookie parsing. I'd love to see cookie parsing, ex "%{Foobar}C" work.
Not sure I get this notation. What would this do?
Updated by Charles Smutz about 12 years ago
Victor Julien wrote:
Charles Smutz wrote:
The original implementation didn't support cookie parsing. I'd love to see cookie parsing, ex "%{Foobar}C" work.
Not sure I get this notation. What would this do?
Given you have a request header as follows (from http://en.wikipedia.org/wiki/HTTP_cookie):
GET /spec.html HTTP/1.1
Host: www.example.org
Cookie: name=value; name2=value2
Accept: */*
%{Cookie}i should print the whole request cookie value:
name=value; name2=value2
But %{Foobar}C should print the value of the individual cookie "Foobar".
Ex.
%{name}C should print the value of the "name" cookie:
value
%{name2}C should print the value of the "name2" cookie:
value2
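The lookup described here can be sketched as follows. The helper name cookie_value is hypothetical, and the sketch ignores quoting and other edge cases of real Cookie headers.

```python
def cookie_value(cookie_header, name):
    """Return the value of cookie `name` from a Cookie request header, or None."""
    for pair in cookie_header.split(";"):
        # Each pair looks like "name=value"; partition keeps any "=" in the value.
        key, sep, value = pair.strip().partition("=")
        if sep and key == name:
            return value
    return None
```

With the header above, cookie_value("name=value; name2=value2", "name2") picks out only the second cookie, while "%{Cookie}i" would still log the whole header.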
Updated by Charles Smutz about 12 years ago
All, please correct me if this is not the right place to post these.
I've got two minor nits I'd like to point out.
I'm not sure if this is the same as or related to Eric's issue (#600), but I don't think we should be doing URI filtering on the literal values in the custom log format (possibly also %t/LOG_HTTP_CF_TIMESTAMP/strftime).
I agree with filtering all special characters in data coming from the HTTP traffic, but there should be no reason to do this filtering on IDS admin controlled/predictable data.
For example, if I put a "\t" in the custom format string, I want the log to be printed with a tab character, not the escaped literal "\x09". Again, I think this escaping should be removed only for admin controlled values; the fact that this escaping occurs on raw HTTP data is very useful. Lack of this sort of escaping would make the text logs unreliable and susceptible to all sorts of issues.
I don't see a huge difference between escaping white space the way apache does (ex. "\t") and the way Suricata currently does (ex. "\x09").
To be clear, if the tab character is in the sysadmin defined literals, it should be printed as a tab character in the log. If it is in data taken from the network, such as urls, header values, etc, it should be escaped as "\t" or "\x09", either one being fine for me.
I mention tab here because some may want to make tab delimited logs, which due to the escaping of tabs in network data, would be a reliable delimiter.
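The distinction can be illustrated with a small sketch in which literals from the format string are written as-is and only data taken from the wire is hex-escaped. The function names are hypothetical, and escaping the backslash itself is a choice of this sketch so that the output stays unambiguous; it is not a claim about what Suricata does.

```python
def escape_network_data(data: bytes) -> str:
    """Hex-escape non-printable bytes (and backslash) in data taken from the wire."""
    out = []
    for byte in data:
        if 0x20 <= byte < 0x7F and byte != 0x5C:
            out.append(chr(byte))
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


def render_line(parts):
    """parts: list of ("literal", str) or ("data", bytes) tuples."""
    return "".join(
        value if kind == "literal" else escape_network_data(value)
        for kind, value in parts
    )
```

Because a tab inside a URL or header comes out as the four characters \x09, a real tab in the line can only have come from the admin's format string, which is what makes it a reliable delimiter.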
This is the change I made to disable the URI escaping:
 switch (httplog_ctx->cf_nodes[i]->type){
     case LOG_HTTP_CF_LITERAL:
     /* LITERAL */
-        PrintRawUriBuf((char *)aft->buffer->buffer, &aft->buffer->offset,
-                    aft->buffer->size, (uint8_t *)httplog_ctx->cf_nodes[i]->data,
-                    strlen(httplog_ctx->cf_nodes[i]->data));
+        MemBufferWriteString(aft->buffer, httplog_ctx->cf_nodes[i]->data);
         break;
I think it would be reasonable to make a similar change to %t/LOG_HTTP_CF_TIMESTAMP/strftime but can't see it being an issue in practice.
My second nit is with the status code (%s/LOG_HTTP_CF_RESPONSE_STATUS) when the response is an HTTP redirect a la HTTP 301/302. The current implementation, which puts the Location header in the response code, is likely to break a lot of things, including any post processing of the logs. This functionality is already available through the "%{Location}o" format string for those who want it, so there is no need for it to be built into the status code. I recommend this additional functionality be removed. It is redundant at best and makes logs unparseable for many uses.
Ex. Remove the whole block of code that begins as follows:
/* Redirect? */
if (tx->response_headers != NULL && tx->response_status_number > 300 && tx->response_status_number < 303) {
If you'd like me to provide patches for these changes, I'd be happy to.
Again, I consider these minor nits.
The custom log functionality is most useful. I'm not bent on this functionality needing to mirror apache's syntax exactly (the differences in format strings, lack of modifiers, etc are fine by me) but users will benefit from being able to define any log format of their choice and should be able to replicate web server formats to the degree possible given innate differences between the two. I also don't think Suricata should limit the custom log functionality to only what is found in apache. I've already proposed a (backwards compatible) extension for limiting length of data. I could see other data available in the future that would be useful to put in a custom log format including alerts, payload hashes, etc that would certainly be a superset of apache's custom log syntax.
Updated by Ignacio Sanchez almost 12 years ago
Ok. I have sent a pull request with the following changes:
Added support for %{cookiename}C
Added support for the definition of a maximum length, i.e.: %[50]{user-agent}i
Some small bugfixes
Added the modifications suggested by Charles Smutz
https://github.com/inliniac/suricata/pull/282
Any feedback will be welcomed.
Updated by Vincent Fang almost 12 years ago
I was wondering if the %{}C feature was ever added or if this feature request is on hold?
Also I'm not sure if I should make a new feature request or add on to this one, but a truncation option such as [40] would be nice too, to specify the max number of characters http.log should display for a field before truncating, in case there's too much info and the admin thinks it's OK to cut some of it out. For example, for the http URL
[40]%u
would only show the first 40 characters of the URL.
Updated by Ignacio Sanchez almost 12 years ago
Yes, this is precisely what has been added in the above-mentioned PR (in addition to some bug fixes and Charles' modifications).
You should now be able to use [40]%u and the %{}C feature. Please let me know your results if you test it.
Vincent Fang wrote:
I was wondering if the %{}C feature was ever added or if this feature request is on hold?
Also I'm not sure if I should make a new feature request or add on to this one, but a truncation option such as [40] would be nice too, to specify the max number of characters http.log should display for a field before truncating, in case there's too much info and the admin thinks it's OK to cut some of it out. For example, for the http URL
[40]%u
would only show the first 40 characters of the URL.
Updated by Vincent Fang almost 12 years ago
I must be doing something wrong. Ignacio, can you tell me which repository I should be cloning from:
https://github.com/inliniac/suricata.git
or
https://github.com/owlsec/suricata.git
I cloned from inliniac, and neither the cookie nor the [] truncation works, or I must be getting the syntax wrong. I tried %[10]u and [10]%u and neither worked.
Updated by Vincent Fang almost 12 years ago
Ok I went to Owlsec's github suricata repository
git clone https://github.com/owlsec .
and I made sure to change branches to
git checkout customhttplog
and with the new
customformat: "%a:%p -> %A:%P %[10]u %{bdfpc}C"
All URIs are truncated to 10 characters, and the values of cookies named bdfpc show up, so it looks good so far.
I'm also wondering if it's possible for you to add a small additional change: \t for tabs in the http.log. If I put \t or an actual tab in the customformat, the result is that I only get one whitespace character in its place.
Updated by Vincent Fang almost 12 years ago
I apologize for the spam of updates, but I tested the customformat as follows
customformat: "%a:%p\t->\t%A:%P\t%[10]u\t%{bdfpc}C"
and the \t creates tabs in the http.log, so far everything works as expected.
Updated by Peter Manev almost 12 years ago
yes !
works - now I can get it to be parsed by "goaccess" and other apache log tools - pretty cool ....
however ....
1.
the problem is that if a value does not exist, ex:
\"%r\"
then in the apache access.log, apache substitutes whatever does not exist with "-" (dash, no quotes), whereas in http.log the printing function substitutes it with nothing.
So in other words:
if a value such as \"%r\" is missing from the http request/log line, apache2 prints the following
127.0.0.1 - - [19/Feb/2013:10:42:23 +0100] "GET / HTTP/1.1" 304 210 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0"
for
"%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
while in Suricata the following gets printed
127.0.0.1 [19/Feb/2013:10:42:23 +0100] "GET / HTTP/1.1" 304 210 "" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0"
for
"%h %l %u [%t] \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
in yaml
i.e. we skip the field (print nothing) if it does not exist.
2.
When we print %t - time format
apache prints it like :
[19/Feb/2013:10:42:23 +0100] by default with the []
to mimic that behavior in yaml I use [%t] .... maybe not such a big deal ...
3. The %>s - is not working properly (if we are to make use of apache style log format)
I think in order to be made "fully" apache compatible (if custom logging is used that is) - we should follow those.
Just because there are a number of apache log parser tools freely available already... my suggestion.
Maybe we could have :
custom: yes/no/apache
in yaml in order to make an option available for an apache log "compatibility" ?
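The Apache behaviour asked for in point 1 can be sketched as below: any field without a value is written as a bare "-", and the timestamp is wrapped in brackets as in point 2. The helper names and the dictionary keys are assumptions made for this illustration only.

```python
FIELDS = ("host", "ident", "user", "time", "request",
          "status", "size", "referer", "agent")


def field_or_dash(value):
    """Apache convention: a missing or empty field is logged as '-'."""
    return value if value else "-"


def combined_line(fields):
    """Build an Apache-combined style line from a dict of optional fields."""
    filled = {name: field_or_dash(fields.get(name)) for name in FIELDS}
    return ('{host} {ident} {user} [{time}] "{request}" '
            '{status} {size} "{referer}" "{agent}"').format(**filled)
```

With ident, user and referer left out of the dictionary, the result has the same "- -" and "-" placeholders as the apache2 line shown in point 1, so field positions stay stable for downstream parsers.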
Updated by Ignacio Sanchez over 11 years ago
I have submitted a new pull request with the following changes:
Cookie is parsed now using uint8_t pointers (following Victor Julien PR comments) …
Changed buffer size to a power of 2 (8192) and cookie value extraction function to static (following Victor Julien PR comments)
Added %b for request size (Vincent Fang patch)
Writing "-" if an unknown % directive is used (Vincent Fang patch)
Fixed bug in cookie parser
Fixed format string issue logging literal values
https://github.com/inliniac/suricata/pull/360
Peter: once the PR is accepted I can start looking into your 2nd and 3rd points (the 1st one should be ok now).
Any feedback is welcomed.
Updated by Ignacio Sanchez over 11 years ago
- % Done changed from 0 to 100
New pull request adding syntax error handling.
Updated by Victor Julien over 11 years ago
- Status changed from New to Closed
- Target version changed from TBD to 2.0beta2
Merged into master, thanks all!