Bug #2751

Engine unable to disable detect thread, Killing engine. (in libpcap mode)

Added by jingyu YANG 6 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
Affected Versions:
Effort:
low
Difficulty:
low
Label:

Description

When I terminate Suricata (libpcap mode, --pcap=iface) while the packet capture thread is busy (100% CPU), the main thread has to kill the capture thread after a timeout.

It would be better to implement a PktAcqBreakLoop handler that calls pcap_breakloop internally. A pull request will be proposed later.

The error message is attached.

[23694] 20/12/2018 -- 10:41:58 - (runmode-pcap.c:257) <Info> (RunModeIdsPcapSingle) -- RunModeIdsPcapSingle initialised
[23694] 20/12/2018 -- 10:41:58 - (tm-threads.c:2172) <Notice> (TmThreadWaitOnThreadInit) -- all 1 packet processing threads, 4 management threads initialized, engine started.
^C23694 20/12/2018 -- 10:42:13 - (suricata.c:2847) <Notice> (SuricataMainLoop) -- Signal Received. Stopping engine.
[a long time to wait, 1 minute]
[23694] 20/12/2018 -- 10:43:14 - (tm-threads.c:1578) <Error> (TmThreadDisableReceiveThreads) -- [ERRCODE: SC_ERR_FATAL(171)] - Engine unable to disable detect thread - "W#01-*****". Killing engine

The main reason is that the PktAcqBreakLoop handler is set to NULL in src/source-pcap.c:117.

I will submit a pull request to fix it.
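
The idea behind a break-loop handler can be sketched with a toy model. This is not libpcap or Suricata code: ToyCapture, ToyDispatch, ToyBreakLoop, StopAfter and TOY_ERROR_BREAK are hypothetical names invented for illustration, and the return convention is a simplification. It only shows the pattern: the shutdown path sets a flag, and the dispatch loop checks that flag before each packet, so it can return before its packet budget is used up.

```c
#include <signal.h>

/* Toy stand-in for a capture handle. This is NOT libpcap's pcap_t. */
typedef struct {
    volatile sig_atomic_t break_loop; /* set by the shutdown path */
    int pending;                      /* simulated packets still queued */
} ToyCapture;

typedef void (*ToyPacketCb)(ToyCapture *cap, void *user);

/* Hypothetical return value for "interrupted before any packet". */
#define TOY_ERROR_BREAK (-2)

/* Plays the role a break-loop handler would play: it only sets a flag. */
static void ToyBreakLoop(ToyCapture *cap)
{
    cap->break_loop = 1;
}

/*
 * Process up to max_cnt queued packets. The flag is checked before each
 * packet, so a pending break request ends the call early. Returns the
 * number of packets processed, or TOY_ERROR_BREAK if the request arrived
 * before any packet was processed.
 */
static int ToyDispatch(ToyCapture *cap, int max_cnt, ToyPacketCb cb, void *user)
{
    int n = 0;
    while (n < max_cnt && cap->pending > 0) {
        if (cap->break_loop) {
            cap->break_loop = 0; /* consume the request */
            return n > 0 ? n : TOY_ERROR_BREAK;
        }
        cap->pending--;
        cb(cap, user);
        n++;
    }
    return n;
}

/* Example callback: request a stop once *user packets have been seen. */
static void StopAfter(ToyCapture *cap, void *user)
{
    int *remaining = (int *)user;
    if (--(*remaining) == 0)
        ToyBreakLoop(cap);
}
```

With 100 queued packets, a budget of 64 and StopAfter configured for 3 packets, ToyDispatch returns after the third packet instead of running through the whole budget. Without the flag check, the loop would only end once the budget or the queue was exhausted.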

History

#1

Updated by jingyu YANG 6 months ago

The pull request has been submitted.
https://github.com/OISF/suricata/pull/3592

Any feedback is highly welcome.
Thanks.

#2

Updated by Victor Julien 6 months ago

  • Affected Versions deleted (TBD)

The pcap dispatch function should not block for long because we set a timeout using pcap_set_timeout. The 100% CPU also suggests it's not libpcap blocking, but something else. It would be interesting to check what this thread is doing during shut down that is taking so long. Perhaps attaching to it with perf or gdb could give some more insight.

#3

Updated by jingyu YANG 6 months ago

Thank you for your reply.

I agree with you that pcap_set_timeout() can avoid a hang during quit, and it is OK to reject this pull request.
But I would prefer to have PktAcqBreakLoop call pcap_breakloop so the thread quits explicitly instead of waiting for a timeout, because in my situation libpcap is not running in its default mode but in DPDK mode.

More background information follows.

1. 100% CPU is not the problem this time. The main issue is that the main thread has to wait a long time (1 minute) before it kills the capture thread.

2. The background is that I would like to enable DPDK (https://www.dpdk.org/) for Suricata. Instead of implementing src/source-dpdk.c directly in Suricata, I prefer to implement DPDK support in libpcap first, then use libpcap mode (--pcap=dpdk:0) in Suricata. This is the pull request: https://github.com/the-tcpdump-group/libpcap/pull/790

3. In this case (--pcap=dpdk:0), DPDK binds one CPU lcore to the capture thread and keeps that core at 100%. This is normal for a DPDK use case, and if the main thread wants to quit, pcap_breakloop() needs to be called explicitly.

4. Currently, in the DPDK mode of libpcap, pcap_dispatch() returns only once max_cnt is reached.

5. Regarding pcap_set_timeout() that you mentioned: according to the libpcap documentation (https://linux.die.net/man/3/pcap_set_timeout), the parameter only affects the read timeout. In Suricata's libpcap mode, pcap_dispatch() returns if no more packets are received during LIBPCAP_COPYWAIT (500 ms). As max_cnt is 64 when Suricata calls pcap_dispatch(), we may have to wait up to 500 ms * 64 = 32 s if packets arrive one by one, each just under 500 ms apart. I think 32 s is also too long to wait.
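
The estimate in point 5 can be written out as a small calculation. This is only a sketch of the arithmetic used in the comment, not a measurement of libpcap behavior: COPYWAIT_MS and MAX_CNT simply restate the 500 ms and 64 quoted above, and WorstCaseDispatchMs is a hypothetical helper name.

```c
#include <stdint.h>

/* Values quoted in the comment above; assumptions about that Suricata version. */
#define COPYWAIT_MS 500u /* read timeout, LIBPCAP_COPYWAIT */
#define MAX_CNT     64u  /* packet budget passed to pcap_dispatch() */

/*
 * Upper bound on how long one dispatch call may take to return, under the
 * assumption that every packet arrives just before the read timeout
 * expires, so each of the max_cnt packets costs almost one full timeout.
 */
static uint64_t WorstCaseDispatchMs(uint32_t read_timeout_ms, uint32_t max_cnt)
{
    return (uint64_t)read_timeout_ms * max_cnt;
}
```

WorstCaseDispatchMs(COPYWAIT_MS, MAX_CNT) gives 32000 ms, matching the 32 s figure. An explicit break request removes the dependence on both factors, which is the argument for the handler.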

Thank you for your review.
And any feedback is welcome.

#4

Updated by Victor Julien 6 months ago

Ok that is clear. I'm ok with the solution. I don't think it has drawbacks.

#5

Updated by jingyu YANG 6 months ago

Thank you. Cheers.
I will read the guidelines and resubmit later.

#6

Updated by jingyu YANG 6 months ago

A new PR has been created.
https://github.com/OISF/suricata/pull/3599

Thanks.

#7

Updated by Victor Julien 3 months ago

  • Status changed from New to Closed
  • Assignee set to jingyu YANG
  • Target version changed from TBD to 5.0beta1
