Project

General

Profile

Actions

Support #2715

closed

deadlock between all detect thread and flow manager thread

Added by denny luo over 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Affected Versions:
Label:

Description

Hi:
I found that all detect thread in __lll_lock_wait() to get the lock from a FlowBucket which locked in flow manager thread(thread 28),
and the flow manager thread is __lll_lock_wait to get the lock from a flow,the problem is the flow is not right flow.the address of flow looks like wrong

33   Thread 0x7f2263fff700 (LWP 3974) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  32   Thread 0x7f221cff9700 (LWP 3994) 0x00007f2a816e67a3 in select () from /lib64/libc.so.6
  31   Thread 0x7f221d7fa700 (LWP 3993) 0x00007f2a839fccf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  30   Thread 0x7f221dffb700 (LWP 3992) 0x00007f2a839fccf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  29   Thread 0x7f221e7fc700 (LWP 3991) 0x00007f2a839fccf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 28   Thread 0x7f221effd700 (LWP 3990) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  27   Thread 0x7f221ffff700 (LWP 3988) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  26   Thread 0x7f22711fd700 (LWP 3972) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  25   Thread 0x7f223cff9700 (LWP 3987) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  24   Thread 0x7f223dffb700 (LWP 3985) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  23   Thread 0x7f223e7fc700 (LWP 3984) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  22   Thread 0x7f22627fc700 (LWP 3977) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  21   Thread 0x7f22617fa700 (LWP 3979) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  20   Thread 0x7f223effd700 (LWP 3983) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  19   Thread 0x7f2260ff9700 (LWP 3980) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  18   Thread 0x7f2261ffb700 (LWP 3978) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  17   Thread 0x7f223d7fa700 (LWP 3986) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  16   Thread 0x7f223f7fe700 (LWP 3982) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  15   Thread 0x7f223ffff700 (LWP 3981) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  14   Thread 0x7f22721ff700 (LWP 3970) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  13   Thread 0x7f22719fe700 (LWP 3971) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  12   Thread 0x7f22637fe700 (LWP 3975) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  11   Thread 0x7f221f7fe700 (LWP 3989) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  10   Thread 0x7f22709fc700 (LWP 3973) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  9    Thread 0x7f2262ffd700 (LWP 3976) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  8    Thread 0x7f2a7e1b3700 (LWP 3968) 0x00007f2a816b61ad in nanosleep () from /lib64/libc.so.6
  7    Thread 0x7f2a809b8700 (LWP 3963) 0x00007f2a816ef923 in epoll_wait () from /lib64/libc.so.6
  6    Thread 0x7f2a7d9b2700 (LWP 3969) 0x000000000056d998 in rte_rdtsc () at /usr/local/include/dpdk/rte_cycles.h:103
  5    Thread 0x7f2a7e9b4700 (LWP 3967) em_main_loop (dummy=<optimized out>) at /root/DPI/suricata_4.0/l3fwd_em.c:220
  4    Thread 0x7f2a7f1b5700 (LWP 3966) 0x00000000004548f9 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f2a7f1b48a0, queue_id=<optimized out>, port_id=0 '\000')
    at /usr/local/include/dpdk/rte_ethdev.h:2667
  3    Thread 0x7f2a7f9b6700 (LWP 3965) 0x00000000005fca7f in ixgbe_recv_pkts_vec ()
  2    Thread 0x7f2a801b7700 (LWP 3964) em_parse_ptype (m=<optimized out>) at /root/DPI/suricata_4.0/l3fwd_em.c:141
  1    Thread 0x7f2a84ef7b40 (LWP 3962) 0x00007f2a816b61ad in nanosleep () from /lib64/libc.so.6

you can see thread 22 and 27 both be lock_wait for fb whose mutex lock __owner's LWP is 3990(thread 28)

(gdb) thread 27
[Switching to thread 27 (Thread 0x7f221ffff700 (LWP 3988))]
#0  0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3
#3  0x0000000000516507 in FlowGetFlowFromHash (tv=tv@entry=0x13d999c0, dtv=0x7f221829e250, p=p@entry=0x7f2218267f20, dest=dest@entry=0x7f2218267f58)
    at /root/DPI/suricata_4.0/flow-hash.c:479
479         FBLOCK_LOCK(fb);
(gdb) p *fb
$1 = {head = 0x7f22149481e0, tail = 0x13076d60, m = {__data = {__lock = 2, __count = 0, __owner = 3990, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\226\017\000\000\001", '\000' <repeats 26 times>, __align = 2}, next_ts_sc_atomic__ = 0}

(gdb) thread 22
[Switching to thread 22 (Thread 0x7f22627fc700 (LWP 3977))]
#0  0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3      
#3  0x0000000000516507 in FlowGetFlowFromHash (tv=tv@entry=0x14175e30, dtv=0x7f224829e250, p=p@entry=0x7f2248267f20, dest=dest@entry=0x7f2248267f58)
    at /root/DPI/suricata_4.0/flow-hash.c:479
479         FBLOCK_LOCK(fb);
(gdb) p *fb    
$2 = {head = 0x7f22149481e0, tail = 0x13076d60, m = {__data = {__lock = 2, __count = 0, __owner = 3990, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\226\017\000\000\001", '\000' <repeats 26 times>, __align = 2}, next_ts_sc_atomic__ = 0}

and see this you will find that the f which in flow_bucket .tail looks like not a right flow compare the other flow

(gdb) thread 28
[Switching to thread 28 (Thread 0x7f221effd700 (LWP 3990))]
#0  0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3
#3  0x000000000049c432 in FlowManagerHashRowTimeout (next_ts=<optimized out>, counters=<optimized out>, emergency=<optimized out>, ts=<optimized out>, f=0x13076d60)
    at /root/DPI/suricata_4.0/flow-manager.c:335
335             FLOWLOCK_WRLOCK(f);
(gdb) p *f
$3 = {eth_dst = "\000\000\000\000\000", eth_src = "\000\000\061\000\000", src = {address = {address_un_data32 = {0, 6146, 4930427, 0}, address_un_data16 = {0, 0, 6146, 0, 15227, 75, 0, 
        0}, address_un_data8 = "\000\000\000\000\002\030\000\000{;K\000\000\000\000"}}, dst = {address = {address_un_data32 = {0, 0, 0, 319253824}, address_un_data16 = {0, 0, 0, 0, 0, 0, 
        27968, 4871}, address_un_data8 = '\000' <repeats 12 times>, "@m\a\023"}}, {sp = 0, type = 0 '\000'}, {dp = 0, code = 0 '\000'}, proto = 0 '\000', recursion_level = 0 '\000', 
  vlan_id = {0, 0}, flow_hash = 49, lastts = {tv_sec = 21457497697032194, tv_usec = 0}, flow_state_sc_atomic__ = 0, use_cnt_sc_atomic__ = 0, tenant_id = 0, 
  probing_parser_toserver_alproto_masks = 319253872, probing_parser_toclient_alproto_masks = 0, flags = 0, file_flags = 0, protodetect_dp = 0, m = {__data = {__lock = 2, __count = 0, 
      __owner = 6146, __nusers = 5061499, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x13076da0}}, 
    __size = "\002\000\000\000\000\000\000\000\002\030\000\000{;M", '\000' <repeats 17 times>, "\240m\a\023\000\000\000", __align = 2}, protoctx = 0x0, protomap = 49 '1', 
  flow_end_flags = 0 '\000', alproto = 0, alproto_ts = 0, alproto_tc = 0, alproto_orig = 6146, alproto_expect = 0, de_ctx_version = 5127035, thread_id = 0, alparser = 0x0, 
  alstate = 0x13076dd0, sgh_toclient = 0x0, sgh_toserver = 0x31, flowvar = 0x4f3b7b00001802, hnext = 0x0, hprev = 0x0, fb = 0x13076e00, lnext = 0x0, lprev = 0x31, startts = {
    tv_sec = 22583397603874818, tv_usec = 0}, todstpktcnt = 0, tosrcpktcnt = 0, todstbytecnt = 319254064, tosrcbytecnt = 0}

(gdb) p f
$4 = (Flow *) 0x13076d60
(gdb) p flow_hash[33115] 
$5 = {head = 0x7f22149481e0, tail = 0x13076d60, m = {__data = {__lock = 2, __count = 0, __owner = 3990, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\226\017\000\000\001", '\000' <repeats 26 times>, __align = 2}, next_ts_sc_atomic__ = 0}

(gdb) p *flow_hash[33115].head
$6 = {eth_dst = "\314\032\372\347", <incomplete sequence \340>, eth_src = "\274?\217\206\265\222", src = {address = {address_un_data32 = {2381335402, 0, 0, 0}, address_un_data16 = {
        19306, 36336, 0, 0, 0, 0, 0, 0}, address_un_data8 = "jK\360\215", '\000' <repeats 11 times>}}, dst = {address = {address_un_data32 = {3363685991, 0, 0, 0}, address_un_data16 = {
        50791, 51325, 0, 0, 0, 0, 0, 0}, address_un_data8 = "g\306}\310", '\000' <repeats 11 times>}}, {sp = 22195, type = 179 '\263'}, {dp = 80, code = 80 'P'}, proto = 6 '\006', 
  recursion_level = 0 '\000', vlan_id = {0, 0}, flow_hash = 3460006235, lastts = {tv_sec = 1542018506, tv_usec = 149171}, flow_state_sc_atomic__ = 0, use_cnt_sc_atomic__ = 0, 
  tenant_id = 0, probing_parser_toserver_alproto_masks = 0, probing_parser_toclient_alproto_masks = 0, flags = 262939, file_flags = 4095, protodetect_dp = 0, m = {__data = {__lock = 0, 
      __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
  protoctx = 0x0, protomap = 0 '\000', flow_end_flags = 0 '\000', alproto = 0, alproto_ts = 0, alproto_tc = 0, alproto_orig = 0, alproto_expect = 0, de_ctx_version = 99, thread_id = 14, 
  alparser = 0x0, alstate = 0x0, sgh_toclient = 0x19ef6110, sgh_toserver = 0x19e11bc0, flowvar = 0x0, hnext = 0x7f221a338230, hprev = 0x0, fb = 0x7f2a7becb700, lnext = 0x0, lprev = 0x0, 
  startts = {tv_sec = 1542018506, tv_usec = 138639}, todstpktcnt = 9, tosrcpktcnt = 8, todstbytecnt = 1027, tosrcbytecnt = 7386}

I don't know how to fix the problem

Actions #1

Updated by Victor Julien over 5 years ago

  • Assignee deleted (Victor Julien)
  • Target version deleted (4.0.6)

What version of Suricata is this? Also, it looks like it is modified? I don't recognize rte_eth_rx_burst

Actions #2

Updated by denny luo over 5 years ago

Just add dpdk to receive the packet from nic instead of pcap or af-packet, the version of suricata is 4.0.4

Actions #3

Updated by Victor Julien over 5 years ago

  • Tracker changed from Bug to Support

It's hard to say what is happening w/o being able to see how your DPDK code interacts with Suricata. Here is some kind of clue though, this flow dump looks corrupted. The values don't make sense and the mutex gives a owner pid that is not part of the process. So I suspect some kind of memory corruption is happening.

(gdb) p *f
$3 = {eth_dst = "\000\000\000\000\000", eth_src = "\000\000\061\000\000", src = {address = {address_un_data32 = {0, 6146, 4930427, 0}, address_un_data16 = {0, 0, 6146, 0, 15227, 75, 0, 
        0}, address_un_data8 = "\000\000\000\000\002\030\000\000{;K\000\000\000\000"}}, dst = {address = {address_un_data32 = {0, 0, 0, 319253824}, address_un_data16 = {0, 0, 0, 0, 0, 0, 
        27968, 4871}, address_un_data8 = '\000' <repeats 12 times>, "@m\a\023"}}, {sp = 0, type = 0 '\000'}, {dp = 0, code = 0 '\000'}, proto = 0 '\000', recursion_level = 0 '\000', 
  vlan_id = {0, 0}, flow_hash = 49, lastts = {tv_sec = 21457497697032194, tv_usec = 0}, flow_state_sc_atomic__ = 0, use_cnt_sc_atomic__ = 0, tenant_id = 0, 
  probing_parser_toserver_alproto_masks = 319253872, probing_parser_toclient_alproto_masks = 0, flags = 0, file_flags = 0, protodetect_dp = 0, m = {__data = {__lock = 2, __count = 0, 
      __owner = 6146, __nusers = 5061499, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x13076da0}}, 
    __size = "\002\000\000\000\000\000\000\000\002\030\000\000{;M", '\000' <repeats 17 times>, "\240m\a\023\000\000\000", __align = 2}, protoctx = 0x0, protomap = 49 '1', 
  flow_end_flags = 0 '\000', alproto = 0, alproto_ts = 0, alproto_tc = 0, alproto_orig = 6146, alproto_expect = 0, de_ctx_version = 5127035, thread_id = 0, alparser = 0x0, 
  alstate = 0x13076dd0, sgh_toclient = 0x0, sgh_toserver = 0x31, flowvar = 0x4f3b7b00001802, hnext = 0x0, hprev = 0x0, fb = 0x13076e00, lnext = 0x0, lprev = 0x31, startts = {
    tv_sec = 22583397603874818, tv_usec = 0}, todstpktcnt = 0, tosrcpktcnt = 0, todstbytecnt = 319254064, tosrcbytecnt = 0}
Actions #4

Updated by Victor Julien about 5 years ago

  • Status changed from New to Closed
Actions

Also available in: Atom PDF