Project

General

Profile

Actions

Support #2715

closed

deadlock between all detect thread and flow manager thread

Added by denny luo over 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Affected Versions:
Label:

Description

Hi:
I found that all detect thread in __lll_lock_wait() to get the lock from a FlowBucket which locked in flow manager thread(thread 28),
and the flow manager thread is __lll_lock_wait to get the lock from a flow,the problem is the flow is not right flow.the address of flow looks like wrong

33   Thread 0x7f2263fff700 (LWP 3974) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  32   Thread 0x7f221cff9700 (LWP 3994) 0x00007f2a816e67a3 in select () from /lib64/libc.so.6
  31   Thread 0x7f221d7fa700 (LWP 3993) 0x00007f2a839fccf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  30   Thread 0x7f221dffb700 (LWP 3992) 0x00007f2a839fccf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  29   Thread 0x7f221e7fc700 (LWP 3991) 0x00007f2a839fccf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 28   Thread 0x7f221effd700 (LWP 3990) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  27   Thread 0x7f221ffff700 (LWP 3988) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  26   Thread 0x7f22711fd700 (LWP 3972) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  25   Thread 0x7f223cff9700 (LWP 3987) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  24   Thread 0x7f223dffb700 (LWP 3985) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  23   Thread 0x7f223e7fc700 (LWP 3984) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  22   Thread 0x7f22627fc700 (LWP 3977) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  21   Thread 0x7f22617fa700 (LWP 3979) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  20   Thread 0x7f223effd700 (LWP 3983) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  19   Thread 0x7f2260ff9700 (LWP 3980) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  18   Thread 0x7f2261ffb700 (LWP 3978) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  17   Thread 0x7f223d7fa700 (LWP 3986) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  16   Thread 0x7f223f7fe700 (LWP 3982) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  15   Thread 0x7f223ffff700 (LWP 3981) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  14   Thread 0x7f22721ff700 (LWP 3970) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  13   Thread 0x7f22719fe700 (LWP 3971) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  12   Thread 0x7f22637fe700 (LWP 3975) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  11   Thread 0x7f221f7fe700 (LWP 3989) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  10   Thread 0x7f22709fc700 (LWP 3973) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  9    Thread 0x7f2262ffd700 (LWP 3976) 0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
  8    Thread 0x7f2a7e1b3700 (LWP 3968) 0x00007f2a816b61ad in nanosleep () from /lib64/libc.so.6
  7    Thread 0x7f2a809b8700 (LWP 3963) 0x00007f2a816ef923 in epoll_wait () from /lib64/libc.so.6
  6    Thread 0x7f2a7d9b2700 (LWP 3969) 0x000000000056d998 in rte_rdtsc () at /usr/local/include/dpdk/rte_cycles.h:103
  5    Thread 0x7f2a7e9b4700 (LWP 3967) em_main_loop (dummy=<optimized out>) at /root/DPI/suricata_4.0/l3fwd_em.c:220
  4    Thread 0x7f2a7f1b5700 (LWP 3966) 0x00000000004548f9 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f2a7f1b48a0, queue_id=<optimized out>, port_id=0 '\000')
    at /usr/local/include/dpdk/rte_ethdev.h:2667
  3    Thread 0x7f2a7f9b6700 (LWP 3965) 0x00000000005fca7f in ixgbe_recv_pkts_vec ()
  2    Thread 0x7f2a801b7700 (LWP 3964) em_parse_ptype (m=<optimized out>) at /root/DPI/suricata_4.0/l3fwd_em.c:141
  1    Thread 0x7f2a84ef7b40 (LWP 3962) 0x00007f2a816b61ad in nanosleep () from /lib64/libc.so.6

you can see thread 22 and 27 both be lock_wait for fb whose mutex lock __owner's LWP is 3990(thread 28)

(gdb) thread 27
[Switching to thread 27 (Thread 0x7f221ffff700 (LWP 3988))]
#0  0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3
#3  0x0000000000516507 in FlowGetFlowFromHash (tv=tv@entry=0x13d999c0, dtv=0x7f221829e250, p=p@entry=0x7f2218267f20, dest=dest@entry=0x7f2218267f58)
    at /root/DPI/suricata_4.0/flow-hash.c:479
479         FBLOCK_LOCK(fb);
(gdb) p *fb
$1 = {head = 0x7f22149481e0, tail = 0x13076d60, m = {__data = {__lock = 2, __count = 0, __owner = 3990, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\226\017\000\000\001", '\000' <repeats 26 times>, __align = 2}, next_ts_sc_atomic__ = 0}

(gdb) thread 22
[Switching to thread 22 (Thread 0x7f22627fc700 (LWP 3977))]
#0  0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3      
#3  0x0000000000516507 in FlowGetFlowFromHash (tv=tv@entry=0x14175e30, dtv=0x7f224829e250, p=p@entry=0x7f2248267f20, dest=dest@entry=0x7f2248267f58)
    at /root/DPI/suricata_4.0/flow-hash.c:479
479         FBLOCK_LOCK(fb);
(gdb) p *fb    
$2 = {head = 0x7f22149481e0, tail = 0x13076d60, m = {__data = {__lock = 2, __count = 0, __owner = 3990, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\226\017\000\000\001", '\000' <repeats 26 times>, __align = 2}, next_ts_sc_atomic__ = 0}

and see this you will find that the f which in flow_bucket .tail looks like not a right flow compare the other flow

(gdb) thread 28
[Switching to thread 28 (Thread 0x7f221effd700 (LWP 3990))]
#0  0x00007f2a839ff42d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 3
#3  0x000000000049c432 in FlowManagerHashRowTimeout (next_ts=<optimized out>, counters=<optimized out>, emergency=<optimized out>, ts=<optimized out>, f=0x13076d60)
    at /root/DPI/suricata_4.0/flow-manager.c:335
335             FLOWLOCK_WRLOCK(f);
(gdb) p *f
$3 = {eth_dst = "\000\000\000\000\000", eth_src = "\000\000\061\000\000", src = {address = {address_un_data32 = {0, 6146, 4930427, 0}, address_un_data16 = {0, 0, 6146, 0, 15227, 75, 0, 
        0}, address_un_data8 = "\000\000\000\000\002\030\000\000{;K\000\000\000\000"}}, dst = {address = {address_un_data32 = {0, 0, 0, 319253824}, address_un_data16 = {0, 0, 0, 0, 0, 0, 
        27968, 4871}, address_un_data8 = '\000' <repeats 12 times>, "@m\a\023"}}, {sp = 0, type = 0 '\000'}, {dp = 0, code = 0 '\000'}, proto = 0 '\000', recursion_level = 0 '\000', 
  vlan_id = {0, 0}, flow_hash = 49, lastts = {tv_sec = 21457497697032194, tv_usec = 0}, flow_state_sc_atomic__ = 0, use_cnt_sc_atomic__ = 0, tenant_id = 0, 
  probing_parser_toserver_alproto_masks = 319253872, probing_parser_toclient_alproto_masks = 0, flags = 0, file_flags = 0, protodetect_dp = 0, m = {__data = {__lock = 2, __count = 0, 
      __owner = 6146, __nusers = 5061499, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x13076da0}}, 
    __size = "\002\000\000\000\000\000\000\000\002\030\000\000{;M", '\000' <repeats 17 times>, "\240m\a\023\000\000\000", __align = 2}, protoctx = 0x0, protomap = 49 '1', 
  flow_end_flags = 0 '\000', alproto = 0, alproto_ts = 0, alproto_tc = 0, alproto_orig = 6146, alproto_expect = 0, de_ctx_version = 5127035, thread_id = 0, alparser = 0x0, 
  alstate = 0x13076dd0, sgh_toclient = 0x0, sgh_toserver = 0x31, flowvar = 0x4f3b7b00001802, hnext = 0x0, hprev = 0x0, fb = 0x13076e00, lnext = 0x0, lprev = 0x31, startts = {
    tv_sec = 22583397603874818, tv_usec = 0}, todstpktcnt = 0, tosrcpktcnt = 0, todstbytecnt = 319254064, tosrcbytecnt = 0}

(gdb) p f
$4 = (Flow *) 0x13076d60
(gdb) p flow_hash[33115] 
$5 = {head = 0x7f22149481e0, tail = 0x13076d60, m = {__data = {__lock = 2, __count = 0, __owner = 3990, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\226\017\000\000\001", '\000' <repeats 26 times>, __align = 2}, next_ts_sc_atomic__ = 0}

(gdb) p *flow_hash[33115].head
$6 = {eth_dst = "\314\032\372\347", <incomplete sequence \340>, eth_src = "\274?\217\206\265\222", src = {address = {address_un_data32 = {2381335402, 0, 0, 0}, address_un_data16 = {
        19306, 36336, 0, 0, 0, 0, 0, 0}, address_un_data8 = "jK\360\215", '\000' <repeats 11 times>}}, dst = {address = {address_un_data32 = {3363685991, 0, 0, 0}, address_un_data16 = {
        50791, 51325, 0, 0, 0, 0, 0, 0}, address_un_data8 = "g\306}\310", '\000' <repeats 11 times>}}, {sp = 22195, type = 179 '\263'}, {dp = 80, code = 80 'P'}, proto = 6 '\006', 
  recursion_level = 0 '\000', vlan_id = {0, 0}, flow_hash = 3460006235, lastts = {tv_sec = 1542018506, tv_usec = 149171}, flow_state_sc_atomic__ = 0, use_cnt_sc_atomic__ = 0, 
  tenant_id = 0, probing_parser_toserver_alproto_masks = 0, probing_parser_toclient_alproto_masks = 0, flags = 262939, file_flags = 4095, protodetect_dp = 0, m = {__data = {__lock = 0, 
      __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
  protoctx = 0x0, protomap = 0 '\000', flow_end_flags = 0 '\000', alproto = 0, alproto_ts = 0, alproto_tc = 0, alproto_orig = 0, alproto_expect = 0, de_ctx_version = 99, thread_id = 14, 
  alparser = 0x0, alstate = 0x0, sgh_toclient = 0x19ef6110, sgh_toserver = 0x19e11bc0, flowvar = 0x0, hnext = 0x7f221a338230, hprev = 0x0, fb = 0x7f2a7becb700, lnext = 0x0, lprev = 0x0, 
  startts = {tv_sec = 1542018506, tv_usec = 138639}, todstpktcnt = 9, tosrcpktcnt = 8, todstbytecnt = 1027, tosrcbytecnt = 7386}

I don't know how to fix the problem

Actions

Also available in: Atom PDF