<p>Open Information Security Foundation<br />
Suricata - Optimization #1039: Packetpool should be a stack<br />
<a class="external" href="https://redmine.openinfosecfoundation.org/issues/1039">https://redmine.openinfosecfoundation.org/issues/1039</a></p>
<p><strong>Ken Steele</strong> (ken@tilera.com), 2013-11-15:</p>
<p>The code is in tmqh-packetpool.c</p>
<p><strong>Victor Julien</strong> (victor@inliniac.net), 2013-12-04:</p>
<ul><li><strong>Target version</strong> set to <i>3.0RC2</i></li></ul>
<p><strong>Song Liu</strong> (van20052005@hotmail.com), 2014-01-10:</p>
<ul><li><strong>Assignee</strong> set to <i>Song Liu</i></li></ul>
<p><strong>Ken Steele</strong> (ken@tilera.com), 2014-02-20:</p>
<p>I would recommend giving each thread a free stack of packets that is accessed only by that thread, so it needs no mutex. To let other threads free a Packet back to the owning thread, add a second "return stack", protected by a mutex. When a thread's local free stack is empty, it can lock the return stack and move all of its packets to the local free stack in one operation.</p>
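<p>A minimal sketch of this two-stack scheme, with invented names rather than the actual tmqh-packetpool.c API:</p>

```c
#include <pthread.h>
#include <stddef.h>

typedef struct Packet_ {
    struct Packet_ *next;        /* intrusive link for the free stacks */
} Packet;

typedef struct PacketPool_ {
    /* Local free stack: touched only by the owning thread, so no lock. */
    Packet *head;
    /* Return stack: other threads push freed packets here under a mutex. */
    pthread_mutex_t return_mutex;
    Packet *return_head;
} PacketPool;

/* Owner thread pops a free packet; when the local stack is empty it
 * drains the entire return stack in a single locked operation. */
static Packet *PacketPoolGet(PacketPool *pp)
{
    if (pp->head == NULL) {
        pthread_mutex_lock(&pp->return_mutex);
        pp->head = pp->return_head;
        pp->return_head = NULL;
        pthread_mutex_unlock(&pp->return_mutex);
    }
    Packet *p = pp->head;
    if (p != NULL)
        pp->head = p->next;
    return p;
}

/* Owner thread pushes a packet back: plain pointer ops, no locking. */
static void PacketPoolPutLocal(PacketPool *pp, Packet *p)
{
    p->next = pp->head;
    pp->head = p;
}

/* Another thread returns a packet to the owner's return stack. */
static void PacketPoolPutRemote(PacketPool *pp, Packet *p)
{
    pthread_mutex_lock(&pp->return_mutex);
    p->next = pp->return_head;
    pp->return_head = p;
    pthread_mutex_unlock(&pp->return_mutex);
}
```

<p>Draining the whole return stack under one lock also amortizes the locking cost over many packets, rather than paying it per packet.</p>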
<p>This requires that each Packet record the thread on which it was allocated, but that can be stored in one byte for up to 256 threads.</p>
<p><strong>Victor Julien</strong> (victor@inliniac.net), 2014-02-20:</p>
<p>I agree, Ken. There is one common use case to consider: the autofp runmodes. In that case the packet will almost certainly be freed by a different thread than the one that allocated it.</p>
<p><strong>Victor Julien</strong> (victor@inliniac.net), 2014-02-20:</p>
<p>Btw, a while ago I played with this code: <a class="external" href="https://github.com/inliniac/suricata/pull/845">https://github.com/inliniac/suricata/pull/845</a>. At the time I was investigating slowdowns: it seemed we could experience 'pseudo packet storms', where packet processing virtually stopped. The goal of that queue experiment was not to reduce locking, but to reduce lock contention. IIRC it worked well. We might consider a similar approach for the 'return stack', so that we never get serious contention there.</p>
<p><strong>Anoop Saldanha</strong> (anoopsaldanha@gmail.com), 2014-02-21:</p>
<p>If we are planning to use the LIFO approach, for CUDA we might need another "still-in-use" kind of return stack. With CUDA, once I send a packet over to the GPU, the CPU side might not need the results from the GPU and may pass the packet back to the packetpool while the GPU still holds a reference to it. If we then reuse this packet from the packetpool, the decoder would have to wait until the GPU frees it up.</p>
<p>CUDA currently only works with autofp, so by default we would end up using the return stack, but the thread might need to check an "in-use-by-gpu" flag on the packet before transferring it back to its free stack, or it could accept a GPU wait in the decoder and move all of them back to its free packet pool. Either way, allocating a sufficiently large number of packets for the free stack would give the GPU enough time to free the packet up.</p>
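<p>A hypothetical shape for such a flag check (all names invented; this is only an illustration of the idea, not an implemented design):</p>

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Packet_ {
    struct Packet_ *next;
    /* Set before handing the packet to the GPU, cleared once the GPU
     * side is done with it. */
    atomic_bool in_use_by_gpu;
} Packet;

/* Before moving a returned packet onto the local free stack, the thread
 * would check that the GPU no longer references it. */
static bool PacketReadyForReuse(const Packet *p)
{
    return !atomic_load(&p->in_use_by_gpu);
}
```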
<p>An additional advantage I see with LIFO is that we would no longer be constrained by the 65k-packet limit we have now, again keeping CUDA in mind. We could add other queue types to support more than 65k packets, but LIFO seems easier.</p>
<p><strong>Victor Julien</strong> (victor@inliniac.net), 2014-02-24:</p>
<p>That sounds like an architecture problem in the CUDA code, then. We shouldn't be putting packets back into the pool while they are still referenced elsewhere. I think we can exclude this from the general packet stack discussion; it needs to be addressed separately.</p>
<p><strong>Anoop Saldanha</strong> (anoopsaldanha@gmail.com), 2014-02-24:</p>
<p>Right, the cuda-packet-return issue lies outside the packetpool.</p>
<p>From the CUDA perspective, though, the advantage of a LIFO packetpool is that it is much easier to support more than 65k packets per packetpool than with other methods such as multiple queues.</p>
<p><strong>Song Liu</strong> (van20052005@hotmail.com), 2014-02-27:</p>
<p>In workers mode (or single mode), even the return stack does not need a mutex. Actually, a return stack is not necessary in workers mode at all, since a single thread handles each packet from beginning to end. So the question comes down to this: should we handle this per runmode, or use the two per-thread stacks for all modes?</p>
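<p>One way to keep a single code path for all runmodes is to have each Packet record its owning pool and branch on that at free time (a sketch with invented names): in workers/single mode the owner check always succeeds, so the mutex is never touched; only autofp's cross-thread frees take the locked path.</p>

```c
#include <pthread.h>
#include <stddef.h>

typedef struct PacketPool_ PacketPool;

typedef struct Packet_ {
    struct Packet_ *next;
    PacketPool *owner;          /* pool (thread) the packet came from */
} Packet;

struct PacketPool_ {
    Packet *head;               /* lock-free local free stack */
    pthread_mutex_t return_mutex;
    Packet *return_head;        /* locked return stack for other threads */
};

/* One return path for every runmode: same-thread frees are lock-free,
 * cross-thread frees go to the owner's mutex-protected return stack. */
static void PacketPoolReturnPacket(PacketPool *my_pool, Packet *p)
{
    if (p->owner == my_pool) {
        p->next = my_pool->head;        /* lock-free push */
        my_pool->head = p;
    } else {
        PacketPool *pp = p->owner;
        pthread_mutex_lock(&pp->return_mutex);
        p->next = pp->return_head;
        pp->return_head = p;
        pthread_mutex_unlock(&pp->return_mutex);
    }
}
```

<p>With this shape no per-mode special-casing is needed; the cost of the uniform design in workers mode is just one pointer comparison per free.</p>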
<p>But one byte for up to 256 threads might not be enough. Tilera already supports up to 288 cores, and I bet it will support more later.</p>
<p><strong>Peter Manev</strong> (petermanev@gmail.com), 2014-02-27:</p>
<p>I also think 256 threads might not be enough.<br />Would it be a lot of effort to redesign (increase) that number?</p>
<p><strong>Ken Steele</strong> (ken@tilera.com), 2014-03-31:</p>
<ul><li><strong>Assignee</strong> changed from <i>Song Liu</i> to <i>Ken Steele</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>90</i></li><li><strong>Estimated time</strong> set to <i>8.00 h</i></li></ul>
<p>Fixed in Pull 913 (<a class="external" href="https://github.com/inliniac/suricata/pull/913">https://github.com/inliniac/suricata/pull/913</a>).</p>
<p><strong>Ken Steele</strong> (ken@tilera.com), 2014-03-31:</p>
<p>Instead of using an index byte or short, which would have limited the number of stacks, the Packet now holds a pointer to its stack. That allows any thread, even one without its own PacketPool, to free packets.</p>
<p><strong>Victor Julien</strong> (victor@inliniac.net), 2014-07-28:</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>90</i> to <i>100</i></li></ul>
<p>Implemented through <a class="external" href="https://github.com/inliniac/suricata/pull/1053">https://github.com/inliniac/suricata/pull/1053</a>, with some additional fixes through <a class="external" href="https://github.com/inliniac/suricata/pull/1057">https://github.com/inliniac/suricata/pull/1057</a>.</p>
<p><strong>Victor Julien</strong> (victor@inliniac.net), 2014-08-06:</p>
<ul><li><strong>Target version</strong> changed from <i>3.0RC2</i> to <i>2.1beta1</i></li></ul>