Multiqueue support for bpf

syuu1228
Hi all,

I have implemented multiqueue support for bpf and would like to present it
for review.
This is a Google Summer of Code project. Its goal is to support multiqueue
network interfaces in BPF and to provide interfaces for multithreaded packet
processing using BPF.
Modern high-performance NICs have multiple receive/send queues and an RSS
feature, which allows packets to be processed concurrently on multiple
processors.
The main purpose of the project is to support this hardware and to benefit
from the parallelism.

This provides the following new APIs:
- queue filter for each bpf descriptor (bpf ioctl)
    - BIOCENAQMASK    Enable the multiqueue filter on the descriptor
    - BIOCDISQMASK    Disable the multiqueue filter on the descriptor
    - BIOCSTRXQMASK    Set the mask bit for the specified RX queue
    - BIOCCRRXQMASK    Clear the mask bit for the specified RX queue
    - BIOCGTRXQMASK    Get the mask bit for the specified RX queue
    - BIOCSTTXQMASK    Set the mask bit for the specified TX queue
    - BIOCCRTXQMASK    Clear the mask bit for the specified TX queue
    - BIOCGTTXQMASK    Get the mask bit for the specified TX queue
    - BIOCSTOTHERMASK    Set the mask bit for packets that are not tied to any queue
    - BIOCCROTHERMASK    Clear the mask bit for packets that are not tied to any queue
    - BIOCGTOTHERMASK    Get the mask bit for packets that are not tied to any queue

- generic interface for getting hardware queue information from the NIC
driver (socket ioctl)
    - SIOCGIFQLEN    Get interface RX/TX queue length
    - SIOCGIFRXQAFFINITY    Get interface RX queue affinity
    - SIOCGIFTXQAFFINITY    Get interface TX queue affinity
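
To make the per-descriptor queue filter more concrete, here is a small Python model of the masks listed above. It is only a sketch of the semantics as described in this mail (one mask bit per RX queue, one per TX queue, and one for packets not tied to any queue). The class, method and function names are hypothetical, and the behaviour with the filter disabled is an assumption; the real interface is the set of ioctls in the patch.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class QueueMask:
    """Hypothetical model of one bpf descriptor's queue filter."""
    enabled: bool = False                      # BIOCENAQMASK / BIOCDISQMASK
    rx: Set[int] = field(default_factory=set)  # BIOCSTRXQMASK / BIOCCRRXQMASK
    tx: Set[int] = field(default_factory=set)  # BIOCSTTXQMASK / BIOCCRTXQMASK
    other: bool = False                        # BIOCSTOTHERMASK / BIOCCROTHERMASK

    def accepts(self, direction: str, queue: Optional[int]) -> bool:
        """Return True if a packet seen on (direction, queue) passes the filter."""
        if not self.enabled:
            return True        # assumption: a disabled filter passes everything
        if queue is None:
            return self.other  # packet not tied to any queue
        return queue in (self.rx if direction == "rx" else self.tx)


def per_queue_masks(n_queues: int):
    """One descriptor per queue: descriptor i sees only RX/TX queue i."""
    return [QueueMask(enabled=True, rx={i}, tx={i}) for i in range(n_queues)]


masks = per_queue_masks(4)
# Only the descriptor bound to queue 2 accepts a packet received on queue 2.
print([m.accepts("rx", 2) for m in masks])
```

In a multithreaded reader such as test_mqbpf, each thread would presumably open its own descriptor and set the bits for one queue, so that the threads do not see each other's packets.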

The patch for -CURRENT is here; right now it supports only igb(4),
ixgbe(4), and mxge(4):
http://www.dokukino.com/mq_bpf_20110813.diff

Below are the performance benchmarks:

====
I implemented benchmark programs based on
bpfnull (//depot/projects/zcopybpf/utils/bpfnull/).

test_sqbpf measures bpf throughput on one thread, without using the multiqueue APIs.
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_sqbpf/test_sqbpf.c

test_mqbpf is a multithreaded version of test_sqbpf that uses the multiqueue APIs.
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_mqbpf/test_mqbpf.c

I benchmarked six conditions:
 - benchmark1 only reads from bpf and doesn't write packets anywhere
 - benchmark2 writes packets to memory (mfs)
 - benchmark3 writes packets to an HDD (zfs)
 - benchmark4 only reads from bpf and doesn't write packets anywhere, with zerocopy
 - benchmark5 writes packets to memory (mfs), with zerocopy
 - benchmark6 writes packets to an HDD (zfs), with zerocopy

From the benchmark results, I can say that performance improves with
mq_bpf on 10GbE, but not on GbE.

* Throughput benchmark
- Test environment
 - FreeBSD node
   CPU: Core i7 X980 (12 threads)
   MB: ASUS P6X58D Premium(Intel X58)
   NIC1: Intel Gigabit ET Dual Port Server Adapter(82576)
   NIC2: Intel Ethernet X520-DA2 Server Adapter(82599)
 - Linux node
   CPU: Core 2 Quad (4 threads)
   MB: GIGABYTE GA-G33-DS3R(Intel G33)
   NIC1: Intel Gigabit ET Dual Port Server Adapter(82576)
   NIC2: Intel Ethernet X520-DA2 Server Adapter(82599)

iperf was used to generate network traffic, with the following options:
   - Linux node: iperf -c [IP] -i 10 -t 100000 -P12
   - FreeBSD node: iperf -s
   # 12 threads, TCP

The following sysctl parameter was changed:
   sysctl -w net.bpf.maxbufsize=1048576

- Benchmark1
Benchmark1 doesn't write packets anywhere and uses the following commands:
./test_sqbpf -i [interface] -b 1048576
./test_mqbpf -i [interface] -b 1048576
   - ixgbe
       test_mqbpf: 5303.09007533333 Mbps
       test_sqbpf: 3959.83021733333 Mbps
   - igb
       test_mqbpf: 916.752133333333 Mbps
       test_sqbpf: 917.597079 Mbps

- Benchmark2
Benchmark2 writes packets to mfs using the following commands:
mdmfs -s 10G md /mnt
./test_sqbpf -i [interface] -b 1048576 -w -f /mnt/test
./test_mqbpf -i [interface] -b 1048576 -w -f /mnt/test
   - ixgbe
       test_mqbpf: 1061.24890333333 Mbps
       test_sqbpf: 204.779881 Mbps
   - igb
       test_mqbpf: 916.656664666667 Mbps
       test_sqbpf: 914.378636 Mbps

- Benchmark3
Benchmark3 writes packets to zfs (on an HDD) using the following commands:
./test_sqbpf -i [interface] -b 1048576 -w -f test
./test_mqbpf -i [interface] -b 1048576 -w -f test
   - ixgbe
       test_mqbpf: 119.912253333333 Mbps
       test_sqbpf: 101.195918 Mbps
   - igb
       test_mqbpf: 228.910355333333 Mbps
       test_sqbpf: 199.639093666667 Mbps

- Benchmark4
Benchmark4 doesn't write packets anywhere and uses the following commands, with zerocopy:
./test_sqbpf -i [interface] -b 1048576
./test_mqbpf -i [interface] -b 1048576
   - ixgbe
       test_mqbpf: 4772.924974 Mbps
       test_sqbpf: 3173.19967133333 Mbps
   - igb
       test_mqbpf: 931.217345 Mbps
       test_sqbpf: 925.965270666667 Mbps

- Benchmark5
Benchmark5 writes packets to mfs using the following commands, with zerocopy:
mdmfs -s 10G md /mnt
./test_sqbpf -i [interface] -b 1048576 -w -f /mnt/test
./test_mqbpf -i [interface] -b 1048576 -w -f /mnt/test
   - ixgbe
       test_mqbpf: 306.902822333333 Mbps
       test_sqbpf: 317.605016666667 Mbps
   - igb
       test_mqbpf: 729.075349666667 Mbps
       test_sqbpf: 708.987822666667 Mbps

- Benchmark6
Benchmark6 writes packets to zfs (on an HDD) using the following commands, with zerocopy:
./test_sqbpf -i [interface] -b 1048576 -w -f test
./test_mqbpf -i [interface] -b 1048576 -w -f test
   - ixgbe
       test_mqbpf: 174.016136666667 Mbps
       test_sqbpf: 138.068732666667 Mbps
   - igb
       test_mqbpf: 228.794880333333 Mbps
       test_sqbpf: 229.367386333333 Mbps
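
For easier comparison, the figures above can be reduced to a test_mqbpf/test_sqbpf ratio per benchmark and NIC. The short Python sketch below does only that arithmetic; the numbers are copied from the results above, and the variable names are made up for the example.

```python
# Throughput in Mbps copied from the six benchmarks above: (test_mqbpf, test_sqbpf).
results = {
    ("benchmark1", "ixgbe"): (5303.09007533333, 3959.83021733333),
    ("benchmark1", "igb"): (916.752133333333, 917.597079),
    ("benchmark2", "ixgbe"): (1061.24890333333, 204.779881),
    ("benchmark2", "igb"): (916.656664666667, 914.378636),
    ("benchmark3", "ixgbe"): (119.912253333333, 101.195918),
    ("benchmark3", "igb"): (228.910355333333, 199.639093666667),
    ("benchmark4", "ixgbe"): (4772.924974, 3173.19967133333),
    ("benchmark4", "igb"): (931.217345, 925.965270666667),
    ("benchmark5", "ixgbe"): (306.902822333333, 317.605016666667),
    ("benchmark5", "igb"): (729.075349666667, 708.987822666667),
    ("benchmark6", "ixgbe"): (174.016136666667, 138.068732666667),
    ("benchmark6", "igb"): (228.794880333333, 229.367386333333),
}

# Ratio > 1.0 means the multiqueue reader was faster than the single-queue one.
ratios = {key: mq / sq for key, (mq, sq) in results.items()}

for (bench, nic), ratio in ratios.items():
    print(f"{bench} {nic:5s} mq/sq = {ratio:.2f}")
```

On these numbers the ixgbe ratio ranges from about 0.97 (benchmark5) to about 5.2 (benchmark2), while the igb ratio stays within roughly 15% of 1.0.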
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Re: Multiqueue support for bpf

Vlad GALU-2
On Aug 16, 2011, at 11:13 AM, Takuya ASADA wrote:

> following sysctl parameter is changed
>   sysctl -w net.bpf.maxbufsize=1048576


Thank you for your work! You may want to increase that (4x/8x) and rerun the test, though.
Re: Multiqueue support for bpf

Vlad GALU-2
On Aug 16, 2011, at 11:50 AM, Vlad Galu wrote:

> Thank you for your work! You may want to increase that (4x/8x) and rerun the test, though.

More, actually. Your current buffer is easily filled.
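
As a rough illustration of why a 1 MiB buffer fills quickly, the arithmetic below estimates how long a single buffer of a given size lasts at line rate. It assumes the full 10 Gbit/s arrives at one descriptor and ignores bpf headers and double buffering, so the figures are order-of-magnitude only; the function name is made up for this sketch.

```python
def fill_time_ms(buf_bytes: int, rate_bps: float) -> float:
    """Milliseconds needed to fill buf_bytes at rate_bps (bits per second)."""
    return buf_bytes * 8 / rate_bps * 1000.0


LINE_RATE = 10e9        # assumption: 10GbE at full line rate into one descriptor
BASE = 1048576          # the buffer size used in the quoted benchmark

# Current size, then the suggested 4x and 8x increases.
for factor in (1, 4, 8):
    size = factor * BASE
    print(f"{size:>9d} bytes: {fill_time_ms(size, LINE_RATE):6.3f} ms")
```

Under these assumptions the current buffer holds less than one millisecond of traffic, so the reader has to be scheduled and drain it within that time to avoid drops.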
Re: Multiqueue support for bpf

syuu1228
2011/8/16 Vlad Galu <[hidden email]>:

> On Aug 16, 2011, at 11:50 AM, Vlad Galu wrote:
>> Thank you for your work! You may want to increase that (4x/8x) and rerun the test, though.
>
> More, actually. Your current buffer is easily filled.

Hi,

I measured performance again with maxbufsize = 268435456 and multiple
CPU configurations; here are the results.
Performance on 10GbE seems a bit unstable and does not scale linearly
as CPUs/queues are added.
It may depend on some system parameter, but I have not figured out
which one.

In any case, multithreaded BPF throughput is higher than single-threaded
BPF throughput in all cases.

* Test environment
 - FreeBSD node
  CPU: Core i7 X980 (12 threads)
  # Tested with 1-, 2-, 4- and 6-core configurations (each core has 2 threads using HT)
  MB: ASUS P6X58D Premium(Intel X58)
  NIC: Intel Ethernet X520-DA2 Server Adapter(82599)

 - Linux node
  CPU: Core 2 Quad (4 threads)
  MB: GIGABYTE GA-G33-DS3R(Intel G33)
  NIC: Intel Ethernet X520-DA2 Server Adapter(82599)

 - iperf
   Linux node: iperf -c [IP] -i 10 -t 100000 -P16
   FreeBSD node: iperf -s
   # 16 threads, TCP
 - system parameter
   net.bpf.maxbufsize=268435456
   hw.ixgbe.num_queues=[n queues]

* 2threads, 2queues
 - iperf throughput
   iperf only: 8.845Gbps
   test_mqbpf: 5.78Gbps
   test_sqbpf: 6.89Gbps
 - test program throughput
   test_mqbpf: 4526.863414 Mbps
   test_sqbpf: 762.452475 Mbps
 - received/dropped
   test_mqbpf:
      45315011 packets received (BPF)
      9646958 packets dropped (BPF)
   test_sqbpf:
      56216145 packets received (BPF)
      49765127 packets dropped (BPF)

* 4threads, 4queues
 - iperf throughput
   iperf only: 3.03Gbps
   test_mqbpf: 2.49Gbps
   test_sqbpf: 2.57Gbps
 - test program throughput
   test_mqbpf: 2420.195051 Mbps
   test_sqbpf: 430.774870 Mbps
 - received/dropped
   test_mqbpf:
      19601503 packets received (BPF)
      0 packets dropped (BPF)
   test_sqbpf:
      22803778 packets received (BPF)
      18869653 packets dropped (BPF)

* 8threads, 8queues
 - iperf throughput
   iperf only: 5.80Gbps
   test_mqbpf: 4.42Gbps
   test_sqbpf: 4.30Gbps
 - test program throughput
   test_mqbpf: 4242.314913 Mbps
   test_sqbpf: 1291.719866 Mbps
 - received/dropped
   test_mqbpf:
      34996953 packets received (BPF)
      361947 packets dropped (BPF)
   test_sqbpf:
      35738058 packets received (BPF)
      24749546 packets dropped (BPF)

* 12threads, 12queues
 - iperf throughput
   iperf only: 9.31Gbps
   test_mqbpf: 8.06Gbps
   test_sqbpf: 5.67Gbps
 - test program throughput
   test_mqbpf: 8089.242472 Mbps
   test_sqbpf: 5754.910665 Mbps
 - received/dropped
   test_mqbpf:
      73783957 packets received (BPF)
      9938 packets dropped (BPF)
   test_sqbpf:
      49434479 packets received (BPF)
      0 packets dropped (BPF)
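
The received/dropped counts above can be turned into drop percentages with a few lines of Python. This is only arithmetic on the reported figures; it assumes the "received" counter counts every packet seen by the descriptor, including those later dropped, and the names used are made up for the example.

```python
# (received, dropped) packet counts reported by BPF, copied from the results above.
counts = {
    (2, "test_mqbpf"): (45315011, 9646958),
    (2, "test_sqbpf"): (56216145, 49765127),
    (4, "test_mqbpf"): (19601503, 0),
    (4, "test_sqbpf"): (22803778, 18869653),
    (8, "test_mqbpf"): (34996953, 361947),
    (8, "test_sqbpf"): (35738058, 24749546),
    (12, "test_mqbpf"): (73783957, 9938),
    (12, "test_sqbpf"): (49434479, 0),
}


def drop_ratio(received: int, dropped: int) -> float:
    """Dropped packets as a fraction of received ones (0.0 if nothing was received)."""
    return dropped / received if received else 0.0


for (queues, prog), (recv, drop) in counts.items():
    print(f"{queues:2d} queues {prog}: {100 * drop_ratio(recv, drop):5.1f}% dropped")
```

With 2, 4 and 8 queues, test_sqbpf drops well over half of the packets it sees, while test_mqbpf drops about 21% with 2 queues and about 1% or less otherwise. With 12 queues neither program drops a significant share.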
Re: Multiqueue support for bpf

syuu1228
Any comments or suggestions?

Re: Multiqueue support for bpf

George Neville-Neil

On Aug 19, 2011, at 04:21 , Takuya ASADA wrote:

> Any comments or suggestions?
>
>

One comment, one question.

First, I think we should try to integrate this work and then tune it up more.  The API
is, I think, fine, and performance tuning takes a bit of work.

Second, what are the parameters set on buffers for the drivers?  I.e. how many slots
do they have in their queues etc.?  If their defaults are too small, and often they are,
then that's going to hurt your performance.

Best,
George


Re: Multiqueue support for bpf

syuu1228
Sorry for the late reply,

> One comment, one question.
>
> First, I think we should try to integrate this work and then tune it up more.  The API
> is, I think, fine, and performance tuning takes a bit of work.

Is there a good way (tools or something similar) to find the bottleneck?

> Second, what are the parameters set on buffers for the drivers?  I.e. how many slots
> do they have in their queues etc.?  If their defaults are too small, and often they are,
> then that's going to hurt your performance.

That corresponds to the number of descriptors per queue, right?
If I'm correct, the default is 2048 descriptors per queue, and I used
the default when running the benchmarks.

It's on line 290 of
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/sys/dev/ixgbe/ixgbe.c&REV=2

and line 105 of
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/sys/dev/ixgbe/ixgbe.h&REV=2
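
To put the 2048-descriptor default in perspective, the sketch below estimates how much traffic one full ring represents. It assumes one standard 1500-byte frame per descriptor and that RSS spreads 10 Gbit/s evenly over the queues; both are simplifications, and the function name is made up for the example.

```python
def ring_absorb_ms(descriptors: int, frame_bytes: int, rate_bps: float) -> float:
    """Milliseconds of traffic at rate_bps held by `descriptors` frames of frame_bytes."""
    return descriptors * frame_bytes * 8 / rate_bps * 1000.0


DESCRIPTORS = 2048   # default per queue, as stated above
FRAME_BYTES = 1500   # assumption: one standard-MTU frame per descriptor

for queues in (1, 2, 4, 8, 12):
    per_queue_rate = 10e9 / queues   # assumption: load is spread evenly by RSS
    ms = ring_absorb_ms(DESCRIPTORS, FRAME_BYTES, per_queue_rate)
    print(f"{queues:2d} queues: {ms:6.2f} ms of traffic per ring")
```

Under these assumptions a single ring holds only a few milliseconds of line-rate traffic, and proportionally more per ring as the load is split across queues.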
Re: Multiqueue support for bpf

syuu1228
Hi,

My previous mail was probably overlooked, so I'd like to bring this up
again.
# In case you don't remember what this is about, here is the original
# post of this thread:
# http://lists.freebsd.org/pipermail/freebsd-net/2011-August/029585.html

George said "I think we should try to integrate this work and then
tune it up more." in a previous mail, so I would like to merge this now.

Is any additional work required before merging, or is it fine as it is?
