So I decided to give NFS over RDMA a try.
Hostname | ft2000 (Server) | t3640 (Client) |
---|---|---|
CPU | Phytium FT-2000+/64 | Intel Core i9-10900K |
RAM | Quad Channel DDR4 128GB | Dual Channel DDR4 64GB |
NIC | Mellanox ConnectX-4 Lx 25GbE | Mellanox ConnectX-4 Lx 25GbE |
OS | UOS Server 20 1070a | Debian 12 |
The two machines are connected over a 25GbE network. The goal is to set up NFS over RDMA and benchmark it to see the performance difference between NFS over TCP and NFS over RDMA.
RDMA Link Verification
Install RDMA/IB dependencies:
The dependencies are the same on both machines (there is no server/client distinction here); only the package manager (and OS) differs.
If you are using Mellanox OFED, you can skip this step, but be sure to install Mellanox OFED with the `--with-nfsrdma` flag. Otherwise, you will not be able to use NFS over RDMA.
On UOS Server 20 1070a (based on Anolis 8, which is in turn based on CentOS 8):
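The exact package set may vary; the following is a typical selection (package names assume the stock CentOS 8 repositories):

```bash
# Core RDMA userspace plus verbs/rdmacm utilities, IB diagnostics and perftest tools
dnf install -y rdma-core libibverbs-utils librdmacm-utils infiniband-diags perftest
```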
On Debian 12:
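Similarly on Debian (again, the usual package names from the Debian repos):

```bash
# RDMA userspace plus verbs/rdmacm utilities, IB diagnostics and perftest tools
apt install -y rdma-core ibverbs-utils rdmacm-utils infiniband-diags perftest
```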
Make sure the link is up on both machines using the `ip` or `ibstatus` command:
Client:
Use `ip` (note `state UP`):
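For example (the interface name `enp1s0f0np0` is only illustrative, use yours):

```bash
root@t3640:~# ip link show enp1s0f0np0
4: enp1s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b8:ce:f6:xx:xx:xx brd ff:ff:ff:ff:ff:ff
```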
Or use `ibstatus` (note `phys state: 5: LinkUp`):
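Typical output looks like this (device name, GID and rate line are illustrative):

```bash
root@t3640:~# ibstatus
Infiniband device 'mlx5_0' port 1 status:
	default gid:	 fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx
	base lid:	 0x0
	sm lid:		 0x0
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 25 Gb/sec (1X EDR)
	link_layer:	 Ethernet
```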
Server:
Run the same two checks on the server: `ip` should again show `state UP` and `ibstatus` should show `phys state: 5: LinkUp`.
Verify IP connectivity between the servers (ping the IP address of the other server):
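For example, from the client (the address is whatever you assigned to the server's 25GbE port):

```bash
ping -c 3 <server-ip>
```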
Make sure the InfiniBand kernel modules are loaded.
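One way to check is with `lsmod` (module names assume the inbox mlx5 driver for a Mellanox NIC):

```bash
# ib_core, rdma_cm and the mlx5 IB driver should all be present
lsmod | grep -E 'mlx5_ib|ib_core|rdma_cm'
# Load anything that is missing
modprobe -a mlx5_ib rdma_cm
```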
Make sure you have a lossless network:
If RDMA runs over Ethernet (RoCE), the network must be configured to be lossless, which means that either flow control (FC) or priority flow control (PFC) is enabled on the adapter ports and on the switch.
In a lab environment or a small setup, you can use global pause flow control to create a lossless environment. To check the current global pause configuration, use the following command (it is normally enabled by default):
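For example, with `ethtool` (the interface name is illustrative; check both machines):

```bash
root@ft2000:~# ethtool -a enp1s0f0np0
Pause parameters for enp1s0f0np0:
Autonegotiate:	off
RX:		on
TX:		on
```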
In case it is disabled, run:
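Again, substitute your own interface name:

```bash
ethtool -A enp1s0f0np0 rx on tx on
```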
Test RDMA speed. Refer to my previous blog InfiniBand Performance Test for details.
Run the following commands on the server and the client, respectively:
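The bandwidth test can be done with `ib_write_bw` from the perftest package; something along these lines (the device name and server address are placeholders):

```bash
# On the server (ft2000): start the listener
ib_write_bw -d mlx5_0 --report_gbits
# On the client (t3640): connect to the server and run the bandwidth test
ib_write_bw -d mlx5_0 --report_gbits <server-ip>
```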
The measured speed is around 23 Gb/sec, which is close to the practical maximum of the 25GbE link. If you see a significantly lower speed, you need to check the network configuration and the switch configuration.
Configure NFS without RDMA
Just to make sure NFS is working.
Install NFS on the server:
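On the UOS/CentOS 8 side, the server bits come from `nfs-utils`:

```bash
dnf install -y nfs-utils
```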
Use `/tmp` as a test directory and share it:
Since I want to test the network performance (TCP vs. RDMA), I don't want to be bottlenecked by disk speed. So I will use `/tmp`, which is usually backed by RAM, as a reasonably fast directory to avoid bottlenecks. Make sure it is `tmpfs`, otherwise IO will be limited by your disk speed.
```bash
root@ft2000:/# mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
```
Make it available:
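A minimal export looks something like this (the export options, in particular `fsid=1`, are my choices; tmpfs has no UUID, so an explicit `fsid` is needed):

```bash
# Export /tmp to everyone for the benchmark; tighten the client list and options for anything serious
echo '/tmp *(rw,async,no_root_squash,fsid=1)' >> /etc/exports
systemctl enable --now nfs-server
exportfs -ra
```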
Install NFS on the client:
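On Debian, the client side only needs `nfs-common`:

```bash
apt install -y nfs-common
```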
Mount the NFS share temporarily:
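Something like the following (replace the address with the server's 25GbE IP):

```bash
mkdir -p /mnt/nfs
mount -t nfs <server-ip>:/tmp /mnt/nfs
```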
Verify the mount:
You should see the files that are in `/tmp` on the server.
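For example:

```bash
ls -la /mnt/nfs
```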
Configure NFS with RDMA
Load the RDMA transport module on the server:
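On recent kernels the server-side transport lives in the `rpcrdma` module, loadable via its `svcrdma` alias:

```bash
modprobe svcrdma
```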
Instruct the server to listen on the RDMA transport port (20049 is the default port):
Note: if you see `echo: write error: protocol not supported`, it means that NFSoRDMA is not supported by your Mellanox OFED installation. You need to use the inbox OS driver (refer to the Install RDMA/IB dependencies section in this blog) or re-install Mellanox OFED with the `--with-nfsrdma` flag. I was tripped up by this issue.
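This goes through the nfsd portlist interface (run as root, with the NFS server already running):

```bash
echo rdma 20049 > /proc/fs/nfsd/portlist
```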
Load the RDMA transport module on the client:
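The client-side alias is `xprtrdma`:

```bash
modprobe xprtrdma
```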
Mount the NFS share with RDMA:
Before running the mount command, unmount the NFS share if it is mounted (`umount /mnt/nfs`).
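Same mount as before, just with the RDMA transport and port (the server address is again a placeholder):

```bash
mount -t nfs -o rdma,port=20049 <server-ip>:/tmp /mnt/nfs
```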
Check the mount parameters:
If you see `proto=rdma`, it means the NFS share is mounted using RDMA.
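`mount` (or `nfsstat -m`) shows the negotiated options; the exact option string varies, but `proto=rdma` should be in there:

```bash
root@t3640:~# mount | grep /mnt/nfs
<server-ip>:/tmp on /mnt/nfs type nfs4 (rw,relatime,...,proto=rdma,port=20049,...)
```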
Verify the mount:
As before, you should see the files that are in `/tmp` on the server.
Benchmark
Use `fio` to benchmark the NFS performance. Parameters such as `bs` and `iodepth` will be changed to different values later to evaluate the performance in different scenarios.
I will only test random read performance, since sequential read performance is not a concern even over TCP, let alone RDMA.
The actual command will be:
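A representative invocation looks like this; the exact `bs`/`iodepth` pair is swept by the benchmark script, and the remaining fio options (size, runtime, file name) are my choices, not anything NFS-specific:

```bash
fio --name=randread --rw=randread --direct=1 --ioengine=libaio \
    --bs=4k --iodepth=32 --numjobs=1 --size=4G --runtime=60 --time_based \
    --group_reporting --filename=/mnt/nfs/fio_testfile
```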
The full log and related scripts can be found in the Appendix.
Here are the results we have all been waiting for!
For IO operations per second (IOPS), the average IOPS for NFS over RDMA is higher than for NFS over TCP. The difference is more significant when the block size is smaller, and is most significant at `iodepth=12`. We can also see that no matter the block size and iodepth, the IOPS for NFS over TCP is capped at around 47 KIOPS, while NFS over RDMA reaches over 180 KIOPS.
The reason the difference is smaller with larger block sizes is that the network bandwidth becomes the bottleneck. For our 25GbE network, the maximum bandwidth is around 2800 MiB/s, and we hit this limit with larger block sizes at these IOPS levels; this has nothing to do with RDMA vs. TCP. If you have a faster network, you will see the difference for larger block sizes as well.
For 4K block size:
IO Depth | KIOPS (RDMA) | KIOPS (TCP) | Ratio |
---|---|---|---|
8 | 124.5 | 36.1 | 3.4 |
12 | 156.0 | 38.1 | 4.1 |
16 | 165.8 | 41.4 | 4.0 |
24 | 177.0 | 45.8 | 3.9 |
32 | 179.4 | 50.7 | 3.5 |
For bandwidth, the average bandwidth for NFS over RDMA is significantly higher (around 4x for small block sizes).
The reason the difference is more significant at small block sizes (4K-16K) is that at larger block sizes we are limited by the network bandwidth. Since we have a 25GbE network, the maximum bandwidth is around 2800 MiB/s, so you can see that the bandwidth is capped at 2800 MiB/s for large block sizes and deep IO depths. If you have a faster network, you will see this 4x difference for larger block sizes as well.
Latency-wise, the average latency for NFS over RDMA is slightly lower than for NFS over TCP. The difference is more significant for small block sizes and large IO depths.
Appendix
- Benchmark script `benchmark.sh`: automatically runs `fio` with different `bs` and `iodepth` values and saves the logs (it saves stdout, not the fio-generated logs such as `--bandwidth-log`); a sketch of the loop is shown below.
- Plotting script `plot.py`: parses the `fio` logs (stdout) saved by `benchmark.sh` and plots the results, i.e. the three figures shown above in the Benchmark section.
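For reference, a minimal sketch of what the benchmark loop can look like (the fio options are the same assumptions as in the Benchmark section; the swept values and log file names follow the logs listed below):

```bash
#!/bin/bash
# Usage: ./benchmark.sh tcp|rdma  (the argument is only used to name the log files)
proto=$1
for bs in 4k 8k 16k 32k 64k 128k 256k 512k 1024k; do
    for iodepth in 1 2 3 4 6 8 12 16 24 32 64 128; do
        # One fio run per (bs, iodepth) pair; stdout goes to the per-run log file
        fio --name=randread --rw=randread --direct=1 --ioengine=libaio \
            --bs="$bs" --iodepth="$iodepth" --numjobs=1 --size=4G \
            --runtime=60 --time_based --group_reporting \
            --filename=/mnt/nfs/fio_testfile \
            > "${proto}_fio_bs${bs}_iodepth${iodepth}.log"
    done
done
```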
Raw fio Logs
- rdma_fio_bs4k_iodepth1.log
- rdma_fio_bs4k_iodepth2.log
- rdma_fio_bs4k_iodepth3.log
- rdma_fio_bs4k_iodepth4.log
- rdma_fio_bs4k_iodepth6.log
- rdma_fio_bs4k_iodepth8.log
- rdma_fio_bs4k_iodepth12.log
- rdma_fio_bs4k_iodepth16.log
- rdma_fio_bs4k_iodepth24.log
- rdma_fio_bs4k_iodepth32.log
- rdma_fio_bs4k_iodepth64.log
- rdma_fio_bs4k_iodepth128.log
- rdma_fio_bs8k_iodepth1.log
- rdma_fio_bs8k_iodepth2.log
- rdma_fio_bs8k_iodepth3.log
- rdma_fio_bs8k_iodepth4.log
- rdma_fio_bs8k_iodepth6.log
- rdma_fio_bs8k_iodepth8.log
- rdma_fio_bs8k_iodepth12.log
- rdma_fio_bs8k_iodepth16.log
- rdma_fio_bs8k_iodepth24.log
- rdma_fio_bs8k_iodepth32.log
- rdma_fio_bs8k_iodepth64.log
- rdma_fio_bs8k_iodepth128.log
- rdma_fio_bs16k_iodepth1.log
- rdma_fio_bs16k_iodepth2.log
- rdma_fio_bs16k_iodepth3.log
- rdma_fio_bs16k_iodepth4.log
- rdma_fio_bs16k_iodepth6.log
- rdma_fio_bs16k_iodepth8.log
- rdma_fio_bs16k_iodepth12.log
- rdma_fio_bs16k_iodepth16.log
- rdma_fio_bs16k_iodepth24.log
- rdma_fio_bs16k_iodepth32.log
- rdma_fio_bs16k_iodepth64.log
- rdma_fio_bs16k_iodepth128.log
- rdma_fio_bs32k_iodepth1.log
- rdma_fio_bs32k_iodepth2.log
- rdma_fio_bs32k_iodepth3.log
- rdma_fio_bs32k_iodepth4.log
- rdma_fio_bs32k_iodepth6.log
- rdma_fio_bs32k_iodepth8.log
- rdma_fio_bs32k_iodepth12.log
- rdma_fio_bs32k_iodepth16.log
- rdma_fio_bs32k_iodepth24.log
- rdma_fio_bs32k_iodepth32.log
- rdma_fio_bs32k_iodepth64.log
- rdma_fio_bs32k_iodepth128.log
- rdma_fio_bs64k_iodepth1.log
- rdma_fio_bs64k_iodepth2.log
- rdma_fio_bs64k_iodepth3.log
- rdma_fio_bs64k_iodepth4.log
- rdma_fio_bs64k_iodepth6.log
- rdma_fio_bs64k_iodepth8.log
- rdma_fio_bs64k_iodepth12.log
- rdma_fio_bs64k_iodepth16.log
- rdma_fio_bs64k_iodepth24.log
- rdma_fio_bs64k_iodepth32.log
- rdma_fio_bs64k_iodepth64.log
- rdma_fio_bs64k_iodepth128.log
- rdma_fio_bs128k_iodepth1.log
- rdma_fio_bs128k_iodepth2.log
- rdma_fio_bs128k_iodepth3.log
- rdma_fio_bs128k_iodepth4.log
- rdma_fio_bs128k_iodepth6.log
- rdma_fio_bs128k_iodepth8.log
- rdma_fio_bs128k_iodepth12.log
- rdma_fio_bs128k_iodepth16.log
- rdma_fio_bs128k_iodepth24.log
- rdma_fio_bs128k_iodepth32.log
- rdma_fio_bs128k_iodepth64.log
- rdma_fio_bs128k_iodepth128.log
- rdma_fio_bs256k_iodepth1.log
- rdma_fio_bs256k_iodepth2.log
- rdma_fio_bs256k_iodepth3.log
- rdma_fio_bs256k_iodepth4.log
- rdma_fio_bs256k_iodepth6.log
- rdma_fio_bs256k_iodepth8.log
- rdma_fio_bs256k_iodepth12.log
- rdma_fio_bs256k_iodepth16.log
- rdma_fio_bs256k_iodepth24.log
- rdma_fio_bs256k_iodepth32.log
- rdma_fio_bs256k_iodepth64.log
- rdma_fio_bs256k_iodepth128.log
- rdma_fio_bs512k_iodepth1.log
- rdma_fio_bs512k_iodepth2.log
- rdma_fio_bs512k_iodepth3.log
- rdma_fio_bs512k_iodepth4.log
- rdma_fio_bs512k_iodepth6.log
- rdma_fio_bs512k_iodepth8.log
- rdma_fio_bs512k_iodepth12.log
- rdma_fio_bs512k_iodepth16.log
- rdma_fio_bs512k_iodepth24.log
- rdma_fio_bs512k_iodepth32.log
- rdma_fio_bs512k_iodepth64.log
- rdma_fio_bs512k_iodepth128.log
- rdma_fio_bs1024k_iodepth1.log
- rdma_fio_bs1024k_iodepth2.log
- rdma_fio_bs1024k_iodepth3.log
- rdma_fio_bs1024k_iodepth4.log
- rdma_fio_bs1024k_iodepth6.log
- rdma_fio_bs1024k_iodepth8.log
- rdma_fio_bs1024k_iodepth12.log
- rdma_fio_bs1024k_iodepth16.log
- rdma_fio_bs1024k_iodepth24.log
- rdma_fio_bs1024k_iodepth32.log
- rdma_fio_bs1024k_iodepth64.log
- rdma_fio_bs1024k_iodepth128.log
- tcp_fio_bs4k_iodepth1.log
- tcp_fio_bs4k_iodepth2.log
- tcp_fio_bs4k_iodepth3.log
- tcp_fio_bs4k_iodepth4.log
- tcp_fio_bs4k_iodepth6.log
- tcp_fio_bs4k_iodepth8.log
- tcp_fio_bs4k_iodepth12.log
- tcp_fio_bs4k_iodepth16.log
- tcp_fio_bs4k_iodepth24.log
- tcp_fio_bs4k_iodepth32.log
- tcp_fio_bs4k_iodepth64.log
- tcp_fio_bs4k_iodepth128.log
- tcp_fio_bs8k_iodepth1.log
- tcp_fio_bs8k_iodepth2.log
- tcp_fio_bs8k_iodepth3.log
- tcp_fio_bs8k_iodepth4.log
- tcp_fio_bs8k_iodepth6.log
- tcp_fio_bs8k_iodepth8.log
- tcp_fio_bs8k_iodepth12.log
- tcp_fio_bs8k_iodepth16.log
- tcp_fio_bs8k_iodepth24.log
- tcp_fio_bs8k_iodepth32.log
- tcp_fio_bs8k_iodepth64.log
- tcp_fio_bs8k_iodepth128.log
- tcp_fio_bs16k_iodepth1.log
- tcp_fio_bs16k_iodepth2.log
- tcp_fio_bs16k_iodepth3.log
- tcp_fio_bs16k_iodepth4.log
- tcp_fio_bs16k_iodepth6.log
- tcp_fio_bs16k_iodepth8.log
- tcp_fio_bs16k_iodepth12.log
- tcp_fio_bs16k_iodepth16.log
- tcp_fio_bs16k_iodepth24.log
- tcp_fio_bs16k_iodepth32.log
- tcp_fio_bs16k_iodepth64.log
- tcp_fio_bs16k_iodepth128.log
- tcp_fio_bs32k_iodepth1.log
- tcp_fio_bs32k_iodepth2.log
- tcp_fio_bs32k_iodepth3.log
- tcp_fio_bs32k_iodepth4.log
- tcp_fio_bs32k_iodepth6.log
- tcp_fio_bs32k_iodepth8.log
- tcp_fio_bs32k_iodepth12.log
- tcp_fio_bs32k_iodepth16.log
- tcp_fio_bs32k_iodepth24.log
- tcp_fio_bs32k_iodepth32.log
- tcp_fio_bs32k_iodepth64.log
- tcp_fio_bs32k_iodepth128.log
- tcp_fio_bs64k_iodepth1.log
- tcp_fio_bs64k_iodepth2.log
- tcp_fio_bs64k_iodepth3.log
- tcp_fio_bs64k_iodepth4.log
- tcp_fio_bs64k_iodepth6.log
- tcp_fio_bs64k_iodepth8.log
- tcp_fio_bs64k_iodepth12.log
- tcp_fio_bs64k_iodepth16.log
- tcp_fio_bs64k_iodepth24.log
- tcp_fio_bs64k_iodepth32.log
- tcp_fio_bs64k_iodepth64.log
- tcp_fio_bs64k_iodepth128.log
- tcp_fio_bs128k_iodepth1.log
- tcp_fio_bs128k_iodepth2.log
- tcp_fio_bs128k_iodepth3.log
- tcp_fio_bs128k_iodepth4.log
- tcp_fio_bs128k_iodepth6.log
- tcp_fio_bs128k_iodepth8.log
- tcp_fio_bs128k_iodepth12.log
- tcp_fio_bs128k_iodepth16.log
- tcp_fio_bs128k_iodepth24.log
- tcp_fio_bs128k_iodepth32.log
- tcp_fio_bs128k_iodepth64.log
- tcp_fio_bs128k_iodepth128.log
- tcp_fio_bs256k_iodepth1.log
- tcp_fio_bs256k_iodepth2.log
- tcp_fio_bs256k_iodepth3.log
- tcp_fio_bs256k_iodepth4.log
- tcp_fio_bs256k_iodepth6.log
- tcp_fio_bs256k_iodepth8.log
- tcp_fio_bs256k_iodepth12.log
- tcp_fio_bs256k_iodepth16.log
- tcp_fio_bs256k_iodepth24.log
- tcp_fio_bs256k_iodepth32.log
- tcp_fio_bs256k_iodepth64.log
- tcp_fio_bs256k_iodepth128.log
- tcp_fio_bs512k_iodepth1.log
- tcp_fio_bs512k_iodepth2.log
- tcp_fio_bs512k_iodepth3.log
- tcp_fio_bs512k_iodepth4.log
- tcp_fio_bs512k_iodepth6.log
- tcp_fio_bs512k_iodepth8.log
- tcp_fio_bs512k_iodepth12.log
- tcp_fio_bs512k_iodepth16.log
- tcp_fio_bs512k_iodepth24.log
- tcp_fio_bs512k_iodepth32.log
- tcp_fio_bs512k_iodepth64.log
- tcp_fio_bs512k_iodepth128.log
- tcp_fio_bs1024k_iodepth1.log
- tcp_fio_bs1024k_iodepth2.log
- tcp_fio_bs1024k_iodepth3.log
- tcp_fio_bs1024k_iodepth4.log
- tcp_fio_bs1024k_iodepth6.log
- tcp_fio_bs1024k_iodepth8.log
- tcp_fio_bs1024k_iodepth12.log
- tcp_fio_bs1024k_iodepth16.log
- tcp_fio_bs1024k_iodepth24.log
- tcp_fio_bs1024k_iodepth32.log
- tcp_fio_bs1024k_iodepth64.log
- tcp_fio_bs1024k_iodepth128.log