Hi all, I'm tasked with building out a 100 Gb/s InfiniBand network for a group of NVIDIA GPU servers that will use GPUDirect RDMA. The issue is that our storage appliance is 100 Gb/s Ethernet (100GbE) only. It is a toss-up which matters more to our users: storage performance or GPUDirect RDMA performance. Having the "fastest" is important. Nothing is built yet; we are still in the planning stages, and money isn't much of a factor, though of course we don't want to spend money just to spend money. I am new to InfiniBand, GPUs, and RDMA, and I don't want to miss something I'll be embarrassed about later.
Should we build out with:
- both 100 Gb/s Ethernet and 100 Gb/s InfiniBand, or
- just 100 Gb/s InfiniBand, plus a 100 Gb/s Ethernet-to-InfiniBand gateway, or
- just 100 Gb/s Ethernet, using RoCE for GPUDirect RDMA?
Any advice, opinions, or pros/cons are welcome.
Thanks!