Wednesday, June 24, 2020

Looking for input on a 10g Ceph storage network

Hello!

Not a network admin or certified network guy in any way, but I like to think I have a decent network understanding. I work at a smaller game dev studio, and we are looking at upgrading from our current NAS to a Ceph cluster and, as part of that, replacing the networking in our rack. Most of our devs have 10g NICs in their workstations, which connect to a 48-port 10g Netgear switch, and that switch has a LAG connection into the server rack.

In the rack we aim to have:

  • 4x compute hypervisors
  • 3x Ceph OSD hosts

Each of those hosts should have four 10g ports, probably in the form of two dual-port NICs. That allows each host to have two bonds in mode 1 (active-backup), each with one port from each NIC, giving every bond redundancy across not just two switches but two NICs. For the Ceph nodes, one bond would carry public traffic and the other cluster traffic. For the hypervisors, one bond would carry Ceph public traffic and the other the VMs' public traffic. A rough sketch of what I mean is below.
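On an Ubuntu host with netplan, that layout could look roughly like this. The interface names and addresses are placeholders I made up for illustration (mode 1 is what Linux bonding calls active-backup):

    network:
      version: 2
      renderer: networkd
      ethernets:
        enp1s0f0: {}   # NIC 1, port 1
        enp1s0f1: {}   # NIC 1, port 2
        enp2s0f0: {}   # NIC 2, port 1
        enp2s0f1: {}   # NIC 2, port 2
      bonds:
        bond-public:                          # one port from each NIC
          interfaces: [enp1s0f0, enp2s0f0]
          addresses: [10.0.10.11/24]          # made-up public subnet
          parameters:
            mode: active-backup
            mii-monitor-interval: 100
        bond-cluster:                         # one port from each NIC
          interfaces: [enp1s0f1, enp2s0f1]
          addresses: [10.0.20.11/24]          # made-up cluster subnet
          parameters:
            mode: active-backup
            mii-monitor-interval: 100

With each bond's two members plugged into different switches, losing a switch or a whole NIC only drops one leg of each bond instead of taking the host off the network.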

With that in mind we then want two 10g switches. With each host, Ceph or hypervisor, using 2 ports on each switch, that's 14 ports per switch just for the hosts. I assume we want a LAG connection from each of those switches up to a switch above them, and also a LAG connection between the two switches. The switch at the top would be where the LAG from the Netgear terminates and where our pfSense box plugs in. I am less worried about the workstations or WAN going down or losing connection to the rack. The most important thing is the hypervisors' connection to Ceph and Ceph's internal cluster network.
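On the Ceph side, splitting public and cluster traffic across the two bonds should just be a matter of pointing ceph.conf at the two subnets. Assuming the made-up subnets from the bond sketch above:

    [global]
    public_network  = 10.0.10.0/24   # hypervisors and clients reach the MONs and OSDs here
    cluster_network = 10.0.20.0/24   # OSD replication and heartbeats stay on this network

That keeps the replication traffic between the OSD hosts off the bond the hypervisors depend on.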

So that's the general idea in my head. I would love any thoughts people have on that, along with any suggestions on specific switches to use. Most of my experience is with Ubiquiti gear, either UniFi or EdgeRouter. I love the idea of being able to centrally manage the devices, but their biggest 10g switch is only 16 ports. :(

Any thoughts or suggestions? Thanks in advance!


