Found at: gopher.blog.benjojo.co.uk:70/multipath-without-mptcp

 Going multipath without Multipath TCP
 ===

 Gigabit ethernet has been around for a long time. It's so
ubiquitous that there is a very strong chance that if you have an
RJ-45 port on your computer, it's going to be a gigabit ethernet
network interface.
 Even if you look at computers that are over 20 years old,
the only thing that stands out on their spec sheets as still
being current is gigabit ethernet.

 old mac pro

 However, gigabit ethernet (1GBE) is sometimes just not
enough these days. Gigabit residential internet access is becoming
more and more common. While most of those consumers are
using WiFi and are unlikely to ever get the gigabit performance
they were expecting, a wired connection could easily max
out both the internet and LAN link.

 10 gigabit ethernet (10GBE) is creeping into the consumer
market slowly. Notably, Apple products have been shipping with 10
gigabit ethernet for a while now, but it's still rare to find
people with 10 gigabit ethernet switches.

 I suspect most of the reason consumer ethernet speeds
have not improved in the last 20 years is that, for the most
part, it's already fast enough. However, things like storage have
sped up remarkably, with not only faster disks (the average
SATA hard disk will run at around 2.5 gigabits) but also flash
storage giving us single drive speeds of 6 gigabits, and more if
you look at NVMe (mine can read at nearly 20 gigabits). But
when you look at what we can send to other computers (think
remote storage like NASes), we can quickly get limited by
1GBE.
 Servers are a different story. While there are still a
considerable number of servers running on 1GBE, 10GBE and 25GBE
(or faster!) are used for anything bandwidth intensive, since
the network can quickly become the major bottleneck when faced
with large compute power and storage capacity.
 ## Enter Link Aggregation
 However single 10GBE and 25GBE links are not always what
you want. What if you want more bandwidth? What if the switch
you are attached to needs a software upgrade or crashes? For
this reason, a lot of the NICs for servers have two physical
ports that are connected to the same switch (for a bandwidth
increase) or different switches (for redundancy/failover).

 But how do systems make use of multiple ethernet links?
They program their switches and the servers to use link
aggregation (LAG). Link aggregation means that both links share a
single IP and MAC address, and requires the switch and system (at
least in the commonly deployed 802.3ad standard) to change their
behaviour. Link aggregation goes by different names depending on the
vendor you are configuring it on. Names include Aggregate
Interfaces, NIC Teaming, Port Channels, and Bonds, the last one
causing amusing blog titles from time to time (A).

 However, link aggregation does not directly cause performance
increases for single connections. This is because the OS and network
layer typically direct a connection down a single ethernet link
at a time, since TCP and other protocols that do congestion
control could become confused when they are presented with
inconsistent performance feedback (as different links have different
capacities and latencies). While 2x10GBE may mean that you have 20
gigabits of bandwidth available to you, a single TCP connection will
only run at 10 gigabits due to the connection directing/hashing
logic.
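 The connection directing logic can be sketched roughly like this.
This is a simplified illustration, not the hash any particular
switch or OS uses: the `flow` type, the `pickLink` function and the
choice of FNV as the hash are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// flow identifies one TCP connection by its 4-tuple.
type flow struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// pickLink is a simplified stand-in for a LAG transmit hash: it hashes the
// 4-tuple and maps the result onto one of numLinks member links.
func pickLink(f flow, numLinks int) int {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s|%s|%d|%d", f.srcIP, f.dstIP, f.srcPort, f.dstPort)
	return int(h.Sum32() % uint32(numLinks))
}

func main() {
	backup := flow{"192.0.2.10", "192.0.2.20", 50000, 2222}

	// Every packet of the same connection hashes to the same link...
	for i := 0; i < 3; i++ {
		fmt.Println("packet", i, "-> link", pickLink(backup, 2))
	}

	// ...so only additional connections (different ports) can land elsewhere.
	for port := uint16(50001); port < 50005; port++ {
		f := backup
		f.srcPort = port
		fmt.Println("src port", port, "-> link", pickLink(f, 2))
	}
}
```

 Because the input to the hash never changes for the lifetime of
a connection, neither does the chosen link, which is why one TCP
stream tops out at the speed of one member link.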
 ## Need for (single connection) speed

 LTO 6 drive

 In a previous post I talked about LTO tape backups and
how the drives themselves can read/write faster than a standard
gigabit ethernet link, and that 10 gigabit networking is recommended
if you are streaming data to a tape over the network to
avoid issues.
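 As a quick back of the envelope check, here is the arithmetic
behind that claim. It assumes LTO-6's published native
(uncompressed) rate of 160 MB/s and ignores ethernet and TCP
overhead, so the real gap is a little wider. The helper name is
made up for the example.

```go
package main

import "fmt"

// mbPerSecToMbit converts megabytes per second to megabits per second.
func mbPerSecToMbit(mbPerSec float64) float64 { return mbPerSec * 8 }

func main() {
	const lto6Native = 160.0 // MB/s, LTO-6 native (uncompressed) rate
	const gigE = 1000.0      // Mbit/s, line rate before protocol overhead

	need := mbPerSecToMbit(lto6Native)
	fmt.Printf("drive wants %.0f Mbit/s, one 1GBE link offers at most %.0f Mbit/s\n", need, gigE)
	fmt.Printf("1GBE links needed: %.2f\n", need/gigE)
}
```

 At 1280 Mbit/s the drive needs a bit more than one gigabit link
can deliver, but comfortably less than two.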

 Sadly, in my case the SAS card required for running the
tape drive consumed the last PCIe slot in the machine, the slot
that could have held a 10GBE network card, even though the
systems feeding the PC hosting the tape drive were on 10GBE. This
meant that my tape backups were far slower (and stalling on
mbuffer) than they needed to be.
 But what I did have was USB 3.0 ports, and where there
are USB 3 ports there is the option to use USB gigabit
ethernet dongles.

 The issue, however, is that while I could have put
the motherboard NIC and the USB NIC into a LAG, that would not
have improved the speed of my single TCP connection feeding the
tape drive (via mbuffer).
 I really needed a way to combine the throughput of both
NICs into one 2 gigabit/s stream.
 ## Going multipath _with_ MPTCP

 MPTCP Wireshark

 Multipath TCP (MPTCP) or RFC8684 is an extension that
allows a single TCP socket to span across multiple IP addresses
and network interfaces.

 This extension is currently used sparsely, with the only
two commonly deployed uses being the OpenMPTCProuter project and
Apple's Siri.
 OpenMPTCProuter uses MPTCP to proxy/tunnel connections for
better throughput, allowing you to chain multiple residential
connections into one faster link to a proxy server, while Siri uses
it to handle rapid failover between WiFi and cellular to
ensure the best experience when using the voice assistant.
 MPTCP was merged into the Linux kernel in 5.6. However, I
do not know of any mainstream distributions that have it
present and enabled. Ubuntu 20.10 ships with MPTCP on its 5.13
kernel, while Debian Bullseye uses 5.10 but has MPTCP disabled.
 ```
 $ cat /boot/config-5.10.0-11-amd64 | grep MPTC
 # CONFIG_MPTCP is not set
 ```
 Beyond having MPTCP support available in your OS, I
have found the MPTCP usability story… bad? It feels really bad
to say this about something, but my own research into MPTCP
has been maddening.
 
This post was written on Feb 24th
2022. The situation may have changed by the time you read
this.

 Since MPTCP started shipping in mainline Linux, the actual
project website itself appears not to have kept up. The docs on
the site sent me on wild goose chases, only to find that the
things written down are not supported anymore, or maybe I just
have not found any way to do the things they have documented.
I may just honestly be stupid with this, but I found the
best living documentation to be the kernel tests for MPTCP, since
they by definition have to reflect the current API for MPTCP.
 I think in general a lot of the pain that comes with
this is that MPTCP is designed to automatically detect and begin
multipathing traffic in a way where user space keeps its hands off
the details of the connection.
 Because of this, I could not find a way to tell the
kernel from user space (ignoring netlink etc) about multiple
endpoints for a host. For this reason I gave up my attempt to
write an MPTCP client tool.
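 To illustrate how hands-off the socket level interface is, here
is a minimal sketch using Go's standard library, which gained
`Dialer.SetMultipathTCP`, `ListenConfig.SetMultipathTCP` and
`TCPConn.MultipathTCP` in Go 1.21 (after this post was written).
The `startEcho` and `echoOnce` helpers are made up for the example.
Note that the only knob is "use MPTCP if you can": there is
nowhere to list extra endpoints, and Go quietly falls back to
plain TCP when the kernel lacks MPTCP support.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
)

// startEcho listens on loopback with MPTCP requested and echoes back the
// first chunk of data from a single connection.
func startEcho() (string, error) {
	var lc net.ListenConfig
	lc.SetMultipathTCP(true)
	ln, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go func() {
		defer ln.Close()
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		buf := make([]byte, 1024)
		n, _ := c.Read(buf)
		c.Write(buf[:n])
	}()
	return ln.Addr().String(), nil
}

// echoOnce dials addr asking for MPTCP, sends msg, and returns the echoed
// reply plus whether the connection actually ended up using MPTCP.
func echoOnce(addr, msg string) (reply string, mptcp bool, err error) {
	var d net.Dialer
	d.SetMultipathTCP(true) // a request, not a guarantee
	conn, err := d.Dial("tcp", addr)
	if err != nil {
		return "", false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", false, err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", false, err
	}
	if tc, ok := conn.(*net.TCPConn); ok {
		mptcp, _ = tc.MultipathTCP()
	}
	return string(buf), mptcp, nil
}

func main() {
	addr, err := startEcho()
	if err != nil {
		panic(err)
	}
	reply, mptcp, err := echoOnce(addr, "ping")
	if err != nil {
		panic(err)
	}
	fmt.Printf("reply=%q mptcp=%v\n", reply, mptcp)
}
```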
 ## DIY Multipath TCP

 Not satisfied with MPTCP, I figured that an entirely
userspace version of this concept is possible, and in fact someone I
was doing contract work with at the time had a library that
appeared to be able to "bond" together multiple connections. Upon
trying to get it working, however, I found that the library did
not achieve speed improvements, and failover behaviour was
unpredictable at best.
 So I worked on making sure it could, overhauling the
library and pushing fixes to it. Once it worked well enough that
I was making decisions that could break the library,
I made my own fork to contain the behaviour changes.

 The `multipath` Go library is actually quite an involved
bit of machinery, because while MPTCP has raw packet access to
do things with retries and subflow sorting, `multipath` does
not. It takes what is basically an array of `net.Conn`s and
teams them together for bandwidth and resilience. This means that
anything that conforms to `net.Conn` in Go and is an ordered
reliable stream (for example WebSockets, TLS connections, SCTP
in some modes, etc) can be used in this library, all
combined!
 Due to the usage of TCP-like sockets, the performance will
never be as good as if you wrote this using UDP yourself, but
given that is the way the multipath library started, I was
determined to keep it that way even after I forked it.
 Now that I had the library working though, I still needed
a tool that wrapped it for day to day use...
 ## Introducing bondcat

 bondcat mascot
 To wrap this all together I made a new utility:
`bondcat`.

 Bondcat has a user interface inspired by `ncat` but adds
the ability to connect to a host on multiple IP address/port
combos. This means that with some knowledge of address selection
(see below) you can easily, and in a cross platform way, beat
the limits of single gigabit ethernet speeds.

 I mention gigabit ethernet in particular because
I've not attempted to optimise the multipath library
for going faster than 10 gigabit/s. This is because at that
stage there are likely better options for moving very large
amounts of data over very fast LAG'd networks, for example
GridFTP.

 But since it acts like netcat/ncat, you can easily
wrap connections over it. For example, you could use OpenSSH's
ProxyCommand to obtain faster than gigabit SFTP/SCP transfers:
 ```
 [11:13:24] ben@metropolis:~$ cat .ssh/config
 Host tapedrive
   ControlMaster no
   ProxyCommand bondcat 192.168.XXX.XXX:2222 192.168.XXX.XXX:2222
 ```
 For the listening side, bondcat includes a `-relay` mode
that accepts connections and forwards the data to another TCP
endpoint. To make this ssh setup work, we can point the
relay mode at the local SSH server:
 ```
 $ bondcat -relay 127.0.0.1:22 -l -p 2222
 ```
 If that is not your usage style, you can always just use
it as a regular netcat to send stuff around:

 netcat bondcat
 Now there are some considerations to be made for bondcat.
Since MPTCP has a far better idea of your network stack than
bondcat ever can (after all, the library doing the magic just
sees a bunch of connections, nothing more), you need to be
careful about what addresses you select for your use case:
 ### Use Case: Going over the internet or trying to escape the limits of LAG bundles
 This use is simple: you just need to invoke it as you
would a normal netcat (single address) and use the `-multiplier`
flag for how many extra connections you want to start. This
flag also works with other addresses if you want to mix the
host's IPv4 and IPv6 addresses.
 ### Use Case: Faster LAN transfers
 I find it's easiest to target IPv6 addresses for LANs,
but that assumes the LAN you are on has Router
Advertisements. If it does, then it's the best option. If not, then
the local IPv4 address normally works fine.
 However, this will generally only work in the 10GBE->{n}x1GBE
setup. Bondcat by default tries to connect with every IP address
on the machine to aid automatic speed boosts. To help this
"automagically" work, it's best to start the connections from the machine
that has the most interfaces. This function can be disabled with
`-a` or `-no-auto-detect`.
 ### Use Case: Backup link failover
 Assuming the system you are on has two links, you will
need to manually add a route for one of the endpoints to go
over your backup link. Other than that, it should transparently
work.
 ---
 You can pick up a copy of bondcat on GitHub:
https://github.com/benjojo/bondcat
 I found it very useful for streaming backups to my tape
drive, and I'm sure the library itself (the forked version is in
the same code repo) would find uses outside of bondcat. However
I assume as MPTCP gets better and more supported (with any
luck) the tool will slowly become obsolete.

 If you want to stay up to date with the blog you can
use the RSS feed or you can follow me on Twitter.
 Also, I'm currently looking for work from March onwards. If
you like what I do or think that you could do with some of
my bizarre areas of knowledge, please contact me over at
workwith@benjojo.co.uk!
 Until next time!

