Found at: gopher.blog.benjojo.co.uk:70/eve-online-bgp-internet
What would an EvE online Internet look like?
===
Translations are available in: [русский](https://habr.com/ru/company/edison/blog/439298/)
EvE online is a fascinating game. It's one of the few
MMOs that have only a single "server" to log into, meaning
everyone is playing in the same logical world.
It has also seen a fascinating set of events happen inside
it, and it remains a very visually appealing game for what it is:
It is also home to an expansive world map to hold all of
these players.
At its peak EvE had 63,000 players online in a single
world, with 500,000 paying subscriptions on top, and while that
number is getting lower by the year, the world remains infamously
large, meaning that getting from one side to the other takes a
sizable amount of time (and carries risk, due to player-owned factions).
You travel to different areas by warping (within the
same system) or by jumping to different systems using a jump gate:
an EvE jump gate
And all of these systems combine to produce a map
of beauty and complexity:
eve online map
I always viewed this map as a network: it's a large
mesh of systems that connect to each other so that people can
get across, with most systems having more than two jump gates.
This led me to wonder: what would happen if you took the idea
of the map being a network literally? What would an EvE online
internet of systems look like?
For this we need to understand how the real Internet
works. The internet is a large collection of ISPs, each
identified by a standardised, unique number called an
Autonomous System Number, or ASN (AS for short).
These ASes need a way to exchange routes with each other,
since they own ranges of IP addresses and need a way to
tell other ISPs that their routers can route to those IP addresses.
For this, the world has settled on the Border Gateway Protocol, or
BGP.
BGP works by "announcing" routes to another AS (known as a
peer):
two bgp peers announcing to each other
The default behavior of BGP, when it receives a route from
a peer, is to pass that route on to all of the other peers it is
connected to. This means that peers automatically share their
view of the routing table with each other.
a bgp network sharing all their routes with each other
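To make that concrete, here is roughly what one such session looks
like in BIRD 2 (the BGP daemon used later in this post). This is a
minimal sketch with made-up ASNs and addresses; `import all; export all`
is what gives the "pass everything on" behavior described above:
```
# a minimal BIRD 2 BGP session (illustrative ASN and address)
protocol bgp peer1 {
	local as 64496;
	neighbor 2001:db8::1 as 64497;
	ipv6 {
		import all;   # accept every route the peer announces
		export all;   # re-announce everything we know back out
	};
}
```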
However, this default behavior is only really useful if you are
using BGP to feed routes to internal routers, since on the
modern internet networks have different logical (and commercial)
relationships with each other.
Instead of a mesh, the modern internet looks more like this:
network political layout of the modern internet
However, EvE online is set in the future. Who knows if the
internet of that era still relies on this for-profit routing
layout? Let's pretend that it doesn't, so we can see how BGP
scales in larger networks.
For this we will need to simulate real BGP router behavior
and connections. Given that EvE has a relatively low ~8,000
systems, and a reasonable ~13.8k links between them, I figured it
was not impossible to actually run ~8,000 virtual machines with
real BGP and networking on them, to find out how these systems
would look when acting together as a network.
eve online map to routers
However, we do not have unlimited resources, so we will
have to find a way to make the smallest Linux image possible
in both disk and memory consumption. For this I looked towards
the embedded world, since embedded systems often have to run in
very low resource environments. I stumbled upon Buildroot, and
within a few hours had a small Linux image with just the things
I needed to make this project work.
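For reference, the Buildroot flow looks something like the sketch
below; the exact package selection was done by hand in menuconfig, so
treat this as an outline rather than the exact commands I ran:
```
# rough Buildroot outline: start from the qemu x86 defconfig,
# then enable bird, tcpdump and mtr and strip everything else out
git clone https://github.com/buildroot/buildroot && cd buildroot
make qemu_x86_defconfig
make menuconfig   # enable bird, tcpdump and mtr; trim the rest
make              # images land in output/images/
```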
```
$ ls -alh
total 17M
drwxrwxr-x 2 ben ben 4.0K Jan 22 22:46 .
drwxrwxr-x 6 ben ben 4.0K Jan 22 22:45 ..
-rw-r--r-- 1 ben ben 7.0M Jan 22 22:46 bzImage
-rw-r--r-- 1 ben ben 10M Jan 22 22:46 rootfs.ext2
```
This image contains a bootable Linux setup that also has:
* the BIRD 2 BGP daemon
* tcpdump and My Traceroute (mtr) for network debugging
* busybox for a basic shell and system utilities
This image can easily be run in qemu with a reasonably
small set of options:
```
qemu-system-i386 -kernel ../bzImage \
	-hda rootfs.ext2 \
	-hdb fat:./30045343/ \
	-append "root=/dev/sda rw" \
	-m 64
```
For networking I chose to use a feature of qemu that was
undocumented in my version, where you can point two qemu processes
at each other and use UDP sockets to transfer data between
them. This is handy as we are planning to provision a lot of
links, so the normal method of TUN/TAP adapters would get
messy fast.
Since this feature is somewhat undocumented, there were some
frustrations getting it working. It took me a _significant_ amount of
time to discover that the net name on the command line needs
to be the same for both sides of the connection. At a later
glance it appears that this option is now better documented;
however, as with all things, such changes take a while to propagate
to downstream distributions.
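Wiring two VMs together ends up looking roughly like this (a sketch
with illustrative port numbers and MAC addresses, with the rest of
the boot options from earlier elided): each side listens on its
`localaddr` port and sends to the other side's, and note the netdev
name matching on both sides:
```
# VM A: listen on UDP 20001, send to VM B on 20002
qemu-system-i386 ... \
	-netdev socket,id=n0,udp=127.0.0.1:20002,localaddr=127.0.0.1:20001 \
	-device e1000,netdev=n0,mac=52:54:00:00:00:01

# VM B: listen on UDP 20002, send to VM A on 20001
qemu-system-i386 ... \
	-netdev socket,id=n0,udp=127.0.0.1:20001,localaddr=127.0.0.1:20002 \
	-device e1000,netdev=n0,mac=52:54:00:00:00:02
```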
Once that was working, we had a pair of VMs that could
send packets between them, with the hypervisor carrying them as
UDP datagrams. Since we will be running a lot of these
systems, we need a fast way to configure them all using
pre-generated configuration. For this we can use a handy feature of qemu
that allows you to take a directory on the hypervisor and turn
it into a virtual FAT32 filesystem. This is useful since it
allows us to make a directory per system we plan to run, and
have each qemu process point at its own directory, meaning we can
use the same boot image for all VMs in the cluster.
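On the hypervisor you then end up with something like the hypothetical
layout below, one directory per EvE system (the `bird.conf` name here
is an assumption for illustration), each passed to its VM via
`-hdb fat:...` as in the earlier command:
```
$ ls ./configs/
30000001/  30000002/  30000003/  ...
$ ls ./configs/30045343/
bird.conf
```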
Since each system has 64MB of RAM, and we are planning to
run ~8,000 VMs, we will clearly need a decent amount of RAM
(~8,000 × 64MB works out to roughly 512GB).
For this I used 3 of packet.net's `m2.xlarge.x86` instances,
since they offer 256GB of RAM with 2× Xeon Gold 5120s, meaning
they have a decent amount of grunt to them.
m2.xlarge.x86
I used another open source project to produce the EvE map
in JSON form, and then had a program of my own produce
configuration from that data. After a few test runs with just a
handful of systems, I proved that they could take configuration from
the VFAT drive and establish BGP sessions with each other over the UDP links.
test systems running and communicating
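To give an idea of the shape of the generated output, each system's
BIRD config looked something along these lines (a reconstruction with
illustrative values; each system originates its own IPv6 prefix and
peers with its jump-gate neighbors):
```
# sketch of a generated per-system bird.conf (values illustrative)
router id 0.0.11.79;

protocol device { }

protocol static originate {
	ipv6;
	route 2a07:1500:b4f::/48 unreachable;   # this system's own prefix
}

protocol bgp session0 {
	local as 2895;
	neighbor 2a07:1500:d45::2215 as 2892;   # an adjacent system
	ipv6 { import all; export all; };
}
```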
So I then took the plunge and booted the universe:
load average
At first I attempted to launch all systems in one big
event; unfortunately this caused a _big bang_ in terms of
system load, so after that I switched to launching a system
every 2.5 seconds, and the 48 cores on each hypervisor ended
up taking care of that nicely.
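The pacing itself is trivial; conceptually it is just a loop like
this, with `start-vm.sh` standing in for whatever actually assembles
the qemu command line:
```
# hypothetical pacing loop: boot one system every 2.5 seconds
for sys in ./configs/*; do
	./start-vm.sh "$sys" &   # one qemu process per system directory
	sleep 2.5
done
```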
htop over the 3 hypervisors
During the booting process I found you would see large
"explosions" of CPU usage across all of the VMs. I later figured out
that this was large sections of the universe getting connected to
each other, causing large amounts of BGP traffic on both
sides of the newly connected VMs.
lots of CPU usage
```
root@evehyper1:~/147.75.81.189# ifstat -i bond0,lo
bond0 lo
KB/s in KB/s out KB/s in KB/s out
690.46 157.37 11568.95 11568.95
352.62 392.74 20413.64 20413.64
468.95 424.58 21983.50 21983.50
```
In the end we got to see some pretty awesome BGP paths.
Since every system is announcing a /48 of IPv6 addresses, you
could see the route to every system, and all of the other
systems it would have to route through to get there.
```
$ birdc6 s ro all
2a07:1500:b4f::/48   unicast [session0 18:13:15.471] * (100) [AS2895i]
	via 2a07:1500:d45::2215 on eth0
	Type: BGP univ
	BGP.origin: IGP
	BGP.as_path: 3397 3396 3394 3385 3386 3387 2049 2051 2721 2720
		2719 2692 2645 2644 2643 145 144 146 2755 1381 1385 1386 1446
		1448 849 847 862 867 863 854 861 859 1262 1263 1264 1266 1267
		2890 2892 2895
	BGP.next_hop: 2a07:1500:d45::2215 fe80::5054:ff:fe6e:5068
	BGP.local_pref: 100
```
I took a snapshot of the routing table on each router in
the universe and then graphed out the most heavily used systems
for getting to other systems. However, the full image is huge, so
here is a small version of it in this post. *If you click on the
image, be aware that the full version will likely make your device
run out of memory.*
eve online bgp map
After this I got thinking: what else could you map out as a
BGP routed network? Could you use a smaller model like this to
test how routing config behaves in large scale networks? I
produced a file that mapped out the London underground system to
test this out:
TFL map
The TFL system is much smaller, and has a lot more hops
with only one direction to go in, since most stations sit on
a single "line" of transport. However, we can learn one
thing from it: we can use it to harmlessly play with BGP
MEDs.
There is, however, a problem when we view the TFL map as a
BGP network: in the real world the time/latency between stops
is not the same, so if we were to actually travel the routes it
picks we would not be moving around the system as fast as we
could be, since by default BGP only optimises for the fewest
stations (hops) to pass through.
TFL BGP map
However, thanks to a Freedom of Information request that was
sent to TFL, we have the time it takes to get from one station
to another. These times were baked into the generated BGP
configuration of the routers; for example (a trimmed, illustrative
reconstruction; note that comparing MEDs between routes from different
neighboring ASes additionally needs BIRD's `med metric` option):
```
protocol bgp session1 {
	local as 65040;                     # ASNs and addresses illustrative
	neighbor fe80::40:1%eth0 as 65041;  # this neighbor is 1.60 mins away
	ipv6 { import filter { bgp_med = bgp_med + 160; accept; }; export all; };
}
protocol bgp session2 {
	local as 65040;
	neighbor fe80::40:2%eth1 as 65042;  # this neighbor is 4.86 mins away
	ipv6 { import filter { bgp_med = bgp_med + 486; accept; }; export all; };
}
```
In `session1` the time to the neighboring station is 1.6
mins; the other path out of that station is 4.86 mins away.
These numbers are added onto the route's MED at each
station/router it passes through, meaning every router/station on the
network knows its travel time to every station through every
route:
birdc output on TFL BGP, showing MEDs
This means that traceroutes become accurate views of how you
can move around London; for example, from my local station to
Paddington:
a mtr traceroute from my home station to paddington
We can also have some fun with BGP by simulating some
maintenance or an incident at Waterloo station:
shutting down waterloo
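Under the hood this is just telling BIRD on the Waterloo router to
shut down its sessions, something like the following (session names
illustrative):
```
# on the "Waterloo" router: disable its BGP sessions so that
# its routes are withdrawn from the rest of the network
birdc disable session1
birdc disable session2
```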
The entire network then instantly picks the next
_fastest_ route, rather than the one with the fewest stations to
pass through.
new mtr without passing though waterloo
And that's the magic of BGP MEDs in routing!
---
The code for all of this is now open. You can build your
own network structures with a reasonably simple JSON schema, or
just use the EvE online or TFL data, since both are already in the
repo. You can find all of the code for this here:
https://github.com/benjojo/eve-online-bgp-backbone
If this is the sort of madness you enjoy, you will enjoy
the last half year of posts, where I've been doing various other
things like playing battleships with BGP or testing if RPKI works.
To keep up to date with my posting you should use the blog's RSS
feed, or follow me on twitter.
I would also like to thank Basil Fillan (Website/Twitter)
for helping out with the BIRD2 iBGP configuration in this
post.
It is worth mentioning that I am looking for an EU based
job sometime in April 2019, so if you are (or know of) a company
doing interesting things that align with some of the sensible or
non-sensible things I do, please let me know!