This is episode 2 of the VPN bonding series. In the last episode we doubled our internet speed by using two VPN tunnels and bonding two OpenVPN tap interfaces together. This can be used to increase internet bandwidth if you live in a rural area where no high-speed internet is available. The Linux bash scripts for this are in my GitHub repository. In this second episode of the Linux VPN bonding series we will take a closer technical look at the scripts and the principles that make this possible.
The Linux bash scripts that we use to achieve VPN bonding are heavily inspired by an article on serverfault.com. During my research for this video series I realized that channel bonding and increasing internet speed is a big concern in the community. Many people are asking questions about this in the various forums.
However, I could not find many reliable answers that actually solve the problem. Bonding OSI layer 2 devices such as Ethernet devices is nothing new in Linux. In the enterprise this is a quite common way to increase bandwidth, for example to a heavily used file, mail or database server. Sometimes in larger companies the available 1 Gbit/s is just not enough and a 10 Gbit/s infrastructure is not available.
OpenVPN can use different interface types for VPN tunnels. TAP devices are quite similar to Ethernet devices and act on OSI layer 2. To switch from tun to tap in OpenVPN, all you have to change is one single line in the config file of the connection: the dev directive, set to tapX or tunX. When creating the interface with the mktun option you can specify a device name. If the device name starts with tap or tun, then OpenVPN will choose the right device type for you. If you give the devices a free name, then you need to add the dev-type directive.
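The naming rule can be sketched as a small shell helper. This is only an illustration and not taken from the scripts in the repository: the function name dev_lines and the device name vpnlink1 are made up, and the helper simply prints the config lines one would need.

```shell
#!/bin/sh
# Print the OpenVPN config lines needed for a given device name.
# Usage: dev_lines NAME [TYPE]   (TYPE defaults to tap)
dev_lines() {
    devname=$1
    devtype=${2:-tap}
    echo "dev $devname"
    case "$devname" in
        tun*|tap*) ;;                       # type is implied by the name
        *) echo "dev-type $devtype" ;;      # free name: state the type explicitly
    esac
}

dev_lines tap1            # name starts with tap, no dev-type line needed
dev_lines vpnlink1 tap    # hypothetical free name, dev-type is added
```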
One catch-22 in the bonding configuration is the fact that you cannot bond an interface when it is already up. Therefore we first need to create the tap device using the mktun option, then bond all devices, and only then can we actually start the VPN tunnel. So, step by step: first load the bonding module, then create the bonding interface, then give it an IP address, and then create the tap device for each tunnel and set the bonding interface as its master. Looking at the interface configuration of our system in the Linux shell, we now see the bond interface and the tap1, tap2 and so on interfaces. Please note that the tap interfaces have no IP address. They will not need one, because they work on OSI layer 2 and IP addresses are layer 3.
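The order of these steps can be sketched as follows. This is a dry run that only prints the commands, not the script from the repository. The interface name bond0, the address 10.8.0.253/24, the balance-rr mode and the tunnel count are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# To really execute them, run as root with RUN set to an empty string.
RUN=${RUN-echo}

BOND=bond0                # assumed name of the bond interface
BOND_ADDR=10.8.0.253/24   # assumed address, pick your own
TUNNELS=2                 # assumed number of tunnels

setup_bond() {
    $RUN modprobe bonding                                # load the bonding module
    $RUN ip link add "$BOND" type bond mode balance-rr   # mode is an assumption
    $RUN ip addr add "$BOND_ADDR" dev "$BOND"            # only the bond gets an IP
    i=1
    while [ "$i" -le "$TUNNELS" ]; do
        $RUN openvpn --mktun --dev "tap$i"               # create the tap while nothing is up
        $RUN ip link set "tap$i" master "$BOND"          # enslave it to the bond
        i=$((i + 1))
    done
    $RUN ip link set "$BOND" up                          # bring up only after enslaving
}

setup_bond
```

Keeping the commands behind a RUN variable makes it possible to read through the sequence on any machine before touching real interfaces.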
That means only the bond interface will actually get an IP address. Once we have created all tap devices and added them to the bond, we can bring the bond device up. If we do this on both sides, we can already establish a bonded tunnel. At this point, though, all tunnels go over the same interface on the client side, and remember that what we actually want is to aggregate multiple interfaces. By default OpenVPN starts in nobind mode, which means that from a client perspective it would just use the default gateway to get out to the internet. If we want to instruct OpenVPN to use different interfaces, we face two challenges.
That one actually cost me a couple of hours to figure out. First we need to make sure that each tap device binds to the right physical device. In my example I want tap1 to bind to one physical interface and tap2 to bind to another. I have no way of specifying a physical interface anywhere in OpenVPN, but I can define a local IP address. And as this local IP address is bound to a specific interface, this is nearly as good as specifying the interface directly. But this brings us to the second challenge. Even though the tap interface is linked to an IP address on the physical interface, this does not mean that Linux will systematically try to reach the internet through that interface. It would instead go through the default route, or rather use the default gateway.
If we look at the available routes on our system, we see that Linux has automatically added two default routes, one for each interface, but it has assigned different costs or metrics to them. Linux will first try the interface with the lower metric before it goes through the interface with the higher one. This can be useful for resiliency or failover configurations, let’s say to use a 3G connection when your DSL is down. But we want to load balance over both interfaces, so how can we achieve this? The solution is called routing tables and rules.
In fact we can tell the operating system to apply different rules for different interfaces. The client installation script has created these rules in the rt_tables file but has left them commented out, because I did not want to have any interference with the default routes as long as the bond is not up. The start bond script removes that comment using this sed command line. In order to have 2 or 3 or 4 similar configurations I am using template files. If you look at the templates you can find values that start with the “at” sign. My script replaces these with the corresponding values for each tunnel. This is again done by a couple of sed commands in the script.
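Both sed steps can be sketched on scratch files. This is a minimal illustration under assumptions, not the code from the repository: the table numbers 11 and 12, the placeholder names @tunnelid, @localip, @server and @port, the host vpn.example.org and the port are all hypothetical, and the sketch writes to a temporary directory instead of editing the real rt_tables file.

```shell
#!/bin/sh
# Work on scratch copies so nothing on the real system is touched.
workdir=$(mktemp -d)

# 1) Commented-out table entries, as they might look in rt_tables.
printf '%s\n' '255 local' '#11 vpn1' '#12 vpn2' > "$workdir/rt_tables"
# Remove the leading "#" only from the vpnN lines.
sed 's/^#\([0-9][0-9]* vpn[0-9][0-9]*\)$/\1/' \
    "$workdir/rt_tables" > "$workdir/rt_tables.new"

# 2) A config template with hypothetical @placeholders.
cat > "$workdir/client.conf.template" <<'EOF'
dev tap@tunnelid
local @localip
remote @server @port
EOF

# Usage: fill_template FILE TUNNELID LOCALIP SERVER PORT
fill_template() {
    sed -e "s|@tunnelid|$2|g" -e "s|@localip|$3|g" \
        -e "s|@server|$4|g" -e "s|@port|$5|g" "$1"
}

cat "$workdir/rt_tables.new"
fill_template "$workdir/client.conf.template" 1 192.168.1.10 vpn.example.org 1191
```

The local line is where the tunnel gets tied to the IP address of one physical interface, which is why the template has to be filled at run time.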
Talking about sed for replacing text on Linux, there are a couple of lines we might want to have a closer look at. In order to read out properties of the interfaces, I feed the result of the ip command into an array using the readarray command. Let’s have a look at the line where I want to find out the IP address of a physical interface. I use ip -br addr, then I grep for the interface name in order to get only the line with the interface I am looking for, and then I run sed with this expression in order to replace multiple spaces with just one. That is important for the readarray command, which can then fill the tumpline variable.
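Here is a small sketch of that pipeline. It works on a canned sample line rather than on live ip output, so the values are invented, and the names parse_brief_line and fields are made up for this example. In a real script the input would come from ip -br addr piped through grep for the interface name.

```shell
#!/usr/bin/env bash
# Needs bash 4.4 or newer for "readarray -d".
# A line in the shape printed by "ip -br addr" (the values are invented).
sample='eth0             UP             192.168.1.10/24 fe80::1/64'

parse_brief_line() {
    local squeezed
    # Collapse every run of spaces into a single space ...
    squeezed=$(printf '%s\n' "$1" | sed 's/  */ /g')
    # ... so that splitting on one space gives exactly one field per element.
    readarray -td ' ' fields < <(printf '%s' "$squeezed")
}

parse_brief_line "$sample"
echo "name=${fields[0]} state=${fields[1]} ipv4=${fields[2]}"
```

Without the sed step, every extra space would produce an empty array element and the address would no longer sit at a predictable index.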
The first element of tumpline contains the interface name, the second the state and the third the IPv4 address. Just a quick remark at this point: the Linux bash scripts are made for IPv4 only at the moment. If you only have an IPv6 address, they won’t work. I might change this in a future version. Now the IPv4 address also carries the prefix length, that is /8 for a class A, /16 for a class B or /24 for a class C network, the last of which corresponds to subnet mask 255.255.255.0. Proper subnetting is an art of its own. The line I wanted to draw your attention to is where I remove the slash and everything after it from the IP address. Usually you use slashes as the separator in a sed expression, but you do not have to.
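A minimal sketch of that substitution, with the function name strip_prefix made up for the example:

```shell
#!/bin/sh
# Strip the "/prefix" part from an address in CIDR notation.
# "@" is used as the sed separator, so the slash needs no escaping.
strip_prefix() {
    printf '%s\n' "$1" | sed 's@/.*@@'
}

strip_prefix 192.168.1.10/24
```

The same result can be had without sed through the shell's own parameter expansion, for example ${addr%%/*}, but the sed form matches the approach described here.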
As I am replacing a slash I am just using the “at” sign as a separator. So in fact I am replacing slash something with nothing. I had to look this one up, I didn’t know that I could just use a different character here. So in a nutshell I go through the four template files, fill them with the right values and then call openvpn with that config file. Basically I am doing this at run time because your interface might use DHCP and possibly get a different IP address each time. But let’s get back to our IP rules and routes. Look at the ip rules here. We have standard rules that should be available on any system, namely local, main and default. And you can see here, that they apply to all interfaces. Now we can add rules for specific destinations or for specific sources.
What we do here is add a rule for each of the IP addresses that we bind our tap devices to, and tell Linux to look up a specific table for this rule. We can look at the routes in that table just by typing ip route list table and then the table name. I have named the tables vpn1, vpn2 and so on. We only need one single route per table, and that just tells ip route which gateway to use on this interface, so basically we just tell it the next hop. That’s it. Let’s do this by hand for two interfaces, specify two different routes and then check with traceroute to the same server which route the packets will use. As you can see here, the route to Google’s nameserver 8.8.8.8 is different depending on the source IP.
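Done by hand, this could look roughly like the following dry run, which only prints the commands. The table names vpn1 and vpn2 are the ones used above and must exist in rt_tables. The addresses, gateways and the interface name eth1 are assumptions for illustration, as is the helper name add_source_route.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# To really execute them, run as root with RUN set to an empty string.
RUN=${RUN-echo}

# Usage: add_source_route TABLE SOURCE_IP GATEWAY DEVICE
add_source_route() {
    # Traffic sent from SOURCE_IP consults TABLE instead of the main table.
    $RUN ip rule add from "$2" table "$1"
    # TABLE holds a single route: the next hop for that interface.
    $RUN ip route add default via "$3" dev "$4" table "$1"
}

add_source_route vpn1 192.168.1.10 192.168.1.1 eth0
add_source_route vpn2 192.168.8.100 192.168.8.1 eth1

# Inspect one table, then compare the path taken per source address.
$RUN ip route list table vpn1
$RUN traceroute -s 192.168.1.10 8.8.8.8
$RUN traceroute -s 192.168.8.100 8.8.8.8
```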
Routing with a source IP of the other interface goes out over that interface, while specifying a source IP on the eth0 interface goes over my normal LAN connection. Last but not least, let’s have a look at the stopbond script. All it does is bring the bond interface down, remove it from the system, kill all instances of OpenVPN and then delete each route, rule and tap interface. From the comments on the first video I can see a strong interest in VPN performance, and especially in using WireGuard for this. Guys, at the moment I am not very clear on how to achieve the bonding with WireGuard other than, for example, using a GRETAP device over WireGuard, because WireGuard works on OSI layer 3.
So in order to increase performance I thought: hey, why not just remove the encryption from the VPN? Now before you tell me that I am mad, let’s take a step back and ask why we are using a VPN at all for this. We are not using a VPN for the sake of privacy or for the sake of using a VPN as such. We just do it because we need to. In other words, our main concern here is speed, not encryption.
And I am not saying that we remove the authentication part. That remains. We just remove the encryption. Also let’s keep in mind that a malicious man in the middle would need to read all data streams in order to read the whole traffic. But in our case they go over many different routes. Let’s do some tests here. Let’s open a VPN connection from this router to another router and watch the CPU utilization as we put some load on it.
First with encryption, here we go. Then without encryption, here we go. You can clearly see the difference. Perfect. Let me repeat my call to action from the last episode here. I do need your feedback. Where should we take this next? Having had a look at your various comments and feedback and at the issues you and I have run into, I would suggest the following: first, we will not evaluate WireGuard for the time being, but rather make OpenVPN encryption optional.
Second, I will work on a script version for OpenWRT, meaning that you could run this transparently on an OpenWRT router. Third, I will add options for failover and resiliency by doing a couple of things: making the balance strategy an option, and running a watchdog that checks the latency and availability of the lines and dynamically removes interfaces from or adds them to the bond.