AWMN @ oZoNet

Policy routing

Posted by Utumno on Wed 29 Mar 2006 at 07:21

Here's a brief tutorial on how to connect a single server to 'the Internet' using multiple physical connections, and how to route various services over different interfaces using a mechanism called 'policy routing'.

The Situation
I've got a home-built machine running Debian sid. It serves as my desktop, and also runs my personal weblog, an SSH server and a small forum. Everything is connected to the Internet through a DSL line (PPPoE mode; for a description of all possible DSL modes see the DSL-HOWTO).

Link speed is 1 Mbps download / 64 Kbps upload. However, I can make 4 concurrent PPPoE connections, and each one achieves this speed.

The Problem
My connection speed is a bit too low to run the forum, use P2P and also comfortably connect through SSH from the office (evil grin). However, I've got 3 connections sitting unused. So, the idea is to combine all 4 together and either use some kind of load-balancing setup or route given services through separate interfaces.

The Effect
At the end of this article, we are going to arrive at a setup where the server is connected through two PPPoE connections. The only open ports on ppp0 are 80 and 22 (plus some ICMP), and on ppp1 tcp/udp 4662 (the Overnet server port) and tcp 4001 (the MLDonkey GUI port).

All packets sent by user 'mldonkey' are routed through ppp1, while all the rest is routed through ppp0. Using P2P no longer interferes with the rest of my networking activity.

The News
I am going to use just two connections, both of them through the same ISP and gateway. The good news is that this is by no means a limitation of Linux; in fact, one can set up n connections to n different providers at the same time. Furthermore, besides routing decisions based on the owner of the process that sends the packets (like in my case), one can route packets based on many other criteria, like the TOS field, destination/source IP or incoming interface. One can even achieve a load-balancing setup by randomizing the route.
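
To make these other criteria a bit more concrete, here is a small sketch of what such rules look like in iproute2 syntax. It assumes tables named PPP0 and PPP1 (created later, in Step 2); the networks, the priorities, the TOS value and the GW placeholder are made up for illustration. The multipath default route at the end is one common way to get simple load balancing, not necessarily the only one. The commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: print (not execute) example policy rules for other selectors.
# GW and the networks below are made-up placeholders.
GW=192.0.2.1

example_rules() {
    # route by source network
    echo "ip rule add from 10.0.0.0/24 pri 400 table PPP1"
    # route by destination network
    echo "ip rule add to 198.51.100.0/24 pri 410 table PPP0"
    # route by TOS field (0x10 = minimize delay)
    echo "ip rule add tos 0x10 pri 420 table PPP0"
    # route by incoming interface (matters for forwarded traffic)
    echo "ip rule add iif eth2 pri 430 table PPP1"
    # simple load balancing: one default route with two next hops
    echo "ip route add default scope global nexthop via $GW dev ppp0 weight 1 nexthop via $GW dev ppp1 weight 1"
}

example_rules
```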

Stuff we need
First of all, we need some modules present in the kernel:

IP_ADVANCED_ROUTER
IP_MULTIPLE_TABLES
IP_ROUTE_FWMARK

and possibly IP_ROUTE_MULTIPATH if you're aiming at a load-balanced setup. All of them can be found in Networking -> Networking Support -> Networking Options -> TCP/IP networking in 2.6.15 kernel configuration. The 2.6.8 shipped with Sarge contains all of those compiled as modules.
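
If you are not sure whether your kernel has these options, a quick check along the following lines may help. This is only a sketch: it assumes that the kernel configuration is available as a text file (on Debian usually /boot/config-<version>), and the function name missing_opts is my own. On kernels other than the ones discussed here, the list of option names may differ.

```shell
#!/bin/sh
# Sketch: list the kernel options from this article that are neither
# built in (=y) nor modular (=m) in a given kernel config file.
# Usage: missing_opts /boot/config-$(uname -r)
missing_opts() {
    config_file=$1
    for opt in IP_ADVANCED_ROUTER IP_MULTIPLE_TABLES IP_ROUTE_FWMARK; do
        if ! grep -Eq "^CONFIG_${opt}=(y|m)$" "$config_file"; then
            echo "$opt"
        fi
    done
}

# Only run the check when the config file for the running kernel exists.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    missing_opts "$cfg"
fi
```

No output means that all three options were found.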

We also need the excellent 'iproute2' (written by Alexey Kuznetsov) and 'iptables' userspace tools:
apt-get install iproute iptables


Step 1
First, we are going to bring up two concurrent PPPoE connections.

We will use two separate network cards on the server and an ADSL router with a 4-port switch (running in 'bridging' mode). We're going to use eth0 to make the PPPoE connection 'ppp0', and eth1 to make 'ppp1'. I am going to assume the server already has one working PPPoE connection, and that the connection was configured with Debian's standard utility 'pppoeconf'.
'pppoeconf' creates a configuration file in /etc/ppp/peers/, by default named 'dsl-provider'. Here are its contents (without the comments; I accepted all of pppoeconf's defaults):
noipdefault
usepeerdns
defaultroute
hide-password
lcp-echo-interval 20
lcp-echo-failure 3
connect /bin/true
noauth
persist
mtu 1492
noaccomp
default-asyncmap
plugin rp-pppoe.so eth0
user "your username here"
With this configuration, the connection can be brought up with the command
pon dsl-provider
So, in order to create a second connection,
1) connect eth1 to a port on your DSL router ( doh! )
2) create the file '/etc/ppp/peers/dsl-provider2', which looks like this:
noipdefault
usepeerdns
#defaultroute
hide-password
lcp-echo-interval 20
lcp-echo-failure 3
connect /bin/true
noauth
persist
mtu 1492
noaccomp
default-asyncmap
plugin rp-pppoe.so eth1
user "your username here"
i.e. the differences are:
- 'defaultroute' is commented out
- the second-to-last line tells the Roaring Penguin 'rp-pppoe.so' plugin to connect through eth1.
3) create the second connection with
pon dsl-provider2
4) make this setup permanent across reboots by adding
auto dsl-provider2
iface dsl-provider2 inet ppp
provider dsl-provider2
pre-up /sbin/ifconfig eth1 up # line maintained by pppoeconf

auto eth1
iface eth1 inet manual
to /etc/network/interfaces. At this point it is a good idea to use ifrename or udev to make the interface names consistent across reboots.

At this point we should have two independent PPPoE connections. 'ppp1' is useless, though, because the only routing table that is currently used, 'main', tells the system to route everything through ppp0 (MY.GA.TE.WAY is, obviously, the IP of my gateway):
angband:/etc/ppp/peers# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
MY.GA.TE.WAY    0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
MY.GA.TE.WAY    0.0.0.0         255.255.255.255 UH    0      0        0 ppp1
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth2
0.0.0.0         0.0.0.0         0.0.0.0         U     0      0        0 ppp0


Step 2
Now we will create two additional routing tables 'PPP0' and 'PPP1'. As root:

1) create aliases for the routing tables
angband:/etc# echo 200 PPP0 >> /etc/iproute2/rt_tables
angband:/etc# echo 201 PPP1 >> /etc/iproute2/rt_tables
2) use 'ip' to populate table PPP0: just add a route to the gateway, and then a default route through ppp0:
angband:/etc# ip route add MY.GA.TE.WAY dev ppp0 table PPP0
angband:/etc# ip route add default via MY.GA.TE.WAY dev ppp0 table PPP0
3) the same for PPP1
angband:/etc# ip route add MY.GA.TE.WAY dev ppp1 table PPP1
angband:/etc# ip route add default via MY.GA.TE.WAY dev ppp1 table PPP1
You can now list the contents of your new routing tables with
angband:/etc/iproute2# ip route list table PPP0
MY.GA.TE.WAY dev ppp0 scope link
default via MY.GA.TE.WAY dev ppp0
angband:/etc/iproute2# ip route list table PPP1
MY.GA.TE.WAY dev ppp1 scope link
default via MY.GA.TE.WAY dev ppp1
At this point, 'ppp1' is still useless because the new routing tables are not used at all yet.
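
One thing to watch out for: running the two 'echo ... >> /etc/iproute2/rt_tables' commands from above a second time appends duplicate lines. A minimal sketch of an idempotent variant could look like this (the function name add_rt_table and the optional file argument are my own additions, the latter only so that the function can be tried on a scratch file):

```shell
#!/bin/sh
# Sketch: append "ID NAME" to an rt_tables file only if NAME is not there yet.
# Usage: add_rt_table 200 PPP0 [file]
# The file defaults to /etc/iproute2/rt_tables.
add_rt_table() {
    id=$1
    name=$2
    file=${3:-/etc/iproute2/rt_tables}
    # look for an uncommented line "<number><whitespace><name>"
    if grep -Eq "^[0-9]+[[:space:]]+${name}[[:space:]]*$" "$file" 2>/dev/null; then
        return 0
    fi
    echo "$id $name" >> "$file"
}
```

As root, 'add_rt_table 200 PPP0' and 'add_rt_table 201 PPP1' then have the same effect as the two echo commands, but can be repeated safely.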

Step 3
So, how do we now route mldonkey's packets through ppp1? First, we mark all such packets with the following rule in the 'mangle' table:
iptables -t mangle -A OUTPUT -m owner --uid-owner 108 -j MARK --set-mark 1
'108' is the user id of mldonkey. The above rule stamps every packet produced by a process running under that uid with a so-called 'FWMARK' equal to 1 (all that happens before any routing decision is made).
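
The uid 108 is specific to my machine. A small sketch that looks the uid up by user name instead of hard-coding it might look as follows; the function name mark_rule_for_user is hypothetical, and the rule is only printed so that it can be inspected before it is applied.

```shell
#!/bin/sh
# Sketch: build the MARK rule from a user name instead of a hard-coded uid.
# The rule is printed, not applied; pipe the output to sh as root to apply it.
mark_rule_for_user() {
    user=$1
    mark=$2
    # id fails (and we return non-zero) if the user does not exist
    uid=$(id -u "$user") || return 1
    echo "iptables -t mangle -A OUTPUT -m owner --uid-owner $uid -j MARK --set-mark $mark"
}

# Example with a user that exists on every system:
mark_rule_for_user root 1
```

On my machine, 'mark_rule_for_user mldonkey 1' would print exactly the rule shown above.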

A sidenote: here we took advantage of the fact that MLdonkey, as it is packaged in Debian, runs as a dedicated user 'mldonkey'. But what if you need to route some other system service that does not have its own user and runs as root, say, SSHd? Use the '--cmd-owner' parameter:
iptables -t mangle -A OUTPUT -m owner --cmd-owner sshd -j MARK --set-mark 1 
( another sidenote: AFAIK, the '--cmd-owner' flag does not work in recent ( >= 2.6.15 ) kernels )


Second, some of the promised 'policy routing' :
ip rule add fwmark 1 pri 100 table PPP1
That in turn tells the kernel to use table 'PPP1' when routing all packets marked with an FWMARK equal to 1.

However, there are a few more tricks we have to perform before this setup starts to work. For one, it is possible that mldonkey's outgoing packets are already stamped with a source address which is different from that of the interface they're going out on. We have to remedy that using NAT:
iptables -t nat -A POSTROUTING -o ppp1 -j SNAT --to-source=I.P.OF.PPP1
where I.P.OF.PPP1 is, of course, ppp1's IP.

One more problem is that we have to disable rp_filter:
echo 0 > /proc/sys/net/ipv4/conf/ppp1/rp_filter 
rp_filter is a feature which automatically rejects incoming packets if the routing table entry for their source address doesn't match the network interface they're arriving on. Normally, this has security advantages because it prevents so-called IP spoofing, but in our situation (several IP addresses on different interfaces) it can cause problems.
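
Besides the per-interface setting there is also an 'all' entry, and how the two values are combined depends on the kernel version, so it can help to look at all of them at once. A minimal sketch (the function name show_rp_filter and the optional directory argument are my own; the argument exists only so that the function can be tried on a copy of the directory):

```shell
#!/bin/sh
# Sketch: print the rp_filter setting of every interface as "iface=value".
# The base directory defaults to /proc/sys/net/ipv4/conf.
show_rp_filter() {
    base=${1:-/proc/sys/net/ipv4/conf}
    for f in "$base"/*/rp_filter; do
        # skip unreadable entries (and the unexpanded glob, if nothing matched)
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        echo "$iface=$(cat "$f")"
    done
}

show_rp_filter
```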

Next, we have to make sure that packets carrying the source address of interface X go back out through that interface:
ip rule add from I.P.OF.PPP0 pri 200 table PPP0
ip rule add from I.P.OF.PPP1 pri 300 table PPP1
The 'pri' (short for 'priority') parameter controls the order in which the rules are applied. The routing algorithm goes from priority 0 upwards and applies the first matching rule. Notice that those two rules have to have a higher priority number than the 'fwmark' one (also notice that, somewhat illogically, a 'higher priority' number here really means 'lower importance').

The rule table should now look like this:
angband:/etc/iproute2# ip rule list
0: from all lookup local
100: from all fwmark 0x1 lookup PPP1
200: from I.P.OF.PPP0 lookup PPP0
300: from I.P.OF.PPP1 lookup PPP1
32766: from all lookup main
32767: from all lookup default
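
If you want to check for these rules from a script rather than by eye, something like the following sketch could be used. The helper name has_rule is my own; it reads 'ip rule list' output on standard input and only looks at the priority and the table name.

```shell
#!/bin/sh
# Sketch: succeed if the "ip rule list" output on stdin contains a rule
# with the given priority that looks up the given table.
# Usage: ip rule list | has_rule 100 PPP1
has_rule() {
    pri=$1
    table=$2
    grep -Eq "^${pri}:[[:space:]].*lookup ${table}[[:space:]]*$"
}
```

For example, 'ip rule list | has_rule 100 PPP1 || echo "fwmark rule missing"' prints a warning when the fwmark rule is not in place.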


At this point, everything should work correctly: all services (except mldonkey) should work through ppp0 normally, and if you start mldonkey and watch ppp1:
/etc/init.d/mldonkey-server start
tcpdump -i ppp1
you should see it working on ppp1.

Now if anyone initiates a connection from outside using the IP of ppp1, the replies will also go out through ppp1. Thus, even though an SSH or Apache connection initiated from the server itself is always going to use ppp0, from outside you can connect to both services through either ppp0 or ppp1.

Step 4
Let's now put everything together and make this setup permanent across reboots. While we are at it, let's also apply some firewalling so that only ports we actually need open are open.

I am going to achieve that in what is arguably not the best way. One script (/etc/init.d/my_initscript) will be added to the initscripts; this is the one that sets up firewalling and the basic rules for our multipath setup. Another (/usr/local/bin/check_ip) will run as a cronjob; this one checks whether ppp0 and ppp1 are still up, brings them back up if not, and adjusts the routing rules that depend on the IPs of ppp0 and ppp1 (my IP is dynamic and my ISP kicks me out once every 3 days).

Here goes the initscript; I hope the comments inside are sufficient:
#!/bin/sh

case "$1" in
start)

echo "Setting up firewall rules..."


IPTABLES=/sbin/iptables
INTERNAL_IFACE=eth2
EXTERNAL_IFACE0=ppp0
EXTERNAL_IFACE1=ppp1
INTERNAL_IP=10.0.0.1
INTERNAL_NETWORK=10.0.0.0/24

# Start with a tough policy.
$IPTABLES -P INPUT DROP
$IPTABLES -P FORWARD DROP
$IPTABLES -P OUTPUT ACCEPT

# clean up
$IPTABLES -F
$IPTABLES -X
$IPTABLES -Z

# Filtering section
# INPUT chain
# We want to allow ONLY:
# 1. local (loopback) traffic
# 2. traffic from the Internet that is part of an existing connection (no new connections)

$IPTABLES -A INPUT -i lo -j ACCEPT
$IPTABLES -m state -A INPUT -i $EXTERNAL_IFACE0 --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -m state -A INPUT -i $EXTERNAL_IFACE1 --state ESTABLISHED,RELATED -j ACCEPT

# on ppp0, allow SSH and HTTP
$IPTABLES -A INPUT -p tcp -m tcp --dport 22 -i $EXTERNAL_IFACE0 -j ACCEPT
$IPTABLES -A INPUT -p tcp -m tcp --dport 80 -i $EXTERNAL_IFACE0 -j ACCEPT

# in ppp1, allow mldonkey
$IPTABLES -A INPUT -p tcp -m tcp --dport 4662 -i $EXTERNAL_IFACE1 -j ACCEPT
$IPTABLES -A INPUT -p udp -m udp --dport 4662 -i $EXTERNAL_IFACE1 -j ACCEPT
$IPTABLES -A INPUT -p tcp -m tcp --dport 4001 -i $EXTERNAL_IFACE1 -j ACCEPT

# everywhere, allow ping and traceroute
$IPTABLES -A INPUT -p icmp -m icmp --icmp-type 0 -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp --icmp-type 11 -j ACCEPT
$IPTABLES -A INPUT -p icmp -m icmp --icmp-type 30 -j ACCEPT
$IPTABLES -A INPUT -p udp -m state --state ESTABLISHED -j ACCEPT
$IPTABLES -A INPUT -p icmp -m state --state RELATED,ESTABLISHED -j ACCEPT


# limit the number of incoming connections on port 22 ( SSH ) to 3 attempts a minute
$IPTABLES -I INPUT -p tcp --dport 22 -i $EXTERNAL_IFACE0 -m state --state NEW -m recent --set
$IPTABLES -I INPUT -p tcp --dport 22 -i $EXTERNAL_IFACE0 -m state --state NEW -m recent --update --seconds 60 --hitcount 3 -j DROP

# allow ports 1-65000 (i.e. nearly all ports) on the internal interface
$IPTABLES -A INPUT -p udp -m udp -i $INTERNAL_IFACE --dport 1:65000 -j ACCEPT
$IPTABLES -A INPUT -p tcp -m tcp -i $INTERNAL_IFACE --dport 1:65000 -j ACCEPT

# mark all packets from mldonkey ( uid=108 ) so that later on we can route them thru ppp1
$IPTABLES -t mangle -A OUTPUT -m owner --uid-owner 108 -j MARK --set-mark 1

# switch off rp_filter ( otherwise packets coming back thru ppp1 get dropped by it )
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter

# hack: go around the case that we can get assigned the same IP
# as we had before the reboot
DATE=`date +"%F %r"`
echo "$DATE REBOOT 1.1.1.1" >> /var/log/check_ppp0
echo "$DATE REBOOT 1.1.1.1" >> /var/log/check_ppp1

# call the cronjob script to complete the work for us
# ( for example, set up the NAT rule - I can't do it here
# because I don't know ppp1's IP yet )
echo "Saving new ip..."
/usr/local/bin/check_ip 2> /var/log/check_errors

# add a routing policy to route all packets marked with a '1' thru the PPP1 table
ip rule add fwmark 1 pri 100 table PPP1

;;

stop)

;;

esac
Let's call this script '/etc/init.d/my_initscript' and add it to rc.d:
update-rc.d my_initscript defaults
And here's the /usr/local/bin/check_ip script:
#!/bin/bash
# ( bash is required: the script uses 'let' and '[[ ]]' )

DIGITS="[0-9]\{1,3\}"
IP="$DIGITS\.$DIGITS\.$DIGITS\.$DIGITS"
DATE=`date +"%F %r"`
SLEEP=10
MAXTRIES=5

###############################################################################
#### check if PPP0 is up, if not, bring it up and remember its new IP

DEVICE0="ppp0"
LOGFILE0="/var/log/check_${DEVICE0}"
CURRENT_IP0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*addr:\($IP\) .*/\1/p"`

if [ "x$CURRENT_IP0" = "x" ]
then
pon dsl-provider > /dev/null
sleep $SLEEP
CURRENT_IP0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
COUNTER=0

while [ "x$CURRENT_IP0" = "x" -a $COUNTER -le $MAXTRIES ]
do
echo "$DATE Waiting for device $DEVICE0 for the $COUNTER time..." >> $LOGFILE0
let "COUNTER += 1"
sleep $SLEEP
CURRENT_IP0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
done

if [ $COUNTER -gt $MAXTRIES ]
then
echo "$DATE Failed to bring up device $DEVICE0, giving up..." >> $LOGFILE0
exit 1
fi
fi

if [[ -e $LOGFILE0 ]]
then
LAST_IP0=`cat $LOGFILE0 | grep $IP | tail -1 | sed -n "s/.*\ \(.*\)/\1/p"`
fi

###############################################################################
#### same for PPP1

DEVICE1="ppp1"
LOGFILE1="/var/log/check_${DEVICE1}"
CURRENT_IP1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*addr:\($IP\) .*/\1/p"`


if [ "x$CURRENT_IP1" = "x" ]
then
pon dsl-provider2 > /dev/null
sleep $SLEEP
CURRENT_IP1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
COUNTER=0

while [ "x$CURRENT_IP1" = "x" -a $COUNTER -le $MAXTRIES ]
do
echo "$DATE Waiting for device $DEVICE1 for the $COUNTER time..." >> $LOGFILE1
let "COUNTER += 1"
CURRENT_IP1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*addr:\($IP\) .*/\1/p"`
sleep $SLEEP
done

if [ $COUNTER -gt $MAXTRIES ]
then
echo "$DATE Failed to bring up device $DEVICE1, giving up..." >> $LOGFILE1
exit 1
fi
fi


if [[ -e $LOGFILE1 ]]
then
LAST_IP1=`cat $LOGFILE1 | grep $IP | tail -1 | sed -n "s/.*\ \(.*\)/\1/p"`
fi

###############################################################################
#### Save new IP of ppp1; re-create ip rules that depend on ppp1's IP

if [ "x$LAST_IP1" != "x$CURRENT_IP1" ]
then
echo "$DATE $CURRENT_IP1" >> $LOGFILE1

ip rule del from $LAST_IP1 pri 300 table PPP1
ip rule add from $CURRENT_IP1 pri 300 table PPP1

GATEWAY1=`/sbin/ifconfig $DEVICE1 | sed -n "s/.*P-t-P:\(.*\)\ .*/\1/p"`

ip route add $GATEWAY1 dev $DEVICE1 table PPP1
ip route add default via $GATEWAY1 dev $DEVICE1 table PPP1

iptables -t nat -D POSTROUTING 1
iptables -t nat -A POSTROUTING -o $DEVICE1 -j SNAT --to-source=$CURRENT_IP1
fi

###############################################################################
#### same for ppp0 + save its new IP on an external server.

if [ "x$LAST_IP0" != "x$CURRENT_IP0" ]
then
# save my new ip to my server with a static ip.
# CGI scripts there redirect the traffic back to my home server.
# I know I can do that with DynDNS, but somehow like to do
# everything by myself :) Here we have to have authorized_keys
# set up for this to work.
echo $CURRENT_IP0 | ssh MY_SERVER_WITH_STATIC_IP 'cat > ~/html/server/current_ip'
RET=$?
echo `date +"%F %r"` "new ip saved, ssh returned $RET" >> $LOGFILE0
echo "$DATE $CURRENT_IP0" >> $LOGFILE0

ip rule del from $LAST_IP0 pri 200 table PPP0
ip rule add from $CURRENT_IP0 pri 200 table PPP0

GATEWAY0=`/sbin/ifconfig $DEVICE0 | sed -n "s/.*P-t-P:\(.*\)\ .*/\1/p"`

ip route add $GATEWAY0 dev $DEVICE0 table PPP0
ip route add default via $GATEWAY0 dev $DEVICE0 table PPP0
fi
Add this script to root's crontab with 'crontab -e':
# m h  dom mon dow   command
# every 5 minutes check if we are still connected to DSL;
# if not, reconnect and save the new IP to MY_SERVER_WITH_STATIC_IP.
*/5 * * * * /usr/local/bin/check_ip 2> /var/log/check_errors
Voilà, we are done!
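
A side note on check_ip: it extracts the addresses by running sed over the output of ifconfig, which depends on the exact ifconfig output format. Since iproute2 is installed anyway, the one-line output of 'ip -o -4 addr show dev ppp0' can be parsed instead. The following is only a sketch: the function names are my own, and the sample line reflects the format I would expect for a point-to-point interface, with documentation addresses as placeholders.

```shell
#!/bin/sh
# Sketch: extract the local and the peer IPv4 address from one line of
# "ip -o -4 addr show dev <iface>" output read on stdin.
local_ip() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}
peer_ip() {
    awk '$3 == "inet" && $5 == "peer" { split($6, a, "/"); print a[1]; exit }'
}

# Sample line in the assumed format (addresses are placeholders):
sample='5: ppp0    inet 192.0.2.10 peer 192.0.2.1/32 scope global ppp0'
echo "$sample" | local_ip
echo "$sample" | peer_ip
```

In check_ip, CURRENT_IP1 would then become the output of 'ip -o -4 addr show dev ppp1 | local_ip', and GATEWAY1 the output of the same command piped to peer_ip.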

TODO
- implement the scripts from the last section as an if-up script and a small cronjob which only checks if the interface is still up, and if not, brings it up with 'ifup interface'.
- implement ideas from the Advanced Routing HOWTO, Section 15.8, to shape traffic on ppp0
- make everything more robust
- rather than running a cronjob once every 5 minutes and potentially being down for those 5 minutes, how about being notified when a network interface goes down? Does anybody have suggestions on how this could be done?
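
One possible direction for the first and the last item, as far as I can tell: on Debian, pppd runs the scripts in /etc/ppp/ip-up.d/ whenever a link comes up (and those in /etc/ppp/ip-down.d/ when it goes down), and exports PPP_IFACE, PPP_LOCAL and PPP_REMOTE to them. A rough, untested sketch of such a hook follows; the file name 50policy, the DRY_RUN switch and the function names are my own inventions, and the SNAT rule is left out for brevity.

```shell
#!/bin/sh
# Sketch of a hook for /etc/ppp/ip-up.d/ (hypothetical file name: 50policy).
# pppd on Debian exports PPP_IFACE, PPP_LOCAL and PPP_REMOTE to such hooks.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

refresh_policy() {
    case "$PPP_IFACE" in
        ppp0) table=PPP0; pri=200 ;;
        ppp1) table=PPP1; pri=300 ;;
        *)    return 0 ;;   # not one of our interfaces
    esac
    # drop routes left over from the previous session, then re-create them
    run ip route flush table "$table"
    run ip route add "$PPP_REMOTE" dev "$PPP_IFACE" table "$table"
    run ip route add default via "$PPP_REMOTE" dev "$PPP_IFACE" table "$table"
    run ip rule add from "$PPP_LOCAL" pri "$pri" table "$table"
}

refresh_policy
```

A matching hook in /etc/ppp/ip-down.d/ would have to delete the 'from' rule again, since the old address is only known at that point.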
