Networking

https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking/

In default Docker networking:

  • traffic from a container to the outside world is NATed.
  • incoming traffic for a mapped port is handled by the docker-proxy process
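Both mechanisms can be inspected on a host running at least one container with a published port:

iptables -t nat -S DOCKER     # NAT rules created by Docker
pgrep -a docker-proxy         # proxy processes for published ports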

Set up the default subnet for containers

By default Docker tries to choose a non-conflicting addressing scheme from:

172.17.0.0/16, 172.18.0.0/16, 172.19.0.0/16, 172.20.0.0/14, 172.24.0.0/14, 172.28.0.0/14, 192.168.0.0/16

but sometimes this doesn't work.

| /etc/docker/daemon.json
{
    "default-address-pools":
    [
        {"base":"172.17.0.0/16","size":24}
    ]
}

This assigns a /16 pool to the Docker daemon, which will then carve a /24 subnet out of it for each network it creates.

Another example:

| /etc/docker/daemon.json
{
  "bip": "10.200.0.1/24",
  "default-address-pools":[
    {"base":"10.201.0.0/16","size":24},
    {"base":"10.202.0.0/16","size":24}
  ]
}
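In the second example bip additionally sets the address of the default docker0 bridge directly. After editing daemon.json the daemon has to be restarted; existing networks keep their old subnets. A quick check, assuming systemd and a throwaway network name:

systemctl restart docker
docker network create pooltest
docker network inspect pooltest --format '{{(index .IPAM.Config 0).Subnet}}'   # expect a /24 from a configured pool
docker network rm pooltest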

How to connect a container to the real network

The idea: run multiple containers serving different services on the same port but on different IPs, similar to using a bridged network with VirtualBox.

http://blog.oddbit.com/2014/08/11/four-ways-to-connect-a-docker/
http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address

Use IP alias

Simply add an additional IP to one of the host network interfaces, then map container ports to that host IP:port when starting the container. PROBLEM: if a host service listens on all interfaces, the conflicting port cannot be used.
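A minimal sketch; the interface name eth0, the alias 192.168.0.240 and the nginx image are placeholders:

# add a second (alias) IP to the host NIC
ip addr add 192.168.0.240/22 dev eth0
# publish the container port only on that alias, not on all host IPs
docker run -d -p 192.168.0.240:80:80 nginx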

Host DNAT

NOT TESTED.

  • Use an IP alias as above.
  • Use the Docker container in the classic way: expose container services on non-conflicting host ports.
  • Use iptables DNAT to redirect traffic for the given IP:port to the container's exposed port.
  • Use iptables SNAT to set the correct originating address (see the sketch below).
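A sketch of these rules, untested like the list above; this variant DNATs straight to the container address, and every address and port (alias 192.168.0.240, container 172.17.0.2, docker0 gateway 172.17.0.1) is hypothetical:

# redirect traffic arriving at the alias IP to the container service
iptables -t nat -A PREROUTING -p tcp -d 192.168.0.240 --dport 80 \
    -j DNAT --to-destination 172.17.0.2:80
# rewrite the source so the container answers back through the host
iptables -t nat -A POSTROUTING -p tcp -d 172.17.0.2 --dport 80 \
    -j SNAT --to-source 172.17.0.1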

Use real host bridge

Set Docker's internal bridge name to an existing bridge on the host. NOTE: Docker will manipulate the host bridge (it assigns the configured address!) https://serverfault.com/questions/958367/how-do-i-give-a-docker-container-its-own-routable-ip-on-the-original-network
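A minimal sketch, assuming the existing host bridge is named br0; the bridge key makes dockerd use it instead of creating docker0:

| /etc/docker/daemon.json
{
  "bridge": "br0"
}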

Connect docker bridge with real host bridge

Reference: https://linux-blog.anracom.com/tag/linux-bridge-linking/

Create a virtual adapter pair:

ip link add dev veth_docker_lan type veth peer name veth_br-lan

Add each adapter to one of the bridges:

brctl addif docker_lan veth_docker_lan
ip link set veth_docker_lan up

brctl addif br-lan veth_br-lan
ip link set veth_br-lan up
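If brctl is not available, the same attachment works with plain iproute2:

ip link set dev veth_docker_lan master docker_lan
ip link set dev veth_br-lan master br-lan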

ISSUES: In theory it works, but there are problems with iptables and routing. Conntrack cannot see the packets (different namespaces?), so the firewall treats all packets as INVALID.

From https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking/#macvlan:

  Before MACVLAN, if you wanted to connect to physical network from a VM or namespace, you would have needed to create TAP/VETH devices and attach one side to a bridge and attach a physical interface to the bridge on the host at the same time, as shown below.
  Now, with MACVLAN, you can bind a physical interface that is associated with a MACVLAN directly to namespaces, without the need for a bridge.

MACVLAN & IPVLAN

Linux Kernel drivers

  • bridge - gives connectivity between endpoints, but external access requires NAT
  • macvlan - exposes endpoints directly to the LAN (they can get addresses from the network's DHCP server)
    • macvlan is needed when a common DHCP server is used, since the DHCP server expects a unique MAC address per client, which ipvlan does not provide.
    • PROBLEMS:
      • The switch the host is connected to may have a policy that limits the number of different MAC addresses on a physical port.
      • Many NICs have a limit on the number of MAC addresses they support in hardware. Exceeding the limit may affect performance.
      • IEEE 802.11 doesn't like multiple MAC addresses on a single client. macvlan sub-interfaces are likely to be blocked by your wireless interface driver, the AP, or both.
  • ipvlan - (kernel doc: ipvlan.txt) similar to macvlan, but endpoints share the parent's MAC address. Ipvlan has two modes of operation; only one mode can be selected per parent interface, and all sub-interfaces operate in the selected mode:
    • L2 - bridge mode (requires an external router if endpoints are in different networks)
    • L3 - packets are routed between endpoints (without touching the TTL)
    • ipvlan should be used where switches restrict the maximum number of MAC addresses per physical port due to port security configuration.
      • use it if the parent interface is wireless
    • PROBLEMS:
      • The shared MAC address can affect DHCP operation. If your VMs or containers use DHCP to acquire network settings, make sure they use a unique ClientID in the DHCP request and ensure your DHCP server assigns IP addresses based on the ClientID, not the client's MAC address.
      • Autoconfigured EUI-64 IPv6 addresses are based on the MAC address. All VMs or containers sharing the same parent interface will auto-generate the same IPv6 address. Ensure that your VMs or containers use static IPv6 addresses or IPv6 privacy addresses, and disable SLAAC.
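For comparison with the macvlan example further below, an ipvlan network is created the same way (a sketch; the parent interface eth0 is an assumption, the addressing matches the later example):

docker network create --driver=ipvlan \
-o parent="eth0" \
-o ipvlan_mode="l2" \
--subnet="192.168.0.0/22" \
--gateway="192.168.0.1" \
ipvlan_lan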

NOTE: both require hardware support for multiple MAC addresses. Without it, the device needs to be switched into promiscuous mode, which is not easy. Working methods:

  • VirtualBox on the host machine - during host startup it installs its own driver

macvlan details

Reference: https://hicu.be/bridge-vs-macvlan

Macvlan modes:

  • private - frames are sent out on the cable, but even if the external switch forwards packets back according to the MAC address, they are dropped.
  • VEPA - all frames are sent out on the cable. The external switch has to forward them back to provide communication between macvlan interfaces.
    • IEEE 802.1Qbg aka Virtual Ethernet Port Aggregator physical switch
  • bridge - all macvlan interfaces are bridged internally. Traffic between macvlans is forwarded locally. Broadcast packets are forwarded locally and onto the cable, but if the external switch reflects packets back, they are filtered to prevent duplicates.
  • passthru - assigns the real physical interface to a single VM (and gives it full control of the interface)

Issue with bridge:

  • a macvlan0 interface added to a host bridge works badly: it doesn't receive broadcast packets (so it cannot respond to ARP requests). When the device inside the container is switched into promiscuous mode, everything starts working.
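A one-liner to test that, assuming the container's interface is eth0 and the container runs with NET_ADMIN:

ip link set dev eth0 promisc on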

macvlan example

There can be only one macvlan network with a given subnet and gateway, so it is better to create the network manually:

docker network create --driver=macvlan \
-o parent="br0" \
-o mode="bridge" \
--subnet="192.168.0.0/22" \
--gateway="192.168.0.1" \
real_lan

and then attach containers to the existing network:

| docker-compose.yml
version: '2'

services:
  myservice:
    networks:
      lan:
        ipv4_address: "192.168.0.241"

networks:
  lan:
    external:
      name: real_lan

or

docker network connect --ip="192.168.0.241" real_lan myservice

communication with host

Linux Macvlan interface types are not able to ping or communicate with the default namespace IP address. 
For example, if you create a container and try to ping the Docker host's eth0 it will not work. 
That traffic is explicitly filtered by the kernel to offer additional provider isolation and security. 
This is a common gotcha when a user first uses those Linux interface types since it is natural to ping local addresses when testing.

http://blog.oddbit.com/2018/03/12/using-docker-macvlan-networks/ - especially the comments, which describe forcing Docker to use an existing host bridge to avoid macvlan.

https://raid-zero.com/2017/08/02/exploring-docker-networking-host-none-and-macvlan/3/

As noted before, containers on a macvlan network cannot communicate with the real host IP address.

(A possible alternative: a separate routed network with ipvlan L3 mode.)

Workaround (often suggested): remove the host IP address from the network interface and assign it to a host macvlan interface instead. In fact there is no need to touch the real host IP address: it is enough to use address-less routing through an extra macvlan interface.

Scenario:

  • br0 - host interface with IP 192.168.0.231 (bridged eth0)
  • 192.168.0.241 - docker container connected to real_lan
  • 192.168.0.242 - docker container connected to real_lan (openvpn daemon)

Create a macvlan0 interface with a dummy IP address, and route traffic for the containers through this interface:

| /etc/network/interfaces
auto macvlan0
iface macvlan0 inet static
    address 192.168.143.91/32
    pre-up ip link add macvlan0 link br0 type macvlan mode bridge
    post-down ip link del macvlan0 link br0 type macvlan mode bridge
    post-up ip r add 192.168.0.242 dev macvlan0 src <host_ip>
    post-up ip r add 192.168.0.241 dev macvlan0 src <host_ip>
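To check that traffic for a container really leaves via macvlan0 with the host source address:

ip route get 192.168.0.241
# expected: 192.168.0.241 dev macvlan0 src <host_ip>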

communication from containers to macvlan container

Problem: containers with default network settings (172.22.0.0) cannot communicate with 192.168.0.242. Docker creates default outgoing NAT rules for every container. For an unknown reason, the automatic MASQUERADE rule performs SNAT to the dummy macvlan0 IP 192.168.143.91, despite the routing rule that should force the source address to <host_ip>. The solution is to insert our own rules before Docker's:

| /etc/network/interfaces
auto macvlan0
iface macvlan0 inet static
    address 192.168.143.91/32
    pre-up ip link add macvlan0 link br0 type macvlan mode bridge
    post-down ip link del macvlan0 link br0 type macvlan mode bridge
    post-up ip r add 192.168.0.242 dev macvlan0 src <host_ip>
    post-up ip r add 192.168.0.241 dev macvlan0 src <host_ip>
    post-up iptables -t nat -I POSTROUTING -d 192.168.0.242 -j SNAT --to-source <host_ip>
    post-up iptables -t nat -I POSTROUTING -d 192.168.0.241 -j SNAT --to-source <host_ip>
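To confirm that the inserted SNAT rules really land before Docker's MASQUERADE rules:

iptables -t nat -L POSTROUTING -n --line-numbers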

openvpn

After the above fixes, traffic flows between container 192.168.0.242 and host 192.168.0.231 in both directions. OpenVPN traffic also works to any host in network 192.168.0.0 except host 192.168.0.231. As a simple workaround I enabled NAT on the OpenVPN container:

iptables -t nat -A POSTROUTING -s 10.1.1.0/24 -o eth0 -d 192.168.0.231 -j MASQUERADE