LXC Issues
cannot stop container
The container is running and responds to pings, but it is not possible to SSH into it or attach to it.
The normal commands to stop or reboot it do not help (not even lxc-stop -k).
CAUSE: The container was frozen for a snapshot. All its processes are in 'D' state and cannot be killed. SOLUTION:
echo THAWED > /sys/fs/cgroup/freezer/lxc/200/freezer.state
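Before thawing, it can help to confirm that the freezer really is the culprit. The following is a minimal sketch, assuming the cgroup v1 layout used in the command above; the function names and the CGROUP_ROOT variable are hypothetical helpers, not part of LXC or PVE.

```shell
# CGROUP_ROOT is an assumed, overridable variable so the helpers can be
# tried against a scratch directory instead of the real cgroup tree.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# Print the freezer state of container $1 (FROZEN, FREEZING or THAWED).
freezer_state() {
    cat "$CGROUP_ROOT/freezer/lxc/$1/freezer.state"
}

# Thaw container $1 only if it is not already THAWED, then print the new state.
thaw_if_frozen() {
    state=$(freezer_state "$1") || return 1
    if [ "$state" != "THAWED" ]; then
        echo THAWED > "$CGROUP_ROOT/freezer/lxc/$1/freezer.state"
    fi
    freezer_state "$1"
}

# Usage (as root on the host):
#   freezer_state 200
#   thaw_if_frozen 200
```

If freezer_state already reports THAWED, the hang probably has a different cause and thawing will not help.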
Info:
Investigation
The first attempt was to kill the container by killing its init process (systemd):
pstree -p
├─lxc-start(3747487)───systemd(3747514)─┬─agetty(3748048)
│                                        ├─agetty(3748049)
kill -9 3747514
After that, it is no longer possible to start the LXC container again. Debugging:
lxc-start -o lxc-start.log -lDEBUG -F -n 200
cat lxc-start.log
lxc-start 200 20210325085035.665 INFO conf - conf.c:run_script_argv:331 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "200", config section "lxc"
lxc-start 200 20210325085036.126 DEBUG conf - conf.c:run_buffer:303 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 200 lxc pre-start produced output: failed to remove directory '/sys/fs/cgroup/net_cls/lxc/200/ns': Device or resource busy
Reason: after the container's systemd was killed, orphaned cgroups were left behind.
find /sys/fs/cgroup/*/lxc/200 -depth -type d -print -delete
# Lots of errors:
find: cannot delete ‘/sys/fs/cgroup/freezer/lxc/200’: Device or resource busy
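A cgroup directory cannot be removed while processes are still attached to it, which is one likely source of the "Device or resource busy" errors. The sketch below lists the PIDs still recorded in the cgroup.procs files of the container's cgroups; the function name and the CGROUP_ROOT variable are hypothetical, and the cgroup v1 layout from the find command above is assumed.

```shell
# CGROUP_ROOT is an assumed, overridable variable (defaults to the real tree).
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# Print "<cgroup dir>: <pid>" for every process still attached to any
# cgroup of container $1, across all v1 controllers.
cgroup_leftover_pids() {
    find "$CGROUP_ROOT"/*/lxc/"$1" -name cgroup.procs 2>/dev/null |
    while read -r procs; do
        dir=$(dirname "$procs")
        while read -r pid; do
            if [ -n "$pid" ]; then
                echo "$dir: $pid"
            fi
        done < "$procs"
    done
}

# Usage (as root on the host):
#   cgroup_leftover_pids 200
```

Any PID printed here keeps its cgroup directory busy until the process exits.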
All processes from container 200 are in 'D' state (uninterruptible sleep):
ps axl | awk '$10 ~ /D/'
echo w > /proc/sysrq-trigger
[587314.999001] smbd D 0 1181293 42630 0x00004184
[587314.999002] Call Trace:
[587314.999004] __schedule+0x2e6/0x6f0
[587314.999005] schedule+0x33/0xa0
[587314.999007] __refrigerator+0x44/0x160
[587314.999009] get_signal+0x814/0x850
[587314.999011] do_signal+0x34/0x6e0
[587314.999013] ? wait_woken+0x80/0x80
[587314.999014] ? __audit_syscall_exit+0x236/0x290
[587314.999016] exit_to_usermode_loop+0x90/0x130
[587314.999018] do_syscall_64+0x160/0x190
[587314.999020] entry_SYSCALL_64_after_hwframe+0x44/0xa9
So it looks like the whole container cgroup was frozen for a snapshot and stayed frozen, which caused the problem.
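The __refrigerator frame in the trace is what points at the freezer. A quicker check than a full SysRq dump is to list D-state processes together with their kernel wait channel. This is a rough sketch, assuming a ps that supports the stat, wchan and comm output fields (procps does); the function names are hypothetical, and the wait channel may show as "-" or be truncated depending on the kernel and privileges.

```shell
# Filter "PID STAT WCHAN COMMAND" lines from stdin, keeping only processes
# whose state starts with D, and print "PID WCHAN COMMAND".
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3, $4 }'
}

# List D-state processes on the running system with their wait channel.
list_d_state() {
    ps -eo pid=,stat=,wchan=,comm= | filter_d_state
}

# Usage:
#   list_d_state
```

Processes frozen by the cgroup freezer are expected to show a refrigerator-related wait channel, while processes blocked on I/O show something else.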
nested docker in cpulimit
Gitlab runner fails to start docker executor:
ERROR: Job failed (system failure): prepare environment: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write "2400000": write /sys/fs/cgroup/cpu,cpuacct/docker/af4fd93c304a3edc9edb85da6f7a7f9ec85a15262db37393a22141686647d060/cpu.cfs_quota_us: invalid argument: unknown (exec.go:57:0s). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
Reason: cpulimit was set on the container in PVE.
Reproduction:
# works:
docker run -it busybox
# problem:
docker run --cpuset-cpus='0' --cpus=1 --cpu-shares=256 -it busybox
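A plausible explanation for the "invalid argument" error: with cgroup v1, the kernel rejects a CFS quota on a child cgroup when its quota/period ratio exceeds that of a limited parent, and the PVE cpulimit sets such a limit on the container's own cgroup. The sketch below only reproduces that comparison as arithmetic. The function name is hypothetical, and the parent values in the usage example (a cpulimit of 4) are an assumed illustration, not taken from the affected system.

```shell
# Return 0 if a child CFS quota fits under its parent's quota, 1 otherwise.
# Arguments: parent_quota parent_period child_quota child_period
# (all in microseconds; a quota of -1 means "no limit").
cfs_quota_fits() {
    parent_quota=$1
    parent_period=$2
    child_quota=$3
    child_period=$4
    # An unlimited parent accepts any child value.
    if [ "$parent_quota" -lt 0 ]; then
        return 0
    fi
    # An unlimited child is simply capped by the parent.
    if [ "$child_quota" -lt 0 ]; then
        return 0
    fi
    # Compare quota/period ratios using integer cross-multiplication.
    [ $((child_quota * parent_period)) -le $((parent_quota * child_period)) ]
}

# Usage, assuming a hypothetical cpulimit of 4 (400000/100000) on the container
# and the 2400000 quota from the error message with a 100000 period:
#   cfs_quota_fits 400000 100000 2400000 100000 || echo "quota would be rejected"
```

Under this reading, either removing cpulimit or keeping the requested Docker CPU limit at or below it should avoid the failure.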
Failed to set up mount namespacing: Permission denied
Inside LXC CT: Long ssh login delay, lots of errors in journal
# journalctl
gru 28 08:18:48 hostname systemd[860]: systemd-logind.service: Failed to set up mount namespacing: /run/systemd/unit-root/proc: Permission denied
gru 28 08:18:54 hostname nmbd[106]: This response was from IP 192.168.12.45, reporting an IP address of 172.16.0.131.
gru 28 08:19:02 hostname systemd[866]: systemd-logind.service: Failed to set up mount namespacing: /run/systemd/unit-root/proc: Permission denied
gru 28 08:19:10 hostname sshd[783]: pam_systemd(sshd:session): Failed to create session: Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
gru 28 08:19:14 hostname systemd[877]: systemd-logind.service: Failed to set up mount namespacing: /run/systemd/unit-root/proc: Permission denied
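To see which units are affected, the journal can be reduced to a per-unit count of this error. This is a small sketch that assumes the message format shown above; the function name is hypothetical.

```shell
# Read journal lines on stdin and print "<unit> <count>" for each .service
# unit that logged "Failed to set up mount namespacing".
namespacing_failures() {
    sed -n 's/.*: \([^ ]*\.service\): Failed to set up mount namespacing.*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (inside the container):
#   journalctl -b | namespacing_failures
```

In the excerpt above only systemd-logind.service is affected, which matches the login delay: sshd waits for org.freedesktop.login1 until the 25 s timeout.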