How to investigate “Failed Units” on CoreOS

When I created a Kubernetes cluster with CoreOS, one of the CoreOS nodes reported “Failed Units” when I logged in to it:

$ ssh -i ~/.ssh/key.pem core@xxx.xxx.xxx.xxx
Last login: Mon May 23 04:43:57 2016 from yyy.yyy.yyy.yyy
CoreOS beta (1010.3.0)
Update Strategy: No Reboots
Failed Units: 5
var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount
var-lib-rkt-pods-run-bf6f1c19\x2d7bc0\x2d4931\x2d885a\x2d811cc236973a-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount
docker-0bd19e353b40194e4bcc35172fa5b954ef2ba366121de2163f07f557bbcd170a.scope
locksmithd.service
polkit.service
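
Those long mount unit names are just systemd-escaped paths: “/” is encoded as “-”, and a literal “-” inside a path component as “\x2d”. If you want to see which path such a unit refers to, systemd-escape should be able to decode it (drop the .mount suffix first), something like:

$ systemd-escape --unescape --path 'var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734'
/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/ap-northeast-1c/vol-8c3b1734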

You can get more information with systemctl --failed:

$ systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount loaded failed failed /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/ap-northeast-1c/vol-8c3b1734
var-lib-rkt-pods-run-bf6f1c19\x2d7bc0\x2d4931\x2d885a\x2d811cc236973a-stage1-rootfs-opt-stage2-hyperkube-rootfs-var-lib-kubelet-plugins-kubernetes.io-aws\x2debs-mounts-aws-ap\x2dnortheast\x2d1c-vol\x2d8c3b1734.mount loaded failed failed
docker-0bd19e353b40194e4bcc35172fa5b954ef2ba366121de2163f07f557bbcd170a.scope loaded failed failed docker container 0bd19e353b40194e4bcc35172fa5b954ef2ba366121de2163f07f557bbcd170a
locksmithd.service masked failed failed locksmithd.service
polkit.service loaded failed failed Authorization Manager

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

5 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
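
If the list is long, you can narrow it down by unit type; for example, to show only the failed mount units:

$ systemctl --failed --type=mount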

You can get the status of a specific unit with systemctl status ...:

$ systemctl status locksmithd.service
locksmithd.service
Loaded: masked (/dev/null)
Active: failed (Result: resources) since Tue 2016-05-17 06:34:35 UTC; 6 days ago
Main PID: 758 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
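
systemctl status only shows the most recent journal entries for the unit; for more context you can query the journal directly with journalctl (although here the journal has already been rotated, so there may be little left to read):

$ journalctl -u locksmithd.service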

You can list all units with systemctl list-units:

$ systemctl list-units
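
The full list is long, so it helps to filter by unit type or pipe it through grep, e.g.:

$ systemctl list-units --type=service
$ systemctl list-units | grep -i kube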

If you think the failure was just a temporary glitch, run the following:

$ sudo systemctl reset-failed
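
With no arguments, reset-failed clears the failed state of every unit (it does not restart anything). If you only want to clear a specific unit, pass its name, for example:

$ sudo systemctl reset-failed locksmithd.service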

Then check that everything is fine:

$ systemctl --failed
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
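
If a unit shows up as failed again after a reset, the problem is probably not transient; in that case, check its logs and try restarting it, for example:

$ sudo systemctl restart polkit.service
$ systemctl status polkit.service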

You can do a lot more with systemctl. Check systemctl --help for details. Enjoy!