Improve cleanup (--delete-all) usage
It now also cleans up unused docker data (stopped containers, unused images and volumes) on each kubernetes worker node:
docker system prune --force --all --volumes
It also drops the requirement for an override file when '--clean-only' is
used, and it deletes more kinds of kubernetes resource objects when
undeploying a component.
+ improve help description
+ add use_help to avoid printing the full help when it was not asked for
+ fix function name: redeploy_component -> undeploy_component
+ fix messages on some systems by switching echo to printf
+ fix grep so that it does not match e.g. onap-bars when onap-bar was requested
+ add trap and fix terminal colors on abort
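The grep fix can be illustrated with a small sketch. The resource names below are hypothetical examples, not real kubectl output; only the two grep patterns are taken from the change itself.

```shell
#!/bin/sh
# Hypothetical resource names (first column only), as the fallback in
# delete_resource would see them after 'kubectl get ... --no-headers=true'
names='onap-bar-db-0
onap-bars-api-5d9f
onap-bar-config-job'

release=onap-bar

# Old pattern: a plain prefix match also catches 'onap-bars-...'
old_count=$(printf '%s\n' "$names" | grep -c "^${release}")

# New pattern: the release name must be followed by a literal '-'
new_count=$(printf '%s\n' "$names" | grep -c "^${release}[-]")

printf 'old pattern: %s matches, new pattern: %s matches\n' "$old_count" "$new_count"
# prints: old pattern: 3 matches, new pattern: 2 matches
```

The `[-]` bracket expression simply matches a literal '-'. A side effect of the stricter pattern is that a resource named exactly like the release, with no suffix, is no longer matched by this fallback.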
Issue-ID: OOM-2074
Change-Id: I54958d6e97febbda461bfb68f3829b002e7200b4
Signed-off-by: Petr Ospalý <p.ospaly@partner.samsung.com>
diff --git a/tools/helm-healer.sh b/tools/helm-healer.sh
index 0ec7a3d..0bfe013 100755
--- a/tools/helm-healer.sh
+++ b/tools/helm-healer.sh
@@ -54,72 +54,84 @@
(-D|--delete-all)]
[-C|--clean-only]
- Usage 1 (simple heuristics - redeploy failed components):
+EXAMPLES
+
+ Usage 1: (simple heuristics - redeploy failed components):
${CMD} -n onap -f /some/override1.yml -s /dockerdata-nfs
- Usage 2 (redeploy ONLY explicit listed components):
+ Usage 2: (redeploy ONLY explicitly listed components):
${CMD} -n onap -f /some/override1.yml -s /dockerdata-nfs \\
-c onap-aaf -c onap-sdc -c onap-portal
- Usage 3 (delete EVERYTHING and redeploy):
- ${CMD} -n onap -f /some/override1.yml -s /dockerdata-nfs \\
- --delete-all
+ Usage 3: (delete EVERYTHING and redeploy):
+ ${CMD} -n onap -f /some/override1.yml -s /dockerdata-nfs --delete-all
- Usage 4 (just clean - do not redeploy)
- ${CMD} -n onap -f /some/override1.yml -s /dockerdata-nfs \\
- --delete-all --clean-only
+ Usage 4: (delete EVERYTHING and DO NOT redeploy - just clean the environment):
+ ${CMD} -n onap -s /dockerdata-nfs --delete-all --clean-only
- Namespace argument and at least one override file are mandatory
- for this script to execute. Also you must provide path to the
- storage or explicitly request to not delete file storage of the
- component.
+NOTES
- Storage should be directory where persistent volume resides. It
- will work only if component created a persistent volume with the
- same filename as its release name. Otherwise no effect. The
- exception is when '--delete-all' is used - in that case all
- content of the storage is deleted (because ONAP is not consistent
- with the volume directory names - eg.: sdnc).
+ Namespace argument (always) and at least one override file (unless you use
+ '--clean-only') are mandatory for this script to execute. Also you must
+ provide the path to the storage ('--storage') OR explicitly request to not
+ delete the file storage of the component ('--no-storage-deletion').
- CAUTION 1: filename of an override file cannot contain whitespace!
- This is actually helm/onap deploy plugin issue which does not
- handle such files. So I dropped the more complicated version of
- this script when there is no reason to support something on what
- will helm deploy choke anyway.
+ The storage should be the directory where the persistent volumes reside. It
+ will work only if the component created the persistent volume with the same
+ filename as its release name. Otherwise no files are deleted. The exception
+ is when '--delete-all' is used - in that case all content of the storage is
+ deleted (because ONAP is not consistent with the volume directory names
+ - e.g.: sdnc).
- '--prefix' option is helm release argument - it is actually prefix
- when you list the helm releases - helm is little confusing here.
+ '--file' can be used multiple times to pass override files on to helm. The
+ order is significant: if two override files set the same value, the last
+ one wins. This option is ignored if '--clean-only' is used.
- CAUTION 2: By default release prefix is 'onap' - if you deployed
- release 'onap' and now run this script with different prefix then
- it will skip all 'onap-*' components and will deploy a new release
- with new prefix - BEWARE TO USE PROPER RELEASE PREFIX!
+ CAUTION 1: the filename of an override file cannot contain whitespace! This
+ is actually an issue of the helm/onap deploy plugin, which does not handle
+ such files. So I dropped the more complicated version of this script: there
+ is no reason to support something that helm deploy will choke on anyway.
- Timeout set the waiting time for helm deploy per component.
+ The '--prefix' option is the helm release argument - it is actually the
+ prefix you see when you list the helm releases - helm is a little confusing
+ here.
- '--component' references to release name of the chart which you
- want to redeploy excplicitly - otherwise 'ALL FAILED' components
- will be redeployed. You can target more than one component at once
- - just use the argument multiple times.
+ CAUTION 2: By default the release prefix is 'onap' - if you deployed release
+ 'onap' and now run this script with a different prefix, then it will skip
+ all 'onap-*' components and will deploy a new release with the new prefix -
+ BEWARE TO USE THE PROPER RELEASE PREFIX!
- Component option is mutually exclusive with the '--delete-all'
- which will delete all components - healthy or not. Actually it will
- delete the whole NAMESPACE and everything in it.
+ Timeout sets the waiting time for helm deploy per component.
- '--clean-only' can be used with any usage: heuristics, explicit
- component list or with '--delete-all'. It basically just skips the
- last step - the actual redeploy.
+ '--component' refers to the release name of the chart which you want to
+ redeploy explicitly - otherwise 'ALL FAILED' components will be
+ redeployed. You can target more than one component at once - just use the
+ argument multiple times.
+
+ The component option is mutually exclusive with '--delete-all', which will
+ delete all components - healthy or not. Actually it will delete the whole
+ NAMESPACE and everything in it. To be sure, it will also clean up all unused
+ docker images and volumes on all kubernetes worker nodes.
+
+ '--clean-only' can be used with any usage: heuristics, explicit component
+ list or with '--delete-all'. It basically just skips the last step - the
+ actual redeploy.
EOF
}
+use_help()
+{
+ printf 'Try help: %s --help\n' "${CMD}"
+}
+
msg()
{
- echo -e "${COLOR_ON_GREEN}INFO: $@ ${COLOR_OFF}"
+ printf "${COLOR_ON_GREEN}INFO: %s ${COLOR_OFF}\n" "$*"
}
error()
{
- echo -e "${COLOR_ON_RED}ERROR: $@ ${COLOR_OFF}"
+ printf "${COLOR_ON_RED}ERROR: %s ${COLOR_OFF}\n" "$*"
}
# remove all successfully completed jobs
@@ -211,7 +223,7 @@
# this is due to missing "release" label in some pods
# grep for the rescue...
kubectl get ${_resource} -n ${NAMESPACE} \
- --no-headers=true | grep "^${_release}"
+ --no-headers=true | grep "^${_release}[-]"
} | awk '{print $1}' | sort -u | while read -r _name _rest ; do
echo "Deleting '${_name}'"
kubectl delete ${_resource} -n ${NAMESPACE} \
@@ -276,13 +288,53 @@
fi
}
+docker_cleanup()
+{
+ _nodes=$(kubectl get nodes \
+ --selector=node-role.kubernetes.io/worker \
+ -o wide \
+ --no-headers=true | \
+ awk '{print $6}')
+
+ if [ -z "$_nodes" ] ; then
+ error "Could not list kubernetes nodes - SKIPPING docker cleanup"
+ return
+ fi
+
+ for _node in $_nodes ; do
+ msg "Docker cleanup on $_node"
+ {
+ ssh -T $_node >/dev/null <<EOF
+if which docker >/dev/null ; then
+ docker system prune --force --all --volumes
+fi
+EOF
+ } &
+ done
+
+ msg "Waiting for docker cleanup to finish on all nodes..."
+ wait
+}
+
# arg: <release name>
-redeploy_component()
+undeploy_component()
{
_chart=$(echo "$1" | sed 's/[^-]*-//')
helm_undeploy ${1}
+
+ # to list all kubernetes resource types run: kubectl api-resources
# TODO: does deleted secret per component break something?
- for x in jobs deployments pods pvc pv ; do
+ for x in jobs \
+ deployments \
+ services \
+ replicasets \
+ statefulsets \
+ daemonsets \
+ pods \
+ pvc \
+ pv \
+ ;
+ do
delete_resource ${x} ${1}
done
@@ -290,10 +342,15 @@
msg "Persistent volume data deletion in directory: ${VOLUME_STORAGE}/${1}"
delete_storage "$1"
fi
+}
+# arg: <release name>
+deploy_component()
+{
# TODO: until I can verify that this does the same for this component as helm deploy
#msg "Redeployment of the component ${1}..."
#helm install "local/${_chart}" --name ${1} --namespace ${NAMESPACE} --wait --timeout ${HELM_TIMEOUT}
+ error "NOT IMPLEMENTED"
}
@@ -334,6 +391,7 @@
--no-storage-deletion)
if [ -n "$arg_storage" ] ; then
error "Usage of storage argument together with no storage deletion option!"
+ use_help
exit 1
elif [ -z "$arg_nostorage" ] ; then
arg_nostorage=nostorage
@@ -344,6 +402,7 @@
-c|--component)
if [ -n "$arg_deleteall" ] ; then
error "'Delete all components' used already - argument mismatch"
+ use_help
exit 1
fi
state=component
@@ -351,6 +410,7 @@
-D|--delete-all)
if [ -n "$arg_components" ] ; then
error "Explicit component(s) provided already - argument mismatch"
+ use_help
exit 1
elif [ -z "$arg_deleteall" ] ; then
arg_deleteall=deleteall
@@ -370,6 +430,7 @@
;;
*)
error "Unknown parameter: $1"
+ use_help
exit 1
;;
esac
@@ -380,12 +441,14 @@
state=nil
else
error "Duplicit argument for namespace!"
+ use_help
exit 1
fi
;;
override)
if ! [ -f "$1" ] ; then
error "Wrong filename for override file: $1"
+ use_help
exit 1
fi
arg_overrides="${arg_overrides} -f $1"
@@ -401,6 +464,7 @@
state=nil
else
error "Duplicit argument for release prefix!"
+ use_help
exit 1
fi
;;
@@ -408,24 +472,28 @@
if [ -z "$arg_timeout" ] ; then
if ! echo "$1" | grep -q '^[0-9]\+$' ; then
error "Timeout must be an integer: $1"
+ use_help
exit 1
fi
arg_timeout="$1"
state=nil
else
error "Duplicit argument for timeout!"
+ use_help
exit 1
fi
;;
storage)
if [ -n "$arg_nostorage" ] ; then
error "Usage of storage argument together with no storage deletion option!"
+ use_help
exit 1
elif [ -z "$arg_storage" ] ; then
arg_storage="$1"
state=nil
else
error "Duplicit argument for storage!"
+ use_help
exit 1
fi
;;
@@ -433,18 +501,19 @@
shift
done
-# sanity check
+# sanity checks
+
if [ -z "$arg_namespace" ] ; then
error "Missing namespace"
- help
+ use_help
exit 1
else
NAMESPACE="$arg_namespace"
fi
-if [ -z "$arg_overrides" ] ; then
- error "Missing override file(s)"
- help
+if [ -z "$arg_overrides" ] && [ -z "$arg_cleanonly" ] ; then
+ error "Missing override file(s) or use '--clean-only'"
+ use_help
exit 1
else
OVERRIDES="$arg_overrides"
@@ -462,6 +531,7 @@
VOLUME_STORAGE="$arg_storage"
elif [ -z "$arg_nostorage" ] ; then
error "Missing storage argument! If it is intended then use '--no-storage-deletion' option"
+ use_help
exit 1
fi
@@ -490,6 +560,10 @@
# we will delete the whole namespace
delete_namespace
+ # we will clean up docker on each worker node
+ docker_cleanup
+
+ # we will delete the content of storage (volumes)
if [ -n "$VOLUME_STORAGE" ] ; then
delete_storage
fi
@@ -511,7 +585,7 @@
for _component in ${_COMPONENTS} ; do
if echo "$_component" | grep -q "^${RELEASE_PREFIX}-" ; then
msg "Redeploy component: ${_component}"
- redeploy_component ${_component}
+ undeploy_component ${_component}
else
error "Component release name '${_component}' does not match release prefix: ${RELEASE_PREFIX} (SKIP)"
fi