DWS On CSM: Installation and use of DWS on a CSM cluster
Install DWS
DWS Dependencies
Install the DWS dependencies prior to installing DWS. DWS requires kube-rbac-proxy to be present in the cluster's container registry. In the past, we asked that a worker node be labelled with `cray.wlm.manager`; DWS no longer uses this label, and you may remove it.
Retrieve DWS Configuration and Container Image
The DWS configuration is in the DWS repo. We do not need to build DWS here, but we do need the configuration files from the repo: the DWS CRDs, Deployment, ServiceAccount, Roles and bindings, Services, and Secrets.
DWS_VER=0.0.11
git clone --branch v$DWS_VER https://github.com/DataWorkflowServices/dws.git
cd dws
Get the DWS container image corresponding to this repo. This must be made present in the cluster's container registry.
podman run --rm --network host quay.io/skopeo/stable:v1.4.1 copy --dest-tls-verify=false docker://ghcr.io/dataworkflowservices/dws:$DWS_VER docker://registry.local/dws:$DWS_VER
Deploy DWS to the cluster
Deploy DWS to the cluster. Set `IMAGE_TAG_BASE` to point at the DWS image in the cluster's container registry from the previous step. Set `OVERLAY=csm` to pick the configuration for CSM clusters.
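A minimal sketch of the deploy step, run from the dws checkout. It assumes the repo's Makefile provides a deploy target that honors the IMAGE_TAG_BASE, VERSION, and OVERLAY variables, so verify the variable and target names against the Makefile in your checkout:

```bash
# Hypothetical invocation: adjust variable names to match the Makefile in your dws checkout.
IMAGE_TAG_BASE=registry.local/dws VERSION=$DWS_VER OVERLAY=csm make deploy
```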
Wait for the deployment and webhook to become ready.
kubectl wait deployment --timeout=120s -n dws-operator-system dws-operator-controller-manager --for condition=Available=True
kubectl wait deployment --timeout=120s -n dws-operator-system dws-operator-webhook --for condition=Available=True
To undeploy DWS, after all Workflow resources have been deleted:
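A minimal sketch, assuming the same Makefile also provides an undeploy target:

```bash
# Hypothetical invocation: verify the target name against the Makefile in your dws checkout.
OVERLAY=csm make undeploy
```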
Configure Slurm for DWS
Slurm provides an API for burst buffer plugins that it uses to talk to a workload manager (WLM). The plugin is written in the Lua programming language and is placed in the same directory that holds the rest of the Slurm configuration files. This plugin contains the logic required to communicate with DWS.
This project will install two files into the Slurm pod's `/etc/slurm` directory and will update the main Slurm configuration file to enable the plugin. The first will be the Lua plugin script, named `burst_buffer.lua`, and the second will be a configuration file named `burst_buffer.conf`, which tells Slurm that job scripts containing a `#DW` directive should be run through the burst_buffer plugin.
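As a rough sketch of what that configuration conveys, assuming the Directive parameter described in Slurm's burst_buffer.conf documentation; the file shipped in the dws-slurm-bb-plugin repo is the authoritative version:

```
# Treat job-script lines that begin with #DW as burst buffer directives.
Directive=DW
```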
HPE Cray Programming Environment Installation Guide
The following instructions use section 10.3.2 in HPE Cray Programming Environment Installation Guide: CSM on HPE Cray EX Systems (22.10) S-8003.
Check out the dws-slurm-bb-plugin repo, which contains the Slurm burst buffer plugin and its configuration file.
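A sketch of the checkout, assuming the plugin repo lives in the DataWorkflowServices GitHub organization alongside dws; adjust the URL or branch to match your environment:

```bash
git clone https://github.com/DataWorkflowServices/dws-slurm-bb-plugin.git
```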
Update the Slurm Configuration Template
The changes to the Slurm pod's `/etc/slurm` directory are controlled through a ConfigMap. These instructions follow section 10.3.2 of that guide to update and activate the ConfigMap.
Get the slurm-config-templates ConfigMap.
kubectl get configmap -n services slurm-config-templates -o yaml > slurm-config-templates.yaml
Extract the `slurm.conf` file, update it to enable the burst buffer plugin, and write it back into the ConfigMap.
yq r slurm-config-templates.yaml 'data."slurm.conf"' > slurm.conf
echo "BurstBufferType=burst_buffer/lua" >> slurm.conf
yq w -i slurm-config-templates.yaml 'data."slurm.conf"' "$(cat slurm.conf)"
Add the `burst_buffer.lua` plugin script and `burst_buffer.conf` configuration file to the ConfigMap.
yq w -i slurm-config-templates.yaml -- 'data."burst_buffer.lua"' "$(cat dws-slurm-bb-plugin/src/burst_buffer/burst_buffer.lua)"
yq w -i slurm-config-templates.yaml 'data."burst_buffer.conf"' "$(cat dws-slurm-bb-plugin/src/burst_buffer/burst_buffer.conf)"
Apply the updated ConfigMap resource.
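For example, using the file edited above:

```bash
kubectl apply -f slurm-config-templates.yaml
```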
Restart the Slurm Configuration Job
The Slurm configuration job will process the ConfigMap into a second one that is included in the pod spec. The following steps to reconfigure Slurm are found in the HPE Cray Programming Environment Installation Guide.
JOBNAME=$(kubectl get job -n services | grep slurm-config | grep -v import | awk '{print $1}')
kubectl get job -n services -o yaml $JOBNAME > slurm-config.yaml
kubectl delete -f slurm-config.yaml
yq d -i slurm-config.yaml spec.template.metadata
yq d -i slurm-config.yaml spec.selector
kubectl apply -f slurm-config.yaml
SLURMCTLD_POD=$(kubectl get pod -n user -lapp=slurmctld -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n user $SLURMCTLD_POD -c slurmctld -- scontrol reconfigure
After the configuration job has been restarted, wait about 30 seconds to see the new content appear inside the pod.
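One way to confirm the new content is to list the Slurm pod's configuration directory, reusing the $SLURMCTLD_POD variable set above:

```bash
kubectl exec -n user $SLURMCTLD_POD -c slurmctld -- ls /etc/slurm
```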
If you were only adding or modifying `burst_buffer.lua`, then Slurm is now ready for your jobs. If you were also modifying `slurm.conf` to set the `BurstBufferType`, or you were adding or modifying `burst_buffer.conf`, then you must also restart the Slurm deployments.
kubectl rollout restart deployment -n user slurmctld-backup
kubectl wait deployment --timeout=60s -n user --for condition=Available=True -l app=slurmctld-backup
kubectl rollout restart deployment -n user slurmctld
kubectl wait deployment --timeout=60s -n user --for condition=Available=True -l app=slurmctld
Submit a test job
Use a UAN host to submit a test job. The following assumes a UAN host named `uan01` and a user account named `vers`.
Create the following example test job:
vers@uan01:/lus/vers> cat /tmp/dws-test
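The contents of the job script are site specific. The following is only a minimal sketch, assuming a `#DW` directive accepted by the DWS drivers on your system; substitute a directive that is valid for your site:

```bash
#!/bin/bash
#SBATCH --output=/lus/vers/slurm-%j.out
#DW <directive appropriate for the DWS drivers on your system>
srun hostname
```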
Note that we will submit the test job from a directory where our `vers` account can write its output files. The output file location can be controlled by the `sbatch` command or by an `#SBATCH` directive in the job script.
The `sbatch` command will tell you the ID of your job. You can use that to query the job's status. In this example the job's output will be in the file named `slurm-<ID>.out` in the same directory where the `sbatch` command was executed.
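For example, with the sketch script above saved as /tmp/dws-test, and using the job ID reported by sbatch:

```bash
vers@uan01:/lus/vers> sbatch /tmp/dws-test
vers@uan01:/lus/vers> squeue -j <ID>        # query the job's status
vers@uan01:/lus/vers> cat slurm-<ID>.out    # view the job's output after it completes
```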
Get the status of the workflow.
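A sketch of the status query, using a placeholder for the Workflow CRD's fully-qualified name (see the Troubleshooting section below for discovering it on your cluster):

```bash
kubectl get workflows.<api-group> -A
```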
The output will show a Workflow resource whose name includes the job ID.
Note that the workflow resource will no longer exist after Slurm has completed its teardown state for this job.
Canceling a test job
Use of the `scancel` command in states prior to PostRun will cause Slurm to pass the `hurry` flag to the teardown function in the `burst_buffer.lua` script, and the teardown function will set the flag in the Workflow resource. In the PostRun or DataOut states the `burst_buffer.lua` teardown function will not be passed the `hurry` flag because no work will be skipped. In all states, the `scancel --hurry` command will cause Slurm to pass the `hurry` flag to the teardown function.
The `scancel` command will cause states prior to PostRun to terminate immediately and proceed to the Teardown state. The use of `scancel --hurry` does not alter this behavior in these pre-PostRun states. During PostRun or DataOut, the `scancel` command does not cause early termination and does not skip the DataOut state. The use of `scancel --hurry` during PostRun or DataOut causes early termination, skipping DataOut in the case of PostRun, and proceeds to Teardown.
Consult the Slurm Burst Buffer Guide for further details on the use of `scancel` versus `scancel --hurry`.
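For reference, both forms take the job ID reported by sbatch; the job ID below is hypothetical:

```bash
scancel 1234          # before PostRun: terminate immediately and proceed to Teardown; during PostRun/DataOut: DataOut still runs
scancel --hurry 1234  # always passes the hurry flag; during PostRun/DataOut this skips DataOut
```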
Troubleshooting
Collect slurmctld logs
To collect the slurmctld logs:
SLURM_POD=$(kubectl get pods -n user -l app=slurmctld -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n user $SLURM_POD slurmctld
To pick out the messages from the `burst_buffer.lua` script:
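One approach is to filter the log for lines that mention the plugin; the exact log prefix depends on your Slurm version, so adjust the pattern as needed:

```bash
kubectl logs -n user $SLURM_POD slurmctld | grep -i -e lua -e burst_buffer
```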
Collect DWS logs
The DWS logs can be retrieved with the following:
DWS_POD=$(kubectl get pods -n dws-operator-system -l control-plane=controller-manager -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n dws-operator-system $DWS_POD -c manager
Inspect DWS Workflow resources
Inspect DWS Workflow resources by using the fully-qualified CRD name.
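The fully-qualified name can be discovered from the cluster, since the Workflow API group has differed across DWS releases; a sketch:

```bash
kubectl get crds | grep -i workflows     # discover the fully-qualified Workflow CRD name
kubectl get workflows.<api-group> -A     # list Workflow resources in all namespaces
```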
The output will show the job ID used in the name of the Workflow resource.
To get more detail about the workflow while it exists:
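A sketch, using the namespace and name from the listing above:

```bash
kubectl get workflows.<api-group> -n <namespace> <workflow-name> -o yaml
```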
Note that the Workflow resource will be deleted during the job's teardown state, at which time it will no longer appear in the workflow listing.