SEANK.H.LIAO

structured data scripting

handing structured data to scripts

scripting with structured data

GitOps and declarative management tools are all fine and good, until something messes up really badly and you need to run some imperative command across your fleet of machines/clusters/etc.

shell script it

The quick and easy way is to just loop over everything in a shell script (I call mine yolo), and this works well for a while. It's later, when you need to pull in more tools, that you start feeling the limits: you need 2 different tools but they have different cluster config names, you need extra data associated with each cluster, etc.

#!/bin/zsh

# do this for every cluster
run_one() {
  kubectl get pod
}

# list of clusters, using filenames each having their own kubeconfig file
# comment out clusters to skip
clusters=(
  staging
  dev
  prod1
  prod2
)

# loop over clusters, run_one for each
run_all() {
  for cluster in "${clusters[@]}"; do
    # change the cluster kubectl targets
    export KUBECONFIG="${HOME}/.config/kube/${cluster}"
    # print a pretty header (in bold) for each cluster to separate output
    printf "\n\033[1m%s==========\033[0m\n" "${cluster}"

    # run the things
    run_one
  done
}

run_all

shell faked array

So now you want to store structured data in a per-target way that is accessible to the script. Shells don't really have multidimensional arrays, but you can fake one by concatenating the keys together into a string, e.g. data[foo,bar] accesses an entry with the key foo,bar. But they're not the nicest thing to work with, or even a good way to specify the data.

#!/bin/zsh

# associative array, with "cluster,field" faked as a single key
declare -A data
data[staging,name]="staging-2"
data[staging,kubeconfig]=~/.config/kube/staging
data[staging,argoconfig]=~/.config/argo/staging
data[staging,suffix]="agdb"
# ...

# $1: cluster name
run_one() {
  export KUBECONFIG="${data[$1,kubeconfig]}"
  kubectl get pod "${data[$1,name]}-${data[$1,suffix]}"
}

shell with csv or json

My next idea was to store the data in CSV, but I couldn't find a good way to safely parse the data.
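A minimal sketch of the problem (the field values here are invented for illustration): naive splitting on commas with IFS tears apart any quoted field.

```shell
# a CSV row where the second field contains a quoted comma
line='staging,"staging, eu-west",~/.config/kube/staging'

# naive parse: split on every comma
IFS=',' read -r name desc kubeconfig <<< "$line"

# the quoted field is torn apart: desc becomes '"staging' and
# kubeconfig swallows the rest of the line, quotes and all
echo "desc=${desc}"
echo "kubeconfig=${kubeconfig}"
```

Handling quoting and escaping properly needs a real CSV parser, which the shell doesn't give you.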

Next came JSON, but issuing a hundred jq calls to get values didn't seem very efficient.
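One workaround I could have used (a sketch, with a hypothetical clusters.json; the field names are made up for illustration) is a single jq call emitting one tab-separated record per cluster, split back apart by the shell loop:

```shell
# hypothetical clusters.json holding per-cluster data
cat > /tmp/clusters.json <<'EOF'
[
  {"name": "staging-2", "kubeconfig": "~/.config/kube/staging", "suffix": "agdb"},
  {"name": "dev-3",     "kubeconfig": "~/.config/kube/dev",     "suffix": "hgcb"}
]
EOF

# one jq invocation for the whole file: one tab-separated line per cluster
jq -r '.[] | [.name, .kubeconfig, .suffix] | @tsv' /tmp/clusters.json |
  while IFS=$'\t' read -r name kubeconfig suffix; do
    echo "would run against ${name}-${suffix} with ${kubeconfig}"
  done
```

This avoids a jq process per value, but the data still lives in a separate file from the script, and tabs inside values would break it.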

cue wrapped shell script

So what if you specify the data in CUE, and use cue to execute the script it generates?

CUE the language has decent support for reshaping data, allowing you to write a compact representation and reshape it into a more machine-friendly, verbose format for consumption, all in the same file. And cue the tool has built-in support for user-provided commands.

So in a file called yolo_tool.cue:

package main

import (
	"tool/exec"
)

// list of targets, comment out to skip
run_instances: [
	"staging",
	"dev",
	// "prod1",
]

// how to specify subcommand execution in cue:
// loop over every instance,
// unifying with an exec.Run structure
command: yolo: task: {for _idx, _instance in run_instances {"\(_instance)": exec.Run & {
	// local reference to the data
	let _data = instance_data[_instance]
	// actual script goes here
	let _script = """
		export KUBECONFIG="\(_data.kubeconfig)"

		echo instance "\(_instance)"
		kubectl get pod "\(_data.name)-\(_data.suffix)"
		"""
	// script execution
	cmd: ["zsh", "-c", _script]
	// sequential runs by forcing a dependency on the previous entry
	if _idx > 0 {
		_req: task[run_instances[_idx-1]].success
	}
}}}

// compact form for data entry
_instance_data: {
	staging: ["staging-2", "~/.config/kube/staging", "~/.config/argo/staging-2", "agdb"]
	dev:     ["dev-3",     "~/.config/kube/dev",     "~/.config/argo/dev-3",     "hgcb"]
	prod1:   ["..."]
}

// reformat to verbose form
instance_data: {for _instance, _d in _instance_data {"\(_instance)": {
	name:       _d[0]
	kubeconfig: _d[1]
	argoconfig: _d[2]
	suffix:     _d[3]
}}}
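With the data and the tasks living in one file, running the whole fleet should then be a single invocation (assuming the file sits in a directory belonging to package main, and that the cue binary is installed):

```shell
# cue picks up user-defined commands from *_tool.cue files
# in the current package; yolo is the command defined above
cue cmd yolo
```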