For $reasons, I had been requesting and updating the TLS certs used in my Kubernetes cluster by hand. Specifically, I was using acme.sh to request wildcard certs from Let's Encrypt via a DNS challenge in GCP Cloud DNS.
One uneventful (so far) Saturday, I saw that my certs had a month and a half left, and decided to renew them. Having not written the process down, I searched backwards in shell history (zsh-substring-search is great) for the right command and got some new certs. At the same time, I thought, why not use Google Public CA to reduce the monoculture around Let's Encrypt?
$ gcloud alpha publicca external-account-keys create
$ acme.sh --register-account --email $EMAIL --server google --eab-kid $PUBLICCA_KID --eab-hmac-key $PUBLICCA_HMAC
$ acme.sh --server google --ecc --renew --force --dns dns_gcloud --domain '*.liao.dev' --domain '*.ihwa.liao.dev'
This gave me the usual directory of:
.acme.sh/
  *.liao.dev/
    *.liao.dev.cer
    *.liao.dev.conf
    *.liao.dev.csr
    *.liao.dev.key
    ca.cer
    fullchain.cer
Not remembering what I used last time, I used the cert *.liao.dev.cer and key *.liao.dev.key as the TLS key pair in a server (Envoy Gateway), and it worked, sort of. Chrome happily connected and verified the cert, but CLI tools like curl, openssl, and step-cli all failed to verify it:
$ curl https://ihwa.liao.dev
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

$ openssl s_client -connect ihwa.liao.dev:443 </dev/null
openssl s_client -connect 127.0.0.1:8443 -servername ihwa.liao.dev < /dev/null
CONNECTED(00000003)
depth=0 CN = *.liao.dev
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = *.liao.dev
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 CN = *.liao.dev
verify return:1
---
Certificate chain
 0 s:CN = *.liao.dev
   i:C = US, O = Google Trust Services LLC, CN = GTS CA 1P5
   a:PKEY: id-ecPublicKey, 256 (bit); sigalg: RSA-SHA256
   v:NotBefore: Nov 4 09:32:23 2023 GMT; NotAfter: Feb 2 09:32:22 2024 GMT
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEqDCCA5CgAwIBAgIRANZ6hF26ru42Dk1AEW+t7eUwDQYJKoZIhvcNAQELBQAw
RjELMAkGA1UEBhMCVVMxIjAgBgNVBAoTGUdvb2dsZSBUcnVzdCBTZXJ2aWNlcyBM
TEMxEzARBgNVBAMTCkdUUyBDQSAxUDUwHhcNMjMxMTA0MDkzMjIzWhcNMjQwMjAy
MDkzMjIyWjAVMRMwEQYDVQQDDAoqLmxpYW8uZGV2MFkwEwYHKoZIzj0CAQYIKoZI
zj0DAQcDQgAED+loglA3i/62NqohbPruCDQnjbtNiffzdMipYWrSBqzdgVE60aNn
zbsI8PFDhGI/lSHNxu6GXpY0XUu4GKdSm6OCAoswggKHMA4GA1UdDwEB/wQEAwIH
gDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwDAYDVR0TAQH/BAIwADAd
BgNVHQ4EFgQUFxukPjhCo6SqgU941B8UOgwJx9MwHwYDVR0jBBgwFoAU1fyeDd8e
yt0Il5duK8VfxSv17LgweAYIKwYBBQUHAQEEbDBqMDUGCCsGAQUFBzABhilodHRw
Oi8vb2NzcC5wa2kuZ29vZy9zL2d0czFwNS9Zcm9wWXhkZnlmNDAxBggrBgEFBQcw
AoYlaHR0cDovL3BraS5nb29nL3JlcG8vY2VydHMvZ3RzMXA1LmRlcjAmBgNVHREE
HzAdggoqLmxpYW8uZGV2gg8qLmlod2EubGlhby5kZXYwIQYDVR0gBBowGDAIBgZn
gQwBAgEwDAYKKwYBBAHWeQIFAzA8BgNVHR8ENTAzMDGgL6AthitodHRwOi8vY3Js
cy5wa2kuZ29vZy9ndHMxcDUvazRiRnFycUNBVkkuY3JsMIIBAwYKKwYBBAHWeQIE
AgSB9ASB8QDvAHYASLDja9qmRzQP5WoC+p0w6xxSActW3SyB2bu/qznYhHMAAAGL
meQV3QAABAMARzBFAiEAzJ7lwFWIIjzDNGMkPjryL3MWd2V1jkp2YYbFNsyOAI4C
IHjJ6a5gvz1p770j/+gB6PB9Qmd30922a2ylz2ZEGh6iAHUA7s3QZNXbGs7FXLed
tM0TojKHRny87N7DUUhZRnEftZsAAAGLmeQVuwAABAMARjBEAiBrJBSC0vkCyKhs
YZQnAFPvf5/W6i8PhjjF9yxVGXBdogIgJY0tSHO5j6qmgK8PtfdDJBw0tFSXuYJn
qv43QUazYEcwDQYJKoZIhvcNAQELBQADggEBAK6lM60o3cP6U7ahR+cbZE07JO/b
8dtrau0d89x8j8+d7/FIhmERzEgLlNGJzMliGxUuXu4RbBbV5U9DRkr2GnC+Pzyk
1qnpEOKdVQ7o7BzJ3AH/jtJMdJQ1dvaF8Z1NJZb0sj0lvUMoQt5DpSFFRzUO9U7l
Km72HxJFPG5JTjr6aYW5WDee/bHbL72hIgLCiUtub5iVPX7mZ2UCEeXU6wdZrK8v
ULpu/+vdY2yeHRdakC0DRY0qBSF+7zC9CWt4P8XRIXYj7c4zLdo9b2XXVod/Js8i
TII8ZJTUFedv0MOeHGN8ltE7gGjk4auwpFQ17a+CiuNrml8lVsUp9TRKJ5k=
-----END CERTIFICATE-----
subject=CN = *.liao.dev
issuer=C = US, O = Google Trust Services LLC, CN = GTS CA 1P5
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1551 bytes and written 379 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256
Server public key is 256 bit
This TLS version forbids renegotiation.
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol : TLSv1.3
    Cipher : TLS_AES_128_GCM_SHA256
    Session-ID: B655E45C351E5376E391FAAE8D33FDE522A049AFCDA9BEA16AA12646312385B6
    Session-ID-ctx:
    Resumption PSK: 9E5B00208476BC0CB3B19AB46DCD12266A25378AD3CD385D9243E5FA0B0541BD
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 604800 (seconds)
    TLS session ticket:
    0000 - 0d 5a 82 69 f6 2c 8d 43-62 4f ee 99 4b 01 16 51 .Z.i.,.CbO..K..Q
    0010 - 4f 3c 1d b0 ef d0 cb c4-26 1a 75 15 c5 10 58 84 O<......&.u...X.
    0020 - 9e a0 8b 97 b6 93 ba c1-30 c9 2e 45 24 95 f3 2a ........0..E$..*
    0030 - 11 71 e4 70 19 c6 20 23-8e 5b 5a 60 fa fa 01 be .q.p.. #.[Z`....
    0040 - e6 cc 7b 0e 73 92 62 cd-e8 2f f4 08 7e e5 0b d3 ..{.s.b../..~...
    0050 - 8b a0 10 8a 3d 76 dd 81-96 da af 06 54 60 7c 7e ....=v......T`|~
    0060 - 59 3d ce 31 bf ce a0 53-b5 Y=.1...S.

    Start Time: 1699218296
    Timeout : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK
DONE

$ step-cli certificate verify https://ihwa.liao.dev
failed to connect: tls: failed to verify certificate: x509: certificate signed by unknown authority
Now this was confusing, since I was pretty sure I was using the right certs. When I tested the certs locally with a simple Go HTTPS server, it logged the following, which was even more confusing since bad record MAC looked like an internal error.
2023/11/05 21:03:31 http: TLS handshake error from 127.0.0.1:38672: local error: tls: bad record MAC
2023/11/05 21:05:57 http: TLS handshake error from 127.0.0.1:40788: remote error: tls: bad certificate
Stepping back a bit, I tried to verify the certs directly, which wasn't much more successful:
$ openssl verify '*.liao.dev.cer'
CN = *.liao.dev
error 20 at 0 depth lookup: unable to get local issuer certificate
error ./tls.crt: verification failed

$ step-cli certificate verify '*.liao.dev.cer'
failed to verify certificate: x509: certificate signed by unknown authority
Then I thought, maybe I need to pass the CA file:
$ openssl verify -CAfile ca.cer '*.liao.dev.cer'
tls.crt: OK

$ step-cli certificate verify --roots ca.cer '*.liao.dev.cer'
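That's when it finally clicked: I needed to use the full chain cert (fullchain.cer) instead of just the leaf cert (*.liao.dev.cer). Chrome had presumably been papering over the missing intermediate (browsers can fetch or cache intermediates), while the CLI tools only had the leaf the server sent plus my local roots.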
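A quick way to tell a leaf-only file from a full chain is to count the PEM certificate blocks in it. The following is a minimal sketch, not part of my original debugging session; the function names count_pem_certs and is_full_chain are made up for illustration.

```shell
#!/bin/sh
# count_pem_certs FILE: print how many certificates a PEM file contains.
# A bare leaf cert yields 1; a full chain yields 2 or more.
count_pem_certs() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so mask the exit status for callers running under `set -e`.
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1" || true
}

# is_full_chain FILE: succeed only if FILE holds the leaf plus at least
# one more certificate (hypothetical helper name).
is_full_chain() {
    [ "$(count_pem_certs "$1")" -ge 2 ]
}

# Example usage (file names as produced by acme.sh above):
#   is_full_chain fullchain.cer    && echo "ok to serve"
#   is_full_chain '*.liao.dev.cer' || echo "leaf only, clients may fail to verify"
```

This only counts blocks; it does not check that the certificates actually chain to each other, which is what openssl verify is for.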
The actual process consisted of more mistakes, and my mind wandering to: are the root certs on my machine broken/out of date, did acme.sh mess up a cert somehow, and other weird ideas I don't remember.
Now that I finally had working certs, it was time to save them. I run my cluster via GitOps with the OSS version of Config Sync. For secrets, I use isindir/sops-secrets-operator. The workflow consists of creating a SopsSecret custom resource, then encrypting it with sops using sops -e -i file.yaml (in conjunction with the .sops.yaml config I have to specify keys).
apiVersion: isindir.github.com/v1alpha3
kind: SopsSecret
metadata:
  name: wildcard-google
  namespace: envoy-gateway-system
spec:
  secretTemplates:
    - name: wildcard-google
      type: kubernetes.io/tls
      stringData:
        tls.crt: |
          ...
        tls.key: |
          ...
        ca.crt: |
          ...
My .sops.yaml is set up to only encrypt the data parts, with 2 age keys: a local admin key and a remote server key.
creation_rules:
  - encrypted_regex: "^(data|stringData)"
    key_groups:
      - age:
          - age14mg08panez45c6lj2cut2l8nqja0k5vm2vxmv5zvc4ufqgptgy2qcjfmuu
          - age19q63k49upkgc03e8rsvm5c04x09vqvp2g5u2x6fjjap5awvq0u6q25z8xp
I had 2 pairs of cert/keys: from Let's Encrypt and from Google Public CA, which I pushed into git.
I noticed that the sops operator failed to decode the secret, and upon looking into why, I realized it wasn't encrypted. It wouldn't have been so bad if I didn't have a public mirror of my repo.
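In hindsight, a small check before pushing would have caught this. The sketch below is a heuristic I did not have at the time: it assumes that sops-encrypted YAML carries a top-level sops: metadata key and values of the form ENC[AES256_GCM,...], and the function names are invented for illustration.

```shell
#!/bin/sh
# is_sops_encrypted FILE: succeed if FILE looks like sops output.
# Heuristic only: checks for the top-level "sops:" metadata block
# and at least one ENC[AES256_GCM,...] value.
is_sops_encrypted() {
    grep -q '^sops:' "$1" && grep -q 'ENC\[AES256_GCM,' "$1"
}

# check_secrets FILE...: report every file that does not look encrypted,
# and return non-zero if any were found (hypothetical helper name).
check_secrets() {
    status=0
    for f in "$@"; do
        if ! is_sops_encrypted "$f"; then
            echo "not encrypted: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage, e.g. from a git pre-commit hook:
#   check_secrets secrets/*.yaml || exit 1
```

It won't detect a file that was only partially encrypted, but it would have flagged a SopsSecret that never went through sops -e at all.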
So now I have the fun task of revoking the exposed secrets. I had issued certs from Google Public CA first, then overwrote the data in acme.sh's config with a second set of certs from Let's Encrypt (since I was testing if it was just Google Trust Services certs that wouldn't verify earlier).
This meant acme.sh --revoke didn't want to work for the Google certs. So I went about downloading certbot, which has the option to revoke using a private key / cert pair:
$ sudo certbot revoke --cert-path tls.crt --key-path tls.key --reason keyCompromise --server https://dv.acme-v02.api.pki.goog/directory
Later I realized that because I had issued my second set of certs via acme.sh --renew --force, it kept the same private key. So my "unexposed" cert/key were actually exposed. This time I could use acme.sh:
$ acme.sh --revoke --ecc -d '*.liao.dev'
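Whether two certs share a key pair can be checked by comparing their public keys. This is a sketch of how I could have confirmed the reuse, assuming openssl is installed; pubkey_fingerprint and same_key are names I made up for it.

```shell
#!/bin/sh
# pubkey_fingerprint CERT: print a SHA-256 digest of the public key
# embedded in a PEM certificate. Fails if CERT cannot be parsed.
pubkey_fingerprint() {
    fp_pem=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    # `dgst -r` prints "<hex> *stdin"; keep only the hex digest.
    printf '%s\n' "$fp_pem" | openssl dgst -sha256 -r | cut -d' ' -f1
}

# same_key CERT_A CERT_B: succeed if both certs carry the same public key,
# which means they were issued for the same private key.
same_key() {
    fp_a=$(pubkey_fingerprint "$1") || return 2
    fp_b=$(pubkey_fingerprint "$2") || return 2
    [ "$fp_a" = "$fp_b" ]
}

# Example usage (hypothetical file names for the two issued certs):
#   same_key google.crt letsencrypt.crt && echo "same private key, revoke both"
```

If the two fingerprints match, leaking either key file compromises both certs, regardless of which CA issued them.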
Now I could start from scratch, and just remember to actually encrypt secrets. But I thought I might as well go through with the automation and set up cert-manager in my cluster. I had initially resisted because the last time I ran it was during its graduation to 1.0.0, when there were deprecations to work around, but now it's a much more stable project.
Again, I wanted certs from both Let's Encrypt and Google Public CA, and this time, I would test with staging certs first. Let's Encrypt was straightforward to set up, while GCP had a surprise hiding in a footnote: the EAB secret for staging needs to be generated separately, by switching the API endpoint in config rather than via flags (instructions).
$ gcloud config set api_endpoint_overrides/publicca https://preprod-publicca.googleapis.com/
$ gcloud publicca external-account-keys create
$ gcloud config unset api_endpoint_overrides/publicca
With this I could finally have my issuers set up:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: acme+letsencrypt@liao.dev
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-account
    solvers:
      - dns01:
          cloudDNS:
            project: ...
            serviceAccountSecretRef:
              name: gcp-cert-manager-sa
              key: key.json
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: google-staging
spec:
  acme:
    email: acme+google@liao.dev
    server: https://dv.acme-v02.test-api.pki.goog/directory
    privateKeySecretRef:
      name: google-staging-account
    externalAccountBinding:
      keyID: ...
      keySecretRef:
        name: gcp-publicca-staging
        key: b64MacKey
    solvers:
      - dns01:
          cloudDNS:
            project: ...
            serviceAccountSecretRef:
              name: gcp-cert-manager-sa
              key: key.json
And certs:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: google-staging
  namespace: envoy-gateway-system
spec:
  secretName: google-staging-tls
  duration: 720h # 30d
  renewBefore: 360h # 15d
  revisionHistoryLimit: 1
  subject:
    organizations:
      - seankhliao
  privateKey:
    rotationPolicy: Always
    algorithm: ECDSA
    size: 256
  dnsNames:
    - "*.liao.dev"
    - "*.ihwa.liao.dev"
  issuerRef:
    name: google-staging
    kind: ClusterIssuer
Repeat for production with the prod endpoints, and I was finally done for the day.
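When switching between staging and production issuers, it helps to confirm which CA actually signed the cert sitting in the secret. Below is a small sketch of such a check, assuming openssl is available; cert_summary is a name I made up, and the kubectl pipeline in the comment uses the secret name from the Certificate above.

```shell
#!/bin/sh
# cert_summary [FILE]: print the issuer and expiry date of a PEM cert.
# Reads FILE if given, otherwise standard input. When the input holds a
# chain, openssl x509 only looks at the first certificate (the leaf).
cert_summary() {
    openssl x509 -in "${1:-/dev/stdin}" -noout -issuer -enddate
}

# Example usage against the secret written by cert-manager:
#   kubectl -n envoy-gateway-system get secret google-staging-tls \
#       -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_summary
```

A staging cert should show a staging issuer name here; if it shows the production intermediate, the Certificate is pointing at the wrong issuer.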