messing up tls certs

doing things the hard way and making mistakes


tls certs the hard way

For $reasons, I had been requesting and renewing the TLS certs used in my Kubernetes cluster by hand. Specifically, I was using to request wildcard certs from Let's Encrypt via the DNS challenge in GCP Cloud DNS.

renewing certs by hand

One uneventful (so far) Saturday, I saw that my certs had a month and a half left, and decided to renew them. Since I hadn't written the process down, I searched backwards in shell history (zsh-substring-search is great) for the right command and got some new certs. At the same time, I thought: why not use Google Public CA, to reduce the monoculture around Let's Encrypt?

$ gcloud alpha publicca external-account-keys create
$ --register-account --email $EMAIL --server google --eab-kid $PUBLICCA_KID --eab-hmac-key $PUBLICCA_HMAC
$ --server google --ecc --renew --force --dns dns_gcloud --domain '*' --domain '*'

This gave me the usual directory of:
  *
    *
    *
    *
    *
    ca.cer
    fullchain.cer

using new certs

Not remembering what I used last time, I used the cert * and key * as the TLS key pair in a server (Envoy Gateway), and it worked, sort of. Chrome happily connected and verified the cert, but CLI tools like curl, openssl, and step-cli all failed to verify it:

$ curl
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here:
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

$ openssl -connect </dev/null
openssl s_client -connect -servername < /dev/null
depth=0 CN = *
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = *
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 CN = *
verify return:1
Certificate chain
 0 s:CN = *
   i:C = US, O = Google Trust Services LLC, CN = GTS CA 1P5
   a:PKEY: id-ecPublicKey, 256 (bit); sigalg: RSA-SHA256
   v:NotBefore: Nov  4 09:32:23 2023 GMT; NotAfter: Feb  2 09:32:22 2024 GMT
Server certificate
-----END CERTIFICATE-----
subject=CN = *
issuer=C = US, O = Google Trust Services LLC, CN = GTS CA 1P5
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
SSL handshake has read 1551 bytes and written 379 bytes
Verification error: unable to verify the first certificate
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256
Server public key is 256 bit
This TLS version forbids renegotiation.
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
Post-Handshake New Session Ticket arrived:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_128_GCM_SHA256
    Session-ID: B655E45C351E5376E391FAAE8D33FDE522A049AFCDA9BEA16AA12646312385B6
    Session-ID-ctx:
    Resumption PSK: 9E5B00208476BC0CB3B19AB46DCD12266A25378AD3CD385D9243E5FA0B0541BD
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 604800 (seconds)
    TLS session ticket:
    0000 - 0d 5a 82 69 f6 2c 8d 43-62 4f ee 99 4b 01 16 51   .Z.i.,.CbO..K..Q
    0010 - 4f 3c 1d b0 ef d0 cb c4-26 1a 75 15 c5 10 58 84   O<......&.u...X.
    0020 - 9e a0 8b 97 b6 93 ba c1-30 c9 2e 45 24 95 f3 2a   ........0..E$..*
    0030 - 11 71 e4 70 19 c6 20 23-8e 5b 5a 60 fa fa 01 be   .q.p.. #.[Z`....
    0040 - e6 cc 7b 0e 73 92 62 cd-e8 2f f4 08 7e e5 0b d3   ..{.s.b../..~...
    0050 - 8b a0 10 8a 3d 76 dd 81-96 da af 06 54 60 7c 7e   ....=v......T`|~
    0060 - 59 3d ce 31 bf ce a0 53-b5                        Y=.1...S.
    Start Time: 1699218296
    Timeout   : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: no
    Max Early Data: 0
read R BLOCK

$ step-cli certificate verify
failed to connect: tls: failed to verify certificate: x509: certificate signed by unknown authority

Now this was confusing, since I was pretty sure I was using the right certs. When I tested the certs locally with a simple Go HTTPS server, it logged the following, which was even more confusing since the bad record MAC was a local (server-side) error:

2023/11/05 21:03:31 http: TLS handshake error from local error: tls: bad record MAC
2023/11/05 21:05:57 http: TLS handshake error from remote error: tls: bad certificate

Stepping back a bit, I tried to verify the certs directly, which wasn't much more successful:

$ openssl verify '*'
CN = *
error 20 at 0 depth lookup: unable to get local issuer certificate
error ./tls.crt: verification failed

$ step-cli certificate verify '*'
failed to verify certificate: x509: certificate signed by unknown authority

Then I thought, maybe I need to pass the CA file:

$ openssl verify -CAfile ca.cer '*'
tls.crt: OK

$ step-cli certificate verify --roots ca.cer '*'

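That was when it finally clicked: I needed to use the full chain cert (fullchain.cer) instead of just the leaf cert, so the server would also send the intermediate (GTS CA 1P5) that clients need to build a path to a root they trust.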
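In hindsight the errors make sense: a client only sees the certs the server sends, and with just the leaf it has nothing linking it to a root it trusts. A self-contained Go sketch that reproduces this with a throwaway root, intermediate and leaf, all generated in memory (the example.com names are hypothetical stand-ins for the real ones). Verifying the leaf alone fails with the same "certificate signed by unknown authority" error step-cli reported, while adding the intermediate, as fullchain.cer does, succeeds.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// must panics on error; fine for a throwaway demo.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// issue signs tmpl (for public key pub) with signer, using parent as issuer.
// Passing tmpl as its own parent produces a self-signed root.
func issue(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, signer *ecdsa.PrivateKey) *x509.Certificate {
	der := must(x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signer))
	return must(x509.ParseCertificate(der))
}

// buildChain returns an in-memory root -> intermediate -> leaf chain,
// standing in for a public root, GTS CA 1P5 and the wildcard cert.
func buildChain() (root, intermediate, leaf *x509.Certificate) {
	now := time.Now()
	newTemplate := func(serial int64, cn string, isCA bool) *x509.Certificate {
		t := &x509.Certificate{
			SerialNumber:          big.NewInt(serial),
			Subject:               pkix.Name{CommonName: cn},
			NotBefore:             now.Add(-time.Hour),
			NotAfter:              now.Add(24 * time.Hour),
			BasicConstraintsValid: true,
			IsCA:                  isCA,
			KeyUsage:              x509.KeyUsageCertSign,
		}
		if !isCA {
			t.KeyUsage = x509.KeyUsageDigitalSignature
			t.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}
			t.DNSNames = []string{cn}
		}
		return t
	}
	rootKey := must(ecdsa.GenerateKey(elliptic.P256(), rand.Reader))
	interKey := must(ecdsa.GenerateKey(elliptic.P256(), rand.Reader))
	leafKey := must(ecdsa.GenerateKey(elliptic.P256(), rand.Reader))

	rootTmpl := newTemplate(1, "Example Root", true)
	root = issue(rootTmpl, rootTmpl, &rootKey.PublicKey, rootKey)
	intermediate = issue(newTemplate(2, "Example Intermediate", true), root, &interKey.PublicKey, rootKey)
	leaf = issue(newTemplate(3, "*.example.com", false), intermediate, &leafKey.PublicKey, interKey)
	return root, intermediate, leaf
}

// verify acts like a TLS client: it trusts only root and can use just the
// extra certs the server sent alongside the leaf.
func verify(leaf, root *x509.Certificate, sent ...*x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(root)
	intermediates := x509.NewCertPool()
	for _, c := range sent {
		intermediates.AddCert(c)
	}
	_, err := leaf.Verify(x509.VerifyOptions{
		DNSName:       "www.example.com",
		Roots:         roots,
		Intermediates: intermediates,
	})
	return err
}

func main() {
	root, intermediate, leaf := buildChain()
	// Serving only the leaf cert: no path from leaf to root.
	fmt.Println("leaf only: ", verify(leaf, root))
	// Serving fullchain.cer: leaf plus intermediate.
	fmt.Println("full chain:", verify(leaf, root, intermediate))
}
```

Browsers can cache or fetch missing intermediates on their own, which may be why Chrome was happy while the CLI tools weren't.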

The actual process involved more mistakes, and my mind wandering to: are the root certs on my machine broken or out of date, did mess up a cert somehow, and other weird ideas I don't remember.

exposing and revoking

Now that I finally had working certs, it was time to save them. I run my cluster via GitOps with the OSS version of Config Sync. For secrets, I use isindir/sops-secrets-operator. The workflow consists of creating a SopsSecret custom resource, then encrypting it in place with sops -e -i file.yaml (in conjunction with my .sops.yaml config, which specifies the keys).

kind: SopsSecret
metadata:
  name: wildcard-google
  namespace: envoy-gateway-system
spec:
  secretTemplates:
    - name: wildcard-google
      type:
      stringData:
        tls.crt: |
          ...
        tls.key: |
          ...
        ca.crt: |
          ...

My .sops.yaml only encrypts the data parts, using 2 age keys: a local admin key and a remote server key.

creation_rules:
  - encrypted_regex: "^(data|stringData)"
    key_groups:
      - age:
          - age14mg08panez45c6lj2cut2l8nqja0k5vm2vxmv5zvc4ufqgptgy2qcjfmuu
          - age19q63k49upkgc03e8rsvm5c04x09vqvp2g5u2x6fjjap5awvq0u6q25z8xp

I had 2 cert/key pairs, one from Let's Encrypt and one from Google Public CA, and I pushed both into git.

I noticed that the sops operator failed to decode the secret, and upon looking into why, I realized it wasn't encrypted. It wouldn't have been so bad if I didn't have a public mirror of my repo.
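A cheap guard against repeating this would be to refuse to commit SopsSecret files that don't look encrypted. A rough Go sketch, assuming it gets wired into something like a pre-commit hook (hypothetical, not part of my setup). It only looks for the top-level sops metadata key and the ENC[AES256_GCM,...] values that sops writes into encrypted YAML, so it is a heuristic rather than a real parse.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// looksEncrypted is a rough heuristic, not a sops API: sops-encrypted YAML
// has a top-level "sops:" metadata key and values like ENC[AES256_GCM,...].
func looksEncrypted(content string) bool {
	hasMetadata, hasCiphertext := false, false
	for _, line := range strings.Split(content, "\n") {
		// Top-level key: no indentation before "sops:".
		if strings.HasPrefix(line, "sops:") {
			hasMetadata = true
		}
		if strings.Contains(line, "ENC[AES256_GCM,") {
			hasCiphertext = true
		}
	}
	return hasMetadata && hasCiphertext
}

func main() {
	// Usage (hypothetical): go run . path/to/secret.yaml ...
	failed := false
	for _, path := range os.Args[1:] {
		b, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
		if !looksEncrypted(string(b)) {
			fmt.Fprintf(os.Stderr, "%s: does not look sops-encrypted\n", path)
			failed = true
		}
	}
	if failed {
		os.Exit(1)
	}
}
```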

So now I had the fun task of revoking the exposed secrets. I had issued certs from Google Public CA first, then overwrote the data in's config with a second set of certs from Let's Encrypt (since earlier I had been testing whether only the Google Trust Services certs failed to verify).

This meant --revoke didn't want to work. So I went and downloaded certbot, which can revoke a cert using its private key / cert pair:

$ sudo certbot revoke --cert-path tls.crt --key-path tls.key --reason keyCompromise --server

Later I realized that because I had issued my second set of certs via --renew --force, it had kept the same private key. So my "unexposed" cert/key pair was actually exposed too. This time I could use

$ --revoke --ecc -d '*'


Now I could start from scratch, and just remember to actually encrypt secrets. But I thought I might as well go through with the automation and set up cert-manager in my cluster. I had initially resisted because the last time I ran it was during its graduation to 1.0.0, when there were deprecations to work around, but now it's a much more stable project.

Again, I wanted certs from both Let's Encrypt and Google Public CA, and this time I would test with staging certs first. Let's Encrypt was straightforward to set up, while GCP had a surprise hiding in a footnote: the EAB secret for staging needs to be generated separately, by switching the API endpoint in the gcloud config rather than via flags (instructions).

$ gcloud config set api_endpoint_overrides/publicca
$ gcloud publicca external-account-keys create
$ gcloud config unset api_endpoint_overrides/publicca

With this, I could finally set up my issuers:

kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email:
    server:
    privateKeySecretRef:
      name: letsencrypt-staging-account
    solvers:
      - dns01:
          cloudDNS:
            project: ...
            serviceAccountSecretRef:
              name: gcp-cert-manager-sa
              key: key.json
---
kind: ClusterIssuer
metadata:
  name: google-staging
spec:
  acme:
    email:
    server:
    privateKeySecretRef:
      name: google-staging-account
    externalAccountBinding:
      keyID: ...
      keySecretRef:
        name: gcp-publicca-staging
        key: b64MacKey
    solvers:
      - dns01:
          cloudDNS:
            project: ...
            serviceAccountSecretRef:
              name: gcp-cert-manager-sa
              key: key.json

And certs:

kind: Certificate
metadata:
  name: google-staging
  namespace: envoy-gateway-system
spec:
  secretName: google-staging-tls
  duration: 720h # 30d
  renewBefore: 360h # 15d
  revisionHistoryLimit: 1
  subject:
    organizations:
      - seankhliao
  privateKey:
    rotationPolicy: Always
    algorithm: ECDSA
    size: 256
  dnsNames:
    - "*"
    - "*"
  issuerRef:
    name: google-staging
    kind: ClusterIssuer

Repeat for production with the prod endpoints, and I was finally done for the day.