
Runtime Fabric Cluster is Disconnected or Degraded and Showing "remote error: tls: unknown certificate"

Publication Date: Dec 23, 2024
Description

Prerequisites:

* Admin access to the RTF namespace in your Kubernetes (K8s) cluster.

* A Mac, Linux, or other compatible shell environment to edit and run the replace_cert.sh script.

Resolution

SYMPTOM

OpenShift Container Platform (OCP) or Bring Your Own Kubernetes (BYOK) RTF clusters are disconnected or degraded, and the Health Details section of the console shows the error message "remote error: tls: unknown certificate":

transport-layer.prod.cloudhub.io:443 (Updated at: 01/16/2024 at 8:22AM)
dialing websocket: failed to WebSocket dial: failed to send handshake request: Get "https://transport-layer.prod.cloudhub.io:443": remote error: tls: unknown certificate
configuration-resolver.prod.cloudhub.io:443 (Updated at: 01/16/2024 at 8:22AM)
Get "https://configuration-resolver.prod.cloudhub.io:443": remote error: tls: unknown certificate

If the cluster runs on OCP, the certificate-renewal CronJob cannot create job pods and reports this error:

Error creating: pods "certificate-renewal-28419840-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .containers[0].runAsUser: Invalid value: 2020: must be in the ranges: [1001060000, 1001069999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider restricted-seccomp: .containers[0].runAsUser: Invalid value: 2020: must be in the ranges: [1001060000, 1001069999], provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "pcap-dedicated-admins": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "splunkforwarder": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
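The decisive rejection in this message comes from the restricted-v2 SCC: the job's container requests runAsUser 2020, which is outside the namespace's allowed UID range, while the SCCs that would accept it (such as anyuid) are "not usable by user or serviceaccount". As a triage aid only, the following Python sketch splits a message like this into per-SCC reasons and extracts any UID-range failures. The function names are hypothetical, and the parsing assumes exactly the message format shown above; it is not part of RTF or OpenShift.

```python
import re

# One "provider <name>: <reason>" entry; some SCC names are quoted, some are not.
_PROVIDER = re.compile(r'provider "?([\w-]+)"?: (.*?)(?=, provider |\]$)')
# The UID-range rejection reported by restricted-style SCCs.
_UID_RANGE = re.compile(r"runAsUser: Invalid value: (\d+): must be in the ranges: \[(\d+), (\d+)\]")


def scc_rejections(message):
    """Map each SCC provider name to the reason it rejected the pod."""
    return {m.group(1): m.group(2) for m in _PROVIDER.finditer(message.strip())}


def uid_range_failures(message):
    """For SCCs that rejected the pod's runAsUser, return {name: (uid, low, high)}."""
    failures = {}
    for name, reason in scc_rejections(message).items():
        m = _UID_RANGE.search(reason)
        if m:
            uid, low, high = (int(g) for g in m.groups())
            failures[name] = (uid, low, high)
    return failures
```

For the error above, uid_range_failures reports restricted-v2 rejecting UID 2020 against the range 1001060000 to 1001069999.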

Even if you manage to get the certificate-renewal job to run, the job log shows this error:

time="2024-01-18T02:50:53Z" level=info msg="Skipping configuring HTTPS_PROXY since HTTP_PROXY empty"
time="2024-01-18T02:50:53Z" level=info msg="System time: 2024-01-18 02:50:53.786904878 +0000 UTC m=+0.382387216, certificate expiration: 2024-01-16 16:20:19 +0000 UTC"
time="2024-01-18T02:50:53Z" level=info msg="Days until expiration -1"
time="2024-01-18T02:50:53Z" level=info msg="Generating private key..."
time="2024-01-18T02:51:00Z" level=info msg="Generating CSR..."
time="2024-01-18T02:51:00Z" level=info msg="Current cerificate serial: af84d268344f61ecd4f2292141a4ba06"
time="2024-01-18T02:51:00Z" level=info msg="Requesting new certificate from https://anypoint.mulesoft.com/amc/registration-facade/api/v1/organizations/xxxxx4/agents/exxxx/renew..."
time="2024-01-18T02:51:00Z" level=warning msg="{\"timestamp\":\"2024-01-18T02:51:00.931+00:00\",\"status\":409,\"error\":\"Conflict\",\"message\":\"Certificate was already exipired, please renew cert!\",\"path\":\"/api/v1/organizations/xxx/agents/xxx/renew\"}"
time="2024-01-18T02:51:00Z" level=fatal msg="Failed to Rotate (409)"
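The two lines that matter here are "Days until expiration -1" and the 409 Conflict returned by the renew endpoint. If you need to check many job logs, a small parser such as the following Python sketch can pull both out. It assumes the text log format shown above (time="..." level=... msg="...", with \" escapes inside msg); the function names are hypothetical.

```python
import json
import re

# One log line: time="..." level=... msg="..." (msg may contain \" escapes).
_LINE = re.compile(r'^time="([^"]*)" level=(\w+) msg="((?:[^"\\]|\\.)*)"$')


def parse_line(line):
    """Return (time, level, msg) for a log line, or None if it does not match."""
    m = _LINE.match(line.strip())
    if not m:
        return None
    time, level, raw_msg = m.groups()
    # The msg value is a quoted string; json.loads undoes the \" escaping.
    return time, level, json.loads('"' + raw_msg + '"')


def diagnose(log_text):
    """Summarize days until expiration and any JSON error body from the renew call."""
    result = {"days_until_expiration": None, "renew_status": None, "renew_message": None}
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, level, msg = parsed
        if msg.startswith("Days until expiration"):
            result["days_until_expiration"] = int(msg.rsplit(" ", 1)[1])
        elif level == "warning" and msg.startswith("{"):
            try:
                body = json.loads(msg)
            except json.JSONDecodeError:
                continue
            result["renew_status"] = body.get("status")
            result["renew_message"] = body.get("message")
    return result
```

For the log above, diagnose reports -1 days until expiration and a 409 status, which is the signature of a certificate that has to be replaced manually.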


CAUSE

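The certificate-renewal job renews the RTF agent client certificate before it expires. On affected clusters, the job cannot run because a service account (rtf-certificate-renewal) is missing from the role binding, so the agent client certificate eventually expires. Once the client certificate has expired, the certificate-renewal job can no longer renew it: the renew request is rejected with 409 Conflict, as shown in the job log.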
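The renewal logic described above can be illustrated with the following Python sketch. It is not the job's actual code: the 30-day RENEW_WINDOW is a hypothetical value (this article does not say how early the job renews), and truncating the day count toward zero is an assumption that happens to reproduce the "Days until expiration -1" value in the job log.

```python
from datetime import datetime, timedelta, timezone

# HYPOTHETICAL renewal window; the article does not say how early the real job renews.
RENEW_WINDOW = timedelta(days=30)


def days_until_expiration(now, not_after):
    """Whole days left, truncated toward zero (about 34.5 hours past expiry gives -1)."""
    return int((not_after - now).total_seconds() / 86400)


def renewal_action(now, not_after):
    """Return "none", "renew", or "manual" for a client certificate at time `now`."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        # Expired: the renew endpoint rejects the request (409 Conflict in the job log),
        # so only a support-issued certificate bundle can restore the certificate.
        return "manual"
    if remaining <= RENEW_WINDOW:
        # Still valid and inside the window: a running certificate-renewal job can rotate it.
        return "renew"
    return "none"


if __name__ == "__main__":
    # Timestamps taken from the certificate-renewal job log.
    now = datetime(2024, 1, 18, 2, 50, 53, tzinfo=timezone.utc)
    not_after = datetime(2024, 1, 16, 16, 20, 19, tzinfo=timezone.utc)
    print(days_until_expiration(now, not_after), renewal_action(now, not_after))
```

Run directly, the sketch prints "-1 manual": the certificate is past its expiration, so automatic renewal is no longer possible.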

SOLUTION

Step 1: Manually renew the expired certificate

1) Open a case with MuleSoft Support to request a new certificate bundle for each affected RTF cluster. Each cluster needs its own certificate, generated for that cluster's ID, so specify the organization and cluster name in the case.
2) The MuleSoft support engineer uploads a certificate bundle (a tar.gz file) to the case. The bundle contains three files:

   1. agent.key
   2. curl-response
   3. replace_cert.sh
3) Download the bundle to a host that can connect to the impacted cluster. Because each cluster needs a different bundle, make sure you use the correct bundle for each cluster.
4) You can check the current RTF certificate expiration by running "rtfctl status".
5) If your RTF namespace is not "rtf", edit the "replace_cert.sh" file and replace the "rtf" value in the "NAMESPACES" variable with your namespace.
6) Run "replace_cert.sh" from the directory that contains the other bundle files.
7) The script renews the certificates and resets the RTF components. The cluster should reconnect within about 5 minutes.
8) If you see any errors during this process, contact MuleSoft Support.
9) Run "rtfctl status" to confirm the new expiration date of the certificate you just installed.
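Before running the script, you may want to confirm the bundle contents (step 2) and apply the namespace change (step 5) programmatically. The following Python sketch is a hypothetical helper, not a MuleSoft tool. It assumes the bundle is a gzip-compressed tar archive containing the three files listed in step 2, and that replace_cert.sh sets its namespaces on a line beginning with NAMESPACES=; check the actual script before relying on the edit.

```python
import re
import tarfile

# The three files listed in step 2.
EXPECTED_FILES = {"agent.key", "curl-response", "replace_cert.sh"}


def missing_files(member_names):
    """Return the expected files not present (by base name) among the archive member names."""
    present = {name.rstrip("/").rsplit("/", 1)[-1] for name in member_names}
    return EXPECTED_FILES - present


def check_bundle(path):
    """Open a .tar.gz bundle and return any expected files it is missing."""
    with tarfile.open(path, "r:gz") as tar:
        return missing_files(tar.getnames())


def set_namespace(script, namespace):
    """Replace the standalone value rtf on the NAMESPACES= line (ASSUMED format) with namespace."""
    lines = []
    for line in script.splitlines(keepends=True):
        if line.lstrip().startswith("NAMESPACES="):
            # Lookarounds keep values such as "rtf-test" from being partially rewritten.
            line = re.sub(r"(?<![\w-])rtf(?![\w-])", lambda _m: namespace, line)
        lines.append(line)
    return "".join(lines)
```

An empty result from check_bundle means all three files are present; it cannot tell you whether the bundle belongs to the right cluster, so still match bundles to clusters as described in step 3.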

Step 2: Add the missing service account

To ensure that the certificate-renewal job can renew the certificate automatically next time, reapply the role binding template from step 5 (https://docs.mulesoft.com/runtime-fabric/latest/install-openshift#before-you-begin-2). The earlier version of the template omitted the rtf-certificate-renewal service account, which is why the certificate-renewal job failed to run. The template now includes this subject:

...
  - kind: ServiceAccount
    name: rtf-certificate-renewal
    namespace: <rtf_namespace>
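To confirm that the reapplied binding now lists the service account, you can fetch it as JSON, for example with kubectl get rolebinding <binding_name> -n <rtf_namespace> -o json (use clusterrolebinding instead if that is what the template creates; the binding's name and kind come from the template and are placeholders here), and check its subjects. The following Python sketch performs that check; the script and function names are hypothetical.

```python
import json
import sys

SERVICE_ACCOUNT = "rtf-certificate-renewal"


def has_renewal_subject(binding_json, namespace):
    """True if the binding's subjects include the rtf-certificate-renewal ServiceAccount in `namespace`."""
    subjects = json.loads(binding_json).get("subjects") or []
    return any(
        s.get("kind") == "ServiceAccount"
        and s.get("name") == SERVICE_ACCOUNT
        and s.get("namespace") == namespace
        for s in subjects
    )


if __name__ == "__main__":
    # Usage: kubectl get rolebinding <binding_name> -n <rtf_namespace> -o json | python3 check_binding.py <rtf_namespace>
    print(has_renewal_subject(sys.stdin.read(), sys.argv[1]))
```

If this prints False after reapplying the template, the certificate-renewal job will still be unable to run.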

Knowledge Article Number

001117285

 
Salesforce Help | Article