:_mod-docs-content-type: ASSEMBLY
[id="troubleshooting-key-manager-hsm-adoption_{context}"]
= Troubleshooting Key Manager HSM adoption

:context: troubleshooting-hsm

[role="_abstract"]
Review troubleshooting guidance for common issues that you might encounter while you perform the HSM-enabled Key Manager (Barbican) service adoption.

== Resolving configuration validation failures

[role="_abstract"]
If the adoption fails with validation errors about placeholder values, replace the placeholder values with the configuration values for your environment.

Validation errors about placeholder values look similar to the following example:

----
TASK [Validate all required variables are set] ****
fatal: [localhost]: FAILED! => {
    "msg": "Required variable proteccio_certs_path contains placeholder value."
}
----

.Procedure

. Edit your hardware security module (HSM) configuration in the Zuul job vars or CI framework configuration file.

. Replace all placeholder values with actual configuration values for your environment. Check the following key variables:
+
----
cifmw_hsm_password:
cifmw_barbican_proteccio_partition:
cifmw_barbican_proteccio_mkek_label:
cifmw_barbican_proteccio_hmac_label:
cifmw_hsm_proteccio_client_src:
cifmw_hsm_proteccio_conf_src:
----

. Verify that no placeholder values remain in your configuration.

== Resolving missing HSM file prerequisites

[role="_abstract"]
If the adoption fails because hardware security module (HSM) certificates or client software cannot be found, update your configuration to point to the correct locations of the files.
The error looks similar to the following example:

----
TASK [Validate Proteccio prerequisites exist] ****
fatal: [localhost]: FAILED! => {
    "msg": "Proteccio client ISO not found: /opt/proteccio/Proteccio3.06.05.iso"
}
----

.Procedure

. Verify that all required HSM files are accessible from the configured URLs:
+
.Example
[source,bash]
----
$ curl -I https://your-server/path/to/Proteccio3.06.05.iso
$ curl -I https://your-server/path/to/proteccio.rc
$ curl -I https://your-server/path/to/client.crt
$ curl -I https://your-server/path/to/client.key
----

. If the files are in different locations, update the URL variables in your configuration:
+
.Example
[source,yaml]
----
cifmw_hsm_proteccio_client_src: "https://correct-server/path/to/Proteccio3.06.05.iso"
cifmw_hsm_proteccio_conf_src: "https://correct-server/path/to/proteccio.rc"
cifmw_hsm_proteccio_client_crt_src: "https://correct-server/path/to/client.crt"
cifmw_hsm_proteccio_client_key_src: "https://correct-server/path/to/client.key"
----

. Check the network connectivity and authentication to ensure that the URLs are accessible from the CI environment.

== Resolving source environment connectivity issues

[role="_abstract"]
If the adoption cannot connect to the source {rhos_prev_long} environment to extract the configuration, check your SSH connectivity to the source Controller node and update the configuration if needed.

The error for this issue looks similar to the following example:

----
TASK [detect source environment HSM configuration] ****
fatal: [localhost]: FAILED! => {
    "msg": "SSH connection to source environment failed"
}
----

.Procedure

. Verify SSH connectivity to the source Controller node:
+
[source,bash]
----
$ ssh -o StrictHostKeyChecking=no tripleo-admin@controller-0.ctlplane
----

. Update the `controller1_ssh` variable if needed:
+
----
controller1_ssh: "ssh -o StrictHostKeyChecking=no tripleo-admin@"
----

. Ensure that the SSH keys are properly configured for passwordless access.
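
The file and SSH checks in the preceding procedures can be scripted so that they run without prompts before you retry the adoption. The following is a minimal sketch, not part of the adoption tooling: the function names `check_urls` and `check_passwordless_ssh` and the `CURL_BIN` and `SSH_BIN` overrides are hypothetical and exist only for illustration. It assumes that `curl` and `ssh` are installed on the host that runs the adoption.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the adoption tooling.
# CURL_BIN and SSH_BIN are illustrative overrides for dry runs.

# Return 0 only if every URL answers a HEAD request successfully.
check_urls() {
    local url rc=0
    for url in "$@"; do
        # -f: fail on HTTP errors, -sS: quiet but still print errors,
        # -I: send a HEAD request only, --max-time: bound the wait.
        if "${CURL_BIN:-curl}" -fsSI --max-time 10 "$url" >/dev/null; then
            echo "OK: ${url}"
        else
            echo "MISSING: ${url}"
            rc=1
        fi
    done
    return "$rc"
}

# Return 0 only if key-based login works without any prompt.
check_passwordless_ssh() {
    local target="$1"
    # BatchMode=yes makes ssh fail instead of asking for a password,
    # so a successful run confirms passwordless access.
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 \
        -o StrictHostKeyChecking=no "$target" true; then
        echo "OK: passwordless SSH to ${target}"
    else
        echo "FAIL: passwordless SSH to ${target}"
        return 1
    fi
}
```

For example, after sourcing the script you might run `check_urls` with the four URLs from your configuration, followed by `check_passwordless_ssh tripleo-admin@controller-0.ctlplane`. A non-zero exit status from either function indicates which prerequisite to fix first.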
== Resolving HSM secret creation failures

[role="_abstract"]
If hardware security module (HSM) secrets cannot be created in the target environment, you might need to update the secret names in your source configuration file.

The error looks similar to the following example:

----
TASK [Create HSM secrets in target environment] ****
fatal: [localhost]: FAILED! => {
    "msg": "Failed to create secret proteccio-data"
}
----

.Procedure

. Verify target environment access:
+
[source,bash]
----
$ export KUBECONFIG=/path/to/.kube/config
$ oc get secrets -n openstack
----

. Check if secrets already exist:
+
[source,bash]
----
$ oc get secret proteccio-data hsm-login -n openstack
----

. If secrets exist with different names, update the configuration variables:
+
[source,yaml]
----
proteccio_login_secret_name: "your-hsm-login-secret"
proteccio_client_data_secret_name: "your-proteccio-data-secret"
----

== Resolving custom image registry issues

[role="_abstract"]
If custom Barbican images cannot be pushed to or pulled from the configured registry, verify the registry authentication, test the image push permissions, and update the configuration as needed.

The error looks similar to the following example:

----
TASK [Create Proteccio-enabled Barbican images] ****
fatal: [localhost]: FAILED! => {
    "msg": "Failed to push image to registry"
}
----

.Procedure

. Verify registry authentication:
+
[source,bash]
----
$ podman login
----

. Test image push permissions:
+
[source,bash]
----
$ podman tag hello-world //test:latest
$ podman push //test:latest
----

. Update registry configuration variables if needed:
+
[source,yaml]
----
cifmw_update_containers_registry: "your-registry:5001"
cifmw_update_containers_org: "your-namespace"
cifmw_image_registry_verify_tls: false
----

== Resolving HSM back-end detection failures

[role="_abstract"]
If the adoption role cannot detect the hardware security module (HSM) configuration in the source environment, you must force the HSM adoption.

The task reports success but falls back to standard adoption. The output looks similar to the following example:

----
TASK [detect source environment HSM configuration] ****
ok: [localhost] => {
    "msg": "No HSM configuration found - using standard adoption"
}
----

.Procedure

. Manually verify that the HSM configuration exists in the source environment:
+
[source,bash]
----
$ ssh tripleo-admin@controller-0.ctlplane \
    "sudo grep -A 10 '\[p11_crypto_plugin\]' \
    /var/lib/config-data/puppet-generated/barbican/etc/barbican/barbican.conf"
----

. If HSM is configured but not detected, force HSM adoption by setting the `barbican_hsm_enabled` variable:
+
[source,yaml]
----
# In your Zuul job vars or CI framework configuration
barbican_hsm_enabled: true
----
+
This configuration ensures that the `barbican_adoption` role uses the HSM-enabled patch for {key_manager_first_ref} deployment.

== Resolving database migration issues

[role="_abstract"]
If hardware security module (HSM) metadata is not preserved during database migration, verify that the source database includes the HSM secrets and check the database migration logs for errors.

The task reports success but finds no HSM secrets. The output looks similar to the following example:

----
TASK [Verify database migration preserves HSM references] ****
ok: [localhost] => {
    "msg": "HSM secrets found in migrated database: 0"
}
----

.Procedure

. Verify that the source database contains the HSM secrets. The `key` column name is a reserved word in MySQL and MariaDB, so it is quoted with escaped backticks:
+
[source,bash]
----
$ ssh tripleo-admin@controller-0.ctlplane \
    "sudo mysql barbican -e 'SELECT COUNT(*) FROM secret_store_metadata WHERE \`key\`=\"plugin_name\" AND value=\"PKCS11\";'"
----

. Check the database migration logs for errors:
+
[source,bash]
----
$ oc logs deployment/barbican-api | grep -i migration
----

. If the migration failed, restore the database from backup and retry.

== Resolving service startup failures

[role="_abstract"]
If the {key_manager_first_ref} services fail to start after the hardware security module (HSM) configuration is applied, check the pod logs, the HSM library, and the HSM configuration in the pod.

A pod that fails to start looks similar to the following example:

----
$ oc get pods -l service=barbican
NAME               READY   STATUS   RESTARTS   AGE
barbican-api-xyz   0/1     Error    0          2m
----

.Procedure

. Check the pod logs for HSM connectivity issues:
+
[source,bash]
----
$ oc logs barbican-api-xyz
----

. Verify that the HSM library is accessible:
+
[source,bash]
----
$ oc exec barbican-api-xyz -- ls -la /usr/lib64/libnethsm.so
----

. Check the HSM configuration in the pod:
+
[source,bash]
----
$ oc exec barbican-api-xyz -- cat /etc/proteccio/proteccio.rc
----

== Resolving performance and connectivity issues

[role="_abstract"]
If the hardware security module (HSM) operations are slow or fail intermittently, check the HSM connectivity and monitor the HSM server logs.

.Procedure

. Test HSM connectivity from {key_manager_first_ref} pods:
+
[source,bash]
----
$ oc exec barbican-api-xyz -- pkcs11-tool --module /usr/lib64/libnethsm.so --list-slots
----

. Check HSM server connectivity:
+
[source,bash]
----
$ oc exec barbican-api-xyz -- nc -zv
----

. Monitor HSM server logs for authentication or capacity issues.

== Getting additional help

If issues persist after following this troubleshooting guide:

* Collect adoption logs and configuration for analysis.
* Check the HSM vendor documentation for vendor-specific troubleshooting.
* Verify HSM server status and connectivity independently.
* Review the adoption summary report for additional diagnostic information.
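
To make the log collection step repeatable, the cluster-side commands used in this guide can be bundled into one helper. The following is a minimal sketch under stated assumptions, not part of the adoption tooling: the function name `collect_barbican_diagnostics`, the default output directory name, and the `OC_BIN` override are hypothetical, and the `openstack` namespace is assumed from the `oc get secret` examples in this guide. It gathers only pod status, Key Manager API logs, and the names of the HSM secrets. The adoption playbook logs and the summary report must still be collected from wherever your CI framework stores them.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostics collector; not part of the adoption tooling.
# OC_BIN is an illustrative override so that the helper can be dry-run.
collect_barbican_diagnostics() {
    local outdir="${1:-barbican-hsm-diagnostics}"
    local ns="${2:-openstack}"   # assumed namespace, as in the oc examples
    local oc="${OC_BIN:-oc}"

    mkdir -p "$outdir"

    # Standard output and errors are both captured, so a failing command
    # still leaves evidence in the bundle; "|| true" keeps collection going.
    "$oc" get pods -n "$ns" -l service=barbican -o wide \
        > "$outdir/pods.txt" 2>&1 || true
    "$oc" logs -n "$ns" deployment/barbican-api \
        > "$outdir/barbican-api.log" 2>&1 || true
    # Lists the secrets only; secret contents are deliberately not exported.
    "$oc" get secret -n "$ns" proteccio-data hsm-login \
        > "$outdir/secrets.txt" 2>&1 || true

    # Pack everything into one archive and print its name.
    tar -czf "${outdir}.tar.gz" "$outdir"
    echo "${outdir}.tar.gz"
}
```

Calling `collect_barbican_diagnostics` with no arguments writes the files to `barbican-hsm-diagnostics/` in the current directory and prints the archive name. Review the archive for sensitive values before you share it, because pod logs can include configuration details.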