Troubleshooting
This chapter contains some miscellaneous troubleshooting tips for the embedded Ansible automation engine, and for the running playbooks themselves.
When troubleshooting why playbooks are not running as intended, it can sometimes be useful to confirm that the embedded Ansible automation engine - which is based on the AWX project - is running correctly. Although the Embedded Ansible server role can be started on several appliances in a region, it will only be active on one appliance.
The troubleshooting steps should be carried out on the appliance with the active Embedded Ansible role. The server with the currently active role can be determined from the Servers by Roles tab under Configuration -> Diagnostics (accordion) -> Region (see screenshot Servers by Roles in Region).
The AWX processes are managed by supervisord, and can be checked using supervisorctl as follows:
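As a minimal sketch, the output of supervisorctl status can be filtered so that only processes that are not in the RUNNING state are listed. The helper name list_not_running is hypothetical; it relies only on the standard supervisorctl status layout, in which the process name is the first column and the state is the second.

```shell
# list_not_running (hypothetical helper): reads "supervisorctl status" output
# on stdin and prints the name and state of each process that is not RUNNING.
# Usage on the appliance with the active role:
#   supervisorctl status | list_not_running
list_not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1, $2 }'
}
```

No output from the helper means that every process managed by supervisord is reported as RUNNING.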
The supervisord configuration is located in /etc/supervisord.d/tower.ini.
There are three systemd services associated with AWX, as follows:
The status of these three services can be seen using the command:
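A minimal sketch of a per-service check follows. It assumes that the three services are supervisord, nginx and rabbitmq-server (the services that the Rails running? method is described as checking); the list should be adjusted if the appliance differs. The function name check_awx_services is hypothetical.

```shell
# check_awx_services (hypothetical helper): reports whether each service is
# active, and returns non-zero if any of them is not.
# The service names are an assumption based on the Rails health check.
check_awx_services() {
    local failed=0 svc
    for svc in supervisord nginx rabbitmq-server; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            failed=1
        fi
    done
    return "$failed"
}
```

For any service reported as not active, systemctl status followed by the service name shows the most recent log lines for that unit.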
The health of AWX can be checked from Rails. The EmbeddedAnsible.new.running? method checks that the supervisord, nginx and rabbitmq-server services are running correctly.
The Rails EmbeddedAnsible.new.alive? method pings the Ansible server using the auto-configured credentials. This verifies that the credentials are set up correctly.
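Both methods can also be called without opening an interactive console. The sketch below assumes that the Rails application lives in /var/www/miq/vmdb (the directory that holds log/evm.log) and that bin/rails runner is available there; the function name awx_health and the VMDB_DIR override are hypothetical.

```shell
# awx_health (hypothetical helper): runs both Rails health checks in one go.
# VMDB_DIR is an assumed override for the application directory.
awx_health() {
    (
        cd "${VMDB_DIR:-/var/www/miq/vmdb}" || exit 1
        bin/rails runner '
            ea = EmbeddedAnsible.new
            puts "running?: #{ea.running?}"
            puts "alive?: #{ea.alive?}"
        '
    )
}
```

A result of running? true with alive? false would point towards a credentials or API problem rather than a stopped service.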
The AWX services write to 2 log files:
There is also some AWX-related logging in /var/www/miq/vmdb/log/evm.log.
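The AWX-related lines in evm.log can be picked out with grep. This is only a sketch: the search pattern (lines containing "ansible" or "awx", case-insensitively) is an assumption and may miss relevant entries or match unrelated ones, and the function name recent_ansible_log_lines is hypothetical.

```shell
# recent_ansible_log_lines (hypothetical helper): prints the last COUNT lines
# of a log file that mention "ansible" or "awx" (assumed search pattern).
#   $1 = log file (defaults to evm.log), $2 = number of lines (defaults to 20)
recent_ansible_log_lines() {
    local log_file="${1:-/var/www/miq/vmdb/log/evm.log}"
    local count="${2:-20}"
    grep -i -E 'ansible|awx' "$log_file" | tail -n "$count"
}
```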
The AWX admin account password is randomly generated during installation (when the Embedded Ansible server role is first enabled), and these credentials are used by CloudForms / ManageIQ to access the internal AWX API. If required for troubleshooting purposes, the password string can be retrieved using the Rails console, for example:
AWX maintains all playbooks and Python libraries in a virtual environment under /var/lib/awx/venv. To install or update anything in the virtual environment, the activate/deactivate commands should be used, for example:
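The activate/deactivate sequence can be wrapped in a small helper so that the environment is always deactivated again afterwards. This is a sketch only: the function name run_in_venv is hypothetical, and the exact virtual environment directory beneath /var/lib/awx/venv must be supplied by the caller (NAME below is a placeholder, not a real directory name).

```shell
# run_in_venv (hypothetical helper): activates the virtual environment in $1,
# runs the remaining arguments as a command, then deactivates again.
# A subshell is used so that the calling shell's environment is untouched.
# Usage (NAME is a placeholder for the environment directory):
#   run_in_venv /var/lib/awx/venv/NAME pip list
run_in_venv() {
    local venv_dir="$1"
    shift
    (
        . "$venv_dir/bin/activate" || exit 1
        "$@"
        status=$?
        deactivate
        exit "$status"
    )
}
```

The helper returns the exit status of the command that was run, so a failed pip install is still visible to the caller.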
Each invocation of an embedded Ansible playbook service or method runs as an embedded Ansible (AWX) job.
On CloudForms 4.6 / ManageIQ Gaprindashvili, each time an embedded Ansible job runs, up to three .out files are created in /var/lib/awx/job_status on the CFME or ManageIQ appliance with the active Embedded Ansible role. The first two of these files show the results of synchronising the git repository and updating any roles, and the last file contains the output from the automation playbook itself.
The directory can be monitored for new files by running the following command from within it: watch "ls -lrt | tail -10"
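On CloudForms 4.6 / ManageIQ Gaprindashvili it can also be convenient to follow the newest .out file as it is written. The sketch below picks the most recently modified .out file in the job status directory; the function name latest_job_output is hypothetical.

```shell
# latest_job_output (hypothetical helper): prints the path of the most
# recently modified .out file in the job status directory (or nothing if
# there are none).
# Usage: tail -f "$(latest_job_output)"
latest_job_output() {
    local dir="${1:-/var/lib/awx/job_status}"
    ls -t -- "$dir"/*.out 2>/dev/null | head -n 1
}
```

Because up to three files are created per job, the newest file is usually the playbook output only once the repository and role update steps have finished.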
With CloudForms 4.7 / ManageIQ Hammer the job status is recorded to the database and so is not available for viewing in this way.
The choice of whether to log playbook output to evm.log can be made when the playbook service or method is created or edited (see screenshot Setting Logging Output).
The desired log verbosity can be selected when the playbook service or method is created or edited (see screenshot Setting Logging Verbosity).
This log verbosity affects the output to the job *.out file as well as any log output to evm.log.
If the Max TTL (mins) value for a playbook method is too low, the ManageIQ::Providers::EmbeddedAnsible::AutomationManager::PlaybookRunner class will terminate the playbook job with an error such as:
The Max TTL (mins) value should be set to the maximum expected run-time in the Ansible playbook method definition.
The manageiq-automate and manageiq-vmdb modules can fail to connect (or authenticate) back to a valid API endpoint if the manageiq.api_url playbook variable contains the IP address of a different appliance to the one that launched the playbook. In this situation an error similar to the following is seen in the playbook output:
In a multi-appliance region the value of manageiq.api_url could be randomly set to any appliance with the Web Services server role enabled. The value of the manageiq.api_token variable is used to authenticate the connection request back to the API, but this could fail unless the Configuration -> Advanced session_store setting is set to "sql" on every appliance with the Web Services role enabled.
Note
The evmserverd service must be restarted after changing the Configuration -> Advanced session_store setting.
The manageiq-automate and manageiq-vmdb modules can also fail with the same error if the server at the manageiq.api_url is experiencing problems with its evmserverd service, or if the server or its service has been stopped less than 10 minutes prior to the connection attempt.
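One way to investigate these failures is to present the same API token to every appliance that has the Web Services role enabled and compare the HTTP status codes returned. The sketch below makes several assumptions: that each URL is a base URL of the same form as manageiq.api_url, that the API is served under /api, and that the token is passed in the X-Auth-Token header. The function name check_api_token is hypothetical.

```shell
# check_api_token (hypothetical helper): sends one API token to each given
# appliance URL and prints the HTTP status code returned.
#   $1 = token, remaining arguments = appliance base URLs
# Assumes the API endpoint is <url>/api and the X-Auth-Token header is used.
# -k skips certificate verification (appliances often use self-signed certs).
check_api_token() {
    local token="$1" url code
    shift
    for url in "$@"; do
        code=$(curl -s -k -o /dev/null -w '%{http_code}' --max-time 10 \
                    -H "X-Auth-Token: $token" "$url/api")
        echo "$url -> HTTP $code"
    done
}
```

A success code from one appliance and an authentication failure from another would be consistent with the session_store setting not being "sql" everywhere; a code of 000 indicates that curl could not get a response from that appliance at all.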
When the Embedded Ansible server role starts, the ManageIQ::Providers::EmbeddedAnsible::AutomationManager provider configures itself in the zone that the appliance is a member of. If the appliance is subsequently moved to a different zone, the provider does not immediately move, and so any new jobs may appear to be stalled as their messages still target the old zone.
If an appliance running the Embedded Ansible server role is moved between zones, the server role should be disabled and re-enabled. This will restart the ManageIQ::Providers::EmbeddedAnsible::AutomationManager provider in the correct zone.
This chapter has described some of the troubleshooting steps that can be taken to diagnose problems when running Ansible playbook methods or services.