On the surface, automation security looks simple: Scan your code, apply some policies, and that’s it. But you soon discover that every solution uncovers new security edge cases.
When it comes to Ansible playbooks, security isn’t just about what you automate, but how securely you automate it. After all, Ansible often holds the keys to your entire infrastructure. A small misstep in a playbook doesn’t just affect one server; it can scale a vulnerability across hundreds or thousands of systems at once.
With thousands of available modules and countless ways to structure your automation, the real question is how to ensure that your playbooks themselves don’t become security vulnerabilities.
That’s what we’ll cover in this article. We’ll start by exploring why playbook security matters, then walk through the most common risks, and finally share best practices to help you harden your Ansible automation.
Ansible security matters because it directly affects the integrity, confidentiality, and availability of the systems it manages. If an attacker compromises Ansible, they can gain privileged access to a wide range of infrastructure components.
Securing Ansible reduces the risk of infrastructure-wide breaches and ensures compliance with security best practices. It is especially important in automated CI/CD and infrastructure-as-code environments where Ansible actions have a broad impact.
Before exploring the specifics, it’s essential to understand why securing your playbooks should be a priority.
As previously mentioned, Ansible holds the keys to your infrastructure. For many organizations, all automation runs through Ansible, which significantly reduces human error and makes your infrastructure more repeatable.
However, this centralization creates an attack vector from a security perspective. If your playbooks are compromised, an attacker doesn’t just gain access to one system; they potentially gain access to your entire infrastructure, with the same privileges that your automation uses. This means that poorly secured playbooks can become a vector for lateral movement.
Ansible itself is generally not viewed as a security risk; with an estimated 88% of data breaches resulting from human error, automation is usually seen as reducing risk. But introducing automation via Ansible doesn’t solve the problem; it simply introduces a new layer of concerns.
Instead of worrying about whether a human will misconfigure a server or accidentally expose sensitive data, you now have to worry about whether your playbooks are doing these things systematically across your entire infrastructure.
A single poorly written task can propagate the same vulnerability to hundreds or thousands of systems simultaneously.
What’s worse is that these automated misconfigurations often occur at scale and follow consistent patterns, making them easier for attackers to discover and harder for security teams to catch, as they resemble “normal” automated deployment activity. The very consistency that makes Ansible powerful for legitimate operations also makes it dangerously efficient when something goes wrong.
Here are some common security risks when using Ansible:
- Hardcoded secrets and credentials: A surprisingly common mistake is embedding passwords, API keys, and other sensitive data directly in playbooks. When these playbooks end up in version control or shared repositories, those secrets become accessible to anyone with repository access and potentially to the entire internet if the repo is public.
- Excessive privilege escalation: Many playbooks use become: yes or run with root privileges by default, even when only specific tasks require elevated permissions. This violates the principle of least privilege, meaning that if any task in your playbook is compromised, an attacker gains full system access rather than limited permissions.
- Insecure variable handling: Variables containing sensitive information often get logged, displayed in output, or stored in places where they shouldn’t be. Without proper no_log directives and variable scoping, sensitive data can leak through Ansible’s verbose output or get cached in temporary files.
- Weak or missing input validation: Playbooks that accept user input or external data without proper validation can become vectors for injection attacks. This is especially dangerous when variables are used in shell commands or when external data sources aren’t properly sanitized before being processed.
- Insecure communication channels: While Ansible uses SSH by default, misconfigurations in your ansible.cfg or inventory can create serious vulnerabilities. Setting host_key_checking = False might be convenient, but it disables SSH’s host verification and opens you up to man-in-the-middle attacks. Similarly, using ansible_ssh_common_args with options like -o StrictHostKeyChecking=no or -o UserKnownHostsFile=/dev/null bypasses security checks. (The first and last of these risks are illustrated in the snippet after this list.)
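To make these risks concrete, here is an illustrative anti-pattern (all values hypothetical) that combines a plaintext inventory credential with disabled host key checking:

# inventory.ini -- plaintext password visible to anyone with repository access
[webservers]
web1.example.com ansible_user=root ansible_password=SuperSecret123

# ansible.cfg -- disables SSH host verification, enabling man-in-the-middle attacks
[defaults]
host_key_checking = False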
Given the common risks your playbooks can introduce, how do you address them? And, more importantly, how do you identify and resolve the less common issues? Let’s take a look below:
1. Harden secrets management
While this is the most obvious point to start, there are a few parts often missed when it comes to properly securing sensitive data in your playbooks.
The foundation of Ansible secrets management is Ansible Vault, which encrypts your sensitive variables at rest.
Instead of hardcoding passwords directly in your playbooks, you should store them in encrypted vault files.
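To create or edit such an encrypted file, you can use the standard ansible-vault CLI:

# Create a new encrypted vars file (opens your editor)
ansible-vault create vars/secrets.yml

# Encrypt an existing plaintext file in place
ansible-vault encrypt vars/secrets.yml

# Encrypt a single value for pasting into an existing YAML file
ansible-vault encrypt_string 'SuperSecret123' --name 'db_password'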
Typically, these secrets are stored in a variables or “vars” file, which looks something like this:
# vars/secrets.yml (encrypted with ansible-vault)
db_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
65353065326130363162353463643064656132653266303738393337306435613261346662343334
6166636130373430623863323463636265346564613431620a306264643936643261656233363064
66643339653464323836363830393732326565376362656265396339666362373733376535396433
3764303636656431370a393335643363633465316130646635613962346332373739343966323365
63303761636165353533366362333232303864636139346462633632643562346636

You can then reference these variables, as shown below:
# playbook.yml
- hosts: webservers
vars_files:
- vars/secrets.yml
tasks:
- name: Configure database connection
template:
src: config.j2
dest: /etc/myapp/config.yml
Because vars/secrets.yml is loaded through vars_files, templates like config.j2 can reference {{ db_password }} directly; re-declaring it in task vars under the same name would actually create a recursive template loop.

Vault files are only as secure as their encryption keys. A common mistake is committing a vault password file to version control.
Instead, store your vault keys securely outside your repository and reference them through environment variables or external key management systems. Never put .vault_pass files in your Git repository.
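One pattern worth knowing: the vault password file can be an executable script, in which case Ansible runs it and reads the password from its standard output. This lets you fetch the key from an external secrets manager at runtime instead of storing it on disk (a minimal sketch; the aws CLI call and secret name are assumptions):

#!/bin/sh
# ~/.ansible/vault_pass.sh -- must be marked executable
# Prints the vault password to stdout for Ansible to consume
aws secretsmanager get-secret-value \
  --secret-id ansible/vault-password \
  --query SecretString \
  --output text

You then point Ansible at it with --vault-password-file ~/.ansible/vault_pass.sh.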
Another common pitfall is directly setting secrets via environment variables in your playbooks or inventory. This approach leaves sensitive data exposed in process lists and shell history. Instead, use the Ansible CLI to pass sensitive values securely. It’s important to remember that your control node is now a possible attack vector, and you should treat it as part of your threat model.
# Wrong: exposes secrets in process list
ansible-playbook -e "api_key=secret123" deploy.yml
# Better: prompt for sensitive input
ansible-playbook --ask-vault-pass deploy.yml
# Best: use vault variables
ansible-playbook --vault-password-file ~/.ansible/vault_pass deploy.yml

AWS users can use a native lookup plugin that integrates with AWS Secrets Manager, so you can fetch secrets directly in Ansible without storing them in your playbooks at all:
- name: Retrieve database credentials from AWS Secrets Manager
  set_fact:
    db_credentials: "{{ lookup('amazon.aws.secretsmanager_secret', 'prod/database/credentials', region='us-east-1') | from_json }}"
  no_log: true  # keep the retrieved secret out of task output and logs
- name: Use the retrieved credentials
postgresql_user:
name: myapp
password: "{{ db_credentials.password }}"
login_host: "{{ db_credentials.host }}"
login_user: "{{ db_credentials.username }}"
    login_password: "{{ db_credentials.password }}"

This approach keeps your secrets in a dedicated secrets management service where they can be properly audited, rotated, and access-controlled, rather than living alongside your playbooks.
2. Enforce least privilege and scoped execution
As previously mentioned, a common mistake is slapping become: yes at the playbook level and calling it a day.
This approach might get your automation working quickly, but it essentially gives every task in your playbook root privileges — even tasks that should not be running as root.
Consider this typical scenario where privilege escalation is overused:
# Bad: Everything runs as root
- hosts: webservers
become: yes # This affects ALL tasks
tasks:
- name: Install packages
yum:
name: nginx
state: present
- name: Copy configuration file
copy:
src: nginx.conf
dest: /etc/nginx/nginx.conf
- name: Create application directory
file:
path: /var/www/myapp
state: directory
owner: nginx
group: nginx
- name: Deploy application code
git:
repo: https://github.com/myorg/myapp.git
    dest: /var/www/myapp

A better approach is to avoid become altogether for certain tasks by using modules that handle permissions properly or by structuring your playbooks to work within existing user permissions:
- hosts: webservers
tasks:
- name: Install packages
yum:
name: nginx
state: present
become: yes
- name: Copy configuration file
copy:
src: nginx.conf
dest: /etc/nginx/nginx.conf
backup: yes
become: yes
- name: Ensure nginx user owns app directory
file:
path: /var/www/myapp
state: directory
owner: nginx
group: nginx
recurse: yes
become: yes
- name: Deploy application code
git:
repo: https://github.com/myorg/myapp.git
dest: /var/www/myapp
become_user: nginx
      become: yes

The principle here is simple: Grant the minimum permissions necessary for each individual task to succeed, rather than the maximum permissions that might be convenient for the entire playbook.
Use become: yes (root privileges) only when tasks require system-level access, such as:
- Installing packages
- Modifying system configuration files in /etc
- Managing system services
- Creating system users
These operations genuinely need elevated permissions and cannot be accomplished otherwise.
Then, use become_user when you need to run tasks as a specific non-root user, such as:
- Deploying application code as the app user
- Creating files that should be owned by a service account
- Running commands that need to execute under a particular user context.
This is especially useful for web applications where your code should run as www-data or nginx, not as root.
3. Lock down the content supply chain
Given the numerous supply chain attacks over the last few years, it’s crucial to monitor any packages or binaries your playbooks rely on.
Whenever you’re using get_url or similar modules to download release binaries from URLs, it’s important that you verify the download matches the expected checksum. It’s an extra step, but it’s way cheaper than remediating a security breach caused by a compromised binary.
- name: Get checksum from URL
ansible.builtin.uri:
url: https://example.com/path/to/file.zip.sha256sum
return_content: true
register: checksum_content
- name: Download file and verify checksum from URL
ansible.builtin.get_url:
url: https://example.com/path/to/file.zip
dest: /tmp/file.zip
    checksum: "sha256:{{ checksum_content.content.split(' ')[0] }}"
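As a small simplification, get_url can also fetch and parse the checksum file itself if you pass a URL in the checksum field (supported since Ansible 2.8; the URL below is illustrative):

- name: Download file, letting get_url fetch and verify the checksum
  ansible.builtin.get_url:
    url: https://example.com/path/to/file.zip
    dest: /tmp/file.zip
    checksum: sha256:https://example.com/path/to/file.zip.sha256sum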
Leverage GPG key verification

For packages and releases that provide GPG signatures, old reliable GPG key verification adds another layer of authenticity checking. Ansible can handle GPG verification through several approaches:
- name: Import GPG key for package verification
rpm_key:
state: present
key: https://packages.cloud.google.com/yum/doc/yum-key.gpg
- name: Add repository with GPG checking enabled
yum_repository:
name: kubernetes
description: Kubernetes Repository
baseurl: https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled: yes
gpgcheck: yes
repo_gpgcheck: yes
    gpgkey: https://packages.cloud.google.com/yum/doc/yum-key.gpg

For manually downloaded files with detached signatures, you can verify them before installation:
- name: Download GPG signature
get_url:
url: https://github.com/example/tool/releases/download/v1.0.0/tool-linux-amd64.sig
dest: /tmp/tool-linux-amd64.sig
- name: Verify GPG signature
command: gpg --verify /tmp/tool-linux-amd64.sig /tmp/tool-linux-amd64
register: gpg_verification
  failed_when: gpg_verification.rc != 0

4. Secure transport and inventory hygiene
With SSH being the backbone of Ansible’s communication with your infrastructure, it is important to look after the transport settings as well as your inventory.
Ansible allows you to pass additional SSH parameters through the ansible_ssh_common_args variable or the ssh_args setting in your ansible.cfg. These arguments are passed directly to the underlying SSH client.
When Ansible establishes connections, behind the scenes it runs commands like ssh -o StrictHostKeyChecking=yes -o UserKnownHostsFile=~/.ssh/known_hosts target_host, and you can modify this behavior by injecting additional -o parameters.
Within your group_vars file, you can pass additional SSH options using:
ansible_ssh_common_args: >-
-o StrictHostKeyChecking=yes
-o UserKnownHostsFile=~/.ssh/known_hosts
-o PasswordAuthentication=no
-o PubkeyAuthentication=yes
  -o PreferredAuthentications=publickey

In this way, you can enforce specific SSH settings for a group of hosts, or even strengthen the cipher selection by setting -o Ciphers.
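For example, to restrict connections to modern AEAD ciphers (these cipher names are supported by current OpenSSH releases; adjust the list to what your fleet accepts):

ansible_ssh_common_args: >-
  -o Ciphers=chacha20-poly1305@openssh.com,aes256-gcm@openssh.com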
If you’d like your settings to persist playbook-wide, you can set SSH options through the ansible.cfg file using:
[ssh_connection]
ssh_args = -o StrictHostKeyChecking=yes -o UserKnownHostsFile=~/.ssh/known_hosts -o PasswordAuthentication=no -o PreferredAuthentications=publickey

Beyond standard SSH parameters, Ansible has its own connection settings that can impact both performance and security. One of the most commonly used is SSH pipelining, which can significantly speed up playbook execution but comes with security considerations:
[ssh_connection]
pipelining = True

SSH pipelining allows Ansible to execute multiple commands in a single SSH connection rather than opening a new connection for each task. While this improves performance, it requires requiretty to be disabled in your sudoers configuration.
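If you enable pipelining, you can scope the requiretty exemption to just your automation account instead of disabling it globally (an illustrative sudoers drop-in; the ansible-svc user name is an assumption):

# /etc/sudoers.d/ansible-svc
Defaults:ansible-svc !requiretty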
Security spotlight: Paramiko
Paramiko is a Python SSH library that Ansible can use for client-side SSH connections, so you should watch for vulnerabilities affecting it. When security researchers disclose a Paramiko vulnerability, it is always worth checking that you are not affected.
Upgrading your version of Paramiko is relatively easy. You can use the command below:
pip install --upgrade ansible paramiko

It is impossible to overstate the importance of not storing plaintext passwords in your inventory file. If you must store passwords, encrypt them with Ansible Vault.
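For example, generate an encrypted value with ansible-vault encrypt_string (shown earlier) and keep it in group_vars instead of on the inventory line itself (the vault payload below is truncated and purely illustrative):

# group_vars/webservers.yml
ansible_user: ansible-svc
ansible_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  6231336539366623...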
5. Harden ansible.cfg and runtime defaults
Your ansible.cfg can be used to set global defaults for your playbooks, making it a prime target for misconfiguration. A few simple configuration changes can significantly reduce your attack surface and limit the blast radius if something goes wrong.
If you have log_path set in your ansible.cfg, you should be careful, as you might be logging sensitive information without realizing it. While logging is useful for troubleshooting, it can become a security liability when passwords, API keys, or other sensitive data end up in plaintext log files:
[defaults]
log_path = /var/log/ansible.log  # This logs everything, including sensitive data

If you must enable logging, ensure your log files have restrictive permissions and consider using log rotation with secure deletion:
chmod 600 /var/log/ansible.log
chown ansible:ansible /var/log/ansible.log

This limits read-write access to the Ansible user alone.
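You can also stop sensitive values from reaching logs and console output at the source with the no_log directive (a minimal sketch; the module and variable are illustrative):

- name: Create database user without logging credentials
  postgresql_user:
    name: myapp
    password: "{{ db_password }}"
  no_log: true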
Create dedicated Ansible service accounts
Instead of running Ansible with your personal user account or a generic service account, create a dedicated Ansible user with limited sudo privileges. This reduces the risk that a compromised user account can take over the entire machine:
[defaults]
remote_user = ansible-svc

[privilege_escalation]
become = False
become_method = sudo
become_user = root

This ensures playbooks run with root permissions only when a task explicitly sets the become field.
For large fleets of VMs, you can make this sudo user setup part of your cloud-init configuration, which typically looks like this:
users:
- name: ansible-svc
sudo:
- 'ALL=(root) NOPASSWD: /usr/bin/yum, /bin/systemctl'
- 'ALL=(www-data) NOPASSWD: /bin/mkdir /var/www/*, /bin/chown www-data\: /var/www/*'
ssh_authorized_keys:
- ssh-rsa AAAAB3NzaC1yc2E... ansible-automation-key
shell: /bin/bash
  groups: sudo

This lets you restrict the Ansible user to an explicit list of binaries it can run with elevated privileges, reducing the blast radius if the account is ever compromised.
6. Audit, test, and gate changes
Your automation will probably change over time, which is why it is important to prioritize processes that enable you to catch misconfigurations.
The most obvious place to start is your continuous integration. If you prefer GitHub Actions, there is an official GitHub Action for linting, which can help you catch common errors:
# .github/workflows/ansible-lint.yml
name: Ansible Lint
on: [push, pull_request]
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Run ansible-lint
      uses: ansible/ansible-lint@main

Linting is only one piece of the puzzle. Part of creating reliable automation is ensuring it does not break in production.
Thankfully, Ansible has a mature testing framework called Molecule. Molecule could fill an entire guide of its own, but a typical test setup looks like this:
# molecule/default/molecule.yml
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: instance
image: quay.io/ansible/ubuntu2204-test-container:latest
provisioner:
name: ansible
verifier:
  name: ansible

molecule.yml allows you to specify the target on which your playbook will run. This example uses Ubuntu, but you could switch it out for a distribution that matches your needs.
# molecule/default/verify.yml
- name: Verify
hosts: all
gather_facts: false
tasks:
- name: Verify nginx is listening on expected ports
ansible.builtin.shell: "ss -tuln | grep -E ':80|:443'"
register: nginx_ports
changed_when: false
failed_when: nginx_ports.rc != 0
- name: Verify nginx.conf has secure permissions
ansible.builtin.stat:
path: /etc/nginx/nginx.conf
register: nginx_conf_stat
- name: Fail if nginx.conf is too permissive
ansible.builtin.fail:
msg: "nginx.conf has insecure permissions: {{ nginx_conf_stat.stat.mode }}"
    when: nginx_conf_stat.stat.mode | int(base=8) > ('0640' | int(base=8))  # compare as octal, not decimal
- name: Verify no world-writable files exist in nginx config directory
ansible.builtin.shell: "find /etc/nginx -type f -perm -002"
register: world_writable_files
changed_when: false
failed_when: world_writable_files.stdout != ""
- name: Verify nginx process is not running as root
ansible.builtin.shell: "ps -o user= -C nginx | grep -v '^root$'"
register: nginx_users
changed_when: false
  failed_when: nginx_users.rc != 0

From here on, you can write tests to ensure important services aren’t being overly permissive or exposed on ports you do not expect.
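Assuming Molecule and the Docker driver are installed locally, a single command runs the full scenario (create, converge, verify, destroy):

molecule test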
7. Shift-left checks & policy gates
Linting and testing are integral parts of the process, but how do you reliably gate changes to ensure they conform to an organizational standard?
Spacelift users have native policies to help them check Ansible code at the playbook level. These policies enable you to ensure that specific tasks are not run on private node pools, or you can use an approval policy to confirm everyone on the team is aligned before a change goes live.
Spacelift policies are written in Rego, so users of Open Policy Agent do not need to learn a new domain-specific language (DSL).
Building on top of policies is the idea of shifting left, which simply means catching security issues as early as possible in the development cycle rather than waiting until they reach production. A good way to think about this in terms of security automation is the idea of preventive controls versus detective controls.
Instead of monitoring for problems after they occur, you prevent them from happening in the first place. For example, rather than scanning your infrastructure for hardcoded secrets after deployment, you can block any playbook containing plaintext passwords from ever being executed.
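One lightweight way to apply this preventive approach is to run ansible-lint before code ever reaches your repository, for example via its pre-commit hook (the rev shown is illustrative; pin whichever release you use):

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.2.0
    hooks:
      - id: ansible-lint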
When you entrust your infrastructure-as-code (IaC) pipelines to an external platform, you’re really handing over three things: control of sensitive credentials, visibility into what’s changing, and confidence that tomorrow’s run will behave exactly like today’s. Spacelift is engineered so you never have to surrender those assurances.
Spacelift is audited against SOC 2 Type II and its controls are aligned to GDPR. You authenticate through single sign-on (SAML 2.0 or OIDC), inheriting your IdP’s MFA rules and user-lifecycle management.
With Spacelift, you also get:
- Private worker pools that you host, giving you full control over OS hardening, network egress, and secrets management. Run state is end-to-end, asymmetrically encrypted. Only your pool’s private key can decrypt it.
- OIDC-based cloud roles and short-lived API tokens — no static keys.
- Audit trails record all actions, changes, and events that happen inside your Spacelift account, helping you ensure data integrity, protect sensitive information, and maintain user trust.
- Policies to control what kind of resources engineers can create, what parameters they can have, how many approvals a run needs, what kinds of tasks are executed, what happens when a pull request is opened, and where to send your notifications.
- Stack dependencies to build multi-infrastructure automation workflows, for example provisioning your EC2 instances with Terraform and then configuring them with Ansible.
- Blueprints provide self-service templates so dev teams can launch new environments without waiting on ops. They can also surface directly in a ServiceNow catalog for ITSM approvals.
- Contexts package shared environment variables, files, or hooks so you write them once and reuse them safely everywhere.
- Drift detection with optional auto-reconciliation to keep reality in sync with code.
If you want to learn more about Spacelift and how to use Spacelift with Ansible, check our documentation, read our Ansible guide, or book a demo with one of our engineers.
Using a multi-layered approach to security automation will yield great results. Just as attackers often chain multiple exploits to gain a foothold in a system, treating your security as layers will help protect your playbooks from becoming an attack vector.
In this post, we looked at Ansible playbook security, starting with why it matters and the most common risks, then moving through best practices for hardening your automation, including the primitives Ansible provides and the Molecule testing framework.
The key takeaway is simple: Automation magnifies both your strengths and your mistakes. By treating playbook security as a primary concern, you ensure that tools designed to make your infrastructure safer and more reliable don’t end up working against you. Start small, layer your defenses, and you will ensure that your Ansible automation remains a powerful asset rather than a liability.
Manage Ansible better with Spacelift
Managing large-scale playbook execution is hard. Spacelift enables you to automate Ansible playbook execution with visibility and control over resources, and seamlessly link provisioning and configuration workflows.
