On the surface, automation security looks simple: Scan your code, apply some policies, and that’s it. But you soon discover that every solution uncovers new security edge cases.
When it comes to Ansible playbooks, security isn’t just about what you automate, but how securely you automate it. After all, Ansible often holds the keys to your entire infrastructure. A small misstep in a playbook doesn’t just affect one server; it can scale a vulnerability across hundreds or thousands of systems at once.
With thousands of available modules and countless ways to structure your automation, the real question is how to ensure that your playbooks themselves don’t become security vulnerabilities.
That’s what we’ll cover in this article. We’ll start by exploring why playbook security matters, then walk through the most common risks, and finally share best practices to help you harden your Ansible automation.
Ansible security matters because it directly affects the integrity, confidentiality, and availability of the systems it manages. If an attacker compromises Ansible, they can gain privileged access to a wide range of infrastructure components.
Securing Ansible reduces the risk of infrastructure-wide breaches and ensures compliance with security best practices. It is especially important in automated CI/CD and infrastructure-as-code environments where Ansible actions have a broad impact.
Before exploring the specifics, it’s essential to understand why securing your playbooks should be a priority.
As previously mentioned, Ansible holds the keys to your infrastructure. For many organizations, all automation runs through Ansible, which significantly reduces human error and makes your infrastructure more repeatable.
However, this centralization creates an attack vector from a security perspective. If your playbooks are compromised, an attacker doesn’t just gain access to one system; they potentially gain access to your entire infrastructure, with the same privileges that your automation uses. This means that poorly secured playbooks can become a vector for lateral movement.
Ansible itself is generally not viewed as a security risk; with an estimated 88% of data breaches resulting from human error, automation is usually seen as reducing risk. But introducing automation via Ansible doesn’t solve the problem; it simply introduces a new layer of concerns.
Instead of worrying about whether a human will misconfigure a server or accidentally expose sensitive data, you now have to worry about whether your playbooks are doing these things systematically across your entire infrastructure.
A single poorly written task can propagate the same vulnerability to hundreds or thousands of systems simultaneously.
What’s worse is that these automated misconfigurations often occur at scale and follow consistent patterns, making them easier for attackers to discover and harder for security teams to catch, as they resemble “normal” automated deployment activity. The very consistency that makes Ansible powerful for legitimate operations also makes it dangerously efficient when something goes wrong.
Here are some common security risks when using Ansible:
- Hardcoded secrets and credentials: A surprisingly common mistake is embedding passwords, API keys, and other sensitive data directly in playbooks. When these playbooks end up in version control or shared repositories, those secrets become accessible to anyone with repository access and potentially to the entire internet if the repo is public.
- Excessive privilege escalation: Many playbooks use become: yes or run with root privileges by default, even when only specific tasks require elevated permissions. This violates the principle of least privilege, meaning that if any task in your playbook is compromised, an attacker gains full system access rather than limited permissions.
- Insecure variable handling: Variables containing sensitive information often get logged, displayed in output, or stored in places where they shouldn’t be. Without proper no_log directives and variable scoping, sensitive data can leak through Ansible’s verbose output or get cached in temporary files.
- Weak or missing input validation: Playbooks that accept user input or external data without proper validation can become vectors for injection attacks. This is especially dangerous when variables are used in shell commands or when external data sources aren’t properly sanitized before being processed.
- Insecure communication channels: While Ansible uses SSH by default, misconfigurations in your ansible.cfg or inventory can create serious vulnerabilities. Setting host_key_checking = False might be convenient, but it disables SSH’s host verification and opens you up to man-in-the-middle attacks. Similarly, using ansible_ssh_common_args with options like -o StrictHostKeyChecking=no or -o UserKnownHostsFile=/dev/null bypasses security checks. (The first and last of these risks are illustrated in the snippet after this list.)
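To make these risks concrete, here is an illustrative anti-pattern (all values hypothetical) that combines a plaintext inventory credential with disabled host key checking:

# inventory.ini -- plaintext password visible to anyone with repository access
[webservers]
web1.example.com ansible_user=root ansible_password=SuperSecret123

# ansible.cfg -- disables SSH host verification, enabling man-in-the-middle attacks
[defaults]
host_key_checking = False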
Given the common risks your playbooks can introduce, how do you address them? And, more importantly, how do you identify and resolve the less common issues? Let’s take a look below:
1. Harden secrets management
While this is the most obvious point to start, there are a few parts often missed when it comes to properly securing sensitive data in your playbooks.
The foundation of Ansible secrets management is Ansible Vault, which encrypts your sensitive variables at rest.
Instead of hardcoding passwords directly in your playbooks, you should store them in encrypted vault files.
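To create or edit such an encrypted file, you can use the standard ansible-vault CLI:

# Create a new encrypted vars file (opens your editor)
ansible-vault create vars/secrets.yml

# Encrypt an existing plaintext file in place
ansible-vault encrypt vars/secrets.yml

# Encrypt a single value for pasting into an existing YAML file
ansible-vault encrypt_string 'SuperSecret123' --name 'db_password'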
Typically, these secrets are stored in a variables or “vars” file, which looks something like this:
# vars/secrets.yml (encrypted with ansible-vault)
db_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
65353065326130363162353463643064656132653266303738393337306435613261346662343334
6166636130373430623863323463636265346564613431620a306264643936643261656233363064
66643339653464323836363830393732326565376362656265396339666362373733376535396433
3764303636656431370a393335643363633465316130646635613962346332373739343966323365
63303761636165353533366362333232303864636139346462633632643562346636

You can then reference these variables, as shown below:
# playbook.yml
- hosts: webservers
vars_files:
- vars/secrets.yml
tasks:
- name: Configure database connection
template:
src: config.j2
dest: /etc/myapp/config.yml
Because vars/secrets.yml is loaded through vars_files, templates like config.j2 can reference {{ db_password }} directly; re-declaring it in task vars under the same name would actually create a recursive template loop.

Vault files are only as secure as their encryption keys. A common mistake is committing a vault password file to version control.
Instead, store your vault keys securely outside your repository and reference them through environment variables or external key management systems. Never put .vault_pass files in your Git repository.
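One pattern worth knowing: the vault password file can be an executable script, in which case Ansible runs it and reads the password from its standard output. This lets you fetch the key from an external secrets manager at runtime instead of storing it on disk (a minimal sketch; the aws CLI call and secret name are assumptions):

#!/bin/sh
# ~/.ansible/vault_pass.sh -- must be marked executable
# Prints the vault password to stdout for Ansible to consume
aws secretsmanager get-secret-value \
  --secret-id ansible/vault-password \
  --query SecretString \
  --output text

You then point Ansible at it with --vault-password-file ~/.ansible/vault_pass.sh.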
Another common pitfall is directly setting secrets via environment variables in your playbooks or inventory. This approach leaves sensitive data exposed in process lists and shell history. Instead, use the Ansible CLI to pass sensitive values securely. It’s important to remember that your control node is now a possible attack vector, and you should treat it as part of your threat model.
# Wrong: exposes secrets in process list
ansible-playbook -e "api_key=secret123" deploy.yml
# Better: prompt for sensitive input
ansible-playbook --ask-vault-pass deploy.yml
# Best: use vault variables
ansible-playbook --vault-password-file ~/.ansible/vault_pass deploy.yml

AWS users can use a native lookup plugin that integrates with AWS Secrets Manager, so you can fetch secrets directly in Ansible without storing them in your playbooks at all:
- name: Retrieve database credentials from AWS Secrets Manager
  set_fact:
    db_credentials: "{{ lookup('amazon.aws.secretsmanager_secret', 'prod/database/credentials', region='us-east-1') | from_json }}"
  no_log: true  # keep the retrieved secret out of task output and logs
- name: Use the retrieved credentials
postgresql_user:
name: myapp
password: "{{ db_credentials.password }}"
login_host: "{{ db_credentials.host }}"
login_user: "{{ db_credentials.username }}"
    login_password: "{{ db_credentials.password }}"

This approach keeps your secrets in a dedicated secrets management service where they can be properly audited, rotated, and access-controlled, rather than living alongside your playbooks.
2. Enforce least privilege and scoped execution
As previously mentioned, a common mistake is slapping become: yes at the playbook level and calling it a day.
This approach might get your automation working quickly, but it essentially gives every task in your playbook root privileges — even tasks that should not be running as root.
Consider this typical scenario where privilege escalation is overused:
# Bad: Everything runs as root
- hosts: webservers
become: yes # This affects ALL tasks
tasks:
- name: Install packages
yum:
name: nginx
state: present
- name: Copy configuration file
copy:
src: nginx.conf
dest: /etc/nginx/nginx.conf
- name: Create application directory
file:
path: /var/www/myapp
state: directory
owner: nginx
group: nginx
- name: Deploy application code
git:
repo: https://github.com/myorg/myapp.git
    dest: /var/www/myapp

A better approach is to avoid become altogether for certain tasks by using modules that handle permissions properly or by structuring your playbooks to work within existing user permissions:
- hosts: webservers
tasks:
- name: Install packages
yum:
name: nginx
state: present
become: yes
- name: Copy configuration file
copy:
src: nginx.conf
dest: /etc/nginx/nginx.conf
backup: yes
become: yes
- name: Ensure nginx user owns app directory
file:
path: /var/www/myapp
state: directory
owner: nginx
group: nginx
recurse: yes
become: yes
- name: Deploy application code
git:
repo: https://github.com/myorg/myapp.git
dest: /var/www/myapp
become_user: nginx
      become: yes

The principle here is simple: Grant the minimum permissions necessary for each individual task to succeed, rather than the maximum permissions that might be convenient for the entire playbook.
Use become: yes (root privileges) only when tasks require system-level access, such as:
- Installing packages
- Modifying system configuration files in /etc
- Managing system services
- Creating system users
These operations genuinely need elevated permissions and cannot be accomplished otherwise.
Then, use become_user when you need to run tasks as a specific non-root user, such as:
- Deploying application code as the app user
- Creating files that should be owned by a service account
- Running commands that need to execute under a particular user context.
This is especially useful for web applications where your code should run as www-data or nginx, not as root.
3. Lock down the content supply chain
Given the numerous supply chain attacks over the last few years, it’s crucial to monitor any packages or binaries your playbooks rely on.
Whenever you’re using get_url or similar modules to download release binaries from URLs, it’s important that you verify the download matches the expected checksum. It’s an extra step, but it’s way cheaper than remediating a security breach caused by a compromised binary.
- name: Get checksum from URL
ansible.builtin.uri:
url: https://example.com/path/to/file.zip.sha256sum
return_content: true
register: checksum_content
- name: Download file and verify checksum from URL
ansible.builtin.get_url:
url: https://example.com/path/to/file.zip
dest: /tmp/file.zip
    checksum: "sha256:{{ checksum_content.content.split(' ')[0] }}"
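As a small simplification, get_url can also fetch and parse the checksum file itself if you pass a URL in the checksum field (supported since Ansible 2.8; the URL below is illustrative):

- name: Download file, letting get_url fetch and verify the checksum
  ansible.builtin.get_url:
    url: https://example.com/path/to/file.zip
    dest: /tmp/file.zip
    checksum: sha256:https://example.com/path/to/file.zip.sha256sum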
Leverage GPG key verification

For packages and releases that provide GPG signatures, old reliable GPG key verification adds another layer of authenticity checking. Ansible can handle GPG verification through several approaches:
- name: Import GPG key for package verification
rpm_key:
state: present
key: https://packages.cloud.google.com/yum/doc/yum-key.gpg
- name: Add repository with GPG checking enabled
yum_repository:
name: kubernetes
description: Kubernetes Repository
baseurl: https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled: yes
gpgcheck: yes
repo_gpgcheck: yes
    gpgkey: https://packages.cloud.google.com/yum/doc/yum-key.gpg

For manually downloaded files with detached signatures, you can verify them before installation:
- name: Download GPG signature
get_url:
url: https://github.com/example/tool/releases/download/v1.0.0/tool-linux-amd64.sig
dest: /tmp/tool-linux-amd64.sig
- name: Verify GPG signature
command: gpg --verify /tmp/tool-linux-amd64.sig /tmp/tool-linux-amd64
register: gpg_verification
  failed_when: gpg_verification.rc != 0

4. Secure transport and inventory hygiene
With SSH being the backbone of Ansible’s communication with your infrastructure, it is important to look after the transport settings as well as your inventory.
Ansible allows you to pass additional SSH parameters through the ansible_ssh_common_args variable or the ssh_args setting in your ansible.cfg. These arguments are passed directly to the underlying SSH client.
When Ansible establishes connections, behind the scenes it runs commands like ssh -o StrictHostKeyChecking=yes -o UserKnownHostsFile=~/.ssh/known_hosts target_host, and you can modify this behavior by injecting additional -o parameters.
Within your group_vars file, you can pass additional SSH options using:
ansible_ssh_common_args: >-
-o StrictHostKeyChecking=yes
-o UserKnownHostsFile=~/.ssh/known_hosts
-o PasswordAuthentication=no
-o PubkeyAuthentication=yes
  -o PreferredAuthentications=publickey

In this way, you can enforce specific SSH settings for a group of hosts, or even strengthen the cipher selection by setting -o Ciphers.
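For example, to restrict connections to modern AEAD ciphers (these cipher names are supported by current OpenSSH releases; adjust the list to what your fleet accepts):

ansible_ssh_common_args: >-
  -o Ciphers=chacha20-poly1305@openssh.com,aes256-gcm@openssh.com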
If you’d like your settings to persist playbook-wide, you can set SSH options through the ansible.cfg file using:
[ssh_connection]
ssh_args = -o StrictHostKeyChecking=yes -o UserKnownHostsFile=~/.ssh/known_hosts -o PasswordAuthentication=no -o PreferredAuthentications=publickey

Beyond standard SSH parameters, Ansible has its own connection settings that can impact both performance and security. One of the most commonly used is SSH pipelining, which can significantly speed up playbook execution but comes with security considerations:
[ssh_connection]
pipelining = True

SSH pipelining allows Ansible to execute multiple commands in a single SSH connection rather than opening a new connection for each task. While this improves performance, it requires requiretty to be disabled in your sudoers configuration.
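If you enable pipelining, you can scope the requiretty exemption to just your automation account instead of disabling it globally (an illustrative sudoers drop-in; the ansible-svc user name is an assumption):

# /etc/sudoers.d/ansible-svc
Defaults:ansible-svc !requiretty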
Security spotlight: Paramiko
Paramiko is a Python SSH library that Ansible can use for client-side SSH connections, so you should watch for vulnerabilities affecting it. When security researchers disclose a Paramiko vulnerability, it is always worth checking that you are not affected.
Upgrading your version of Paramiko is relatively easy. You can use the command below:
pip install --upgrade ansible paramiko

It is impossible to overstate the importance of not storing plaintext passwords in your inventory file. If you must store passwords, encrypt them with Ansible Vault.
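For example, generate an encrypted value with ansible-vault encrypt_string (shown earlier) and keep it in group_vars instead of on the inventory line itself (the vault payload below is truncated and purely illustrative):

# group_vars/webservers.yml
ansible_user: ansible-svc
ansible_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  6231336539366623...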
5. Harden ansible.cfg and runtime defaults
Your ansible.cfg can be used to set global defaults for your playbooks, making it a prime target for misconfiguration. A few simple configuration changes can significantly reduce your attack surface and limit the blast radius if something goes wrong.
If you have log_path set in your ansible.cfg, you should be careful, as you might be logging sensitive information without realizing it. While logging is useful for troubleshooting, it can become a security liability when passwords, API keys, or other sensitive data end up in plaintext log files:
[defaults]
log_path = /var/log/ansible.log  # This logs everything, including sensitive data

If you must enable logging, ensure your log files have restrictive permissions and consider using log rotation with secure deletion:
chmod 600 /var/log/ansible.log
chown ansible:ansible /var/log/ansible.log

This limits read-write access to the Ansible user alone.
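You can also stop sensitive values from reaching logs and console output at the source with the no_log directive (a minimal sketch; the module and variable are illustrative):

- name: Create database user without logging credentials
  postgresql_user:
    name: myapp
    password: "{{ db_password }}"
  no_log: true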
Create dedicated Ansible service accounts
Instead of running Ansible with your personal user account or a generic service account, create a dedicated Ansible user with limited sudo privileges. This reduces the risk that a compromised user account can take over the entire machine:
[defaults]
remote_user = ansible-svc

[privilege_escalation]
become = False
become_method = sudo
become_user = root

This ensures playbooks run with root permissions only when a task explicitly sets the become field.
For large fleets of VMs, you can make this sudo user setup part of your cloud-init configuration, which typically looks like this:
users:
- name: ansible-svc
sudo:
- 'ALL=(root) NOPASSWD: /usr/bin/yum, /bin/systemctl'
- 'ALL=(www-data) NOPASSWD: /bin/mkdir /var/www/*, /bin/chown www-data\: /var/www/*'
ssh_authorized_keys:
- ssh-rsa AAAAB3NzaC1yc2E... ansible-automation-key
shell: /bin/bash
  groups: sudo

This lets you restrict the Ansible user to an explicit list of binaries it can run with elevated privileges, reducing the blast radius if the account is ever compromised.
6. Audit, test, and gate changes
Your automation will probably change over time, which is why it is important to prioritize processes that enable you to catch misconfigurations.
The most obvious place to start is your continuous integration. If you prefer GitHub Actions, there is an official GitHub Action for linting, which can help you catch common errors:
# .github/workflows/ansible-lint.yml
name: Ansible Lint
on: [push, pull_request]
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Run ansible-lint
      uses: ansible/ansible-lint@main

Linting is only one piece of the puzzle. Part of creating reliable automation is ensuring it does not break in production.
Thankfully, Ansible has a mature testing framework called Molecule. Molecule could fill an entire guide of its own, but a typical test setup looks like this:
# molecule/default/molecule.yml
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: instance
image: quay.io/ansible/ubuntu2204-test-container:latest
provisioner:
name: ansible
verifier:
  name: ansible

molecule.yml allows you to specify the target on which your playbook will run. This example uses Ubuntu, but you could switch it out for a distribution that matches your needs.
# molecule/default/verify.yml
- name: Verify
hosts: all
gather_facts: false
tasks:
- name: Verify nginx is listening on expected ports
ansible.builtin.shell: "ss -tuln | grep -E ':80|:443'"
register: nginx_ports
changed_when: false
failed_when: nginx_ports.rc != 0
- name: Verify nginx.conf has secure permissions
ansible.builtin.stat:
path: /etc/nginx/nginx.conf
register: nginx_conf_stat
- name: Fail if nginx.conf is too permissive
ansible.builtin.fail:
msg: "nginx.conf has insecure permissions: {{ nginx_conf_stat.stat.mode }}"
    when: nginx_conf_stat.stat.mode | int(base=8) > ('0640' | int(base=8))  # compare as octal, not decimal
- name: Verify no world-writable files exist in nginx config directory
ansible.builtin.shell: "find /etc/nginx -type f -perm -002"
register: world_writable_files
changed_when: false
failed_when: world_writable_files.stdout != ""
- name: Verify nginx process is not running as root
ansible.builtin.shell: "ps -o user= -C nginx | grep -v '^root$'"
register: nginx_users
changed_when: false
  failed_when: nginx_users.rc != 0

From here on, you can write tests to ensure important services aren’t being overly permissive or exposed on ports you do not expect.
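Assuming Molecule and the Docker driver are installed locally, a single command runs the full scenario (create, converge, verify, destroy):

molecule test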
7. Shift-left checks & policy gates
Linting and testing are integral parts of the process, but how do you reliably gate changes to ensure they conform to an organizational standard?
Spacelift users have native policies to help them check Ansible code at the playbook level. These policies enable you to ensure that specific tasks are not run on private node pools, or you can use an approval policy to confirm everyone on the team is aligned before a change goes live.
Spacelift policies are written in Rego, so users of Open Policy Agent do not need to learn a new domain-specific language (DSL).
Building on top of policies is the idea of shifting left, which simply means catching security issues as early as possible in the development cycle rather than waiting until they reach production. A good way to think about this in terms of security automation is the idea of preventive controls versus detective controls.
Instead of monitoring for problems after they occur, you prevent them from happening in the first place. For example, rather than scanning your infrastructure for hardcoded secrets after deployment, you can block any playbook containing plaintext passwords from ever being executed.
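One lightweight way to apply this preventive approach is to run ansible-lint before code ever reaches your repository, for example via its pre-commit hook (the rev shown is illustrative; pin whichever release you use):

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.2.0
    hooks:
      - id: ansible-lint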
When you entrust your infrastructure-as-code (IaC) pipelines to an external platform, you’re really handing over three things: control of sensitive credentials, visibility into what’s changing, and confidence that tomorrow’s run will behave exactly like today’s. Spacelift is engineered so you never have to surrender those assurances.
Spacelift is audited against SOC 2 Type II and its controls are aligned to GDPR. You authenticate through single sign-on (SAML 2.0 or OIDC), inheriting your IdP’s MFA rules and user-lifecycle management.
With Spacelift, you also get:
- Private worker pools that you host, giving you full control over OS hardening, network egress, and secrets management. Run state is end-to-end, asymmetrically encrypted. Only your pool’s private key can decrypt it.
- OIDC-based cloud roles and short-lived API tokens — no static keys.
- Audit trails record all actions, changes, and events that happen inside your Spacelift account, helping you ensure data integrity, protect sensitive information, and maintain user trust.
- Policies to control what kind of resources engineers can create, what parameters they can have, how many approvals a run needs, what kinds of tasks are executed, what happens when a pull request is opened, and where to send your notifications.
- Stack dependencies to build multi-infrastructure automation workflows, for example provisioning your EC2 instances with Terraform and then configuring them with Ansible.
- Blueprints provide self-service templates so dev teams can launch new environments without waiting on ops. They can also surface directly in a ServiceNow catalog for ITSM approvals.
- Contexts package shared environment variables, files, or hooks so you write them once and reuse them safely everywhere.
- Drift detection with optional auto-reconciliation to keep reality in sync with code.
If you want to learn more about Spacelift and how to use Spacelift with Ansible, check our documentation, read our Ansible guide, or book a demo with one of our engineers.
Using a multi-layered approach to security automation will yield great results. Just as attackers often chain multiple exploits to gain a foothold in a system, treating your security as layers will help protect your playbooks from becoming an attack vector.
In this post, we looked at Ansible playbook security, starting with why it matters and the most common risks, then moving through best practices for hardening your automation, including the primitives Ansible provides and the Molecule testing framework.
The key takeaway is simple: Automation magnifies both your strengths and your mistakes. By treating playbook security as a primary concern, you ensure that tools designed to make your infrastructure safer and more reliable don’t end up working against you. Start small, layer your defenses, and you will ensure that your Ansible automation remains a powerful asset rather than a liability.
Manage Ansible better with Spacelift
Managing large-scale playbook execution is hard. Spacelift enables you to automate Ansible playbook execution with visibility and control over resources, and seamlessly link provisioning and configuration workflows.
