Infrastructure Drift: How to Detect & Fix It with IaC Tools

So your team learned about infrastructure as code (IaC)? They got all excited about this way of managing infrastructure. Soon enough, you had multiple stacks described in definition files. This source code was versioned on your favorite version control platform, such as GitHub, and integrated with your favorite CI/CD platform.

Finally, the infrastructure was built from those definition files, and it was time to celebrate the end of this journey, right? Or was it?

The definition files describe the desired state of the infrastructure — the way you want it to be set up. Your IaC tool’s responsibility is to turn that desired state into reality. We call the current reality of your infrastructure the actual state. Right after you run your IaC tool, both are identical, but unfortunately, they might not stay in sync for long.

Any difference between the desired state and the actual state is called drift.

What is infrastructure drift?

Infrastructure drift refers to the situation where the actual state of cloud infrastructure resources deviates from the desired state defined in the IaC configuration files. It occurs when changes are made to the cloud resources manually or through other means outside of the IaC management process, resulting in a difference between the codified definition and the real-world deployment.
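
To make the definition concrete, here is a minimal sketch of drift as a plain comparison between two dictionaries. The attribute names and values (`min_size`, `max_size`, `instance_type`) are hypothetical and only stand in for whatever your IaC tool tracks; real tools compare far richer resource models.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {attribute: (desired_value, actual_value)} for every mismatch.

    Attributes missing on one side are reported with None on that side.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical Auto Scaling Group settings: the code says max 3,
# but someone raised the limit to 5 by hand.
desired = {"min_size": 1, "max_size": 3, "instance_type": "t3.micro"}
actual = {"min_size": 1, "max_size": 5, "instance_type": "t3.micro"}

print(find_drift(desired, actual))  # {'max_size': (3, 5)}
```

An empty result means the two states agree; anything else is drift that needs to be either reverted or backported.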

Why does infrastructure drift happen?

Infrastructure drift happens mainly for two reasons:

Manual changes
Overlapping IaC configurations

Manual changes

Most drift is caused by manual changes performed by individuals.

Some manual changes are understandable. For example, during an incident, an engineer might need to increase the number of resources to handle an elevated load or make up for resources being down. During times of high stress, when a lot is at stake, manual mitigation changes are perfectly acceptable. The goal is to get to a better place as soon as possible, and the regular IaC process might take too long, especially when you need to respond to fast-changing conditions.

This becomes a problem if the changes are not reverted or backported to the IaC definition files after the fire has been put out.

There are also bad reasons for manual changes. Those cannot be justified, even temporarily, and often stem from insufficient training in IaC best practices, loose access permissions, and a lack of proper communication regarding the infrastructure management process.

Overlapping/conflicting IaC code

In some cases, humans are not directly at fault. Resources may end up being managed by multiple sets of IaC definition files. Applying some definition files might revert changes made by other definition files.

This can happen when the IaC practices evolve over time. For example, it is not uncommon to switch to a different IaC tool after a few years because the team realized that, in hindsight, they had not picked the best tool for their use cases.

Another reason for this is overlapping boundaries between stack definitions, which can happen easily when the infrastructure is extensive and managed by many different teams over a long period of time.
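
One way to spot overlapping boundaries is to list the resources each stack manages and look for identifiers that appear more than once. The sketch below assumes you can already export such a list per stack; the stack names and resource identifiers are hypothetical.

```python
from collections import defaultdict


def find_overlaps(stacks: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return {resource_id: [stack names]} for resources claimed by more than one stack."""
    owners: dict[str, list[str]] = defaultdict(list)
    for stack_name, resource_ids in stacks.items():
        for resource_id in resource_ids:
            owners[resource_id].append(stack_name)
    return {rid: sorted(names) for rid, names in owners.items() if len(names) > 1}


# Hypothetical inventory: both stacks believe they own the same security group.
stacks = {
    "network": {"vpc-main", "sg-web"},
    "web-app": {"asg-web", "sg-web"},
}

print(find_overlaps(stacks))  # {'sg-web': ['network', 'web-app']}
```

Any resource reported here is a candidate for drift, because applying one stack can undo what the other stack declared.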

Why do we want to avoid drift?

IaC is all about improving the governance of your infrastructure by defining it as code, which allows you to leverage a wealth of practices and tools that have been available to developers for a long time, such as code versioning, code reviews, static analysis, automated tests, etc.

Letting drift occur undermines that effort and provides a false sense of governance.

Learn how drift can affect your organization.

How to avoid drift in the first place?

As we have seen earlier, the primary source of drift is manual changes. Those are often linked to loose access control practices. The principle of least privilege recommends granting only the necessary permissions. The fewer people who can manually modify the infrastructure, the better. Admin-level access is typically limited to senior infrastructure engineers and SREs.

Because you will always need to have some people with permission to perform manual changes to your infrastructure, you need to make sure that they are aware of the process to either revert or backport those changes in due time.

That being said, even with the most trained engineers and the best intentions, drift will happen. It is inevitable, so you need to make sure that you can easily and quickly detect it and possibly revert it.

To reduce drift, you first need a few things in place:

Use a VCS for your code – Your infrastructure code should always be kept in a version control system, which should be the single source of truth for your infrastructure. To be effective, you need a branching strategy, and changes should be merged into the main branch (your source of truth) only when all checks pass.
Implement RBAC – Drift usually occurs when manual changes are made to your infrastructure. With role-based access control (RBAC), you can prevent most engineers from updating resources from the console. This may seem strict, but applying least-privilege access and allowing only a few engineers to change infrastructure manually reduces the chance of drift.
Take advantage of change management – Change management processes are key to reducing infrastructure drift. When a process governs deployments to higher environments and all engineers follow it, drift becomes much less likely.
Have a process for critical issues – Sometimes critical issues cause downtime, and the business needs them solved quickly. In these cases, changes may be made manually to resolve the issue as fast as possible. It is key to have a process in place that brings engineers back to the issue afterward so that the fix is also made in the infrastructure as code, configuration management, CI/CD, or container orchestration tools.

Even with all of these measures in place, some drift will still occur, so you need to be able to detect it quickly and, where appropriate, revert it.

What is drift detection?

Drift detection is a mechanism that helps you identify and manage discrepancies between the expected state of your infrastructure and the actual one. To achieve drift detection, you need a tool that regularly monitors, detects, and alerts you about these discrepancies. Detecting drift early helps you keep your infrastructure consistent and compliant.

Drift detection vs. drift management

Drift detection is responsible for identifying discrepancies between your infrastructure and your IaC, while drift management refers to the overall process of detecting drift and what actions to take when drift is detected.

How to detect drift?

As we said earlier, drift is the difference between the desired state of the infrastructure as defined in the IaC source code and the actual state of the infrastructure.

In other words, you have drift when the plan command of your IaC tool returns a non-empty list of proposed changes even though the definition files have not changed since they were last applied.

Here is how to display the proposed changes and detect drift with different IaC tools.

Terraform & OpenTofu drift detection

For Terraform and OpenTofu, running terraform plan (or tofu plan for OpenTofu) compares your configuration and state with the real infrastructure and shows you any proposed changes, which effectively is your drift.

For example, if the maximum number of servers in an Auto Scaling Group is set to 5 outside of Terraform, the plan reports that change as drift.
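
If you want to script this check, the plan command accepts a -detailed-exitcode flag that returns 0 when there are no changes, 1 on error, and 2 when changes are proposed. The following is a minimal sketch built on that flag. The function names, the status labels, and the working directory argument are assumptions made for illustration, and the check only means "drift" if the definition files in that directory have already been applied.

```python
import subprocess

# Exit codes returned by `plan -detailed-exitcode`.
NO_CHANGES, ERROR, CHANGES_PRESENT = 0, 1, 2


def interpret_plan_exit_code(code: int) -> str:
    """Map a detailed plan exit code to a status label (labels are our own)."""
    if code == NO_CHANGES:
        return "in_sync"
    if code == CHANGES_PRESENT:
        return "drifted"
    return "error"


def check_drift(workdir: str, binary: str = "terraform") -> str:
    """Run a plan in `workdir` and classify the result.

    Pass binary="tofu" for OpenTofu. The directory must already be initialized.
    """
    result = subprocess.run(
        [binary, "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    status = interpret_plan_exit_code(result.returncode)
    if status != "in_sync":
        # Show the proposed changes (or the error) so a human can review them.
        print(result.stdout or result.stderr)
    return status
```

Calling `check_drift("path/to/stack")` from a scheduled job and alerting on anything other than `"in_sync"` is one simple way to turn the manual command into a recurring check.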

CloudFormation drift detection

CloudFormation has a built-in drift detection feature that can be used either via the AWS Console or via the AWS CLI command.

CloudFormation’s native drift detection is still an on-demand operation — you must trigger it explicitly for a stack or resource. There is no first-class “drift scheduler” in the service itself, but you can automate checks using EventBridge, Lambda, or AWS Config rules to run drift detection on a cadence and alert you when drift is found.

Drift detection checks can be run via the AWS Console or on the command line with the AWS CLI.
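
The same on-demand check can be scripted. The sketch below assumes a client that behaves like boto3's CloudFormation client and uses its detect_stack_drift, describe_stack_drift_detection_status, and describe_stack_resource_drifts operations. The function name, the polling interval, and the choice to read only the first page of results are our own simplifications.

```python
import time


def find_drifted_resources(cfn, stack_name: str, poll_seconds: float = 5.0) -> list[dict]:
    """Trigger drift detection, wait for it to finish, and return drifted resources.

    `cfn` is expected to behave like boto3.client("cloudformation").
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Detection is asynchronous, so poll until it leaves the in-progress state.
    while True:
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))

    if status.get("StackDriftStatus") != "DRIFTED":
        return []

    # Simplification: only the first page of results is read here.
    response = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )
    return response["StackResourceDrifts"]
```

With boto3 installed and credentials configured, you would pass `boto3.client("cloudformation")` as `cfn`. A function like this could also be called from a Lambda function on an EventBridge schedule to run the check on a cadence.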

Pulumi drift detection

Run the pulumi preview --refresh --stack <STACK NAME> command to get the list of proposed changes.

For example, if the tags and user data of an AWS EC2 instance have been modified manually, the preview lists them as proposed changes.
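
For automation, the preview command can also emit JSON through its --json flag. The sketch below assumes that output contains a changeSummary object keyed by operation name (for example same, update, create); verify that shape against your Pulumi version before relying on it. The function names are hypothetical.

```python
import json
import subprocess


def summarize_preview(preview_json: str) -> dict[str, int]:
    """Return the planned operations other than 'same' from preview JSON output."""
    summary = json.loads(preview_json).get("changeSummary") or {}
    return {op: count for op, count in summary.items() if op != "same" and count > 0}


def preview_drift(stack: str) -> dict[str, int]:
    """Run a refreshing preview for `stack` and summarize the proposed changes."""
    result = subprocess.run(
        ["pulumi", "preview", "--refresh", "--stack", stack, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_preview(result.stdout)


# Hand-written sample in the assumed shape: four unchanged resources, one update.
sample = '{"changeSummary": {"same": 4, "update": 1}}'
print(summarize_preview(sample))  # {'update': 1}
```

An empty dictionary means the preview proposed nothing; any other result lists how many resources would be created, updated, or deleted.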

Spacelift drift detection

Drift can occur at any time. As a result, drift detection must be run on a regular schedule to catch it as quickly as possible, which is not practical when running those commands on one’s laptop.

A better approach would be to use a tool such as Spacelift that can check for drift automatically on a schedule that you set.

The view that shows all the resources for a stack uses an eye-catching icon for resources that have drifted so that they can be easily spotted.

Another benefit of using Spacelift is that the drift detection management experience is consistent across the supported IaC tools. Under the hood, different commands will be run but for the most part, the workflow and the screens will be identical.

“With Spacelift, one of the first things we did was a big drift detection. We overhauled our drift detection, drift remediation, how to handle and solve it, and how to prevent it from happening. Spacelift handles all of that for us automatically now.” - Trevor Rae, Cloud Platform Engineer, 1Password

What to do when drift is detected?

You probably want to get rid of most drift, which we will explain in a bit, but there might be manual changes that should make their way into the definition files.

For example, you had to increase the number of resources during an elevated load episode, but realistically, this is not a one-off but the new normal. Then, you should not eliminate the drift but update the IaC definition files to reflect your new expectations.

How to fix configuration drift?

Drift shows up as a non-empty list of proposed changes even though the definition files have not changed, so fixing it means applying those proposed changes. That restores the infrastructure to its desired state.

Here is how to remove the drift and get back to the desired infrastructure state with the main IaC tools.

Terraform drift remediation

Run the terraform apply command (or tofu apply for OpenTofu) to revert the external changes and remove the drift.
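
When remediation is scripted, a common pattern is to save the plan to a file and then apply exactly that file, so that what gets applied is what was reviewed. The sketch below shows that two-step sequence. The plan file name and function names are hypothetical, and in practice you would add a review or approval step between the two commands, because applying a saved plan does not prompt for confirmation.

```python
import subprocess


def remediation_commands(binary: str = "terraform", plan_file: str = "drift.tfplan") -> list[list[str]]:
    """Build the plan-then-apply command sequence (pass binary="tofu" for OpenTofu)."""
    return [
        [binary, "plan", "-input=false", f"-out={plan_file}"],
        [binary, "apply", "-input=false", plan_file],
    ]


def remediate(workdir: str, binary: str = "terraform") -> None:
    """Run both commands in `workdir`, stopping at the first failure."""
    for command in remediation_commands(binary):
        subprocess.run(command, cwd=workdir, check=True)
```

Keeping the command construction separate from execution makes it easy to log or inspect exactly what will run before anything touches the infrastructure.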

AWS CloudFormation drift remediation

CloudFormation can revert drift only in some cases.

For example, a missing resource will be recreated, but a modified resource property might not be detected by CloudFormation and, as a result, will not be fixed automatically.

If CloudFormation cannot automatically fix the detected drift, you can use the information provided to manually revert the unexpected changes.

AWS also offers drift-aware change sets, which compare your template to the actual state before you deploy. They highlight where drift exists and let you decide whether to overwrite drifted properties with the template, or update the template to match the current live configuration.

This gives you a safer, more predictable way to reconcile drifted stacks than blindly applying an update.

Pulumi drift remediation

Run the pulumi up --stack <STACK NAME> command to revert the external changes and remove the drift.

Spacelift drift remediation

Under the hood, Spacelift periodically executes proposed runs against your Finished stacks to look for drift. If a run finds changes, Spacelift flags drifted resources in the stack’s Resources view and, if you’ve enabled reconciliation, automatically kicks off a tracked reconciliation run that respects your existing policies, checks, and approval workflows.

When drift is detected, Spacelift can optionally revert the changes found by following the same workflow that is used for regular IaC code changes, enforcing all the configured guardrails such as automated validation of the plan and approval workflow.

You can also manage drift detection as code using the Spacelift Terraform provider. The spacelift_drift_detection resource lets you define the schedule and whether reconciliation should run automatically, so you can keep drift policies versioned alongside your infrastructure code.

Read more about drift detection with Spacelift.

If you want to take your infrastructure automation to the next level, create a Spacelift account today or book a demo with one of our engineers.

Key points

Like incidents, drift is inevitable and part of the life of any infrastructure. It must be taken into account when defining the processes and selecting your tools so that you do not get caught off guard and stay on top of things regarding your infrastructure governance.

Detect and remediate drift with Spacelift

Drift happens, so let Spacelift deal with it. Spacelift provides drift detection capabilities to any IaC provider to enable the desired state for application infrastructure across teams, applications, and clouds.

Frequently asked questions

Is drift detection the same as drift remediation?

No, drift detection and drift remediation are not the same, though they are closely related. Drift detection identifies differences between the current state of infrastructure and its desired or declared state, while drift remediation takes corrective action to resolve those differences.

Can I completely prevent drift?

No, you cannot completely prevent drift in infrastructure or configuration management, but you can significantly reduce and control it. The most effective approach is drift detection combined with automated remediation.

How often should I run drift detection?

Drift detection should be run at least daily in most production environments, but the ideal frequency depends on your infrastructure’s volatility and compliance requirements. Frequent detection supports early remediation, reduces risk, and aligns environments with declared configurations.