Computer and Network Security
1Password’s reputation for exceptionally robust security used to mean that only its small cloud platform engineering team could manage the company’s IaC operations. Although this ensured optimal protection for the data entrusted to 1Password’s enterprise password management platform, it stymied collaboration and made IaC administration laborious. As staff cloud platform engineer Maxx Daymon explains, the biggest win associated with Spacelift’s IaC solution was the democratization of IaC to the broader engineering organization. “Confidently delegating much of IaC management to the individuals that own it allowed our teams to transform into service teams — not just remote hands gatekeeping AWS.”
Maxx and cloud platform engineer Trevor Rae spoke to us about how Spacelift has transformed IaC management at 1Password.
As an early adopter of Terraform, 1Password had accumulated repositories that spanned numerous versions and approaches over the years, adding dense layers of complexity and making it very stressful to perform Terraform operations.
Because of 1Password’s security posture, any changes to the infrastructure had to be channeled through the cloud platform engineering team, who had to sit and watch the changes take effect safely and securely before they could move on to anything else. Within weeks of joining 1Password, Maxx realized this process was untenable and requested a shift to automation and repeatable processes. They needed a way for teams to be able to safely manage their own deployments and repositories so that the cloud platform engineers would not have to intervene every time somebody needed to change something.
“I started exploring other options. The de facto assumption was that we would use the same solution I had used in a previous company. But increasing costs was a key consideration,” recalls Maxx. Depending on the features included, a package with Spacelift worked out to between one-tenth and one-fifth of the cost of the equivalent competitor product. Given that most users only needed to use the platform a few times per month, 1Password could not have democratized IaC at the competitor’s much higher per-user cost.
“Another consideration was the competitor’s focus on the Terraform workspaces concept as its means of managing different environments – the workspaces are very limited,” added Maxx. “So when I came across Spacelift, I started reading about it, looking into it, and seeing what people were saying about it.”
After presenting several demos to different decision-makers at the company, 1Password ultimately chose Spacelift for its IaC and began using it in November 2022. “A driving factor for me was that it’s very composable and flexible. When considering all of the different repositories and approaches we had accumulated, it became clear we actually didn’t need to change everything to accommodate Spacelift. Instead, we could bring everything into Spacelift and then migrate them over time to more uniform approaches,” says Maxx.
Spacelift’s use of Open Policy Agent was another deciding factor for Maxx and his team. “By using the policy engine with Rego (Open Policy Agent’s query language), we could ensure that the right teams could run their infrastructure jobs without affecting other teams. As a result, we were able to begin democratizing access to the infrastructure, which was really key in enabling everybody to move faster because it gave us a lot more time to execute on product project work.”
Productivity gains were virtually immediate. Before adopting Spacelift, teams might have to wait a week or two for the cloud platform engineering team to deal with their requests, which slowed things down significantly. Some projects were monolithic, so changes had to be run in a specific sequence and then merged. Having Spacelift in place democratized the infrastructure, enabling teams to run their own changes – without support from the cloud platform engineering team – on the same day. “We had work queued for weeks at a time that teams could now run themselves within the hour,” says Maxx. This equates to a 10x reduction in the mean time to change/time to review and a sharp rise in deployment frequency: previously, 1Password might manage between three and four deployments on a good day; now, they average over 60 deployments every weekday.
Adopting any new product is always a learning experience, particularly in a rapidly scaling organization, but Spacelift presented no major challenges for 1Password. Cloud platform engineer Trevor Rae recalls the onboarding experience: “We had far more stacks and resources than we thought, so we wrote a Terraform module for stacks. Once we had that in place, we were onboarding multiple stacks a day. It was smooth, and it was fast, and it maintained our high level of security standards.” With the module in place, the team accelerated from onboarding one or two stacks every day to up to three stacks per day. Adjusting the module benefited all the stacks: When a new feature was introduced, it was immediately available to all new and existing stacks. “Managing Spacelift via first-class IaC support was a major benefit,” adds Maxx.
Another hurdle was related to the data warehouse repository. “Since it didn’t follow any of our traditional patterns, it was challenging to onboard,” explains Trevor. Through a dedicated Slack channel that Spacelift had set up for them, 1Password was able to figure out the process with minimal intervention. “We pinged the [Spacelift Slack] channel only once or twice with a clarifying question, which we probably could have found the answer to in Spacelift’s docs.” Maxx adds, “I was really impressed with the documentation. I read all of it before I started working on Spacelift, so I always had this memory of ‘Hey, I think I’ve read this before somewhere!’”
Unsurprisingly, 1Password had stringent requirements around security. Spacelift met those requirements in two key ways: the availability of private workers and the ability to limit actions. Teams can trigger runs and confirm runs, but they can’t perform ad hoc commands. In 1Password’s Spacelift setup, each team has their own space, and their services are in a subspace within that overarching space. “They can turn on some basic settings,” explains Trevor. “We have a global blackout policy. However, we do have a whitelist, so if someone needs to constantly run a random Terraform command and we approve it, we can add that to the whitelist.”
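The access model Trevor describes (ad hoc commands blocked by default, approved exceptions allowed, each team confined to its own space and subspaces) can be sketched in a few lines. This is a hypothetical Python illustration of the logic only, not Spacelift's policy API and not 1Password's actual Rego; the names `ALLOWED_TASKS`, `TaskRequest`, `owns_space`, and `is_task_allowed`, and the sample command, are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical allow-list of approved ad hoc commands (illustrative value only).
ALLOWED_TASKS = frozenset({"terraform state list"})


@dataclass(frozen=True)
class TaskRequest:
    team: str     # team asking to run the command
    space: str    # space path of the target stack, e.g. "payments/api"
    command: str  # ad hoc command the team wants to run


def owns_space(team: str, space: str) -> bool:
    """A team owns its top-level space and every subspace beneath it."""
    return space == team or space.startswith(team + "/")


def is_task_allowed(req: TaskRequest) -> bool:
    """Deny by default; allow only approved commands inside the team's own space."""
    return owns_space(req.team, req.space) and req.command.strip() in ALLOWED_TASKS
```

Under these assumptions, a request from the `payments` team to run `terraform state list` against `payments/api` passes, while the same command against another team's space, or any command missing from the allow-list, is rejected.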
When asked what Spacelift features 1Password uses, Trevor laughs: “What don’t we use?” He lists GitLab issue creation, autodeploy, and extensive use of policies — particularly notification policies. Many of those notification policies are triggered by the company’s heavy use of drift detection. “With Spacelift, one of the first things we did was a big drift detection. We overhauled our drift detection, drift remediation, how to handle and solve it, and how to prevent it from happening. Spacelift handles all of that for us automatically now.”
They also combine 1Password with Spacelift Contexts by using their own product – their CLI (command line interface) – to populate the context.
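As a rough sketch of what this could look like, the Python below resolves secret references with the 1Password CLI's `op read` command and collects the results as name/value pairs ready to be loaded into a context. The function names, the example variable name, and the vault path are hypothetical, and the step that writes the values into Spacelift is deliberately left out because the source does not describe how 1Password does it.

```python
import subprocess
from typing import Callable, Dict


def op_read(reference: str) -> str:
    """Resolve a secret reference (op://vault/item/field) with `op read`."""
    result = subprocess.run(
        ["op", "read", reference],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI reports an error
    )
    return result.stdout.rstrip("\n")


def build_context_values(
    references: Dict[str, str],
    read: Callable[[str], str] = op_read,
) -> Dict[str, str]:
    """Map each variable name to the secret its reference resolves to."""
    return {name: read(ref) for name, ref in references.items()}
```

Passing the reader as a parameter keeps the lookup testable without a signed-in CLI session, for example `build_context_values({"DB_PASSWORD": "op://infra/db/password"}, read=fake_reader)`, where `fake_reader` is any stand-in function.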
And leveraging Spacelift is easy for everyone because of the way the 1Password team has set up onboarding. “We’ve just finished upgrading the module so that, instead of being a Terraform module, it just accepts the YAML files, and then we backend the whole thing in Terraform, so teams can onboard themselves.”
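One way to picture this self-service flow is a small translation step that turns a team's stack description into Terraform configuration. The sketch below assumes the YAML has already been parsed into plain dictionaries and emits Terraform's JSON syntax; the field names (`name`, `repository`, `space`, `autodeploy`) and the module path `./modules/stack` are hypothetical and are not taken from 1Password's actual module.

```python
import json

# Hypothetical fields every team-supplied stack definition must include.
REQUIRED_KEYS = ("name", "repository", "space")


def stack_to_module_block(stack: dict) -> dict:
    """Turn one parsed stack definition into a Terraform module block."""
    missing = [key for key in REQUIRED_KEYS if key not in stack]
    if missing:
        raise ValueError(f"stack definition is missing: {', '.join(missing)}")
    return {
        stack["name"]: {
            "source": "./modules/stack",  # hypothetical shared stack module
            "repository": stack["repository"],
            "space": stack["space"],
            "autodeploy": bool(stack.get("autodeploy", False)),
        }
    }


def render_tf_json(stacks: list[dict]) -> str:
    """Render all stack definitions as one Terraform JSON document."""
    modules = {}
    for stack in stacks:
        modules.update(stack_to_module_block(stack))
    return json.dumps({"module": modules}, indent=2, sort_keys=True)
```

Validating required fields before rendering means a team gets an immediate, readable error for an incomplete file instead of a failed run later on.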
For anyone considering Spacelift for their IaC, Trevor points out that, although 1Password set up their IaC with Terraform, “there are many other options that Spacelift supports depending on which IaC language teams want to use, and those options should be explored thoroughly.”
For Maxx, the biggest win 1Password has seen with Spacelift has been the democratization of IaC to the broader engineering organization. “By using Spacelift’s guardrails and security, we were able to confidently delegate much of IaC management to the individuals that owned it. This also allowed our teams to transform into service teams, not just remote hands for other teams and move from gatekeeping AWS to providing expertise.”