Deployment models are usually discussed from a time-to-market, development, or general IT perspective, but they can also have a profound impact on security. At ZippyDB we use canary deployments to keep our infrastructure up-to-date and secure, meet our high SLAs, get the best ROI from our IT investment, and ease our future SOC 2 audit. While canary and rolling deployments are often discussed in conjunction with containerization, Docker, or Kubernetes, both deployment models also work well for traditional software stacks.
How Canary/Rolling Deployments Impact Security
Modern security requires frequent updates, but updates can break applications and cause configuration drift, and constant system interaction creates opportunities for breaches. The more secure option is to use canary or rolling deployments; for a detailed comparison of the two, see https://opensource.com/article/17/5/colorful-deployments. Based on years of running enterprise systems and going through countless SOX, SOC 2, and PCI audits, this has become our go-to mode of operation. It isn’t easy to go from deploy -> (patch+upgrade ∞) and long-lived systems to (deploy ∞) and short-lived systems. The benefits are tremendous, though: fewer events to audit, a known current state, and a shorter window for attackers. The good news is you don’t have to switch to full containerization to get the advantages, such as:
- Consistently up-to-date systems, services and software
- Significantly reduced attack points
- Very limited capability to perform interactive logins
- Easier rollback
- Much less audit activity to identify threats
- Fewer securables
However, you do have to rethink how you package and deploy your systems and software, which can be expensive and difficult. You also have to secure the deployment path itself, because your main vulnerabilities are now your deployment tools and the source of your deployment packages.
Where to start
Some systems are easier than others: the vast majority of web servers are simple to move to a canary deployment, while databases, especially legacy ones, can be a challenge to migrate with little or no downtime. The good news is that the most vulnerable servers are usually the easiest. Many teams even start with manual deploys, since automating can take time; it won’t get you 100% of the benefit, but it is a start. Learning Chef, Puppet, Ansible, SaltStack, Terraform, Vagrant, Packer, Kubernetes, Nomad, or the hundreds of other infrastructure tools takes time.
- Where possible, use a managed load balancer; otherwise, invest upfront in zero-downtime canary deployments of your load balancers/reverse proxies
- Have a clear way to know if the roll-out is successful
- Limit who can deploy and where deployments can come from
- Implement robust monitoring and logging
- Where possible use your own internally approved package manager (npm, docker, apt, yum…) with an approval/review process
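One way to make "a clear way to know if the roll-out is successful" concrete is an automated promotion gate that compares the canary's metrics against the stable baseline before shifting more traffic. The sketch below is a minimal illustration; the metric names (`error_rate`, `p99_latency_ms`) and the tolerance ratios are hypothetical placeholders, not part of our actual tooling.

```python
def canary_is_healthy(baseline: dict, canary: dict,
                      max_error_ratio: float = 1.1,
                      max_latency_ratio: float = 1.2) -> bool:
    """Compare canary metrics against the stable baseline.

    Promote only if the canary's error rate and tail latency stay
    within a small multiple of the baseline's values. Thresholds
    here are illustrative; tune them per service.
    """
    if canary["error_rate"] > baseline["error_rate"] * max_error_ratio:
        return False  # canary is throwing noticeably more errors
    if canary["p99_latency_ms"] > baseline["p99_latency_ms"] * max_latency_ratio:
        return False  # canary is noticeably slower at the tail
    return True
```

A gate like this is what lets deploys run unattended: the rollout proceeds only while the check passes, and any regression halts it automatically instead of paging a human first.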
How we do it
At ZippyDB we have three classes of system, each with its own set of components. The primary systems are:
- Redundant systems that provide our administration UI and API. These are public servers, so we update them most frequently.
- Internal APIs to manage our infrastructure. Similar in concept to Kubernetes, Nomad, and other systems, but much simpler for our purposes. This tier also includes our monitoring, alerting, and logging infrastructure. It gets fairly frequent updates because we use a mix of .NET Core, Node.js, Docker, Linux, a logging stack, a monitoring stack…
- Database servers, each running our database instances. These get updates less frequently because they run minimal services, run everything isolated and non-root, and the database software and Linux OS generally require less patching than our other two tiers.
We use Canary deployments to ensure that we can minimize the impact of any unforeseen errors. When we have a new version of a component, or an update, we roll out a new server with the patches and depending on the system we migrate a small % of the workload or mirror the workload to the new server.
For Redis, we roll out the new servers and mirror a % of Redis instances for a couple of days; our load balancer allows us to mirror traffic. Once we have enough instances without any increase in error logs or unusual metrics, we open up the server for new deployments. After 3 days, we start migrating existing instances to the new server. If everything goes well, all instances have been switched over after 7 days, and after 10 days we can shut down the old, unused servers.
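The mirror → migrate → retire timeline above is easy to encode as a simple phase schedule, so tooling can answer "what should this rollout be doing today?" without anyone tracking dates by hand. This is a hedged sketch of that idea, not our actual tooling; the phase names and day thresholds simply mirror the 3/7/10-day timeline described above.

```python
from datetime import date

# Day offsets mirroring the timeline in the text: mirror traffic first,
# start migrating after 3 days, fully switched by day 7, retire at day 10.
PHASES = [
    (0, "mirror"),    # mirror a % of instances to the new server
    (3, "migrate"),   # open for new deployments, migrate existing ones
    (7, "drain"),     # all instances switched over, old server idle
    (10, "retire"),   # shut down the old, unused servers
]

def rollout_phase(started: date, today: date) -> str:
    """Return the rollout phase for a deployment begun on `started`."""
    days = (today - started).days
    current = PHASES[0][1]
    for threshold, name in PHASES:
        if days >= threshold:
            current = name
    return current
```

Encoding the schedule this way also gives auditors a single, reviewable definition of the rollout policy rather than tribal knowledge.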
Here is a subset of what we monitor for Redis specifically:
- Key metrics (CPU, Disk, Network, Latency…)
- Logs for errors
- RDB and AOF creation/updating
- Open ports
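Several of the Redis checks above map directly onto fields from Redis's `INFO` command (for example `rdb_last_bgsave_status` and `aof_last_write_status` for persistence health). The sketch below assumes you have already collected an `INFO` snapshot into a dict; the memory threshold is an illustrative assumption, not a recommendation.

```python
def redis_checks(info: dict, max_memory_frac: float = 0.8) -> list:
    """Return a list of warnings derived from a parsed Redis INFO snapshot.

    Field names follow Redis's INFO output; the 0.8 memory threshold
    is a placeholder to be tuned per deployment.
    """
    warnings = []
    if info.get("rdb_last_bgsave_status") != "ok":
        warnings.append("last RDB snapshot failed")
    if info.get("aof_last_write_status", "ok") != "ok":
        warnings.append("AOF write errors")
    used = info.get("used_memory", 0)
    limit = info.get("maxmemory", 0)
    if limit and used / limit > max_memory_frac:
        warnings.append("memory pressure")
    return warnings
```

Running checks like these against both the old and new servers during the mirror phase is how an increase in errors or unusual metrics blocks the rollout before existing instances migrate.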
Over 98% of our server interactions are actually interactions with our logging, monitoring, and deployment systems. In fact, after a deployment is complete and verified, any remote access other than through our internal published API is turned off by our zero-trust infrastructure. In the rare case where somebody needs to access a server, a request must be approved so that our zero-trust system can grant access pinpointed to that user. Even if a server had a public endpoint (none do directly), the only visibly open endpoints are our HTTPS API endpoints and Redis ports.
See our Securing Redis article for more ways we secure Redis and recommendations to secure your own Redis or a ZippyDB serverless Redis instance.