Tag Archives: infrastructure

Infrastructure-as-Code Is Still *CODE*

After working in a DevOps environment for over a year, I’ve become an automation acolyte. The future is here and I’ve seen the benefits when you get it right: improved efficiency, better control and fewer errors. However, I’ve also seen the dark side with Infrastructure-as-Code (IaC). Bad things happen because people forget that it’s still code and it should be subject to the same types of security controls you use in the rest of your SDLC.

That means including automated or manual reviews, threat modeling and architectural risk assessments. Remember, you’re not only looking for mistakes in provisioning your infrastructure or opportunities for cost control. Some of this code might introduce vulnerabilities that could be exploited by attackers. Are you storing credentials in the code? Are you calling scripts or homegrown libraries and has that code been reviewed? Do you have version control in place? Are you using open source tools that haven’t been updated recently? Are your security groups overly permissive?

IaC is CODE. Why aren’t you treating it that way?

devops_borat

Tagged , , , , , ,

NTP Rules of the Road

There’s nothing more infuriating than watching organizations screw up foundational protocols and NTP seems to be one of the most commonly misconfigured. For some reason, people seem to think the goal is to have “perfect” time, when what is really needed is consistent organizational time. You need everything within a network to be synchronized for troubleshooting and incident management purposes. Otherwise, you’re going to waste a lot of energy identifying root causes and attacks.

It’s recommended to use a public stratum one server to synchronize with a few external systems or devices at your network perimeter, but this should only be configured if you don’t have your own stratum zero GPS with a stratum one server attached. I can’t tell you how many times I’ve seen a network team go to the trouble to set this up and the systems people still point everything to ntp.org.

Everything inside a network should cascade from those perimeter devices, which is usually a router, Active Directory system or stratum one server.  This design reduces the possibility of internal time drift, the load on public NTP servers and your firewalls, and the organizational risk of opening up unnecessary ports to allow outgoing traffic to the Internet. Over the last few years, some serious vulnerabilities have been identified in the protocol and it can also be used as a data exfiltration port by attackers.

In addition to the IETF’s draft on NTP “best practices,” the SEI also has an excellent guidance document.

While it’s not realistic to have your own stratum zero device in the cloud, within AWS, it is recommended to use the designated NTP pool specified in their documentation.

Oh, and for the love of all that is holy, please use UTC. I cannot understand why I’m still having this argument with people.

Tagged , , , , ,

Security Group Poop

One of the most critical elements of an organization’s security posture in AWS, is the configuration of security groups. In some of my architectural reviews, I often see rules that are confusing, overly-permissive and without any clear business justification for the access allowed. Basically, the result is a big, steaming pile of security turds.
While I understand many shops don’t have dedicated network or infrastructure engineers to help configure their VPCs, AWS has created some excellent documentation to make it a bit easier to deploy services there. You can and should plow through the entirety of this information. But for those with short attention spans or very little time, I’ll point out some key principles and “best practices” that you must grasp when configuring security groups.
  • A VPC automatically comes with a default security group and each instance created in that VPC will be associated with it, unless you create a new security group.
  • “Allow” rules are explicit, “deny” rules are implicit. With no rules, the default behavior is “deny.” If you want to authorize ingress or egress access you add a rule, if you remove a rule, you’re revoking access.
  • The default rule for a security group denies all inbound traffic and permits all outbound traffic. It is a “best practice” to remove this default rule, replacing it with more granular rules that allow outbound traffic specifically needed for the functionality of the systems and services in the VPC.
  • Security groups are stateful. This means that if you allow inbound traffic to an instance on a specific port, the return traffic is automatically allowed, regardless of outbound rules.
  • The use-cases requiring inbound and outbound rules for application functionality would be:
    • ELB/ALBs – If the default outbound rule has been removed from the security group containing an ELB/ALB, an outbound rule must be configured to forward traffic to the instances hosting the service(s) being load balanced.
    • If the instance must forward traffic to a system/service outside the configured security group.
AWS documentation, including security group templates, covering multiple use-cases:
Security groups are more effective when layered with Network ACLs, providing an additional control to help protect your resources in the event of a misconfiguration. But there are some important differences to keep in mind according to AWS:
Security Group
Network ACL
Operates at the instance level (first layer of defense)
Operates at the subnet level (second layer of defense)
Supports allow rules only
Supports allow rules and deny rules
Is stateful: Return traffic is automatically allowed, regardless of any rules
Is stateless: Return traffic must be explicitly allowed by rules
We evaluate all rules before deciding whether to allow traffic
We process rules in number order when deciding whether to allow traffic
Applies to an instance only if someone specifies the security group when launching the instance, or associates the security group with the instance later on
Automatically applies to all instances in the subnets it’s associated with (backup layer of defense, so you don’t have to rely on someone specifying the security group)
Additionally, the AWS Security Best Practices document, makes the following recommendations:
  • Always use security groups: They provide stateful firewalls for Amazon EC2 instances at the hypervisor level. You can apply multiple security groups to a single instance, and to a single ENI.
  • Augment security groups with Network ACLs: They are stateless but they provide fast and efficient controls. Network ACLs are not instance-specific so they can provide another layer of control in addition to security groups. You can apply separation of duties to ACLs management and security group management.
  • For large-scale deployments, design network security in layers. Instead of creating a single layer of network security protection, apply network security at external, DMZ, and internal layers. 

For those who believe the purchase of some vendor magic beans (i.e. a product) will instantly fix the problem, get ready for disappointment. You’re not going to be able to configure that tool properly for enforcement until you comprehend how security groups work and what the rules should be for your environment.

aws_poop

Tagged , , , , ,

The Question of Technical Debt

Not too long ago, I came across an interesting blog post by the former CTO of Etsy, Kellan Elliott-McCrea, which made me rethink my understanding and approach to the concept of technical debt. In it, he opined that technical debt doesn’t really exist and it’s an overused term. While specifically referencing code in his discussion, he makes some valid points that can be applied to information security and IT infrastructure.

In the post, he credits Peter Norvig with the quote, “All code is liability.” This echoes Nicholas Carr’s belief in the increased risk that arises from infrastructure technology due to the decreased advantage as it becomes more pervasive and non-proprietary.

When a resource becomes essential to competition but inconsequential to strategy, the risks it creates become more important than the advantages it provides. Think of electricity. Today, no company builds its business strategy around its electricity usage, but even a brief lapse in supply can be devastating…..

Over the years, I’ve collected a fair amount of “war stories” about less than optimal application deployments and infrastructure configurations. Too often, I’ve seen things that make me want to curl up in a fetal position beneath my desk. Web developers failing to close connections or set timeouts to back-end databases, causing horrible latency. STP misconfigurations resulting in network core meltdowns. Data centers built under bathrooms or network hub sites using window unit air conditioners. Critical production equipment that’s end-of-life or not even under support. But is this really technical debt or just the way of doing business in our modern world?

Life is messy and always a “development” project. Maybe the main reason DevOps has gathered such momentum in the IT world is because it reflects the constantly evolving, always shifting, nature of existence. In the real world, there is no greenfield. Every enterprise struggles to find the time and resources for ongoing maintenance, upgrades and improvements. As Elliott-McCrea so beautifully expresses, maybe our need to label this state of affairs as atypical is a cop-out. By turning this daily challenge into something momentous, we make it worse. We accuse the previous leadership and engineering staff  of incompetence. We come to believe that the problem will be fully eradicated through the addition of the latest miracle product. Or we invite some high-priced process junkies in to provide recommendations which often result in inertia.

We end up pathologizing something which is normal, often casting an earlier team as bumbling. A characterization that easily returns to haunt us.

When we take it a step further and turn these conflations into a judgement on the intellect, professionalism, and hygiene of whomever came before us we inure ourselves to the lessons those people learned. Quickly we find ourselves in a situation where we’re undertaking major engineering projects without having correctly diagnosed what caused the issues we’re trying to solve (making recapitulating those issues likely) and having discarded the iteratively won knowledge that had allowed our organization to survive to date.

Maybe it’s time to drop the “blame game” by information security teams when evaluating our infrastructures and applications. Stop crying about technical debt, because there are legitimate explanations for the technology decisions made in our organizations and it’s generally not because someone was inept. We need to realize that IT environments aren’t static and will always be changing and growing. We must transform with them.

Tagged , , , , , ,
%d bloggers like this: