Tag Archives: devops

Trapped by Technology Fallacies

After working in tech at several large companies over a couple of decades, I’ve observed some of the worst fallacies that damage organizations. They don’t arise from malice, but from a scarcity of professional reflection in our field. Technologists often jump to problem solving before spending sufficient time on problem setting, which leads to the creation of inappropriate and brittle solutions. Donald A. Schön discusses this disconnect in his seminal work, The Reflective Practitioner,

…with this emphasis on problem solving, we ignore problem setting, the process by which we define the decision to be made, the ends to be achieved, the means which may be chosen.

Problem solving relies on selecting from a menu of previously established formulas. While many of these tactics can be effective, let’s examine some of the dysfunctional approaches used by technologists that lead to pain for their organizations.
  • Fallacy #1 – Hammer-Nail: Technologists often assume that all problems can be beaten into submission with a technology hammer. It’s like the bride’s father in My Big Fat Greek Wedding, who believes that Windex can cure any ill. Similarly, technologists think that every challenge is just missing a certain type of technology to resolve it, even though we generally speak about maturity models in terms of people, process, technology, and culture. I can’t tell you how often I’ve seen someone design and implement a seemingly elegant solution only to have it rejected because it was developed without understanding the context of the problem.
  • Fallacy #2 – Best in Class: I’ve heard this so many times in my career that I just want to stand on a chair and shake my fist in the middle of a Gartner conference. Most organizations don’t need “best in class”; they need “good enough.” The business needs fast and frugal solutions to keep it productive and efficient, but technologists are often too busy navel-gazing to listen.
  • Fallacy #3 – Information Technology is the center of the business universe: I once worked for a well-known bank that had an informal motto, “We’re a technology company that happens to be a bank.” The idea was that because they were so reliant on technology, it transformed them into a cool tech company. I used to respond with, “We also use a lot of electricity; does that make us a utility company?” Maybe a little hyperbolic, but I was trying to make the point that IT Doesn’t Matter. When Nicholas Carr used that phrase as the title of his Harvard Business Review article in 2003, he was considering technology in the historical context of other advances such as electricity and telephones: “When a resource becomes essential to competition but inconsequential to strategy, the risks it creates become more important than the advantages it provides.” In the early days of tech, it gave you an edge. Today, when a core system fails, it could sink your business. The best solutions are often invisible to the organization so it can focus on its core competencies.
While technology can be very effective at solving technical problems, most organizational issues are adaptive challenges. In The Practice of Adaptive Leadership, the authors identify this failure to differentiate between the two as the root cause of business difficulties,

The most common cause of failure in leadership is produced by treating adaptive challenges as if they were technical problems. What’s the difference? While technical problems may be very complex and critically important (like replacing a faulty heart valve during cardiac surgery), they have known solutions that can be implemented by current know-how. They can be resolved through the application of authoritative expertise and through the organization’s current structures, procedures, and ways of doing things. Adaptive challenges can only be addressed through changes in people’s priorities, beliefs, habits, and loyalties. Making progress requires going beyond any authoritative expertise to mobilize discovery, shedding certain entrenched ways, tolerating losses, and generating the new capacity to thrive anew.

The end goals we’re trying to reach can’t be clearly established if we don’t sufficiently reflect on the problem. When we jump to problem solving over problem setting, we assume a level of confidence that hasn’t been earned. We’ve made assumptions about the way systems should work without thoroughly investigating how they actually function. When the postmodern critic Michel Foucault speaks of “an insurrection of subjugated knowledges,” he’s questioning the certainty of our perceptions when we’ve disqualified information that might be important to gaining a broader perspective. Technologists are more effective when they recognize the inherent expertise of the non-technologists in the businesses they serve and operate as trusted partners who understand change leadership. Instead of serving the “religion of tech,” we should focus on delivering what organizations really need.

Cloud-Native Consumption Principles

The promises of Cloud are alluring. Organizations are told they can reduce costs through a flexible consumption-based model, which minimizes waste in over-provisioning, while also achieving velocity in the development of new digital products without the dependence on heavy, centralized IT processes. This aligns closely with the goals of a DevOps transformation, which seeks to empower developers to build better software through a distributed operational model that delivers solutions more quickly with less overhead. However, most enterprise cloud journeys begin with a “lift and shift” from the on-premise data center to an IaaS provider. This seems like the easiest and fastest way to begin acclimating to the new environment by finding and leveraging similarities in deployment and consumption of digital assets. While this path may initially seem to expedite adoption, the migration is soon bogged down by the very issues that prompted the organization to adopt cloud: cumbersome, centralized processes that don’t support developers’ need for automation and speed.

With startups, which don’t have existing processes and organizational hierarchy to realign to a new way of working, applications face no barriers to becoming Cloud-Native. They begin that way. Enterprises weren’t initially built around a Cloud model, so the implementation is often based on Conway’s Law, with design and provisioning mirroring the existing organizational hierarchy. The only difference is that instead of a server team deploying bare-metal or on-premise virtual machines, they build an instance in the cloud. While there are some incremental gains, much of the latency from human middleware and legacy processes remains. After a short honeymoon based on a PoC or pilot projects, the realities of misaligned business processes grind progress to a halt. This also results in higher spend, because cloud resources are not meant to be long-running snowflakes, but ephemeral and immutable. Cloud is made for cattle, not pets.

The source of this friction becomes clear. While cloud is referred to as “Infrastructure as a Service,” many assume this is equivalent to data center hosting. However, Cloud is an evolution in the digital delivery model, where bare metal is abstracted away from the customer, who now consumes resources through web interfaces and APIs. Cloud should be thought of, and consumed, as a software platform, i.e., Cloud-Native. As defined by the Cloud Native Computing Foundation (CNCF):

Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

Therefore, to maximize the value of cloud adoption at scale, it is necessary to become Cloud-Native, and the effort must be tightly coupled to DevOps automation efforts.

In 1967, Paul Baran discussed the creation of a “national computer public utility system,” a metered, time-sharing model. Cloud, and by extension Cloud-Native, is the manifestation of that prediction and, as with other utilities, the consumption of “compute as utility” must be distributed and self-service in order to achieve cost benefits. What about governance and security concerns? Cloud Service Providers (CSPs) have built-in capabilities to establish policy restrictions at the organization, account, resource, and/or identity level. Native security controls can be embedded to function seamlessly, providing the automated monitoring, alerting, and enforcement needed to minimize risk and meet audit requirements. By decoupling compliance from control, these capabilities are more efficiently consumed through the platform via policy-as-code integrated into declarative Infrastructure-as-Code (IaC). Conversely, organizational risk increases when manual provisioning, abstraction layers, or traditional controls that are not cloud-ready or Cloud-Native are imposed on this environment.
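
To make policy-as-code in declarative IaC concrete, here is a minimal sketch of a pipeline check that evaluates a Terraform plan (exported as JSON) against two illustrative rules. The rules, tag names, and instance types are assumptions for the example, not a standard; in practice a purpose-built engine such as OPA, or the CSP’s native policy service, would fill this role.

# policy_check.py – a minimal policy-as-code sketch, not a complete framework.
# Assumes a Terraform plan exported with:
#   terraform show -json plan.out > plan.json
import json
import sys

ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}  # hypothetical org standard

def violations(plan: dict):
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        # Rule 1: EC2 instances must use approved (right-sized) types.
        if rc.get("type") == "aws_instance":
            if after.get("instance_type") not in ALLOWED_INSTANCE_TYPES:
                yield f"{rc['address']}: instance_type not approved"
        # Rule 2: S3 buckets must carry an owner tag for cost attribution.
        if rc.get("type") == "aws_s3_bucket":
            if "owner" not in (after.get("tags") or {}):
                yield f"{rc['address']}: missing 'owner' tag"

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        failed = list(violations(json.load(f)))
    for v in failed:
        print("DENY:", v)
    sys.exit(1 if failed else 0)  # a non-zero exit fails the pipeline stage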

In an attempt to ease organizations’ struggle with cloud adoption, Azure and AWS have developed Well-Architected Frameworks to promote better cloud consumption and design. Both consist of five pillars to evaluate the quality of a solution delivery: 

  • Operational excellence
  • Security
  • Reliability
  • Performance efficiency
  • Cost optimization

While helpful, these frameworks fail to communicate the urgent need for automation and tight coupling to the application development lifecycle in order to achieve a successful cloud migration. For example, from the AWS Operational Excellence Pillar, “operations as code” is only listed as a design principle to “limit human error and enable consistent responses to events.”

Ultimately, Cloud at scale is best consumed as a software platform through the automated development processes essential to DevOps; otherwise, the costs of side-channel pipeline provisioning and long-running, inefficiently sized workloads soon outweigh the initial benefits.

To summarize, the principles of a Cloud-Native consumption model include:

  • Automated provisioning of all resources as code through pipelines owned by product teams
  • Distributed self-service to achieve velocity and empower business segments
  • “Shift Everywhere” security through Policy-as-code embedded into the Infrastructure-as-Code
  • Decoupling of compliance from operational control through the use of CSP native capabilities to automate governance, monitoring, alerting and enforcement

To be effective, these principles are best operationalized through the unification of any cloud initiative with a DevOps effort. Otherwise, the cloud effort will be crippled by the existing technology bureaucracy.


DevSecOps Decisioning Principles

I know you’ve heard this before, but DevOps is not about tools. At its core, DevOps is really a supply chain for efficiently delivering software. At various stages of the process, you need testing and validation to ensure the delivery of a quality product. With that in mind, I’ve developed a set of fundamental propositions for the practice of good DevSecOps in support of an automated SDLC process.

  • Security tools should integrate as decision points in a DevOps pipeline aka DevSecOps.
  • DevSecOps tool(s) should have a policy engine that can respond with a pass/fail decision for the pipeline. 
    • Optimizes response time.
    • Supports separation of duties (SoD) by externalizing security decisions outside the pipeline.
    • Favors “fast and frugal” decisioning over customized scoring, to better support velocity and consistency.
    • Does not exclude the need for detailed information provided as pipeline output.
  • Full inspection of the supply chain element to be decisioned, aka “slow path,” should be used when an element is unknown to the pipeline decisioner. 
  • Minimal or incremental inspection of the supply chain element to be decisioned, aka “fast path,” should be used when an element is recognized (e.g. hash) by the pipeline decisioner.
  • Decision points should have a “fast path” available, where possible, to minimize any latency introduced by security decisioning (see the sketch following this list).
  • There should be no attempt to use customized risk scores in the pipeline. While temporal and contextual elements are useful in reporting and in judging how to mitigate operational risk, custom scoring in a pipeline unnecessarily complicates the decisioning process, creates inconsistency, and degrades pipeline performance.
  • Security policy engines should not be managed by the pipeline team, but externally by a security SME, to comply with SoD and reduce opportunities for subversion of security policy decisions during automation.
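
To make the fast/slow path distinction concrete, the sketch below keys decisions off an artifact’s hash: recognized elements return a cached verdict immediately, while unknown elements get a full inspection whose verdict is then remembered. All names are hypothetical, and full_inspection() is a stand-in for real scanning tooling.

# decision_point.py – illustrative fast path / slow path sketch; the cache and
# full_inspection() are hypothetical, not a real product API.
import hashlib

decision_cache: dict[str, bool] = {}  # artifact hash -> pass/fail verdict

def full_inspection(artifact: bytes) -> bool:
    """Slow path: full inspection of an unknown supply chain element.
    Placeholder for real SAST/SCA/image-scanning tooling."""
    return b"EVIL" not in artifact

def decide(artifact: bytes) -> bool:
    digest = hashlib.sha256(artifact).hexdigest()
    if digest in decision_cache:         # fast path: element already known
        return decision_cache[digest]
    verdict = full_inspection(artifact)  # slow path: element unknown
    decision_cache[digest] = verdict     # remember the verdict for next time
    return verdict

if __name__ == "__main__":
    build = b"compiled artifact bytes"
    print("pass" if decide(build) else "fail")  # slow path on first sight
    print("pass" if decide(build) else "fail")  # fast path thereafter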

Using a master policy engine, such as the Open Policy Agent (OPA), is an ideal way to “shift left” by providing a validation capability-as-a-service that can be integrated at different phases into the development and deployment of applications. Ideally, this allows the decoupling of compliance from control, reducing bottlenecks and inconsistency in the process from faulty security criteria integrated into pipeline code. By using security policy-as-code that is created and managed by security teams, DevSecOps will align more closely with the rest of the SDLC. Because at the end of the day, the supply chain is only as good as the product it delivers.
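
As a rough sketch of that integration, a pipeline stage can POST its context to OPA’s data API and act only on the boolean verdict. The REST endpoint format is OPA’s, but the policy path (pipeline/deploy/allow) and input fields are assumptions for the example; the corresponding Rego policy would be authored and managed by the security SME, not the pipeline team.

# opa_gate.py – sketch of a pipeline stage asking OPA for a pass/fail verdict.
# Assumes an OPA server on localhost:8181 serving a security-team-owned policy
# exposing a boolean at pipeline/deploy/allow (path and fields illustrative).
import json
import sys
import urllib.request

OPA_URL = "http://localhost:8181/v1/data/pipeline/deploy/allow"

def opa_allows(pipeline_input: dict) -> bool:
    req = urllib.request.Request(
        OPA_URL,
        data=json.dumps({"input": pipeline_input}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body.get("result") is True  # an absent result fails closed

if __name__ == "__main__":
    verdict = opa_allows({
        "artifact_sha256": sys.argv[1],    # e.g. passed in from the build stage
        "scan_findings": {"critical": 0},  # summarized tool output
    })
    sys.exit(0 if verdict else 1)  # a non-zero exit fails the pipeline stage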


Your Pets Don’t Belong in the Cloud

At too many organizations, I’ve seen a dangerous pattern when trying to migrate to public Infrastructure as a Service (IaaS), i.e., Cloud. It’s often approached like a colo or a data center hosting service, and the result is eventual failure of the initiative due to massive cost overruns and terrible performance. Essentially, this can be attributed to inexperience on the side of the organization and a cloud provider business model based on consumption. The end result is usually layoffs and reorgs while senior leadership shakes its head, “But it worked for Netflix!”

Based on my experience with various public and hybrid cloud initiatives, I can offer the following advice.

  1. Treat public cloud like an application platform, not traditional infrastructure. That means you should have reference models and Infrastructure-as-Code (IaC) templates for the deployment of architecture and application components that have undergone security and peer reviews in advance. Practice “policy as code” by working with cloud engineers to build security requirements into IaC.
  2. Use public cloud like an ephemeral ecosystem with immutable components. Translation: your “pets” don’t belong there, only cattle. Deploy resources to meet demand and establish expiration dates (see the cleanup sketch after this list). Don’t attempt to migrate your monolithic application without significant refactoring to make it cloud-friendly. If you need to change a configuration or resize, then redeploy. Identify validation points in your cloud supply chain where you can catch vulnerable systems/components prior to deployment, because it reduces your attack surface AND it’s cheaper. You should also have monitoring in place (AWS Config or a 3rd-party app) that catches any deviation and automatically remediates. You want cloud infrastructure that is standardized, secure and repeatable.
  3. Become an expert in understanding the cost of services in public cloud. Remember, it’s a consumption model and the cloud provider isn’t going to lose any sleep over customers hemorrhaging money due to bad design.
  4. Hybrid cloud doesn’t mean creating inefficient design patterns based on dependencies between public cloud and on-premise infrastructure. You don’t do this with traditional data centers, so why would you do it with hybrid cloud?
  5. Hire experienced automation engineers/developers to lead your cloud migration and train staff who believe in the initiative. Send the saboteurs home early on or you’ll have organizational chaos.
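
As a concrete illustration of point 2, expiration dates only matter if something enforces them. The sketch below assumes boto3 credentials and a hypothetical “expire-after” tag convention; it reaps EC2 instances past their date and could run on a schedule from Lambda or cron.

# reap_expired.py – sketch of enforcing expiration dates on EC2 instances.
# Assumes a (hypothetical) "expire-after" tag holding an ISO date (YYYY-MM-DD).
from datetime import date, datetime

import boto3

ec2 = boto3.client("ec2")

def expired_instance_ids() -> list[str]:
    ids = []
    paginator = ec2.get_paginator("describe_instances")
    # Only look at instances that carry the expiration tag at all.
    for page in paginator.paginate(
        Filters=[{"Name": "tag-key", "Values": ["expire-after"]}]
    ):
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
                expires = datetime.strptime(tags["expire-after"], "%Y-%m-%d").date()
                if expires < date.today():
                    ids.append(inst["InstanceId"])
    return ids

if __name__ == "__main__":
    doomed = expired_instance_ids()
    if doomed:
        ec2.terminate_instances(InstanceIds=doomed)  # cattle, not pets
        print("terminated:", doomed)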

If software ate the world, it burped out the Cloud. If you don’t approach this initiative with the right architecture, processes and people, there aren’t enough fancy tools in the world to help you clean up the result: organizational indigestion.



Moving Appsec To the Left

After spending the last year in a Product Security Architect role for a software company, I learned an important lesson:

Most application security efforts are misguided and ineffective.

While many security people have a good understanding of how to find application vulnerabilities and exploit them, they often don’t understand how software development teams work, especially in Agile/DevOps organizations. This leads to inefficiencies and a flawed program. If you really want to build secure applications, you have to meet developers where they are, by understanding how to embed security into their processes.

In some very mature Agile organizations, application security teams have started adding automated validation and testing points into their DevOps pipelines as DevSecOps (or SecDevOps; there seems to be a religious war over the proper terminology) to enforce the release of secure code. This is a huge improvement, because it allows you to eliminate the manual “gates” that block rapid deployment. My personal experience with this model is that it’s a work in progress, but a necessary aspirational goal for any application security program. Ultimately, if you can’t integrate your security testing into a CI/CD pipeline, the development process will circumvent security validation and introduce software vulnerabilities into your applications.

However, this is only one part of the effort. In Agile software development, there’s an expression, “shifting to the left,” which means moving validation to earlier parts of the development process.  While I could explain this in detail, DevSecOps.org already has an excellent post on the topic. In my role, I partnered with development teams by acting as a product manager and treating security as a customer feature, because this seemed more effective than the typical tactic of adding a bunch of non-functional requirements into a product backlog.

A common question I would receive from scrum teams is whether a security requirement should be written as a user story or simply as acceptance criteria. The short answer is, “it depends.” If the requirement translates into direct functional requirements for a user, e.g., for a public-facing service, it is better suited as a user story with its own acceptance criteria. If the requirement concerns a back-end service or feature, it is better expressed as acceptance criteria in existing stories. One technique I found useful was to create a set of user security stories derived from the OWASP Application Security Verification Standard (ASVS) version 3.0.1 that could be used as a template to populate backlogs and referenced during sprint planning. I’m not talking about “evil user stories,” because I don’t find those particularly useful when working with a group of developers.
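
For illustration, a hypothetical ASVS-derived story might read, “As a registered user, I want my session invalidated when I log out, so that someone else using my device can’t act as me,” with acceptance criteria that pin down verifiable behavior: session tokens are revoked server-side at logout, and replaying a pre-logout token returns an authentication error. By contrast, a back-end hardening requirement such as “credentials are retrieved from the secrets store, never from configuration files” would ride along as acceptance criteria on existing stories.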

Another area where product teams struggle is whether a release should have a dedicated sprint for security or add the requirements as acceptance criteria to user stories throughout the release cycle. I recommend a security sprint for all new or major releases, due to the inclusion of time-intensive tasks such as manual penetration testing, architectural risk assessments and threat modeling. But this should be a collaborative process with the product team, and I met regularly with product owners to assist with sprint planning and backlog grooming. I also found it useful to add a security topic to the MVS (minimum viable solution) contract.

I don’t pretend to have all the answers when it comes to improving software security, but spending time in the trenches with product development teams was an enlightening experience. The biggest takeaway: security teams have to grok the DevOps principle of collaboration if we want more secure software. To further this aim, I’m posting the set of user security stories and acceptance criteria I created here. Hopefully, it will be the starting point for a useful dialogue with your own development teams.


Fear and Loathing in Vulnerability Management

Vulnerability management – the security program that everyone loves to hate, especially those on the receiving end of a bunch of arcane reports. Vulnerability management programs seem to monopolize more and more of a security team’s time, mostly because of the energy involved in scheduling, validating and interpreting scans. But does this effort actually lead anywhere besides a massive time suck and interdepartmental stalemates? I would argue that if the core of your program is built on scanning, then you’re doing it wrong.

The term vulnerability management is a misnomer, because “management” implies that you’re solving problems. Maybe in the beginning scanning and vulnerability tracking helped organizations, but now it’s just another method used by security leadership to justify their ever-increasing black-hole budgets. “See? I told you it was bad. Now I need more $$$$ for this super-terrific tool!”

Vulnerability management programs shouldn’t be based on scanning; they should be focused on the hard stuff: policies, standards and procedures, with scans used for validation. If you’re stuck in an endless cycle of scanning, patching, more scanning and more patching, you’ve failed.

You should be focused on building processes that bake build standards and vulnerability remediation into the deployment procedure. Work with your infrastructure team to support a DevOps model that eliminates those “pet” systems. Focus on “cattle”: immutable systems that can, and should, be continuously replaced when an application or operating system needs to be upgraded. Better yet, use containers. Have a glibc vulnerability? The infrastructure team should be creating new images that can be rolled out across the organization instead of trying to “patch and pray” after finally getting a maintenance window. You should have an environment resilient enough to tolerate change; an infrastructure that’s self-healing because it’s in a state of continuous deployment.
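
To make “replace, don’t patch” concrete in AWS terms, remediation becomes publishing a new image and rolling the fleet, not touching live systems. Below is a minimal boto3 sketch; the AMI ID, launch template, and Auto Scaling group names are placeholders, and it assumes the group tracks the template’s $Latest version.

# roll_patched_image.py – sketch of immutable remediation: publish a launch
# template version pointing at a freshly built (patched) AMI, then let the
# Auto Scaling group replace its instances. All names/IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

PATCHED_AMI = "ami-0123456789abcdef0"  # output of the image build pipeline
LAUNCH_TEMPLATE = "web-fleet"          # hypothetical launch template name
GROUP_NAME = "web-fleet-asg"           # hypothetical Auto Scaling group

# New template version that differs only in the (patched) image.
ec2.create_launch_template_version(
    LaunchTemplateName=LAUNCH_TEMPLATE,
    SourceVersion="$Latest",
    LaunchTemplateData={"ImageId": PATCHED_AMI},
)

# Rolling refresh: old instances are terminated and replaced with instances
# built from the new image – no in-place patching, no snowflakes.
asg.start_instance_refresh(
    AutoScalingGroupName=GROUP_NAME,
    Strategy="Rolling",
    Preferences={"MinHealthyPercentage": 90},
)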

I recognize that most organizations aren’t there, but THIS should be your goal, not scanning and patching. Because it’s a treadmill to nowhere. Or maybe you’re happy doing the same thing over and over again with no difference in the result. If so, let me find you a very large hamster wheel.



The Question of Technical Debt

Not too long ago, I came across an interesting blog post by the former CTO of Etsy, Kellan Elliott-McCrea, which made me rethink my understanding of, and approach to, the concept of technical debt. In it, he opined that technical debt doesn’t really exist and is an overused term. While specifically referencing code in his discussion, he makes some valid points that can be applied to information security and IT infrastructure.

In the post, he credits Peter Norvig with the quote, “All code is liability.” This echoes Nicholas Carr’s argument that as infrastructural technology becomes more pervasive and non-proprietary, the advantage it confers shrinks while the risks it creates grow.

When a resource becomes essential to competition but inconsequential to strategy, the risks it creates become more important than the advantages it provides. Think of electricity. Today, no company builds its business strategy around its electricity usage, but even a brief lapse in supply can be devastating…

Over the years, I’ve collected a fair amount of “war stories” about less than optimal application deployments and infrastructure configurations. Too often, I’ve seen things that make me want to curl up in a fetal position beneath my desk. Web developers failing to close connections or set timeouts to back-end databases, causing horrible latency. STP misconfigurations resulting in network core meltdowns. Data centers built under bathrooms or network hub sites using window unit air conditioners. Critical production equipment that’s end-of-life or not even under support. But is this really technical debt or just the way of doing business in our modern world?

Life is messy and always a “development” project. Maybe the main reason DevOps has gathered such momentum in the IT world is because it reflects the constantly evolving, always shifting, nature of existence. In the real world, there is no greenfield. Every enterprise struggles to find the time and resources for ongoing maintenance, upgrades and improvements. As Elliott-McCrea so beautifully expresses, maybe our need to label this state of affairs as atypical is a cop-out. By turning this daily challenge into something momentous, we make it worse. We accuse the previous leadership and engineering staff of incompetence. We come to believe that the problem will be fully eradicated through the addition of the latest miracle product. Or we invite some high-priced process junkies in to provide recommendations that often result in inertia.

We end up pathologizing something which is normal, often casting an earlier team as bumbling. A characterization that easily returns to haunt us.

When we take it a step further and turn these conflations into a judgement on the intellect, professionalism, and hygiene of whomever came before us we inure ourselves to the lessons those people learned. Quickly we find ourselves in a situation where we’re undertaking major engineering projects without having correctly diagnosed what caused the issues we’re trying to solve (making recapitulating those issues likely) and having discarded the iteratively won knowledge that had allowed our organization to survive to date.

Maybe it’s time for information security teams to drop the “blame game” when evaluating our infrastructures and applications. Stop crying about technical debt, because there are legitimate explanations for the technology decisions made in our organizations, and it’s generally not because someone was inept. We need to realize that IT environments aren’t static and will always be changing and growing. We must transform with them.


Security and Ugly Babies

Recently, a colleague confessed his frustration over the resistance he’s been encountering in a new job as a security architect. He’s been attempting to address security gaps with the operations team, but he’s being treated like a Bond villain, and the paranoia and opposition are wearing him down.

It’s a familiar story for those of us in this field. We’re brought into an organization to defend against breaches and engineer solutions to reduce risk, but along the way we often discover an architecture held together by bubble gum and shoestring. We point it out, because it’s part of our role, our vocation to protect and serve. Our “reward” is that we usually end up an object of disdain and fear. We become outcasts in the playground, dirt kicked in our faces by the rest of IT, as we wonder what went wrong.

We forget that in most cases the infrastructure we criticize isn’t just cabling, silicon and metal. It represents the output of hundreds, sometimes thousands, of hours from a team of people, most of whom want to do good work but are hampered by tight budgets and limited resources. Maybe they aren’t the best and brightest in their field, but that doesn’t necessarily mean they don’t care. I don’t think anyone starts their day by saying, “I’m going to do the worst job possible today.”

Then the security team arrives on the scene: the perpetual critic. We don’t actually build anything. All we do is tell the rest of IT that their baby is ugly and that they should get a new one. Why are we surprised that they’re defensive and hostile? No one wants to hear that their hard work and long hours have resulted in shit.

What we fail to realize is that this is our baby too, and our feedback would be better received if we were less of a critic and more of an ally.
