Dancing with the Cloud

Recently, I’ve written about the dangers posed by technology fallacies and one of the most frustrating for me involves discussions of “best in class.” In my experience, this mindset causes technology teams to get themselves wrapped up in too many pointless discussions followed by never-ending proof-of-concept work all in search of that non-existent perfect tool. The truth is that most organizations don’t need the best, they need “good enough” so they can get on with business. But this delusion has more serious consequences within the cloud. When you choose tools solely based on the requirement of being “best in class,” you could compromise the integrity of your cloud architecture. Without considering the context, your selection could violate a sacred principle of data center design – minimizing the size of a failure domain.

For many, cloud’s abstraction of the underlying network reduced the complexity of what often seemed like arcane and mystical knowledge. It also allowed development teams to work unencumbered by the fear of seemingly capricious network engineers. However, the downside of infrastructure-as-a-service (IaaS) is that this same obfuscation allows those without a background in traditional data center design to make critical errors in fault tolerance. We’ve all heard the horror stories about cloud applications failing because they were only located in a single availability zone (AZ) or region. Even worse are the partial outages that occur due to application dependencies that cross an AZ or region. While this could still happen with physical data centers, it was more obvious because you could see the location of a system in a rack, and your network engineers would call you out when you didn’t build in fault tolerance and thwarted their planned maintenance windows.
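
To make the single-AZ problem concrete, here is a minimal sketch in Python of the kind of check a team might run against its own inventory. The inventory format, the service names and the zone labels are all hypothetical, and a real check would pull this data from your cloud provider rather than a hard-coded list.

```python
from collections import defaultdict


def zones_per_service(deployments):
    """Map each service name to the set of availability zones it runs in."""
    zones = defaultdict(set)
    for service, zone in deployments:
        zones[service].add(zone)
    return dict(zones)


def single_zone_services(deployments, min_zones=2):
    """Return services deployed in fewer than min_zones zones, sorted by name."""
    return sorted(
        service
        for service, zones in zones_per_service(deployments).items()
        if len(zones) < min_zones
    )


# Hypothetical inventory of (service, availability zone) pairs.
inventory = [
    ("checkout", "zone-a"),
    ("checkout", "zone-b"),
    ("billing", "zone-a"),
    ("billing", "zone-a"),
]

print(single_zone_services(inventory))  # ['billing']
```

Here “billing” is flagged because both of its instances sit in the same zone, which is exactly the kind of oversized failure domain a rack diagram used to make obvious.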

Today, it’s also common for organizations to use a variety of software-as-a-service (SaaS) applications with no knowledge of which underlying cloud providers are being used or how those services are designed. While this simplicity is often beneficial for the business because it increases velocity in service delivery, it can also create blind spots that violate the same failure domain principles as with IaaS. Only the most persistent technologists can unmask the inner workings of these applications to determine how they’re deployed and whether they align with their organization’s infrastructure design. Unfortunately, the business doesn’t always understand this nuance and can fall prey to the “best in class” fallacy, leading to brittle environments due to large failure domains. By exchanging their on-premise, sluggish systems for SaaS, the organization often accepts a different set of problems associated with risk.

Ultimately, when choosing capabilities, my recommendation is to “dance with the cloud that brought you.” Instead of worrying about “best in class,” select the technologies that are closer to your infrastructure by leveraging the services of your cloud provider where feasible. This translates to a better technology architecture for your organization because the cloud provider is making sure that their managed services are low-latency, resilient and highly available. While it may not always be possible, by taking the failure domain principle into consideration during the selection and implementation of solutions, you’ll achieve better service delivery for your organization.


Trapped by Technology Fallacies

After working in tech at several large companies over a couple of decades, I’ve observed some of the worst fallacies that cause damage to organizations. They don’t arise from malice, but from a scarcity of professional reflection in our field. Technologists often jump to problem solving before spending sufficient time on problem setting, which leads to the creation of inappropriate and brittle solutions. Donald A. Schön discusses this disconnect in his seminal work, The Reflective Practitioner,

…with this emphasis on problem solving, we ignore problem setting, the process by which we define the decision to be made, the ends to be achieved, the means which may be chosen.

Problem solving relies on selecting from a menu of previously established formulas. While many of these tactics can be effective, let’s examine some of the dysfunctional approaches used by technologists that lead to pain for their organizations.
  • Fallacy #1 – Hammer-Nail: Technologists often assume that all problems can be beaten into submission with a technology hammer.  It’s like the bride’s father in My Big Fat Greek Wedding, who believes that Windex can be used to cure any ill. Similarly, technologists think that every challenge is just missing a certain type of technology to resolve it.  This, even though we generally speak about maturity models in terms of people, process, technology, and culture. I can’t tell you how often I’ve seen someone design and implement a seemingly elegant solution only to have it rejected because it was developed without understanding the context of the problem.
  • Fallacy #2 – Best in Class. I’ve heard this so many times in my career that I just want to stand on a chair and shake my fist in the middle of a Gartner conference. Most organizations don’t need “best in class,” they need “good enough.” The business needs fast and frugal solutions to keep them productive and efficient, but technologists are often too busy navel gazing to listen.
  • Fallacy #3 – Information Technology is the center of the business universe. I once worked for a well-known bank that had an informal motto, “We’re a technology company that happens to be a bank.” The idea was that because they were so reliant on technology, it transformed them into a cool tech company. I used to respond with, “We also use a lot of electricity, does that make us a utility company?” Maybe a little hyperbolic, but I was trying to make the point that IT Doesn’t Matter. When Nicholas Carr used that phrase as the title of his Harvard Business Review article in 2003, he was considering technology in the historical context of other advances such as electricity and telephones, “When a resource becomes essential to competition but inconsequential to strategy, the risks it creates become more important than the advantages it provides.” In the early days of tech, it gave you an edge. Today, when a core system fails, it could sink your business. The best solutions are often invisible to the organization so it can focus on its core competencies.
While technology can be very effective at solving technical problems, most organizational issues are adaptive challenges. In The Practice of Adaptive Leadership, the authors identify this failure to differentiate between the two as the root cause of business difficulties,

The most common cause of failure in leadership is produced by treating adaptive challenges as if they were technical problems. What’s the difference? While technical problems may be very complex and critically important (like replacing a faulty heart valve during cardiac surgery), they have known solutions that can be implemented by current know-how. They can be resolved through the application of authoritative expertise and through the organization’s current structures, procedures, and ways of doing things. Adaptive challenges can only be addressed through changes in people’s priorities, beliefs, habits, and loyalties. Making progress requires going beyond any authoritative expertise to mobilize discovery, shedding certain entrenched ways, tolerating losses, and generating the new capacity to thrive anew.

The end goals that we’re trying to reach can’t be clearly established if we don’t sufficiently reflect on the problem. When we jump to problem solving over problem setting, we’re assuming a level of confidence that hasn’t been earned. We’ve made assumptions in the way systems should work, without thoroughly investigating how they are actually functioning. When Postmodern critic Michel Foucault speaks of “an insurrection of subjugated knowledges,” he’s questioning the certainty of our perceptions when we’ve disqualified information that might be important in gaining a broader perspective. Technologists are more effective when they recognize the inherent expertise of the non-technologists in the businesses they serve and operate as trusted partners who understand change leadership. Instead of serving the “religion of tech,” we should focus on delivering what organizations really need.

Supply Chain Security Jumps the Shark

Can we collectively agree that the supply chain security discussion has grown tiresome? Ten years ago, I couldn’t get anyone to pay attention to the supply chain outside of the federal government crowd, but now it continues to be the security topic du jour. And while this might seem like a good thing, it’s increasingly becoming a distraction from other topics of product security, crowding out meaningful discussions about secure software development. So like a once-loved, long-running TV show that has worn out its welcome but looks for gimmicks to keep everyone’s attention, I’m officially declaring that Supply Chain Security has jumped the shark.

First, let’s clarify the meaning of the term Supply Chain Security. Contrary to what some believe, it’s not synonymous with the software development lifecycle (SDLC). That’s right, it’s time for a NIST definition! NIST, or the National Institute of Standards and Technology, defines supply chain security broadly because this term refers to anything acquired by an organization.

…the term supply chain refers to the linked set of resources and processes between and among multiple levels of an enterprise, each of which is an acquirer that begins with the sourcing of products and services and extends through the product and service life cycle.

Given the definition of supply chain, cybersecurity risks throughout the supply chain refers to the potential for harm or compromise that may arise from suppliers, their supply chains, their products, or their services. Cybersecurity risks throughout the supply chain are the results of threats that exploit vulnerabilities or exposures within products and services that traverse the supply chain or threats that exploit vulnerabilities or exposures within the supply chain itself.

(If you’re annoyed by the US-centric discussion, I encourage you to review the ISO 28000 series on supply chain security management, which I haven’t included here because they charge you more than $600 to download the standard.)

Typically, supply chain security refers to third parties, which is why the term is most often used in relation to open source software (OSS). You didn’t create the OSS you’re using, and it exists outside your own SDLC, so you need processes and capabilities in place to evaluate it for risk. But you also need to consider the commercial off-the-shelf software (COTS) you acquire as well. Consider SolarWinds. A series of attacks against the public and private sectors was caused by a breach against a commercial product. This compromise is what allowed malicious parties into SolarWinds customers’ internal networks. This isn’t a new concept, it just gained widespread attention due to the pervasive use of SolarWinds as an enterprise monitoring system. Most organizations that have procurement processes include robust third party security programs for this reason, but they aren’t perfect.

If supply chain security isn’t a novel topic and isn’t inclusive of the entire SDLC, then why does it continue to captivate the attention of security leaders? Maybe because it presents a measurable, systematic approach to addressing application security issues. Vulnerability management is attractive because it offers the comforting illusion that if you do the right things, like updating OSS, you’ll beat the security game. Unfortunately, the truth is far more complicated. Just take a look at the following diagram that illustrates the typical elements of a product security program:

[Diagram: Transforming product security]

Executives want uncomplicated answers when they ask, “Are we secure?” They often feel overwhelmed by security discussions because they want to focus on what they were hired for: to run a business. As security professionals, we need to remember this motivation as we build programs to comprehensively address security risk. We should be giving our organizations what they need, not more empty security promises based on the latest trends.


Architecture Frameworks: Meaningful or Ridiculous?

Earlier this week someone reached out to me on LinkedIn after listening to a podcast episode I was on where I discussed security architecture and cloud migration. He had been thinking about moving into architecture from security engineering and wanted some suggestions about making that transition successful. Specifically, he wanted to know what I thought of architecture frameworks such as SABSA (Sherwood Applied Business Security Architecture). This discussion caused me to reconsider my thoughts on architecture and the lengthy arguments I’ve had over frameworks.

I should say that I have a love-hate relationship with architecture frameworks. I’m passionate about organized exercises in critical thinking, so the concept of a framework appeals to me. However, in practice, they can turn into pointless intellectual exercises equivalent to clerics arguing how many angels can fit on the end of a needle. In my experience, no one ever seems to be all that happy with architecture frameworks because they’re often esoteric and mired in complexity.

From what I’ve seen across the organizations where I’ve worked, if there is an architecture framework, it is usually some derivative of TOGAF (The Open Group Architecture Framework). This reality doesn’t mean someone within the organization intentionally chose it as the most appropriate for their environment. It’s just that TOGAF has been around long enough (1995) to have become pervasive to the practice of architecture and consequently embedded in organizations.

Regardless of what a technology organization is using as their framework, I’ve found that for a security architect to effectively collaborate, you need to align with whatever the other architects are using. That might be based on TOGAF, but it might be something else entirely. You’ll have an easier time plugging security into the practice if you follow their lead. I’ve never actually seen an organization follow TOGAF or other frameworks very strictly though. It’s usually some slimmed-down implementation, and trying to lay SABSA on top of that is generally too heavy and convoluted. Even large organizations with mature architecture practices, in my experience, don’t use anything as detailed as SABSA or TOGAF.

But I admit to not having had much formal architecture training. Frankly, I don’t know many professional architects who have. Maybe that’s why there are a lot of bad architects, or possibly it says something about how architects are created and trained, which is informally. However, I have spent significant time studying frameworks such as these to become a thoughtful technologist. I personally find the TOGAF framework and docs helpful when trying to center an architectural conversation on a common taxonomy. Most importantly, I believe in pragmatism: meet the other architects where they are. To work with your colleagues successfully, try to identify the common framework they’re using. Because it’s not about using the best framework, it’s about finding the one that works within the given maturity of an organization.


Why Your Security Program Is Failing

Why do I assert most programs are failing? Because security isn’t getting any better. Just look at the 2021 holiday gift that was Log4j. Could the problem be with our approach? Some treat Information Security programs as a finite linear progression from an imperfect current state to a future improved state, or worse, a Sisyphean exercise in modern ennui. Both approaches are built on a foundation of coercive legislation that highlights failure, a corporate Crime and Punishment.

In truth, information security initiatives are exercises in change management. Security programs fail for the same reason many change initiatives fail: poor change management. The failure rate of change efforts commonly reported in books such as Paul Gibbons’ The Science of Organizational Change (2019) ranges from 20% to 80%, depending on the type. Even if the lower figure is more accurate, a failed change effort could still damage profitability and an organization’s reputation.

A common theme emerges from the academic literature on successful change management approaches: the importance of collaborating with and respecting those individuals being asked to change. Most of the authors seem to agree on fundamentals such as recognizing the importance of change recipients’ emotions (Branson, 2008; Choi & Ruona, 2010; Dahl, 2011; Williams & Tobbell, 2017), engaging members of the organization to hear concerns and feedback (Choi & Ruona, 2010; de Waal & Heijtel, 2017) and fostering a continuous change environment by creating a learning culture, even when using coercive change methods (Canato et al., 2013; Choi & Ruona, 2010).

Building a continuous change culture is observed to be the greatest support to change success (Choi & Ruona, 2010; de Waal & Heijtel, 2017; Hansen & Jervell, 2015). By establishing an anti-fragile organization that constantly adapts to meet new challenges, the need for large, heavy change efforts that fatigue employees is reduced.

Information Security programs could benefit from these approaches. The initiatives tend to be transformative for organizations, focusing on multiple domains of culture and technology. However, while there is information security academic literature that discusses the importance of change management planning (Ashenden, 2008) and attention to creating a security culture (AlHogail, 2015), the approaches often focus on empirical-rational strategies (Choi & Ruona, 2010) that are coercively implemented by security leadership.

Compliance As Property

In engineering, a common approach to security concerns is to address those requirements after delivery. This is inefficient for the following reasons:

  • It fails to consider how the requirement(s) can be integrated during development, thereby avoiding reengineering to accommodate the requirement.
  • It disempowers engineering teams by outsourcing compliance and the understanding of the requirements to another group.

To improve individual and team accountability, I recommend borrowing a key concept from Restorative Justice, Conflict as Property. This concept asserts that the disempowerment of individuals in western criminal justice systems is the result of ceding ownership of conflict to a third party. Similarly, enterprise security programs often operate as “policing” systems, with engineering teams considering security requirements as owned by a compliance group. While appearing to be efficient, this results in the siloing of compliance activities and infantilization of engineering teams.

Does this mean that engineering teams must become deep experts in all aspects of information security? How can they own security requirements without a full grounding in these concepts? Ownership does not necessarily imply expertise. While one may own a house or vehicle and be responsible for maintenance, most owners will understand when outside expertise is required.

The ownership of all requirements by an engineering team is critical for accountability. To proactively address security concerns, a team must see these requirements as their “property” to address them efficiently during the design and development phases. It is neither effective nor scalable to hand off the management of security requirements to another group. While an information security office can and should validate that requirements have been met in support of Separation of Duties (SoD), ownership for implementation and understanding belongs to the engineering team. 


Infosec Riot Grrrl Manifesto*

BECAUSE us girls crave respect and authority in our chosen field of Information Security.

BECAUSE we wanna make it easier for girls to see/hear each other’s work so that we can share strategies and criticize-applaud each other.

BECAUSE we must infiltrate the Infosec field in order to create our own destiny.

BECAUSE I am not your mother, your sister, your wife or your girlfriend. So when I speak with authority, keep your emotional baggage and neuroses to yourself.

BECAUSE we recognize fantasies of a macho security dictatorship as a set of impractical lies meant to keep us simply dreaming instead of creating the revolution in Information Security by envisioning and creating alternatives to the bullshit military-posturing way of doing things.

BECAUSE we want and need to encourage and be encouraged in the face of all our own insecurities, in the face of beergutboyinfosec that tells us we can’t play in their sandbox, in the face of “authorities” who say our skills are the worst.

BECAUSE we don’t wanna assimilate to someone else’s (boy) standards of what is or isn’t.

BECAUSE we are unwilling to falter under claims that we are reactionary “reverse sexists” AND NOT THE TRUEINFOSECCRUSADERS THAT WE KNOW we really are.

BECAUSE we know that information security is much more than just reactivity and are patently aware that the punk rock “you can do anything” idea is crucial to the coming angry infosec grrrl revolution which seeks to promote the psychic and cultural lives of girls and women in our profession everywhere, according to their own terms.

BECAUSE we are interested in creating non-hierarchical ways of being, collaborating and working, based on communication + understanding, instead of competition + good/bad categorizations.

BECAUSE doing/reading/seeing/hearing cool things that validate and challenge the status quo can help us gain the strength and sense of community that we need in order to figure out how bullshit like racism, able-bodieism, ageism, speciesism, classism, thinism, sexism, anti-semitism and heterosexism figures in our professional and personal lives.

BECAUSE we see fostering and supporting girl infosec professionals of all kinds as integral to this process.

BECAUSE we see our main goal as sharing information and supporting allies over making profits according to traditional standards.

BECAUSE we are angry at a society that tells us Girl = Dumb, Girl = Bad, Girl = Weak, Girl = Not technical.

BECAUSE we are unwilling to let our real and valid anger be diffused and/or turned against us via the internalization of sexism as witnessed in girl/girl jealousism and self defeating girltype behaviors.

BECAUSE I have run out of time, patience and f*#&s in pandering to egos.

BECAUSE I believe with my wholeheartmindbody that girls constitute a revolutionary force in information security that can, and will revolutionize our profession and the world.

*Based on the original Riot Grrrl Manifesto by Kathleen Hanna and Bikini Kill.

When Compliance Goes Bad

You may laugh at the image above, but for many of us, similar absurdities can be found in our own policy frameworks. Governance matters, because badly written, confusing policies and standards will drain the productivity of your technical teams as they run around trying to figure out what’s actually required. Cormac Herley expressed this lunacy best in his paper, So Long, And No Thanks for the Externalities: The Rational Rejection of Security Advice by Users:

“Given a choice between dancing pigs and security, users will pick dancing pigs every time.” While amusing, this is unfair: users are never offered security, either on its own or as an alternative to anything else. They are offered long, complex and growing sets of advice, mandates, policy updates and tips. These sometimes carry vague and tentative suggestions of reduced risk, never security. We have shown that much of this advice does nothing to make users more secure, and some of it is harmful in its own right. Security is not something users are offered and turn down. What they are offered and do turn down is crushingly complex security advice that promises little and delivers less.

As security and governance professionals, we are trusted stewards for our organizations. We have an obligation to ensure that individuals make good choices by clearly communicating our expectations. Otherwise, we just come off like institutional bullies.

Security Policy RTFM

When I start a new position with an organization, the very first thing I do is review the policy framework and its contents. I don’t dig into the network diagrams. I don’t pester security engineers for current vulnerability findings or pentesting reports. I don’t even look at the strategy content first. Why would I spend time reading documents that are basically the digital equivalent of a sleeping pill? Because policies, standards and procedures represent the manual of an organization. Spend time reviewing it and you’ll soon discover how mature the security program really is.

Maybe I developed this habit from my time as a Unix engineer. In the days before Google and ubiquitous wireless, you had to know how to read man pages and use them to solve problems quickly. There were many times I would be sitting in an icy server room at midnight without a network connection, trying to figure out why a volume wouldn’t mount or a NIC wasn’t working, but apropos or man -k saved me. The CLI was the way through those troubleshooting sessions by uncovering various arguments and switches found in the man pages. It made me a better technologist, because I learned that good engineering is as much about documentation as it is about delivering a solution.

And yes, I was that person who when asked by a junior tech how to do something in *nix would respond with, “Man $insert_command_here.” I even threatened to change my middle name to RTFM at one point. While there was a part of me that reveled in the superiority of having pierced the highest levels of esoteric knowledge, I also genuinely wanted people to appreciate the elegance of a system that allowed you to have all the tools you needed to troubleshoot it.

Recently, I realized that an organization’s policy framework and its contents function in a similar way. You can learn how leadership prioritizes risk and empowers its governance team (or doesn’t). You can uncover processes and the inner workings of different business units. You’ll also find out quickly how dysfunctional the security program is based on the breadth of the content and how well it’s organized. Tedious, circuitous and often bloated, policy documents can be a challenging source to mine for intelligence, but they’re the best place to start. So, RTFM your organization by reviewing its policies and standards, otherwise you’ll struggle to separate the valuable elements of your program from pure security theater.

Cloud-Native Consumption Principles

The promises of Cloud are alluring. Organizations are told they can reduce costs through a flexible consumption-based model, which minimizes waste in over-provisioning, while also achieving velocity in the development of new digital products without the dependence on heavy, centralized IT processes. This aligns closely with the goals of a DevOps transformation, which seeks to empower developers to build better software through a distributed operational model that delivers solutions more quickly with less overhead. However, most enterprise cloud journeys begin with a “lift and shift” from the on-premise data center to an IaaS provider. This seems like the easiest and fastest way to begin acclimating to the new environment by finding and leveraging similarities in deployment and consumption of digital assets. While this path may initially seem to expedite adoption, the migration is soon bogged down by the very issues that prompted the organization to adopt cloud: cumbersome, centralized processes that don’t support developers’ need for automation and speed.

With startups, which don’t have existing processes and an organizational hierarchy to realign to a new way of working, applications have no barriers to becoming Cloud-Native. They begin that way. Enterprises weren’t initially built around a Cloud model, so the implementation is often based on Conway’s Law, with the design and provisioning mirroring the existing organizational hierarchy. The only difference is that instead of a server team deploying bare-metal or on-premise virtual machines, they build an instance in the cloud. While there are some incremental gains, much of the latency from human middleware and legacy processes remains. After the short honeymoon based on a PoC or pilot projects, the realities of misaligned business processes grind progress to a halt. This also results in higher spend because cloud resources are not meant to be long-running snowflakes, but ephemeral and immutable. Cloud is made for cattle, not pets.
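
One way to spot “pets” in a cloud estate is to look for instances that have simply been alive too long. The following Python sketch assumes a hypothetical fleet dictionary of instance names and launch times, plus an arbitrary 30-day threshold; none of these come from a specific provider’s API, and the right threshold depends on your workloads.

```python
from datetime import datetime, timedelta, timezone


def long_running(instances, now, max_age=timedelta(days=30)):
    """Return names of instances older than max_age, oldest first."""
    stale = [
        (now - launched, name)
        for name, launched in instances.items()
        if now - launched > max_age
    ]
    return [name for _, name in sorted(stale, reverse=True)]


# Hypothetical fleet: instance name -> launch time (UTC).
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
fleet = {
    "web-7f3a": datetime(2024, 5, 28, tzinfo=timezone.utc),
    "legacy-db-01": datetime(2022, 1, 15, tzinfo=timezone.utc),
    "batch-old": datetime(2024, 3, 1, tzinfo=timezone.utc),
}

print(long_running(fleet, now))  # ['legacy-db-01', 'batch-old']
```

Anything on that list is a candidate for being rebuilt from code rather than nursed along by hand.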

The source of this friction becomes clear. Because cloud is referred to as “Infrastructure as a service,” many assume this is equivalent to data center hosting. However, Cloud is an evolution in the digital delivery model, where bare-metal is abstracted away from the customer, who now consumes resources through web interfaces and APIs. Cloud should be thought of and consumed as a software platform, i.e., Cloud-Native. As defined by the Cloud Native Computing Foundation (CNCF):

Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

Therefore, to maximize the value of cloud adoption at scale, it is necessary to become Cloud-Native, and the effort must be tightly coupled to DevOps automation efforts.

In 1967, Paul Baran discussed the creation of a “national computer public utility system,” a metered, time-sharing model. Cloud, and by extension Cloud-Native, is the manifestation of that prediction and, as with other utilities, the consumption of “compute as utility” must be distributed and self-service in order to achieve cost benefits. What about governance and security concerns? Cloud Service Providers (CSPs) have built-in capabilities to establish policy restrictions at the organization, account, resource and/or identity level. Native security controls can be embedded to function seamlessly, providing the automated monitoring, alerting, and enforcement needed to minimize risk and meet audit requirements. By decoupling compliance from control, these capabilities are more efficiently consumed through the platform via policy-as-code integrated into declarative Infrastructure-as-code (IaC). Conversely, organizational risk is increased when using manual provisioning, abstraction layers or traditional controls that are not cloud-ready or native to this environment.
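
As a rough illustration of policy-as-code, the Python sketch below evaluates a declarative resource definition against a small set of rules before anything is provisioned. The policy names, the resource fields and the `evaluate` helper are all hypothetical; in practice you would more likely use your CSP’s native policy engine or a dedicated policy tool than hand-rolled checks like these.

```python
def evaluate(resource, policies):
    """Return the names of the policies that the resource violates."""
    return [name for name, check in policies if not check(resource)]


# Hypothetical policies: each pairs a name with a predicate that
# returns True when the resource is compliant.
POLICIES = [
    ("storage-encrypted", lambda r: r.get("encrypted", False)),
    ("no-public-access", lambda r: not r.get("public", False)),
    ("has-owner-tag", lambda r: "owner" in r.get("tags", {})),
]

# Hypothetical declarative resource definition, as it might appear in IaC.
bucket = {"type": "bucket", "encrypted": True, "public": True, "tags": {}}

violations = evaluate(bucket, POLICIES)
print(violations)  # ['no-public-access', 'has-owner-tag']
```

A pipeline step could fail the build whenever the list of violations is non-empty, so the control is enforced by the platform while the compliance team owns only the rules.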

In an attempt to ease organizations’ struggle with cloud adoption, Azure and AWS have developed Well-Architected Frameworks to promote better cloud consumption and design. Both share five pillars for evaluating the quality of a solution delivery (AWS has since added a sixth, sustainability):

  • Operational excellence
  • Security
  • Reliability
  • Performance (Efficiency)
  • Cost optimization

While helpful, these frameworks fail to communicate the urgent need for automation and tight coupling to the application development lifecycle in order to achieve a successful cloud migration. For example, from the AWS Operational Excellence Pillar, “operations as code” is only listed as a design principle to “limit human error and enable consistent responses to events.”

Ultimately, Cloud at scale is best consumed as a software platform through the automated development processes essential to DevOps, otherwise the costs of side-channel pipeline provisioning and long-running, inefficiently sized workloads soon outweigh the initial benefits.

To summarize, the principles of a Cloud-Native consumption model include:

  • Automated provisioning of all resources as code through pipelines owned by product teams
  • Distributed self-service to achieve velocity and empower business segments
  • “Shift Everywhere” security through Policy-as-code embedded into the Infrastructure-as-Code
  • Decoupling of compliance from operational control through the use of CSP native capabilities to automate governance, monitoring, alerting and enforcement

To be effective, these principles are best operationalized through the unification of any cloud initiative with a DevOps effort. Otherwise, the cloud effort will be crippled by the existing technology bureaucracy.
