Recently, I’ve written about the dangers posed by technology fallacies, and one of the most frustrating for me involves discussions of “best in class.” In my experience, this mindset traps technology teams in pointless debates and never-ending proof-of-concept work, all in search of a perfect tool that doesn’t exist. The truth is that most organizations don’t need the best; they need “good enough” so they can get on with business. But this delusion has more serious consequences in the cloud. When you choose tools solely because they’re “best in class,” you can compromise the integrity of your cloud architecture. Without considering the context, your selection could violate a sacred principle of data center design: minimizing the size of a failure domain.
For many, the cloud’s abstraction of the underlying network stripped away what often seemed like arcane and mystical knowledge. It also allowed development teams to work unencumbered by the fear of seemingly capricious network engineers. However, the downside of infrastructure-as-a-service (IaaS) is that this same obfuscation allows those without a background in traditional data center design to make critical errors in fault tolerance. We’ve all heard the horror stories about cloud applications failing because they were located in a single availability zone (AZ) or region. Even worse are the partial outages caused by application dependencies that cross an AZ or region. While this could still happen with physical data centers, it was more obvious because you could see the location of a system in a rack, and your network engineers would call you out when you didn’t build in fault tolerance and thwarted their planned maintenance windows.
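To make that blind spot concrete, here’s a minimal sketch, assuming an AWS environment and the boto3 SDK (neither is mentioned above, so treat this as one possible illustration), that groups running EC2 instances by availability zone. A result showing everything in one AZ is exactly the kind of oversized failure domain a network engineer used to catch for you.

```python
# Illustrative only: group running EC2 instances by availability zone
# to spot workloads confined to a single AZ (one large failure domain).
from collections import defaultdict

import boto3  # assumes AWS credentials are already configured

ec2 = boto3.client("ec2")
instances_by_az = defaultdict(list)

# Page through all running instances and record their AZ placement.
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            az = instance["Placement"]["AvailabilityZone"]
            instances_by_az[az].append(instance["InstanceId"])

for az, ids in sorted(instances_by_az.items()):
    print(f"{az}: {len(ids)} running instance(s)")

if len(instances_by_az) == 1:
    print("Warning: every running instance sits in a single AZ.")
```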
Today, it’s also common for organizations to use a variety of software-as-a-service (SaaS) applications with no knowledge of which underlying cloud providers are being used or how those services are designed. While this simplicity is often beneficial for the business because it increases velocity in service delivery, it can also create blind spots that violate the same failure domain principles as IaaS. Only the most persistent technologists can unmask the inner workings of these applications to determine how they’re deployed and whether they align with their organization’s infrastructure design. Unfortunately, the business doesn’t always understand this nuance and can fall prey to the “best in class” fallacy, leading to brittle environments built on large failure domains. By exchanging their sluggish on-premises systems for SaaS, the organization often just trades one set of risks for another.
Ultimately, when choosing capabilities, the better advice is to “dance with the cloud that brought you.” Instead of worrying about “best in class,” select technologies that sit closer to your infrastructure by leveraging your cloud provider’s services where feasible. This translates to a better technology architecture for your organization because the cloud provider is making sure that its managed services are low-latency, resilient, and highly available. While it may not always be possible, by taking the failure domain principle into consideration when selecting and implementing solutions, you’ll achieve better service delivery for your organization.
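As a hedged illustration of staying close to your provider (again assuming AWS and boto3, with every identifier below hypothetical), provisioning a managed database with the provider’s own multi-AZ option pushes the failure-domain work onto the provider rather than onto a bolted-on “best in class” product you’d have to make resilient yourself.

```python
# Illustrative only: create a managed database through the cloud
# provider's own service and let it handle cross-AZ resilience.
import boto3  # assumes AWS credentials are already configured

rds = boto3.client("rds")

response = rds.create_db_instance(
    DBInstanceIdentifier="orders-db",    # hypothetical name
    Engine="postgres",
    DBInstanceClass="db.m6g.large",      # hypothetical size
    AllocatedStorage=100,
    MasterUsername="app_admin",
    MasterUserPassword="change-me",      # use a secrets manager in practice
    MultiAZ=True,  # provider places a standby in another AZ and manages failover
)
print(response["DBInstance"]["DBInstanceStatus"])
```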