1. The Baby and the IT Bathwater

    © 2013 Jeff Sussna, Ingineering.IT

    Kevin Jones recently made the provocative proposal that some companies should consider dissolving IT altogether and disseminating its functions into their business units. Mark Thiele countered with the observation that, without addressing IT’s underlying ills, decentralization may just spread the disease. So what are we to do? Should we throw out the baby with the IT bathwater? I envision a solution that reconciles these perspectives, while at the same time going beyond them. In my opinion, the precise makeup of the org chart is less important than the nature of the service. As I’ve previously posted, we need to shift our perspective from “Information Technology” to “Digital Enablement”. Information technology is about managing a bunch of stuff. Digital enablement is about using a particular set of expertise to help the business succeed.

    Proceeding from the latter perspective, IT continually strives to divert energy from low-level, commodity functions to high-level, value-added ones. It embraces cloud computing as an aid towards that end. It keeps commodity functions and resources in-house only reluctantly, unless they address unique, business-specific needs. Instead of considering cloud solutions by asking “do we have to?”, IT asks “is there any reason we can’t?” When the answer is yes, IT actively pushes cloud providers to improve their services in order to remove those obstacles to adoption. It also seeks ways to remove obstacles by redefining the problem.

    What does the newly energized IT organization look like? In my view, it looks a lot like an agency. Christian Reilly and Brian Gracely foresee a future where IT functions as a concierge, focused more on connecting the business to IT functionality than on directly delivering and controlling it. My only problem with the idea of IT-as-a-Concierge is that it implies passivity. The concierge patiently waits for guests, who know more or less what they want (a pleasant, nearby, not-too-expensive place to eat dinner), to ask for help. IT needs to be more proactive. It needs to go out into the business units in order to understand their needs. It needs to propose solutions, not just offer them up upon request. It needs to help move the business forward by identifying opportunities the business doesn’t even see.

    Some question IT’s ability to succeed when “the business doesn’t know what it wants”. Developers and designers have long understood that users don’t understand their own needs. This trait is natural, and need not be considered a problem. Practices such as Agile and Design Thinking specifically address limitations in self-understanding in order to surface unexpected, innovative, truly valuable solutions. IT needs to adopt similar approaches. The future IT organization likely will employ fewer “hardware huggers” and more BAs and Service Designers.

    How does the CIO role change with this model? Does it go away entirely, or report to or even merge with the CMO? Again, the specific org chart structure is less important than the ability to provide a unique perspective and knowledge set. In the world of IT-as-agency, the CIO (or dare I suggest “Chief Digital Enablement Officer”) is responsible for understanding business and customer needs at the highest strategic level, and communicating those needs within IT. Even more importantly, though, the CIO also is responsible for defining opportunities to drive the business forward through digital infusion, and for communicating those opportunities beyond IT.

    To the question of whether IT should remain centralized or dissolve itself into the business, I think the answer is ‘yes’. I don’t expect the need for in-house IT systems to disappear completely any time soon. Even in an enterprise where IT is highly consumerized, the IT organization has important expertise to contribute. It can help business units make wise use of cloud services, in terms of cost, efficiency, availability, security, etc.

    The difference from current practice, though, is that, rather than doing things for people, IT helps people do things. To provide that service, IT needs to be out and about in the business, not to mention exposing itself to customers. Like any agency, IT can no longer be the department of ‘No’, or ‘My Way or the Highway’. Agencies that take that approach generally get fired. If nothing else, Kevin’s post illustrates the fact that people have started wondering whether they could, or should, fire IT. It’s up to us to reimagine ourselves in order to circumvent that outcome.

  2. Digital Infusion: Reimagining IT

    © 2013 Jeff Sussna, Ingineering.IT

    IT is in crisis. We’ve become the Department of No, of Slow, of Expensive, of Inadequate, of Misaligned. We’re told we need to drive innovation, to become more service-oriented, to cut costs and improve quality and increase agility all at once. I believe that this crisis goes beyond just how we do things. It touches the very definition of what we do, and is reflected in what we call ourselves.

    “Information Technology” makes me think of mini-computers in the basement controlling giant continuous-form printers noisily spitting out reams of inventory reports. The idea that we are in the business of information is obsolete. The days of Management Information Systems are behind us. No longer does our job consist of extracting information for managers to use to make decisions. Digital experience has become woven into the warp and woof of corporate existence. In order to remain relevant, and valuable, IT needs to transform itself into something else.

    On one hand, work still happens in the physical realm. I drive to work, walk to my office, chat around the water cooler, hand my coworker a printed document, attend a meeting in a meeting room, draw with a marker on a white board. On the other hand, I use Google Maps on my phone to navigate while driving, chat around the virtual Yammer water cooler, email the document to my coworker, attend the meeting on Skype, and co-edit drawings in Google Docs.

    One might claim that the digital realm is replacing the physical. Even in a virtual workplace, though, I still chat, share, read, write, talk, and draw. Work still consists of humans engaging with each other to create value. Whether you’re sitting in an office, or a coffee shop, or your kitchen, you’re still sitting somewhere, and communicating and collaborating with, or selling to and supporting, other people who are also sitting someplace. 

    In an excellent article, Bernard Golden defines the role of IT as “enabling computing to be performed on behalf of the larger company”. Bernard’s definition of IT as an enabler is apt, and worthy of contemplation. The issue is that computing is no longer a separate activity ‘to be performed on behalf of’ anyone. It’s risen to the level of a pervasive quality that infuses physical work. Products and activities have taken on a physio-digital nature. Is a Cadillac with OnStar and Pandora a car, or an online software service? Yes. What about an office-building thermostat that sends data back to the vendor’s IT systems, and uses analytics from those systems to optimize temperature and humidity control?

    In the current world, a company’s ability to function at all requires seamless digital infusion. To make a simple phone call with a ‘regular’ telephone, I need the LAN and the VoIP server to work. Beyond that, I should be able to search my contact list from my phone no differently than from my email client.

    Digital infusion goes beyond internal enterprise functions. Corporate interaction with customers has also become physio-digital. Outbound marketing and customer support blend with each other, and with Twitter and Facebook. In turn, support personnel respond to online complaints by promising to call customers and resolve their problems voice-to-voice.

    The Cadillac/OnStar example is a case of a company going even further and enabling digital infusion on its customers’ behalf. When people talk about IT needing to drive innovation, I believe that ‘customer digital-infusion services’ like OnStar, or like the Big Data-augmented HVAC controller, are exactly the kind of thing they’re talking about.

    IT needs to do more than make itself more agile and aligned by embracing BYOD, or CoIT, or cloud. We need to understand our role in supporting companies’ very existence. We need to revisit our basic mission. I believe that this mission consists of enabling companies to improve outcomes by infusing physical spaces, activities, and relationships with digital ones. Then we need to take the next step and revisit what we call ourselves. To fully transform ‘Information Technology’ we should stop calling it by that name.

  3. Service-Dominant Logic: Why AWS Is So Far Ahead

    © 2013 Jeff Sussna, Ingineering.IT

    A couple of years ago I migrated a LAMP application from a colo space to AWS. While for the most part I took a forklift approach, I did try to take advantage of basic AWS services such as ELB, auto-scaling, multi-AZ RDS, and so on. My application architecture included a memcached server. After investigating possible memcached fault-tolerance solutions, I ended up settling for a single instance. 

    Just a few weeks later, Amazon released their ElastiCache service. It felt as if they’d been reading my mind. In fact I think they were. I believe Amazon views their customers through a fundamentally different lens from their competitors. In my opinion this lens explains both their mind-reading abilities and their dramatic lead in the cloud market.

    As I see it, Amazon hasn’t fooled themselves into thinking they’re in the servers-on-demand business. They don’t care whether they’re selling Infrastructure-as-a-Service, Platform-as-a-Service, or Beer-as-a-Service. They understand that their customers don’t just come to them looking for Elastic Things, whether servers or storage or load balancers. Instead, customers use Amazon’s elastic things to help themselves operate scalable, fault-tolerant, cost-effective web applications. Amazon’s understanding comes from applying service-dominant rather than goods-dominant logic.

    According to goods-dominant logic, a vendor creates value, and gives it to a customer in exchange for money. According to service-dominant logic, the vendor and the customer co-create value together. The customer ‘hires’ the vendor’s products, much as it would hire workers, to accomplish a so-called “job to be done”. In the case of cloud, customers hire Elastic Things to help themselves build and operate resilient applications.

    Amazon is able to read their customers’ minds because they’re always thinking about their customers’ jobs-to-be-done. They constantly strive to better understand those jobs, and to help their customers better accomplish them. This approach explains why Amazon releases composable components, not just monolithic systems. Not every customer’s job-to-be-done is identical; AWS components let customers combine them in subtly different ways to suit subtly different requirements.

    Amazon differs from many of their competitors by virtue of always having been a service company. They’ve also benefited from being their own customer. Companies such as Oracle, Dell, HP, Microsoft, and VMware, on the other hand, have product pedigrees. I find it interesting that Google, considered by some to be the only truly viable competitor to Amazon on its own terms, also has always been a service company, and also has been its own cloud platform customer.

    Speculation abounds regarding what Amazon’s competitors need to do to catch up. I don’t think it’s just a matter of innovating faster, or spending more money, or applying any kind of straightforward business strategy pivot. I think Amazon’s competitors need to accomplish nothing less profound than a complete change of perspective. They need to pivot from goods-dominant logic to service-dominant logic. 

  4. Noise as Signal: Why Twitter is Good for People and Companies

    © 2013 Jeff Sussna, Ingineering.IT

    Twitter is an incredibly sloppy medium. If you follow more than a small number of people, there’s no way you can keep up with everyone’s tweets. You wake up in the morning and you’re 500 tweets behind. If you take the time to try to read them all, you’re 100 behind again by the time you finish. Much of what you read consists of repeated posts from people trying to make sure you don’t miss what they want to say. The 140-character limit makes it easy to misunderstand someone’s point. The way that conversations get mixed together makes it hard to follow a train of thought. Once you get included in a conversation, you’re stuck with it whether or not you want to participate. At the same time, you can’t prevent other people from jumping in. If you do want to have a conversation with someone, you have to deal with the fact that they might answer you in a minute, or an hour, or a day.

    How can this kind of communication medium possibly be a good thing? I’ve noticed some interesting things about myself as a Twitter user. I take things less personally. If someone doesn’t immediately respond to me, I don’t assume it’s because they think my comment is stupid or not worth their time. If someone does respond, and their response sounds snarky, I take the time to find out whether it really was, or whether it was innocent humor, or whether I misunderstood them entirely. If I post something I think is important, I do it repeatedly, though not enough to add too much noise into the system. I don’t get bent out of shape if people don’t respond to my first try, or retweet my post according to some preconceived expectations I have. Finally, I treat the 140-character limit as a challenge to express myself succinctly.

    We live in a time when high-fidelity information appears to be immediately and constantly available. We suffer from the illusion of near-perfect control. I rely on Google to point me quickly at exactly the right answer. I get frustrated when I can’t pause or rewind on-air radio like I can TiVo or Netflix. While I’m a big fan of Google Maps’ turn-by-turn directions feature, I’ve noticed that it’s making me lazy. I no longer bother reading the directions for myself; I just leave it to my phone to tell me what to do and when to do it. If for some reason I need to understand where I’m going, I’m helpless.

    It’s for this reason that I think Twitter is a beneficial medium. On one hand, I’m able to use it to communicate effectively. On the other hand, it doesn’t give me any illusion of control or information fidelity. Unlike other media, where fidelity and control fail rarely but catastrophically (Google leads me wildly astray, Netflix goes offline…), Twitter fails constantly but innocuously. Even an extended Twitter outage may annoy me, but doesn’t fundamentally compromise my ability to communicate. Twitter being offline isn’t all that distinguishable from the people with whom I’m conversing being out to dinner, or asleep. In fact, if I’m having trouble understanding or seeing eye to eye with someone, a multi-hour timeout might improve our communication. 

    Yahoo and other companies are making news for cancelling work-from-home policies. They believe face-to-face communications are more efficient. To some degree, that’s true. But anyone who works in a large organization can attest to the fact that face-to-face communications are far from perfect. They suffer from ‘noise’ introduced by politics, bureaucracy, interpersonal relationships, and simple human misunderstanding. 

    Those of us who have worked on business continuity plans know that physically colocated organizations also risk catastrophic communications failures due to earthquakes, floods, fires, and other environmental events. The proliferation of extreme weather makes it increasingly important for companies to be able to tolerate environmental outages. In Minnesota this winter, snow storms snarled the morning commute on eight out of ten Mondays in a row, leading to a regular litany of “Working From Home” emails. Reliance on physical proximity thus provides no guarantee of well-controlled, high-fidelity information exchange.

    We are living in a time of growing complexity and uncertainty. Ongoing social and technological shifts are creating opportunities for greater and greater successes as well as failures. Traditional strategies to maximize success and minimize failure through coordination and control no longer suffice. Instead, we need to learn to mix success and failure together in our minds and in our interactions with other people. We need to develop an underlying tolerance for loss of control and degradation of information fidelity across physical and digital dimensions. In my experience, Twitter fosters this capability via a kind of “eventual fidelity” that is unique among new communications media.

  5. What’s In a Name? DevOps…er…Continuous Delivery

    © 2013 Jeff Sussna, Ingineering.IT

    I am a strong proponent of DevOps. I like to think I’ve at least been doing proto-DevOps since 2003. I still, however, struggle with the DevOps name. I observe others doing so too. On the one hand, DevOps is moving into the enterprise IT mainstream. On the other hand, I question how many of its adopters could give a concise answer when asked to articulate what DevOps is, how to do it, or why it’s beneficial.

    In the process of contemplating this problem, I found myself comparing the word “DevOps” to the word “Agile”. Admittedly, there is still plenty of confusion about what constitutes proper Agile practice. Just yesterday I saw a tweet to the effect that “just because you hold standups doesn’t mean you practice Scrum”. Ultimately, though, the answer to the question “do you do Agile?” is straightforward. If you can rapidly, accurately, and efficiently respond to constantly changing business needs, then you do Agile. In other words, if you are agile, you do Agile. If by some strange miracle you’re accomplishing agility using waterfall, then you shouldn’t change a thing. The point is that the name describes the value. In the case of DevOps, the name describes the implementation, not the desired outcome.

    I found a clue to a possible resolution in the fact that I approach DevOps slightly differently from most people. I define it, not in terms of how IT structures itself, but rather in terms of what customers expect. Delivering software as a service makes operations an explicit part of the customer value proposition. Customers view functionality and operability as inseparable aspects of service. Imagine I tell you about a new restaurant I tried. When you ask me how it was, I say “the food was great but the service was terrible”. I’ve answered a single question with two statements. You’ll use both statements to decide whether to try the restaurant for yourself.

    People often talk about DevOps and Continuous Delivery in the same breath. By refining our understanding of “delivery”, I believe we can dispense with the need to differentiate between them. Continuous Delivery typically refers to continually delivering functionality through small, frequent application releases. What the customer expects, however, is the continual delivery of functionality + operability. We implicitly acknowledge this fact by integrating security, resilience, and performance-related user stories, spikes, and tests into the continuous application deployment pipeline.

    Cutting-edge operations practices enable more continuous delivery of operability. Auto-scaling, for example, makes application performance continuously consistent as user traffic ebbs and flows. This example may seem trite, until you compare it to pre-cloud IT environments that required manual acquisition, provisioning, and configuration of physical hardware to respond to increased traffic. Automated failover architectures, circuit-breaker RPC patterns, and other resilience engineering practices make availability more continuous, both by preventing failures and by enabling faster healing. Chaos Monkeys and blameless post-mortems are about discovering and learning from failures more quickly and proactively. The “monitor all the things” meme enables continuous visibility into functional and operational behavior. That visibility feeds user stories back into the front of the Continuous Delivery pipeline.
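    Of the resilience practices named above, the circuit-breaker pattern is perhaps the easiest to illustrate in a few lines. The Python below is a minimal, single-threaded sketch assuming simple consecutive-failure counting and a fixed cooldown; the class name, the thresholds, and the injectable clock are illustrative assumptions, not any particular library's API. After a run of failures the breaker opens and rejects calls immediately, sparing a struggling dependency further load; once the cooldown elapses it lets a trial call through and closes again if that call succeeds.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Minimal circuit breaker (illustrative sketch, not thread-safe).

    closed    -> calls pass through; consecutive failures are counted
    open      -> calls are rejected immediately until reset_timeout elapses
    half-open -> a trial call is allowed; success closes, failure reopens
    """

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock      # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            # Fail fast rather than adding load to a dependency that is down.
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the breaker.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count and closes the breaker.
        self.failures = 0
        self.opened_at = None
        return result


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def unavailable_service():
        raise ConnectionError("dependency down")

    for _ in range(2):
        try:
            breaker.call(unavailable_service)
        except ConnectionError:
            pass
    print(breaker.state)               # open
    now[0] += 10.0                     # cooldown elapses
    print(breaker.state)               # half-open
    print(breaker.call(lambda: "ok"))  # ok
    print(breaker.state)               # closed
```

    Failing fast while the breaker is open is what makes availability “more continuous” in the sense described above: callers get an immediate error they can handle, for example by serving cached data, instead of hanging on a dependency that has not yet healed.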

    Even the most basic and central DevOps tenet, dissolving cultural silos between development and operations, speaks to Continuous Delivery. What is the negative impact of dev/ops silos? They degrade quality, slow down progress, and generate waste. In other words, they increase delivery time and cost, while decreasing value. If I were asked “why should we dissolve development/operations silos?”, I would answer “because it lets us deliver software service (aka functionality + operability) more continuously”. In other words, DevOps IS Continuous Delivery.

    I have found Continuous Delivery to be easy to communicate. People across the spectrum, from InfoSec engineers to Agile product owners, seem to intuitively grasp both its value and more or less what it looks like. When I talk about using Continuous Delivery to address functional and non-functional requirements, no one bats an eye. Like Agile, the Continuous Delivery moniker describes its own value. How do you know if you’re doing Continuous Delivery? If you’re delivering functionality + operability continuously, then you’re doing it. ‘nuff said.

  6. Better DevOps Through Oreos or: How I Learned to Stop Worrying and Love Business Agility

    © 2013 Jeff Sussna, Ingineering.IT

    These days business agility is all the rage. Pundits tell enterprises they need to learn to go faster, faster, faster. Agile thinkers use airborne combat as a metaphor, advising companies to figure out how to “get inside your enemy’s OODA loop”.

    I worry we’re creating an environment of corporate anxiety and raised blood pressure. I find myself imagining workplaces that resemble stock-exchange trading pits. Cloud and mobile make it possible to work anywhere, any time. Will the need for speed create the expectation that we’ll work everywhere, all the time? After hearing the story of the Super Bowl Oreo ad, I feel a little less fearful.

    As the story goes, when the lights went out during this year’s Super Bowl, Oreo was able to conceive, produce, and release a clever ad on Twitter in only a few minutes. Understanding how important the Super Bowl was, they’d had the foresight to set up a war room for the game, and to fill the room with everyone, from executives on down, needed to make and act on decisions. When the lights went out at the game, they didn’t need to run around like chickens without heads, or fire off a bunch of emails, or make panicked cellphone calls to get people on a conference bridge, or anything like that. Instead, they were able to turn to the person next to them in the room and say “here’s an idea; what do you think?”

    Bringing together everyone they needed to be able to decide and execute made Oreo simultaneously more agile and more relaxed in a high-pressure situation. Unnecessary silos, procedures, and boundaries don’t just cause waste and impede responsiveness. They also cause frustration, anxiety, and unhappiness. Reducing separation has the potential to let us have our agility cake and eat it too. Thanks to Oreo, I feel a little less anxious myself. Of course, it’s been that way since I was a kid - nice to see some things never change :-).

  7. DevOps as Art

    © 2013 Jeff Sussna, Ingineering.IT

    I recently wrote about my belief that taking a dualistic view of ‘hard’ vs. ‘soft’ collegiate studies is a mistake. The other morning I had a Twitter exchange with John Allspaw about the subtlety, and consequently the difficulty of adoption, of the complex-systems view of failure. That afternoon I read James Urquhart’s excellent post about DevOps and Anti-Fragility. By the end of the day I was more convinced than ever that IT managers and engineers should study and practice the arts.

    James characterizes an anti-fragile system as one that can tolerate many small failures, while capitalizing on any successes that arise from continual change. He posits DevOps as a potentially anti-fragile methodology for enterprise IT. While I think this idea has a lot of promise, its adoption, in any kind of non-cargo-cult manner, may be challenging. The anti-fragile view turns everything we’ve learned about engineering and IT on its head. We need some way to shift our minds into a new mode of thinking. I believe exposing oneself to the artistic process may be the best way to accomplish that mental shift.

    Art is anti-fragile by definition. Artists constantly experiment. A painter will paint the same subject over and over again, each time a little differently. When a good artist arrives at a good result, their impulse is not to start repeating themselves, but to do something different. They push themselves, and each other, to produce “bad” work. Why? Because they understand that failure is the fastest route to truly new success. Picasso, Joni Mitchell, and Miles Davis are examples of great artists who never rested where they were. They all produced multiple different styles of brilliant work. They did so by forcing themselves to change, and by having the courage to release pieces that their previous supporters might hate.

    Artistic experimentation has a relatively low cost of failure. One can paint over a painting, edit a poem, rewrite a piece of music, or just throw it away and start over. The key component, though, is the attitude that innovation requires continual change, and that continual failure is the driving force in that process. Ultimately, even a successful piece becomes a failure. The artist’s world changes, and the piece no longer accurately reflects that world. Art is not fundamentally about production, but about creation. Creation, driven by continual destruction, is never-ending.

    Enterprises seem to be starting to accept the need to be innovative and adaptive. They are showing increasing interest in technologies such as cloud, and methodologies such as DevOps, that support continual change. To fully embrace anti-fragility, they need to embrace continual failure as a mode of practice. To do that, they need to think less like managers or engineers and more like artists. 

  8. Why Marc Andreessen Is Wrong

    © 2013 Jeff Sussna, Ingineering.IT

    According to Marc Andreessen, college education comes in two distinct varieties: hard (math-based) and soft (humanities-based). Those who pursue the ‘hard’ path will position themselves for rewarding careers, while those who pursue the ‘soft’ path ‘likely will end up working in shoe stores’. I find this viewpoint disappointing and counterproductive. It smacks of antiquated, dualistic thinking. It also misses an opportunity to address important challenges facing IT.

    I believe the dichotomy between hard and soft studies is false. In order to build systems that are useful to people, or operable by people, we need to understand people and societies. In order to manage complexity, we need to expand our minds to think in terms of systems, gestalts, relationships, and patterns. In order to improve quality and agility, we need to strive for elegance, beauty, and simplicity. These modes of understanding are the province of the soft studies. Instead of splitting out the two kinds of curricula, and denigrating the value of one of them, we should integrate them more deeply. 

    Businesses are asking IT to become more engaged and relevant. Agile methodologies expect developers and sysadmins to work with, listen to, and understand customers. New requirements approaches stress developing “shared language”. Cloud computing is dissolving boundaries between IT and users altogether. 

    Literature, art, sociology, and history concern themselves with ‘messy’ individual and collective behavior. They challenge students to think about problems without “right” answers. They explore motivations like hope, fear, trust/distrust, and habit. These motivations drive us all. Among other things, they drive responses to new systems and processes, and organizational change. If IT wants to evolve from the “department of no” to a source of innovation and leadership, it needs to develop emotional and social fluency. 

    Sidney Dekker’s “Drift Into Failure” is a book about engineering and systems safety. It discusses topics such as airplane hardware maintenance schedules. The book’s overarching theme, however, is the need to change our understanding of the construction of reality. It questions Cartesian/Newtonian beliefs about the separateness of subjective and objective truth. Dekker introduces his thesis with references to Michel Foucault and the post-structuralist notion of “the text”. For the post-structuralists, that text was often a literary work; in other words, like post-structuralism itself, something you’d encounter in an English or Critical Theory class, not an Engineering class.

    For the humanities, systems thinking is second nature. Having one’s perspective lifted “up and out” is standard fare. Readers of “Madame Bovary” confront questions about how the protagonist’s behavior did or didn’t make sense within her social and historical context. They also consider the author’s choice of topic and narrative structure within his historical context. Viewers of Cubist paintings face the challenge of seeing a subject from multiple time/space perspectives at once.

    Finally, software and system quality and agility correlate strongly with readability, elegance, and simplicity. Coding is a literary activity, done by authors for readers. Maintenance starts with reading. GitHub builds in support for displaying a README on the front page of every repo. I have long believed that software developers should study great writers, especially Hemingway.

    Complicated system architectures are harder to operate or change. The ability to grasp unnecessary complication involves the ability to see wholes and the relationships within them, and to recognize beauty as a form of “simple completeness.” I think system architects should spend time looking at paintings by Joan Miro and sculptures by Jean Arp and Alexander Calder. 

    More and more, IT and ordinary reality are blending, both in terms of how we do things and what we do. We need to engage our full capabilities in thinking about, designing, building, and operating our society’s physical-digital substrate. Not only is the disparagement of so-called ‘soft’ studies unhelpful; it reflects an out-of-date view of knowledge and reality. If we want to build systems that are usable as well as safe, we need to update our reading of “the text” that is education. 

  9. A Manifesto for 21st-Century IT

    © 2013 Jeff Sussna, Ingineering.IT

    Numerous blogs have described how Barack Obama’s IT team dramatically outperformed Romney’s team during the 2012 Presidential election. Obama’s team delivered greater quality, better functionality, and superior results for the campaign at significantly lower cost. They did it using cutting-edge tools and techniques such as public cloud computing, DevOps, gameday testing, and open source.

    The Obama IT team’s methodology is reminiscent of those used by 21st-century digital properties like Netflix and Facebook. Detractors have dismissed success stories from these companies as only applying to “non-mission-critical entertainment” applications. It would be hard, though, to find a more mission-critical situation than a presidential election. This year’s campaign invested a billion dollars to decide which single person would have unparalleled influence on the state of the U.S. and the world for the next four years. The election also provided a rare public opportunity to observe a bake-off between current-generation and next-generation approaches to similar IT problems. For these reasons, I think we may look back at the 2012 election as the seismic moment where next-generation IT moved out of its niche and proved itself in a major way.

    So what are the essential characteristics of this new kind of IT? Whatever we call it, whether “21st-Century”, “Post-Industrial”, or “Adaptive”, what differentiates it from current IT practice as epitomized by the Romney campaign? I find myself answering this question in terms of not-entirely-binary value choices, along the lines of the Agile Manifesto. I offer a strawman set of statements, and invite others to help refine it:

    1. We value resiliency over stability: since both external environments and internal structures for accomplishing things are complex and ever-shifting, failure is “always around the corner”. It should be treated as just another expected event rather than as an exception.

    2. We value minimizing Mean Time To Repair (MTTR) over maximizing Mean Time Between Failures (MTBF): the inevitability of failure makes trying to maximize MTBF a futile exercise. Instead the focus should be on maximizing one’s ability to repair failures. The dynamic nature of the market means that even working applications quickly fail to match shifting requirements. Not only operations but also development becomes an exercise in minimizing MTTR.

    3. We value elasticity over planning: static planning produces solutions that are brittle when forced to change. Elasticity treats unpredictability as the plan.

    4. We value lightweight tools over comprehensive solutions: the more global, comprehensive, and tightly structured things are, the harder they are to adapt, change, or repair.

    5. We value loose coupling over coordination: the more complicated a situation is, the more overhead is required to coordinate it, and the more fragile a coordination solution becomes. Adaptability favors figuring out how to enable solution components to move independently from each other, while coordinating as needed.

    6. We value continuous innovation over best practices: the traditional approach to defining, encoding, and propagating best practices can never keep up with constant change. Instead, innovation itself should be a continuous practice.

    7. We value diversity over monoculture: forced adherence to single sets of tools and practices reduces opportunities for learning, change, and innovation, while simultaneously slowing selection, implementation, and propagation of those tools and practices.

    8. We value open source communities over hierarchical organizations: the open source model offers a mechanism for reconciling apparently conflicting needs for coherency and flexibility.

    9. We value unity of purpose over separation and specialization: providing service to customers is the overriding purpose for all employees, regardless of their role or location in an org chart. New functionality, the quality of that functionality, its operability, and its communication to customers are all intrinsically linked, and so should be the people, processes, and tools that deliver them.
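    Item 2 can be made concrete with the standard steady-state availability relation, A = MTBF / (MTBF + MTTR). The minimal Python sketch below compares two hypothetical systems. The figures are illustrative assumptions, not measurements. One system fails rarely but takes hours to repair. The other fails ten times as often but recovers in minutes.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the long-run fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system A: fails every 1,000 hours on average, takes 10 hours to repair.
rare_but_slow = availability(mtbf_hours=1000, mttr_hours=10)

# Hypothetical system B: fails ten times as often, but repairs in 6 minutes.
frequent_but_fast = availability(mtbf_hours=100, mttr_hours=0.1)

if __name__ == "__main__":
    print(f"A: {rare_but_slow:.5f}")      # A: 0.99010
    print(f"B: {frequent_but_fast:.5f}")  # B: 0.99900
```

    Under these assumed numbers, system B fails more often but repairs quickly. It spends roughly a tenth as much time down as system A, which is the trade-off item 2 argues for.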

  10. What Instagram Forgot: It’s Service, Not Software

    © 2012 Jeff Sussna, Ingineering.IT

    Yesterday Instagram introduced new Terms of Service that quickly set the Twitterverse on fire. People felt that the company was adopting a manipulative, disrespectful attitude towards the vendor-customer relationship. Some questioned whether the announcement was Instagram’s “suicide note”. National Geographic threatened to close their account. What happened? Instagram forgot that they’re in the service business, not the software business.

    A service is something experienced over time, through multiple touchpoints. Customers judge services by the entirety of their experience across touchpoints. You don’t just judge a restaurant by the quality of the food. You also judge it by things like the ambience of the room, the courteousness of the staff, and the convenience of the location. A bad interaction with the maître d’ may overshadow a good interaction with the meal. As IT more and more becomes, not just a business enabler, but “how we do business”, and as human interaction more and more becomes Internet-mediated, we may start to believe that software is the entirety of the service experience. Instagram reminded us that it’s still not true.

    Amazon Web Services, for example, provides a highly automated infrastructure service. AWS goes out of its way to minimize direct human interaction. They have not, however, automated their Evangelism program. Jeff Barr and his human-authored blog posts are a key part of how AWS communicates with its customers. AWS’s re:Invent user conference wasn’t automated. Nor was Werner Vogels’s keynote speech. The decision to hand out $5 gift certificates instead of free Kindles at the conference wasn’t automated. The writing of thorough, transparent outage post-mortems isn’t automated. These human touchpoints play important roles in contributing to customers’ understanding of, ability to use, and attitude towards AWS.

    Current software trends such as Agile, DevOps, and Continuous Delivery have integrated Quality Assurance more deeply into the software delivery process. QA has gained new respect as being critical to delivering software-mediated business value. QA’s job is to represent the user. In a service business, it must represent users’ entire experience, not just their interaction with the software at the core of that experience. Clearly Instagram failed to properly QA its new Terms of Service.

    To their credit, Instagram listened to the harsh feedback. They quickly issued a mea culpa, and didn’t try to shift blame away from themselves. Their apology was well written, and created a positive service moment. It remains to be seen whether that moment can stem the tide of account cancellations. Perhaps, if they’d written a hypothetical mea culpa as part of their internal QA process, they wouldn’t have had to write a real one. 

    The need for user-centered service design goes beyond QA. Everyone involved in delivering software services, whether development, operations, marketing, or legal, needs to understand the entire customer journey. They need to remind themselves that all of their customer interactions are part of the service, and that service is what they are delivering, not just software. 

  11. What Business is IT In, Anyway?

    © 2012 Jeff Sussna, Ingineering.IT

    In an otherwise excellent post, Danalynne Wheeler of enStratus quoted Bernard Golden as stating that “IT is in the business of infrastructure management”. I couldn’t disagree more strongly. As a service provider to corporations, IT should be in the business of helping employees accomplish their “jobs to be done”. Managing infrastructure may be a necessary activity towards that end, but it is not “the business which IT is in”.

    To properly understand IT’s true purpose, we should start by questioning the very title “Information Technology”. We are long past the days where IT’s primary purpose was the delivery of “Management Information”. Now, companies can’t even really exist without digital technology. People can’t answer the phone, send messages to each other, produce documents, listen to the marketplace, or deliver products without using PCs, laptops, mobile devices, email, VoIP, the web, and social media. IT is responsible for providing the “digital substrate” that underlies all corporate activity.

    Netflix uses the term “Employee Technology” instead of “Information Technology”. This nomenclature is a step in the right direction. It still, though, implies separation between internally and externally facing technologies. What is the “job to be done” of every single Netflix employee? To maximize shareholder value by making it convenient and pleasurable for consumers to watch movies and TV shows. That mission applies whether one is developing the next Netflix client player for the TiVo platform, or preparing the quarterly SEC filing.

    Furthermore, in a service economy, vendors and customers co-create value together. A complete product value stream incorporates, not just delivery, but also usage. IT must concern itself with customer jobs-to-be-done such as getting support, providing feedback, integrating services into larger processes, and so on. That’s not to mention the purpose that brought them to the vendor in the first place: in Netflix’s case, the desire to watch a movie. In order to make satisfying customer experiences possible, IT (along with the rest of the corporation) needs to address the entire spectrum of employee and user activities that contribute to the ongoing relationship between the customer and the service provider.

    What about cloud computing and the Consumerization of IT? Do they render centralized corporate IT obsolete? Should the IT function wither and die altogether, or dissolve into business units? If it turns out to be feasible for non-technical people to manage cloud services unchaperoned, then IT can and should disappear. In that case, the digital enablement onus will fall on the cloud vendor. If not, then IT’s role will evolve into one of guidance and support. Instead of creating digital platforms, IT will help business units assemble and manage them. The need to understand co-creative value streams will remain. IT can play an important role by helping business units understand and implement that perspective.

    In conclusion, I must confess that I’ve failed to come up with a better name than “Information Technology”. “Digital Enablement” is vague, and sounds a little stuffy. I welcome any better ideas. Whatever we call IT, though, and whoever provides it, I believe its true purpose is nothing less than creating and supporting end-to-end, top-to-bottom, digital platforms that enable corporations to co-create service value through ongoing, mutual relationships with their customers.

  12. Why ‘Production Freeze’ Should Be a Four-Letter Word

    © 2012 Jeff Sussna, Ingineering.IT

    It has long been standard practice for IT organizations to enact ‘production freezes’ during important business periods. Whether it’s tax season, HR open enrollment, the days leading up to the election, or any other business-specific “critical time”, this practice makes intuitive sense. If the system is stable, you don’t want to risk business operations by destabilizing it. Change introduces uncertainty, which generates real as well as potential instability.

    This approach, however, suffers from two important misunderstandings. First, “the system is stable” really means “the system is stable, as far as we can tell, at this particular moment”. By definition, business-critical events stress systems. Freezing production means removing your ability to respond to unforeseen instability.

    Second, production freezes presume that the system meets user expectations. Just as with stability, functional validity is a matter of “as far as we can tell, at this particular moment”. If you discover you were wrong about customer needs, you’ve removed your ability to respond. Neither can you respond if you observe customer needs changing during the freeze period. 

    Freezing production during business-critical times prevents you from repairing sources of customer dissatisfaction when you most need to be able to do so. Due to the timing, the intensity of that dissatisfaction will be at its greatest. Not only will you not be able to soothe their frustration quickly; by definition you won’t be able to help them until AFTER their most critical need subsides.

    Typically, organizations address this conundrum with an intentionally draconian exception-review process. Only the most critical changes are allowed to thaw the freeze. The changes that do pass muster undergo an extraordinarily rigorous review and planning process. Unfortunately, this well-meaning (and again, intuitively correct) solution suffers from a fundamental misconception.

    Why do you over-review changes during freeze periods? Because you lack confidence in your normal process. Why do you restrict changes to those that are the most critical? Because you lack confidence in your normal process. In other words, you have instituted an approach that has you making the most important changes, at the most important times, with the least possible confidence that they will work.

    Imagine instead a scenario where you do away with production freezes. You use exactly the same process for making changes during business-critical times as you do during the off-season. This scenario has two implications: first, your ordinary change process must be robust enough to work at the most important times, with the most important changes. Second, your critical-time change process has been tested over and over again, throughout the year. As a result, you have maximum confidence in it.

    Fear leads us to do the riskiest things as seldom as possible. If, however, the riskiest things are also the most important ones, we need to engineer the risk, and thus the fear, out of them. The best way to do that is to reverse our logic and do them as often as possible. If we can make election-day IT change risk-free, we can shift our attention from unintentionally minimizing customer satisfaction out of fear, to confidently and continuously maximizing it.

  13. Servers? Yes. CPU-Hours? No.

    © 2012 Jeff Sussna, Ingineering.IT

    I had a spirited debate today with @jamesurquhart and @reillyusa about whether or not http://t.co/GupYbFKJ was a well-written blog post. @krishnan, as is his wont, made #popcorn and watched. We argued over whether the use of the word “serverless” was misleading. We didn’t, however, argue about the main thesis of the post: that development focus is shifting away from infrastructure altogether. 

    Currently, cloud infrastructure customers consume OS/CPU/memory/disk hours. In order to consume IaaS, developers must overcome a significant conceptual impedance mismatch between the server and the programming domain. PaaS dispenses with pseudo-physical abstractions such as “server” and “load balancer”, but still leaves the developer to drop their application into a monolithic tabula rasa (“the container”). 

    In the future, developers will consume things like rows, messages, queues, notifications, alerts, threads, tasks, buckets, and bytes. These objects will map more directly to the conceptual level at which developers design and code. The consumption model will map more directly to the actual usage of (and thus payment for) those objects. Note that consumption implies active creation and manipulation, not just passive reception.

    I have long argued that AWS is in a space of its own relative to other IaaS providers. With the possible exception of GCE, I still stand by this claim. With services like SWF, SES, SNS, SQS, CloudWatch, and S3, AWS is already training developers to consume programming objects and not just OS/CPU/memory/disk hours.

  14. Cloud and the Customer Journey

    © 2012 Jeff Sussna, Ingineering.IT

    If cloud computing has any one unifying characteristic, it’s the delivery of IT capabilities in service form. Thus the ‘S’ in IaaS, PaaS, and SaaS. Cloud vendors need to recognize they are in the service business. To provide customer satisfaction, they need to understand the nature of service.

    Services are experienced over time through multiple touchpoints. Customers judge services by the entirety of their experience. Cloud vendors must account for the entire customer journey and all its touchpoints.
     
    The cloud customer journey involves more than just daily utilization of a given commodity, whether it be virtual servers (IaaS), application containers (PaaS), or end-user functionality (SaaS). The complete cloud customer journey includes:

    • Discovery and Understanding (what is this service? how do I use/pay for it?)
    • Adoption and Migration
    • Administration
    • Usage (the so-called daily commodity utilization)
    • Help and Feedback
    • Assimilation (understanding, adopting, and integrating changes)
    • Integration (with the customers’ processes as well as their toolsets)
    • Departure
    • Communications (changes, outages, breaches)

    That journey takes place across multiple touchpoints, including:

    • Web
    • APIs
    • Mobile
    • Email
    • Phone
    • RSS feeds
    • SMS
    • Social Media
    • Virtual and actual face-to-face contact (Webex, trade shows, etc.)

    As a service provider, the cloud vendor needs to think in terms of the customer journey, not just the endpoint or midpoint of that journey. They need to think this way during design, development, testing, and technical and business operations. Even though the vendor may largely deliver customer experience through software, they are in the service business, not the software business. The difference is subtle but important. The new user experience, the change assimilation experience, and the outage communication experience, for example, all need to be first-class components of the design process and the test plan. To make things more interesting, as the utility commodity functionality evolves, so must the rest of the touchpoints.

    Good service providers view every aspect of the customer journey as an opportunity to reinforce their brand. Zappos’s response to a major security breach was a prime example. Instead of eroding their brand, it ended up enhancing it. Even the point of departure is a chance to create customer stickiness. An easy “delete account” process makes me feel better about a service, and increases the likelihood I’ll come back.

    Ultimately, the service provider is part of a larger customer journey that defines the customer’s overall “Job to Be Done”. The customer needs to be able to interact with the service provider in the context of their larger goals. For an IT department, evaluating and adopting new services is a non-trivial part of their ongoing work, as is maintaining resiliency in the face of service provider outages. Successful cloud vendors will view themselves through the lens of their customers’ journeys and desired outcomes. They will take this view continuously in all aspects of development and operations.

  15. Ops Ignorance Isn’t Dev Bliss

    © 2012 Jeff Sussna, Ingineering.IT

    Yesterday I vented on Twitter about developers changing software configurations willy-nilly. JP Morgenthal took it as proof that developers shouldn’t be allowed to manage production environments. Adrian Cockcroft saw it the other way ‘round. I’ll leave that particular controversy for another post. What concerned me yesterday was the notion that so-called “lower” environments should be managed differently from production.

    Undisciplined change causes, at best, uncertainty and, at worst, real problems. My code doesn’t work anymore because you upgraded to Java 7. Or, perhaps even worse, I don’t know whether my code works anymore because you upgraded to Java 7.

    Undisciplined change also slows things down. When problems do crop up, time gets wasted debugging, until someone remembers, “oh yeah, maybe I forgot to upgrade my machine to Java 7”. Even without problems, one-off changes set in motion chains of “you need to upgrade your machine to Java 7” emails. Each developer, tester, and ops person gets a disruption task tossed onto their cognitive queue.

    Back in the day, developers rebelled against source code control as impinging on their creative freedom. Fortunately, those days are well behind us. Why then do we still want to manually install the latest SDK version on our local machines, one at a time? Why do we want to waste time dealing with version incompatibilities between my machine and yours?

    I watch developers create multi-day scrum stories for themselves to install system software on their local machines, and I want to cry. With configuration automation tools like Puppet, Chef, CFEngine, and Vagrant available, there really isn’t any reason anymore not to automate everywhere, and share configurations everywhere. If the prod version of your app is running against Apache version 1.2.3, then dev and test (and every developer’s machine) should be running Apache 1.2.3. If you want to migrate to 1.2.4, you should do it consistently and automatically wherever you want/need to. All dev environments can be at 1.2.5, while all test environments are at 1.2.4. Need a developer to regress something against 1.2.4? It should just be a flip of a switch. Need to spin up another test environment running 1.2.x, or change the 1.2.x config and globally propagate the change? Another flip of the switch.
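    The “flip of a switch” idea can be sketched with two pieces: a single shared definition of the desired Apache version per environment, and a drift check against what each machine actually runs. The Python below is a hypothetical illustration of the desired-state model that tools like Puppet, Chef, and CFEngine are built around. It does not use any of their APIs. The machine names and the helper functions `with_version` and `drift` are invented for the example.

```python
from typing import Dict, Tuple

# Hypothetical desired state: one shared definition of which Apache
# version each environment should run (versions taken from the text).
DESIRED: Dict[str, str] = {"prod": "1.2.3", "test": "1.2.4", "dev": "1.2.5"}

# Hypothetical fleet: machine name -> (environment, installed Apache version).
FLEET: Dict[str, Tuple[str, str]] = {
    "alice-laptop": ("dev", "1.2.5"),
    "bob-laptop": ("dev", "1.2.3"),  # a one-off manual install that drifted
    "test-01": ("test", "1.2.4"),
    "prod-01": ("prod", "1.2.3"),
}


def with_version(desired: Dict[str, str], env: str, version: str) -> Dict[str, str]:
    """Return a copy of `desired` with `env` moved to `version`.

    Editing this one definition is the "flip of a switch"; a configuration
    tool would then converge every machine in `env` to match it.
    """
    updated = dict(desired)
    updated[env] = version
    return updated


def drift(desired: Dict[str, str],
          fleet: Dict[str, Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Map each out-of-line machine to (expected version, installed version)."""
    return {
        name: (desired[env], installed)
        for name, (env, installed) in fleet.items()
        if installed != desired[env]
    }


if __name__ == "__main__":
    print(drift(DESIRED, FLEET))  # {'bob-laptop': ('1.2.5', '1.2.3')}
    # Move every dev environment to 1.2.4 to regress something against it.
    dev_on_124 = with_version(DESIRED, "dev", "1.2.4")
    print(dev_on_124["dev"])  # 1.2.4
```

    After the flip, running the same drift check against the new definition lists every dev machine that still needs to converge. The change is defined once and applied everywhere, instead of being announced by a chain of “you need to upgrade” emails.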

    At least for now I won’t render my opinion about whether developers should be allowed to touch production environments. I will, however, state my belief that good ops practices should be applied to all environments. Any unnecessary differences between environments, or between how they are managed, should be exorcised.

    I will also state my belief that developers and testers need to understand, appreciate, and engage in those same good ops practices. They should view undisciplined, manual change with the same distaste as do forward-thinking ops engineers. After all, it makes all their lives harder, and impedes their ability to deliver maximum customer value with minimum friction.