  1. Why Vagrant is the Best DevOps Tool Ever

    © 2014 Jeff Sussna, Ingineering.IT

    I am a strong proponent of the viewpoint that DevOps is first and foremost about culture. When clients ask me which big, expensive enterprise tool they should use to implement DevOps, I tell them they shouldn’t buy anything until they fully understand why they want it. I’ve previously posted my belief that empathy is the true essence of DevOps.

    I do, however, believe that tools can sometimes help develop culture by influencing behavior. To that end, there is one tool I tell every client they should adopt from the start. That tool is Vagrant, created and maintained by Mitchell Hashimoto.

    Vagrant makes it possible to create desktop clouds by scripting the configuration and control of multi-VM systems. Imagine a multi-tier web application consisting of a web server, a database, and an email server. With Vagrant you can specify and package the entire description of that application: its tiers, their operating systems, and all the system and application configuration actions needed to provision the entire software stack. You can then share that package with your whole team in a controlled manner. Any configuration changes can be managed and disseminated consistently via a version control system.
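    As a concrete illustration, here is a minimal Vagrantfile sketch for that three-tier example. This is only a sketch under assumptions: the box name, hostnames, IP addresses, and provisioning script paths are placeholders invented for illustration, not a recommended layout.

    # Minimal multi-VM Vagrantfile sketch. The box name, hostnames, IPs,
    # and provisioning script paths are illustrative placeholders.
    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/trusty64"

      config.vm.define "web" do |web|
        web.vm.hostname = "web.local"
        web.vm.network "private_network", ip: "192.168.50.10"
        web.vm.provision "shell", path: "provision/web.sh"
      end

      config.vm.define "db" do |db|
        db.vm.hostname = "db.local"
        db.vm.network "private_network", ip: "192.168.50.11"
        db.vm.provision "shell", path: "provision/db.sh"
      end

      config.vm.define "mail" do |mail|
        mail.vm.hostname = "mail.local"
        mail.vm.network "private_network", ip: "192.168.50.12"
        mail.vm.provision "shell", path: "provision/mail.sh"
      end
    end

    Running ‘vagrant up’ brings up all three machines at once, and committing this file and its provisioning scripts to version control is what lets configuration changes be managed and disseminated consistently across the team.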

    Vagrant makes it easy for everyone involved in delivering a software service to think about it in the same way. I tell my clients to create Vagrant boxes for their applications, and put them on everyone’s desktop. By everyone I mean developers, testers, admins, and even product owners. There is no reason product owners should depend on centralized test servers any more than anyone else. They should be able to do acceptance testing right on their own desktops, at their own convenience. Vagrant’s automation capabilities let them do it this way, with confidence they’re testing the same configuration that will run in production.

    Vagrant dissolves differences between perspectives across the Dev/Ops continuum. It makes it possible to treat every environment similarly, from the developer’s desktop all the way to production. It treats layers within the software stack similarly, from operating system patches to application configuration files. It presents everyone with the same view of a system, not just “infrastructure” or “application” or “database” or “app server” or “user behavior”. Most importantly, it treats every member of the software service team similarly, giving testers and admins alike the same environments and tools.

    Empathy involves the ability to see things from others’ perspectives. Vagrant puts complete systems on everyone’s machines. It makes those systems part of their daily lives. Team members can run the entire software stack, architecturally identical to production, on their laptop in a coffee shop. No longer is the full system architecture something that only lives in the cold, humming data center, on the other side of the man trap. In this way, Vagrant helps cross-functional software service teams start down the path towards mutual empathy, and thus towards DevOps culture.

    The title of this post is intended to be tongue-in-cheek. It’s not my intention to set up some kind of competition with other DevOps tools. As people say when asked to choose between Chef, Puppet, CFEngine, and other configuration automation tools, “using any of them is better than using none.” The point is that, when we struggle to understand how to foster something as intangible as “DevOps culture”, Vagrant can be an excellent starting point.

  2. Why We Should Design Software Systems Like We Design Buildings

    © 2014 Jeff Sussna, Ingineering.IT

    Yesterday I stumbled upon an online debate about whether we should build software like we build buildings. I would like to pose a slightly different question: should we design software systems the way we design buildings? To answer my own question: of course we should!

    We consider Apple under the leadership of Steve Jobs to be legendary and unique in the history of the computer industry. Jobs led the creation of systems that unified functionality, engineering, and esthetics. The results satisfied their users on multiple levels. With Jobs’ passing, it appears at least for the moment that Apple has lost its design mojo. No one else seems fully able to take up the mantle.

    By comparison, architects have been designing satisfying buildings for hundreds of years. Just within Spain, for example, you can see magnificent buildings whose design spans several centuries. Architects have always treated functionality, engineering, and esthetics inseparably. The idea that one would need to invent DevOps would seem strange to them. 

    We use the word ‘architecture’ on multiple levels in the context of software systems. None of our uses of the term are as rich as its meaning in the context of buildings. Building designers strive to satisfy human needs by shaping physical space. I believe that software architects need to redefine our jobs in a similar fashion. Our mission should be nothing less than striving to satisfy human needs by shaping digital space. We must pursue this mission inseparably across functional, operational, and esthetic dimensions. Hopefully, by doing so we can usher in an age that is replete with satisfying software systems, and where Steve Jobs is just one among many instead of a lonely beacon in the dark.

  3. Beginner’s Mind and Design

    From The Eames Studio’s Inspiring History and Unknown Dark Side:

    Charles and Ray’s biggest contribution was conceptual: They showed that “design” could be an art of manipulating ideas, not just materials. They were master communicators, not fabricators. “We don’t make art; we solve problems” was a favorite maxim of Charles, which still sounds perfectly contemporary in the 21st century, 50 years after he said it. “Design thinking” and research strategies, de rigueur now thanks to firms like IDEO, owe a debt to the Eameses philosophy of what one interviewee in the film calls “selling ignorance.” IBM and Westinghouse didn’t hire the Eames Office for its expertise, which would necessarily be limited; quite the opposite. They hired the Eameses for their process of discovery, of admitting that they knew little, and taking that “beginner’s mind” approach to finding design solutions.

  4. If You Want More Innovation, You Need More Art

    © 2014 Jeff Sussna, Ingineering.IT

    Last night I went to see a Trisha Brown Dance Company retrospective at the Walker art museum in Minneapolis. Trisha Brown is a preeminent postmodern dance composer. The works in the show incorporated contributions by three of my artistic heroes: Laurie Anderson, John Cage, and Robert Rauschenberg.

    The show was one of the most exhilarating, creative, original performances I’ve ever seen. Brown and her collaborators have stretched our understanding of how human beings can move, both alone and in relation to each other; what they can wear; how space can be arranged and decorated; and how it all can relate to sound. Afterwards my brain felt like it had been stretched and spun and thrown up in the air like pizza dough, then doused in a concentrated caffeine bath.

    We live in the age of competition through innovation. We’ve begun spending more and more time and energy trying to figure out how to make our organizations more innovative. We tend to do it by applying external mechanisms: everything from Agile to Lean Startup to Design Thinking to innovation consultants to Chief Innovation Officers. These mechanisms can all be helpful. By themselves, however, they are likely doomed to fail. Mechanisms can only guide people in expressing their internal abilities.

    We often dismiss art as not being “about anything” or having any “practical” purpose. But you could also say that art is creative problem solving stripped to its barest essence. After last night’s dance performance we couldn’t stop talking about which choreography choices worked, what they meant, what could have been done differently. These are exactly the kinds of questions we want people asking about product strategy or corporate structures and procedures. We need people to be able to stretch and spin and throw their brains up in the air in the workplace. We need them to feel dosed with the caffeine of desire to understand and experiment and learn and solve.

    In this age of innovation, the idea of reducing exposure to or support for the arts seems highly counterproductive. We want people to go to more dance performances and read more novels. We want them to talk about what they read or saw or heard around the water cooler on Monday morning. Just like an athlete improving their strength and stamina through training, people need to train their minds in order to increase their capacity to stretch and reach for new ideas and solutions. Without that training, all our well-intentioned innovation methodologies will just be expensive, frustrating exercises in futility.

  5. Puzzlement-as-a-Service

    © 2014 Jeff Sussna, Ingineering.IT

    I’ve been observing the latest PaaS debate with some interest and more frustration. The relationship between PaaS and IaaS is being questioned: is PaaS becoming just an attribute of IaaS? Which one is more central? Does it make sense for PaaS vendors to continue to exist as independent companies?

    With all due respect to the participants, all of whom I hold in great esteem, I fear that the debate may be missing the point. I’ve long been a fan of PaaS on general principle. At this point, though, it’s hard to tell what it really is. I haven’t seen enough in the way of concrete, detailed, grounded description or analysis. Claims for its benefits are highly unicornish: “PaaS will liberate developers from the thrall of IT”. These claims often dismiss, and risk alienating, the ops side of things. To me this dismissal and alienation is very ironic. Having worked with enterprise IT teams that supported multiple applications on a single set of infrastructure, I believe PaaS has as much potential benefit for ops as it does for development. Unfortunately, I feel a bit like I’m talking to myself. The current discourse doesn’t help me understand whether I should advise my clients to run as fast as they can towards PaaS, or away from it.

    According to its vendors, the PaaS market is maturing. The information about PaaS needs to mature as well. Imagine, if you will, a hard-nosed, skeptical IT ops architect conducting a PaaS evaluation. Now imagine that this architect issues a concluding report along the lines of “here’s why I tried to convince myself we should avoid PaaS, and here’s how I convinced myself we should adopt it instead.” The report would include specific details about which features facilitated which beneficial outcomes, and how they did so. 

    Such a report would be incredibly useful to everyone analyzing, selling, supporting, or considering adopting PaaS. I want to challenge one or more of the PaaS vendors to write such a report, or at least to use it as a conceptual model for their marketing material. I think it would be a great step towards helping PaaS cross the chasm. If its benefits really are as great as those being touted, then we really do want everyone to use it.

  6. Empathy: The Essence of DevOps

    © 2014 Jeff Sussna, Ingineering.IT

    I first encountered empathy as an explicit design principle in the context of design thinking. You can’t design anything truly useful unless you understand the people for whom you’re designing. Customer satisfaction is more than just an intellectual evaluation. Understanding users requires understanding not just their thoughts, but also their emotional and physical needs.

    I was surprised to encounter empathy again in the context of cybernetics. This rediscovery happened thanks to a Twitter exchange with @seungchan. Cybernetics tells us that, in order for any one or any thing to function, it must have a relationship with other people and/or things. That relationship takes place through the exchange of information, in the form of a conversation. The thermostat converses with the air in the room. The brand converses with the customer. The designer converses with the developer. The developer converses with the operations engineer. Information exchange requires (and can contribute to) mutual understanding; i.e., empathy.

    I had another Twitter exchange, this one with @krishnan, on the question of whether Platform-as-a-Service needs DevOps. I think the question actually misses the point. Software-as-service offers customers inseparable functionality and operability. Development delivers functionality and experience; operations ensures the operational integrity of that experience. At some point, the service will inevitably break. Uncertainty and failure are part of the nature of software-as-service. They are, to use @seungchan’s term, part of its “materiality”, just as flexibility or brittleness are part of the materiality of the wood or metal or plexiglass used to make a piece of furniture.

    When a service does break, someone has to figure out where and why it broke, and how to fix it. Did the application code cause the failure? The PaaS? An interaction between them? Or something at a layer below them both? Regardless of how many abstraction layers exist, it’s still necessary both to make things and to run them. It doesn’t matter whether or not different people, or teams, or even companies take responsibility for the quality of the making and the operating. In order for a software service to succeed, both have to happen, in a unified and coherent way.

    The confluence of these two Twitter exchanges led me to reflect on the true essence of DevOps. It occurred to me that it’s not about making developers and sysadmins report to the same VP. It’s not about automating all your configuration procedures. It’s not about standing up a Jenkins server, or running your applications in the cloud, or releasing your code on GitHub. It’s not even about letting your developers deploy their code to a PaaS. The true essence of DevOps is empathy.

    We say that, at its core, DevOps is about culture. We advise IT organizations to colocate Dev and Ops teams, to have them participate in the same standups, go out to lunch together, and work cheek by jowl. Why? Because it creates an environment that encourages empathy. Empathy allows ops engineers to appreciate the importance of being able to push code quickly and frequently, without a fuss. It allows developers to appreciate the problems caused by writing code that’s fat, or slow, or insecure. Empathy allows software makers and operators to help each other deliver the best possible functionality+operability on behalf of their customers.

    Dev and Ops need to empathize with each other (and with Design and Marketing) because they’re cooperating agents within a larger software-as-service system. More importantly, they all need to empathize, not just with each other, but also with users. Service is defined by co-creation of value. Only when a customer successfully uses a service to satisfy their own goals does its value become fully manifest. Service therefore requires an ongoing conversation between customer and provider. To succeed, that conversation requires empathy.

  7. Designing Holographically (Learning to See)

    © 2013 Jeff Sussna, Ingineering.IT

    In response to my post on Lean Architecture and Holographic Design, the esteemed James Coplien asked me to “share concrete practices that work”. I will try my best to respond to his request, though I’m not sure he’ll find my answer satisfying. I don’t have any formal processes, algorithms, or best practices to offer. In my experience, designing holographically is more about seeing than about doing. 

    Holographic design doesn’t work by creating a purely high-level architecture first, then filling in details later. Instead, it goes all the way down from the beginning. Confidence in the details degrades at lower architectural levels, as does the need for precision. Big Up-Front Design doesn’t work because it doesn’t allow for adaptation. At the same time, though, we can’t just hope that good architecture will magically emerge. Holographic design allows us to create a coherent overall picture while leaving room for specific details within that picture to flex and change in response to feedback. 

    How do you know that any given version of an architecture is good enough to let you move forward? How can you be sure that iterative refinements at lower levels won’t invalidate higher-level decisions? To a large degree, it’s a matter of practice. Having done it enough times, you learn to see weaknesses in the overall design, and in the relationships between components and levels. You learn how to push at the design, almost like you would a spider web, to see what happens if individual strands break at any given level. 

    Architecture is about shape. In order to evaluate a design, you need to be able to see, and contemplate, the entire shape. There’s a reason we use the word ‘spaghetti’ to refer to poorly thought-out designs. They are, quite literally, ‘a mess’. You can’t understand shapes and their implications through linear analysis. It’s not a matter of applying graph theory. Instead, you need a holistic ability to see. How do you learn how to see? How do you practice it in order to improve your seeing ability? The best way I know is to study and practice the arts. Look at great buildings throughout history. Take a film-making or photography class. Read Hemingway and Fitzgerald and Louise Erdrich. Write poems and short stories. Participate in critiques. 

    Picasso’s ‘Guernica’ is my favorite painting in the world. Many years ago I had the great fortune to see a show devoted entirely to that painting. It included a large number of studies Picasso painted in preparation for the final piece. Being able to see his progression from the initial concept to the final masterpiece was remarkable. His process seemed very holographic. The basic premise remained from the beginning to the end. Along the way, his execution grew in richness, depth, detail, and completeness. He added more details, while simultaneously working out the relationships between them. 

    Systems thinking requires subjective choices about where to draw boundaries and how to connect components. Thinking about the meaning of things in the arts requires similar choice-making. What is Hemingway trying to say in “The Old Man and the Sea”? It’s up to you to find patterns within the story, and to decide how those patterns fit together. There is no such thing as “the right” analysis of a piece of literature. Similarly, there is no such thing as “the right” architecture. There is no way to be certain an architecture “works” until after the fact. Holographic design is a continual process of hypothesizing. The best you can do is to learn how to make good hypotheses. Making, studying, and critiquing works of art is about making, studying, and critiquing the process of hypothesis-making. As such, it can help us become better and more lean architects.

    The artistic approach isn’t just about lonely genius. It can also help us design holographically within a collaborative group process. Art has always incorporated group critique. Lean architecture can use a similar technique. Architectural critique can become a regular part of the iterative cadence. For it to work, however, everyone involved needs the ability to understand and push at architectural shapes. The entire team, therefore, not just a lonely ivory-tower architect, needs to learn to see.

  8. Lean Architecture and Holographic Design

    © 2013 Jeff Sussna, Ingineering.IT

    Agile has always struggled with the question of how to approach architecture. Should it be emergent, and just arise as part of iterative development? Or should it take place during a special, distinct Iteration Zero? We tend to think of architecture as something different and non-iterative. We worry about getting it “right” up front. That thought process goes against the very notion of Agile, which questions the possibility of getting anything “right”, forever, in one shot.

    I think the tension between architecture and Agile arises from the fact that architecture necessarily involves coherency. From the beginning, an architecture must represent a context, along with the relationships between the components within that context. We struggle to understand how to iteratively create coherency. We worry that, if we “bite off small pieces at a time”, we’ll end up with a Rube Goldberg device instead of anything coherent, elegant, or manageable.

    In reflecting on my own architectural process, I’ve found that I approach design “holographically”. The name comes by way of analogy with a hologram that, when broken, still shows the whole picture, but with less sharpness of detail. I design by starting with a fuzzy version of the entire architecture, then gradually increase the precision of successively lower levels of detail.

    Holographic design naturally balances coherency and iteration. I think it has the potential to help resolve the Agile/architecture tension, in the form of a Lean Architecture practice. Similar to Lean UX, Lean Architecture would integrate architecture into the Agile development process without sacrificing the integrity of either one.

    Rather than applying iteration to the gluing together of architectural components, holographic design applies it to increasing levels of precision within the overall architecture. My first cut at an architecture generally results in a picture that has more or less the right overall shape. The details of the components that make up that shape, however, are somewhat fuzzy. The more deeply I dig into the structure, the more fuzzy the details get. If someone asks me “how are you going to solve this sub-problem”, the answer is often “I’m not sure”. Sometimes I do have an answer, but one that doesn’t hold up to scrutiny. Peer review often helps me understand what I have to figure out next.

    It isn’t generally the case that detailed review shows my picture to be fundamentally flawed. If my design is just plain wrong, I usually find out fairly quickly. More often, I need to resolve something at a component level. That resolution may uncover structural problems at lower levels of the architecture. It’s seldom the case, though, that structural breakage cascades back up to a higher level. By resolving lower-level details, I increase the “sharpness” of the overall design.

    Holographic design makes it possible to do rapid architecture early in a project. Early stage architecture review only needs to provide enough confidence to move forward with a design. It doesn’t need to freeze lower-level details. The architecture can, and will, flex during the course of iterative development. It can do what it’s supposed to do, which is to adapt to interaction with reality. Project teams can delay decisions about specific, component-level details and solutions until they really need to resolve them. In the meantime, iterative development generates feedback about still-fuzzy components. The lean architecture practice can use this feedback to increase sharpness at the same cadence, and within the same iterative structure, as the rest of the project.

  9. Promising Quality Service Experiences

    © 2013 Jeff Sussna, Ingineering.IT

    Imagine you’ve just arrived at your hotel after a harrowing flight. What is your immediate job to be done? It’s to get to your room so you can relax. The first person you encounter in the hotel is the check-in agent. What is their job to be done? Is it to check you in? Or is it to fulfill the hotel’s promise to help you make the transition from stress to relaxation?

    If the agent defines their job as checking you in, and the reservation computer is offline, then they frustrate you on two levels. First of all, you can’t get checked in, which means you can’t get to your room, toss your luggage in the corner, take off your shoes, flop on the bed, turn on the TV, and start to decompress from your flight. Second, they frustrate you by acting like a cog in a machine instead of a person. By allowing the computer system to thwart them, they lower themselves to its level. When a person acts like a machine part, one wonders whether it would be less frustrating to replace them with an actual machine part.

    If, on the other hand, the agent defines their job as helping you shift from feeling stressed to feeling relaxed, they feel empowered (even obliged) to fulfill that promise in the face of component failures. With the example of the offline reservation system, perhaps the agent finds you a specially set-aside room where you can take a shower while you wait. Maybe they just offer you free drinks and snacks in the bar until the system comes back online. Whatever the solution, the agent uses their human intelligence and creative problem-solving abilities to overcome system-level failures. 

    We all understand that nothing works perfectly all the time. We appreciate efforts to overcome failures. Being offered useful help in the presence of problems can not only prevent dissatisfaction, but also increase satisfaction. People intuitively understand the difference between resilience and brittleness. 

    I’ve previously written about the importance of designing services that can adapt to unexpected situations. These situations may arise from internal problems, such as a software failure, or from the surrounding context, such as a delayed flight. Promise theory provides a framework for designing resilient human-technical services. It can help us design and operate adaptive services that can respond gracefully to both internal and external surprises. 

    Promise theory treats systems as collections of agents collaborating through promises. A promise is a strongly stated intention to create value. A promise is not a guarantee. It may or may not come to pass. Just because I promise to buy groceries on my way home doesn’t guarantee I’ll remember to do it. Recipients of promises must decide how much to trust the promises made to them.

    Promise theory ironically increases system-level certainty by surfacing component-level uncertainty. Because I know I’ve been promised something with less than perfect certainty, I have the responsibility to make alternate provisions in case of failure. The reservation software promises to display a guest’s reservation upon request. It may fail. The IT department promises to fix software failures quickly and transparently. They may fail. The check-in agent promises to help the arriving guest shed the stress of their trip, regardless of the availability of the reservation system, or whether the guest’s flight brought them to the hotel at the expected time. 

    Promise theory speaks the language of jobs to be done. It directs attention to the value being created, rather than the thing being done to create that value. It also encourages holistic, customer-centered processes and structures. Value-centered promises cross disciplines. The promise to provide a working reservation system cuts across development and operations. The promise to help a newly-arrived guest relax applies to the check-in agent and the barkeep. 

    Service relationships find success through value co-creation. In order to have a productive business trip, I need a decent place to stay. If the hotel fails to fulfill its promise to provide a clean room with a soft bed and hot water, my ability to gain value from my trip is compromised. It will be harder for me to impress my clients if I’m tired and cranky.

    Service-dominant logic speaks of value propositions. One might think of them as promises. Brand quality depends on a company’s ability to keep its value promise. By focusing on benefit, relationships, and trust, promise theory offers a language and a perspective for maximizing value co-creation and customer satisfaction in the face of uncertain systems and events.

  10. Promise Theory, DevOps, and Design

    © 2013 Jeff Sussna, Ingineering.IT

    I’ve been using Promise Theory to think about problems related to agile requirements and project scaling. The devopser in me recognizes the incompleteness of any analysis which only considers development. After further contemplation, I believe Promise Theory can be helpful in thinking about connecting Dev and Ops. In fact, it can help us go further to integrate Design as well.

    Consider today’s favorite whipping boy, Healthcare.gov. It’s currently engendering tremendous frustration. The root cause of that frustration is the breaking of a social promise. Congress passed a law (the Affordable Care Act). Among other things, that law promises that individual citizens can purchase health insurance through exchanges. I don’t think any of us realized how completely that promise would rely on a website. The website is (at least perceived to be) breaking promises on multiple levels:

    -We promise you will be able to find, compare, and enroll in health insurance plans via healthcare.gov
    -We promise you will be able to find your way around on the site without confusion
    -We promise you will be able to accomplish the specific functions you need in order to enroll in a plan
    -We promise the site will be available when you want to use it
    -We promise the site will not lose your data
    -We promise the site will maintain the privacy of your data

    If ever there were a moment proving the inseparable importance to ordinary people of usability, functionality, and operability, this is it. Promises are being broken across design, development, and operations concerns.

    Promise Theory provides a mechanism for managing larger problems in terms of smaller ones. It helps us understand relationships between components. Stating requirements in terms of promises cuts across disciplines. The promise to maintain data privacy, for example, has implications for both infrastructure and application code. It thus cuts across Dev and Ops. Promises regarding navigation and functionality cut across Design and Development.

    Another way to see how promises cross disciplines is to ask “who is we?”; in other words, who is making the promise. At the highest level (we promise you will be able to enroll in plans using the website), “we” is everyone involved in designing, implementing, running, and supporting the site. Promise chains let us map the relationships between specialties and understand how they connect to make a greater whole.

    Approaches such as DevOps and Service Design are applications of systems thinking. They strive to improve quality, efficiency, and value through holistic thinking, organization, and action. Promise Theory can help these approaches frame the problems they’re trying to solve in compatible terms. It can even help us understand what concepts like DevOps and Service Design really mean, and why they’re valuable.

  11. Scaling Agile Projects Using Promises

    © 2013 Jeff Sussna, Ingineering.IT

    Yesterday I posted my thoughts about moving from requirements to promises. I’ve also been contemplating whether Promise Theory could address some of the challenges associated with scaling agile projects. All of the methodologies I’ve encountered seem to sacrifice the spirit of agility in order to make it work in the large. Self-organization and amenability to change seem to inevitably give way to some form of command and coordination.

    I’ve been mentally comparing large-scale agile projects to housing construction. How do general contractors make sure buildings get built properly and on time? They seem to do so much better than us at managing complex projects. It suddenly occurred to me that general contracting is a perfect application of Promise Theory.

    A contractor promises to build a house, with a given set of features, in a given amount of time, for a given price. The contractor makes this promise based on their relationships with other promisers, known as subcontractors: painters, carpenters, lumberyards, etc. Each subcontractor is an independent agent. They in turn rely on other promisers. The painter relies on the paint supplier. The lumberyard relies on the exotic wood importer. And so on.

    Each component in the chain makes promises, none of which are guaranteed to be kept. The painter might fall off a ladder and break his leg. Teak shipments from Asia might be delayed because of a dock strike, or the price might go up due to a sudden demand spike. Any of these events forces the contractor to make tradeoffs, some of which impact their ability to keep their own promises:

    -Find another painter
    -Switch the order of operations; for example, do the plumbing before the finish carpentry to give the teak time to arrive
    -Ask the client to decide whether to pay more to cover the new price of teak, or else switch to mahogany because it’s cheaper
    -Delay the completion date

    Those of us who have experienced construction projects know that all of these things happen all the time. Construction operates based on chains of promises with less-than-perfect trustworthiness. Promisers and promisees throughout the chain make decisions and contingency plans to minimize the effects of unkept promises.

    It seems to me that large-scale agile development projects have similar characteristics. Perhaps, rather than trying to bind together agile (aka adaptive) components into a reliable, coordinated, and thus no longer adaptive whole, we need to treat the sub-projects as autonomous agents cooperating via promises. Large-scale agile project management would become an exercise in managing promises, trust, and uncertainty. This approach seems to me to more accurately reflect reality.

    Part of the power of Promise Theory comes from the fact that promises are always local. One promise may trigger another as part of a promise chain. The scope of any given promise, however, never extends beyond the promisee to whom it’s made. A large-scale web application project, for example, may rely on a complex hierarchy of promises to reach completion. The team responsible for a backend REST service, however, only promises features and dates to its immediate consumers. It’s up to those consumers to make their own upstream promises based on their trust of the backend service team.

    Locality of trust has powerful implications for large-scale projects. It avoids the intractable problem of trying to globally coordinate a loosely coupled complex system. There is no need to incur the overhead and brittleness associated with having to move information consistently across levels within the project.

    Locality of trust gives each agent within the system, at every level, freedom to adapt to variance in trust. Imagine, for example, that I’ve promised certain functionality based on your promise to build the REST service. If I lose trust that you’ll keep your promise, I have the freedom, and the obligation, to make contingency plans. Possible alternatives might include writing my own backend service, contracting with another team to do it, or changing technical requirements to obviate the need for it.

    By its very nature, Promise Theory builds freedom and obligation into the process. Trust based on uncertainty characterizes all relationships between all teams at all levels. No one suffers from the illusion of certainty. Ironically, that lack of illusion raises the likelihood of a satisfactory outcome. Agile teams’ capacity to respond to change increases their chances of success. Similarly, promise-based agile projects’ chances for success are increased by their capacity to respond to changing certainty.

  12. Turning Requirements into Promises

    © 2013 Jeff Sussna, Ingineering.IT

    The software industry has been struggling with the topic of requirements for as long as it’s existed. The very mention of the word conjures up painful associations with other words like “incomplete” and “unmet”. Requirements have historically served as a fulcrum for conflict between those with needs and those charged with fulfilling them. Agile practices such as Behavior-Driven Development try to ameliorate this conflict by treating the requirements process as a conversation. BDD conversations still tend to focus on “requirements discovery”; in other words, collaboratively uncovering the development team’s obligations to the customer.

    Upon reading the phrase “turn requirements into autonomous promises” in Mark Burgess’ “An Introduction to Promise Theory”, it occurred to me that Promise Theory might offer a way out of the requirements tarpit. Whereas requirements (aka obligations) generate external pressure to conform, promises generate internal pressure to satisfy. Ironically, this difference can make promises more trustworthy than requirements. Requirements discovery asks “what do you need from us?” Promise discovery asks “what can we do for you?” The difference is subtle but I think potentially transformative.

    Promise-based software development transforms developers from order-takers to proactive, creative designers. Promises imply a strong motivation to succeed. You generally don’t promise to do something if you don’t care whether or not it happens. It likewise doesn’t make sense to promise something no one else cares about. Promise-driven developers thus will naturally gravitate towards promising things they believe have meaningful customer value. No one wants to burn a lot of time building something useless (especially if doing so makes you unable to keep your promise to build something else that’s more useful). This dynamic will encourage them to validate their beliefs through user-centered design techniques, so they really know what is in fact useful.

    Professional designers try to go beyond understanding what customers want in order to create solutions that satisfy their underlying needs. Software development should operate on a similar level, going deeper than just the user interface. A development team could, for example, promise to be able to add new functionality to an application quickly and easily. In order to make that promise, they also need to make another promise: to keep technical debt under control.

    This example illustrates the power of promises in reducing distrust between technical and non-technical stakeholders. When the product owner asks why the development team wants to prioritize technical debt stories, the developers can answer in terms of a chain of promises. Because they created that chain, and linked a higher-order, customer-visible promise (ease of modification) to a lower-order, technical promise (controlling debt), they can easily explain the importance of one in terms of the other. Developers often struggle to explain technical logic in user-understandable terms. Promise theory can help there. Even better, shared visibility into the promise chain can obviate the need to explain it at all.

    Conversely, promises can give customers greater trust in the development team. On the one hand, by the very act of having been promised something, they can sense developers’ desire to provide satisfaction. On the other hand, the tentative nature of promises surfaces the reality that customers’ needs are not always perfectly met. When imperfection does manifest, it’s not a surprise. The difference between “we will implement this feature by next Wednesday” and “we promise to implement this feature by next Wednesday” is subtle, but again potentially transforms the customer-developer relationship.

    The nature of promises as statements with varying degrees of trustworthiness can help prepare customers for imperfection, and understand it when it comes. “We told you we weren’t confident we could finish this feature in time” is a common developer response to customer frustration. From the perspective of requirements, this exchange denotes failure. From the perspective of promises, it denotes a known possible outcome. Again, this turn of events accurately represents the real world. No software, even if it completely satisfies every requirement, perfectly meets every customer need. There is always something more to be added. Customers always have to work around some limitation, even if it’s a missing feature that no one could have been expected to foresee. Promises simply make this reality more manifest. Doing so removes some of the intensity that drives the conflict fulcrum.

    Promises’ inherent consideration of imperfect outcomes also puts some responsibility for realism back onto the customer. Faced with less than perfectly trustworthy promises, they are obligated (pun intended :-) to make alternative provisions for satisfying critical needs. Doing so makes unkept promises less painful. This relationship applies throughout the customer-developer chain. If a developer estimates a week to build a feature, and their project manager tacks on an extra three days, and the project manager above them tacks on another three days, both project managers are making promises to their respective customers based on their confidence in the promises being made to them. This situation is perfectly natural.

    Software projects are not monolithic, frozen moments in time. They evolve both in completeness and correctness. Promises address projects’ true nature by defining atomic expectations. Promise chains allow for higher-order aggregate promises. Are promise chains really any different from sprints or epics? Are atomic promises really any different from user stories? I believe they are, in a couple of important ways.

    Promises are responses to user stories. User stories were originally intended to represent a unit of desired customer value, and to serve as a basis for conversation. Somewhere along the way we forgot this definition and turned them into micro-requirements. A unit of customer value needs a unit of development work to realize it. A promise is…well…a promise to do that work, in a particular way. In fact, “needs” might be a better name for “user stories”.

    Promise chains provide more unified and thus flexible ways to represent aggregates such as epics, features, and nonfunctional requirements. A space of higher and lower-order promises can be arranged however one likes to represent different project views in time and space. 

    The relationship between time and space is perhaps the most important difference between promises and previous requirements management techniques. A promise divorced from time makes no sense. “I promise to implement Feature X” has no meaning. We all know the humorous response to “we’ll get this done by September”, which is “September of which year?”. You can only say “I promise to implement Feature X by next Tuesday”. Whether Tuesday comes and you’re not finished, or you are finished but it doesn’t work as intended, you’ve still failed to keep your promise. 

    One could object that it’s somehow possible to map the concept of promises to existing agile development techniques. Technically speaking, that might be true. The use of the word “promise”, however, introduces two critical characteristics into the discussion: emotion and uncertainty. Regardless of methodology, software is built by humans, for humans. Humans are emotional creatures. When customers don’t get the results they want, they feel frustration. When developers feel pressured to perform against their own wisdom and experience, they feel humiliation. A truly effective project methodology must account for the emotional dimension of work.

    At least at the level of atomic actions, every project is characterized by uncertainty. We wouldn’t measure velocity, for example, if we could be certain we would always attain the planned velocity. We wouldn’t discuss blockers during scrum stand ups if there were no possibility of problems that could derail planned progress. Promises build an understanding of the uncertain nature of development work into the very language of its management. Ironically, acknowledging uncertainty at the atomic level can increase certainty at the higher level. If I understand the ‘algebra of confidence’ in my promise chain, and make arrangements for alternatives where that algebra doesn’t match the criticality of my needs, I can trust that the overall result will provide greater satisfaction and less pain. Since we are in the business of providing value, isn’t that the ultimate measure of success?


    Epilogue

    There is (at least) one place where Promise Theory and Agile/Lean/Continuous Delivery are in perfect agreement. The larger the granularity of my promises, the lower the confidence with which I can make them. “I promise to feed the cat tonight” is much easier to keep than “I promise to care for the cat”. Promise theory naturally encourages us to reduce batch sizes. We can use promise chains to express larger/longer promises while capturing variances in confidence at different levels of the chain. At the same time, we can keep our attention on the atomic promises of greatest immediate concern. That attention raises those promises’ trustworthiness. Other promises in the chain can float off in the “definite maybe” background. As additional promises come under the microscope of the present, our new attention to them raises their trustworthiness. Promise chains thus naturally express the concepts behind a backlog.

    Remember that promises bind space and time; trust in a promise thus has no meaning except as that promise synchronizes itself with the present moment. A key characteristic of Promise Theory is the ability of trust (and its importance) to vary over time. “I promise to feed the cat on September 1st” is much less trustworthy, and less important, on March 1st than it is on August 31st. 


    Epilogue Two

    The example of the chain of project managers adding time to estimates made me realize that Scrum has made a fundamental mistake. Scrum uses the metaphor of pigs and chickens. There really is no such thing as a chicken. I may only be a stakeholder in your project. The only reason I’m there, however, is because I’m a pig in some other project. When you treat me as a chicken, you subtly devalue my participation. If anything, it should really be the other way around. With respect to the project in question, pigs have more control. Developers can write code; product owners can only hope that developers keep their promises to write code. Thinking in promises can help balance this relationship. The product owner cares whether the developer keeps their promises because the product owner has made their own promises. The manager who flies in over the project once a month does so because they’ve made their own promises based on yours. If all the players can understand each other’s promises, and the ways those promises’ trustworthiness relies on each other, they can recognize that there are no chickens, only pigs. To express it in the language of Promise Theory, we are all both promisers and promisees, bound together by trust.

  13. Why STEM Needs STEAM

    © 2013 Jeff Sussna, Ingineering.IT

    Yesterday I witnessed a Twitter debate about STEM vs. STEAM. One of the debaters questioned the relevance of Art because, unlike Science, Technology, Engineering, and Math, it isn’t “useful”. I would like to humbly disagree with this viewpoint, especially in the context of current post-industrial business trends.

    In the 21st century, companies need to continually innovate in order to survive. They must be able to adapt more quickly and frequently than their competitors. They exist within environments increasingly characterized by complexity. These pressures all present challenges which can’t completely be addressed through science, engineering, or analysis.

    Reductionism is giving way to systems thinking. Systems thinking deals in wholes, and in relationships between parts. The decision about what constitutes a given system, however, is necessarily provisional. Every system is part of and intersects with other systems. Selecting which components to include within a given scope is more about choosing a perspective than about discovering the correct answer.

    John Boyd defined the concept of an OODA Loop as a model for competition in battle. To orient yourself and decide which action to take, you have to let go of previous solutions in order to find new ones. Boyd described a cycle of destructive and constructive thinking. In addition, as part of orienting and deciding, you may need to choose between equally valid possible solutions.

    Complexity generates situations that can’t be perfectly modeled or managed. Linear planning is forced to give way to safe-to-fail experiments. The desire for control gives way to acceptance of uncertainty, since failure can’t be perfectly predicted or avoided. History doesn’t help, since similar starting points can lead to highly divergent destinations.

    Given this new world of change, uncertainty, complexity, and provisional knowledge, how can we act? How do we choose from among seemingly equivalent choices? How can we get ourselves to let go of what we know in order to generate new solutions that surprise our enemies (and ourselves in the process)? What do we do when we can’t rely on experience to guide us? How do we even know what defines the system we’re trying to manipulate when we can’t identify “the right” viewpoint?

    Do we just make random choices and hope for the best? We need strategies for decision making and action that go beyond pure analysis. Such strategies are the province of the arts. Art isn’t just about technical ability, or compositional beauty. It’s also about intuitive understanding and decision-making, creation through destruction, and discovery through transformation.

    Writing a play, novel, or screenplay is an exercise in systems thinking. The writer must choose which characters, scenes, and conversations are critical to creating a coherent fictional world and communicating its desired meaning. Editing a film requires making choices about essential and non-essential moments and viewpoints. The editor has to decide what can be dropped onto the cutting-room floor without compromising the integrity of the film as a whole.

    Improvisational art and music are exercises in construction through non-conceptual decision-making. Participants strive to “never say no” to whatever comes next. The artistic piece unfolds towards its destination without having been designed in advance.

    Painting is an exercise in safe-to-fail experiments and discovery through transformation. X-rays of paintings by masters like Titian reveal repeated erasures and compositional changes. Joan Miro created abstract paintings with seemingly representational titles like “The Melancholic Singer”. These titles are puzzling until you see the sequence of studies that led to the final piece. Miro would start by painting a realistic depiction of a singer, then create progressively more abstract versions until he arrived at a study in pure colors and shapes unlike anything you’ve ever seen before. Picasso arrived at Cubism by responding to African art without knowing where he would end up, or what it would be called when he got there.

    If we are to succeed in post-industrial life in general, and business in particular, we need the ability to think and act trans-analytically. On the one hand, technology is ever more deeply embedded in ordinary life. We need our citizens and employees to have some understanding of particle physics and computer science. On the other hand, contemporary life requires an ever greater ability to “choose beyond certainty.” Studying and practicing the arts can train us in the discipline of finding our way to creative solutions.

    Post-industrialism is dissolving many of the intellectual and practical silos that have arisen since the Industrial Revolution. Our educational system needs to catch up with this transformation. STEM and A need to be joined into STEAM. Art classes are anything but a luxury, and should be anything but electives.

  14. Designing Adaptive Services

    © 2013 Jeff Sussna, Ingineering.IT

    Service design concerns itself primarily with experiences rather than things. This approach struggles with the reality that experiences are inherently personal. As someone rightly pointed out, “you can’t actually design experiences. You can only design the conditions for them.” No matter how well you’ve designed your airline service, someone may arrive at the airport worn out from the flu, or from not having slept because of a colicky child. They may have a bad experience despite your best efforts to design for delight.

    Service designers try their best to create the conditions for achieving service quality, but may throw up their hands at their own limitations. I think, though, that there is an opportunity to take a more adaptive approach to designing services. In this area service design can learn from other disciplines, in particular Information Technology.

    Testing is an integral part of software development. In addition to ensuring that a piece of software behaves properly under normal conditions, testing also validates how gracefully the software handles unexpected inputs. Techniques such as stress and load testing address input volumes as well as their content. In the context of service design, an airport design project might create a stress test to simulate the scenario where several delayed flights all land at once, disgorging large numbers of grumpy passengers. This test would stress the conditions for a good experience both in terms of unexpected inputs (grumpiness) and input volume (lots of simultaneous grumpiness).

    By uncovering inadequacies in a design, this kind of testing can lead to more robust solutions that account for more variance in individual experiences. Good testers are especially skilled at identifying scenarios that other team members miss. The simple process of probing boundaries for unexpected inputs can lead to improved services by reminding the design team that the surrounding context is more unruly than they may realize.

    Services also need to maintain the conditions for good experiences in the face of internal stresses. What happens to service quality in a hospital service, for example, when a CT scanner breaks, or a staff member calls in sick? Here again service design can learn from IT. Modern IT environments are very complex. Complex systems are characterized by the inevitability of failure. Rather than trying to prevent failure, resilient IT systems focus on maximizing their ability to respond to it.

    Central to resilient IT is the strategy of “design for fail”. Some kinds of failures can be masked. A load-balanced website continues working even if one of the web servers goes offline. Other kinds of failures require graceful fallback. If your website has a page with a Google map on it, and the Google Maps service goes offline, does the page just display a big blank square where the map belongs? Or does it display a nicely designed message apologizing for the inability to show the map? Or does the site avoid confronting users with failure altogether by automatically removing the link to the mapping page?
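    Here is a minimal sketch of that graceful-fallback idea, in browser-style TypeScript. To be clear about assumptions: loadMapTiles and the example URL are hypothetical stand-ins, not the actual Google Maps API.

    // Sketch of "design for fail" around a map widget. loadMapTiles()
    // and the URL are hypothetical stand-ins, not the Google Maps API.
    async function loadMapTiles(): Promise<HTMLElement> {
      const response = await fetch("https://maps.example.com/tiles");
      if (!response.ok) throw new Error("map service unavailable");
      const img = document.createElement("img");
      img.src = URL.createObjectURL(await response.blob());
      return img;
    }

    async function renderMapPanel(container: HTMLElement): Promise<void> {
      try {
        container.appendChild(await loadMapTiles());
      } catch {
        // Graceful fallback: a designed apology instead of a blank square.
        const notice = document.createElement("p");
        notice.className = "map-unavailable";
        notice.textContent = "Sorry, the map is temporarily unavailable.";
        container.appendChild(notice);
      }
    }

    The specific widget doesn’t matter; what matters is that the failure mode is itself designed, rather than left as an accident of the page layout.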

    Not all failures can be masked. Outages are unavoidable. Design-for-fail requires the ability to repair unmaskable outages quickly and reliably. Modern IT operations teams run Game Days to test outage response procedures. Multiple rounds of game days create iterative learn/fix/validate loops.

    Some organizations go even further, employing so-called Chaos Monkeys. Chaos Monkeys intentionally inject failures into live production environments. The best way to test design-for-fail is to fail for real. The Chaos Monkey’s ultimate purpose is not to show management how well-tested their systems are. Instead, it celebrates exposing chinks in the resilient design armor. Finding problems in the place you least want to encounter them gives you the greatest opportunities for adaptation through learning.

    How might service design approach design-for-fail? It would start by incorporating the understanding that delivery systems can and will fail during the customer journey. That understanding would generate exercises to understand how those failures impact the customer’s experience, how to mask them when possible, and how to respond to them when they can’t be masked.

    Take the example of the broken CT scanner. The first step in an adaptive service design would be to analyze its impact on the staff and the patient, and to generate ideas for designing around it. The second step would be to run one or more game days, and “break” a scanner to test how the staff and delivery systems respond to it, and to see how their responses impact patients. The final step, reserved for the very brave, would be to intentionally take a scanner offline in the real hospital, during real patient hours.

    Design-for-fail reflects the recognition that complex systems can’t be perfectly understood, predicted, or controlled. Services are perfect examples of complex socio-technical systems. Customer delight is a dynamic process that ebbs and flows. Attempts at its attainment necessarily take place in an environment of incomplete control. Both external and internal disturbances perturb it. The best service design will therefore incorporate dynamic feedback and self-repair, both into itself and into the operational services it creates.

  15. Are We Sure Netflix is Just an Edge Case?

    © 2013 Jeff Sussna, Ingineering.IT

    In an otherwise excellent post, Ben Kepes calls Netflix an outlier whose approach to cloud doesn’t map well to most enterprises’ needs. This view seems to be prevalent among cloud observers. I feel the need to take mild exception to it, as I believe it overstates the current state of affairs. It assumes that current enterprise legacy application architectures are static, and that IT organizations prefer to forklift them onto so-called “enterprise clouds”. In my experience, by contrast, many enterprises are questioning the wisdom and viability of the forklift approach.

    IT organizations are facing accelerating pressure to support companies’ growing need for business agility, innovation, customer responsiveness, and adaptability. This pressure doesn’t stop with so-called systems of engagement. It goes all the way back to systems of record. In fact, the distinction between the two is starting to erode. Enterprises are responding to this pressure by upgrading application architectures within and around the system-of-record tier. They are starting to view the “stability” of their legacy applications as a liability rather than as an asset.

    The need for business agility translates to technical needs for improved scalability, resilience, and composability. At least some enterprises are recognizing that forklifting applications to the cloud does little to remove architectural limitations that impede these technical requirements. The cost of forklifting, however, is significant, in terms of time, resources, and risk. Enterprises are therefore considering infrastructure transformations to accompany application transformations. These organizations view Netflix and its ilk, not as something alien or distasteful, but as the poster children for successful 21st-century IT.

    It’s certainly true that, a hundred years from now, some application will be running on a mainframe in the basement of a corporate office somewhere. I think, though, that it’s dangerous to underestimate the accelerating rate of change, or enterprise IT’s growing openness to new ways of doing things, at all levels of the technology stack. It’s also dangerous to assume that the duality between legacy and cutting-edge IT, or between systems of engagement and systems of record, is standing still.