Wednesday 17 January 2018

The UI that broke Hawaii

Does anyone need reminding that design is more than pretty colours? Apparently they do. Here’s the web-app screen that sent an SMS to some 1 to 1.5 million Hawaiians, warning them that a ballistic missile was headed their way.

Emergency SMS control screen

At least bad data design didn’t kill anyone this time *. I hope.

* This is awfully reminiscent of the PowerPoint slide at NASA that should have, but didn’t, warn of the likelihood of the Space Shuttle Columbia disaster.

What’s wrong with that screen?

Let’s count the problems.

  1. It’s heavy with acronyms and jargon that make it hard to understand the links
  2. The items aren’t in any meaningful order
  3. Highly safety-critical items (Tsunami Warning) are mixed with convenience items (road closure notification) and tests
  4. Heavy use of capitals means the emphasis on DRILL does not stand out
  5. Inconsistent language – there are three test options, all indicated with different phrases:
    • “DRILL” (at the start)
    • “DEMO TEST” (at the end), and
    • “1. TEST Message” (the whole line)

This adds up to a screen with heavy cognitive load to perform a basically simple but safety-critical task. It is inviting an error, and it is a serious failure of the team that commissioned, accepted and manages the software, and the team that built it.

I hope lessons are learnt in the right place, and it’s not the operator who suffers.

How would I change it?

Since I’m carping, I should be clear what I would do differently here. I want to remedy the first three of those faults listed above:

  1. Ditch the acronyms and the jargon. “High Surf Warning North Shores” is perfect. PACOM should say “Incoming Missile Warning”.
  2. Order the items, in a way that makes sense to the operators. Alphabetical would be a good start.
  3. Make a crystal-clear design distinction between high-criticality links, low-criticality links and test actions.

Why haven’t I touched the issues of CAPITALS or of inconsistent language? I want to get the design fix right first (point 3):

  1. Place options for Test, Info and Emergency on different screens, or clearly marked sections on the same screen
  2. Make Test the easiest option to pick (least deliberate) and Emergency the hardest (most deliberate)

Get this right – create utter clarity between Incoming Missile Warning and Incoming Missile Warning Drill – and those other points shouldn’t matter nearly so much.
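To make the two design points above concrete, here is a minimal sketch in Python. Every name in it (Category, Alert, group_by_category, may_send) is hypothetical and has nothing to do with the real Hawaii system, and the confirmation rules are my own assumption about what “least deliberate” and “most deliberate” could mean: a Test sends with no confirmation, Info needs a simple “yes”, and Emergency needs the operator to retype the exact label.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    TEST = "test"
    INFO = "info"
    EMERGENCY = "emergency"


@dataclass(frozen=True)
class Alert:
    label: str
    category: Category


def group_by_category(alerts):
    """Split alerts into one alphabetised list per category (one screen or section each)."""
    groups = {category: [] for category in Category}
    for alert in alerts:
        groups[alert.category].append(alert)
    for items in groups.values():
        items.sort(key=lambda alert: alert.label)
    return groups


def may_send(alert, typed_confirmation=""):
    """Return True only if the confirmation is deliberate enough for the category."""
    typed = typed_confirmation.strip()
    if alert.category is Category.TEST:
        return True  # easiest to pick: no confirmation at all
    if alert.category is Category.INFO:
        return typed.lower() == "yes"  # one simple confirmation
    return typed == alert.label  # hardest: retype the exact label
```

With rules like these, retyping “Incoming Missile Warning Drill” does not release “Incoming Missile Warning”, which is exactly the distinction the screen failed to make.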

Excuses, excuses. This means YOU!

So you don’t work on safety-critical systems? Me neither. This still applies to both of us.

At one time in my career I’d say “But a user wouldn’t do that.” Or “A user shouldn’t do that.” Why would they? It’s stupid. It doesn’t make sense. Obviously it will break the system.

So here’s the heads-up. Sooner or later your users will do that. Why? Because they’re in a hurry. Because they’re overworked. Because their partner yelled at them this morning. Or just because they’re trying to do their job, the best they can, with a limited view of a complex system.

We, the Dev team, are the ones with the full context. We’re the ones tasked with thinking through the workflows – the exceptions as well as the happy path. We’re the ones who need to make the right thing easy and the wrong thing damn near impossible.

And it’s everyone’s responsibility – Devs, Testers, Product Owners and Scrum Masters – whether or not we have a Designer on the team.

A case study

My last product was a lead generation tool for fund managers, including its custom CMS, which managed a complex relational content model. We provided content editors with a delete button on content items. What about content items with dependencies?

3 options:

  1. Leave it – the content team is responsible for content integrity
  2. Remove the delete button if there are content dependencies
  3. Make the delete button do...something else

1. is the attitude I used to take. A content editor would be daft to delete an Investor with a Mandate hanging off it. But you know it’s going to happen, the very first time they’re in a hurry to clean out an old record.

This is the attitude behind the Hawaii screen.

2. is more helpful. But it leaves users wondering why that delete button is missing. This way, bug reports lie!

We went for 3. The delete button is still there, but instead of deleting the item it opens a dialog with an explanation and a list of links to the dependencies that need to be fixed. It makes the wrong thing impossible, and the right thing as easy as possible.
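For the curious, option 3 boils down to something like the sketch below. This is not our actual CMS code: ContentItem, DeleteResult and request_delete are hypothetical names, and a plain Python list stands in for the content store. The point is only that the delete handler either deletes, or hands back the list of dependencies for the dialog to show.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    name: str
    # Items that depend on this one, e.g. the Mandates hanging off an Investor.
    dependents: list = field(default_factory=list)


@dataclass
class DeleteResult:
    deleted: bool
    blocking: list  # names of the dependencies the editor must fix first


def request_delete(item, store):
    """Delete the item only if nothing depends on it; otherwise report what blocks it."""
    if item.dependents:
        # Nothing is deleted; the UI can render these names as links in a dialog.
        return DeleteResult(deleted=False, blocking=[d.name for d in item.dependents])
    store.remove(item)
    return DeleteResult(deleted=True, blocking=[])
```

The button stays where editors expect it, and the blocking list is what turns “you can’t do that” into “here is what to fix first”.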

Coda. A fix for Hawaii

In the wake of the incident, the relevant agency has issued a software update:

Emergency SMS control screen, showing False Alarm option

There it is at the top of the list, a BMD False Alarm option! Granted we’ve seen that this is necessary, but it only adds to the shortcomings listed above:

  1. More acronyms
  2. Still not in a meaningful order
  3. A whole new SMS category mixed up with the ones already there
  4. More capital letters

And a whole new problem. There’s no way to tell from this screen which SMS warnings the False Alarm applies to. Just the missile alert? Whatever was the last message sent? What does this link do if the last message was a Test? Or was sent three months ago?

Without fixing the underlying design failures, they’ve actually made this screen worse, not better.

In anticipation of the next inevitable accident,
Guy

Thursday 4 January 2018

So your Product Owner doesn't like paying off Tech Debt?

No Product Owner likes paying off tech debt. It looks suspiciously like the Devs messing around with perfection when the product is already working. The team could be building me new features dammit!

Tech debt is a pretty abstract concept to people without a coding background. We want to communicate it in a way that explains the value to the PO, in terms that are meaningful to them. Here are two approaches – one that I've used before and that worked, and another that I mean to try next time.

Tried and tested – the car service

If you drive a car, you get it serviced every year. It's painful because (a) it's expensive and (b) your car's still running. Yes you could drive it to Birmingham next week without getting it serviced. And the week after. And the week after that. But it will keep getting a bit slower and a bit more expensive to run, until one day it stops. And it won't stop gently on a day that doesn't matter – it will stop hard on the motorway when you have to get to Birmingham in a hurry. Because that's when you're stressing it hardest.

Your codebase is just the same. Sure you can put off paying off tech debt, because it's still running. But dev work that should be easy will get slower and more expensive, until one day you can't go any further.

If your PO wants to keep driving, they've got to service the car. Otherwise expect it to come to a screeching halt just when it matters the most.

Next time – revenue protection

Product Management types understand two broad categories of project:

  • Revenue generation
  • Revenue protection

They prefer revenue generation projects. Everyone does – they're sexy and pay all our bills. But they understand the need for revenue protection as well.

Paying off tech debt is revenue protection for the workstream. Or maybe velocity protection. Without it, once again work will slow down until it can't go any further.

Can we avoid this in the first place?

Of course it's better if you can avoid having to commit time to paying off tech debt. In a steady-state business-as-usual workstream with frequent releases, ideally the team refactors the code as you go to avoid getting into this situation at all.

However sometimes you have to accrue tech debt – eg there's a cost-of-delay driving an MVP release. Or you'll discover it some time later. When that happens, you'll want to convince your PO to give it appropriate priority.

Guy