Thursday, June 24, 2010

Design Fetish and Function Failure

The most devastating comment Don Norman makes about over-design in The Psychology of Everyday Things is that a design is so good "it must have won an award."  The meaning, in context, is that the product looks great but is unusable — it's broken as designed.

Apple and Steve Jobs make a fetish of design.  Their industrial design tradition stresses beauty and function, especially for casual users and folks who just "want it to work," like my 80-year-old dad.  Kudos.

Apple's new iPhone 4 is a triumph of design over function.  Its slender glass and stainless-steel case has wowed reviewers and fans from the beginning. It's sleek and chic and oh-so-unique.

Apparently it doesn't work very well, either.  The new iPhone dropped communications during Apple's Worldwide Developers Conference, frustrating even demo-god Mr. Jobs. The problem was blamed on too many wireless signals in the room.

Today (24 June) the phone is being shown to lose signal when merely held "incorrectly":



The problem appears to mystify Apple fans and computer geeks alike.  How did Apple let this one get by?


The Real Problem

My son showed me the above video.  Inspecting the lower left-hand corner of the iPhone 4's case reveals the problem.  When the phone is held left-handed, the ball of the thumb covers up a small gap in the stainless-steel band that girds the unit.  This band is Apple's touted integrated antenna.

I'm a software-and-systems guy, not an electrical engineer, but I messed around with radios as a kid.  To anyone who's played with radios, the problem is immediately obvious: touching the gap bridges the antenna.

Antennae work by having a hunk of wire with one or more ends sticking out.  The free end lets the signal jump from the wire into space, which is why radio antennae are depicted as at right.  Even the goofy tech in Star Wars gets it: recall the antennae that festoon Cloud City's underside in "The Empire Strikes Back".  Their fundamental design saves Luke from plummeting to his death.

Antennae have worked like this since they were invented by Heinrich Hertz in 1888.  On the new iPhone, the user's hand placement appears to couple what look like two antennae on the case, turning them into a closed circuit.  No antenna, no radio signal, and the stylish, sexy iPhone 4 becomes a smaller, less-capable iSlab.


Fixing the Design

If you've already bought one, put a piece of Scotch Tape over the antenna gaps on the case, or wait for someone to make a silicone rubber band that covers the iPhone's antenna/edge. It'll probably retail for $17.95.

Test with people who hold the product in alternate hands.

Put an electrical engineer who has radio experience on the product design team. Steve should be sure his engineer can say "no", then he should listen to him.


Addendum

30 June 2010 - Apple reportedly is hiring antenna engineers.

2 July 2010 - Apple Admits iPhone 4 Signal Issue, Blames it on Incorrect Signal Display. But Will Software Fix It?

14 July 2010 - Wall Street Journal reports Apple's engineers knew of the problem a year ago, but Steve Jobs forced the design.

14 January 2011 - Computerworld says Apple has redesigned the antenna for Verizon's upcoming roll-out of iPhone service.

9 February 2011 - Geek.com reports the newly redesigned antenna on the iPhone 4 suffers from "death grip" too.


Sunday, June 20, 2010

"Abort, Retry, Ignore?" -- and Die

All engineering failures -- and disasters -- have a critical human element.  This observation applies across the spectrum: the Challenger shuttle explosion, the BP Macondo well blowout, the air crash that wiped out Poland's government.

That element: someone in authority overrode the safety checks and ignored the advice of those who knew the risks.  Launch directors pushed the launch schedule, BP execs told the drilling engineers to proceed, an air force general told the pilot to land, all despite warnings from rocket engineers, the rig lead, and the pilot and ground control.

Why does this happen?  Because we train people that it's okay to go ahead anyway, and we offer them an option to continue despite a system interlock or warning.


On a micro scale, consider system security measures for privacy and identity theft prevention.  How often have users ignored warnings like the one at right?

This warning appears when Outlook 2007 connects to an email server over an encrypted channel.  The purpose of the secure connection is to prevent a bad guy from stealing email identity (login/password) and ensure mail privacy.  But there's no clue about that in this warning, and the temptation is to blithely click through in order to read mail.

Here's another example.  In this case, the browser fails to validate Register.com's security certificate.  Multiple reasons could lead to this warning: the browser doesn't know about Register.com's certificate authority or the certificate is self-signed.

In both cases, it doesn't matter.  The user is given the option to "go ahead anyhow" and since most people deem reading email as urgent-but-low-risk, they click through to do the task that's uppermost in mind.

Similar messages appear for expired certificates.  Certificates must be renewed periodically, and many businesses, particularly smaller ones, forget this administrative chore.  After all, the customers can still get in okay, right?
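The causes mentioned so far (unknown certificate authority, self-signed certificate, expired certificate) are distinguishable by the software, even if the warning dialogs don't say so.  Here is a minimal Python sketch of that idea, assuming the standard library's ssl module.  The names explain and check_host are made up for illustration, and the table covers only a handful of OpenSSL's verification codes.

```python
import socket
import ssl

# A few OpenSSL verification codes (X509_V_ERR_*) matching the cases discussed
# above. This table is illustrative, not exhaustive.
REASONS = {
    10: "the certificate has expired",
    18: "the certificate is self-signed",
    19: "the certificate chain contains a self-signed certificate",
    20: "the certificate authority is not known to this machine",
    62: "the certificate was issued for a different host name",
}


def explain(verify_code: int) -> str:
    """Translate an OpenSSL verify code into plain language."""
    return REASONS.get(
        verify_code, f"verification failed (OpenSSL code {verify_code})"
    )


def check_host(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and describe the outcome."""
    context = ssl.create_default_context()  # verification is on by default
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return "certificate verified"
    except ssl.SSLCertVerificationError as err:
        return explain(err.verify_code)
```

Calling check_host on a server with a lapsed certificate should come back with "the certificate has expired" rather than a generic warning, which is the kind of clue the dialogs above leave out.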

It makes no difference if the URL location bar turns "green" or the little security lock icon appears locked when security is validated.  The user has already clicked through regardless.  The security interlock has been ignored.

Once habituated to clicking through for email, ignoring warnings for banking, e-commerce, healthcare, government, and other security-required applications becomes second nature.  The "go-ahead, make my day" feature has trained users it's okay and nothing bad will happen.

Until something does.

The Real Problem

Several things are at work.  First, the total system (browser, application, website, certificate authority) offers a way to go ahead instead of enforcing the lockout and requiring the user to go to lengths to verify the security of the situation. Second, the reasons for the warning are obscure, as it's assumed the user understands how certificates work. The onus is on the user to evaluate the risk without complete information or an understanding of the underlying causes.  Last, the user is likely under time pressure or some other constraint to make a decision quickly and "just get on with it".

In short, the overall design permits a dangerous action by someone who is uninformed about the root problem and who lacks the training and patience needed to understand the matter and judge risks.

The built-in assumption is that value and convenience override all. The lesson drawn and reinforced: it's okay to ignore warnings because likely nothing will happen. Thus, a security blow-out.

Fixing the Design

This one is simple: make the security interlock positive and hard.  Never make it simple for someone to bypass.  Will it inconvenience people?  Yes.  Will they take the time to fix it by calling and complaining?  Maybe.  Will they leave the application or website, or abandon their task?  Likely, but that puts the onus on certificate owners and application and website managers to keep their stuff up to date and working.
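In code, a hard interlock means there is simply no "go ahead anyhow" switch for the caller to flip.  The following is a rough Python sketch under that assumption.  The function names strict_context and open_secure are invented for this example and don't come from any particular application.

```python
import socket
import ssl


def strict_context() -> ssl.SSLContext:
    """Build a TLS context with certificate and host name checks enforced."""
    context = ssl.create_default_context()
    # These are already the defaults; setting them explicitly documents the interlock.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def open_secure(hostname: str, port: int = 443, timeout: float = 5.0) -> ssl.SSLSocket:
    """Connect only if the certificate validates.

    There is deliberately no 'ignore errors' parameter: a failed check
    raises an exception and the connection is closed.
    """
    sock = socket.create_connection((hostname, port), timeout=timeout)
    try:
        return strict_context().wrap_socket(sock, server_hostname=hostname)
    except OSError:
        sock.close()  # hard stop: no fallback to an unverified connection
        raise
```

The design choice is in what's missing: the only way forward after a failure is to fix the certificate, not to click past it.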

Make applications and sites work securely from any URL in the domain.  Certificates are inexpensive and serve to advertise and secure the brand.  There's no excuse for Amazon.com (for example) to cheap out on a cert for the domain amazon.com.  Even if this domain isn't the final target (www.amazon.com), buy a cert to avoid seeing this screen, then bounce the customer from the former to the latter.
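The "bounce" itself is a few lines of logic once the bare domain presents a valid certificate of its own.  Here's a small Python sketch of how a site might compute the redirect target.  The function name canonical_redirect and the example.com host names are placeholders, not anyone's actual configuration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str, canonical_host: str) -> Optional[str]:
    """Return the HTTPS URL to redirect to, or None if already canonical."""
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.hostname == canonical_host:
        return None
    # Keep the path and query so the customer lands on the page they asked for.
    return urlunsplit(("https", canonical_host, parts.path or "/", parts.query, ""))
```

For instance, canonical_redirect("https://example.com/cart?id=7", "www.example.com") gives "https://www.example.com/cart?id=7".  The redirect only spares the customer a warning if the certificate on the bare domain checks out first, which is the point of buying it.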

Applications must offer an alternate path in the event of a lock-out.  The alternate path must inform the user what to do and whom to contact to correct the situation.  The information must be meaningful, helpful, and lead to a positive resolution of the problem, e.g., call this toll-free number to talk to our security division of customer service.  The goal is to correct the defect, not override it.
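As a sketch of what such an alternate path might look like, here is a small Python example that builds the lock-out message from three required pieces: what failed, why, and whom to contact.  The class name LockoutNotice, its fields, and the sample contact details are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LockoutNotice:
    """The three things a locked-out user needs to be told."""

    site: str     # what the user was trying to reach
    reason: str   # plain-language cause of the lock-out
    contact: str  # whom to call or write to get it corrected

    def render(self) -> str:
        """Produce the message shown in place of a click-through warning."""
        return (
            f"We could not verify the security of {self.site}: {self.reason}.\n"
            "Your connection has been stopped to protect your login and data.\n"
            f"What to do next: contact {self.contact} so the problem can be corrected."
        )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    notice = LockoutNotice(
        site="mail.example.com",
        reason="the certificate has expired",
        contact="the help desk at 555-0100",
    )
    print(notice.render())
```

Note that the message offers no "continue anyway" choice; the only path it describes leads to fixing the defect.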

Last, inform the user about what's going on, what the risks are, and the "what to do next" alternate path. The generic security warning screen in Firefox 3.6 is much better than Internet Explorer 8's (above).  It cannot solve the problem with Amazon's website, but at least it explains why the problem occurred and offers a reasonable workaround, in this case, the URL of the correct site.

References
The first two references discuss the impact of and cite the same Carnegie Mellon University study, "Crying Wolf: An Empirical Study of SSL Warning Effectiveness".


Addendum

28 June 2010 - Sean Kerner at eSecurity Planet reports a Qualys study suggesting that of 92 million active domains, 23 million were running SSL. Of those, 22 million had invalid certificates. See "SSL Certificates in Use Today Aren't All Valid".