Juking The Stats
If you’re a fan of the HBO show The Wire[1], “juking the stats” will be a familiar concept. In the show, Baltimore city cops – under pressure from management to improve crime numbers – resort to short-term tactics that produce better numbers but don’t necessarily reduce crime. Reclassifying crimes to lower categories, inflating the arrest rate by arresting for minor offenses, and underreporting crimes are all part of the playbook. And as Pryzbylewski – a former cop who becomes a teacher – later finds out, the same story repeats itself in the city schools. Under pressure from the state to improve standardized test scores, teachers focus on teaching to the tests rather than actually educating their students.
Juking the stats is, however, not just a great sound bite on a TV show. It is an all too real issue that plagues organizations, public and private sector alike[2][3]. Performance measurements introduce perverse incentives, and it is human nature for people, when measured, to optimize for the metric against which they are being judged[4].
The world of software engineering is no stranger to this problem. Software engineering, along with its management, is a complex beast and, relative to other engineering disciplines, is still in its infancy. We are still figuring out effective ways to track and measure performance. Most methods are far from perfect and suffer from unintended consequences.
In some agile organizations – especially those that are new to agile – measuring team performance by their sprint velocity has become common practice. Far too often, this leads to teams – under pressure to deliver the committed story points in that sprint – unintentionally cutting corners on critical aspects like quality and testing only to pay the price later[5].
Large engineering programs require teams to report status on a weekly basis, typically as red, yellow or green or some variation thereof. The stigma attached to reporting one’s status as red can lead to teams suppressing problems. Being honest about these problems early could have gotten them fixed, but the pressure[6] not to report red means they remain buried until it’s too late.
In less mature organizations, QA teams are sometimes incentivized by the number and priority of bugs that they open. This invariably leads to bug priority inflation and battles with the development teams. Low team morale is an inevitable side effect.
Then, there is the possibly apocryphal tale of IBM incentivizing programmers by lines of code, only to find programmers intentionally writing verbose code.
In all of these cases, you see teams, when pressured by poorly designed incentives and metrics, lose sight of the long-term goals and focus on the short-term statistics – sometimes overtly, but usually inadvertently. Qualitative attributes like software quality, good design and resilience end up taking a back seat. Measuring and tracking performance is a good thing and is essential for continuous improvement. However, it’s just as important to be aware that, more often than not, unintended consequences rear their ugly heads. When they do, it is imperative that leaders react and be prepared to either fix the metric or dump it entirely.
“Don’t matter how many times you get burnt, you just keep doin’ the same.” – Bodie[7]
[1] If you’re not, you should be. Apart from having a great storyline and an excellent cast of characters, it is rich with lessons in economics, management and human behavior.
[2] Crime Report Manipulation Is Common Among New York Police, Study Finds – NYT
[3] Criticism of standardized testing as a measure for driving school funding, as collected by Wikipedia
[4] There are a number of examples of the unintended consequences of perverse incentives at play. One of the more interesting examples, from Bill Bryson’s A Short History of Nearly Everything, is the story of paleontologists paying the locals for each fossil fragment they turned in. The paleontologists later found that the locals were smashing larger bone fragments into smaller pieces to maximize gain, in the process rendering the fossils worthless. The authors of Freakonomics collect a few more examples in this NYT article.
[5] Joshua Kerievsky makes an interesting case for doing away with story points.
[6] albeit imagined
[7] Epigraph from “Time after Time”, season 3, episode 1 of The Wire