The Case of the Recurring Network Timeout
Over at the Ticketmaster tech blog, Audyn Espinoza and I co-authored a post talking about how our investment in monitoring tools paid off and helped us fix an irksome performance issue with our web services:
At Ticketmaster we’re passionate about monitoring our production systems. As a result, we occasionally come across interesting issues affecting our services that would otherwise go unnoticed. Unfortunately, monitoring only indicates the symptoms of what’s wrong and not necessarily the cause. Digging in deeper and getting to the root cause is a whole different ball game. This is one such example.
This story starts out with the observation that one of our web service calls had an unusually high number of timeouts. It was particularly unusual because the web service in question typically responds in about 50 ms, and the client times out at 1 s. To add to that, the metrics at the web service level were still reporting a 99th percentile response time in the 50 ms range. The issue had to be in the network between the client and the service.
We took a closer look at the metrics on the client side and a pattern emerged that we had missed earlier:
Time chart of web service response times as observed at the client
For a given cluster, the timeouts were occurring every minute at the same second mark. For example, on cluster A, timeouts would occur at 5:02:27, 5:03:27, 5:04:27, and at 5:02:55, 5:03:55, 5:04:55 on another cluster. This was perplexing, and a great data point, but we were still nowhere close to the root cause. It was time for tcpdump.
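One way to surface a pattern like the every-minute-at-the-same-second timeouts is to bucket timeout timestamps by their second mark. The following is only a minimal sketch, not the tooling we actually used: the function name `timeouts_by_second_mark` is hypothetical, the first three sample times are taken from the cluster A example above, and the fourth is an invented stray timeout added for contrast.

```python
from collections import Counter
from datetime import datetime


def timeouts_by_second_mark(times):
    """Count timeouts per second-of-minute (0-59) from HH:MM:SS strings."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return Counter(p.second for p in parsed)


# Sample data: three timeouts from the cluster A example, plus one
# invented stray timeout (05:03:41) that does not follow the pattern.
cluster_a = ["05:02:27", "05:03:27", "05:04:27", "05:03:41"]

histogram = timeouts_by_second_mark(cluster_a)
second_mark, count = histogram.most_common(1)[0]
print(f"most common second mark: :{second_mark:02d} "
      f"({count} of {len(cluster_a)} timeouts)")
```

A single second mark that accounts for most of a cluster's timeouts points to something firing on a one-minute schedule rather than to random network loss.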
Podcasts And Happiness
According to the calculations of Frey and Stutzer, a person with a one-hour commute has to earn 40 percent more money to be as satisfied with life as someone who walks to the office. Another study, led by Daniel Kahneman and the economist Alan Krueger, surveyed nine hundred working women in Texas and found that commuting was, by far, the least pleasurable part of their day.
That should have made me downright miserable, considering that my new job has increased my commute by about 4000%. I went from walking across my street to driving about 40 minutes each way, in gnarly LA traffic no less.
Luckily, I’ve discovered podcasts – albeit a decade too late – to keep me sane. I’ve mostly stuck to Martin Fowler’s recommendations with the addition of Freakonomics Radio. By far the most consistently interesting podcast I’ve come across is RadioLab. The episode Detective Stories, with three stories around digging up the past, is particularly captivating.
While I believe that podcasts have ensured that my happiness has not dropped or may even have increased, they’ve brought problems of their own. There are times when I realize that I’ll be reaching my destination before the end of an episode, resulting in me quixotically willing the traffic lights to turn red and hoping that my commute was a wee bit longer. File that one under first world problems.
I’ve tentatively settled on Google’s Listen app as my podcast delivery vehicle of choice on Android. It’s buggy and no longer supported by Google, but it syncs with my Google Reader account. I would happily pay for the BeyondPod app if I could figure out how to sync a specific folder within Google Reader.
Sidebar: While searching for the study linking happiness to commute lengths, the first result was Jonah Lehrer’s post (linked above). Turns out that Jonah is also a contributor to RadioLab and contributed to my other favorite episode, which randomly enough happens to be about the pervasiveness of randomness in our lives.
 Stress That Doesn’t Pay: The Commuting Paradox by Stutzer and Frey
 Developments in the Measurement of Subjective Well-Being by Kahneman and Krueger