False HTTP error
Today, performance monitoring means tracking the performance, reliability, and availability of every object on a website, including those served by third-party vendors, because any one of them can cause performance problems for the page. Almost every web performance monitoring tool will alert on failures of such objects, including HTTP errors such as 404s and 503s.
Recently we have noticed some third-party vendors returning HTTP failures in the 500 range for reasons other than their application being unavailable, without considering that their clients monitor their own websites and that such failures will trigger alarms in those monitoring systems. The problem with such “false” alerts is that people get the alert, research the issue, discover that it is nothing to worry about, and then snooze the issue for a very long time, and they will keep snoozing it. The third party becomes the “Boy Who Cried Wolf”, and eventually, when there are real problems on the site, the website owners will ignore them.
Take Foursquare.com, for example. They have implemented Chartbeat (an analytics company) on their pages, and quite frequently Chartbeat delivers a 503 error when we monitor Foursquare’s site performance.
The definition of HTTP error 503 is “Service Unavailable”. It is meant to signal that the web application (or a key service of it) is not running. In the case of Chartbeat on Foursquare, the HTTP header reads “Site over allowed capacity.”
I reached out to Chartbeat and was told that the error is expected: their applications are working just fine, and the error was triggered because their client’s account is over the purchased limit, so they issue a 503 for that client’s tracking requests.
Third-party content providers, whether widget, analytics, or ad companies, should be more careful about how they behave on their clients’ webpages and not become the “Boy Who Cried Wolf”. Such false alarms cost companies time and money, and can easily be avoided.
Do not use HTTP error messages or TCP connection failures for business use cases; use them for what they are intended: application failures.
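To make that recommendation concrete, here is a minimal sketch of how a tracking endpoint might separate a business condition (an account over its quota) from a genuine application failure. It is not Chartbeat's implementation: the function name `tracking_response`, its two boolean parameters, and the `X-Tracking-Status` header are all hypothetical, chosen only for illustration.

```python
from http import HTTPStatus


def tracking_response(app_healthy: bool, account_over_quota: bool):
    """Return a (status_code, headers) pair for one tracking request.

    Hypothetical helper: the parameters and the X-Tracking-Status
    header are assumptions made for this sketch, not a real vendor API.
    """
    if not app_healthy:
        # A real application failure: this is what 503 is for, and it is
        # the case a client's monitoring system should alert on.
        return int(HTTPStatus.SERVICE_UNAVAILABLE), {"Retry-After": "60"}

    if account_over_quota:
        # A business condition, not an outage: answer successfully and
        # report the quota state in a header instead of an error code.
        return int(HTTPStatus.NO_CONTENT), {"X-Tracking-Status": "over-quota"}

    # Normal case: the hit was recorded.
    return int(HTTPStatus.NO_CONTENT), {"X-Tracking-Status": "recorded"}
```

With this split, a monitoring tool watching the client's page sees a successful response when the account is merely over its limit, and a 503 only when the vendor's application is actually down.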
Mehdi – Catchpoint