I wanted to figure out how to make my websites more tailored to the browser they were being viewed in. I figured browser detection might work, so I set up a bit of PHP to log the user agent strings sent to my server for the http://adaburrows.com/test site. I then tweeted about it and watched the user agent strings roll in.
/* Open (or create) a log file called agent.log for appending */
$log = fopen('agent.log', 'a+');
/* Write agent string, IP address, current PHP file w/ full path, and full date */
fputcsv($log, array($_SERVER['HTTP_USER_AGENT'], $_SERVER['REMOTE_ADDR'], __FILE__, date('r')));
/* Close the file */
fclose($log);
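To make sense of the log once entries start arriving, something like the following could tally how often each agent string shows up. This is only a sketch: `tally_agents` is a hypothetical helper name, and it assumes the log was written in the CSV layout used by the logging code, with the agent string in the first column.

```php
<?php
// Hypothetical helper: counts how often each agent string appears in the log.
// Assumes the first CSV column holds the user agent string.
function tally_agents(string $path): array
{
    $counts = [];
    if (!is_readable($path)) {
        return $counts; // nothing logged yet
    }
    $log = fopen($path, 'r');
    if ($log === false) {
        return $counts;
    }
    // Delimiter, enclosure, and escape are spelled out to match fputcsv()'s defaults.
    while (($row = fgetcsv($log, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() returns [null] for a blank line; skip those.
        if (!isset($row[0])) {
            continue;
        }
        $counts[$row[0]] = ($counts[$row[0]] ?? 0) + 1;
    }
    fclose($log);
    arsort($counts); // most frequent agent strings first
    return $counts;
}

foreach (tally_agents('agent.log') as $agent => $hits) {
    echo $hits, "\t", $agent, "\n";
}
```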
Here are some sample agent strings: (try and guess what browsers they are from!)
“Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 GTB6”
“Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)”
“Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5.5”
“AppEngine-Google; (+http://code.google.com/appengine; appid: mapthislink)”
“Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30618)”
“Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot”
“Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.2 (KHTML, like Gecko) Chrome/188.8.131.52 Safari/532.2”
I learned a lot about the different bots that crawl the internet, and I also learned how unreliable these strings are, so I decided against using browser detection on any of my sites. Some friends on Twitter were having a discussion about this, and since I was looking into it anyway, I decided to write this up and include info from their talk. They brought up some interesting resources, one of which was a historical overview of browser identification. Another resource on the history is here. Both give some insight into why parsing the agent string is about the most absurd way of grasping for what capabilities are available.
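To see how easily the strings mislead, here is a small sketch of the kind of token matching that browser detection tends to boil down to. `naive_detect` is a hypothetical function written for illustration, not anything I ran on the site, and the two inputs are copied from the sample strings listed earlier.

```php
<?php
// Hypothetical helper: guesses a browser by looking for well-known tokens.
function naive_detect(string $ua): string
{
    // The order of the tokens matters, and every order gets some string wrong.
    foreach (['MSIE', 'Firefox', 'Safari', 'Chrome'] as $token) {
        if (strpos($ua, $token) !== false) {
            return $token;
        }
    }
    return 'unknown';
}

// Two of the agent strings from the log.
$bot = 'Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot';
$chrome = 'Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.2 (KHTML, like Gecko) Chrome/188.8.131.52 Safari/532.2';

echo naive_detect($bot), "\n";    // prints "MSIE", although the visitor is a crawler
echo naive_detect($chrome), "\n"; // prints "Safari", although the visitor is Chrome
```

Reordering the tokens only moves the mistake around: the bot string claims to be both MSIE and Firefox, and the Chrome string also claims Safari.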
In order to actually use the user agent string, one needs a database of all browser agent strings and the capabilities each browser has. There are projects out there that maintain one, and PHP supports this through get_browser() if one has a browscap.ini file, but the lookup adds latency to one’s web site. According to some studies, any extra latency (even half a second) will reduce returning web traffic. It is also nowhere near bulletproof: if the capabilities for a certain browser are absent from that file, then that browser is discriminated against (a hall of shame is available here).
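For reference, a lookup against that file might look roughly like the sketch below. `get_browser()` and the `browscap` php.ini directive are real PHP features; the wrapper name `lookup_capabilities` and the fallback behaviour are my own assumptions for illustration. The timing code is there only to show how one could measure the added latency on one's own server.

```php
<?php
// Hypothetical wrapper around PHP's get_browser(). Returns null when no
// browscap file is configured or the lookup fails.
function lookup_capabilities(string $ua): ?array
{
    // get_browser() only works when the "browscap" directive is set in php.ini.
    if (!ini_get('browscap')) {
        return null;
    }
    $info = get_browser($ua, true); // true: return an array instead of an object
    return is_array($info) ? $info : null;
}

$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

$start = microtime(true);
$caps = lookup_capabilities($ua);
$elapsedMs = (microtime(true) - $start) * 1000;

if ($caps === null) {
    // No data for this visitor: serve the same page everyone else gets.
    echo "No capability data available\n";
} else {
    printf(
        "%s %s (lookup took %.1f ms)\n",
        $caps['browser'] ?? 'unknown',
        $caps['version'] ?? '?',
        $elapsedMs
    );
}
```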
In order to use this method, all websites must adhere to certain principles. These are all principles that websites should be following:
- Use semantic HTML markup
- Separate presentation from markup
- Degrade gracefully and remain usable (even in Lynx; I use that text-only browser sometimes!)