What to Look for in PHP 5.4.0

PHP 5.4.0 will arrive soon. The PHP team is working to bring some very nice presents to PHP developers. In the previous release, 5.3.0, they added a few language changes. This version is no different. Some of the other changes include the ability to use DTrace for watching PHP apps in BSD variants (there is a loadable kernel module for Linux folks). This release features many speed and security improvements along with some phasing out of older language features (Y2K compliance is no longer optional). The MySQL extensions (ext/mysql, mysqli, and PDO) now use the MySQL native driver (mysqlnd) by default. This release also improves support for multibyte strings.

Built-in Development Server

In the past, newcomers to PHP needed to set up a server. There was no built-in server like a few other languages and web frameworks already had. If developing on *nix, a server needed to be set up with the right modules, and the files to be tested needed to be copied over to the document root. Now, you can just run PHP with some options to get a server:

$ php -S localhost:1337

It runs in the current directory, using index.php or index.html as the default file to serve. A different document root can be specified as either an absolute or relative path:

$ php -S localhost:1337 -t /path/to/docroot

The server logs requests directly to the console. Interestingly, when every request is routed through your script (for example, a router script named on the command line), this server will not serve your static files unless that script returns false. Existing frameworks will need to be modified to add functionality that commonly lives in rewrite rules. This is really all that is needed:

// If we're using the built-in server, route resources
if (php_sapi_name() == 'cli-server') {
  /*
   * If the request is for one of these image types, return false.
   * It will serve up the requested file.
   */
  if (preg_match('/\.(?:png|jpg|jpeg|gif)$/', $_SERVER["REQUEST_URI"]))
    return false;
}
// Process the rest of your script
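
One weakness of matching against the raw REQUEST_URI is that a query string (as in /logo.png?v=2) defeats the pattern. Here is a minimal sketch of a slightly more forgiving check. The helper name is_static_request() is made up for this illustration and is not part of PHP:

```php
<?php
// Hypothetical helper (not part of PHP): decides whether a request
// should be handed back to the built-in server as a static file.
function is_static_request($uri) {
  // Strip any query string so "/logo.png?v=2" still matches.
  $path = parse_url($uri, PHP_URL_PATH);
  if (!is_string($path)) {
    return false;
  }
  // Same image types as before, matched case-insensitively.
  return preg_match('/\.(?:png|jpe?g|gif)$/i', $path) === 1;
}

// Only the built-in server reports this SAPI name.
if (php_sapi_name() == 'cli-server' && is_static_request($_SERVER['REQUEST_URI'])) {
  return false;
}
// Process the rest of your script
```

Keeping the decision in a small function also makes it easy to check from the command line without starting the server.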

One of the inconveniences of the server is lack of support for SSL. Granted, it is meant for development purposes only. However, some projects I’ve worked on required testing with SSL. Perhaps there will be demand for this once it’s out there.

An Overview of Traits

Traits are bits of code that other objects can use. Traits allow composing objects and they promote code reuse. The Self programming language, one of the precursors to the JavaScript language, introduced them. JavaScript, strangely, does not directly implement traits; instead it allows one to extend objects directly with other objects.

In PHP (and other languages), traits cannot be instantiated, only used to compose other objects. Traits do not imply inheritance; they just add methods to classes. They can be used with inheritance and interfaces. Traits could be used as standard implementations of interfaces, so that one could easily compose classes that comply with certain interfaces.

Here is an example, demonstrating a simple use of traits:

<?php
/**
 * Define a trait called runner
 */
trait runner {
  // Does all the work
  public function run() {
    echo "Run, ".$this->name().", run!\n";
  }
  // Gets the name of the runner
  // Must be implemented by the class using the trait.
  abstract public function name();
}
/**
 * Define a class that uses the runner trait
 */
class runningPerson {
  // Use the runner trait
  use runner;
  // Used to store the person's name
  protected $name;
  // Constructor, assigns a name to the person
  public function __construct($name) {
    $this->name = $name;
  }
  // Retrieves the name of the person, required by runner trait
  public function name() {
    return $this->name;
  }
}
$gump = new runningPerson("Forrest");
$gump->run();
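
The idea of a trait acting as a stock implementation of an interface can be sketched as follows. The Greeter interface, the DefaultGreeter trait, and the Robot class are all invented for this illustration:

```php
<?php
// Hypothetical interface, made up for this illustration.
interface Greeter {
  public function greet();
}

// A trait supplying a stock implementation of Greeter.
trait DefaultGreeter {
  public function greet() {
    // get_class($this) reports the class that uses the trait
    return "Hello from " . get_class($this) . "!";
  }
}

// The class promises the interface and lets the trait fulfil it.
class Robot implements Greeter {
  use DefaultGreeter;
}

$robot = new Robot();
echo $robot->greet() . "\n";
```

Any other class can satisfy Greeter the same way with a single use statement, without sharing a parent class with Robot.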

When a class defines a method that is also defined in a trait it uses, the class’s own method takes precedence and overrides the trait method; a trait method, in turn, overrides a method inherited from a parent class. If two traits implement the same method, the conflict needs to be resolved using the insteadof keyword. The method that lost out can still be made available under a new name (only visibility and name can be changed) using the as keyword.

<?php
/**
 * Define traits with conflicting function names
 */
trait A {
  public function do_something(){
    echo "In A::do_something():\n";
    for($i = 0; $i < 10; $i++) {
      echo $i." ";
    }
    echo "\n";
  }
}
trait B {
  public function do_something(){
    echo "In B::do_something():\n";
    echo "Something else.\n";
  }
}
/**
 * Define a class using both traits
 * Prefers one over the other
 */
class ExampleA {
  use A, B {
    B::do_something insteadof A;
  }
}
/**
 * Define a class using both traits
 * Renames a function to use both
 */
class ExampleB {
  use A, B {
    A::do_something insteadof B;
    B::do_something as something_else;
  }
}

//Run examples
$testA = new ExampleA();
$testB = new ExampleB();
$testA->do_something();
$testB->do_something();
$testB->something_else();
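
Precedence between a class, its traits, and its parent class can be checked with a small experiment. This is a sketch with made-up names (Base, Polite, Child). As far as I can tell from the trait documentation, a method written in the class itself wins over a trait method, and a trait method wins over an inherited one:

```php
<?php
class Base {
  public function hello() { return "Base"; }
  public function bye()   { return "Base"; }
}

trait Polite {
  public function hello() { return "Trait"; }
  public function bye()   { return "Trait"; }
}

class Child extends Base {
  use Polite;
  // Defined in the class itself, so it wins over Polite::hello().
  public function hello() { return "Class"; }
}

$child = new Child();
echo $child->hello() . "\n"; // the class method beats the trait
echo $child->bye() . "\n";   // the trait method beats the inherited one
```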

More examples reside at the current PHP documentation for traits. If that documentation is lacking in substance, Wikipedia’s article on traits links plenty of background info.

Changes to Anonymous Functions

In PHP 5.3.x, calling an anonymous function stored in an array needed a workaround. The function had to be copied into a temporary variable before it could be called. For instance:

$functions = array();
// assign an anonymous function to an array element
$functions['anonymous'] = function () {
  echo "Hello, the parser needs to make up a name for me...\n";
};
// to call it you had to do this:
$temp = $functions['anonymous'];
$temp();

Now, anonymous functions stored in an array can be called directly without first storing them in a temporary variable:

// assume $functions[] is still around.
$functions['anonymous']();

Closures in Classes

Closures defined inside of a class are automatically early bound to the $this variable. If a class method returns a closure, the closure retains access to the object that defined it (along with all the public member properties and methods) no matter where it is passed. If the closure is assigned to a member property and called directly as a method, PHP raises an error, because it looks for a method of that name rather than the property. If it is called through a local variable or through the Closure::__invoke() method (which every closure has), PHP raises no error.

<?php
/**
 * Class that generates closures that reference a public method.
 */
class ClosureTest {
  private $value;

  public function setValue($value) {
    $this->value = $value;
  }

  public function getValue() {
    return $this->value;
  }

  public function getCallback() {
    return function() {
      return $this->getValue();
    };
  }
}

/**
 * Create a class that calls a closure.
 */
class ClosureCaller {
  private $callback;

  public function setCallback($callback) {
    $this->callback = $callback;
  }

  public function doSomething() {
    // Since this is a member variable, call Closure::__invoke().
    echo $this->callback->__invoke() . "\n";
  }
}

// Set up a class to generate closures that reference itself
$test = new ClosureTest();
$test->setValue(42);
$closure = $test->getCallback();
echo $closure() . "\n";

// Test calling a closure from another class
$testCaller = new ClosureCaller();
$testCaller->setCallback($test->getCallback());
$testCaller->doSomething();

Closures allow changing what object scope $this is bound to by calling the bindTo() method and passing in the new object to use as $this.
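
Here is a minimal sketch of rebinding, assuming the Closure::bindTo() behavior described in the RFC. The Box class is invented for the example. Note that bindTo() returns a new closure and leaves the original bound to its first object:

```php
<?php
class Box {
  public $label;

  public function __construct($label) {
    $this->label = $label;
  }

  public function describe() {
    // $this is bound to the object that created the closure
    return function () {
      return "I am " . $this->label;
    };
  }
}

$first = new Box("first");
$second = new Box("second");

$closure = $first->describe();
// Produce a copy of the closure whose $this is $second
$rebound = $closure->bindTo($second);

echo $closure() . "\n";
echo $rebound() . "\n";
```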

Currently, no consensus exists around letting closures bound to an object access the private and protected methods of that class. Additionally, PHP still needs to iron out the details around binding closures to static classes. One can find more details about closures, $this, and Closure::bindTo() at https://wiki.php.net/rfc/closures/object-extension

Outlook

There are a lot of established projects that may not immediately start taking advantage of these features, unless the community sees an obvious benefit to drastically changing their projects. When PHP 5.3.0 was released with namespace support and anonymous functions, new frameworks sprang up like Laravel (anonymous functions), FLOW3 (namespaces), Lithium (namespaces), and Symfony2 (namespaces). After PHP 5.4.0 is released, I’m sure new frameworks (or new versions, like Symfony2 vs. Symfony) will spring up around using traits to compose functionality and using the new $this functionality in closures defined in classes. The built-in server definitely has some potential for making it easier for developers to debug their apps. It’s just a matter of time before frameworks start taking advantage of it.

Reference

PHP 5.4.0 Release Candidate 2 News

Installing Darwin 8.0.1 Using QEMU

When Darwin was first ported over to the x86 architecture, I thought I could get it running inside of a VM. I was stopped because QEMU was not quite ready for it. I tried it twice over a period of at least 4 years. I just tried it again and it worked. During the time since then, a patch was made to QEMU to make it work properly — which also got absorbed into releases after v0.10. Another big problem was figuring out the exact install process and that one could not really just boot it off the CD image and expect it to work. There is a specific process. I’m pretty altruistic, so I’ve documented it here.

First things first, grab the Darwin ISO image from Apple’s site. While you’re at it you can peruse the release notes. When it’s done, unzip it into a directory.

If you haven’t installed QEMU, please do so. To install Darwin you need to create a disk file to use with QEMU. For this, type: qemu-img create -f vmdk darwin-disk.vmdk 3G. The release notes recommend 3GB as a minimum hard disk size, but feel free to make it larger if you desire.

When you are ready to start installing Darwin, type: qemu -hda darwin-disk.vmdk -cdrom darwinx86-801.iso -boot d -m 1024. This will start QEMU and boot from the CD image. It will churn its cogs and then ask you which disk you want to install it on. If you’ve followed my directions, there will only be one disk available to choose from.

Next, it will ask you to partition the drive. Since this is a new disk, make it auto-partition. When it asks you for a volume name, it will fail. Start the machine over again. Choose to use the existing partitions, then continue with installation. When it asks for which partition to use, it should show you two options. Type in the second option. Now, wait for a really long time while it unpacks all of the files.

Once it’s done, it will ask you for a root password. Enter it and confirm it. Then it will ask if you want to create a user, start a shell or restart. Create a user, then restart. It should power the machine down.

When you want to run the disk image, type: qemu -hda darwin-disk.vmdk -cdrom darwinx86-801.iso -m 1024. This will start the machine from the disk image we just installed Darwin on, and the installation CD will be available to mount, too. The network should be up and running. If that doesn’t work right, and it just sits there spinning, try re-installing it. I had to do that. The second install to the image proved flawless.

Interesting associated reading:

  1. Installing Darwin in a VM <http://althenia.net/wiki/darwin> — I found this after I published this. The instructions given for upgrading the compiler are interesting, also has more detailed steps to follow.
  2. Help with mounting disk images, especially .dmg files <http://www.puredarwin.org/developers/diskimages>

The State of Functional Programming in PHP 5.3.x

A few weeks ago, I started looking into the possibility of functional programming in PHP. I stumbled across the capability of defining anonymous functions — starting with PHP 5.3.0. These can be used to define closures with bound variables and also for partial application or currying. Higher order functions can also be defined.

All anonymous functions are implemented through the internal Closure class — which is strange since anonymous functions can be used to implement closures, but anonymous functions are not closures. It takes advantage of the __invoke() magic in PHP 5.3.x. This means that alternate implementations or libraries taking further advantage of anonymous functions could be created rather easily.

Anonymous Functions

It is very easy to define an anonymous function:


$variable = function ($arg) {
  return $arg+1;
};

This is very close to what JavaScript programmers typically call a closure, for reasons we’ll see shortly.

Note: In PHP 5.4.x (currently in alpha — development preview), an anonymous function defined inside of a class can use the $this variable to access class properties and methods.

Closures and Bound Variables

A closure is an anonymous function that encloses part of the external scope surrounding it. Here’s an example:


// Assign value to variable to be bound
$variable = 3;
// Create a closure using a variable outside of the scope of the anonymous function.
$closure = function ($arg) use($variable) {
  return $arg + $variable;
};

One must explicitly bind variables in the outside scope by using the keyword use. Binding variables in PHP is, by default, through early binding. This means that the values seen by the anonymous function are the values that were bound at the time the function was defined. One can implement late binding by passing in the variables to be bound by reference in the use parameters. For example:


// Define the greeting
$greeting = "Hello!\n";
// Define a function and bind a variable by reference (late binding)
$f = function() use(&$greeting) {
  echo $greeting;
};
// Run the closure
$f();
// Change the greeting
$greeting = "Hi!\n";
// Run the closure
$f();

This would output:


Hello!
Hi!
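
For contrast, here is a small sketch that puts the default (by value) binding next to the by-reference binding. The variable names are made up for the example:

```php
<?php
$greeting = "Hello!";

// Early binding: the value is copied when the closure is defined
$byValue = function () use ($greeting) {
  return $greeting;
};

// Late binding: the closure shares the variable itself
$byReference = function () use (&$greeting) {
  return $greeting;
};

// Change the variable after both closures exist
$greeting = "Hi!";

echo $byValue() . "\n";     // still sees the old value
echo $byReference() . "\n"; // sees the new value
```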

JavaScript uses late binding, where the value seen is whatever the bound variable’s value is at the time of the anonymous function’s execution. Variable scopes nest: a closure can access variables in the scopes that contain it. In PHP, I have no such luck. This is why anonymous functions are commonly called closures in JavaScript: it is so common to use this implicit binding that most programmers don’t notice that they might not even be creating a real closure. (Since the variables in the containing scope are always available to the anonymous function, unless there are optimizations in the interpreter or compiler, it might technically still be a closure.)

Partial Application of Functions

If there is a general function that accepts many variables and one wants to create a function that fixes most of the variables, one can partially apply a function. This takes a function with a certain number of parameters and returns a function that accepts fewer parameters. For instance, I can take a saddle function (a 3-d function that looks like a saddle) and turn it into a 2-d parabola (which is actually the curve at y = 5):


$general_function = function($x, $y) {
  return $x*$x - $y*$y;
};
$partial_function = function($x) use($general_function) {
  return $general_function($x, 5);
};
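
Writing a wrapper by hand each time gets tedious, so the pattern can be captured in a helper. This is only a sketch: partial() is a hypothetical function, not something PHP provides, and it fixes the leading arguments (so it pins x rather than y in the saddle function):

```php
<?php
// Hypothetical helper: fixes the leading arguments of $f and
// returns a function that accepts the remaining ones.
function partial($f) {
  // Everything after $f is an argument to fix in place
  $bound = array_slice(func_get_args(), 1);
  return function () use ($f, $bound) {
    // Fixed arguments first, then whatever the caller supplies
    return call_user_func_array($f, array_merge($bound, func_get_args()));
  };
}

$saddle = function ($x, $y) {
  return $x*$x - $y*$y;
};

// Fix x = 5, leaving a function of y alone: 25 - y*y
$slice = partial($saddle, 5);
echo $slice(3) . "\n";
```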

Higher Order Functions

Higher order functions are functions that accept functions as their parameters or return functions as their results. A common higher order function is the compose function. It accepts two functions and returns a function. The resulting function accepts the same parameters as the second function, whose result is passed as the single argument to the first.


function compose($f, $g) {
  // Return the composed function
  return function() use($f,$g) {
    // Get the arguments passed into the new function
    $x = func_get_args();
    // Call the function to be composed with the arguments
    // and pass the result into the first function.
    return $f(call_user_func_array($g, $x));
  };
}
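
A short usage sketch may help here. It repeats the same idea as compose() so that it runs on its own, taking the functions by value, and the $add and $double functions are invented for the example:

```php
<?php
// Same idea as compose() above, taking both functions by value
function compose($f, $g) {
  return function () use ($f, $g) {
    // Run $g on all arguments, then feed its result to $f
    return $f(call_user_func_array($g, func_get_args()));
  };
}

$add = function ($a, $b) {
  return $a + $b;
};
$double = function ($n) {
  return $n * 2;
};

// $doubleSum(a, b) computes $double($add(a, b))
$doubleSum = compose($double, $add);
echo $doubleSum(3, 4) . "\n";
```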

Other common functions are map, filter, & fold (reduce in some cases).

The function map takes an array and a function, applies the function to each element in the array, returning the resulting array:


// Convenience wrapper for mapping
function map($data, $f) {
  return array_map($f, $data);
}

The filter function takes an array and a function that returns a boolean. It applies the function to each element in the array and keeps the element in the resulting array if the function returns true.


// Convenience wrapper for filtering arrays
function filter($data, $f) {
  return array_filter($data, $f);
}

The fold function aids in taking an array and turning it into a scalar (single) value:


// Convenience wrapper for reducing arrays
function fold($data, $f) {
  return array_reduce($data, $f);
}

// sum over an array using the fold function
$sum = function ($values) {
  return fold($values, function($u, $v) {
    return $u += $v;
  });
};
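
The three wrappers chain together naturally. The sketch below redefines them so it runs on its own, takes the arguments by value so that closures can be passed inline, and adds an optional $initial parameter to fold; that parameter is my own assumption and not part of the wrappers above:

```php
<?php
function map($data, $f) {
  return array_map($f, $data);
}
function filter($data, $f) {
  return array_filter($data, $f);
}
// $initial is an addition for this sketch; it seeds the reduction
function fold($data, $f, $initial = 0) {
  return array_reduce($data, $f, $initial);
}

$numbers = array(1, 2, 3, 4, 5, 6);

// Keep the even numbers, square them, then add them up
$evens = filter($numbers, function ($n) {
  return $n % 2 == 0;
});
$squares = map($evens, function ($n) {
  return $n * $n;
});
$total = fold($squares, function ($carry, $n) {
  return $carry + $n;
});

echo $total . "\n";
```

Note that array_filter() preserves the original keys, so $evens and $squares are not re-indexed from zero.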

Functions as Objects

Unfortunately, the class which is implemented internally for anonymous functions does not support instantiation. This has the side effect of not allowing the Closure class to be extended. This is not a huge deal, since the __invoke() magic could be used to create a class that can be used the same way as the internal class. For instance:


class Lambda {
  private $anonymous;
  public function __construct($f) {
    $this->anonymous = $f;
  }
  public function __invoke() {
    $x = func_get_args();
    return call_user_func_array($this->anonymous, $x);
  }
}

$a = new Lambda(function($x) {
  return $x*$x;
});

$y = $a(5);
echo "$y\n";

If one also implemented the ArrayAccess interface, one could create objects that are as extensible as JavaScript objects.
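
Here is a rough sketch of what that could look like. FlexibleObject is a made-up class, and combining ArrayAccess with the __call() magic is just one possible design, not an established pattern:

```php
<?php
// Hypothetical class, a sketch only: members live in an array and
// can be read, written, and (for closures) called like methods.
class FlexibleObject implements ArrayAccess {
  private $members = array();

  // The #[...] lines quiet return-type deprecation notices on newer
  // PHP versions; older versions parse them as ordinary # comments.
  #[\ReturnTypeWillChange]
  public function offsetExists($key) {
    return isset($this->members[$key]);
  }
  #[\ReturnTypeWillChange]
  public function offsetGet($key) {
    return isset($this->members[$key]) ? $this->members[$key] : null;
  }
  #[\ReturnTypeWillChange]
  public function offsetSet($key, $value) {
    $this->members[$key] = $value;
  }
  #[\ReturnTypeWillChange]
  public function offsetUnset($key) {
    unset($this->members[$key]);
  }

  // Route unknown method calls to a stored closure of the same name.
  public function __call($name, $args) {
    if (!isset($this->members[$name]) || !is_callable($this->members[$name])) {
      throw new BadMethodCallException("No callable member named $name");
    }
    return call_user_func_array($this->members[$name], $args);
  }
}

$obj = new FlexibleObject();
$obj['name'] = "Ada";
$obj['square'] = function ($x) {
  return $x * $x;
};

echo $obj['name'] . "\n";
echo $obj->square(5) . "\n";
```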

Example Code

There are more examples for one to peruse over at my github account. Here are the links to the gists:

  1. Composing Functions in PHP
  2. Sketch of Making a Functional Style Controller (Please, don’t emulate this! This is terrible, I have a better way of doing it in the works.)
  3. Functional Programming in PHP

Documentation for gRaphael: g.line.js

Update: I found another library that is superior to gRaphael. It’s called Ico by Alex Young. You can find it here: http://alexyoung.github.com/ico/. It doesn’t support all of the graph types, but the options are more thorough.

gRaphael provides an easy way of creating charts through JavaScript. Unfortunately, many people have complained about how poorly it is documented. In the interest of seeing more good looking charts on websites, I decided to go through the code and post documentation here. This first post is on the line chart library, but I will be posting more on the rest of the library in the future.

To start using the line chart, we need to include the Raphael and the line chart library:


<script src="/script/raphael.js" type="text/javascript" charset="utf-8"></script>
<script src="/script/g.raphael.js" type="text/javascript" charset="utf-8"></script>
<script src="/script/g.line.js" type="text/javascript" charset="utf-8"></script>

Then initialize the chart by passing in the options to the constructor:


var r = Raphael("element");
r.g.linechart(x_offset, y_offset, width, height, x_values_array, y_values_array, options);

x_values_array is either an array of x values, or an array of multiple arrays of x values.
y_values_array is the same as above, but for the y-axis.
If only one array of x values is specified, multiple y value arrays will share these values. If both are specified, each plot will be completely independent of each other.

options is an object with the following possible elements:

var options = {
  gutter: 10,
  symbol: "",
  colors: Raphael.fn.g.colors,
  nostroke: false,
  smooth: false,
  shade: false,
  dash: "",
  axis: "",
  axisxstep: 1,
  axisystep: 1
};

options.gutter is the default spacing between the edge of the chart area and the graph itself.

options.symbol is either a single symbol or an array of multiple symbols for each line. Just leaving it as a blank ("") means that it will be a line graph without any decoration at the points. The list of allowable options is:

"" - no symbol
"o" - "disc"
"f" - "flower"
"d" - "diamond"
"s" - "square"
"t" - "triangle"
"*" - "star"
"x" - "cross"
"+" - "plus"
"->" - "arrow"

options.colors must be an array, even if there is only one line.

options.nostroke controls the line drawing. Set to false (default), it will draw the line. If this is set to true and a symbol is chosen, a scatter type plot can be made.

options.smooth turns on smoothing. No jagged angles will be seen, just smooth curves.

options.shade controls shading the area below the line. No shading happens by default. Set this to true to create an area chart.

options.dash sets the ‘stroke-dasharray’ property on the line element. According to the SVG spec:

‘stroke-dasharray’ controls the pattern of dashes and gaps used to stroke paths. Contains a list of comma and/or white space separated integers that specify the lengths of alternating dashes and gaps. If an odd number of values is provided, then the list of values is repeated to yield an even number of values. Thus, stroke-dasharray: 5,3,2 is equivalent to stroke-dasharray: 5,3,2,5,3,2.

options.axis is a comma or space separated list of axes to show. Seems like it follows the order of css lists: top, right, bottom, left.

options.axisxstep determines how far apart the ticks and axis numbers are placed.

options.axisystep is the same as above, but for the y-axis.

Unfortunately, this is as far as the axis customization goes. I’ve been looking at the parent object and it looks like it wouldn’t be hard to customize the axes further. I’ll post more on this as I figure it out. If anyone wants to look at the g.axis() function and figure out how to add labels and customize the labels on the axis, please let me know what you find.

r.g.linechart() returns a chart object:

chart = {lines, shades, symbols, axis, columns, dots};

That chart object has the following default functions. All of these are overridable.

chart.hoverColumn = function (fin, fout) {
  !columns && createColumns();
  columns.mouseover(fin).mouseout(fout);
  return this;
};

chart.clickColumn = function (f) {
  !columns && createColumns();
  columns.click(f);
  return this;
};

chart.hrefColumn = function (cols) {
  var hrefs = that.raphael.is(arguments[0], "array") ? arguments[0] : arguments;
  if (!(arguments.length - 1) && typeof cols == "object") {
    for (var x in cols) {
      for (var i = 0, ii = columns.length; i < ii; i++) if (columns[i].axis == x) {
        columns[i].attr("href", cols[x]);
      }
    }
  }
  !columns && createColumns();
  for (i = 0, ii = hrefs.length; i < ii; i++) {
    columns[i] && columns[i].attr("href", hrefs[i]);
  }
  return this;
};

chart.hover = function (fin, fout) {
  !dots && createDots();
  dots.mouseover(fin).mouseout(fout);
  return this;
};

chart.click = function (f) {
  !dots && createDots();
  dots.click(f);
  return this;
};

chart.each = function (f) {
  createDots(f);
  return this;
};

chart.eachColumn = function (f) {
  createColumns(f);
  return this;
};

Duplicating VirtualBox Machines the Hard Way

VirtualBox is a great software package, and it provides an easier way to clone a virtual machine, but I didn’t know that. Already, I had manually copied a basic image of a Windows XP machine twice. I had three working copies, but I had to rename the files every time I wanted to start one up. I decided I was tired of this. I spent about two hours figuring this stuff out (with pacing time included). I advise following the link at the bottom of this post to see the proper way to do this.

I looked at the following files in my ~/.VirtualBox directory:
VirtualBox.xml
HardDisks/*.vdi
Machines/*.xml

I had the hard disks already in their directory. I copied the original machine’s .xml file twice so I had one for each hard drive. I looked inside each of those Machines/*.xml files and saw the place to change the name of the machine and the HardDisks/*.vdi file it used as the disk. Then I changed the names and the disks they used. Then I edited the VirtualBox.xml file to reflect the changes I had made.

I saved everything and started up VirtualBox. “Cannot create VirtualBox COM object with UUID: {00000000-0000-0000-0000-000000000000}; Object already exists,” it said. Ok, I just need to change the UUIDs in the files, I thought.

I opened up all the files again, grabbed a UUID generator, and changed the UUIDs. Saved everything. Started VirtualBox back up. There were no error messages, until I tried to start a copied machine that I had changed the UUID on. What was the reason? The UUID that I changed in the .xml files was embedded in the *.vdi files.

I fired up my hex editor (bless, if you’re wondering. If you know of a better editor, let me know!). I looked for the UUID. I couldn’t find it. I looked for just the end, and I found that. I looked right next to it and found that the bytes for the first three byte groups were stored in reverse order (little-endian: least significant byte to most significant byte).

I found that the offset for the UUID was at 0x188. I wrote down my UUIDs I created and swapped the format around and entered it into the .vdi files at the above offset. Here’s an example:
UUID: {23DF45A4-4378-2019-FAB7-ED4312DCC001}
Byte Group 1: 23 DF 45 A4
Byte Group 2: 43 78
Byte Group 3: 20 19
Byte Group 4: FA B7
Byte Group 5: ED 43 12 DC C0 01

The first three byte groups need to be re-ordered from big-endian to little-endian:
Byte Group 1: 23 DF 45 A4 --> A4 45 DF 23
Byte Group 2: 43 78 --> 78 43
Byte Group 3: 20 19 --> 19 20

Then putting it together:
A4 45 DF 23 78 43 19 20 FA B7 ED 43 12 DC C0 01
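
The reordering is easy to get wrong by hand, so here is a small sketch that computes the byte string. The function name uuid_to_vdi_bytes() is made up, and it only covers the byte layout described in this post:

```php
<?php
// Hypothetical helper: converts a textual UUID into the byte order
// described above (first three groups reversed, last two unchanged).
function uuid_to_vdi_bytes($uuid) {
  // Drop the surrounding braces and split on the hyphens
  $groups = explode('-', trim($uuid, '{}'));
  if (count($groups) !== 5) {
    throw new InvalidArgumentException("Expected five hyphen-separated groups");
  }
  // Reverse the byte order of the first three groups only
  for ($i = 0; $i < 3; $i++) {
    $groups[$i] = implode('', array_reverse(str_split($groups[$i], 2)));
  }
  // Emit the 16 bytes as space-separated hex pairs
  return strtoupper(implode(' ', str_split(implode('', $groups), 2)));
}

echo uuid_to_vdi_bytes('{23DF45A4-4378-2019-FAB7-ED4312DCC001}') . "\n";
```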

Grab your hex editor and replace the 16 bytes at offset 0x188 with the bytes you computed above. Voilà! It should work if you’ve done everything correctly.

As I’ve said before, all of this is completely unnecessary. I just didn’t realize there was an easy way to do this inside the program itself or via the command prompt.

A great blog post on how to use the commands is: http://srackham.wordpress.com/cloning-and-copying-virtualbox-virtual-machines/

Thoughts on Computing

I was just thinking about the browser-less concept that was thrown around earlier. People already are moving in that direction with iPhone apps and desktop Twitter clients. Most of those drop the HTML aspect and communicate with JSON, XML, YAML, or some other format. All tend to use RESTful APIs when exchanging data. Naturally a browser uses a RESTful API because the person who introduced the term also helped write the HTTP protocol.

Currently OAuth requires a browser of some sort. It’s easier to authorize web applications, because those requests are not out of band and all authorization can be completed in the browser. If the application has both a web and a desktop counterpart, that authorization can happen once and the web application and desktop app could technically use the same authorization. That has the unfortunate side effect of opening up a security hole in OAuth.

Any future applications or operating systems will need to provide standard functionality for rendering different types of XML for those instances where one needs to just read data like XHTML, MathML, or MusicML.

Then there’s what Google is doing, which takes any problems with OAuth’s out-of-band authorization and tosses them out the window: everything is a web app.

I’ve downloaded the source for both Chromium and Chromium OS. Apparently, it’s based on a heavily tweaked Ubuntu system and it seems like it would compile on Ubuntu Jaunty Jackalope. I’ve had problems compiling both the browser and the OS in the new version of Ubuntu I’m running. They’ve also switched to a different build system from when I first built it (they’ve opted to use a traditional Unix makefile approach, as opposed to Google’s “hammer” tools). I’ll see if they’ve submitted a patch; others have had the same problem. I’ll make a USB disk image for you all to try if you want.

Chrome has support for O3D, in browser OpenGL 3D rendering support scripted with Javascript. You can click on 3D objects and interact with them. They have a version of 3D chess here. It’s the very last demo.

Personally, I’d like to have an option of developing browser side code in something other than Javascript. I’d like to see browser technology extended to be able to run Python, Ruby, Java, Scala, and perhaps MSIL/CIL (for all you .NET folks).

I think that moving to a standardized RESTful paradigm where all UI/view is marked up declaratively, controlled via a set of standard languages, and all models/data storage along with super heavy data processing is offloaded to the cloud is the future. I also wouldn’t mind a version of Ubiquitous Computing happening soon where everything that we have with computing power is networked ultra-efficiently into a universal computing mesh.

Just throwing thoughts out at you all. I’ll probably write a blog entry on it sometime soon.

Browser Detection: Why Feature Detection is Better

I wanted to figure out how to make my websites more tailored to the browser they were being viewed in. I figured browser detection might work. So I set up a bit of PHP to log user agent strings sent to my server for the http://adaburrows.com/test site. I then tweeted about it and watched the user agent strings roll in.


<?php
/* Code that opens a log file called agent.log */
$log = fopen('agent.log', 'a+');
/* Write agent string, IP address, current PHP file w/ full path, and full date */
fputcsv($log, Array($_SERVER['HTTP_USER_AGENT'], $_SERVER['REMOTE_ADDR'], __FILE__, date('r')));
/* close the file */
fclose($log);
?>

Here are some sample agent strings: (try and guess what browsers they are from!)

“Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 GTB6”
“Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)”
“Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5.5”
“AppEngine-Google; (+http://code.google.com/appengine; appid: mapthislink)”
“Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30618)”
PycURL/7.18.2
“Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot”
bitlybot
“Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.2 (KHTML, like Gecko) Chrome/4.0.222.5 Safari/532.2”

I learned a lot about the different bots that crawl the internet. I also learned how unreliable these strings are. I decided against using browser detection in any of my sites. Some friends on Twitter were having a discussion about this and since I was looking into it anyway, I decided to write this and include info from their talk. They brought up some interesting resources. One of them was a history overview of browser identifications. Another resource on the history is here. This gives some insight into why browser detection using the agent string is the most absurd way of grasping for what capabilities are available.

In order to actually use the user agent string, one needs to have a database of all browser agent strings and all the capabilities each browser has. There are projects out there, and PHP actually supports doing this with get_browser() if one has a browscap.ini file, but this adds latency to one’s web site. According to some studies, any extra latency (even half a second) will reduce returning web traffic. It is also nowhere near being bulletproof. If the capabilities for a certain browser are absent from that file, then those browsers are discriminated against (a hall of shame is available here).

Feature Detection

The best method to actually use is called feature detection. It is client side only and uses Javascript, but it reinforces use of web standards. It lets one test if features are available in a browser and conditionally add extra features to your site based on the actual capabilities of the browser. It completely sidesteps the pitfalls of using the agent string and it is a part of best practices.

In order to use this method, all websites must adhere to certain principles. These are all principles that websites should be following anyway:

  • Use semantic HTML markup
  • Separate presentation from markup
  • Provide basic functionality for non-Javascript browsers as a fall back.
  • The site should degrade gracefully and still be usable (even in lynx, I use that text only browser sometimes!).

I’ll come back to writing more resources on this soon, but here’s the best article on that I’ve seen. This page at Mozilla has some serious code for feature detection.

A great project that showcases great use of feature detection is Modernizr (Paul Irish is their code lieutenant). Check it out and help make a typography renaissance on the web!