First, the Policy
Originally, Google, Yahoo and Microsoft wanted to keep personally identifiable data for long periods because they believed it would help refine the efficacy of queries. More recently, however, the thinking is that after six months the data becomes useless for that purpose.
Accordingly, in April 2008 the Article 29 Working Party, an advisory body set up under the European Union's Data Protection Directive, asked search engines doing business in the EU to anonymize the personally identifiable data logs they’ve collected on their users after six months.
Microsoft’s Response: Sure!
Recently, Microsoft agreed to the request and will reportedly be altering the policy of its search engine Bing. Though it may take as long as 18 months to implement (we assume because that’s the amount of time Bing typically stores data), the company says it will anonymize its data logs after only six months of retention.
Google’s Response: Um, No Thanks
Google, which typically holds onto its logs for a whopping 18 to 24 months, told the Working Party that nine months was the lowest it would go.
Incidentally, researchers also discovered a lost-in-translation type of situation: Google’s version of "anonymize" does not mean "delete." That is, for now Google reportedly plans to remove just the rightmost eight bits of each IPv4 address, while Microsoft plans to delete all 32.
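To make the difference concrete, here is a minimal Python sketch of the two approaches as described above. The function names are ours and purely illustrative; neither company has published its actual code, so treat this as an assumption about what "removing the rightmost eight bits" amounts to, not as anyone's real implementation.

```python
import ipaddress

def truncate_last_octet(ip: str) -> str:
    """Zero the rightmost 8 bits of an IPv4 address (the partial approach)."""
    as_int = int(ipaddress.IPv4Address(ip))
    # Keep the top 24 bits, clear the bottom 8.
    return str(ipaddress.IPv4Address(as_int & 0xFFFFFF00))

def delete_address(ip: str) -> None:
    """Drop the address entirely (the full-deletion approach)."""
    return None

if __name__ == "__main__":
    # 203.0.113.77 is a reserved documentation address, used here as sample data.
    print(truncate_last_octet("203.0.113.77"))
    print(delete_address("203.0.113.77"))
```

After truncation, a record still narrows the user down to one of 256 possible addresses, whereas full deletion leaves nothing to narrow down.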
The difference between Google's and Microsoft's decisions matters because of an observation made by Christopher Soghoian, a student fellow at Harvard's Berkman Center for Internet & Society: Google (or anyone else) could potentially take the full IP addresses from cookie data it still retains, match them by cookie to the records with partly deleted IP addresses, and simply re-create the deleted 8 bits.
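Here is a rough Python sketch of how that kind of matching could work in principle. Everything in it is hypothetical: the log layout (pairs of cookie ID and IP address), the function name, and the sample data are our assumptions for illustration, and nothing here reflects how Google actually stores or processes its logs.

```python
def recover_ips(anonymized_log, retained_log):
    """Match truncated IPs to full IPs that share the same cookie ID.

    anonymized_log: list of (cookie_id, truncated_ip), last octet zeroed
    retained_log:   list of (cookie_id, full_ip) from newer, intact records
    Returns a list of (cookie_id, truncated_ip, candidate_full_ips).
    """
    # Index the still-intact records by cookie ID.
    full_ips_by_cookie = {}
    for cookie_id, full_ip in retained_log:
        full_ips_by_cookie.setdefault(cookie_id, set()).add(full_ip)

    results = []
    for cookie_id, truncated_ip in anonymized_log:
        # First three octets plus the trailing dot, e.g. "203.0.113."
        prefix = truncated_ip.rsplit(".", 1)[0] + "."
        candidates = sorted(
            ip for ip in full_ips_by_cookie.get(cookie_id, ())
            if ip.startswith(prefix)
        )
        results.append((cookie_id, truncated_ip, candidates))
    return results

if __name__ == "__main__":
    old = [("cookie-a", "203.0.113.0"), ("cookie-b", "198.51.100.0")]
    new = [("cookie-a", "203.0.113.77"), ("cookie-a", "192.0.2.9")]
    for row in recover_ips(old, new):
        print(row)
```

A cookie with a matching newer record gives the missing octet right back, while a cookie with no retained records stays unresolved. The match only works as long as cookie IDs survive in both sets of logs.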
An Ongoing Battle - Is It Worth It?
In a world that is becoming less and less private, we wonder if this particular war is one worth fighting. After coming down to a nine-month retention period, Google explained its case for keeping data through Peter Fleischer, the company's global privacy counsel, like this:
"While we're glad that this will bring some additional improvement in privacy, we're also concerned about the potential loss of security, quality, and innovation that may result from having less data. As the period prior to anonymization gets shorter, the added privacy benefits are less significant and the utility lost from the data grows."
In fact, Google's been struggling against data loss for some time now. Here is a public argument they had with the Working Party back in 2007 over the context of "personally identifiable data." Poor G, it looks like they're just trying to find a balance.
What do you think? Does the thought of having your IP address recreated make you nervous, or are we slaves to a public world? Moreover, does Microsoft's new compliance policy mean they'll have fewer opportunities to innovate, as Fleischer suspects?
Talk to us.