January 30, 2009

Effective Query Log Anonymization

Check out this very good Google Tech Talk on anonymizing search query logs. From the abstract:

"User search query logs have proven to be very useful, but have vast potential for misuse. Several incidents have shown that simple removal of identifiers is insufficient to protect the identity of users. Publishing such inadequately anonymized data can cause severe breach of privacy. While significant effort has been expended on coming up with anonymity models and techniques for microdata/relational data, there is little corresponding work for query log data -- which is different in several important aspects. In this work, we take a first cut at tackling this problem. Our main contribution is to define effective anonymization models for query log data, along with techniques to achieve such anonymization."
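The abstract doesn't spell out the talk's models, so here is only a generic baseline to make the problem concrete, not the method from the talk. The idea: strip user identifiers, then publish only queries that at least k distinct users issued. A query like a home address, typed by one person, is exactly what makes a log identifying, and it gets suppressed. The function name `suppress_rare_queries`, the threshold `k`, and the sample log are all hypothetical.

```python
from collections import defaultdict

def suppress_rare_queries(log, k):
    """Hypothetical baseline, not the talk's technique: drop user IDs and
    keep only queries issued by at least k distinct users.

    log: iterable of (user_id, query) pairs.
    Returns {normalized_query: number_of_distinct_users}.
    """
    users_per_query = defaultdict(set)
    for user_id, query in log:
        # Light normalization so trivial case/whitespace variants merge.
        users_per_query[query.strip().lower()].add(user_id)
    # Only aggregate counts are released; user IDs never leave this function.
    return {
        q: len(users)
        for q, users in users_per_query.items()
        if len(users) >= k
    }

# Made-up example log: one user searched for a street address.
log = [
    ("u1", "weather boston"),
    ("u2", "Weather Boston"),
    ("u3", "weather boston"),
    ("u1", "123 elm st springfield"),
    ("u2", "cheap flights"),
    ("u3", "cheap flights"),
]
print(suppress_rare_queries(log, k=2))
# prints {'weather boston': 3, 'cheap flights': 2}
```

Note that this only releases per-query counts and throws away the per-user sessions that make query logs useful in the first place. A frequency threshold alone also doesn't rule out every re-identification attack, which is part of why dedicated anonymization models for query logs are needed.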

