Over at InfoWorld, Jon Udell has continued his string of posts on database translucency. This time he takes on an example he admits is a stretch: hiding the texts in a database of term papers and essays while still being able to search it for matches that indicate plagiarism (or sloppy quoting). He suggests that applying a cryptographic hash function to each sentence of a text might do the job.
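To make the sentence-hashing idea concrete, here is a minimal sketch in Python. It is not Udell's code. The helper names (`normalize`, `split_sentences`, `sentence_hashes`, `shared_sentence_count`) are hypothetical, the sentence splitter is deliberately naive, and SHA-256 is simply one reasonable choice of cryptographic hash. The assumption is that the database stores only the set of digests for each paper, never the text itself.

```python
import hashlib
import re


def normalize(sentence: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace so trivial edits
    (case, spacing, punctuation) don't change the resulting hash."""
    sentence = sentence.lower()
    sentence = re.sub(r"[^a-z0-9\s]", "", sentence)
    return " ".join(sentence.split())


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break wherever '.', '!' or '?' is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_hashes(text: str) -> set[str]:
    """Return the SHA-256 hex digests of the normalized sentences.
    Only these digests are stored; the text itself stays hidden."""
    digests = set()
    for sentence in split_sentences(text):
        norm = normalize(sentence)
        if norm:
            digests.add(hashlib.sha256(norm.encode("utf-8")).hexdigest())
    return digests


def shared_sentence_count(stored: set[str], candidate_text: str) -> int:
    """Count the distinct sentences in candidate_text whose digest is already stored."""
    return len(stored & sentence_hashes(candidate_text))


if __name__ == "__main__":
    essay = "The quick brown fox jumps over the lazy dog. It was a sunny day!"
    stored = sentence_hashes(essay)  # what the database would keep
    submission = "It was a SUNNY day. Something entirely new here."
    print(shared_sentence_count(stored, submission))  # prints 1
```

The normalization step is a design trade-off: stripping case and punctuation catches lightly edited copies, but any reworded sentence produces a completely different digest, so this only detects near-verbatim reuse.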
A sentence hash could indeed do the job. But here I'm going to deliver a one-line sermon that those who've worked for me in R&D mode have probably heard all too often: Do Your Prior Art! It seems the notion of using hashes to represent texts for searching was invented over 35 years ago. It's a specific application of a data structure called the Bloom filter. And, as the Wikipedia article notes: