CNET has an article on name search. They mention Arabic, Thai, and transliterated Arabic name search, all problems we worked on at IIT (by me, MSL, and DH and me, respectively). Just makes me that much more impatient for my paper to get printed…
Interestingly, the LAS guy gets his only Soundex example wrong:
The limitations are many. "'Kanellos' is the same as 'Kiematteg,'" Hermansen said. "And if 'Kanellos' is ever spelled with a 'C,' you will never find it...It is a very crude device."
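First of all, ‘k’ and ‘c’ fall into the same Soundex code group (both are encoded as ‘2’), so the two spellings differ only in the first letter, which classic Soundex keeps verbatim; any variant that encodes the initial as well will match them. And classic Soundex codes ‘Kanellos’ as K542 but ‘Kiematteg’ as K532, so those two aren’t “the same” either. But beyond that, Soundex has been recognized to be a piece of crap for more than 20 years now. There’s been a ton of research into better solutions, leading to techniques like edit distance and the derived Editex.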
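For anyone who wants to check the codes by hand, here is a minimal sketch of classic American Soundex next to a plain Levenshtein edit distance. The function names `soundex` and `edit_distance` are my own, the Soundex variant is the common textbook one (first letter kept, ‘h’ and ‘w’ skipped, padded to four characters), and the edit distance is the basic unweighted version, not Editex itself.

```python
# Classic Soundex code groups: letters in the same group sound alike.
CODES = {}
for digit, group in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in group:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character classic Soundex code (textbook variant)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()           # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # but its code still suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h and w do not separate equal codes
        code = CODES.get(ch, "")       # vowels (and unknown letters) get no code
        if code and code != prev:
            out += code
        prev = code                    # a vowel resets prev, so repeats count again
    return (out + "000")[:4]           # pad with zeros, truncate to 4 characters


def edit_distance(a: str, b: str) -> int:
    """Unweighted Levenshtein distance via row-by-row dynamic programming."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution or match
        prev_row = row
    return prev_row[-1]


print(soundex("Kanellos"))   # K542
print(soundex("Canellos"))   # C542
print(soundex("Kiematteg"))  # K532
print(edit_distance("kanellos", "canellos"))  # 1
```

Under this variant the ‘K’ and ‘C’ spellings share every digit and differ only in the retained initial, while a single-substitution edit distance of 1 flags them as near matches right away.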
Still, they’re working with a ‘850 million name database’…respect :)
(Although…isn’t 850 million a pretty massive number? Almost like they were generating and pruning random combinations of characters…hmmm.)


