Hardin MD Notes, July 25, 2000

Eric Rumsey

Several factors complicate using search engines to search for AIDS. A primary one is that the acronym is also an ordinary word (as in hearing aids, study aids, etc.). A search engine that can distinguish upper-case will generally do a better job of finding upper-case acronyms like AIDS.

Another complicating factor is that the word AIDS happens to end in "s." Search engines that lump singular and plural forms of words together don't distinguish "AIDS" from such things as "first aid."
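
The two factors above can be illustrated with a minimal Python sketch. It is only an illustration, not how any of the search engines discussed here actually work: the page titles are invented, and the function names and the crude plural folding (dropping one trailing "s") are assumptions made for the example.

```python
import re

# Hypothetical page titles, invented purely for illustration.
TITLES = [
    "AIDS research update",
    "Hearing aids for seniors",
    "First aid basics",
    "Living with AIDS",
]

def words(text):
    """Split text into alphabetic words, keeping the original case."""
    return re.findall(r"[A-Za-z]+", text)

def match_case_sensitive(term, text):
    """Exact word match that respects upper-case."""
    return term in words(text)

def match_exact_word(term, text):
    """Exact word match that ignores case."""
    return term.lower() in [w.lower() for w in words(text)]

def naive_singular(word):
    """Crude plural folding (assumption): lower-case and drop one trailing 's'."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 1 else w

def match_singular_plural(term, text):
    """Match after folding singular and plural forms together."""
    return naive_singular(term) in [naive_singular(w) for w in words(text)]

def search(matcher, term):
    """Return the titles that the given matcher accepts for the term."""
    return [t for t in TITLES if matcher(term, t)]

print(search(match_case_sensitive, "AIDS"))   # only the upper-case acronym
print(search(match_exact_word, "aids"))       # also picks up "Hearing aids"
print(search(match_singular_plural, "aids"))  # also picks up "First aid"
```

Each step down loosens the match: the case-sensitive search keeps only the two disease titles, the case-insensitive exact search adds hearing aids, and the singular/plural search adds first aid as well.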

To compare how search engines do in finding AIDS, I did searches in several of them, and looked at the first 100 hits to see how many of them were on AIDS the disease, as opposed to other meanings of the word. I made several interesting discoveries about the techniques used by the search engines to distinguish ambiguous meanings, which are useful not only in searching for AIDS, but for other ambiguous subject terms as well.

Search engines that distinguish upper-case -- Search term is "AIDS"

Alta Vista - 1.7 million hits
InfoSeek - 218,000 hits
Of the two search engines that distinguish upper-case, Alta Vista stands out because of its much larger number of hits. For the record, Alta Vista does exact word searching and InfoSeek searches singular/plural forms together, although this is immaterial here since both engines are able to distinguish upper-case.

Other search engines -- Search term is "aids"

The other search engines, although they don't distinguish upper-case terms, use other methods, with mixed results, to distinguish links that are on AIDS the disease.

A powerful technique for distinguishing hits on AIDS the disease, used by the two search engines below, is to give the user a choice of different contexts for the word being searched. For the serious searcher, this is an excellent feature that makes it quite easy to find the desired meaning of an ambiguous term.

Northern Light - Searching for "aids" gives 5.8 million hits; since it lumps singular and plural words together, this list includes a relatively low proportion of hits on AIDS the disease. But the "custom search folders" feature gives the user the option to specify particular contexts, with the top folders being Financial aid, HIV/AIDS (Acquired Immune Deficiency Syndrome), Humanitarian aid, and First aid. The HIV/AIDS folder has 158,000 hits, with all of the top 100 being on AIDS the disease.

Direct Hit - As with Northern Light, a search for "aids" gives an undifferentiated list, with a low proportion of hits on AIDS the disease. But its "related search" on "aids disease" produces a list of relevant hits.

Several of the search engines are probably using link analysis techniques to help them to find links on AIDS the disease. Link analysis, as typified by Google, works by analyzing the number of links to a site, and the importance of the pages making links. Since link analysis was introduced by Google, other search engines have adopted similar methods, though without the fanfare of Google. As far as I know, the only major search engines that have announced that they are using link analysis to rank hits are Excite and Direct Hit. Preliminary findings of a survey I'm doing, however, indicate that other search engines are also experimenting with it.
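
The idea of scoring a page by the number of links to it and by the importance of the pages making those links can be sketched roughly as follows. This is a simplified, PageRank-style iteration written for illustration; it is not the actual algorithm of Google, Excite, Direct Hit, or any other engine mentioned here. The link graph, the page names, and the damping value of 0.85 are all assumptions.

```python
# Hypothetical link graph (page -> pages it links to), invented for illustration.
LINKS = {
    "aids-info": ["health-portal"],
    "health-portal": ["aids-info"],
    "news-site": ["aids-info"],
    "hearing-aids-shop": ["news-site"],
    "study-aids": [],
}

def link_scores(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = list(links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its score among the pages it links to,
                # so links from important pages count for more.
                share = damping * score[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score

def rank_hits(hits, scores):
    """Order word-matched hits by link score, highest first."""
    return sorted(hits, key=lambda p: scores[p], reverse=True)

scores = link_scores(LINKS)
# All three pages contain the word "aids"; link scores decide the order.
print(rank_hits(["study-aids", "hearing-aids-shop", "aids-info"], scores))
```

In this toy graph the disease page is the one other pages link to, so it rises to the top even though all three hits match the word equally well.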

Some of the search engines below use an exact word search, so that when the search term is "aids" they search only for that exact word. Other search engines put singular and plural forms together, so that a search for "aids" retrieves both the words "aids" and "aid."

Google - 1.0 million hits; all of the first 100 hits on AIDS the disease; exact word search
Even though Google doesn't distinguish upper-case, it does an excellent job of finding AIDS the disease. What it lacks in sophisticated features, like upper-case recognition, it makes up for with its much-heralded link-analysis technique. Although it's searching for occurrences of the word "aids" in both upper and lower case, it's still able to figure out that it's AIDS the disease that most people want, because it sees that the sites on AIDS are the ones that people make links to.

Webcrawler - 25,000 hits; 96 of the first 100 hits on AIDS the disease; exact word search
Apparently uses link-analysis to return a high proportion of hits on AIDS, but limited because of its small size.

Lycos - 1.9 million hits; 92 of the first 100 hits on AIDS the disease; exact word search
Does a relatively good job of returning a high proportion of hits on AIDS; the first 40 hits are on AIDS the disease.

Excite - Number of hits is not given; 92 of the first 100 hits on AIDS the disease
It's a bit tricky to determine if Excite is searching for the exact word or singular/plural forms because it doesn't give the number of hits for searches, but a search for "aid" (as opposed to "aids") indicates that it's searching singular/plural forms together: the first 20-30 hits are on AIDS the disease, but after that hits are mostly on "aid," such as first aid, financial aid, etc. So although Excite doesn't distinguish singular/plural forms of a term, it uses link-analysis to rank the most popular links, the ones on AIDS the disease, at the top of the list.

Hotbot - 2.5 million hits; 90 of the first 100 hits on AIDS the disease; exact word search

FAST - 1.9 million hits; 85 of the first 100 hits on AIDS the disease; exact word search
Hotbot and FAST both return non-disease occurrences of the term in the first 10 hits, an indication that they are not using a very sophisticated link analysis to rank hits.

AOL - 27,000 hits; 32 of the first 100 hits on AIDS the disease; singular/plural word search
"Low man on the totem pole" -- Has no redeeming features in searching for AIDS

A qualifier to this article: It should be obvious that searching for AIDS in general, without combining it with some other topic, is best done in a good directory, such as those contained on the Hardin MD AIDS page. Using search engines is the best way to search only when AIDS is being combined with some other, more specific topic.
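
For side-by-side comparison, the first-100 counts reported above for the lower-case "aids" searches can be tallied with a short Python sketch. The counts are the ones given in this article. The term "precision" and the function names are labels added for the example, not part of the original comparison.

```python
# Disease-related hits among the first 100, as reported in this article.
RELEVANT_IN_TOP_100 = {
    "Google": 100,
    "Webcrawler": 96,
    "Lycos": 92,
    "Excite": 92,
    "Hotbot": 90,
    "FAST": 85,
    "AOL": 32,
}

def precision_at_k(relevant, k=100):
    """Fraction of the first k hits that are on the intended meaning."""
    return relevant / k

def ranked_by_precision(counts, k=100):
    """Return (engine, precision) pairs, best first; ties keep input order."""
    pairs = [(name, precision_at_k(n, k)) for name, n in counts.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

for name, p in ranked_by_precision(RELEVANT_IN_TOP_100):
    print(f"{name:<11}{p:.0%}")
```

The same tally can be reused for any other ambiguous search term by swapping in the counts observed for that term.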

