Google and Web Community
Hardin MD Notes, March 28, 2003
Google's uncanny ability to cut right to the heart of a subject is well known, and much discussed in Web commentaries. PageRank, Google's patented technology, acts like an "automated peer-review" system that ranks sites according to how many people make links to them, and it does remarkably well at putting the best links at the top of the search list. The reason this method works so well, I think, is that Google is in a real sense identifying Web communities.
In looking for sites in a subject that's new to you, what you really want to do is to find the web pages of people who are interested in the same subject that you are, people who share a "community of interest" with you. For established, well-defined subjects, directory sites do a good job of gathering together such subject communities, putting groups of related links in an organized list. The larger and more well-established a subject community is, the more likely it will be covered in directory sites. But what about smaller subject communities? For narrower searches, I'd like to suggest that the best way to identify subject communities is reputation-based search engines (RBSE), like Google.
Because Google is by far the largest and best-known of the RBSE's, much of the discussion in this article will center on Google. But the same things that apply to Google also apply to the other RBSE's, which will also be discussed in the article.
Google finds communities
Most any search engine works well for needle-in-a-haystack subject searches, with only a few links (almost silly to call them communities). But where Google really shines is in finding the middle-sized, emerging subject communities, ones that have more than just a handful of links, but not enough to be listed in standard directories.
Inasmuch as the idea of community is closely related to the idea of connections, Google is by its very nature a community-finding agent. Its PageRank search algorithm, which is based on the pattern of how people's pages connect to each other, enables it to find newly-emerging communities that have not even become conscious of themselves as communities yet.
One of Google's great strengths that makes it such a good identifier of communities is that it recognizes the close connection between net community and directory sites. People who make directories are constantly working to identify and bring together related sites, much like "shepherds of the web." This community-building activity is recognized by people around the web, who make many links to directories, and, accordingly, Google ranks directory sites prominently, taking advantage of the community-building activity of directory makers.
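To make the idea of ranking by connection patterns concrete, here is a minimal sketch of the textbook, simplified form of the PageRank calculation. It is not Google's production system. The page names, the damping value of 0.85, and the number of iterations are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages all link to one directory page.
links = {
    "bookmark-list": ["directory"],
    "clinic-page": ["directory"],
    "newsletter": ["directory", "clinic-page"],
    "directory": ["clinic-page", "bookmark-list"],
}

ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this made-up example the directory page comes out on top, because every other page links to it. That is the same effect described in this article: pages that a community links to heavily rise in the ranking.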
The existence of a directory site for a subject acts as a good marker for its emergence as a new web community. The first "directory" for a new subject often starts out informally, as someone's list of bookmarks that gets put on the web. As people become aware of its usefulness, it may grow into a more established, regularly-maintained directory site. The final step in the emergence of a new subject community is its appearance in standard directory sites (Yahoo, Open Directory).
The emergence of web communities after 9/11 is a good example of how this process works. When the threat of bioterrorism, anthrax, and smallpox first arose in October 2001, we looked for directory sites to put on Hardin MD. Only a few of the standard directories had small pages, and there were a few informal independent directory sites. Within a matter of days, however, the bioterrorism subjects grew rapidly in size and prevalence on directories, showing how new web communities can arise quickly in response to crisis.
Other reputation-based search engines
Google doesn't use the word "community" in discussing their approach, but people at other search engines do. It was an article about the early RBSE Clever that first helped me see the connection between net community, reputation-based search engines, and directory sites. In this interesting article, "Trawling the web for emerging cyber-communities," the developers of Clever discuss the identification of new communities with a search engine, and the importance of directory sites in defining web communities. This article especially struck me because I've been doing a major expansion of Hardin MD in the last 1-2 years, adding many new pages on relatively small subjects. As I read the article, I realized that in doing this I've been using Google much as the authors use Clever: searching for directories and identifying the movers and shakers of the new communities for the subjects being added, so I can send them announcements about our new pages.
The concept of community as being central to the workings of RBSE's is given its fullest statement by the developers of Teoma, a relatively new search engine that has become the featured search engine for Ask Jeeves. Teoma's developers use the term "identifying communities" as a technical term to describe how Teoma differs from other RBSE's. Search engine commentators, however, generally say that Teoma works much like Google. My own experience, admittedly limited since it's so new, also indicates that Teoma has the same sort of ability to "gather the community together" that Google does; in particular, it does an excellent job of finding directory sites, just like Google. So the important thing, I think, is not the technical definition of the word community, but the fact that the word is being used in describing what an RBSE does.
It's ironic, then, that the people who work on other RBSE's (an old one, Clever, and a new one, Teoma) talk in terms of community, but Google does not. In my view, much of the success of Google is based on the fact that it works so well with the concept of community, even though they don't call it that.
Hardin MD, web community, and Google
With Hardin MD, we've been working with the "net communities" of medicine and health for several years. From the early days of doing Hardin MD, the unique features that characterize different medical specialty communities have been striking. This first became apparent simply from observing the look-and-feel of the sites in different disciplines, from the flashing red and fire-engine graphics (sometimes with audibly blaring sirens!) of emergency medicine directories to the finely organized, neural-net-like directories of neurology. Over time, we've observed deeper differences, too, indicating varying patterns of information usage in different disciplines.
In the early pre-Google days of doing Hardin MD, we spent much time finding and identifying sites that are part of these communities. One of the techniques that we used in this was to look at many pages in each discipline, to see which other pages within the discipline were frequently linked from them. If a site was frequently linked to from other sites, it must be good, we figured. And when Google came out in 1998 it worked the same way! Not surprisingly, maybe, with such a similar approach to finding the heart of subject communities, Google ranked many of our pages highly from the time it started.
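The by-hand technique described above amounts to counting, for each site, how many other pages in the discipline link to it. Here is a minimal sketch of that tally. The page and site names are hypothetical, and this is only an illustration of the idea, not a record of how the Hardin MD pages were actually compiled.

```python
from collections import Counter


def count_inlinks(pages):
    """Count how many distinct pages link to each site.

    `pages` maps a page name to the list of sites it links to.
    """
    counts = Counter()
    for source, targets in pages.items():
        # set() ensures each page counts once per site, however many links it has.
        for target in set(targets):
            if target != source:  # ignore a page linking to itself
                counts[target] += 1
    return counts


# Hypothetical link lists gathered from pages in one discipline.
pages = {
    "page-1": ["site-a", "site-b", "site-a"],
    "page-2": ["site-a", "site-c"],
    "page-3": ["site-a", "site-b"],
    "site-a": ["site-a", "site-b"],
}

inlinks = count_inlinks(pages)
# Sites linked from the most pages come first.
most_linked = inlinks.most_common()
for site, count in most_linked:
    print(site, count)
```

A plain count like this treats every linking page as equally important. A reputation-based ranking goes a step further by giving more weight to links that come from pages that are themselves well linked.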
Another "community-based" method we've used with Hardin MD that's certainly contributed to its getting high rankings in Google is that we've communicated with people in subject disciplines to build up links to our site. In the early days, we sought out online forums for each discipline, and sent postings to make them aware of discipline-specific pages on our site, in hopes that people in the discipline would make links to our pages. This worked well, and by the time Google came out in 1998, we had built a strong network of links to Hardin MD in several discipline communities, which certainly helped us to gain good rankings in Google.
A quote from Craig Silverstein, Google's Chief Technology Officer, ties things together. In discussing how to get good placement for a site in Google searches, Silverstein's advice to anyone doing page development is to work through "the community around which you're trying to build your page." This is certainly the approach we've taken in building Hardin MD, and it's worked well to help us get good rankings in Google. But beyond Hardin MD, the quote also shows that, even though the people at Google don't generally use the word "community" in talking about how Google works, they are aware of the importance of the concept.
Hardin Library for the Health Sciences, University of Iowa
Please send comments to firstname.lastname@example.org
The URL for this page is http://hardinmd.lib.uiowa.edu