The term "search engine" is often used generically to describe both crawler-based
search engines and human-powered directories. These two types of search engines
gather their listings in radically different ways.
Crawler-Based Search Engines
Crawler-based search engines, such
as Google, create their listings automatically. They "crawl" or "spider" the
web, then people search through what they have found. If
you change your web pages, crawler-based search engines eventually
find these changes, and that can affect how you are listed.
Page titles, body copy and other page elements all play a role in how you are listed.
Human-Powered Directories
A human-powered directory, such as the Open Directory,
depends on humans for its listings. You submit a short description to the
directory for your entire site, or editors write one for sites they review.
A search looks for matches only in the descriptions submitted. Changing
your web pages has no effect on your listing. Things that are useful for
improving a listing with a crawler-based search engine have nothing to do with improving
a listing in a directory. The only exception is that a good site, with good
content, might be more likely to get reviewed for free than a poor site.
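The difference between the two kinds of listing can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Site` class, its fields, and both search functions are hypothetical, and real engines and directories do far more than substring matching. It shows only the point made above: a crawler matches against the page itself, while a directory matches against the submitted description.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical site: its current page text plus its directory description."""
    url: str
    page_text: str    # what a crawler reads when it visits the site
    description: str  # short description submitted to (or written by) directory editors


def crawler_search(sites: list[Site], term: str) -> list[str]:
    """Crawler-style matching: look inside the page text itself."""
    term = term.lower()
    return [s.url for s in sites if term in s.page_text.lower()]


def directory_search(sites: list[Site], term: str) -> list[str]:
    """Directory-style matching: look only at the description."""
    term = term.lower()
    return [s.url for s in sites if term in s.description.lower()]


site = Site(
    url="http://example.com/",
    page_text="Fresh coffee beans roasted daily.",
    description="Online coffee shop.",
)
sites = [site]

print(crawler_search(sites, "beans"))     # ['http://example.com/']
print(directory_search(sites, "beans"))   # []

# Simulate a re-crawl after the page is edited: the crawler now sees the new text.
site.page_text = "Loose-leaf tea, shipped weekly."
print(crawler_search(sites, "beans"))     # []
# The directory listing still depends only on the description.
print(directory_search(sites, "coffee"))  # ['http://example.com/']
```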
Hybrid Search Engines Or Mixed Results
In the web's early days, a search engine typically presented either
crawler-based results or human-powered listings. Today, it is extremely common
for both types of results to be presented. Usually, a hybrid
search engine will favor one type of listing over the other.
For example, MSN Search is more likely
to present human-powered listings from LookSmart. However, it does also present
crawler-based results (as provided by Inktomi), especially for more obscure queries.
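One way a hybrid engine might favor one type of listing is sketched below. The policy here is an assumption for illustration only, not MSN's actual method: show human-powered listings first, and top them up with crawler-based results when the directory has too few matches, which is what tends to happen with obscure queries. The function name, its parameters, and the threshold of three listings are all hypothetical.

```python
def hybrid_results(
    query: str,
    directory: dict[str, list[str]],
    crawler: dict[str, list[str]],
    min_directory_hits: int = 3,
    limit: int = 10,
) -> list[str]:
    """Favor human-powered listings; fall back to crawler results when the
    directory has fewer than `min_directory_hits` matches.

    `directory` and `crawler` are hypothetical stand-ins that map a query
    to an already-ranked list of result URLs.
    """
    primary = directory.get(query, [])
    if len(primary) >= min_directory_hits:
        return primary[:limit]
    # Too few human-edited listings: append crawler results, skipping duplicates.
    merged = list(primary)
    for url in crawler.get(query, []):
        if url not in merged:
            merged.append(url)
    return merged[:limit]


directory = {"travel": ["a.example", "b.example", "c.example"]}
crawler = {
    "travel": ["d.example", "a.example"],
    "antique barometer repair": ["x.example", "y.example"],
}

print(hybrid_results("travel", directory, crawler))
# ['a.example', 'b.example', 'c.example']
print(hybrid_results("antique barometer repair", directory, crawler))
# ['x.example', 'y.example']
```

A fixed threshold is the simplest possible rule; a real hybrid engine could just as well interleave the two sources or weight them by query type.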
The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the spider,
also called the crawler. The spider visits a web page, reads it, and then
follows links to other pages within the site. This is what it means when
someone refers to a site being "spidered" or "crawled." The spider returns
to the site on a regular basis, such as every month or two, to look for changes.
Everything the spider finds goes into the second part of the search engine, the index.
The index, sometimes called the catalog, is like a giant book containing
a copy of every web page that the spider finds. If a web page changes, then
this book is updated with new information.
Sometimes it can take a while for new pages or changes that the spider finds to be
added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until
it is indexed (added to the index), it is not available to those searching
with the search engine.
Search engine software is the third part of a search engine. This is the program that sifts through the
millions of pages recorded in the index to find matches to a search and rank
them in order of what it believes is most relevant. You can learn more about
how search engine software ranks web pages on the aptly named "How
Search Engines Rank Web Pages" page.
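The three parts described above (spider, index, and search engine software) can be tied together in a small, offline Python sketch. Everything here is an assumption made for illustration: the in-memory `WEB` dictionary stands in for real HTTP fetching, the `Index` class with its `pending` area is a hypothetical way to model pages that have been "spidered" but not yet "indexed," and the relevance score (a plain count of query words on the page) is deliberately crude and is not how any real engine ranks pages.

```python
import re
from collections import Counter, deque

# Hypothetical in-memory "web": URL -> (page text, links on that page).
# A real spider fetches pages over HTTP; this toy stays offline.
WEB = {
    "/": ("Welcome to our garden shop", ["/roses", "/tools"]),
    "/roses": ("Roses and rose care tips for your garden", ["/"]),
    "/tools": ("Garden tools: spades, shears and hoses", ["/", "/roses"]),
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def spider(start: str) -> dict[str, str]:
    """Part 1, the spider: read a page, then follow its links (breadth-first)."""
    fetched: dict[str, str] = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in fetched or url not in WEB:
            continue  # already visited, or a link to a page we cannot fetch
        text, links = WEB[url]
        fetched[url] = text
        queue.extend(links)
    return fetched


class Index:
    """Part 2, the index: an inverted index from each word to per-page counts."""

    def __init__(self) -> None:
        self.pending: dict[str, str] = {}  # spidered but not yet indexed
        self.postings: dict[str, Counter] = {}

    def add_spidered(self, pages: dict[str, str]) -> None:
        self.pending.update(pages)

    def commit(self) -> None:
        """Make pending pages searchable.

        This toy version only adds entries; re-indexing a changed page would
        also need to remove words the page no longer contains.
        """
        for url, text in self.pending.items():
            for word, count in Counter(tokenize(text)).items():
                self.postings.setdefault(word, Counter())[url] = count
        self.pending.clear()


def search(index: Index, query: str) -> list[str]:
    """Part 3, the search software: find matching pages and rank them.

    Toy relevance score: total occurrences of the query words on each page.
    """
    scores: Counter = Counter()
    for word in tokenize(query):
        for url, count in index.postings.get(word, Counter()).items():
            scores[url] += count
    return [url for url, _ in scores.most_common()]


index = Index()
index.add_spidered(spider("/"))
print(search(index, "garden roses"))  # []: spidered, but not yet indexed
index.commit()
print(search(index, "garden roses"))  # ['/roses', '/', '/tools']
```

Keeping a separate `pending` area is just one way to model the delay between spidering and indexing; the first search returns nothing because the pages have been fetched but not yet committed to the index.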
Major Search Engines: The Same, But Different
All crawler-based search engines have the basic parts described above, but
there are differences in how these parts are tuned. That is why the same
search on different search engines often produces different results. Some
of the significant differences between the major crawler-based search engines
are summarized on the Search
Engine Features Page. Information on this page has been drawn from the
help pages of each search engine, along with knowledge gained from articles,
reviews, books, independent research, tips from others and additional information
received directly from the various search engines.
Now let's look more closely at how crawler-based search engines rank the listings that
they gather.
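As a rough illustration of why differently tuned engines can order the same pages differently, here is a toy Python sketch in which two hypothetical engines share one scoring function and differ only in how much a match in the page title counts relative to a match in the body. The `Page` class, the scoring rule, and both sets of weights are invented for the example; no real engine's settings are implied.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str


def score(page: Page, term: str, title_weight: float, body_weight: float) -> float:
    """Weighted count of one query term in the title and the body."""
    term = term.lower()
    title_hits = page.title.lower().split().count(term)
    body_hits = page.body.lower().split().count(term)
    return title_weight * title_hits + body_weight * body_hits


def rank(pages: list[Page], term: str, title_weight: float, body_weight: float) -> list[str]:
    """Order pages from highest to lowest score under the given tuning."""
    ordered = sorted(
        pages,
        key=lambda p: score(p, term, title_weight, body_weight),
        reverse=True,
    )
    return [p.url for p in ordered]


pages = [
    Page("a.example", "Bicycle repair", "Fix flat tires fast."),
    Page("b.example", "Weekend rides", "bicycle routes, bicycle clubs, bicycle maps"),
]

# Same pages, same query, two hypothetical tunings.
print(rank(pages, "bicycle", title_weight=5.0, body_weight=1.0))  # ['a.example', 'b.example']
print(rank(pages, "bicycle", title_weight=1.0, body_weight=1.0))  # ['b.example', 'a.example']
```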