Please note: This is only intended to be an overview. Though more information and details are available they are not considered relevant to readers of this post.
What is a Search Engine?
Search engines are software systems designed to search both the World Wide Web (www) and their own databases for information. How they actually work is dealt with further down.
The www as it is known today commenced in December 1990. However, there were prior search engines dating from 1982 and 1989.
Initially (before September 1993) the Web was indexed by hand, but as more and more servers were added, manual indexing became unmanageable. At first the system primarily searched for users but was further developed to include file names. It still did not search the content of files and sites, but at the time it was relatively easy for these to be searched manually once accessed.
In late 1993 a software system was developed that could collect slightly more detailed information.
In 1994 the software was further developed to allow users to search for any word. This became the standard for most search engines.
How Search Engines Work
There are three stages: first, crawling by the ‘Web Crawler’ (sometimes known as a ‘spider’); second, indexing; third, searching.
The Web Crawler (spider) browses the www for the primary purpose of indexing. It is able to copy all pages visited and save them for later processing, after which the pages are indexed. One aspect that can be an issue for some: the crawler can visit and index sites without specific permission. Some public sites that do not wish to be ‘crawled’ may add coding to limit what may be indexed or to prevent indexing altogether. Presumably individuals may also add similar coding, but in most cases it would probably be counterproductive to their aims.
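One common form of the ‘coding’ mentioned above is a robots.txt file placed at the root of a site. The sketch below shows how a well-behaved crawler might check such a file before visiting a page. It uses Python's standard urllib.robotparser module; the robots.txt content, the crawler name ‘ExampleBot’ and the example.com URLs are made up for illustration, and a real crawler would download the file from the site rather than hold it in a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name.
print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))
print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/private/notes.html"))
```

Note that these rules are only a request: they rely on the crawler choosing to honour them.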
Basically, the crawler/spider visits URLs (Uniform Resource Locators), identifies the hyperlinks in each page and adds them to the search engine’s list of URLs to visit. The search engine then uses ‘keywords’ to search the index to find and display information relevant to the keyword(s) input by the user.
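As a rough illustration of the link-gathering step, the sketch below pulls the hyperlinks out of one page and adds any new ones to a list of URLs. It is a minimal example only: the page address and HTML are invented, the names LinkExtractor, extract_links and url_list are hypothetical, and a real crawler would download each page and handle far more cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links such as "/about.html" into full URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    return extractor.links


# Invented page used purely for illustration.
PAGE_URL = "https://example.com/index.html"
PAGE_HTML = '<p><a href="/about.html">About</a> <a href="https://example.org/">Other</a></p>'

url_list = [PAGE_URL]   # the list of URLs known to the crawler
seen = {PAGE_URL}       # avoids adding the same URL twice
for link in extract_links(PAGE_URL, PAGE_HTML):
    if link not in seen:
        seen.add(link)
        url_list.append(link)

print(url_list)
```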
In addition to the keyword(s) entered, search engines also have their own systems for refining search results. These filter through the available information to see if there are any other sites or pages that may be relevant to the user’s inquiry.
With the multitude of websites now in existence, it is probable that the keyword(s) searched will be included in millions of pages. Search engines have developed their own methods, usually an algorithm, for determining the most relevant results. Naturally, the keyword(s) entered are taken into account first; after that, the ranking most likely depends on a combination of how popular and/or authoritative a site or page is considered to be.
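The ranking idea can be sketched in a few lines. The example below is an assumption-laden toy, not how any actual search engine ranks results (those algorithms are proprietary and far more elaborate): it orders pages first by how often the keywords appear and then by an invented ‘popularity’ score between 0 and 1. The pages, scores and the function name rank are all hypothetical.

```python
def rank(pages, query):
    """Order matching pages by keyword matches, then by popularity."""
    terms = query.lower().split()
    scored = []
    for page in pages:
        words = page["text"].lower().split()
        matches = sum(words.count(term) for term in terms)
        if matches:  # pages with no keyword match are left out entirely
            scored.append((matches, page["popularity"], page["url"]))
    # Most keyword matches first; popularity breaks ties.
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [url for _, _, url in scored]


# Invented pages and popularity scores, for illustration only.
PAGES = [
    {"url": "a.example", "text": "search engines index the web", "popularity": 0.2},
    {"url": "b.example", "text": "how search engines rank search results", "popularity": 0.9},
    {"url": "c.example", "text": "cooking recipes", "popularity": 0.99},
    {"url": "d.example", "text": "search tips", "popularity": 0.5},
    {"url": "e.example", "text": "search engines explained", "popularity": 0.7},
]

print(rank(PAGES, "search engines"))
```

Here c.example never appears, however popular it is, because it contains none of the keywords.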
Algorithms may be influenced by the legislation, politics, economics and social norms of a territory. For example, in some countries it is illegal to display sites containing specific types of information. The influence may also be commercial: a company that advertises with the search engine may be shown more prominently than other results from within the index.
In principle search engines search their own database of information.
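One simple way to picture such a database is an ‘inverted index’: a table that maps each word to the pages containing it, so that a search looks up words rather than re-reading every page. The sketch below is a minimal version under that assumption; the page texts and the function names build_index and search are invented for illustration.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    terms = query.lower().split()
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


# Invented pages, for illustration only.
PAGES = {
    "a.example": "search engines index the web",
    "b.example": "spiders crawl the web",
    "c.example": "cooking recipes",
}

index = build_index(PAGES)
print(search(index, "the web"))
print(search(index, "search web"))
```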
Web crawlers/spiders ‘crawl’ webpages and follow site links to other pages. These are then added to the search engine’s index of information. Once a site has been ‘crawled’, the spiders frequently revisit it to index any updates or changes.
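On a revisit, a crawler needs some way to tell whether a page has changed. One possible approach, shown here purely as an assumption, is to store a fingerprint (hash) of each page and re-index only when the fingerprint differs. The names fingerprint and needs_reindex and the example URL are hypothetical.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a SHA-256 digest that changes whenever the content changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def needs_reindex(stored, url, content):
    """Return True on a first visit or when the page content has changed."""
    new = fingerprint(content)
    if stored.get(url) == new:
        return False
    stored[url] = new  # remember the latest version for the next visit
    return True


stored = {}  # url -> fingerprint from the previous visit
print(needs_reindex(stored, "https://example.com/", "Hello"))          # first visit
print(needs_reindex(stored, "https://example.com/", "Hello"))          # unchanged
print(needs_reindex(stored, "https://example.com/", "Hello, world"))   # updated
```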
It should be noted that spiders can only follow links from one page to another and from one site to another. Consequently, ‘inbound’ links to a site (links from other sites) are very important: they give the spider a route to the site and provide additional information for the search engine to index and subsequently display.
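Why inbound links matter can be shown with a small made-up link map. In the sketch below, the spider starts from one known page and follows links outward; a page that nothing links to is never found, even though it links to others. The site names and the function name reachable are invented for illustration.

```python
from collections import deque


def reachable(link_graph, seeds):
    """Return every page a spider can reach by following links from the seeds."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found


# Invented link map: each page lists the pages it links to.
LINKS = {
    "a.example/": ["a.example/about", "b.example/"],
    "b.example/": ["b.example/post"],
    "c.example/": ["a.example/"],  # links out, but nothing links to it
}

found = reachable(LINKS, ["a.example/"])
print(sorted(found))
print("c.example/" in found)
```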
Search engines are an essential part of modern-day, technology-driven society. Anyone with a website, blog or any on-line presence needs to be aware of their existence and usefulness. They also need to be aware of how much less private search engines make life. Wisdom needs to be applied to anything shared on-line. Society is more vulnerable but also benefits from, and is the richer for, the Web and the ease with which search engines enable people to access information.