Wednesday, October 1, 2008

Learn about search engines

A search engine is an information retrieval system designed to help find information stored on a computer system. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.
While researchers and developers take a broader view of IR systems, consumers think of them more in terms of what they want the systems to do — namely search the Web, or an intranet, or a database.







Search engines match queries against an index that they create. The index consists of the words in each document, plus pointers to their locations within the documents. This is called an inverted file. A search engine or IR system comprises four essential modules:

* A document processor
* A query processor
* A search and matching function
* A ranking capability

While users focus on “search,” the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.
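
To make the inverted file idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular search engine does it: the function names (build_inverted_index, lookup) and the two sample documents are made up for this example, and "document processing" is reduced to lowercasing and splitting on spaces.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to {doc_id: [positions of that word in the document]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Very crude document processing: lowercase and split on whitespace.
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def lookup(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result


# Hypothetical sample documents, keyed by a made-up document id.
docs = {
    "a": "search engines build an index",
    "b": "the index points to words in documents",
}
index = build_inverted_index(docs)
print(index["index"])                # which documents contain "index", and where
print(lookup(index, "index words"))  # documents containing both words
```

The query never scans the documents themselves; it only looks up words in the index and intersects the results, which is why the index is built ahead of time.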

Read about this in detail at www.infotoday.com/searcher/may01/liddy.htm

The term “search engine” is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.

Crawler-based search engines, such as Google, create their listings automatically. They “crawl” or “spider” the web, then people search through what they have found.

If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.

A human-powered directory, such as the Open Directory, depends on humans for its listings.

You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.

Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory.

The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.

“Hybrid Search Engines” Or Mixed Results

In the web’s early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other.

For example, MSN Search is more likely to present human-powered listings from LookSmart.

However, it does also present crawler-based results (as provided by Inktomi), especially for more obscure queries.

The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being “spidered” or “crawled.” The spider returns to the site on a regular basis, such as every month or two, to look for changes.
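
The visit, read and follow-links loop can be sketched in a few lines of Python. This is only a toy under stated assumptions: instead of fetching real pages over the network, it crawls a made-up in-memory site (FAKE_SITE), and the names LinkExtractor and crawl are invented for this example. A real spider would also handle robots.txt, relative URLs, politeness delays and revisits.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, fetch):
    """Breadth-first crawl from start; fetch(url) returns HTML or None."""
    seen = {start}
    queue = deque([start])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved, skip it
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site; "/missing" is a dead link on purpose.
FAKE_SITE = {
    "/": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/">Home</a>',
    "/news": '<a href="/missing">Old story</a>',
}
pages = crawl("/", FAKE_SITE.get)
print(list(pages))  # the pages the spider managed to read
```

The seen set is what keeps the spider from looping forever when pages link back to each other, as "/about" does here.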

Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.

Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been “spidered” but not yet “indexed.” Until it is indexed — added to the index — it is not available to those searching with the search engine.

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. You can learn more about how search engine software ranks web pages on the aptly-named How Search Engines Rank Web Pages page.
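
As a rough illustration of "find matches and rank them", the Python sketch below scores each page by how often the query words appear in it relative to the page length. This scoring rule is an assumption chosen for simplicity, not the formula any real search engine uses; the function names (score, rank) and the sample pages are made up.

```python
def score(query, text):
    """Fraction of the page's words that are query words (a crude relevance guess)."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(words.count(term) for term in query.lower().split())
    return hits / len(words)


def rank(query, pages):
    """Return the URLs of matching pages, best score first."""
    scored = [(score(query, text), url) for url, text in pages.items()]
    matches = [(s, url) for s, url in scored if s > 0]  # drop non-matching pages
    matches.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by URL
    return [url for _, url in matches]


# Hypothetical indexed pages: URL -> page text.
sample_pages = {
    "/cats": "cats cats and more cats",
    "/pets": "cats and dogs are pets",
    "/cars": "cars and trucks",
}
print(rank("cats", sample_pages))
```

Real ranking combines many more signals, such as where on the page the words appear, which is one reason the same search gives different results on different engines.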

Major Search Engines: The Same, But Different
All crawler-based search engines have the basic parts described above, but there are differences in how these parts are tuned. That is why the same search on different search engines often produces different results. Some of the significant differences between the major crawler-based search engines are summarized on the Search Engine Features Page.

How Search Engines Rank Web Pages
It's too big to post here, so check the link:
searchenginewatch.com/showPage.html?page=2167961

OK, I think that's enough for the intro. Here are some links for further reading.

A nice animation explaining how a search engine works:
http://www.learnthenet.com/ENGLISH/animate/search.html

Read what Wikipedia says about search engines and web search engines:
http://en.wikipedia.org/wiki/Search_engine_%28computing%29

http://en.wikipedia.org/wiki/Web_search_engine

A well-written comparison of Google, Yahoo and Ask, which also explains the main technical concepts behind search engines.
Also look at the site's recommendations for building or using a search engine:
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html

A very detailed tutorial, specifically about spiders:
http://www.webreference.com/content/search/how.html

http://www.monash.com/spidap4.html

A step-by-step explanation:
1. Introduction to How Internet Search Engines Work
2. Looking at the Web
3. Building the Index
4. Building a Search
5. Future Search
6. Lots More Information
http://computer.howstuffworks.com/search-engine.htm

That’s all. I hope you all understand the search option better now. Use it!

A search engine's ability to suggest corrections for misspelled queries can be built with the help of a function called “levenshtein”.

The function takes two strings as parameters and calculates the minimum number of insert, replace and delete operations needed to transform one string into the other.
soundex and metaphone are two other related functions; they compare words by how they sound rather than by how many edits separate them.

http://in.php.net/manual/en/function.levenshtein.php
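
The link above is for the PHP function. To show what it computes, here is a minimal Python sketch of the same edit distance using the standard dynamic programming approach. The suggest helper and its tiny word list are made up for this example; it only hints at how a "did you mean" feature could work and is not how any particular search engine does it.

```python
def levenshtein(a, b):
    """Minimum number of insert, replace and delete operations to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # replace ch_a with ch_b (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary):
    """Pick the dictionary word closest to the input (ties broken alphabetically)."""
    return min(dictionary, key=lambda candidate: (levenshtein(word, candidate), candidate))


print(levenshtein("kitten", "sitting"))
print(suggest("serch", ["search", "engine", "spider"]))
```

Comparing a query against every word in a large dictionary this way would be slow, so treat this as a way to understand the function rather than a design to copy.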
