Friday 9 March 2012

What is a search engine?


A Web search engine is a tool designed to search for information on the World Wide Web. The information may consist of web pages, images, and other types of files.



What is a spider?


A spider is a program that automatically fetches Web pages. Spiders are used to feed pages to search engines. It's called a spider because it crawls over the Web; another term for these programs is web crawler. Because most Web pages contain links to other pages, a spider can start almost anywhere. As soon as it sees a link to another page, it goes off and fetches it. Large search engines, like AltaVista, have many spiders working in parallel.


How Web Search Engines Work


Crawler-based search engines are those that use automated software agents (called crawlers) that visit a Web site, read the information on the actual site, read the site's meta tags, and follow the links that the site connects to, indexing all linked Web sites as well. The crawler returns all that information to a central repository, where the data is indexed. The crawler periodically returns to the sites to check for any information that has changed; the frequency with which this happens is determined by the administrators of the search engine.
 
Human-powered search engines rely on humans to submit information that is subsequently indexed and catalogued. Only information that is submitted is put into the index. In both cases, when you query a search engine to locate information, you're actually searching through the index that the search engine has created; you are not actually searching the Web. These indices are giant databases of information that is collected, stored, and subsequently searched. This explains why a search on a commercial search engine, such as Yahoo! or Google, will sometimes return results that are, in fact, dead links. Since the search results are based on the index, if the index hasn't been updated since a Web page became invalid, the search engine treats the page as still an active link even though it no longer is. It will remain that way until the index is updated.

Major search engines

Google
Yahoo
MSN/Bing

Robots

Google: Googlebot
MSN/Bing: MSNBOT/0.1
Yahoo: Yahoo! Slurp

 

Robots.txt file

Robots.txt is a file that gives instructions to search engine spiders about whether to index or follow certain pages of a website. This file is normally used to prevent the spiders of a search engine from indexing unfinished pages of a website during its development phase. Many webmasters also use this file to avoid spamming. The creation and uses of the robots.txt file are listed below:

Robots.txt Creation:

To keep all robots out:
User-agent: *
Disallow: /

To block certain pages from all crawlers:
User-agent: *
Disallow: /page-name/

To block certain pages from a specific crawler:
User-agent: Googlebot
Disallow: /page-name/

To prevent images from being indexed by a specific crawler:
User-agent: Googlebot-Image
Disallow: /

To allow all robots:
User-agent: *
Disallow:

Finally, some crawlers now support an additional field called "Allow:", most notably Google.

To disallow all crawlers from your site EXCEPT Google:
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /


"Robots" Meta Tag

If you want a page indexed but do not want any of the links on the page to be followed, you can use the following:
<meta name="robots" content="index,nofollow"/>

If you don't want a page indexed but want all links on the page to be followed, you can use:
<meta name="robots" content="noindex,follow"/>

If you want a page indexed and all the links on the page to be followed, you can use:
<meta name="robots" content="index,follow"/>

If you don't want a page indexed or any of its links followed, you can use:
<meta name="robots" content="noindex,nofollow"/>

To invite robots to index the page and follow all links:
<meta name="robots" content="all"/>

To stop robots from indexing the page or following any links:
<meta name="robots" content="none"/>

Robots.txt vs. Robots Meta Tag


Robots.txt
While Google won't crawl or index the content of pages blocked by robots.txt, it may still index the URLs if it finds them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.

In order to use a robots.txt file, you'll need to have access to the root of your domain (if you're not sure, check with your web host). If you don't have access to the root of a domain, you can restrict access using the robots meta tag.

Robots Meta Tag
To entirely prevent a page's contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index.
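
In its minimal form, the tag (placed in the HEAD section of the page) looks like this:
<meta name="robots" content="noindex" />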

When Google sees the noindex meta tag on a page, it will completely drop the page from its search results, even if other pages link to it. Other search engines, however, may interpret this directive differently; as a result, a link to the page can still appear in their search results.

Note that because Google has to crawl the page in order to see the noindex meta tag, there's a small chance that Googlebot won't see and respect it. If your page is still appearing in results, it's probably because Google hasn't crawled your site since you added the tag. (Also, if you've used your robots.txt file to block the page, Google won't be able to see the tag either.)

If the content is currently in Google's index, it will be removed the next time the page is crawled. To expedite removal, use the URL removal request tool in Google Webmaster Tools.

Validate Your Code


There are several ways to validate the accuracy of your website's source code. The four most important, in my opinion, are validating your search engine optimization, your HTML, and your CSS, and ensuring that you have no broken links or images.

Start by analyzing broken links. One of my top SEO tips would be for you to use the W3C's tool to validate links. If you have a lot of links on your website, this could take a while.
Next, revisit the W3C to analyze HTML and CSS. Here is a link to the W3C's HTML Validation Tool and to their CSS Validation Tool.

The final step in my top SEO tips is to validate your search engine optimization. Without having to purchase software, the best online tool I've used is ScrubTheWeb's Analyze Your HTML tool. STW has built an extremely extensive online application that you'll wonder how you ever lived without.
One of my favorite features of STW's SEO tool is its attempt to mimic a search engine. In other words, the results of the analysis will show you (theoretically) how search engine spiders may see the website.

Install a sitemap.xml for Google


Though you may feel like it is impossible to get listed high in Google's search engine results page, believe it or not that isn't Google's intention. They simply want to ensure that their users get the most relevant results possible. In fact, they've even created a program just for webmasters to help ensure that your pages get cached in their index as quickly as possible. They call the program Google Sitemaps. In this tool, you'll also find a great linking tool to help discover who is linking to your website.

For Google, these two pieces of the top SEO tips would be to read the tutorial entitled How Do I Create a Sitemap File and then to create your own. To view the one for this website, simply right-click this SEO Tips Sitemap.xml file and save it to your desktop, then open the file with a text editor such as Notepad.
Effective 11/06, Google, Yahoo!, and MSN use one standard for sitemaps. Below is a snippet of the standard code as listed at Sitemaps.org. The optional fields are lastmod, changefreq, and priority.


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset> 

The equivalent to the sitemap.xml file for Yahoo! is the urllist.txt. Technically you can call the file whatever you want, but all it really contains is a list of every page on your website.
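
For illustration, a minimal urllist.txt is simply one URL per line (these URLs are placeholders):

http://www.example.com/
http://www.example.com/about.html
http://www.example.com/golf-putters.html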

Include a robots.txt File


By far the easiest of these top SEO tips as it relates to search engine optimization is to include a robots.txt file at the root of your website. Open up a text editor such as Notepad, type "User-agent: *" on the first line and "Disallow:" (with nothing after it) on the second line, save the file as robots.txt, and upload it to the root directory of your domain. These two lines tell any spider that hits your website to "please feel free to crawl every page of my website".
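
In other words, the entire allow-everything file is just:

User-agent: *
Disallow: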

Here's one of my best top SEO tips: because the search engine analyzes everything it indexes to determine what your website is all about, it's a good idea to block folders and files that have nothing to do with the content we want analyzed. You can disallow unrelated files from being read by adding "Disallow: /folder_name/" or "Disallow: /filename.html" lines. Here is an example of what such a robots.txt file might look like:
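
A sketch, with hypothetical folder and file names:

User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/
Disallow: /print-version.html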

Nomenclature


Whenever possible, you should save your images, media, and web pages with the keywords in the file names. For example, if your keyword phrase is "golf putters", you'll want to save the images used on that page as golf-putters-01.jpg or golf_putters_01.jpg (either will work). It's not confirmed, but many SEOs have experienced improvement in rankings by renaming images and media.


More important is your web page's filename, since many search engines now allow users to query using "inurl:" searches. Your filename for the golf putters page could be golf-putters.html or golf_putters.html. Anytime there is an opportunity to display or present content, do your best to ensure the content has the keywords in the filename (as well as a Title or ALT attribute).

Use Title and ALT Attributes


More often than not, web addresses (URLs) do not contain the topic of the page. For example, the URL www.myspace.com says nothing about being a place to make friends, whereas a site like www.placetomakefriends.com would tell Google right away that the site being pointed to is about making friends. So, to be more specific about where we are pointing in our links, we add a title attribute that includes our keywords.
Using the Title attribute is a direct method of telling the search engines about the relevance of the link. It's also a W3C standard for making your page accessible to disabled people; blind folks can navigate through your website using a special browser that reads Title and ALT attributes. The syntax is:
<a href="http://www.seotips.com/seo_software.htm" title="SEO Software">SEO Software</a>

The ALT Attribute is used for the same reasons as the Title Attribute, but is specifically for describing an image to the search engine and to the visually disabled. Here's how you would use ALT in an IMG tag:

<img src="http://top10seotips.com/img/image01.jpg" alt="SEO Tips">

Use Headings


In college and some high schools, essays are written using a standard guideline created by the Modern Language Association (MLA). These guidelines include how to write your cover page, title, and paragraphs, how to cite references, etc. On the Web, we follow the W3C's guidelines as well as commonly accepted "best practices" for organizing a web page.
Headings play an important role in organizing information, so be sure to include at least H1 through H3 when assembling your page. Using Cascading Style Sheets (CSS), I was able to make the H1 at the top of this page more appealing. Here's a piece of code you can pop into your page's HEAD section:

<style type="text/css">
      h1 { font-size: 18px; }
      h2 { font-size: 16px; }
      h3 { font-size: 14px; }
</style>

Since a page full of headings would look just plain silly, my SEO tip would be to fill in the space between headings with paragraphs, ordered and unordered lists, images, and other content, as in the sketch below. Try to get at least 400 words on each page.
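
For illustration, a hypothetical body structure for the golf putters page mentioned earlier:

<h1>Golf Putters</h1>
<p>A few hundred words introducing your golf putters...</p>
<h2>Custom Putters</h2>
<p>Details and an ordered or unordered list of models...</p>
<h3>Putter Grips</h3>
<p>Supporting content and images...</p>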

Optimize Your META Tags

The meta description tag or element does not appear anywhere on a web page, so why bother making it part of your on-page optimization strategy?  Because optimizing it can drastically influence the amount of organic traffic you get from search engines once your URL begins to rank well.

 
META tags are hidden code read only by search engine web crawlers (also called spiders); they live within the HEAD section of a web page. There are actually two very important META tags you need to worry about: description and keywords. META tags summarize what the site is about and, despite some SEO controversy, they still play an instrumental role in meta-based search engines. The META tags you need to be most concerned about are:
  1. Description
  2. Keywords
The sequencing of these tags may be extremely important. I say "may" because SEO is mostly hypothesis, due to the ever-changing algorithms of the search engines. Even though the W3C states that tag attributes do not have to be in any particular sequence, I've noticed a significant difference when I have the tags and attributes in the order described here. The only deviation from the list above is that the Title tag should come before the META description.
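
A sketch of that ordering in the HEAD section (the title and values here are placeholders):

<head>
<title>Your Keywords Here | Site Name</title>
<meta name="description" content="Your keywords here, followed by a statement about your product, service or organization." />
<meta name="keywords" content="your keywords here" />
</head>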

The description META tag provides the text that will be displayed under your title on the results page. There's a lot of controversy about the number of characters you should have in this tag; I've seen sites with a paragraph in their description listed in the top results, so I don't think the number of characters here plays much of a role with the search engines.
However, if you want the listing to look clear and to the point, my tip would be to keep it under 150 characters and to not repeat your keywords more than 3 times. It may be a coincidence, but I've also noticed ranking improvements when I put my keywords at the beginning of the description. Here's the syntax:

<meta name="description" content="your_keywords_here followed by a statement about your product service or organization." />


The last important META tag is the keywords META tag, which some time ago lost a lot of points in Google's search engine algorithm. This tag is still important to many other search engines, though, and should not be ignored. Based on my experience with this tag, you can have approximately 800 characters in it (including spaces).
If you repeat your keywords more than 3 times, it can be a pretty good indication to the search engine that you are trying to spam their search results. Also, don't waste your time including keywords that aren't used in the BODY section of your website; that could be seen as another spam technique. Here's the syntax used on this site:

<meta name="keywords" content="top 10 seo tips, what is seo, resources, seo software, seo ebook, search engine optimization" />

<meta name="description" content="This is a sample meta description value.">
or
<meta name="description" content="This is a sample meta description value." />
depending on whether the document is HTML or XHTML, respectively.

Keyword Density, Placement & Research

Keyword density: No SEO consultant will tell you the correct keyword density for a keyword, or what the ideal density is for your targeted keywords. The reason is that keyword density is tricky: a high keyword density does not guarantee you a top ranking, but an extremely low keyword density will guarantee you a low position in the SERP. The following techniques show how you can achieve an ideal keyword density for your targeted keyword(s).
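
As a point of reference, keyword density is typically calculated as (number of times the keyword appears ÷ total words on the page) × 100. For example, a 400-word page that uses its keyword phrase 8 times has a density of 8 ÷ 400 × 100 = 2%.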


Keyword placement is important because it gives the search engine an idea of which content you are stressing on your page. We cannot deny that your domain name is the most obvious place for any keyword(s) you are targeting; choosing an appropriate name for your domain is the first step when deciding on a website. The following explains the other useful areas, in addition to the domain name, that you should consider when developing your website.
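
To make the idea concrete, here is a hypothetical fragment showing the keyword "golf putters" placed in several of those areas at once:

<html>
<head>
<title>Golf Putters</title>
<meta name="description" content="Golf putters for every budget." />
</head>
<body>
<h1>Golf Putters</h1>
<p>Body copy that uses the phrase golf putters naturally...</p>
<img src="golf-putters-01.jpg" alt="golf putters" />
</body>
</html>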
  

Keyword research: You have come up with a great idea for your next web project (or your first web project) and are getting ready to build your site.
Most people go out and build a website, and then, after a few months of not getting any visitors, they either give up or think about trying to optimize their site for the search engines. If you do not plan for SEO to be part of the site-building process, you will probably spend a lot of time going back and almost rebuilding your entire site later on.
Part of your planning should be to do some research on keywords for your site and pages. Your selected keywords form the basis for the SEO of your website.
It is getting harder and harder to get a new website to the top of the search engines, so your choice of keywords, and knowing a bit about them, is important to your success. Search engines index words (and phrases). You want the words you choose to be words that are searched for and not too competitive.

Optimize Your Website Title


The Title and META tags should be different on every page of your website if you wish for most search engines to store and list them in the search results. We SEO experts have experimented with these two pieces of code to reach an accepted conclusion about how best to use them. Don't click off this site until you've read the SEO tips below to see what I've discovered works best for search engine optimization.

 
There are different theories about how long your Title should be. Since Google only displays the first 66 or so characters (with spaces), my tip for the title would be to keep it under 66 characters and relevant to the content on the page. However, some may argue that the value of the homepage title warrants additional search term inclusion.
Bar none, the most important of the top SEO tips involves your keywords. If you wish to be on the first page of the search results, you must include your keywords in your Title tag, preferably before all other words in the Title. There's no need to repeat your keywords in the Title; that's interpreted as spam by the search engines. Here is an example of a good Title:
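
For example, a hypothetical Title for the golf putters page, with the keywords first and well under 66 characters:

<title>Golf Putters - Custom Putters On Sale Now</title>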




This one exercise could make or break your SEO campaign. Click-Through Rate (CTR) plays an instrumental role in how relevant Google thinks your website is. By compelling users to click with clear calls-to-action (buy, order, download, beat, fix, etc.) and by using value propositions (guaranteed, on sale now, etc.), you can improve your CTR and search engine ranking. Oh, and don't forget to squeeze your keywords in there as well.

Discover Your Competitors


It's a fact that search engines analyze incoming links to your website as part of their ranking criteria. Knowing how many incoming links your competitors have will give you a fantastic edge. Of course, you still have to discover who your competitors are before you can analyze them.
My tool of choice is SEO Elite, which digs through the major search engines by keyword to not only tell you who your competitors are, but also provide you with an in-depth analysis of each competitor. The analysis includes extremely important linking criteria, such as:
  • Competitor rank in the Search Engines
  • Number of incoming links
  • What keywords are in the title of the linking page
  • % of links containing keywords in the link text
  • The PageRank of linking pages
  • The Alexa traffic ranking information
Here is a screenshot of their SEO software that shows the search results and the module that has the email functionality.


 
Stats such as these play a critical part in determining what tools your website will need to compete in Internet marketing. SEO Elite also offers you the ability to see who the website owner is and even send emails to all websites discovered to have quality link potential.

Find the Best Keywords


It would be a waste of your time to optimize your website for keywords that are not even being searched for; therefore, you should invest some energy into finding the best keywords. There are several SEO tools available on the Internet to help you find them. Don't be deceived by organizations that require you to register first. The two most popular resources are WordTracker and KeywordDiscovery.com.
   

Below is a screenshot from WordTracker that shows the results you'll get when doing a query for "putter". Notice that "golf putters" has the highest search volume, with 100 searches in the last 24 hours, yet there are over 100,000 websites to compete against. Using the tool's Keyword Effectiveness Index (KEI), you'll be able to see that "custom putter" would have a better chance at a higher ranking, since there are only 2,640 competitors.
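
For context, KEI is commonly computed as the square of a keyword's search count divided by the number of competing pages, so high demand against low competition scores best. By that formula, "golf putters" at 100 searches against 100,000+ competitors scores at most 100² ÷ 100,000 = 0.1, which is why the far less contested "custom putter" looks more attractive.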



When using any SEO tool for keyword research, start by keeping your searches broad, like we did in the example above for "putter". The results will always return suggestions, sometimes surprising ones that you may not have thought of.
You can get less comprehensive results by using DigitalPoint's Keyword Suggestion Tool. This SEO tool will give you a summary of information without the KEI. Personally, I like to know how many people are competing before I design a webpage.