This page describes which web page features cause difficulties for search engines.
Some search engines catalogue only what they find in the user-visible
portions of text of your web pages.
They ignore anything in META tags, comments,
and anything in Java and Perl scripts or CGI directories.
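To get a rough feel for what such an engine would catalogue, the sketch below strips a page down to its user-visible text using Python's standard `html.parser` module. The class name `VisibleTextExtractor`, the helper `visible_text`, and the sample page are all made up for illustration; real search engines use their own rules, so treat this only as an approximation.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text a reader would see; skip scripts, styles, and comments."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # META tags carry their content in attributes, so they add no text here.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

    # handle_comment is not overridden, so comments are silently dropped.


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page used only for this illustration.
SAMPLE = """<html><head>
<meta name="keywords" content="widgets, gadgets">
<script>var hidden = "not indexed";</script>
</head><body>
<!-- a comment -->
<h1>Widgets for sale</h1>
<p>Hand-made brass widgets.</p>
</body></html>"""

print(visible_text(SAMPLE))
```

If the printed text does not describe your page, a text-only engine has little to catalogue.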
The more frequently words are found in web pages catalogued by most
search engines, the more difficult it is to find any particular page
containing those frequently used words.
For example, "HTML forms and CGI" will appear in millions of web pages.
Your page could appear as number 51,939 on a search result list of
93,000 web pages that contain all four of these words.
Don't prevent search engines from indexing your site
by doing things they CAN'T or WON'T deal with.
There are a number of things that are 'so cool' that search engines
either cannot deal with at all or refuse to deal with
for practical reasons.
Search-engine users will typically skip over entries where the displayed
summary consists of several lines of JavaScript or tells the user what
browsers the site is best viewed with. Such things simply do not belong
at the beginning of a page (some would argue that they don't belong
anywhere else on the page), nor do link-exchange blurbs, awards, etc.
DON'T use fancy JavaScript 'drop menu' navigation or Java-only
navigation without providing bypass navigation methods (links).
Avoid FRAMEs. Many search engines will refuse to index FRAMEd sites.
If you use page redirection,
if your home page has lots of graphics or Java scripts,
or if most of your site is contained in a database or in Perl and CGI scripts,
consider creating a text-only web page that describes your company's or organization's site.
Spiders are rarely capable of indexing automatically redirected pages.
Avoid Content-free Pages
Search engines also give more-or-less useless entries when indexing
splash screens that say little more than "click here to enter,"
pages in which much of the text is actually in the form of images with no ALT text, and
pages that consist mostly of imagemaps.
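One way to spot pages whose text lives in images is to list every image that has no ALT text. The sketch below does this with Python's standard `html.parser`; the names `MissingAltFinder` and `images_without_alt` and the sample markup are hypothetical, and the results are only candidates to review by hand.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Record the src of every IMG tag with a missing or empty ALT attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # A bare or empty alt gives the search engine nothing to index.
        # Empty alt can be fine for purely decorative images, so review each hit.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_without_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Hypothetical markup used only for this illustration.
SAMPLE = (
    '<img src="logo.gif">'
    '<img src="banner.gif" alt="Acme Widgets: hand-made brass widgets">'
    '<img src="spacer.gif" alt="">'
)

print(images_without_alt(SAMPLE))
```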
Framesets with no NOFRAMES content typically don't show up at all;
if a frameset's NOFRAMES content
starts off with "your browser does not support frames...," that's what
the search-engine user will see as the first entry for your site. In
fact, frames create a lot of difficulty for search-engine users.
If you use frames, consider using <noframes> to include the
information for searchers and for people whose browsers
do not support frames.
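The frameset problems described above can be checked roughly with a few regular expressions. This is a quick sketch, not a full HTML parser: the function name `check_frameset`, its return strings, and the sample pages are assumptions made for illustration.

```python
import re

FRAMESET_RE = re.compile(r"<frameset\b", re.IGNORECASE)
NOFRAMES_RE = re.compile(
    r"<noframes\b[^>]*>(.*?)</noframes\s*>", re.IGNORECASE | re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")
BOILERPLATE_RE = re.compile(r"your browser does not support frames", re.IGNORECASE)


def check_frameset(html):
    """Classify a page by how useful its NOFRAMES content is to a searcher."""
    if not FRAMESET_RE.search(html):
        return "no frameset"
    match = NOFRAMES_RE.search(html)
    if match is None:
        return "missing noframes"
    # Drop tags and collapse whitespace to approximate the displayed text.
    text = " ".join(TAG_RE.sub(" ", match.group(1)).split())
    if not text:
        return "empty noframes"
    if BOILERPLATE_RE.match(text):
        # This is the message a searcher would see as the site's summary.
        return "boilerplate noframes"
    return "ok"


# Hypothetical pages used only for this illustration.
GOOD = (
    '<frameset cols="20%,80%"><frame src="nav.html"><frame src="main.html">'
    "<noframes><body><p>Acme Widgets sells hand-made brass widgets.</p></body>"
    "</noframes></frameset>"
)
BAD = (
    '<frameset cols="20%,80%"><frame src="main.html">'
    "<noframes>Your browser does not support frames.</noframes></frameset>"
)

print(check_frameset(GOOD))
print(check_frameset(BAD))
```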
Provide Substantial Content, Not Hype
Overall, the best thing you can do is make sure that your pages have a
high content-to-fluff ratio; not only will this make them easier to
index, it will also keep the user reading the site once he gets to it.
Remember that someone searching is looking for information, not hype;
he doesn't care how great you think your company is; all he cares about
is whether your company's products and services can meet his needs.
Removing or Revising Pages
If you do not want your site to be catalogued at all,
follow the instructions for robots.txt files.
Robots can be made to ignore part or all of
your site when this file is used.
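As a small illustration, the sketch below feeds a sample robots.txt to Python's standard `urllib.robotparser` and asks whether a given path may be fetched. The directory names, the `ExampleBot` user agent, and the helper `is_allowed` are hypothetical; well-behaved robots honour these rules, but nothing forces a robot to obey them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides two directories from all robots.
PARTIAL_BLOCK = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/
"""

# Hypothetical robots.txt that asks all robots to ignore the whole site.
FULL_BLOCK = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, path, user_agent="ExampleBot"):
    """Return True if the rules in robots_txt let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed(PARTIAL_BLOCK, "/index.html"))
print(is_allowed(PARTIAL_BLOCK, "/cgi-bin/search.pl"))
print(is_allowed(FULL_BLOCK, "/index.html"))
```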
Be sure to have every search engine remove any old pages that you no
longer want catalogued.
If you revise your pages, resubmit them so they can be re-indexed.