This book is licensed under a Creative Commons by-nc-sa 3.0 license. See the license for more details, but that basically means you can share this book as long as you credit the author (but see below), don't make money from it, and do make it available to everyone else under the same terms.
This content was accessible as of December 29, 2012, and it was downloaded then by Andy Schmitz in an effort to preserve the availability of this book.
Normally, the author and publisher would be credited here. However, the publisher has asked for the customary Creative Commons attribution to the original publisher, authors, title, and book URI to be removed. Additionally, per the publisher's request, their name has been removed in some passages. More information is available on this project's attribution page.
For more information on the source of this book, or why it is available for free, please see the project's home page. You can browse or download additional books there. To download a .zip file containing this book to use offline, simply click here.
Search engine traffic is vital to a Web site; without it, chances are the site will never fulfill its marketing functions. It is essential that the search engines can see the entire publicly visible Web site, index it fully, and consider it relevant for its chosen keywords.
Here are the key considerations for search engine optimization when it comes to Web development and design.
URLs (uniform resource locators), alt tagsTextual information that is displayed if an image cannot be displayed; used by search engines to determine what an image is., title tags, and metadataInformation that can be entered about a Web page and the elements on it that provides context and relevancy information to search engines; these used to be an important ranking factor. all describe a Web site and its pages to both search engine spiders and people. (And don’t worry: these words are all described for you in what follows.) Chances are, clear descriptive use of these elements will appeal to both.
URLs should be as brief and descriptive as possible. This may mean that URLs require server-sideOperations that take place on the server. rewriting so as to cope with dynamic parametersThe elements of a URL that are dynamically generated. You can identify them by question marks and ampersands in a URL. in URLs. Does that sound a little heavy? The examples below should make this clearer.
The following is a comparison of the URLs for Diamante Dog Collars, an imaginary product for sale on two imaginary Web sites:
The first example has dynamic parameters—these are shown by the question mark and the ampersand—and uses categories that make sense to the database (e.g., pid=1201), but they make little sense to the user. Dynamic parameters are so called as they change depending on the information being requested. Next time you use a search engine to search for information, take a look at the URL and see if you can spot the dynamic parameters.
The second example is far more user friendly and clearly indicates where in the site the user is. You even start getting a good idea of the architecture of the Web site from just one URL.
More than two dynamic parameters in a URL increase the risk that the URL may not be spidered. The search engine would not even index the content on that page.
Lastly, well-written URLs can make great anchor text. If another site is linking to yours and they use just the URL, the search engine will do a better job of knowing what the page is about if you have a descriptive URL.
Have you ever waited for a page to load and seen little boxes of writing where the images should be? Sometimes they say things like “topimg.jpg” and sometimes they are much clearer and you have “Cocktails at sunset at Camps Bay.”
Since search engines read text, not images, descriptive tags are the only way to tell them what the images are, but these are still essentially for users. Text readers for browsers will also read out these tags to tell the user what is there. Meaningful descriptions certainly sound a lot better than “image1,” “image2,” and “image3.”
Just as you can have the alt tag on an image hypertext markup language (HTML)The code that is used to write most Web sites. element, you can have a title attribute on almost any HTML element—most commonly on a link. This is the text that is seen when a user hovers over the element with the mouse pointer. It is used to describe the element or what the link is about. As this is text, it will also be read by search engine spiders.
Title tags, or what appears on the top bar of your browser, are used by search engines to determine the content of that page. They are also often used by search engines as the link text on the search engines results’ page, so targeted title tags help drive click-through rates. Title tags should be clear and concise (it’s a general rule of thumb that all tags be clear and concise, you’ll find). Title tags are also used when bookmarking a Web page.
The title tag appears in the browser and on the search engine results page (SERP), and the meta description can appear on the SERP as well.
Meta tags are where the developer can fill in information about a Web page. These tags are not normally seen by users. If you right click on a page in a browser and select “view source,” you should see a list of entries for “<meta name=.”
These are the metadata. In the past, the meta tags were used extensively by search engine spiders, but since so many people used this to try to manipulate search results, they are now less important. Metadata now act to provide context and relevancy rather than higher rankings. However, the meta tag called “description” often appears on the search engine results page (SERP) as the snippet of text to describe the Web page being linked to. This is illustrated in Figure 13.4. If the description is accurate, well written, and relevant to the searcher’s query, these descriptions are more likely to be used by the search engine. And if it meets all those criteria, it also means the link is more likely to be clicked on by the searcher.
When it comes to Web development, the copy that is shown on the Web page needs to be kept separate from the code that tells the browser how to display the Web page. This means that the search engine spider can discern easily between what is content to be read (and hence scanned by the spider) and what are instructions to the browser. A cascading style sheet (CSS)An approach to Web design that aims for lightweight code and standards-compliant Web sites. can take care of that and is covered further in this chapter.
The following text styles cannot be indexed by search engines:
If the search engine cannot see the text on the page, it means that it cannot spider and index that page.
If an XML (extensible markup language) file is used for the content in a Macromedia Flash file, then the content can be easily read by search engine spiders.
Well-organized information is as vital for search engines as it is for users. An effective link structure will provide benefits to search rankings and helps to ensure that a search engine indexes every page of your site.
Make use of a site map, linked to and from every other page in the site. The search engine spiders follow the links on a page, and this way, they will be able to index the whole site. A well-planned site map will also ensure that every page on the site is within a few clicks of the home page.
There are two site maps that can be used: an HTML site map that a visitor to the Web site can see, use, and make sense of and an XML (extensible markup language)A standard used for creating structured documents. site map that contains additional information for the search engine spiders. An XML site map can be submitted to search engines to promote full and regular indexing. Again, a dynamically generated site map will update automatically when content is added.
Using a category structure that flows from broad to narrow also indicates to search engines that your site is highly relevant and covers a topic in depth.
Have you noticed that sometimes several URLs can all give you the same Web page? For example, refer to the following:
All the above can be used for the same home page of a Web site. However, search engines see these as three separate pages with duplicate content. Search engines look for unique documents and content, and when duplicates are encountered, a search engine will select one as canonical, and display that page in the SERPs (search engine results pages). However, it will also dish out a lower rank to that page and all its copies. Any value is diluted by having multiple versions.
Lazy Webmasters sometimes forget to put any kind of redirect in place, meaning that http:// websitename.com doesn’t exist, while http://www.websitename.com does. This is termed “Lame-Ass Syndrome” (LAS) by Quirk, a fitting moniker.
Having multiple pages with the same content, however that came about, hurts the Web site’s search engine rankings. There is a solution: 301 redirects can be used to point all versions to a single, canonical version.
A robots.txt file restricts a search engine spider from crawling and indexing certain pages of a Web site by giving instructions to the search engine spider, or bot. This is called the Robots Exclusion ProtocolA protocol used to indicate to search engine robots which pages should not be indexed.. So, if there are pages or directories on a Web site that should not appear in the SERPs, the robots.txt file should be used to indicate this to search engines.
If a search engine robot wants to crawl a Web site URL—for example, http://www.websitename.com/welcome.html—it will first check for http://www.web sitename.com/robots.txt.
Visiting the second URL will show a text file with the following:
Here, “User-agent: *” means that the instruction is for all bots. If the instruction is to specific bots, it should be identified here. The “Disallow: /” is an instruction that no pages of the Web site should be indexed. If there are only certain pages or directories that should not be indexed, they should be included here.
For example, if there is both an HTML and a PDF (portable document format) version of the same content, the wise Webmaster will instruct search engine bots to index only one of the two to avoid being penalized for duplicate content.
The robots.txt file is publicly accessible, so although it does not show restricted content, it can give an idea of the content that a Web site owner wants to keep private. A robots.txt file needs to be created for each subdomain.
Here is a robots.txt file with additional information.
Instructions to search engine robots can also be given in the meta tags. This means that instructions can still be given if you only have access to the meta tags and not to the robots.txt file.
Make sure that both visitors to your Web site and search engines can see it all by following these guidelines: