Teacher Librarian: The Journal for School Library Professionals

Searching the Web

Volume 31, Number 4, April 2004

Finding Those Missing Links

Holly Gunn

Don’t give up on a site when a URL returns an error message. Many web sites can still be found by using strategies such as URL trimming, searching cached sites, site searching and searching the Wayback Machine.

URL trimming

If a URL gives you an error message, try cutting it back one piece at a time, starting with the filename, until only the server name remains in the location bar. This is called URL trimming. In order to trim a URL, it is essential to understand how a URL is constructed. The uniform resource locator, or URL, is the Web address for an Internet site. A URL is composed of the following parts: the protocol, the server name, and the path and filename.

Take a look at this URL: http://eawc.evansville.edu/nepage.htm

The http is the protocol. The protocol is separated from the rest of the URL by a colon and two forward slashes (://). The next part, eawc.evansville.edu, is the name of the server. The server name is followed by a forward slash. The path (the directory, if there is one) and the filename follow the name of the server; in this example there is no directory, and the filename is nepage.htm. The last part of the filename is the file type: htm, html, ppt, pdf, etc.
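For readers who want to show students the same breakdown programmatically, here is a minimal sketch in Python using the standard urllib.parse module. The function name describe_url and the labels in the result are illustrative choices, not part of any standard.

```python
from urllib.parse import urlsplit
import posixpath


def describe_url(url):
    """Split a URL into the parts described above."""
    parts = urlsplit(url)
    # Separate the directory (path) from the filename.
    directory, filename = posixpath.split(parts.path)
    # The file type is whatever follows the last dot in the filename.
    file_type = posixpath.splitext(filename)[1].lstrip(".")
    return {
        "protocol": parts.scheme,
        "server": parts.netloc,
        "path": directory,
        "filename": filename,
        "file_type": file_type,
    }


for name, value in describe_url("http://eawc.evansville.edu/nepage.htm").items():
    print(name, "=", value)
```

For the example URL, the server is eawc.evansville.edu, the filename is nepage.htm, and the file type is htm.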

The URLs in these examples are incorrect and bring error messages, but URL trimming can find the correct web sites:

Example 1: http://www.americanradioworks.org/feature/vietnam/index.html is the URL for a recommended site about Vietnam.

Example 2: http://www.loc.gov/poetry/180/p180list.html is the URL for a list of read-aloud poems.

To trim the URLs, cut everything back to the server name, and find the web site from the main page. Here are the correct URLs in case you still weren’t able to find them:
Example 1: http://www.americanradioworks.org/features/vietnam/index.html
Example 2: http://www.loc.gov/poetry/180/p180-list.html
Exercises such as these can help students see how URLs are constructed, and teach them how to locate the correct Web address when a URL doesn’t work.
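The trimming steps can also be sketched in a few lines of Python. This is only one possible approach, and the function name trimmed_urls is hypothetical: it lists each shorter address to try, dropping the filename first and then one directory at a time until only the server name is left.

```python
from urllib.parse import urlsplit, urlunsplit


def trimmed_urls(url):
    """Return progressively shorter versions of url, ending at the server name."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    # Drop the last segment each time: first the filename, then each directory.
    while segments:
        segments.pop()
        path = "/" + "/".join(segments)
        if segments:
            path += "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


for candidate in trimmed_urls("http://www.loc.gov/poetry/180/p180list.html"):
    print(candidate)
```

For Example 2, this prints http://www.loc.gov/poetry/180/, then http://www.loc.gov/poetry/, then http://www.loc.gov/. Each address can be tried in turn in the location bar.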

Cached sites

Google Cache is another way to find web sites for URLs that give error messages. Google takes a snapshot of each page it examines as it crawls the Web and caches it as a backup in case the original page is unavailable. Google Cache is a useful way of locating web sites when filenames or server names have changed. For example, a Google search for “searching by file type” teacherlibrarian.com brings two results. The first hit results in a “File Not Found” error message, but a click on the “Cached” link in the search result brings up a view of the page as it was when Google indexed it.

Site search

A site search is another way to locate the correct web page, as long as the file is still on the same server. The process of searching within a site was described in more detail in Teacher Librarian Vol. 30 (5), June 2003. Here is an example of a site search using the search engine Google: “searching the Web” toolkit site:www.teacherlibrarian.com. Note that there is no space between site: and the server name.
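As a rough sketch, a site search query can be assembled in Python from the search terms and the server name. The helper names site_search_query and google_search_url are made up for this example, and the Google address format shown is an assumption about how the search page accepts a query in its q parameter.

```python
from urllib.parse import urlencode


def site_search_query(terms, server):
    """Combine search terms with a site: restriction for one server."""
    return f"{terms} site:{server}"


def google_search_url(query):
    """Build a search address for the query (assumed Google URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = site_search_query('"searching the Web" toolkit', "www.teacherlibrarian.com")
print(query)
print(google_search_url(query))
```

The first line printed is the query as it would be typed into the search box; the second is an address that can be pasted into the location bar.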

Wayback Machine

If you are really serious about finding a dead link, try the Wayback Machine. This Internet archive contains archived copies of sites as they once appeared on the Web. A search for http://www.teacherlibrarian.com in the Wayback Machine brings a listing of all pages from that Internet address that have been captured in the Wayback Machine’s database. See http://web.archive.org/web/*/http://www.teacherlibrarian.com.
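The listing address above follows a simple pattern: the dead URL is appended to http://web.archive.org/web/*/. A minimal Python sketch of that pattern follows; the function name wayback_listing_url is hypothetical.

```python
def wayback_listing_url(url):
    """Build the Wayback Machine listing address for a given URL."""
    # The * asks for all captures of the address rather than one date.
    return "http://web.archive.org/web/*/" + url


print(wayback_listing_url("http://www.teacherlibrarian.com"))
```

The same function can be used with any dead link, such as the incorrect URLs in the trimming examples.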

References:
Google. (2003). Google web search features: Cached links. Retrieved November 30, 2003.
Notess, G. (2003). Strategy 1: URL guessing and cutting. Search Engine Showdown. Retrieved November 30, 2003.
Sherman, C. (2001). SearchDay: The WayBack Machine: A Web archives search engine. SearchEngineWatch.com. Retrieved November 30, 2003.
URL trimming. (2001). Prosser Schools. Retrieved November 30, 2003.




Holly Gunn is the teacher-librarian at Sackville High School, Nova Scotia. She can be reached at hgunn@accesscable.net.

