Searching the Web
Volume 31, Number 4, April 2004
Finding Those Missing Links
Holly Gunn
Don’t give up on a site when a URL returns an error message. Many web sites can still be found by using strategies such as URL trimming, searching cached pages, site searching and searching the Wayback Machine.
URL trimming
If a URL gives you an error message, try cutting the filename and then the directories from the URL until only the server name remains in the location bar. This is called URL trimming. To trim a URL, it is essential to understand how a URL is constructed. The uniform resource locator, or URL, is the Web address for an Internet site. A URL consists of the following parts: the protocol, the server name, and the path and filename.
Take a look at this URL: http://eawc.evansville.edu/nepage.htm
The protocol is http. Protocols are separated from the rest of the URL by a colon and two forward slashes (://). The next part, eawc.evansville.edu, is the name of the server. The server name is followed by a forward slash. The path, or directory if there is one, and the filename follow the name of the server. The last part of the filename is the extension, which shows the file type: htm, html, ppt, pdf, etc.
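For readers who want to check the anatomy of a URL programmatically, here is a minimal Python sketch using the standard library's urllib.parse module. The function name url_parts and its dictionary keys are illustrative choices, not a standard; note that urlsplit reports the protocol without the colon and slashes.

```python
import posixpath
from urllib.parse import urlsplit

def url_parts(url: str) -> dict:
    """Split a URL into protocol, server name, path, filename and file type."""
    parts = urlsplit(url)
    # The path is everything after the server name; the final segment is the filename.
    directory, filename = posixpath.split(parts.path)
    # The file type is the extension after the last dot (empty if there is none).
    extension = posixpath.splitext(filename)[1].lstrip(".")
    return {
        "protocol": parts.scheme,  # "http" (urlsplit drops the "://")
        "server": parts.netloc,    # the server name
        "path": directory,         # "/" when the file sits at the top level
        "filename": filename,
        "file_type": extension,
    }

print(url_parts("http://eawc.evansville.edu/nepage.htm"))
# {'protocol': 'http', 'server': 'eawc.evansville.edu', 'path': '/', 'filename': 'nepage.htm', 'file_type': 'htm'}
```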
The URLs in these examples are incorrect and produce error messages, but URL trimming can find the correct web sites:
Example 1: http://www.americanradioworks.org/feature/vietnam/index.html is the URL for a recommended site about Vietnam.
Example 2: http://www.loc.gov/poetry/180/p180list.html is the URL for a list of read-aloud poems.
To trim the URLs, cut everything back to the server name, and find the web site from the main page. Here are the correct URLs in case you were still unable to find them:
Example 1: http://www.americanradioworks.org/features/vietnam/index.html
Example 2: http://www.loc.gov/poetry/180/p180-list.html
Exercises such as these can help students see how URLs are constructed and teach them how to locate the correct Web address when a URL doesn’t work.
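The trimming steps can also be generated automatically. The Python sketch below (the function name trim_candidates is hypothetical) lists the original URL followed by each shorter version, cutting the filename and then one directory at a time, down to the server name alone. It only builds candidate addresses to paste into the browser; it does not check whether any of them loads, and the trimmed versions drop any query string.

```python
from urllib.parse import urlsplit, urlunsplit

def trim_candidates(url: str) -> list:
    """List the URL, then each trimmed version, ending with the server name."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = [url]
    # Cut the last segment (the filename first, then each directory) one at a time.
    while segments:
        segments.pop()
        path = "/" + "".join(s + "/" for s in segments)
        # Trimmed versions keep only protocol, server and path (no query string).
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates

for candidate in trim_candidates("http://www.americanradioworks.org/feature/vietnam/index.html"):
    print(candidate)
# http://www.americanradioworks.org/feature/vietnam/index.html
# http://www.americanradioworks.org/feature/vietnam/
# http://www.americanradioworks.org/feature/
# http://www.americanradioworks.org/
```

In Example 1 the directory name itself is the error (feature instead of features), so trimming has to reach the server name before the main page can lead you to the right address.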
Cached sites
Google Cache is another way to find web sites whose URLs give error messages. As Google crawls the Web, it takes a snapshot of each page it examines and caches these snapshots as a backup in case the original page is unavailable. Google Cache is a useful way of locating web sites when filenames or server names have changed. For example, a Google search for “searching by file type” teacherlibrarian.com returns two results. The first hit produces a “File Not Found” error message, but clicking the “Cached” link in the search result brings up a view of the web site as it was when Google indexed it.
Site search
A site search is another way to locate the correct web page, as long as the file is still on the same server. The process of searching within a site was described in more detail in Teacher Librarian Vol. 30 (5), June 2003. Here is an example of a site search using the Google search engine (note that there is no space after site:):
“searching the Web” toolkit site:www.teacherlibrarian.com
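A site search can also be written out as a search-results address. This is a minimal Python sketch, assuming Google's standard https://www.google.com/search?q= address; the function name site_search_url is hypothetical. It shows the site: operator written with no space before the server name, and the quotation marks kept around the phrase so that Google searches for it as an exact phrase.

```python
from urllib.parse import urlencode

def site_search_url(phrase: str, other_terms: str, site: str) -> str:
    """Build a Google results URL for an exact phrase plus other terms,
    limited to one server with the site: operator."""
    # Google expects no space between "site:" and the server name.
    query = f'"{phrase}" {other_terms} site:{site}'
    # urlencode percent-encodes the quotes and colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("searching the Web", "toolkit", "www.teacherlibrarian.com"))
# https://www.google.com/search?q=%22searching+the+Web%22+toolkit+site%3Awww.teacherlibrarian.com
```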
Wayback Machine
If you are really serious about finding a dead link, try the Wayback Machine. This Internet archive contains copies of sites as they once appeared on the Web. A search for http://www.teacherlibrarian.com in the Wayback Machine returns a listing of all pages from that Internet address that have been captured in the Wayback Machine’s database. See http://web.archive.org/web/*/http://www.teacherlibrarian.com.

References:
Google. (2003). Google web search features: Cached links. Retrieved November 30, 2003.
Notess, G. (2003). Strategy 1: URL guessing and cutting. Search Engine Showdown. Retrieved November 30, 2003.
Sherman, C. (2001). SearchDay: The WayBack Machine: A Web archives search engine. SearchEngineWatch.com. Retrieved November 30, 2003.
URL trimming. (2001). Prosser Schools. Retrieved November 30, 2003.

Holly Gunn is the teacher-librarian at Sackville High School, Nova Scotia. She can be reached at hgunn@accesscable.net.