Teacher Librarian Feature Article
Volume 30, Number 4, April 2003
Trust or Trussed? Has Turnitin.com Got It All Wrapped Up?
John Royce
Almost every week there is a report on the prevalence of plagiarism
from Internet sources, or news of a university, school or school district
which has tired of student plagiarism and has signed up for the services
of Turnitin.com. Headlines announce: "75
percent of students admit to fraud, studies show" (Oakland Tribune,
December 9, 2002), or, "Using the Internet to catch cheaters" (Newsday.com,
December 17, 2002).
Turnitin is reported to investigate more than 10,000 papers a day, and
about 3,000 of these turn out to be plagiarized to a significant extent
(Paper chase, 2002; Slaton, 2002). Turnitin itself claims that the company
gains three new users a minute (Turnitin.com intro, 2002). It is a remarkable
success story. As a result, the levels of detected plagiarism are reported
to be falling in those institutions which subscribe. It seems enough
for a school to announce that it has subscribed to Turnitin for its levels
of suspected plagiarism to fall. Turnitin's reputation deters would-be plagiarists.
Two kinds of plagiarism
There are two kinds of plagiarism, and Turnitin aims to root out both.
The first kind of plagiarism is from published materials. The original
is openly published in some form. The original source might be a book,
a newspaper, a television program, CD-Rom, a discussion list posting
or a web page. Somewhere, a record exists and it is, or at least it
was, openly available. The plagiarist found it, and so too might
anybody else looking for the same information. In this regard, however,
note that Turnitin seeks matches only on the Internet. It does not claim
to seek among printed, broadcast or other materials; it searches only
the Internet. As more and more students turn to the Internet for their
information, this may not, at first glance, be too much of a drawback
to the subscribers (Canadian students..., 2001; Lenhart, 2001).

Plagiarism of this first kind is traceable. It might take a long time,
but because the material was published it should be possible to find
the original. However, even Internet sources can be difficult to track
down. Search engines search only a fraction of the Internet. Some search
only the World Wide Web. Different search engines find different hits,
and no single search engine finds everything (Notess, 2002).
A lot of valuable material is found on what has become known as the
Invisible Web. Sherman and Price (2001, p. 57) define the Invisible Web
as: "Text pages, files, or other often high-quality authoritative
information available via the World Wide Web that general-purpose search
engines cannot, due to technical limitations, or will not, due to deliberate
choice, add to their indices of Web pages." This includes material
that is openly available on databases that are free to the end-user.
Invisible web pages can often be found easily but not by the general
search engines. Unless one can work out which resources were used or
replicate the actual search, re-finding the originals may be nigh impossible.
Moreover, many web pages are unstable, here today and gone tomorrow,
especially pages published by the mass media. Online journals make their
latest pages available, but past issues may be available only to subscribers,
or may disappear altogether. Some online journals constantly update their
pages. The URL remains but the article vanishes, replaced by another.
Plagiarism of the second kind is from unpublished materials. These could
be personal diaries and letters, a friend's homework, even one's
own work originally written for a different teacher last year. Also included
is unpublished material passed along a network or fraternity, from one
friend to another, from one year to another. This is the form of plagiarism
discovered in a well-publicized case at the University of Virginia in
2001, when more than 120 students were suspected of plagiarizing from
the same material over five years. Because unpublished work is not openly
available, it may be impossible to track down the original material which
has been copied and plagiarized.

There is also much concern over the ever-increasing number of cheat
sites and paper mills which flourish on the Internet. Some sites do not
charge for their services, and may publish the actual papers on the Internet,
leading to plagiarism of the first kind, open publication. Papers found
on paper mills are expensive and usually at university level. They may
even be custom written for the customer. These papers are often sent
to the buyer by e-mail or fax, and no search engine will find these materials.
They are not there to be found. These lead to plagiarism of the second
kind. Curiously, Google Answers may be providing a cheap school-level
alternative to the paper mills (Goot, 2002). Even more curiously, Google's search
engine does not index Google Answers.
All these factors add to the frustrations of the chase.
Enter Turnitin.com
Turnitin aims to track down both kinds of plagiarism.
That said, it is important to realize that Turnitin does not find plagiarism.
What it does is find sequences of words in submitted documents which
match sequences of words in documents in its database, or sequences of
words in documents on the Internet.
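
The matching described above can be pictured with a minimal sketch in Python. This is not Turnitin's actual algorithm, which is not described in the sources cited here; the function names `word_sequences` and `shared_sequences` and the window of four words are assumptions chosen purely for illustration. The sketch reports runs of consecutive words that two texts have in common, and it says nothing about whether those words were properly cited.

```python
import re

def word_sequences(text, n=4):
    """Return the set of all runs of n consecutive words in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_sequences(submitted, source, n=4):
    """Return the word runs of length n that appear in both texts."""
    return word_sequences(submitted, n) & word_sequences(source, n)

# Hypothetical texts, invented for this illustration.
source = "Search engines search only a fraction of the Internet."
essay = "As we know, search engines search only a fraction of the Internet today."

matches = shared_sequences(essay, source)
print(len(matches) > 0)  # True: the two texts share runs of four words
```

A shared run of words is a match, not a verdict: the same output would appear whether the student quoted and cited the source or lifted it without acknowledgement.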
Every paper submitted to Turnitin, apart from those submitted as part
of a free trial, is added to the Turnitin database. This happens even
if no matches to other documents are found. In this way, Turnitin hopes
to nail plagiarism of the second kind, plagiarism from unpublished sources.
A match will not be found the first time an essay is submitted from a
paper mill or an informal network, but if it is ever submitted again,
in whole or in part, it will be found. The more schools subscribe to
Turnitin, the more papers are submitted and the larger the database becomes.
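
The way a growing database catches a resubmitted paper can be sketched as follows. The class `SubmissionDatabase`, its method `check_and_add`, the paper ids and the five-word window are all hypothetical, and the sketch makes no claim about how Turnitin stores or compares papers; it only illustrates the principle that a paper finds no match the first time it is submitted but is found the second time.

```python
import re

class SubmissionDatabase:
    """Toy store of submitted papers, keyed by a paper id."""

    def __init__(self, n=5):
        self.n = n
        self.papers = {}  # paper id -> set of n-word runs

    def _runs(self, text):
        words = re.findall(r"[a-z0-9']+", text.lower())
        return {tuple(words[i:i + self.n]) for i in range(len(words) - self.n + 1)}

    def check_and_add(self, paper_id, text):
        """Return ids of earlier papers sharing a word run, then store this paper."""
        runs = self._runs(text)
        hits = sorted(pid for pid, old in self.papers.items() if runs & old)
        self.papers[paper_id] = runs  # stored even when no match is found
        return hits

# Hypothetical essay text and school names, invented for this illustration.
db = SubmissionDatabase()
essay = "The causes of the war were many and they were deeply tangled together."
first = db.check_and_add("school-A-2001", essay)
second = db.check_and_add("school-B-2002", "Introduction. " + essay)
print(first)   # []
print(second)  # ['school-A-2001']
```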
Turnitin also searches the Internet in an attempt to find source material
which has simply been cut and pasted from a published source into a student's
essay. Because they are automated, the searches will be more persistent
than a human searcher can achieve, and the Turnitin search engine will
handle greater numbers of searches than a human searcher.
Turnitin thus has two strengths: its ability to search the Internet
faster and longer than any individual, and the ever-growing database
of submitted essays.
And to judge from the press reports and the commendations
posted on the Turnitin web site, it is doing a good job. Turnitin has
seen several rivals come and go. Integriguard and its simplified and
free version at HowOriginal.com have disappeared. So too has Findsame,
although the original company, Digital Integrity, is still in business.
Turnitin has been aggressively marketing itself in the last few years,
and its commercial success is evidenced in the headlines cited earlier.
The company claims more than 1.5 million new users a year (Turnitin.com
intro, 2002). John Barrie, the founder of Turnitin, has declared: "In
very short order, we'll have it all wrapped up. We'll become the next
generation's spell checker.... There will be no room for anybody else,
not even a Microsoft, to provide a similar type of service because we
will have the database" (Masur, 2001).
Some nagging doubts
Some of the reported stories would seem to show that many students are
incredibly naïve about plagiarism and their chances of getting caught, and
that many instructors are equally naïve. Surprise is often expressed
at how easy it is to find essays, to download and to cut and paste. Surprise
is often expressed at how easy it is to find the originals when plagiarism
is suspected. If only! Many instructors claim they can recognize when
a student's voice or writing style changes, especially if a mediocre
student suddenly shines. The change may be less obvious when a mediocre
student produces mediocre work, and a lot of material on the Internet,
and especially on the cheat sites, is mediocre (Royce, 2001).
When plagiarism is suspected, the burden of proof lies with the instructor.
The only way definitively to prove plagiarism is to find a word-for-word
copy of the original passage or passages plagiarized, and then to show
that the student's attribution is missing or wrong. If the match
is just a short sentence or two, the student might be given the benefit
of the doubt, especially if the rest of the paper is in order. But when
large pieces of the paper match other documents word for word with false
attribution or none at all, then plagiarism has surely taken place.

It is more difficult to prove plagiarism when there has been much transformation
of sentence structure and use of different words. It can be very difficult
to prove that ideas have been plagiarized, even when the work is not
the student's usual voice. In these cases, a student
who protests innocence might well defend him- or herself successfully.
However strong the circumstantial evidence, however unable the student
is to explain the research process or to explain words used or provide
other proof of original work, the case cannot be proven unless
the student confesses. The only damning proof is a copy of the original.
And it is quite possible that those whose plagiarism is proven beyond
doubt might still fight and beat a reluctant faculty or a weak school
board, as happened in Piper School District, following which both the
teacher and her principal resigned (Carroll, 2002; Principal..., 2002).
Of course, if the instructions are to provide copies of sources used,
or to provide full citations and bibliographical referencing, then the
student might be downgraded for failing to fulfil all aspects of the
assignment. If citations and references are provided then it might be
easier to see just how, and if, the original sources have been
used. The student's essay might point to a need for more help with
the mechanics of research reporting rather than providing proof of plagiarism.
As noted, it is often difficult to find an original source. Sometimes
a simple Internet search for the title, a few keywords or a string of
consecutive words really does suffice. Many times detection takes longer.
The concern is that those who think detection is a simple matter of typing
in a few keywords will also think that if no matches are found then the
suspicions are not valid.
Similarly, the reports and the sales figures suggest there is great
trust in Turnitin. Hildebrand notes (2002, para. 7-8), "The system
isn't foolproof, but its very presence in schools seems to serve
as a deterrent. 'You don't have to use it,' observes [one teacher].
'Students just have to know you have it.'" It is another concern
that Turnitin's results may be accepted without further questioning;
if a match is found then the student is automatically believed to be
guilty, while if no match is found, the student is automatically considered
innocent. The bulk of comments on Turnitin's testimonials page provides
evidence of great and sometimes unquestioning trust in Turnitin's results and
that makes these concerns very real (Turnitin.com Testimonials, 2002).
Turnitin itself does not claim to be infallible. Which is just as well.
It isn't.
How well does Turnitin perform?
Several reports draw attention to areas which Turnitin does not cover,
and they point to shortcomings in the areas which Turnitin does claim
to cover. The fact that Turnitin finds matches in one-third of papers
submitted means little, unless one knows that only one-third of all papers
written contain plagiarism. A true test of a detection service's
detection rate might require a control group, such as a set of papers
where plagiarism is known because the investigator has created the
submission.
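
The point can be made with a little arithmetic. The sketch below uses invented figures (they are assumptions, not results from any study cited here): two hypothetical services each flag one-third of a batch of papers, yet they perform very differently once it is known which papers really were plagiarized.

```python
def detection_rate(flagged, plagiarized):
    """Share of known plagiarized papers that the service flagged."""
    return len(flagged & plagiarized) / len(plagiarized)

def false_alarm_rate(flagged, plagiarized, all_papers):
    """Share of known genuine papers that the service flagged anyway."""
    genuine = all_papers - plagiarized
    return len(flagged - plagiarized) / len(genuine)

# Hypothetical control group: 30 papers, of which papers 0-9 are known
# to be plagiarized because the investigator compiled them.
all_papers = set(range(30))
plagiarized = set(range(10))

# Both invented services flag 10 of the 30 papers (one-third).
service_a = set(range(10))     # flags exactly the plagiarized papers
service_b = set(range(5, 15))  # flags 5 plagiarized and 5 genuine papers

for name, flagged in [("A", service_a), ("B", service_b)]:
    print(name,
          detection_rate(flagged, plagiarized),
          false_alarm_rate(flagged, plagiarized, all_papers))
# A 1.0 0.0
# B 0.5 0.25
```

The proportion of papers flagged is identical in both cases; only the control group reveals that the second service misses half the plagiarism and accuses a quarter of the honest writers.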
This has been done in at least four investigations.
Robin Hill carried out a small test using three completely plagiarized
essays. Turnitin failed to find one essay, and found only one of the
two sources used to create the second essay. Turnitin failed to find
the third essay, but did find a number of false hits, matching strings
of words in other papers. Since this last was a paper on recombinant
DNA, Hill suggests that the false hits were found because of the limited
language available in this field of study (Hill, 2002).
The Joint Information Systems Committee (JISC) carried out a wide-ranging
investigation in several British post-secondary institutions. One of
the issues investigated was the success of Turnitin and other detection
agents in detecting plagiarism. The investigating team used a number
of genuine essays, and also created 11 essays from a variety
of sources, including paper mills or essay banks. Students were warned
before the project stage that their work would be used to test the services.
Turnitin performed best of the software and agencies investigated. The
JISC report gives Turnitin a five-star rating (excellent) for tracking
down cases of collusion, and four stars (good) for discovering cut-and-paste
and paper mill origins (Chester, 2000; Large, 2001).
However, respondents to the JISC survey on plagiarism believe that 74
percent of plagiarism originates from textbooks and theses, and only
24 percent from the Internet. At a conference held to discuss the various
project reports, there was strong support for textbooks and also for print
and online journals to be included in the Turnitin database (Large, 2001).
The JISC Reports also noted areas not covered by the project:
- Detection of text converted to a foreign language and then converted
back to English
- Detection of essays converted from a foreign language
- Plagiarism of diagrams, pictures or graphs
and certain other forms of cheating (Large, 2001, p. 2).
My own research forms the third investigation, another attempt to investigate
the various plagiarism detection services. My interest grew in part from
the realization that many of the papers available on the cheat sites
themselves contained huge amounts of plagiarized material, not always
detectable using ordinary search engines. I too compiled a number of
essays from various sources. It is very easy to plagiarize; could I get
away with it? With my plagiarized essays and a number of genuine student
essays, I tested various free services, including services which normally
charge but which do offer free trial periods. All services were found wanting.
My favorite was Findsame, but this company has since collapsed and its
services are no longer available.
I found that Turnitin found no matches for material lifted from Usenet
discussion groups and discussion lists; found no matches for material
lifted from online encyclopedias; and did not track down material lifted
from journals located in subscription databases. There is irony here,
for when much on the Internet is of dubious worth, many librarians encourage
students to use the periodicals databases as worthy published sources
(Valenza, 2001). Yet these worthy sources are less likely to be detected
and a guilty student can escape detection. Nor did Turnitin work well
with transformations and paraphrases. In the Piper High School case,
one parent reportedly said her daughter is not sure now how much
she needs to rewrite research material before she can use it (Carroll,
2002, para. 16). Rewriting is not the issue; a piece of work authored
by someone else but rewritten in one's own words still needs a citation
to the author whose thoughts, if not words, are being used.
On the other hand, Turnitin did find matches for small contentless strings
of words in completely irrelevant documents. Several times it made false
accusations of plagiarism, but it missed by far the greater part of the
material I really had plagiarized, missing 15 of 18 plagiarized passages
in one of the essays (Royce, 2001).
Turnitin seeks only one match, and this is a particular concern. As
soon as Turnitin finds a match, it stops seeking other matches for
that particular string or section of paper. It cannot distinguish between
material used and cited correctly, material for which false citations
are made, and material lifted without any citation at all; a match is
all, and any match will do.
Also of concern is that a student can quote and cite accurately, but
still be accused of plagiarism. If a student does give citations, then
the instructor must check them out. This is the first rule of plagiarism
detection. The fact that Turnitin finds a passage on a different web
site to that cited does not necessarily prove plagiarism. On the Internet,
there is frequent double and multiple posting of identical pages on several
sites. Of course, if a match is found with no attribution at all, this
could be a more definite indication, but it still needs to be checked
out.
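
The consequence of stopping at the first match can be sketched as follows. The function names, the tiny index of sources and the URLs are all invented for illustration, and the sketch does not claim to reproduce how Turnitin is implemented; it only shows why a single reported match can point to a different site from the one the student cited.

```python
def first_match(passage, index):
    """Return the first source containing the passage, then stop looking."""
    for url, text in index:
        if passage in text:
            return url
    return None

def all_matches(passage, index):
    """Return every source containing the passage."""
    return [url for url, text in index if passage in text]

# Hypothetical index: the same page is posted on two sites.
index = [
    ("http://mirror.example.org/page", "Different search engines find different hits."),
    ("http://original.example.org/page", "Different search engines find different hits."),
]
passage = "search engines find different hits"
cited_by_student = "http://original.example.org/page"

print(first_match(passage, index) == cited_by_student)  # False
print(cited_by_student in all_matches(passage, index))  # True
```

An instructor who sees only the first result has to check the student's citation by hand before drawing any conclusion.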
The fourth investigation, by Satterwhite and Gerein, is the most thorough
survey of those discussed here. This team bought a subscription to a
detection service, and their budget allowed them to use papers purchased
from Internet paper mills as well as papers downloaded for free. They
too compiled their own plagiarized essays, and also submitted genuine
student essays.
Their best results were from Turnitin, but they remain cautious about
recommending it. They were "not very impressed with the results
provided by both the paid and free plagiarism detection services and
software" (Satterwhite & Gerein, 2002, Summary of our observations,
para. 1). They go on to report: "Based on our findings this far,
we are fairly confident in our ability to relate to our faculty that
available detection software and services as they currently exist are
not effective tools with which to identify online plagiarism. They are
not reliable, nor sophisticated enough to warrant the investment of college
funds. Not only are they ineffective, but some of the products/services
promote a real lack of trust and resentment between professor and student
that, especially given their lack of success, makes such a purchase undesirable" (Satterwhite & Gerein,
2002, Preliminary conclusions, para. 1).
Satterwhite and Gerein do report that despite their shortcomings, the
detection services performed better than search engines. I would disagree,
and suggest that it very much depends on which search engines are used,
the skill of the searcher and the source of the material. I always recommend
that those seeking to prove plagiarism make sure they include the sources
available to students, sources which often include online periodicals
databases and CD-ROMs in any library which the students use; it is also
worth looking at print resources in the students' library, journals
and books. I believe "an automated search for plagiarism makes the
whole thing mechanical. It lacks the nose and instinct, the logic, intuition
and determination of a skilled human bloodhound. A skilled librarian
may be better able to discover plagiarism... I think it probable
that a skilled librarian would have tracked down most plagiarized sequences
[from the test submissions] especially those lifted from electronic sources
and from the Internet" (Royce, 2001, p. 182).
The comparative studies agree: Turnitin is probably the best plagiarism
detection service available, but there are still major concerns.
Does it matter?
Does it matter that Turnitin is less than perfect, when you do not have
to prove that the whole of an essay has been plagiarized? Isn't
it enough to find that a significant amount has been plagiarized? Indeed
it is, and will be, until would-be cheats realize the shortcomings of
Turnitin and similar services. They might then use periodicals databases
and Usenet groups, they might seek out material elsewhere in the Invisible
Web, they might well get ever better at rewriting in their own words.
They might set out to discover the holes in a detection service's
coverage, or read a paper like this. The thinking cheat might well
take advantage of Turnitin's offer to check five pieces of work
free of charge. If the work passes muster, well and good, and if certain
parts of it need more careful handling then this too can be done and
checked before submitting the final product to the teacher.
The bottom line is that innocent students may be falsely accused of
plagiarism, and that many plagiarists may go undetected. With these caveats
in mind, Turnitin can be used to weed out the most obvious cases, to
perform a first sweep. Using it in this way will save much time and worry.
But the instructor still needs to check each report, especially when
the student has given citations, even if they do not match the sources
pinpointed by Turnitin. And then the instructor must still check, or
ask the librarian to check, whenever plagiarism is still suspected in
other papers which have been given a clean bill of health.
Conclusion
We are not going to beat the cheats. It is too easy to beat the system.
In particular, working in a bilingual school and with an international
school background, I am all too aware of how many students have two and
more languages. If it is difficult to prove plagiarism when a student
has heavily rewritten a piece, using his or her own words in place of the original,
then it is nigh impossible to catch someone who has translated a source
written in one language into a paper written in another.
We can attempt to set plagiarism-proof assignments; we can make it so
that students do not want or need to copy; we can devise alternative
presentation methods which minimize the opportunity for plagiarism; we
can stress process as well as content; we can ask students to provide
originals or copies of the sources used; we can make it so hard for the
plagiarist to plagiarize that it is easier to do the real work; we can
try to promote honorable and ethical attitudes towards work. We can indulge
in any number of techniques and strategies which will reduce plagiarism.
We can do all this, but we must be aware that we are not going to beat out
of existence those who are determined to cheat.
In the meantime, we must also be aware of the shortcomings of plagiarism
detection services. They are a tool, a weapon, a deterrent. But they
have to be used wisely, with an awareness of their shortcomings. The
human element remains vital, and without further investigation of their
findings, both positive and negative, innocent students stand to be accused
of plagiarism and guilty students could still get away with it. Plagiarism
detection services are a tool, but caveat emptor: buyer beware!
References
Canadian students choose Internet as top homework source, but spend
more than half their time searching for relevant information. (2001,
October 4). Rogers iMedia Education Group Press Releases. Retrieved
August 6, 2002, from http://www.rogerseducation.com/press_releases/100401.html.
Carroll, D. (2002, January 29). Teacher quits in dispute with school board
over student plagiarism. Kansas City Star. Retrieved February 8, 2002, from http://www.kansascity.com/mld/kansascity/2561083.htm.
Chester, G. (2000). Pilot of free-text plagiarism detection software: a report
prepared for the Joint Information System Committee. Retrieved April 25, 2002,
from http://www.jisc.ac.uk/pub01/pilot.pdf.
Goot, D. (2002, September 10). Thin line splits cheating, smarts. Retrieved
January 2, 2003, from http://www.wired.com/news/school/0,1383,54963.00html.
Hildebrand, J. (2002, December 17). Using the Internet to catch cheaters. Newsday.com.
Retrieved January 2, 2003, from http://www.newsday.com/mynews/ny-lied173049335dec17.story.
Hill, R. (2002). Brown bag lunch: Turnitin plagiarism detection software. Retrieved
April 1, 2002, from http://uwadmnweb.uwyo.edu/ctl/event_calendar/TIINotes.txt.

Internet plagiarism worries educators. (2001, April 30). Milwaukee Journal
Sentinel Online. Retrieved March 11, 2002, from http://www.jsonline.com/bym/Tech/news/apr01/cheat01043001.asp.
Large, S. (2001). Notes from the JISC workshop on electronic detection of plagiarism
held on the 16 July 2001. Retrieved April 25, 2002, from www.jisc.ac.uk/events/01/plag_det/sacha.rtf.
Lenhart, A., Simon, M. & Graziano, M. (2001). The Internet and education:
Findings of the Pew Internet and American Life Project. Retrieved June 10,
2002, from http://www.pewinternet.org/reports/pdfs/PIP_Schools_Report.pdf.
Masur, K. (2001, May). Papers, profits, and pedagogy: Plagiarism in the age
of the Internet. Perspectives Online. Retrieved June 10, 2002, from http://www.theaha.org/perspectives/issues/2001/0105/0105new3.cfm.
Notess, G.R. (2002, March 6). Search engines statistics: Database overlap.
Retrieved June 13, 2002, from http://www.searchengineshowdown.com/stats/overlap.shtml.
O'Connell, J.C. (2002, May 1). Cliché amendment: Plagiarizers never
prosper. The Lantern. Retrieved May 15, 2002, from http://thelantern.com/main.cfm/include/detail/storyid/248205.html.
Free registration required.
Paper chase. (2002, April 16). The Santa Rosa Press Democrat. Retrieved April
30, 2002, from http://www.pressdemocrat.com/search.
Principal in plagiarism dispute announces resignation. (2002, March 17). Amarillo
Globe News. Retrieved April 30, 2002, from http://www.amarillonet.com/stories/031702/usn_principal.shtml.

Royce, J. (2001, Winter). Quis custodiet...: Investigating the investigators.
School Librarian, 49 (4), 181-183.
Satterwhite, R., & Gerein, M. (2002). Downloading detectives: Searching
for on-line plagiarism. Retrieved June 11, 2002, from http://www.coloradocollege.edu/Library/Course/downloading_detectives_paper.htm.
Sherman, C. & Price, G. (2001). The invisible web: Uncovering information
sources search engines can't see. Medford, NJ: Information Today, Inc.
Slaton, J. (2002, April 29). Plagiarizers beware: Turnitin.com is here to stop
your cheating ways. SF Gate. Retrieved May 2, 2002, from http://sfgate.com/cgi-bin/article.cgi?file=/gate/archive/2002/04/29/plagiar.DTL.
Tucker, J. (2002, December 9). 75 percent of students admit to fraud, studies
show. The Oakland Tribune. Retrieved January 2, 2003, from http://www.oaklandtribune.com/Stories/0,1413,82%257E1865%257E1041793,00.html.
Turnitin.com intro. (2002). Retrieved June 1, 2002, from http://www.turnitin.com/.
Turnitin.com Testimonials. (2002). Retrieved January 4, 2003, from http://www.turnitin.com/static/testimonials.html.
Valenza, J. (2001, September). What's not on the web. Learning and Leading
with Technology, 29 (1), 6-9, 48.
John Royce is Library Director of Robert College, in Istanbul, Turkey,
and has worked in Germany, Malawi, England and Zambia. Currently the
IASL Regional Director for North Africa and the Middle East, he has twice served as chairman
of the ECIS Libraries Committee. He can be contacted at jroyce@robcol.k12.tr.