Teacher Librarian: The Journal for School Library Professionals

Teacher Librarian Feature Article

Volume 30, Number 4, April 2003

Trust or Trussed? Has Turnitin.com Got It All Wrapped Up?

John Royce

Almost every week there is a report on the prevalence of plagiarism from Internet sources, or news of a university, school or school district which has tired of student plagiarism and has signed up for the services of Turnitin.com. Headlines announce: "75 percent of students admit to fraud, studies show" (Oakland Tribune, December 9, 2002), or, "Using the Internet to catch cheaters" (Newsday.com, December 17, 2002).

Turnitin is reported to investigate more than 10,000 papers a day, and about 3,000 of these turn out to be plagiarized to a significant extent (Paper chase, 2002; Slaton, 2002). Turnitin itself claims that the company gains three new users a minute (Turnitin.com intro, 2002). It is a remarkable success story. As a result, the levels of detected plagiarism are reported to be falling in those institutions which subscribe. It seems enough for a school to announce that it has subscribed to Turnitin for its levels of suspected plagiarism to fall. Turnitin’s reputation deters would-be plagiarists.

Two kinds of plagiarism

There are two kinds of plagiarism, and Turnitin aims to root out both.

The first kind of plagiarism is from published materials. The original is openly published in some form. The original source might be a book, a newspaper, a television program, a CD-ROM, a discussion list posting or a web page. Somewhere, a record exists and it is, or at least it was, openly available. The plagiarist found it, and so too might anybody else looking for the same information. In this regard, however, note that Turnitin seeks matches only on the Internet; it does not claim to search printed, broadcast or other materials. As more and more students turn to the Internet for their information, this may not, at first glance, be too much of a drawback for subscribers (Canadian students…, 2001; Lenhart et al., 2001).

Plagiarism of this first kind is traceable. It might take a long time, but because the material was published it should be possible to find the original. However, even Internet sources can be difficult to track down. Search engines search only a fraction of the Internet. Some search only the World Wide Web. Different search engines find different hits, and no single search engine finds everything (Notess, 2002).

A lot of valuable material is found on what has become known as the Invisible Web. Sherman and Price (2001, p. 57) define the Invisible Web as: “Text pages, files, or other often high-quality authoritative information available via the World Wide Web that general-purpose search engines cannot, due to technical limitations, or will not, due to deliberate choice, add to their indices of Web pages.” This includes material that is openly available on databases that are free to the end-user. Invisible Web pages can often be found easily – but not by the general search engines. Unless one can work out which resources were used or replicate the actual search, re-finding the originals may be nigh impossible.

Moreover, many web pages are unstable, here today and gone tomorrow, especially pages published by the mass media. Online journals make their latest pages available, but past issues may be available only to subscribers, or may disappear altogether. Some online journals constantly update their pages. The URL remains but the article vanishes, replaced by another.

Plagiarism of the second kind is from unpublished materials. These could be personal diaries and letters, a friend’s homework, even one’s own work originally written for a different teacher last year. Also included is unpublished material passed along a network or fraternity, from one friend to another, from one year to another. This is the form of plagiarism discovered in a well-publicized case at the University of Virginia in 2001, when more than 120 students were suspected of plagiarizing from the same material over five years. Because unpublished work is not openly available, it may be impossible to track down the original material which has been copied and plagiarized.

There is also much concern over the ever-increasing number of cheat sites and paper mills which flourish on the Internet. Some sites do not charge for their services, and may publish the actual papers on the Internet, leading to plagiarism of the first kind, from openly published material. Papers bought from paper mills are expensive and usually at university level. They may even be custom written for the customer. These papers are often sent to the buyer by e-mail or fax, so no search engine will find them; they are not there to be found. These lead to plagiarism of the second kind. Curiously, Google Answers may be providing a cheap school-level alternative to the paper mills (Goot, 2002). Even more curiously, Google's search engine does not index Google Answers.

All these factors add to the frustrations of the chase.

Enter Turnitin.com

Turnitin aims to track down both kinds of plagiarism.

That said, it is important to realize that Turnitin does not find plagiarism. What it does is find sequences of words in submitted documents which match sequences of words in documents in its database, or sequences of words in documents on the Internet.
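
As a rough illustration of what matching "sequences of words" can mean, the Python sketch below flags every run of five consecutive words that a submission shares with a source. It is not a description of Turnitin's own method, which is not given here; the function names, the sample sentences and the five-word window are all assumptions chosen for the example.

```python
import re


def word_runs(text, size=5):
    """Return the set of all runs of `size` consecutive words in `text`."""
    # Lower-case and strip punctuation so trivial differences do not hide a match.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def shared_runs(submission, source, size=5):
    """Return the word runs that appear in both texts, sorted for display."""
    common = word_runs(submission, size) & word_runs(source, size)
    return sorted(" ".join(run) for run in common)


# Hypothetical sample texts, invented for this illustration.
source = "Search engines search only a fraction of the Internet."
copied = "As we know, search engines search only a fraction of the web."
original = "No single tool can index every page that has been published."

print(shared_runs(copied, source))    # four shared five-word runs
print(shared_runs(original, source))  # []
```

A result of this kind shows only that wording is shared. It says nothing about whether the passage was quoted and cited correctly, which remains a matter for human judgment.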

Every paper submitted to Turnitin, apart from those submitted as part of a free trial, is added to the Turnitin database. This happens even if no matches to other documents are found. In this way, Turnitin hopes to nail plagiarism of the second kind, plagiarism from unpublished sources. A match will not be found the first time an essay is submitted from a paper mill or an informal network, but if it is ever submitted again, in whole or in part, it will be found. The more schools subscribe to Turnitin, the more papers are submitted and the larger the database becomes.

Turnitin also searches the Internet in an attempt to find source material which has simply been cut and pasted from a published source into a student’s essay. Because they are automated, the searches will be more persistent than a human searcher can achieve, and the Turnitin search engine will handle greater numbers of searches than a human searcher.

Turnitin thus has two strengths: its ability to search the Internet faster and longer than any individual, and the ever-growing database of submitted essays.

And to judge from the press reports and the commendations posted on the Turnitin web site, it is doing a good job. Turnitin has seen several rivals come and go. Integriguard and its simplified and free version at HowOriginal.com have disappeared. So too has Findsame, although the original company, Digital Integrity, is still in business. Turnitin has been aggressively marketing itself in the last few years, and its commercial success is evidenced in the headlines cited earlier. The company claims more than 1.5 million new users a year (Turnitin.com intro, 2002). John Barrie, the founder of Turnitin, has declared: “In very short order, we'll have it all wrapped up. We'll become the next generation's spell checker.... There will be no room for anybody else, not even a Microsoft, to provide a similar type of service because we will have the database” (Masur, 2001).

Some nagging doubts

Some of the reported stories would seem to show that many students are incredibly naïve about plagiarism and their chances of getting caught – and that many instructors are equally naïve. Surprise is often expressed at how easy it is to find essays, to download and to cut and paste. Surprise is often expressed at how easy it is to find the originals when plagiarism is suspected. If only! Many instructors claim they can recognize when a student’s voice or writing style changes, especially if a mediocre student suddenly shines. The change may be less obvious when a mediocre student produces mediocre work, and a lot of material on the Internet, and especially on the cheat sites, is mediocre (Royce, 2001).

When plagiarism is suspected, the burden of proof lies with the instructor. The only way definitively to prove plagiarism is to find a word-for-word copy of the original passage or passages plagiarized, and then to show that the student’s attribution is missing or wrong. If the match is just a short sentence or two, the student might be given the benefit of the doubt, especially if the rest of the paper is in order. But when large pieces of the paper match other documents word for word with false attribution or none at all, then plagiarism has surely taken place.

It is more difficult to prove plagiarism when there has been much transformation of sentence structure and use of different words. It can be very difficult to prove that ideas have been plagiarized, even when the work is not the student’s usual “voice”. In these cases, a student who protests innocence might well defend him- or herself successfully. However strong the circumstantial evidence, however unable the student is to explain the research process or to explain words used or provide other ‘proof’ of original work, the case cannot be proven unless the student confesses. The only damning proof is a copy of the original.

And it is quite possible that those whose plagiarism is proven beyond doubt might still fight and beat a reluctant faculty or a weak school board, as happened in Piper School District, following which both the teacher and her principal resigned (Carroll, 2002; Principal..., 2002).

Of course, if the instructions are to provide copies of sources used, or to provide full citations and bibliographical referencing, then the student might be downgraded for failing to fulfil all aspects of the assignment. If citations and references are provided then it might be easier to see just how—and if—the original sources have been used. The student’s essay might point to a need for more help with the mechanics of research reporting rather than providing proof of plagiarism.

As noted, it is often difficult to find an original source. Sometimes a simple Internet search for the title, a few keywords or a string of consecutive words really does suffice. Many times detection takes longer. The concern is that those who think detection is a simple matter of typing in a few keywords will also think that if no matches are found then the suspicions are not valid.

Similarly, the reports and the sales figures suggest there is great trust in Turnitin. Hildebrand notes (2002, para. 7-8), “The system isn’t foolproof, but its very presence in schools seems to serve as a deterrent. 'You don't have to use it,' observes [one teacher … ] 'Students just have to know you have it.'” It is another concern that Turnitin's results may be accepted without further questioning; if a match is found then the student is automatically believed to be guilty, while if no match is found, the student is automatically considered innocent. The bulk of comments on Turnitin’s testimonials page provide evidence of great and sometimes unquestioning trust in Turnitin's results – and that makes these concerns very real (Turnitin.com Testimonials, 2002).

Turnitin itself does not claim to be infallible. Which is just as well. It isn’t.

How well does Turnitin perform?

Several reports draw attention to areas which Turnitin does not cover, and they point to shortcomings in the areas which Turnitin does claim to cover. The fact that Turnitin finds matches in one-third of papers submitted means little, unless one knows that only one-third of all papers written contain plagiarism. A true test of a detection service’s detection rate might require a control group, such as a set of papers where plagiarism is known because the investigator has “created” the submission.

This has been done in at least four investigations.

Robin Hill carried out a small test using three completely plagiarized essays. Turnitin failed to find the first essay, and found only one of the two sources used to create the second. It also failed to find the third essay, but did report a number of false hits, matching strings of words in other papers. Since this last was a paper on recombinant DNA, Hill suggests that the false hits were found because of the limited language available in this field of study (Hill, 2002).

The Joint Information Systems Committee (JISC) carried out a wide-ranging investigation in several British post-secondary institutions. One of the issues investigated was the success of Turnitin and other detection agents in detecting plagiarism. The investigating team used a number of genuine essays, and also “created” 11 essays from a variety of sources, including paper mills or essay banks. Students were warned before the project stage that their work would be used to test the services. Turnitin performed best of the software and agencies investigated. The JISC report gives Turnitin a five-star rating (excellent) for tracking down cases of collusion, and four stars (good) for discovering cut-and-paste and paper mill origins (Chester, 2000; Large, 2001).

However, respondents to the JISC survey on plagiarism believe that 74 percent of plagiarism originates from textbooks and theses, and only 24 percent from the Internet. At a conference held to discuss the various project reports, there was strong support for textbooks, and also for print and online journals, to be included in the Turnitin database (Large, 2001).

The JISC Reports also noted “areas not covered by the project:

  • Detection of text converted to a foreign language and then converted back to English
  • Detection of essays converted from a foreign language
  • Plagiarism of diagrams, pictures or graphs”

and certain other forms of cheating (Large, 2001, p. 2).

My own research forms the third investigation, another attempt to investigate the various plagiarism detection services. My interest grew in part from the realization that many of the papers available on the cheat sites themselves contained huge amounts of plagiarized material, not always detectable using ordinary search engines. I too compiled a number of essays from various sources. It is very easy to plagiarize; could I get away with it? With my plagiarized essays and a number of genuine student essays, I tested various free services, including services which normally charge but which do offer free trial periods. All services were found wanting. My favorite was Findsame, but this company has since collapsed and its services are no longer available.

I found that Turnitin found no matches for material lifted from Usenet discussion groups and discussion lists; found no matches for material lifted from online encyclopedias; and did not track down material lifted from journals located in subscription databases. There is irony here, for when much on the Internet is of dubious worth, many librarians encourage students to use the periodicals databases as worthy published sources (Valenza, 2001). Yet these worthy sources are less likely to be detected and a guilty student can escape detection. Nor did Turnitin work well with transformations and paraphrases. In the Piper High School case, one parent reportedly said her daughter “is not sure now how much she needs to rewrite research material before she can use it” (Carroll, 2002, para. 16). Rewriting is not the issue; a piece of work authored by someone else but rewritten in one’s own words still needs a citation to the author whose thoughts, if not words, are being used.

On the other hand, Turnitin did find matches for short, contentless strings of words in completely irrelevant documents. Several times it made false accusations of plagiarism, yet it missed by far the greater part of the material I really had plagiarized, missing 15 of 18 plagiarized passages in one of the essays (Royce, 2001).
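
The figures for that essay can be put as a simple detection rate. The Python sketch below is only arithmetic on the numbers quoted above (18 plagiarized passages, of which 15 were missed); the function name is an assumption made for illustration and does not come from any of the studies discussed.

```python
from fractions import Fraction


def detection_rate(planted, missed):
    """Share of deliberately planted passages that the service reported."""
    if planted <= 0 or not 0 <= missed <= planted:
        raise ValueError("need planted > 0 and 0 <= missed <= planted")
    return Fraction(planted - missed, planted)


# Figures from the essay described above: 18 passages planted, 15 missed.
rate = detection_rate(planted=18, missed=15)
print(f"detected {rate.numerator} in {rate.denominator} passages ({float(rate):.0%})")
# prints: detected 1 in 6 passages (17%)
```

A rate like this can be calculated only because the number of plagiarized passages was known in advance, which is why a test with "created" submissions tells more than a count of matches among ordinary submissions.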

Turnitin seeks only one match, and this is a particular concern. As soon as Turnitin finds a match, it stops seeking other matches for that particular string or section of paper. It cannot distinguish between material used and cited correctly, material for which false citations are made, and material lifted without any citation at all; a match is all, and any match will do.

Also of concern is that a student can quote and cite accurately, but still be accused of plagiarism. If a student does give citations, then the instructor must check them out. This is the first rule of plagiarism detection. The fact that Turnitin finds a passage on a different web site to that cited does not necessarily prove plagiarism. On the Internet, there is frequent double and multiple posting of identical pages on several sites. Of course, if a match is found with no attribution at all, this could be a more definite indication, but it still needs to be checked out.

The fourth investigation, by Satterwhite and Gerein, is the most thorough survey of those discussed here. This team bought a subscription to a detection service, and their budget allowed them to use papers purchased from Internet paper mills as well as papers downloaded for free. They too compiled their own plagiarized essays, and also submitted genuine student essays.

Their best results were from Turnitin, but they remain cautious about recommending it. They were “… not very impressed with the results provided by both the paid and free plagiarism detection services and software” (Satterwhite & Gerein, 2002, Summary of our observations, para. 1). They go on to report: “Based on our findings this far, we are fairly confident in our ability to relate to our faculty that available detection software and services as they currently exist are not effective tools with which to identify online plagiarism. They are not reliable, nor sophisticated enough to warrant the investment of college funds. Not only are they ineffective, but some of the products/services promote a real lack of trust and resentment between professor and student that, especially given their lack of success, makes such a purchase undesirable” (Satterwhite & Gerein, 2002, Preliminary conclusions, para. 1).

Satterwhite and Gerein do report that despite their shortcomings, the detection services performed better than search engines. I would disagree, and suggest that it very much depends on which search engines are used, the skill of the searcher and the source of the material. I always recommend that those seeking to prove plagiarism make sure they include the sources available to students, sources which often include online periodicals databases and CD-ROMs in any library which the students use; it is also worth looking at print resources in the student’s library, journals and books. I believe “an automated search for plagiarism makes the whole thing mechanical. It lacks the nose and instinct, the logic, intuition and determination of a skilled human bloodhound. A skilled librarian may be better able to discover plagiarism … I think it probable that a skilled librarian would have tracked down most plagiarized sequences [from the test submissions] especially those lifted from electronic sources and from the Internet” (Royce, 2001, p. 182).

The comparative studies agree: Turnitin is probably the best plagiarism detection service available, but there are still major concerns.

Does it matter?

Does it matter that Turnitin is less than perfect, when you do not have to prove that the whole of an essay has been plagiarized? Isn’t it enough to find that a significant amount has been plagiarized? Indeed it is, and will be, until would-be cheats realize the shortcomings of Turnitin and similar services. They might then use periodicals databases and Usenet groups, they might seek out material elsewhere in the Invisible Web, they might well get ever better at rewriting in their own words. They might set out to discover the holes in a detection service’s coverage – or read a paper like this. The thinking cheat might well take advantage of Turnitin’s offer to check five pieces of work free of charge. If the work passes muster, well and good, and if certain parts of it need more careful handling then this too can be done – and checked before submitting the final product to the teacher.

The bottom line is that innocent students may be falsely accused of plagiarism, and that many plagiarists may go undetected. With these caveats in mind, Turnitin can be used to weed out the most obvious cases, to perform a first sweep. Using it in this way will save much time and worry. But the instructor still needs to check each report, especially when the student has given citations, even if they do not match the sources pinpointed by Turnitin. And then the instructor must still check, or ask the librarian to check, whenever plagiarism is still suspected in other papers which have been given a clean bill of health.

Conclusion

We are not going to beat the cheats. It is too easy to beat the system. In particular, working in a bilingual school and with an international school background, I am all too aware of how many students have two or more languages. If it is difficult to prove plagiarism when a student has heavily rewritten a piece, using his or her own words in place of the original, then it is nigh impossible to catch someone who has translated a source written in one language into a paper written in another.

We can attempt to set plagiarism-proof assignments; we can make it so that students do not want or need to copy; we can devise alternative presentation methods which minimize the opportunity for plagiarism; we can stress process as well as content; we can ask students to provide originals or copies of the sources used; we can make it so hard for the plagiarist to plagiarize that it is easier to do the real work; we can try to promote honorable and ethical attitudes towards work. We can indulge in any number of techniques and strategies which will reduce plagiarism. We can do all this, but we must be aware that we are not going to beat out of existence those who are determined to cheat.

In the meantime, we must also be aware of the shortcomings of plagiarism detection services. They are a tool, a weapon, a deterrent. But they have to be used wisely, with an awareness of their shortcomings. The human element remains vital, and without further investigation of their findings, both positive and negative, innocent students stand to be accused of plagiarism and guilty students could still get away with it. Plagiarism detection services are a tool, but caveat emptor: buyer beware!

References

Canadian students choose Internet as top homework source, but spend more than half their time searching for relevant information. (2001, October 4). Rogers iMedia Education Group – Press Releases. Retrieved August 6, 2002, from http://www.rogerseducation.com/press_releases/100401.html.

Carroll, D. (2002, January 29). Teacher quits in dispute with school board over student plagiarism. Kansas City Star. Retrieved February 8, 2002, from http://www.kansascity.com/mld/kansascity/2561083.htm.

Chester, G. (2000). Pilot of free-text plagiarism detection software: a report prepared for the Joint Information Systems Committee. Retrieved April 25, 2002, from http://www.jisc.ac.uk/pub01/pilot.pdf.

Goot, D. (2002, September 10). Thin line splits cheating, smarts. Retrieved January 2, 2003, from http://www.wired.com/news/school/0,1383,54963.00html.

Hildebrand, J. (2002, December 17). Using the Internet to catch cheaters. Newsday.com. Retrieved January 2, 2003, from http://www.newsday.com/mynews/ny-lied173049335dec17.story

Hill, R. (2002). Brown bag lunch: Turnitin plagiarism detection software. Retrieved April 1, 2002, from http://uwadmnweb.uwyo.edu/ctl/event_calendar/TIINotes.txt.

Internet plagiarism worries educators. (2001, April 30). Milwaukee Journal Sentinel Online. Retrieved March 11, 2002, from http://www.jsonline.com/bym/Tech/news/apr01/cheat01043001.asp.

Large, S. (2001). Notes from the JISC workshop on electronic detection of plagiarism held on the 16 July 2001. Retrieved April 25, 2002, from www.jisc.ac.uk/events/01/plag_det/sacha.rtf.

Lenhart, A., Simon, M., & Graziano, M. (2001). The Internet and education: Findings of the Pew Internet and American Life Project. Retrieved June 10, 2002, from http://www.pewinternet.org/reports/pdfs/PIP_Schools_Report.pdf.

Masur, K. (2001, May). Papers, profits, and pedagogy: Plagiarism in the age of the Internet. Perspectives Online. Retrieved June 10, 2002, from http://www.theaha.org/perspectives/issues/2001/0105/0105new3.cfm.

Notess, G.R. (2002, March 6). Search engines statistics: Database overlap. Retrieved June 13, 2002, from http://www.searchengineshowdown.com/stats/overlap.shtml.

O’Connell, J.C. (2002, May 1). Cliché amendment: Plagiarizers never prosper. The Lantern. Retrieved May 15, 2002, from http://thelantern.com/main.cfm/include/detail/storyid/248205.html. Free registration required.

Paper chase. (2002, April 16). The Santa Rosa Press Democrat. Retrieved April 30, 2002, from http://www.pressdemocrat.com/search.

Principal in plagiarism dispute announces resignation. (2002, March 17). Amarillo Globe News. Retrieved April 30, 2002, from http://www.amarillonet.com/stories/031702/usn_principal.shtml.

Royce, J. (2001, Winter). Quis custodiet…: Investigating the investigators. School Librarian, 49 (4), 181-183.

Satterwhite, R., & Gerein, M. (2002). Downloading detectives: Searching for on-line plagiarism. Retrieved June 11, 2002, from http://www.coloradocollege.edu/Library/Course/downloading_detectives_paper.htm.

Sherman, C. & Price, G. (2001). The invisible web: Uncovering information sources search engines can't see. Medford, NJ: Information Today, Inc.

Slaton, J. (2002, April 29). Plagiarizers beware: Turnitin.com is here to stop your cheating ways. SF Gate. Retrieved May 2, 2002, from http://sfgate.com/cgi-bin/article.cgi?file=/gate/archive/2002/04/29/plagiar.DTL

Tucker, J. (2002, December 9). 75 percent of students admit to fraud, studies show. The Oakland Tribune. Retrieved January 2, 2003, from http://www.oaklandtribune.com/Stories/0,1413,82%257E1865%257E1041793,00.html.

Turnitin.com intro. (2002). Retrieved June 1, 2002, from http://www.turnitin.com/.

Turnitin.com Testimonials. (2002). Retrieved January 4, 2003, from http://www.turnitin.com/static/testimonials.html.

Valenza, J. (2001, September). What's not on the web. Learning and Leading with Technology, 29 (1), 6-9, 48.


John Royce is Library Director of Robert College, in Istanbul, Turkey, and has worked in Germany, Malawi, England and Zambia. Currently the IASL Regional Director for North Africa and the Middle East, he has twice served as chairman of the ECIS Libraries Committee. He can be contacted at jroyce@robcol.k12.tr.

 
