If you copy somebody else’s blog entry verbatim, credit the original author and link back to the original post.
Sometimes I’ll google my own topics to learn more about what other people have to say about them. Doing that, I stumbled across some blatant plagiarism. While that was annoying, the cool thing was that it made me realize you could write a tool to search for blog plagiarism:
1.) Have some tool which reads through a blog feed. For each entry in the feed:
2.) use a search engine to search for a large part of the entry’s text. Perhaps search a paragraph at a time since there’s a higher chance of copying a single paragraph instead of the whole document. Since a whole paragraph is a pretty specific search, you’d expect only a few matches.
3.) scan each search result (skipping the ones for the original post, of course!) for a hyperlink back to the original blog or for the author’s name. If there is no such reference, the search result is likely plagiarizing the blog entry.
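Here’s a rough Python sketch of steps 2 and 3, just to show how little glue is involved. It is not a finished tool: `search` and `fetch` are hypothetical callables you would have to supply (one wrapping whatever search engine API you have access to, one downloading a page’s HTML), and the names `split_paragraphs`, `has_attribution`, and `find_suspects` are made up for illustration. It also assumes that anything hosted under the blog’s own URL counts as "the original post", and it uses a plain substring test for the link or author name.

```python
import re
from typing import Callable, Iterable, List


def split_paragraphs(text: str, min_words: int = 20) -> List[str]:
    """Split an entry on blank lines, keeping paragraphs long enough to be a specific search."""
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    return [p for p in paragraphs if len(p.split()) >= min_words]


def has_attribution(page_html: str, blog_url: str, author: str) -> bool:
    """True if the page mentions the original blog's URL or the author's name."""
    haystack = page_html.lower()
    return blog_url.lower() in haystack or author.lower() in haystack


def find_suspects(
    entry_text: str,
    blog_url: str,
    author: str,
    search: Callable[[str], Iterable[str]],  # hypothetical: query -> result URLs
    fetch: Callable[[str], str],             # hypothetical: URL -> page HTML
) -> List[str]:
    """Return result URLs that quote the entry without crediting the blog or author."""
    suspects: List[str] = []
    seen = set()
    for paragraph in split_paragraphs(entry_text):
        # Quote the paragraph so the engine treats it as an exact phrase
        # (assumption: the engine supports quoted phrase search).
        for url in search('"' + paragraph + '"'):
            # Skip the original blog and anything already checked.
            if url.startswith(blog_url) or url in seen:
                continue
            seen.add(url)
            if not has_attribution(fetch(url), blog_url, author):
                suspects.append(url)
    return suspects
```

A real search API may cap the query length, so in practice you might have to send only the first sentence or so of each paragraph. The substring check is also crude: an author with a common name would hide real copies, so treat the output as a list of pages to look at by hand rather than a verdict.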
It seems like it should be pretty straightforward. It’s mostly glue around an RSS reader and a search engine API. (Actually, it sounds so simple, I bet such a tool is already out there. I expect this is a common problem with schools and student papers.)
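For the RSS reader half of the glue, a minimal sketch using only Python’s standard library might look like this. It assumes a plain RSS 2.0 feed where each `<item>` has a `<link>` and a `<description>`; the function name `read_entries` is made up for illustration, and Atom feeds or feeds that put the body in a different element would need different handling.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_entries(rss_xml: str) -> List[Tuple[str, str]]:
    """Return (link, description) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        link = item.findtext("link", default="")
        body = item.findtext("description", default="")
        entries.append((link.strip(), body.strip()))
    return entries
```

The feed text itself could be downloaded with something like `urllib.request.urlopen`. Descriptions often contain HTML markup, so you would probably want to strip the tags before using the text as a search query.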
As a sanity check, I tried this method out by hand with an example search using MSN Search on my post about 0xFeeFee sequence points. At the time of writing (8/20/05), there are only 3 different matches: my original post, this, and this. (For each match, there’s actually a blog entry and an archived blog entry, so there were 6 total matches.) When I pull up the source HTML for each of the results, I can see the 2nd one does not include any reference (either my name or blog URL) back to me, whereas the 3rd one does. So the tool could automatically flag the 2nd one as plagiarizing.
Offhand, I don’t know how to automate the search APIs. If I do end up writing such a tool, I’ll be sure to post back. (Update: I wrote the tool and it’s available here)