New Spamming Tactics | mtekk's Crib

Something caught one’s eye today: there was a new comment that seemed far too familiar. The commenter’s chosen name may have been a complete giveaway. However, one has seen people with legitimate comments use their website name as their alias. It did not take much effort to find where the comment’s body came from; they were one’s own words, from a comment placed earlier on the post, over a month ago. Telling simple, misguided plagiarism apart from spam required looking at the site linked to as the commenter’s “website”, or in this case only at its URI (some World of Warcraft gold selling site).

This seems to be the “holy grail” of comment spam: producing “relevant” comments while linking to whatever site is being promoted. Spam Karma 2 even thought it was valid; SK2 is losing its effectiveness. While in this case the linked site was not relevant, the body of the comment was relevant to the discussion. It took plagiarism to accomplish it, but for people already breaking laws, what’s another broken law? (Plagiarism is a form of copyright violation/theft.)

To protect against this new breed of spam, a few things could be done. First, in the case of SK2, the comment author’s website URI needs to be checked against a distributed blacklist, just as all other URIs in the comment body are (SK2 probably already does this, but the site was not on the list yet). Second, comments should be checked for an “originality” percentage. Basically, this would compare a new comment against the other comments on the post and then, for the potential matches, measure how close it is to them. This would prevent direct sentence, paragraph and comment plagiarism/lifting. Ultimately, making code behave as a human would is the goal. If all else fails, improving the ability to find the person behind the spam, so that justice may be brought to him (or her), would suffice.
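
As a rough illustration of the two checks, here is a minimal Python sketch. It is not how SK2 works internally; the function names, the four-word window and the sample blacklist entry are all assumptions made up for this example. Originality is measured as the share of a new comment's four-word runs that do not already appear in earlier comments on the same post, so a single lifted sentence lowers the score even when the rest of the comment is new.

```python
import re
from urllib.parse import urlparse


def word_runs(text, size=4):
    """Return the set of overlapping `size`-word runs in `text` (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short comments are treated as one single run.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def originality_percent(new_comment, existing_comments, size=4):
    """Hypothetical score: 100 means nothing was copied, 0 means all of it was."""
    new_runs = word_runs(new_comment, size)
    if not new_runs:
        return 100.0
    seen = set()
    for old in existing_comments:
        seen |= word_runs(old, size)
    copied = len(new_runs & seen)
    return 100.0 * (1 - copied / len(new_runs))


def author_site_blacklisted(author_url, blacklist):
    """Check the commenter's "website" host against a set of blacklisted domains.

    `blacklist` stands in for whatever distributed list the filter uses;
    here it is just a plain set of domain names (an assumption).
    """
    if "://" not in author_url:
        author_url = "http://" + author_url  # urlparse needs a scheme to find the host
    host = (urlparse(author_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return any(host == bad or host.endswith("." + bad) for bad in blacklist)


earlier = ["Spam filters should also check the link given as the author website."]
copied = "Spam filters should also check the link given as the author website."
fresh = "One disagrees here, moderation by hand has always worked well enough."

print(originality_percent(copied, earlier))
print(originality_percent(fresh, earlier))
print(author_site_blacklisted("http://www.gold-seller.example/buy", {"gold-seller.example"}))
```

A real filter would still need a cutoff (say, anything under some percentage goes to moderation), and picking that cutoff is a judgment call: people quoting each other in a thread will legitimately score lower than a fully original comment.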

-John Havlik

[end of transmission, stay tuned]