Blog Spammers SUCK ! - Or, Blocking Spam in .TEXT

Recently I’ve been getting hammered with blog spam on www.JoeOn.net. My blog mirror blogs.msdn.com/joestagner runs on Community Server which does a better job of spam blocking but I don’t want to switch www.JoeOn.net to Community Server because it is a Commercial product (Sorry Rob) and I want to ability to code additions to meet my needs and make my code available to anyone that wants to use it. (Don’t get me wrong, community server ROCKS and I will be using it in a new On-Line community.)

I also really like DasBlog but stumbled when trying to migrate my blog content from .TEXT to DasBlog.

So I needed to find a way to solve the spam problem with my existing .TEXT setup, at least for the mean time.

A little research revealed this post by John Sample -THANKS John.

Original Post https://www.johnsample.com/articles/DotTextSpamBlock.aspx

Basically, it ads a table to the .TEXT SQL Server instance that is used to contain a list of “spam words”. If the post contains the spam word it is disallowed.

Where is how I’m using mine.

I’m primarily adding portions of the domain names that the spammers, who are mostly on line gambeling sites, are using. They are mostly free sites so this won’t be a problem with any valid perminant domains being referenced in a post.

Here is the SQL that creates the table.

IF NOT EXISTS (select * from dbo.sysobjects where id = object_id(N'[dbo].[FILTER_WORD]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
 BEGIN
CREATE TABLE [FILTER_WORD] (
 [fw_ID] [int] IDENTITY (1, 1) NOT NULL ,
 [fw_WORD] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 CONSTRAINT [PK_FILTER_WORD] PRIMARY KEY  CLUSTERED
   (
   [fw_ID]
   )  ON [PRIMARY]
) ON [PRIMARY]
END
GO

Once the table is installed I just used Microsoft Query Analyzer to add the words to the FILTER_WORD table.

Next we I needed to alter the stored procedure – [blog_InsertEntry] as follows. [Download Here]

ALTER    Proc blog_InsertEntry
(
 @Title nvarchar(255),
 @TitleUrl nvarchar(255), 
 @Text ntext,
 @SourceUrl nvarchar(200),
 @PostType int,
 @Author nvarchar(50),
 @Email nvarchar(50),
 @SourceName nvarchar(200),
 @Description nvarchar(500),
 @BlogID int,
 @DateAdded datetime,
 @ParentID int,
 @PostConfig int,
 @EntryName nvarchar(150),
 @ID int output)
as

If(@EntryName is not null)
Begin
     if exists(Select EntryName From blog_Content where BlogID = @BlogID and EntryName = @EntryName)
 Begin
  RAISERROR('The EntryName you entry is already in use with in this Blog. Please pick a unique EntryName.',11,1)
  RETURN 1
 End
End
if(Ltrim(Rtrim(@Description)) = '')
Begin
   set @Description = null
End

If (@PostType = 3 or @PostType = 4)
   Begin
        declare @temp_text varchar(8000)

 set @temp_text = UPPER(isnull(CAST(@Text as varchar(7500)),'')) + UPPER(isnull(@SourceName,'')) + Upper(isnull(@TitleUrl,'')) + Upper(isnull(@Author,''))
 declare @word_buffer varchar(100)
 declare filterCursor CURSOR LOCAL FAST_FORWARD for
   select UPPER(fw_WORD) from FILTER_WORD
 OPEN filterCursor
 FETCH NEXT FROM filterCursor INTO @word_buffer
       WHILE @@FETCH_STATUS = 0
  BEGIN
  IF (CHARINDEX(@word_buffer,@temp_text) > 0)
      Begin
           CLOSE filterCursor
                    DEALLOCATE filterCursor
           RAISERROR('Disallowed words or phrases detected. I you feel this message is in error, please contact blog owner.',11,1)
           RETURN 1
                    End
  FETCH NEXT FROM filterCursor INTO @word_buffer
  END
  CLOSE filterCursor
  DEALLOCATE filterCursor
 End

INSERT INTO blog_Content
(Title, TitleUrl, [Text], SourceUrl, PostType, Author, Email, DateAdded,DateUpdated, SourceName, [Description], PostConfig, ParentID, BlogID, EntryName )
VALUES
(@Title, @TitleUrl, @Text, @SourceUrl, @PostType, @Author, @Email, @DateAdded, @DateAdded, @SourceName, @Description, @PostConfig, @ParentID, @BlogID, @EntryName)
Select @ID = @@Identity

if(@PostType = 1 or @PostType = 2)
Begin
 exec blog_UpdateConfigUpdateTime @blogID, @DateAdded
End
Else if(@PostType = 3)
Begin
 Update blog_Content
 Set FeedBackCount = FeedBackCount + 1 where [ID] = @ParentID
End

I also added the TrackBack filter. In addition to blocking based on words this update lets you add IP address to the filter file. To do this we need to modify the stored procedure [blog_InsertPingTrackEntry] as below. [Download Here]

ALTER      Proc blog_InsertPingTrackEntry
(
 @Title nvarchar(255),
 @TitleUrl nvarchar(255), 
 @Text ntext,
 @SourceUrl nvarchar(200),
 @PostType int,
 @Author nvarchar(50),
 @Email nvarchar(50),
 @SourceName nvarchar(200),
 @Description nvarchar(500),
 @BlogID int,
 @DateAdded datetime,
 @ParentID int,
 @PostConfig int,
 @EntryName nvarchar(150),
 @ID int output)
as

Set @ID = -1

if not exists (Select [ID] From blog_Content where TitleUrl = @TitleUrl and ParentID = @ParentID)
Begin
     if(Ltrim(Rtrim(@Description)) = '')
     set @Description = null

     declare @temp_text varchar(8000)

     set @temp_text = UPPER(isnull(CAST(@Text as varchar(7500)),'')) + UPPER(isnull(@SourceName,'')) + Upper(isnull(@TitleUrl,'')) + Upper(isnull(@Author,'')) + Upper(isnull(@Title,''))
     
     declare @word_buffer varchar(100)
     declare filterCursor CURSOR LOCAL FAST_FORWARD for
         select UPPER(fw_WORD) from FILTER_WORD
     OPEN filterCursor
     FETCH NEXT FROM filterCursor INTO @word_buffer
     WHILE @@FETCH_STATUS = 0
     BEGIN
          IF (CHARINDEX(@word_buffer,@temp_text) > 0)
               Begin
                  CLOSE filterCursor
                  DEALLOCATE filterCursor
                  --RAISERROR('Disallowed words or phrases detected. I you feel this message is in error, please contact blog owner.',11,1)
                  RETURN 1

               End
          FETCH NEXT FROM filterCursor INTO @word_buffer      
END
CLOSE filterCursor
DEALLOCATE filterCursor

INSERT INTO blog_Content
( PostConfig, Title, TitleUrl, [Text], SourceUrl, PostType, Author, Email, DateAdded,DateUpdated, SourceName, [Description], ParentID, BlogID)
VALUES
(@PostConfig, @Title, @TitleUrl, @Text, @SourceUrl, @PostType, @Author, @Email, @DateAdded, @DateAdded, @SourceName, @Description, @ParentID, @BlogID)

Select @ID = @@Identity

Update blog_Content
Set FeedBackCount = FeedBackCount + 1
where [ID] = @ParentID

End

— SO far it looks to have reduced my blog spam by 95%