bson4jackson 2.2 has just been released!

Version 2.2 of bson4jackson has just been released. bson4jackson adds support for BSON, a binary representation of JSON, to the Jackson JSON processor.

The latest release of bson4jackson now supports Jackson 2.2. Apart from that, Ben McCann and John Stoneham fixed the Maven dependencies and updated some third-party libraries, so builds of projects depending on bson4jackson should now be more stable. Thanks a lot for that, guys!

Projects using bson4jackson

bson4jackson is used in several other Open Source projects, including the following:

  • Jongo is a rather cool library that allows MongoDB to be queried in Java just like you would query it in the MongoDB shell. Jongo uses bson4jackson to serialize objects before they are sent to the database, and of course to deserialize queried documents.

    http://jongo.org/

  • MongoJack is a POJO mapper that uses Jackson and bson4jackson to serialize and deserialize objects before they are sent to the database. MongoJack is extremely fast and very easy to handle.

    http://mongojack.org/

I know that there are a lot of other projects out there that use bson4jackson. If you want your project to be added to this list please leave a comment below or send me a message.

More information

For a complete description of bson4jackson (including how to download it) have a look at my tutorial.


Spamihilator website now Open Source

I’ve just released the source code of the Spamihilator website as Open Source. You can download it from the following GitHub repository:

https://github.com/michel-kraemer/spamihilator.com

Everyone is invited to make contributions! I’m open to all kinds of changes. You may submit new content (e.g. FAQ), change the design or style, etc.

If you want to contribute please fork the GitHub repository and send me pull requests. I will check and upload them to the Spamihilator web server as soon as possible. Further instructions can be found below or in the README file.

Building

The Spamihilator website has been created using Jekyll. If you want to build it please follow these steps:

  1. Download and install Ruby 1.8.7 (if you haven’t done so already).

    Under Windows I recommend using RubyInstaller. Under Linux and Mac OS I highly recommend using rvm, as the repository already contains proper .ruby-version and .ruby-gemset files.

  2. Install the bundler gem (if you haven’t done so already).

    gem install bundler

  3. Clone or download this repository.

  4. Open a command line shell in the cloned directory and enter the following command:

    bundle install

  5. After that you are ready to build the website using the following commands:

    compass compile
    jekyll

  6. Repeat these commands whenever you make a change. The files will be compiled to the subdirectory _site.

Run locally

You may also run and test the website locally before uploading your changes. In order to do this, follow the instructions above and then run the following command:

jekyll --server

Launch a web browser and open http://localhost:4000 to view the site. For more information see Jekyll’s website.

License

If not noted otherwise the files in the Spamihilator website repository are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.


5 anti-spam measures for phpBB 3.0

This is an English translation of my German blog post. I updated it so the measures described here are compatible with the latest phpBB 3.x version.

phpBB is an open-source forum software that is very popular and widely used. This makes it an ideal target for spammers. The phpBB developers therefore implemented an improved Captcha in version 3.0. But spammers have already adapted to this and have implemented improved bots that are able to break the new Captcha and to automatically create junk posts. In the following I will describe five anti-spam measures that effectively reduce spam in every phpBB 3.x instance. The main goal of these measures is to block as many spam posts as possible without affecting normal forum users.

Measure #1: links

Typically spammers try to advertise certain websites. About 95% of all spam posts contain links or URLs. The most effective way to block those posts is to completely forbid links. However, this would also affect normal forum users.

Spammers usually sign in to a forum and then immediately start posting junk. We can make use of this and forbid links only for guests and users with fewer than a certain number of posts. Once a normal user has reached this number of posts, links will be enabled. Typical spam bots will never reach this number since all their posts will be blocked.

To forbid posting links, add the following to the function submit_post() in the file includes/functions_posting.php.

//Define the minimum number of posts for "good" users
//Users below this threshold are considered potential spammers
$user_posts_threshold = 3;
 
//strip whitespace characters in the post body
$msgwows = $data['message'];
$msgwows = str_replace(" ", "", $msgwows);
$msgwows = str_replace("\n", "", $msgwows);
$msgwows = str_replace("\r", "", $msgwows);
$msgwows = str_replace("\t", "", $msgwows);
 
if (!$user->data['is_registered'] ||
    $user->data['user_posts'] < $user_posts_threshold) {
  if (strpos($msgwows, 'http://') !== FALSE ||
      strpos($msgwows, 'ftp://') !== FALSE ||
      strpos($msgwows, 'www.') !== FALSE ||
      strpos($msgwows, '[url') !== FALSE) {
    trigger_error("You are not allowed to post URLs!");
  }
}

This code should be put at the beginning of the submit_post() function to check all posts before they are saved to the database.

Measure #2: images

Spammers often try to trick spam filters by posting images instead of text. They put their junk messages and links into image files and then attach them to forum posts. You can use the same technique as the one described in measure #1 to block images for guests and users with fewer than a certain number of posts. Put the following code in the function submit_post() in the file includes/functions_posting.php, after the code from measure #1, since it reuses the variables $msgwows and $user_posts_threshold defined there.

if (!$user->data['is_registered'] ||
    $user->data['user_posts'] < $user_posts_threshold) {
  if (strpos($msgwows, '[img') !== FALSE) {
    trigger_error("You are not allowed to post images!");
  }
}

Measure #3: Russian and Chinese posts?

A lot of spam posts are written in Russian or Chinese or simply contain a lot of special characters and garbage. If your forum is aimed at English-speaking users, you can check whether a post is written in English before it is submitted. Posts that mostly contain special characters or foreign characters can then be treated as spam.

Cory Mawhorter has developed a PHP function (is_english()) that is able to recognise special characters. You can use this function to differentiate English posts from any other.

if (!$user->data['is_registered'] ||
    $user->data['user_posts'] < $user_posts_threshold) {
  if (!is_english($msgwows, 0.75)) {
    trigger_error("Only English posts are allowed here!");
  }
}
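
To give an idea of how such a check can work, here is a minimal sketch. It is not Cory Mawhorter's is_english() but a hypothetical stand-in named looks_mostly_ascii(), and it assumes that posts are UTF-8 encoded. It simply measures the share of printable ASCII bytes in the text, so the threshold of 0.75 only mirrors the value used above and has not been tuned.

```php
<?php
// Hypothetical stand-in, NOT Cory Mawhorter's is_english(): treats a text
// as "mostly English" if at least $threshold of its bytes are printable
// ASCII characters. Assumes UTF-8 input, where every non-ASCII character
// consists only of bytes outside the range 0x20 to 0x7E.
function looks_mostly_ascii($text, $threshold = 0.75) {
  $total = strlen($text);
  if ($total == 0) {
    return true; // nothing to judge
  }
  // count the bytes that remain after removing everything outside
  // the printable ASCII range
  $ascii = strlen(preg_replace('/[^\x20-\x7E]/', '', $text));
  return ($ascii / $total) >= $threshold;
}

var_dump(looks_mostly_ascii("Helloworld,thisisatest"));
var_dump(looks_mostly_ascii("Привет мир"));
```

A byte ratio like this is crude. It will also reject legitimate posts in other languages, and it counts line breaks and tabs as non-ASCII if they have not been stripped beforehand, so treat it as an illustration of the idea only.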

Measure #4: http:BL

Project Honey Pot offers an effective system to keep spammers and mail address harvesters away from websites. http:BL matches the website visitor’s IP address against a database. If the IP address is known to be used by a spammer the visitor will be blocked before the website is even rendered. The system uses DNS which makes queries very fast.

In order to use http:BL you first have to sign up for Project Honey Pot. You will receive a special key that is used to authenticate against the system. They already offer a MOD for phpBB but it is only compatible with version 2.0. You may be able to make it compatible with phpBB 3, but alternatively you can simply put the following code at the end of the file common.php.

//configure your http:BL Access Key here
$httpblkey = "xxxxxxxxxxx";
$httpblmaxdays = 21;
$httpblmaxthreat = 25;
 
//if you already configured a honey pot on your website use this line:
//$httpblhoneypot = "http://xxxxxxxxxxx";
 
function httpbl_check() {
  global $httpblkey, $httpblmaxdays, $httpblmaxthreat, $httpblhoneypot;
 
  $ip = $_SERVER["REMOTE_ADDR"];
 
  $result = explode(".", gethostbyname($httpblkey."."
    .implode(".", array_reverse(explode(".", $ip)))
    .".dnsbl.httpbl.org"));
 
  if ($result[0] != 127) {
    //something went wrong or the IP is not in the database.
    //ignore this one.
    return;
  }
 
  $days = $result[1];
  $threat = $result[2];
 
  if ($days < $httpblmaxdays && $threat > $httpblmaxthreat) {
    if ($httpblhoneypot) {
      header("HTTP/1.1 301 Moved Permanently");
      header("Location: ".$httpblhoneypot);
    }
    die();
  }
}
httpbl_check();

Please make sure to put your http:BL access key in the variable $httpblkey.
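
The two steps hidden in the gethostbyname() call above, building the DNS name and decoding the answer, can be sketched separately. The helper names httpbl_query_name() and httpbl_decode() are hypothetical and only serve as an illustration. The sketch handles IPv4 addresses only, just like the code above.

```php
<?php
// Hypothetical helpers for illustration; they are not part of any library.

// Builds the DNS name to look up: the access key, the octets of the
// visitor's IPv4 address in reverse order, and the http:BL zone.
function httpbl_query_name($key, $ip) {
  $reversed = implode(".", array_reverse(explode(".", $ip)));
  return $key . "." . $reversed . ".dnsbl.httpbl.org";
}

// Decodes the address returned by the lookup. Returns null if the answer
// is not a dotted quad starting with 127 (IP unknown or lookup failed).
function httpbl_decode($answer) {
  $parts = explode(".", $answer);
  if (count($parts) != 4 || $parts[0] != "127") {
    return null;
  }
  // second octet: days since last activity, third octet: threat score
  return array("days" => (int)$parts[1], "threat" => (int)$parts[2]);
}

echo httpbl_query_name("abc", "1.2.3.4"), "\n";
print_r(httpbl_decode("127.3.40.1"));
```

gethostbyname() returns the unmodified host name when a lookup fails, which is why httpbl_decode() checks the number of parts before it looks at the first octet.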

Measure #5: Akismet

Another technique to block Internet spam is Akismet. This system is usually used in WordPress blogs to block comment spammers. As with Project Honey Pot, you need to sign up to receive an API key.

You can use Akismet to block posts in phpBB 3 forums as well. The system may produce false positives (normal posts accidentally identified as spam). I therefore recommend checking only the first posts of a new user, until he or she has reached a certain number of posts. The following code uses the file Akismet.class.php that can be downloaded from Alex Potsides’ blog or from his GitHub repository. Put the code in the function submit_post() in the file includes/functions_posting.php, after the code from measure #1, since it reuses the variable $user_posts_threshold defined there.

//configure your Akismet API key here
$akismet_key = 'xxxxxxxxxxx';
 
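//the URL you entered when you registered for a WordPress account
$akismet_url = 'xxxxxxxxxxx';
 
//include_once avoids a "cannot redeclare class" error if the file
//has already been loaded during the same request
include_once('Akismet.class.php');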
 
$akismet = new Akismet($akismet_url, $akismet_key);
if (!$user->data['is_registered'])
  $akismet->setCommentAuthor($username);
else
  $akismet->setCommentAuthor($user->data['username']);
$akismet->setCommentContent($data['message']);
$akismet->setUserIP($user->ip);
if ($user->data['is_registered'])
{
  $akismet->setCommentAuthorEmail(strtolower($user->data['user_email']));
  $akismet->setCommentAuthorURL(strtolower($user->data['user_website']));
}
 
if ((!$user->data['is_registered'] ||
    $user->data['user_posts'] < $user_posts_threshold) &&
    $akismet->isCommentSpam()) {
    trigger_error("Akismet says your post is spam");
}

Put your Akismet API key into the variable $akismet_key. The URL you entered during sign-up has to be put in the variable $akismet_url.

Akismet can also be put to good use against spammers who try to sign up to your forum. Put the following code into the function user_add() in the file includes/functions_user.php.

//configure your Akismet API key here
$akismet_key = 'xxxxxxxxxxx';
 
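//the URL you entered when you registered for a WordPress account
$akismet_url = 'xxxxxxxxxxx';
 
//include_once avoids a "cannot redeclare class" error if the file
//has already been loaded during the same request
include_once('Akismet.class.php');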
 
$akismet = new Akismet($akismet_url, $akismet_key);
$akismet->setCommentAuthor($username_clean);
$akismet->setUserIP($user->ip);
$akismet->setCommentAuthorEmail(strtolower($user_row['user_email']));
 
if($akismet->isCommentSpam()) {
  trigger_error("Akismet says you are a spammer");
}

Conclusion

The measures presented here help drastically reduce spam in phpBB 3.0-based forums. Since I implemented them in the Spamihilator forum a couple of years ago, only a very small number of spammers have actually been able to post. However, none of their messages contained links, URLs or images. They mostly consisted of a jumble of meaningless words.

Forbidding links and images is in my experience the most effective way to block spammers. Searching for special characters and foreign languages blocks all other spam posts that do not contain links or images. Normal users are typically not affected by these measures. As soon as a normal user reaches a certain number of ‘good’ posts the anti-spam measures are disabled. Up to now, in the Spamihilator forum no spammer was able to reach this limit. 3 or 5 posts is in my experience a good threshold. If ever needed, this limit can easily be raised.

Spammers often try to put links and images into signatures. I highly recommend disabling this in phpBB’s administration area. You may also try to apply the link and image filters from measures #1 and #2 respectively to signatures.
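
If you want to reuse the filters for signatures, it helps to move the check into a small function. The following is a minimal sketch under a few assumptions: contains_marker() is a hypothetical helper that is not part of phpBB, and $signature is a placeholder for wherever your phpBB version holds the submitted signature text. Unlike the code in measure #1, the sketch compares case-insensitively and also looks for https://.

```php
<?php
// Hypothetical helper, not part of phpBB: returns true if $text contains
// one of the given markers after all whitespace has been stripped.
function contains_marker($text, array $markers) {
  // strip the same whitespace characters as in measure #1
  $stripped = str_replace(array(" ", "\n", "\r", "\t"), "", $text);
  foreach ($markers as $marker) {
    // stripos() ignores case, so "[URL" and "HTTP://" are found as well
    if (stripos($stripped, $marker) !== false) {
      return true;
    }
  }
  return false;
}

$link_markers = array('http://', 'https://', 'ftp://', 'www.', '[url');
$image_markers = array('[img');

// placeholder value; replace it with the signature submitted by the user
$signature = "Visit w w w . example . com";
if (contains_marker($signature, array_merge($link_markers, $image_markers))) {
  echo "signature rejected\n";
}
```

In a real forum you would call trigger_error() instead of echo, and only for guests and users below the post threshold, just like in measures #1 and #2.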

Many phpBB forums plagued by spammers disable guest posts. Users have to be registered to post. For support forums like Spamihilator’s this can be tedious for users who would like to easily post support requests without having to go through the complete sign-up procedure. The measures presented here allow forum administrators to leave guest posts enabled.


bson4jackson 2.0 has just been released!

Version 2.0 of bson4jackson has just been released. bson4jackson adds support for BSON, a binary representation of JSON, to the Jackson JSON processor. The latest release of bson4jackson now supports Jackson 2.0.

Enda O’Donohoe fixed two bugs regarding the UTF-8 decoder. Thanks for that!

James Roper contributed a great deal to the Jackson 2.0 support. Thanks again for your contributions, James!

Support for older Jackson versions has been dropped in bson4jackson 2.0. If you’re looking for a version supporting the Jackson 1.x branch, please download bson4jackson 1.3.0.

More information

For a complete description of bson4jackson (including how to download it) have a look at my tutorial.


Improved MongoDB compatibility for bson4jackson

Version 1.2.0 of bson4jackson has just been released. bson4jackson adds support for BSON, a binary representation of JSON, to the Jackson JSON processor. Thanks to contributions from the community, the latest release of bson4jackson now includes better support for MongoDB.

Gergő Ertli has fixed the support for the ObjectId type. Object IDs are used as the primary key for MongoDB documents.

Support for the UUID type has been added by Ed Anuff. He added a new module which can be registered with Jackson’s ObjectMapper:

ObjectMapper om = new ObjectMapper(new BsonFactory());
om.registerModule(new BsonUuidModule());

Thanks to a contribution by James Roper, the BsonParser class now supports the new HONOR_DOCUMENT_LENGTH feature. It makes the parser honor the first 4 bytes of a document, which usually contain the document’s size. Of course, this only works if BsonGenerator.Feature.ENABLE_STREAMING has not been enabled during document generation.

This feature can be useful for reading consecutive documents from an input stream produced by MongoDB. You can enable it as follows:

BsonFactory fac = new BsonFactory();
fac.enable(BsonParser.Feature.HONOR_DOCUMENT_LENGTH);
BsonParser parser = (BsonParser)fac.createJsonParser(...);

Apart from that, a lot of other minor bugs have been fixed. The library has been tested with Jackson 1.7 up to 1.9.

More information

For a complete description of bson4jackson (including how to download it) have a look at my tutorial.