himazu on Japan, Tech, and Social Issues: Japanese Spams Gmail Cannot Filter

About a month ago, I started seeing far more Japanese spam slipping past Gmail's filter than before. Until then, Gmail's spam filter had been doing a decent job with Japanese spam, but that is no longer the case. I now receive dozens of unfiltered Japanese spam messages every day, which is quite annoying.

The distinguishing characteristic of these messages is that their subjects are declared to be in the ISO-2022-JP charset but are actually in Shift_JIS, and they are Base64-encoded. For example:

Subject: =?ISO-2022-JP?B?kWaQbJBsjciSsouzk6+NRInvgqmC54LMgqiSbYLngrmBQg==?=
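The mismatch in a header like this can be checked mechanically. The following is only a minimal sketch of how I would test it, not a description of what Gmail does: the function name `inspect_subject` and the regular expression are my own, and it only handles a single Base64 encoded-word. It relies on the fact that genuine ISO-2022-JP is a 7-bit encoding, so any byte with the high bit set contradicts the declared charset.

```python
import base64
import re

# Encoded-word form: =?charset?B?payload?=  (Base64 variant only; hypothetical helper pattern)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")


def inspect_subject(header_value):
    """Return (declared_charset, is_7bit, shift_jis_text_or_None)."""
    match = ENCODED_WORD.search(header_value)
    if match is None:
        raise ValueError("no Base64 encoded-word found")
    declared, payload = match.groups()
    raw = base64.b64decode(payload)
    # Genuine ISO-2022-JP never uses bytes with the high bit set.
    is_7bit = all(byte < 0x80 for byte in raw)
    try:
        as_sjis = raw.decode("shift_jis")
    except UnicodeDecodeError:
        as_sjis = None
    return declared, is_7bit, as_sjis


subject = "=?ISO-2022-JP?B?kWaQbJBsjciSsouzk6+NRInvgqmC54LMgqiSbYLngrmBQg==?="
declared, is_7bit, as_sjis = inspect_subject(subject)

# Mislabeled: declared ISO-2022-JP, contains 8-bit bytes, yet decodes cleanly as Shift_JIS.
mislabeled = (
    declared.upper() == "ISO-2022-JP" and not is_7bit and as_sjis is not None
)
print(declared, is_7bit, mislabeled)
```

For the subject above, the payload contains 8-bit bytes and decodes cleanly as Shift_JIS, so the check flags it as mislabeled.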

The problem, as far as I can observe, is that because of the false charset declaration, Gmail interprets the subject as a random string, so its spam filter doesn't work as it should. Here's how Gmail appears to interpret the subject:

The false declaration results from a sloppy understanding of how to compose Japanese email. It's ironic that this sloppiness works in the spammers' favor against Gmail's spam filter.

I'd really like Gmail to deal with this soon. Note that this spamming technique is not specific to Japanese; it can be used for other languages as well.

Added on 2006-03-20:
When I published this post, I notified Google about it. I don't know how much that contributed, but Gmail's spam filter now seems able to cope with spam of this kind to some extent.

2006-03-06
