2006-09-29

The Nagoya Omiai Protocol

Japan has an institution called "omiai". It's a date arranged by a go-between, typically a middle-aged woman. It's also a family matter: not only the young man and woman but also their parents are involved.

An omiai date itself is not so unusual, except that the go-between attends to introduce the man and the woman to each other. The go-between leaves after the introduction. Then the date begins, between a man and a woman who don't know each other well.

After the date, the man is supposed to see the woman home. Here a social protocol kicks in. In one protocol, the woman's parents are supposed to bring him in and give him tea and cake. If he likes the woman and wants to go forward, he drinks the tea. If he really likes her, he has the cake in addition. If he doesn't like her, he declines the tea, let alone the cake. This way of showing the man's willingness is not official. Official communication takes place through the go-between the following day or so.

A friend of mine, who was born and raised in Nagoya and has lived there his entire life, told me about this protocol. Let's call it the Nagoya omiai protocol. Nagoya is a big city with Toyota as its economic backbone, and it is known in Japan for its peculiar culture. I don't know other omiai protocols, but most likely this one is not widely observed in Japan.

When he explained the protocol, he confessed that he learned of it only after getting married. He went through several omiai. He's the kind of person very few women dislike, especially women searching for a partner through omiai. In all cases but the last, he turned down the woman he met.

Not knowing the local protocol, he always had the tea and the cake he was offered when he saw a woman home. So after showing his fondness for a woman on the day of the omiai by having the tea and the cake, he would communicate a rejection through the go-between. He deeply regretted how insulting he had been to the women and their parents.

2006-09-28

How Indirect Kyoto People Are

Even among Japanese people, Kyoto people are known to be quite indirect. There is a story that illustrates it.
Suppose you are paying a personal visit to a home in Kyoto around noon. The host would offer a light meal by saying "how about having bubuzuke?"

Bubuzuke is the Kyoto dialect word for ochazuke. Ochazuke is steamed rice topped with something (e.g. grilled salmon flakes, salmon roe) and soaked in green tea. It's a light meal, not a proper one.

The rule is, you have to decline the offer at first. Then the host would offer it again. You have to decline again. Only after the host offers it a third time can you accept it. If the host actually wants the guest to leave, the host doesn't offer three times. But to be polite, the host offers at least once.

Accepting the first or second offer is not expected -- it doesn't follow the Kyoto home visit protocol.

This is known to many Japanese people, if not the majority. Shortly after I got married to a Kyoto woman, I asked my wife whether it was true.

"Not quite," she said. According to her:
Offering bubuzuke around noon really means "please leave now". How many times the host offers is irrelevant.

If the host really intends to offer lunch, it's prepared by then and they say "please have lunch". The expectation is that the moment the guest hears the word bubuzuke, they know it's time to leave.

After explaining it, she asked, "This is a well-accepted protocol all over Japan, right?" It is not! She was born and raised in Kyoto and lived there until she was 30.

2006-03-31

When you deal with katakana in your program

Katakana is one of several sets of characters used to write Japanese. Katakana is a phonetic script (strictly speaking, a syllabary) whereas kanji are ideograms -- each kanji character has a meaning and a set of pronunciations whereas each katakana character has only a pronunciation.

It's good to know that there is another phonetic script in Japanese -- hiragana. Hiragana and katakana are like lower and upper case. Each hiragana character has a corresponding katakana character. Hiragana being off-topic, I won't go further into it here.

You can see a table of katakana characters here. For historical reasons, there is another set of katakana characters called half width katakana. You can see a half width katakana table here.

When you talk specifically about katakana that is not half width, you should call it full width katakana. The names half width and full width come from how those sets of characters are displayed and printed. Typically, half width katakana characters occupy half the width of kanji characters whereas full width katakana characters occupy the same width as kanji.

When computers were much less capable, half width katakana was the only way to represent Japanese on computers. Consisting of 63 characters and needing the same resolution as Latin letters to display and print, half width katakana was easy enough to handle even in the old days. Handling thousands of kanji characters, which require much higher display and print resolution, was not practical until the mid 80's. Representing Japanese only with katakana is somewhat like representing English only with upper case letters, which was common at the dawn of computing.

There are two pronunciation modifier symbols used in katakana -- the voiced sound mark and the semi-voiced (half voiced) sound mark. For example, using Unicode code points and character names, U+30AC (KATAKANA LETTER GA) is U+30AB (KATAKANA LETTER KA) with a voiced sound mark attached. When half width katakana was designed and implemented, the designers decided not to have precomposed characters that combine a katakana character with a pronunciation modifier.

Rather, such a character is represented, displayed, and printed as two consecutive characters. This is because it takes more dots and/or screen resolution to represent a katakana character with a pronunciation modifier as a single precomposed character.
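To make the difference concrete, here is a minimal Python sketch that lists the code points behind full width GA and its half width counterpart. It relies only on the standard unicodedata module; the helper name describe is made up for this illustration.

```python
import unicodedata

# Full width: GA exists as one precomposed code point.
full_ga = "\u30ac"
# Half width: KA followed by a separate voiced sound mark.
half_ga = "\uff76\uff9e"

def describe(text):
    """Return (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

print(describe(full_ga))
print(describe(half_ga))

# Unicode records full width GA as KA plus a combining voiced sound mark.
print(unicodedata.decomposition(full_ga))
```

The full width string has length 1 while the half width string has length 2, which is exactly the "two consecutive characters" representation described above.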

As time went by, computers became powerful enough to represent around 6000 Japanese characters, which require higher display and print resolution than Latin letters. Full width katakana was introduced in addition to half width katakana so that katakana characters could be displayed and printed more properly.

As you can imagine, there is no need to use half width katakana now. However, for backward compatibility, half width katakana is still available and many people end up using it simply because it's available. And unfortunately, a typical user doesn't care about the difference between half width and full width. This creates a need for conversion between half width and full width katakana -- you have to normalize Japanese input data to either half width or full width. Otherwise, a search won't yield the expected results.
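As a rough illustration of the search problem, the sketch below stores a word in half width katakana and looks for it in full width. It assumes that Unicode NFKC normalization is an acceptable way to fold both forms to full width; note that NFKC also changes other compatibility characters (full width Latin letters become ASCII, for example), so it may be too aggressive for some data. The function name normalize_katakana is hypothetical.

```python
import unicodedata

def normalize_katakana(text):
    """Fold half width katakana into full width using NFKC."""
    return unicodedata.normalize("NFKC", text)

stored = "\uff76\uff9e\uff72\uff84\uff9e"  # "gaido" typed in half width
query = "\u30ac\u30a4\u30c9"               # "gaido" typed in full width

# Without normalization the same word does not match.
print(query in stored)

# Normalizing both sides to the same form makes the comparison work.
print(normalize_katakana(query) in normalize_katakana(stored))
```

Applying the same normalization both when data is stored and when a query arrives keeps the two sides comparable.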

There is another occasion where half width to full width conversion is necessary -- email. The vast majority of Japanese email is in the ISO-2022-JP charset, which lacks half width katakana. There are cases where half width katakana is contained in ISO-2022-JP text data, but ISO-2022-JP as defined by RFC 1468 doesn't have half width katakana.
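The sketch below shows one way to cope with this in Python. It assumes that Python's built-in iso2022_jp codec follows RFC 1468 and therefore refuses half width katakana (to my knowledge it does), and that folding with NFKC before encoding is acceptable. The function names fits_iso2022_jp and encode_for_mail are made up for this example.

```python
import unicodedata

def fits_iso2022_jp(text):
    """Return True if text can be encoded with Python's iso2022_jp codec."""
    try:
        text.encode("iso2022_jp")
        return True
    except UnicodeEncodeError:
        return False

def encode_for_mail(text):
    """Fold compatibility characters with NFKC, then encode as ISO-2022-JP."""
    folded = unicodedata.normalize("NFKC", text)
    return folded.encode("iso2022_jp")

half = "\uff76\uff85"  # "kana" in half width katakana
full = "\u30ab\u30ca"  # "kana" in full width katakana

print(fits_iso2022_jp(half), fits_iso2022_jp(full))
print(encode_for_mail(half) == full.encode("iso2022_jp"))
```

Text that contains characters outside ISO-2022-JP even after folding would still raise UnicodeEncodeError in encode_for_mail, so a real mailer would need a fallback for that case.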

As described so far, the basic difference between full width and half width katakana is that the former has precomposed characters whereas the latter doesn't. When you convert a half width katakana string into full width, you have to recognize a sequence of a katakana character followed by a pronunciation modifier and convert them into a single precomposed character.
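The lookahead described above can be sketched in Python as follows. This is only one possible approach: it borrows the Unicode normalization tables (NFKC to widen a single letter, NFC to compose a letter with a combining sound mark) instead of carrying its own conversion table, and the names COMBINING_MARK, is_half_width_letter, and to_full_width are hypothetical.

```python
import unicodedata

# Half width sound marks mapped to their combining counterparts.
COMBINING_MARK = {
    "\uff9e": "\u3099",  # voiced sound mark
    "\uff9f": "\u309a",  # semi-voiced sound mark
}

def is_half_width_letter(ch):
    """True for half width katakana letters (U+FF66..U+FF9D), not the marks."""
    return "\uff66" <= ch <= "\uff9d"

def to_full_width(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if is_half_width_letter(ch):
            base = unicodedata.normalize("NFKC", ch)  # widen the letter
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in COMBINING_MARK:
                composed = unicodedata.normalize("NFC", base + COMBINING_MARK[nxt])
                if len(composed) == 1:
                    # Letter + modifier collapsed into one precomposed character.
                    out.append(composed)
                    i += 2
                    continue
            out.append(base)
        else:
            # Everything else, including a stray sound mark, is left as is.
            out.append(ch)
        i += 1
    return "".join(out)

print(to_full_width("\uff76\uff9e"))        # half width KA + voiced sound mark
print(to_full_width("\uff8a\uff9f\uff9d"))  # half width HA + semi-voiced mark + N
```

A modifier is consumed only when a precomposed character actually exists for the pair; otherwise the letter is widened alone and the modifier is kept untouched.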

2006-03-06

Japanese Spams Gmail Cannot Filter

A month or so ago, I started seeing Japanese spam that Gmail fails to filter much more frequently than before. Until then, Gmail's spam filter had been doing a decent job with Japanese spam, but that's not the case now. I'm receiving dozens of Japanese spam messages that Gmail fails to filter every day now, which is quite annoying.

The distinguishing characteristic of these spam messages is that their subjects claim to be in the ISO-2022-JP charset but are actually in the SHIFT_JIS charset. They are also encoded in base64, e.g.
Subject: =?ISO-2022-JP?B?kWaQbJBsjciSsouzk6+NRInvgqmC54LMgqiSbYLngrmBQg==?=
The problem is that, according to my observation, because of the false claim, Gmail interprets the subject as a random string, and hence its spam filter doesn't work as it should. Here's how Gmail appears to interpret the subject:


The false claim is the result of a sloppy understanding of how to compose a Japanese email. It's ironic that the sloppiness works in favor of the spammers against Gmail's spam filter.

I'd really like Gmail to cope with it soon. Let me point out that this spamming technique is not Japanese-specific; it can be employed for other languages as well.
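The mismatch in the subject line above can be detected mechanically. The following Python sketch is a hypothetical heuristic, not a description of how Gmail works: it relies on the fact that ISO-2022-JP is a 7-bit encoding, so any byte with the high bit set contradicts the claimed charset, and it then tries SHIFT_JIS as a guess. The names ENCODED_WORD and inspect_encoded_word are made up for this example.

```python
import base64
import re

SUBJECT = "=?ISO-2022-JP?B?kWaQbJBsjciSsouzk6+NRInvgqmC54LMgqiSbYLngrmBQg==?="

# RFC 2047 encoded word: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

def inspect_encoded_word(word):
    """Return (claimed charset, raw bytes, whether the claim is plausible)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None or match.group(2).upper() != "B":
        raise ValueError("not a base64 encoded word")
    charset = match.group(1)
    raw = base64.b64decode(match.group(3))
    # ISO-2022-JP is 7-bit, so a byte >= 0x80 contradicts the claimed charset.
    has_high_bit = any(b >= 0x80 for b in raw)
    plausible = not (charset.upper() == "ISO-2022-JP" and has_high_bit)
    return charset, raw, plausible

charset, raw, plausible = inspect_encoded_word(SUBJECT)
print(charset, plausible)

if not plausible:
    # Guess SHIFT_JIS, the charset these messages actually used.
    text = raw.decode("shift_jis", errors="replace")
    print(len(raw), "bytes decode to", len(text), "characters")
```

A filter that falls back like this could hand the spam classifier readable Japanese text instead of an apparently random string.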

Added on 2006-03-20:
At the time I published this post, I notified Google about it. I don't know how much it contributed, but Gmail's spam filter now seems to be able to cope with spam of this kind to some extent.

2006-03-03

Why Firefox's Share Is Small in Japan

This post is inspired by a Slashdot Japan story of the same title. Some of the points made in the story are elaborated here.

According to a presentation at the Mozilla Japan Seminar, Firefox's share is substantially smaller in Japan than in other regions. At this point, Firefox has a 12% share worldwide, 20% in Europe, 15% in North America, 10% across Asia, and as little as 4% in Japan. Let me consider why.

Japanese-Unfriendly
I think the prime reason is that Firefox is so Japanese-unfriendly. Firefox's web site design, download instructions, documentation, default font, and character parameters have, or used to have, room for improvement for Japanese speakers. Average computer users in Japan are not comfortable with them. Even though a fair number of Japanese speakers are involved with Firefox, the Japanese localization is not enough.

For example, because Japanese characters have many strokes on average, serif fonts are noticeably less readable than sans serif fonts on computer screens. Screen resolution is still not enough to display the serifs of a complex character at 10 point or so. Think about the dawn of the personal computer around 1980. With characters displayed at 5 dots by 7 dots, serif font design was impractical.

Firefox had long employed a serif font (Mincho) as the default Japanese font, whereas Internet Explorer's default Japanese font is a sans serif font (Gothic). From version 1.5, though, Firefox's default Japanese font is sans serif.

I suspect that in Japan, the English proficiency rate among computer users is quite low -- probably the lowest in the world. There are regions where the general English proficiency rate is lower than in Japan. But in those regions, computer users belong to the privileged class and tend to speak English.

Japanese Friendly Alternatives
There is another noteworthy factor. Before Firefox became popular, several free tab browsers using the IE component emerged in Japan. Firefox's biggest appeal (arguably) being the tab feature, computer users in Japan got good IE-based alternatives before Firefox. Because those free tab browsers are written by Japanese programmers, they are comfortable for Japanese users in terms of documentation, default parameters, and the feature set.

Postscript on 2006-09-28
Originally, I mentioned that one reason for low English proficiency in Japan was education. Not the quality of English education, but the fact that education is done in Japanese all the way, whereas in many non-English-speaking countries, college education is done in English.

A reader pointed out that in most countries, education is done in the local language up to graduate school. I asked a Peruvian friend and an Italian friend about it. The reader proved to be right.