Many SMT systems are trained on the proceedings of the European Parliament (the Europarl corpus), a huge collection of parallel multilingual documents: the same content, professionally translated into each language.
This is a large factor in why non-European languages typically don't fare that well in statistical machine translation systems: the corpora (that's the plural of "corpus") simply aren't as large for other languages.
Subject breadth is another problem. Early information retrieval systems were trained (and many still are) on the Wall Street Journal as the corpus, which means they work great for searches about big business, but not so great for finding apple pie recipes, as the WSJ doesn't talk much about that ;)
There are some research projects (in rather early stages) developing SMT techniques for translating into languages without big parallel corpora, essentially by bootstrapping such corpora with the help of active learning. This could be of particular importance in keeping smaller languages from disappearing, since otherwise fewer and fewer works will be available in those languages (yes, I'm aware that many people consider language death a good thing).
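For the curious, the bootstrapping loop looks roughly like this. This is just a minimal sketch, not the method of any particular project: the "model" (a source-side vocabulary), the out-of-vocabulary uncertainty measure, and all the function names are made up for illustration.

```python
import random

def train_smt_model(parallel_corpus):
    """Hypothetical stand-in for training an SMT model on (source, target) pairs.

    Here the "model" is just the set of source words it has seen."""
    return {w for src, tgt in parallel_corpus for w in src.split()}

def uncertainty(model, sentence):
    """Crude proxy for model uncertainty: fraction of out-of-vocabulary words."""
    words = sentence.split()
    oov = sum(1 for w in words if w not in model)
    return oov / max(len(words), 1)

def human_translate(sentence):
    """Placeholder for the expensive human-in-the-loop translation step."""
    return "<translation of: %s>" % sentence

seed_corpus = [("hello world", "hallo welt")]   # tiny seed of sentence pairs
unlabeled = ["hello there", "big business news", "apple pie recipe"]

BUDGET_PER_ROUND = 1   # sentences a human translator handles per round

for round_no in range(3):
    model = train_smt_model(seed_corpus)
    # Active learning: ask the human about the sentences the current
    # model is least sure about, so each translation helps the most.
    unlabeled.sort(key=lambda s: uncertainty(model, s), reverse=True)
    to_translate = unlabeled[:BUDGET_PER_ROUND]
    unlabeled = unlabeled[BUDGET_PER_ROUND:]
    for src in to_translate:
        seed_corpus.append((src, human_translate(src)))
    print("round %d: corpus size %d" % (round_no, len(seed_corpus)))
```

The point of the active-learning step is economics: human translation is the bottleneck, so you spend that budget on the sentences that teach the model the most, instead of translating at random.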