It wouldn’t be unreasonable to say that most people who publish on the web would like their work to be available in every language, since this attracts and informs the greatest number of visitors or clients. The rapidly growing popularity of the WordPress platform around the globe has underlined both the importance of delivering translated content and the fact that translating a website is a demanding task. I would like to propose a partial solution to this problem.
This article is the first in a series of two and covers the definition of the problem and our proposed solution.
WordPress is an open source Content Management System (CMS) that allows you to build websites quickly and efficiently. The pages of a WordPress site can be added, modified, and updated without having to learn programming. The platform is very popular and highly customizable: about 30% of all websites use it. As a result, developers around the globe are working in the WordPress environment.
WordPress in multiple languages
However, the popularity and prevalence of WordPress present a problem: if the language of the site is different from the language of a plugin or theme in use, that content must be translated when it is added to the site. In addition, the “core” of WordPress must also be translated. While the WordPress team makes the project available in several languages, they can’t possibly cover all existing languages around the world on their own. Fortunately, WordPress is very flexible and allows contributors to translate the plugins and themes available on wordpress.org, as well as the platform itself.
How does the translation system work?
Access to translations is made possible through PO files. PO files are text files that have a specific structure and contain source strings (in the original language) and translation strings (which are empty at the time of file creation). The PO format makes it possible to translate the content of the site without changing the software’s code. The WordPress documentation explains to developers how to create PO files, and to translators how the format works.
Anyone can contribute. However, translation suggestions must be approved by someone with an editor role. Submitting translations and getting them approved allows communities to enjoy their WordPress sites in their native language.
Unfortunately, each language (or “locale”) has a different number of contributors, and translation suggestions are not approved at the same pace everywhere. A plugin could have thousands of translation strings, and for a small community of contributors the task may seem insurmountable.
On the other hand, the translation of the same plugin in an interchangeable “locale” could already be completed and approved. English US and English UK are one example of such a pair. This content is publicly available and can be reused.
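To make the structure concrete, here is a minimal sketch in Python (chosen only for illustration) that reads a small PO fragment and lists the strings still waiting for a translation. The sample strings and the `parse_po` helper are assumptions made for this example; it handles only simple single-line entries, not headers, multi-line strings, or escaped quotes.

```python
import re

# A tiny, made-up PO fragment: each entry pairs a source string (msgid)
# with its translation (msgstr), which stays empty until someone fills it in.
SAMPLE_PO = '''\
msgid "Add to cart"
msgstr "Ajouter au panier"

msgid "Checkout"
msgstr ""
'''

# Matches one single-line msgid immediately followed by its msgstr.
ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')

def parse_po(text):
    """Return {msgid: msgstr} for simple single-line entries (hypothetical helper)."""
    return dict(ENTRY.findall(text))

entries = parse_po(SAMPLE_PO)

# Entries whose msgstr is empty have not been translated yet.
untranslated = [msgid for msgid, msgstr in entries.items() if not msgstr]
print(untranslated)
```

Because the translations live in a separate file keyed by the source string, the software’s code never has to change when a translation is added.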
Naturally, the cultural differences between two locales (“colour” and “color”, for example) must be taken into consideration, but in general their contents are almost interchangeable. Unfortunately, simply copying content from one language to another is not a viable approach: already approved translations could exist in our language file, and a blind copy would overwrite them.
But what if we automated translating?
The solution is to use existing translations from an interchangeable locale for the language that is missing translations. However, the manual approach is not realistic either: the workload remains enormous and humans are prone to making mistakes.
The real solution is to create a tool that would process PO files and automatically merge the content. Nevertheless, such a tool raises several questions and conceptual difficulties:
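The rule at the heart of this idea, reusing the other locale’s translations only where ours are missing, can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any existing tool: the `fill_missing` function and the sample strings are hypothetical, and each language file is reduced to a plain dictionary of source string to translation.

```python
def fill_missing(target, donor):
    """Copy donor translations into target only where target has none.

    Both arguments map a source string to its translation; an empty
    string means "not translated yet". Hypothetical helper.
    """
    merged = dict(target)  # work on a copy so the input stays untouched
    for msgid, translation in donor.items():
        # Only fill gaps: never overwrite an existing (approved) translation,
        # and never add strings the target file does not contain.
        if msgid in merged and not merged[msgid] and translation:
            merged[msgid] = translation
    return merged

# Made-up example: fr_CA is missing one string that fr already has.
fr_ca = {"Add to cart": "Ajouter au panier", "Checkout": "", "Email": "Courriel"}
fr = {"Add to cart": "Ajouter au panier", "Checkout": "Commander", "Email": "E-mail"}

merged = fill_missing(fr_ca, fr)
print(merged)
```

In this example “Checkout” is filled in from the donor locale, while “Email” keeps the translation that already existed in the target file.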
- Does the tool have to support two specific languages or all possible languages in the context of WordPress (this would surely be appreciated by various communities)?
- If the tool must support all languages, it must also consider the plural forms of each language. For example, the Arabic language has six plural forms. Thus, the complexity of the tool increases significantly.
- Is a user interface required or not?
- What programming language should I use? (Since WordPress uses PHP, it would probably be a good idea to keep the same language.)
- How should cases like “colour” and “color” be handled? Simply replace the word in question, or mark it for revision? Simply replacing words could raise a set of contextual issues.
In short, there are many difficulties, but the concept is attractive. In fact, such a tool already exists, but before diving into the technical aspects, I find it necessary to explain why the tool must exist, what context led to its creation, and what impact it could have on potential users.
Incidentally, this emphasizes the power of programming: a problem that repeats itself and that takes hours and hours of tedious work each time (or, if we dramatize a little, “a painful infinity”) could be solved in a fraction of the time!
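Two of these questions lend themselves to a short sketch, again in Python purely for illustration. The first part marks copied strings that contain a locale-sensitive word for review, using the `#, fuzzy` flag that the PO format provides for entries needing attention. The second part compares the number of plural forms declared in each file’s `Plural-Forms` header. The word list, the function names, and the abbreviated Arabic header are assumptions made for this example.

```python
import re

# Hypothetical list of spellings that differ between en_US and en_GB.
LOCALE_SENSITIVE = {"color", "center", "favorite"}

def needs_review(translation):
    """True if the copied string contains a locale-sensitive word (exact match only)."""
    words = re.findall(r"[a-z]+", translation.lower())
    return any(word in LOCALE_SENSITIVE for word in words)

def to_po_entry(msgid, msgstr):
    """Render one simple entry, adding the fuzzy flag when review is needed."""
    flag = "#, fuzzy\n" if needs_review(msgstr) else ""
    return f'{flag}msgid "{msgid}"\nmsgstr "{msgstr}"\n'

def plural_count(header):
    """Read nplurals from a Plural-Forms header; None if it is absent."""
    match = re.search(r"nplurals\s*=\s*(\d+)", header)
    return int(match.group(1)) if match else None

def same_plural_count(header_a, header_b):
    """Two locales are only candidates for merging if their plural counts agree."""
    count = plural_count(header_a)
    return count is not None and count == plural_count(header_b)

EN_HEADER = "Plural-Forms: nplurals=2; plural=(n != 1);"
# Abbreviated on purpose: the real Arabic plural expression is omitted here.
AR_HEADER = "Plural-Forms: nplurals=6;"

print(to_po_entry("Pick a color", "Pick a color"))
print(same_plural_count(EN_HEADER, AR_HEADER))
```

Marking an entry for review rather than replacing the word leaves the contextual decision to a human editor, at the cost of some remaining manual work.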
Definition of terms
- CMS: Content Management System, software that lets you easily and dynamically create the content of a website.
- PO file: A text file that contains translation strings.
- Localization (“locale”): In this context, an identifier (or code) for a language (fr_CA, fr, en_US, etc.).
- Open Source: Software with freely accessible source code, although the way it may be used can be governed by a license.