Open Source Translation Tools
At this point, we invite tool developers to add their tool(s) to the list below, which is just a scratch pad. As the list grows, we'll group and tag the entries, and enter the "real" tools (as opposed to concepts, alpha code, etc.) into http://www.socialsourcecommons.org
Apertium
Apertium is an open-source machine translation toolkit; you can find out more about it here:
Our support is mostly for the Romance languages of Spain, but we are actively developing translators for Romanian, Afrikaans and Welsh. We are also seeking input and collaboration from third parties interested in open-source machine translation.
Current stable versions of our translators are available for:
- Spanish ⇆ Catalan (es-ca)
- Spanish ← Romanian (es-ro)
- French ⇆ Catalan (fr-ca)
- Occitan ⇆ Catalan (oc-ca) -- (oc and oc@aran)
- Spanish ⇆ Portuguese (es-pt) -- (pt and pt_BR)
- English ⇆ Catalan (en-ca)
- English ⇆ Spanish (en-es)
- Spanish ⇆ Galician (es-gl)
- French ⇆ Spanish (fr-es)
- Esperanto ← Spanish (eo-es)
- Esperanto ← Catalan (eo-ca)
- Portuguese ⇆ Catalan (pt-ca)
- Portuguese ⇆ Galician (pt-gl)
You can find more pairs on our Wiki:
As far as we know, Apertium is one of only two open-source machine translation platforms (Apertium is GPL; the other is Moses, an LGPL statistical MT platform). However, it is the only one that is useful for under-resourced and lesser-used languages, as it doesn't require massive amounts of parallel text to get working.
We are very passionate about open source/free software machine translation, and about lesser-used/under-resourced languages in general, and would be interested in taking part somehow. We would also like to know how we can make our software more useful for users. We are planning to add TMX support soon, and we always welcome comments and suggestions.
All the tools are written in C++, with dictionary and rule specification formats in XML. Specification formats are compiled into efficient binary representations. On an average machine the system can process around 4,000 words per second.
For document file formats, we support: plain text, HTML, ODT, SXW, DOCX, RTF, generic XML, and a few others...
Passiflora
Passiflora will be a Free and Open Source, web-based cooperative document writing system. It is designed to prevent data version conflicts without any locking, so multiple users can work on the same document simultaneously. (An example of such a conflict: user 1 loads a file, user 2 loads the same file, user 2 saves, user 1 saves, and user 2's work is lost or incorrectly mixed with user 1's.)
Passiflora will have special support for open content licensing: it will feature a license chooser, and it will warn users when they combine content with incompatible licenses. In that case it will offer to request permission to use the original work under a different license, unless the author of that work has opted out of being asked.
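The page does not say how Passiflora avoids the lost-update problem, so what follows is only a generic sketch, under that caveat, of how the scenario above can be detected without locks: every save carries the version number the editor started from (optimistic concurrency), and a save based on a stale version is rejected instead of silently overwriting. The names `DocumentStore`, `ConflictError` and the integer version scheme are hypothetical.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a save is based on an outdated version of the document."""


@dataclass
class Document:
    text: str = ""
    version: int = 0


class DocumentStore:
    """Hypothetical in-memory store using optimistic concurrency (no locks)."""

    def __init__(self) -> None:
        self._doc = Document()

    def load(self) -> Document:
        # Hand out a copy that remembers which version it was based on.
        return Document(self._doc.text, self._doc.version)

    def save(self, text: str, base_version: int) -> int:
        # Refuse the save if someone else saved since this copy was loaded.
        if base_version != self._doc.version:
            raise ConflictError(
                f"based on version {base_version}, current is {self._doc.version}"
            )
        self._doc = Document(text, self._doc.version + 1)
        return self._doc.version


store = DocumentStore()
user1 = store.load()
user2 = store.load()
store.save("user 2's edits", user2.version)  # succeeds, version becomes 1
try:
    store.save("user 1's edits", user1.version)  # stale: still based on version 0
except ConflictError as err:
    print("conflict:", err)
```

Rejecting the stale save only prevents silent data loss; a lock-free collaborative editor such as the one described would go further and merge both users' edits rather than discard one of them.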
There will be a simple user interface for translating documents. We are interested in integrating machine translation software into this.
A first usable version of Passiflora, without the licensing features, should be finished a short time before OTT2007.
A project homepage is under construction: http://passiflora.nuxified.com/
Open Translation Engine (OTE)
The Open Translation Engine (OTE) is an open source project developing language translation and dictionary tools for the internet community. The prototype system currently supports Dutch to English translations. The OTE is written in PHP and uses a MySQL database.
The next release, which should be available before OTT2007, will allow users to interactively edit the translation dictionaries.
Main OTE site: http://ote.2meta.com/
OTE Project site: http://sourceforge.net/projects/ote/
World Wide Lexicon (WWL)
The World Wide Lexicon (WWL) is an open source project that enables web publishers to translate their content into any language, via volunteer and paid translators. WWL is designed to be embedded in a wide range of publishing and webservice environments. For example, see our blog for a demo of our WordPress plugin. One of the key goals of WWL is to enable collaborative translation for any website that wants to use it. We will be releasing an API at OTT07, which will be available for live use at the conference or soon after. We are also working on an in situ localization library (SLS: Simple Localization System), which we will release sometime in December.
Main WWL site: http://www.worldwidelexicon.org/
WWL Blog: http://blog.worldwidelexicon.org/
Translation Community for OTT07: http://www.worldwidelexicon.org/groups/index/OTT07
Moses
Moses is a statistical machine translation system that allows you to automatically train translation models from very large bilingual aligned parallel corpora for any language pair.
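For readers new to statistical MT, the textbook formulation (not a description of Moses' internals) picks the target sentence e that is most probable given a source sentence f. The classic noisy-channel form splits this into a translation model p(f | e), estimated from the aligned parallel corpora, and a language model p(e), which can be trained on monolingual target-language text. Phrase-based systems such as Moses generalise this to a weighted log-linear combination of feature functions h_i (phrase translation probabilities, language model, reordering, etc.), with weights λ_i tuned on held-out data:

```latex
\begin{aligned}
\hat{e} &= \arg\max_{e}\, p(e \mid f) = \arg\max_{e}\, p(f \mid e)\, p(e) && \text{(noisy channel)}\\
\hat{e} &= \arg\max_{e} \sum_{i=1}^{n} \lambda_i\, h_i(e, f) && \text{(log-linear)}
\end{aligned}
```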
Main site: http://www.statmt.org/moses/
XLIFF
The XML Localisation Interchange File Format. Not a tool, but rather an initiative to create an open container format for localisation resources, similar to how PO is used in the open source software world as a container format for many file formats.
The current version (1.2) is now going through the OASIS standardization process, and the focus is shifting towards the next major version (2.0). Here, the open source community has a great opportunity to provide input on how this format will evolve, ensuring that it is usable in common open source and open content localisation workflows.
XLIFF 1.2 is not yet *the* solution to all localisation format problems, but it is a step forward and has great potential. Some features include:
- Support for adding translation suggestions (e.g. from translation memory, machine translation, previous versions)
- Some support for recording the translation history of a translation unit
- Support for adding tool-specific metadata
- Support for merging and sub-segmenting translation units
- Some support for translation work-flows
Note that these features are available in XLIFF, but there is not much open source tool support for them *yet*.
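To make the "translation suggestions" feature concrete, here is a minimal sketch that reads `<alt-trans>` suggestions out of an XLIFF 1.2 document with Python's standard `xml.etree.ElementTree`. The element names, `origin`/`match-quality` attributes and namespace come from XLIFF 1.2; the sample file, its `origin` labels (which are free text in the spec) and the `suggestions` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello world</source>
        <alt-trans origin="translation-memory" match-quality="90">
          <source>Hello, world</source>
          <target>Bonjour, le monde</target>
        </alt-trans>
      </trans-unit>
      <trans-unit id="2">
        <source>Save file</source>
        <alt-trans origin="machine-translation">
          <target>Enregistrer le fichier</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def suggestions(xliff_text):
    """Map each trans-unit id to a list of (origin, match-quality, suggested target)."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        result[unit.get("id")] = [
            (
                alt.get("origin"),
                alt.get("match-quality"),  # None when the attribute is absent
                alt.findtext("x:target", default="", namespaces=NS),
            )
            for alt in unit.iterfind("x:alt-trans", NS)
        ]
    return result


for unit_id, alts in suggestions(SAMPLE).items():
    for origin, quality, text in alts:
        print(unit_id, origin, quality, text)
```

Because suggestions live inside the file next to the unit they belong to, any XLIFF-aware editor can show them to the translator without access to the original TM or MT system.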
Main site: http://www.xliff.org/
Tools with some support for XLIFF:
- WordForge Off-line Localization Editor (WordForge Foundation), previously known as Pootling. A specialised editor for XLIFF. http://sourceforge.net/projects/wordforge2
- Pootle, Translate Toolkit - http://translate.sf.net/
- I'm (asgeirf) working on a Java API for manipulating XLIFF 1.2, and I will also be active in the development of XLIFF 2.0 through participation in the OASIS XLIFF Technical Committee
- OmegaT - http://www.omegat.org/
- QT Linguist
- Okapi - http://okapi.sf.net/
.po Translation Tool
FLOSS Manuals are working on a TWiki plugin for translation of .po files, online editing of CSS, and basic management of image translation. The plugin is in beta; you can see it here:
http://en.flossmanuals.net/bin/localize
Transifex
Transifex is a novel system designed to ease the process of contributing translations to projects hosted on remote and disparate version control systems (VCS). It acts as a proxy between the translator and the project maintainer, making the work of both more efficient. By abstracting the VCS behind a common, easy-to-use interface, it makes submitting translations to remote projects easier and more straightforward. At the same time, by acting as a translation gateway to remotely hosted resources, it enables developers to reach out to already established translation communities.
In contrast to similar systems, Transifex doesn't require the source code to be relocated or the translation files to be copied to a downstream VCS. Therefore, translation merging is not needed, and all downstream projects benefit equally. Oh, and it's free software!
It is already in use by the Fedora Localization Project, which has more than 2,000 translators.
Development site: https://fedorahosted.org/transifex/
Translation Workflow Tool
FLOSS Manuals is also working on a plugin for TWiki that will manage copying material from other TWiki installations, basic translation workflow, and inline editing comments. Development has just begun; with luck, there will be something to show during the event.
WordForge Off-line Localization Editor
WordForge, currently being developed towards version 0.5, is an off-line localization editor built specifically to let translators get the most out of the XLIFF file format.
Besides the standard features of off-line localization editors (translation memory management, catalog management), WordForge can use glossary and translation memory information stored in the XLIFF file itself, as well as external sources. Version 0.5 will support spell-checking, the use of a third language as reference, conversion of different file formats to and from XLIFF, and checking the quality of a translation immediately after a string has been translated. It will also support different options for SVN merging.
Version 1.0 of WordForge will support localization workflow management, assigning different roles to users and keeping information (inside the XLIFF file) about the steps followed by each translation. At any point in time, it will be possible to know which strings need to be translated, reviewed or approved. This is especially important for reducing the amount of work in the review and approval stages.
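XLIFF 1.2 already has a standard place for this kind of workflow information: the `state` attribute on `<target>` (values such as `new`, `needs-translation`, `translated`, `needs-review-translation`, `signed-off`, `final`). The sketch below groups translation units by the stage they are waiting for. The mapping of states to "translate / review / approve / done" stages and the `units_by_stage` helper are hypothetical assumptions, not WordForge's actual workflow model.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical mapping of XLIFF 1.2 <target state="..."> values to workflow stages.
STAGE_FOR_STATE = {
    "new": "translate",
    "needs-translation": "translate",
    "translated": "review",
    "needs-review-translation": "review",
    "signed-off": "approve",
    "final": "done",
}


def units_by_stage(xliff_text):
    """Group trans-unit ids by the workflow stage they are waiting for."""
    root = ET.fromstring(xliff_text)
    stages = defaultdict(list)
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)  # direct child only, not <alt-trans>
        if target is None or not (target.text or "").strip():
            stage = "translate"  # no translation yet
        else:
            # Assumption: an unknown or missing state still needs a reviewer.
            stage = STAGE_FOR_STATE.get(target.get("state"), "review")
        stages[stage].append(unit.get("id"))
    return dict(stages)


SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.txt" source-language="en" target-language="gl" datatype="plaintext">
    <body>
      <trans-unit id="a"><source>Open</source></trans-unit>
      <trans-unit id="b"><source>Save</source><target state="translated">Gardar</target></trans-unit>
      <trans-unit id="c"><source>Quit</source><target state="signed-off">Saír</target></trans-unit>
      <trans-unit id="d"><source>Help</source><target state="final">Axuda</target></trans-unit>
    </body>
  </file>
</xliff>"""

print(units_by_stage(SAMPLE))
# {'translate': ['a'], 'review': ['b'], 'approve': ['c'], 'done': ['d']}
```

Keeping the state inside the file itself means the workflow status travels with the translation, which is what allows reviewers to skip strings that are already approved.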
The WordForge Off-line Localization Editor is supported by the WordForge Foundation.
Translate Toolkit
The Translate Toolkit is a toolkit designed to make it easier for localisers to work with various formats, to help increase quality, etc.
It works with XLIFF and PO as its primary formats and can convert many other formats to these, e.g. Java .properties, Qt, HTML, Mozilla DTD and various wiki formats.
The toolkit has a number of QA-related tools, e.g. pofilter, which can perform more than 40 checks on translations.
It also provides translation memory and glossary management.
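As a rough idea of what "converting to PO" involves, here is a simplified sketch that turns Java .properties lines into untranslated PO entries. It is not the Translate Toolkit's own converter and does not reproduce its output; the `properties_to_po` and `po_quote` helpers are hypothetical, and the choice to store each key both as a `#:` location comment and as `msgctxt` is one possible mapping, assumed here for illustration.

```python
import re


def po_quote(text):
    """Escape backslashes, double quotes and newlines for a PO string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def properties_to_po(properties_text):
    """Convert simple key=value (or key: value) lines into untranslated PO entries.

    Simplified: ignores .properties escapes, line continuations and
    whitespace-only separators.
    """
    entries = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":  # blank line or comment
            continue
        key, *rest = re.split(r"\s*[=:]\s*", line, maxsplit=1)
        value = rest[0] if rest else ""
        entries.append(
            f"#: {key}\n"  # keep the key as a location comment
            f"msgctxt {po_quote(key)}\n"  # and as context, so equal values stay distinct
            f"msgid {po_quote(value)}\n"
            'msgstr ""\n'
        )
    return "\n".join(entries)


print(properties_to_po('# File menu\nfile.open=Open\nfile.quit : Quit "now"\n'))
```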
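To illustrate the kind of check such a QA tool runs, the sketch below implements two simple ones: that printf-style placeholders are preserved, and that sentence-ending punctuation matches. It is a hypothetical stand-in, not pofilter's code; the placeholder pattern is deliberately simplified (no `%1$s` reordering), and the labels "printf" and "endpunc" are chosen for illustration.

```python
import re

# Simplified printf-style placeholder pattern: %s, %d, %5.2f, %(name)s, %% ...
PRINTF = re.compile(r"%(?:\([^)]+\))?[-+ #0]*\d*(?:\.\d+)?[sdifxXc%]")
SENTENCE_END = ".!?:"


def end_punctuation(text):
    """Return the final character if it is sentence punctuation, else ''."""
    last = text.rstrip()[-1:]
    return last if last and last in SENTENCE_END else ""


def check(source, target):
    """Return the names of the (hypothetical) checks a translation fails."""
    failures = []
    if sorted(PRINTF.findall(source)) != sorted(PRINTF.findall(target)):
        failures.append("printf")
    if end_punctuation(source) != end_punctuation(target):
        failures.append("endpunc")
    return failures


print(check("Deleted %d files.", "Fichiers supprimés."))  # ['printf']
```

Comparing sorted placeholder lists rather than their order tolerates the word-order changes that translations routinely need, while still catching a dropped or mistyped variable.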
The toolkit is continually being developed and forms the basis of a number of other tools including Pootle (an online translation tool) and Pootling (an offline translation tool). The number of formats supported is slowly expanding over time.
Pootle
Pootle is a web-based translation tool that allows you to manage PO or XLIFF translations through a web interface. This allows for easier community participation.
The server helps to:
- Assign people various rights
- Create goals
- Check translations
- Allow ad hoc contribution
- Commit translations back to a version control system
- Use terminology and translation memory matching
Pootle is being adopted by a number of localisation projects, including OpenOffice.org, Creative Commons and others.
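Translation memory matching means finding previously translated strings that are similar to the one being translated. Pootle's actual matching algorithm is not described here, so this sketch substitutes Python's standard `difflib.SequenceMatcher` similarity ratio; the `tm_lookup` helper, the in-memory dictionary standing in for a TM, and the 0.75 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def tm_lookup(source, memory, threshold=0.75):
    """Return (score, stored source, stored translation) tuples, best match first.

    The score is difflib's similarity ratio (0.0 to 1.0) on lowercased text.
    """
    matches = []
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score >= threshold:
            matches.append((round(score, 2), tm_source, tm_target))
    matches.sort(reverse=True)
    return matches


memory = {
    "Open file": "Ouvrir le fichier",
    "Save file": "Enregistrer le fichier",
    "Quit": "Quitter",
}
print(tm_lookup("Open files", memory))
```

A real TM server would index its entries rather than scanning every one, but the threshold-and-rank idea is the same: only close matches are offered, best first, for the translator to accept or edit.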
PO Editing Tools
A quick list, without details, of existing offline PO editing tools:
- poEdit
- KBabel
- gtranslator
- WordForge Editor
XLIFF Editing Tools
- WordForge Editor
- Transolution
- Open Translation Tools
CollaboDict
CollaboDict is a web application for collaborative creation of dictionaries. This means that an open or closed group of users can create a dictionary project and work together on it in a democratic way. An important feature of CollaboDict is that it is free software (in the sense of the GPL license) and that it also promotes open content, in the sense that the projects can be created only under a Creative Commons license. This guarantees that the content created by way of CollaboDict will always contribute to the common good, while protecting some rights of the authors.
CollaboDict is interesting not only for dictionary developers but even more so for other people, as it provides an online repository of dictionaries (e.g. of specialized terminology) that people can consult.
Site: http://www.terminologija.org.mk/
Betawiki
Betawiki is a wiki dedicated to translating the messages of MediaWiki and its extensions, FreeCol and other projects. It supports export to .po for offline translation of the most used messages and the core messages. Betawiki editors contribute to more than 70 languages every month, and this number increases by five to ten languages every month.
Site: Betawiki
OmegaWiki
OmegaWiki is a wiki that allows people to work on lexical, terminological and ontological data in multiple languages. OmegaWiki is based on MediaWiki, extended with relational functionality. Because the data is relational, the complete user interface can be localised, and the interface language can be selected in the user preferences.
As OmegaWiki intends to provide facts, and facts cannot be copyrighted in the first place, OmegaWiki only asks for CC-BY licensing. The point is that people will know where to go when OmegaWiki gets something wrong or when data is missing. The operational definition of "success" is: "When people find an application for our data that we did not think of, that is success."
Concepts have on average 8.72 translations, and there are 53 languages with over 500 translations (December 2007). Existing relations will be shown in the UI language when a translation exists. We want to make our data available to other open source applications, and we hope for reciprocal collaboration.
