Translation memory
Q: What is a TM?
- All the translations you have done, stored so that they can be reused.
Q: How do you apply the TM? Are the suggestions useful?
- Besides simply searching the TM, various algorithms use it to produce rough translations.
- Basic usage: compare two sentences (the one to be translated and one already translated) and count how many characters differ between them (additions + deletions).
- The match is done on the original (e.g. English) text, and the stored translation is filled in automatically.
- The similarity threshold can be tweaked (trust a 10% match, or only a 90% one). Some people work only with 100% matches because that is more efficient for them.
- Doesn't work that well for short sentences.
- Isn't considered "machine translation"
- Useful for "rough" translations
- Efficient for technical terms, less so for documentation and freely written text
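The basic matching described above can be sketched as follows. This is a minimal illustration, not how any particular tool does it: it uses Levenshtein distance (which counts substitutions as well as additions and deletions), and the names `similarity`, `best_match` and the 0.75 default threshold are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and substitutions
    needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0.0 (nothing in common) .. 1.0 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(source, memory, threshold=0.75):
    """Return (score, known_source, translation) for the closest entry in
    memory (a dict of source -> translation), or None below the threshold."""
    best = None
    for known_source, translation in memory.items():
        score = similarity(source, known_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, known_source, translation)
    return best


memory = {"Open the file": "Ouvrir le fichier"}
print(best_match("Open the files", memory))
```

Setting `threshold=1.0` reproduces the "100% matches only" way of working. Short sentences are hurt most because a single changed character is a large share of their length.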
Q: How useful are TMs?
- For free text, TMs are sometimes personal: each translator or project might have a different idea of how something should be translated.
- One-word matches sometimes cause problems: is the word "click" in this sentence a noun or a verb?
- A third use: pre-populate a TM from a big text. When a new version of the text comes in, the TM will translate most of it.
Q: What is OmegaT?
- A translation tool that has a TM built in; it is not a TM itself.
Q: Can you use an existing TM?
- Some companies let you upload a TM (e.g. Wordfast)
- Not many professionals share their TMs
- Open source projects don't mind sharing their TMs
- If you have existing translated files, you align source and translation (same strings at the same positions/paragraphs) and then do a 1-to-1 match to create the memory.
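A rough sketch of that alignment step, assuming both files are plain text with paragraphs separated by blank lines and in the same order. The function name `build_memory` is made up for the example. Real aligners have to cope with split, merged or reordered segments, which this does not attempt.

```python
def build_memory(source_text: str, target_text: str) -> dict:
    """Pair paragraphs by position to create a source -> translation memory."""
    source_segments = [p.strip() for p in source_text.split("\n\n") if p.strip()]
    target_segments = [p.strip() for p in target_text.split("\n\n") if p.strip()]
    if len(source_segments) != len(target_segments):
        # Positional matching only makes sense when the counts agree.
        raise ValueError(
            f"cannot align: {len(source_segments)} source paragraphs "
            f"vs {len(target_segments)} translated paragraphs"
        )
    return dict(zip(source_segments, target_segments))


english = "Hello.\n\nGoodbye."
french = "Bonjour.\n\nAu revoir."
print(build_memory(english, french))
```

Refusing to align when the paragraph counts differ is a deliberate choice here: a silent off-by-one would fill the memory with wrong pairs.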
Q: How are links handled?
- If the links change between translations, you can have the TM ignore links
- You can mark such elements with special tags
- Quotes are a similar case: some languages set them in italic or bold
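One way to have the TM ignore links is to swap them for numbered placeholder tags before matching and put the real links back afterwards. The sketch below assumes links are bare `http://` or `https://` URLs and uses a made-up `<linkN/>` tag format. Both are illustrative choices, not a convention of any particular tool.

```python
import re

# Assumption: a link is anything from "http(s)://" up to the next whitespace.
# Trailing punctuation would be swallowed too; a real tool needs more care.
URL_PATTERN = re.compile(r"https?://\S+")


def mask_links(segment):
    """Replace each URL with <link1/>, <link2/>, ... and return the masked
    segment together with the list of removed URLs."""
    links = []

    def replace(match):
        links.append(match.group(0))
        return f"<link{len(links)}/>"

    return URL_PATTERN.sub(replace, segment), links


def restore_links(segment, links):
    """Put URLs back into a (translated) segment containing placeholders."""
    for index, link in enumerate(links, start=1):
        segment = segment.replace(f"<link{index}/>", link)
    return segment


masked, links = mask_links("See http://example.com/v2 for details")
print(masked)
print(restore_links("Voir <link1/> pour les détails", links))
```

Two segments that differ only in their URLs now look identical to the matcher, so the stored translation can be reused with the new link filled in.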
Q: How much matching do you get with your TM?
- Could get to 20-25% for new projects
- Matching whole paragraphs gives much lower percentages
- 10% is very good for paragraphs etc.
Q: Other algorithms?
- If 5 words have moved within a sentence, the traditional algorithms perform badly
- People have extended the algorithms to "catch" such situations
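As an illustration of the problem, the sketch below scores two sentences by the words they share, ignoring order (a Dice coefficient over word counts). This is just one simple order-insensitive measure chosen for the example, not necessarily what the extended algorithms mentioned above do. The name `word_overlap` is made up.

```python
from collections import Counter


def word_overlap(a: str, b: str) -> float:
    """Dice coefficient over word counts: 1.0 means the same words
    (in any order), 0.0 means no words in common."""
    words_a = Counter(a.lower().split())
    words_b = Counter(b.lower().split())
    total = sum(words_a.values()) + sum(words_b.values())
    if total == 0:
        return 1.0
    # Counter & Counter keeps the minimum count of each shared word.
    shared = sum((words_a & words_b).values())
    return 2 * shared / total


original = "click the button to save the file"
reordered = "to save the file click the button"
print(word_overlap(original, reordered))
```

A character-level comparison sees the reordered sentence as heavily edited, while this measure treats it as a full match. The price is that it cannot tell apart sentences whose meaning depends on word order, so it is better used alongside a character-level score than instead of one.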
Q: Any good open source implementations?
- Levenshtein: the C implementation is fast, the Python one is slow
- Almost all open source translation interfaces have a TM built in
- They are pretty simple, though
- Commercial ones are probably not much better, but more integrated
- Some libraries have full-text search (e.g. with SQLite)
