Translation memory

Q: What is a TM?

  • A translation memory (TM) is a store of all the translations you have done, kept so that you can reuse them.

Q: How do you apply the TM? Are the suggestions useful?

  • Besides simply searching a TM, there are various algorithms that use it to help produce rough translations
  • Basic usage: compare two sentences (the one to be translated and one already translated) and count how many characters differ between them (additions + deletions).
  • The match is made on the original (e.g. English) text, and the stored translation is filled in automatically
  • The similarity threshold can be tuned (trust a 10% match or only a 90% one). Some people work only with 100% matches because that is more efficient for them.
  • Doesn't work that well for short sentences.
  • Isn't considered "machine translation"
  • Useful for "rough" translations
  • Efficient for technical terms, less so for documentation and freely written text
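
The basic usage above can be sketched roughly as follows. This is only an illustration, not how any particular tool works: the names `edit_distance`, `similarity` and `best_match`, the sample memory and the 0.75 threshold are all assumptions. The distance used is the standard Levenshtein distance, which counts substitutions as well as additions and deletions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to the range 0.0..1.0 (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def best_match(source, memory, threshold=0.75):
    """Return (translation, score) for the closest stored source text,
    or None if no entry reaches the threshold."""
    best = None
    for stored_source, translation in memory.items():
        score = similarity(source, stored_source)
        if score >= threshold and (best is None or score > best[1]):
            best = (translation, score)
    return best


# Hypothetical memory: English source -> Spanish translation.
memory = {
    "Open the file": "Abrir el archivo",
    "Save the file": "Guardar el archivo",
}

print(best_match("Open the files", memory))  # close to a stored entry
print(best_match("Quit", memory))            # nothing similar enough
```

Raising `threshold` to 1.0 gives the "100% matches only" way of working mentioned above.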

Q: How useful are TMs?

  • For free text, TMs are sometimes personal: each translator or project might have a different idea of how something should be translated.
  • One-word matches sometimes cause problems: is the word "click" in this sentence a noun or a verb?
  • Pre-populate a TM from a big text. When a new version of the text comes in, the TM will translate most of it (3rd use)
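
The pre-population idea can be illustrated with exact (100%) matches only. In this sketch the function name `pretranslate`, the segment list and the sample Spanish strings are all made up for illustration; real tools also split the text into segments themselves and apply fuzzy matching.

```python
def pretranslate(segments, memory):
    """Translate every segment that has an exact match in the memory.

    Returns (results, hit_rate). results pairs each source segment with
    its stored translation, or with None when a translator is still needed.
    """
    results = [(segment, memory.get(segment)) for segment in segments]
    hits = sum(1 for _, translation in results if translation is not None)
    hit_rate = hits / len(segments) if segments else 0.0
    return results, hit_rate


# Memory filled while translating version 1 of a (hypothetical) manual.
memory = {
    "Install the package.": "Instale el paquete.",
    "Restart the service.": "Reinicie el servicio.",
    "Check the log file.": "Revise el archivo de registro.",
}

# Version 2 keeps two sentences unchanged and rewords the third.
version_2 = [
    "Install the package.",
    "Restart the service.",
    "Check the new log file.",
]

results, hit_rate = pretranslate(version_2, memory)
for source, translation in results:
    print(source, "->", translation or "(needs translation)")
print(f"{hit_rate:.0%} of segments reused")
```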

Q: What is OmegaT?

  • A translation tool that has a TM built in; it is not a TM itself

Q: Can you use an existing TM?

  • Some companies let you upload a TM (e.g. Wordfast)
  • Not many professionals share their TMs
  • Open source projects don't care about sharing their TMs
  • If you have existing files, you align the sources (same strings at the same positions/paragraphs) and then do a 1-to-1 match to create the memory.
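
A minimal sketch of the alignment step, assuming the source and translated files have already been split into segments that line up one-to-one. The function name `build_memory` and the sample data are hypothetical; real aligners also have to cope with merged, split or reordered segments.

```python
def build_memory(source_segments, target_segments):
    """Pair segments by position to create a translation memory.

    Refuses to guess when the two files do not have the same number of
    segments, because a purely positional match would then be wrong.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError(
            f"cannot align: {len(source_segments)} source segments "
            f"vs {len(target_segments)} target segments"
        )
    memory = {}
    for source, target in zip(source_segments, target_segments):
        source, target = source.strip(), target.strip()
        if source and target:  # skip pairs where either side is empty
            memory[source] = target
    return memory


# Hypothetical segments read from an English file and its Spanish translation.
english = ["File", "Open the file", "", "Save the file"]
spanish = ["Archivo", "Abrir el archivo", "", "Guardar el archivo"]

memory = build_memory(english, spanish)
print(memory["Open the file"])
```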

Q: How are links handled?

  • If the links change between translations, you can have the TM ignore links
  • You can mark such elements with special tags
  • Quotes are similar: some languages set them in italic or bold
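
One way to picture the tagging approach is to swap each link for a numbered placeholder before matching and to put the current link back afterwards. The tag format `<link1/>`, the function names and the deliberately simple URL pattern below are assumptions for illustration, not the convention of any specific tool.

```python
import re

# Deliberately simple URL pattern for illustration; not a full URL grammar.
URL_PATTERN = re.compile(r"https?://\S+")


def mask_links(text):
    """Replace each URL with a numbered tag; return (masked_text, urls)."""
    urls = []

    def replace(match):
        urls.append(match.group(0))
        return f"<link{len(urls)}/>"

    return URL_PATTERN.sub(replace, text), urls


def unmask_links(text, urls):
    """Put the given URLs back in place of the numbered tags."""
    for number, url in enumerate(urls, start=1):
        text = text.replace(f"<link{number}/>", url)
    return text


old = "See http://example.org/v1/manual for details"
new = "See http://example.org/v2/manual for details"

masked_old, _ = mask_links(old)
masked_new, new_urls = mask_links(new)
print(masked_old == masked_new)  # with links masked, the segments are identical

# Reuse the stored translation of the masked segment with the new link.
stored_translation = "Consulte <link1/> para obtener detalles"
print(unmask_links(stored_translation, new_urls))
```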

Q: How much matching do you get with your TM?

  • Could get to 20-25% for new projects
  • Matching whole paragraphs gives much lower percentages
  • 10% is very good for paragraphs etc.

Q: Other algorithms?

  • String metrics
  • If 5 words are moved within a sentence, traditional algorithms perform badly
  • People have extended the algorithms to "catch" such situations
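
One simple way to catch moved words, shown here only as an illustration and not as the specific extension meant above, is to compare sentences as bags of words so that word order no longer matters. The function names and sample sentences are assumptions. The character-level score uses `difflib.SequenceMatcher` from the Python standard library, which is a related but different measure from Levenshtein distance.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Order-sensitive, character-level similarity in 0.0..1.0."""
    return SequenceMatcher(None, a, b).ratio()


def bag_of_words_similarity(a, b):
    """Order-insensitive similarity in 0.0..1.0.

    Counts the words the two sentences share (as multisets) and divides
    by the total number of words in both (a Dice coefficient).
    """
    words_a = Counter(a.lower().split())
    words_b = Counter(b.lower().split())
    total = sum(words_a.values()) + sum(words_b.values())
    if total == 0:
        return 1.0
    shared = sum((words_a & words_b).values())
    return 2 * shared / total


original = "click the button to save the file"
moved = "to save the file click the button"

# The character-level score is pulled down by the reordering,
# while the bag-of-words score treats the sentences as the same.
print(round(char_similarity(original, moved), 2))
print(bag_of_words_similarity(original, moved))
```

The trade-off is that a pure bag-of-words score ignores order completely, so it also matches sentences whose meaning changed with the word order.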

Q: Any good open source implementations?

  • Levenshtein: the C implementation is fast, the Python one is slow
  • Almost all open source translation interfaces have a TM built in
  • They are pretty simple, though
  • Commercial ones are probably not much better, but more integrated
  • Some libraries have full-text search (e.g. with SQLite)
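
As a rough idea of what an SQLite-backed memory could look like, here is a sketch using Python's standard `sqlite3` module. The table name `units`, its columns and the helper functions are assumptions, and the lookup is a plain `LIKE` substring query used as a stand-in: it is not real full-text search, for which SQLite's FTS extensions would be the route where they are compiled in.

```python
import sqlite3


def create_memory(connection):
    """Create the (hypothetical) table holding source/target pairs."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS units ("
        "source TEXT PRIMARY KEY, target TEXT NOT NULL)"
    )


def add_unit(connection, source, target):
    """Store a translation; a later entry for the same source replaces it."""
    connection.execute(
        "INSERT OR REPLACE INTO units (source, target) VALUES (?, ?)",
        (source, target),
    )


def search(connection, term):
    """Return (source, target) rows whose source text contains term."""
    # Escape LIKE wildcards so the term is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = connection.execute(
        "SELECT source, target FROM units "
        "WHERE source LIKE ? ESCAPE '\\' ORDER BY source",
        (f"%{escaped}%",),
    )
    return rows.fetchall()


connection = sqlite3.connect(":memory:")
create_memory(connection)
add_unit(connection, "Open the file", "Abrir el archivo")
add_unit(connection, "Save the file", "Guardar el archivo")
add_unit(connection, "Quit", "Salir")

print(search(connection, "file"))
```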