TMX Files and Format

This is an overview of what a TMX file is and its basic format

Written by Christos Balafoutis
Updated over a week ago

TMX, or Translation Memory eXchange, is a standard file format used in the translation industry to store and exchange translation memories. A translation memory is a database that pairs source-language phrases and sentences with their existing translations, so that they can be reused to assist in translating new documents.

TMX files are XML-based, which means they are structured using tags and attributes. A TMX file has a single <tmx> root element that contains one header section (<header>) followed by one body section (<body>), and the body contains the translation units (<tu> elements).
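This top-level layout can be checked with a short script. The following is a minimal sketch, assuming Python 3.9+ and its standard xml.etree.ElementTree module. The function name check_structure is hypothetical, and the check covers only the layout described here, not full validation against the TMX specification.

```python
import xml.etree.ElementTree as ET


def check_structure(tmx_text: str) -> list[str]:
    """Return a list of layout problems; an empty list means the layout looks right."""
    root = ET.fromstring(tmx_text)
    problems = []
    if root.tag != "tmx":
        problems.append(f"root element is <{root.tag}>, expected <tmx>")
    headers = root.findall("header")
    bodies = root.findall("body")
    if len(headers) != 1:
        problems.append(f"expected exactly one <header>, found {len(headers)}")
    if len(bodies) != 1:
        problems.append(f"expected exactly one <body>, found {len(bodies)}")
    if headers and bodies:
        tags = [child.tag for child in root]
        # The header has to come before the body
        if tags.index("header") > tags.index("body"):
            problems.append("<header> must come before <body>")
        # The body should hold only translation units
        for child in bodies[0]:
            if child.tag != "tu":
                problems.append(f"unexpected <{child.tag}> inside <body>")
    return problems


print(check_structure('<tmx version="1.4"><header srclang="en-US"/><body><tu/></body></tmx>'))  # → []
print(check_structure('<tmx version="1.4"><body/></tmx>'))  # → ['expected exactly one <header>, found 0']
```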

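The header section describes the translation memory as a whole. Its attributes include the source language (srclang), the tool and tool version that created the file (creationtool, creationtoolversion), the segmentation level (segtype), the original translation memory format (o-tmf), the administrative language (adminlang), and the data type (datatype). It can also carry metadata about the file itself, such as its creation date and the date and author of its last change. Target languages are not declared in the header; each translation declares its own language, as described below.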
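As a minimal sketch, assuming Python 3.9+ and the standard xml.etree.ElementTree module, the header can be read as a dictionary of attributes and compared against the attributes that TMX 1.4 marks as required. The names read_header and REQUIRED_HEADER_ATTRS are hypothetical, and this check does not reflect how Transifex itself validates imported files.

```python
import xml.etree.ElementTree as ET

# Attributes that TMX 1.4 lists as required on <header>
REQUIRED_HEADER_ATTRS = (
    "creationtool", "creationtoolversion", "segtype",
    "o-tmf", "adminlang", "srclang", "datatype",
)


def read_header(tmx_text: str) -> tuple[dict[str, str], list[str]]:
    """Return the header's attributes and the names of any required ones that are missing."""
    header = ET.fromstring(tmx_text).find("header")
    attrs = dict(header.attrib) if header is not None else {}
    missing = [name for name in REQUIRED_HEADER_ATTRS if name not in attrs]
    return attrs, missing


attrs, missing = read_header('<tmx version="1.4"><header srclang="en-US" adminlang="en-US"/><body/></tmx>')
print(attrs["srclang"])  # → en-US
print(missing)  # → ['creationtool', 'creationtoolversion', 'segtype', 'o-tmf', 'datatype']
```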

The body of a TMX file contains the translation units (<tu> elements). Each translation unit represents a single phrase or sentence or, for a pluralized string, a single plural form. It holds one translation unit variant (<tuv>) per language, so it includes the source text along with each of its translations. A translation unit can also carry additional information, such as properties (<prop>) describing the context in which the translation was used and notes (<note>) left by the translator.
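A minimal sketch of reading this structure, assuming Python 3.9+ and xml.etree.ElementTree: for each <tu>, it collects the <prop> and <note> elements and maps each <tuv>'s xml:lang value to its segment text. The function read_units and the sample note text are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_units(tmx_text: str) -> list[dict]:
    """Summarize each <tu>: its <prop> values, its <note> texts, and one segment per language."""
    body = ET.fromstring(tmx_text).find("body")
    units = []
    for tu in body.findall("tu"):
        units.append({
            "props": {prop.get("type"): prop.text for prop in tu.findall("prop")},
            "notes": [note.text for note in tu.findall("note")],
            # itertext() also collects text inside inline markup within <seg>
            "variants": {
                tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
                for tuv in tu.findall("tuv")
            },
        })
    return units


sample = """<tmx version="1.4"><header srclang="en-US"/><body>
<tu>
  <note>Greeting on the home page</note>
  <tuv xml:lang="en-US"><seg>Hello, how are you?</seg></tuv>
  <tuv xml:lang="fr-FR"><seg>Bonjour, comment vas-tu?</seg></tuv>
</tu>
</body></tmx>"""
print(read_units(sample))
```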

Beyond this basic structure, the TMX format defines further elements and attributes, such as notes, properties that record context information, and inline markup for formatting tags and placeholders inside segments, which can provide additional detail about the translations in the file.

Here is an example of the basic structure of a TMX file:

<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header
    creationtool="MyTranslationTool"
    creationtoolversion="1.0"
    segtype="sentence"
    o-tmf="tmx"
    adminlang="en-US"
    srclang="en-US"
    datatype="plaintext"
  />
  <body>
    <tu>
      <tuv xml:lang="en-US" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>Hello, how are you?</seg>
      </tuv>
      <tuv xml:lang="fr-FR" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>Bonjour, comment vas-tu?</seg>
      </tuv>
      <tuv xml:lang="es-ES" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>Hola, ¿cómo estás?</seg>
      </tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>I am fine, thank you.</seg>
      </tuv>
      <tuv xml:lang="fr-FR" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>Je vais bien, merci.</seg>
      </tuv>
      <tuv xml:lang="es-ES" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>Estoy bien, gracias.</seg>
      </tuv>
    </tu>
    <tu>
      <prop type="x-Plural-Source-Group">src-plural-id</prop>
      <prop type="x-Plural-Rule">one</prop>
      <tuv xml:lang="en-US" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>{count} file</seg>
      </tuv>
      <tuv xml:lang="fr-FR" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <prop type="x-Plural-Translation-Group">tr-plural-id-1</prop>
        <seg>{count} fichier</seg>
      </tuv>
      <tuv xml:lang="es-ES" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <prop type="x-Plural-Translation-Group">tr-plural-id-2</prop>
        <seg>{count} archivo</seg>
      </tuv>
    </tu>
    <tu>
      <prop type="x-Plural-Source-Group">src-plural-id</prop>
      <prop type="x-Plural-Rule">other</prop>
      <tuv xml:lang="en-US" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <seg>{count} files</seg>
      </tuv>
      <tuv xml:lang="fr-FR" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <prop type="x-Plural-Translation-Group">tr-plural-id-1</prop>
        <seg>{count} fichiers</seg>
      </tuv>
      <tuv xml:lang="es-ES" creationdate="20240212T191726" lastusagedate="20240212T191726">
        <prop type="x-Plural-Translation-Group">tr-plural-id-2</prop>
        <seg>{count} archivos</seg>
      </tuv>
    </tu>
  </body>
</tmx>

In this example, the TMX file contains four translation units (TUs), each containing three translation unit variants (TUVs): English, French, and Spanish. The last two translation units are the two plural forms of a single pluralized string, so they carry extra properties (<prop> elements): the source group identifier (x-Plural-Source-Group) and the plural rule (x-Plural-Rule) that the unit represents.

Each TUV contains a single segment (<seg>): the English TUV holds the source text, and the French and Spanish TUVs hold its translations. In a pluralized unit, each translation TUV also contains a translation group identifier (x-Plural-Translation-Group). These identifiers tie together all the plural forms of the same string within one language: x-Plural-Source-Group links the source-language forms, and x-Plural-Translation-Group links the forms in a given translation language.
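The grouping can be sketched in Python as follows, assuming Python 3.9+, xml.etree.ElementTree, and the property names used in the example above (x-Plural-Source-Group, x-Plural-Rule). The sketch keys each plural form by source group, language, and plural rule, and does not read x-Plural-Translation-Group. It illustrates the grouping idea only and is not how Transifex itself processes these properties; the function name group_plurals is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def group_plurals(tmx_text: str) -> dict[str, dict[str, dict[str, str]]]:
    """Map each x-Plural-Source-Group id to {language: {plural rule: segment text}}."""
    body = ET.fromstring(tmx_text).find("body")
    groups = defaultdict(lambda: defaultdict(dict))
    for tu in body.findall("tu"):
        # Only the <prop> children of <tu> itself; props inside <tuv> are not read here
        props = {prop.get("type"): prop.text for prop in tu.findall("prop")}
        group_id = props.get("x-Plural-Source-Group")
        rule = props.get("x-Plural-Rule")
        if group_id is None or rule is None:
            continue  # not a pluralized unit
        for tuv in tu.findall("tuv"):
            groups[group_id][tuv.get(XML_LANG)][rule] = "".join(tuv.find("seg").itertext())
    # Turn the nested defaultdicts into plain dictionaries
    return {gid: {lang: dict(forms) for lang, forms in langs.items()}
            for gid, langs in groups.items()}
```

Applied to the example file, this would collect the "one" and "other" forms of src-plural-id for each of en-US, fr-FR, and es-ES, while skipping the first two, non-plural, units.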

The srclang attribute in the header element specifies the source language of the translation memory (in this case, en-US), and the xml:lang attribute on each tuv element specifies the language of that variant's segment.
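A minimal sketch of how the two attributes work together, assuming Python 3.9+ and xml.etree.ElementTree: srclang picks out the source variant of each unit, and xml:lang picks out the requested target. The function name translation_pairs is hypothetical, and the sketch does not handle the special srclang value *all*, which TMX allows for memories without a fixed source language.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def translation_pairs(tmx_text: str, target_lang: str) -> list[tuple[str, str]]:
    """Return (source, target) segment pairs, taking the source language from <header srclang>."""
    root = ET.fromstring(tmx_text)
    # Language tags are compared case-insensitively (en-US matches en-us)
    srclang = root.find("header").get("srclang", "").lower()
    target = target_lang.lower()
    pairs = []
    for tu in root.find("body").findall("tu"):
        segments = {
            tuv.get(XML_LANG, "").lower(): "".join(tuv.find("seg").itertext())
            for tuv in tu.findall("tuv")
        }
        # Skip units that lack either the source or the requested target language
        if srclang in segments and target in segments:
            pairs.append((segments[srclang], segments[target]))
    return pairs
```

For the first example file, translation_pairs(text, "fr-FR") would pair each English segment with its French counterpart.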

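The example above is a very basic TMX file. TMX entries can also record when they were created and last used: the creationdate and lastusagedate attributes on each <tuv> element hold the date the entry was created and the date it was last used, and a creationdate attribute on the header records when the file itself was created. In the following example, these dates differ between entries.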

Example:

<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header
    adminlang="en-US"
    srclang="en-US"
    creationtool="MyTranslationTool"
    creationtoolversion="1.0"
    creationdate="20240212T141145Z"
    datatype="plaintext"
    segtype="sentence"
    o-tmf="tmx"
  />
  <body>
    <tu>
      <tuv xml:lang="en-US" creationdate="20240209T233730" lastusagedate="20240209T233730">
        <seg>Hello, how are you?</seg>
      </tuv>
      <tuv xml:lang="fr-FR" creationdate="20240209T233730" lastusagedate="20240209T233730">
        <seg>Bonjour, comment vas-tu?</seg>
      </tuv>
      <tuv xml:lang="es-ES" creationdate="20240209T233730" lastusagedate="20240209T233730">
        <seg>Hola, ¿cómo estás?</seg>
      </tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US" creationdate="20231218T185622" lastusagedate="20231218T185622">
        <seg>I am fine, thank you.</seg>
      </tuv>
      <tuv xml:lang="fr-FR" creationdate="20231218T185622" lastusagedate="20231218T185622">
        <seg>Je vais bien, merci.</seg>
      </tuv>
      <tuv xml:lang="es-ES" creationdate="20231218T185622" lastusagedate="20231218T185622">
        <seg>Estoy bien, gracias.</seg>
      </tuv>
    </tu>
  </body>
</tmx>
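The TMX specification recommends writing dates in the compact ISO 8601 pattern YYYYMMDDThhmmssZ, where the trailing Z marks Universal Time. The header date in this example follows that pattern, while the tuv dates omit the Z. A minimal Python sketch that reads both forms follows; the function name parse_tmx_date is hypothetical, and treating a date without Z as having no known time zone is an assumption of this sketch.

```python
from datetime import datetime, timezone


def parse_tmx_date(value: str) -> datetime:
    """Parse a TMX date such as 20240212T141145Z or 20240209T233730."""
    if value.endswith("Z"):
        # A trailing Z marks UTC, so return a timezone-aware datetime
        return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    # No Z: the time zone is not stated, so return a naive datetime
    return datetime.strptime(value, "%Y%m%dT%H%M%S")


print(parse_tmx_date("20240212T141145Z").isoformat())  # → 2024-02-12T14:11:45+00:00
print(parse_tmx_date("20231218T185622") < parse_tmx_date("20240209T233730"))  # → True
```

Python refuses to compare a timezone-aware datetime with a naive one (it raises TypeError), so a tool that mixes both forms has to decide how to interpret dates without a Z before comparing them.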

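⚠️Warning: When splitting a TMX file into smaller parts, or when transferring a TMX file from another CAT tool to Transifex, make sure that every resulting file keeps the TMX format described above: a <tmx> root element containing a <header> and a <body>, with every <tu> element kept whole.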
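A minimal sketch of a split that keeps each part a complete TMX file, assuming Python 3.9+ and xml.etree.ElementTree. It splits on <tu> boundaries by unit count rather than by byte size, and gives every part its own copy of the original <tmx> attributes and <header>. The function name split_tmx and the units_per_part parameter are hypothetical; this is not a Transifex tool.

```python
import copy
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def split_tmx(tmx_text: str, units_per_part: int) -> list[str]:
    """Split a TMX document into complete TMX documents of at most units_per_part <tu> each."""
    if units_per_part < 1:
        raise ValueError("units_per_part must be at least 1")
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    units = root.find("body").findall("tu")
    parts = []
    for start in range(0, len(units), units_per_part):
        # Every part gets its own <tmx> root with the same attributes (e.g. version="1.4")
        part_root = ET.Element("tmx", dict(root.attrib))
        # ...and its own copy of the original header
        part_root.append(copy.deepcopy(header))
        body = ET.SubElement(part_root, "body")
        # Whole <tu> elements are copied, so no unit is cut in half
        for tu in units[start:start + units_per_part]:
            body.append(copy.deepcopy(tu))
        parts.append(XML_DECLARATION + ET.tostring(part_root, encoding="unicode"))
    return parts
```

Because the parts are re-serialized, whitespace and attribute quoting may differ from the original file. The sketch also loads the whole file into memory; for files close to the size limits below, a streaming parser such as ET.iterparse would be a better fit.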

📝 Note

  • The maximum .tmx file size you can export from Transifex is 500MB.

  • The maximum .tmx file size you can import into Transifex is 200MB.

Overall, TMX files provide a standard, structured format for storing and exchanging translation memories, enabling translators and translation tools to easily access and use previously translated content to assist in translating new documents.


💡Tip

Looking for more help? Get support from our Transifex Community Forum!

Find answers or post to get help from Transifex Support and our Community.
