HTML

Information about how Transifex handles the HTML (.html, .xhtml) file format.

Written by Ryan
Updated over a week ago

File Extension(s)

.html, .xhtml

i18n type(s)

HTML, HTML_FRAGMENT, XHTML


HTML

Any HTML document or part of one can be uploaded to Transifex.

Since HTML is not a proper i18n file format, translating offline often causes alignment issues, so HTML documents should be translated with the web editor Transifex provides. You can still translate offline by downloading the file for translation, in which case Transifex inserts extra information that helps it map the content of the translated document back to the original segments.

If a segment has not been translated, the corresponding source string will be used instead so that the resulting document will be complete, even though partially translated.
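The fallback rule can be pictured with a minimal Python sketch. The function name and the dictionary of translations are hypothetical and only illustrate the rule described above; this is not Transifex's implementation.

```python
def build_output_segments(source_segments, translations):
    """Return the segments of the output document, in source order.

    `translations` maps a source string to its translation. A string that
    is missing from the mapping, or mapped to an empty string, is treated
    as untranslated and falls back to the source text.
    """
    return [translations.get(segment) or segment for segment in source_segments]


source = ["Welcome", "Register", "Contact us"]
translated = {"Welcome": "Bienvenue", "Register": ""}
print(build_output_segments(source, translated))
```

Here only "Welcome" has a translation, so the other two segments appear in the source language and the document stays complete.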

Context in HTML

To provide context to your translators and make this information available in the Transifex Web Editor, you can use the tx-context attribute:

```
<div tx-context="homepage">
<div tx-context="button">Register</div>
</div>
```

This information is displayed under the context tab in the editor.


HTML_FRAGMENT

The new HTML_FRAGMENT file format detects and extracts translatable content from any HTML file. Content is classified as translatable or non-translatable according to its type or position in the document. The format does not validate the HTML for correct formatting or element structure, so it tolerates missing or malformed elements. It will extract content from any given file as long as the file uses correct HTML punctuation. Cases where the format will fail include a missing '<' or '>', or nested tags such as "<h1 <h2>>".

📝 Note: The HTML Fragment parser behaves the same way as the HTML parser with these exceptions:

  1. It can handle HTML documents with malformed HTML syntax.

  2. It preserves all whitespace and other layout characters present in the original input file.

Inline tags are considered to be text, meaning they are parsed to create HTML translatable entities. For example, the following HTML code will be considered a single translatable string:

This is an interestingly <b> bold </b> text.

The complete inline element list is as follows:

'b', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'cite', 'code', 'dfn', 'em', 'kbd', 'strong', 'samp', 'var', 'a', 'bdo', 'map', 'object', 'q', 'span', 'sub', 'sup', 'button', 'label', 'select', 'textarea', 'snippet', 'u', 'img', 'input', 'br'.
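To see how this rule shapes the resulting strings, here is a rough Python sketch built on the standard library's html.parser. It is not the Transifex parser: the class and function names are made up for illustration, and the only behavior taken from this article is that inline tags stay inside the string while any other tag ends it.

```python
from html.parser import HTMLParser

# Inline elements listed in this article; any other tag ends the current string.
INLINE_TAGS = {
    "b", "big", "i", "small", "tt", "abbr", "acronym", "cite", "code", "dfn",
    "em", "kbd", "strong", "samp", "var", "a", "bdo", "map", "object", "q",
    "span", "sub", "sup", "button", "label", "select", "textarea", "snippet",
    "u", "img", "input", "br",
}


class SegmentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []

    def _flush(self):
        # Collapse runs of whitespace and store the string if anything is left.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text:
            self.segments.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            self._buffer.append(self.get_starttag_text())  # keep the tag as text
        else:
            self._flush()

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> get no separate closing tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            self._buffer.append(f"</{tag}>")
        else:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)


def extract_segments(html):
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # text left over after the last tag
    return parser.segments


print(extract_segments(
    "<div><p>This is an interestingly <b> bold </b> text.</p><p>Second</p></div>"
))
```

The first paragraph comes out as one string that still contains the <b> tags, while the second paragraph becomes a separate string.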

The fragment HTML format does recognize content for translation in HTML attributes. The attribute content that is recognized for translation includes only the values for the listed attributes:

  • alt

  • label

  • placeholder

  • title
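As an illustration of which attribute values would be picked up, the following Python sketch collects the four listed attributes with the standard library's html.parser. The names AttributeCollector and collect_translatable_attributes are hypothetical, and the sketch makes no claim about how Transifex implements the extraction.

```python
from html.parser import HTMLParser

# The attributes this article lists as translatable in the fragment format.
TRANSLATABLE_ATTRIBUTES = ("alt", "label", "placeholder", "title")


class AttributeCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []  # (tag, attribute name, attribute value)

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        for name, value in attrs:
            if name in TRANSLATABLE_ATTRIBUTES and value:
                self.found.append((tag, name, value))


def collect_translatable_attributes(html):
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.found


sample = (
    '<img src="macaque.jpg" alt="A monkey" title="Macaque in the trees">'
    '<input type="text" name="user" placeholder="Your name"/>'
)
for tag, name, value in collect_translatable_attributes(sample):
    print(f"<{tag}> {name} = {value!r}")
```

Attributes outside the list, such as type and name in the sample, are skipped.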

For the end user, the parsing and translation flow remains the same except for the "Download for translation" option. That step creates a custom HTML file to enable instant, inline HTML translations. The file has the same content as the original, but its layout includes a few extra details that allow more complex translation workflows in the future.

Example:

Macaque in the trees

During export, Transifex adds internally generated hashes to each item so that translated content can be mapped correctly once it is uploaded back into Transifex. For the above example, the for-translation file is structured as follows:

Macaque in the trees

The only tokens a translator should translate are the Text and Attribute Value tokens. Translatable Text tokens are surrounded by a special tx tag:

<tx se_hash="source_hash"> .... </tx>

Translatable attribute values are prefixed by a custom se_hash="source_hash" attribute. In this example, you will see the Transifex custom prefix only on the title and src attributes.

Translators should update the values of only the prefixed attributes or the surrounded text objects. Any other update may introduce a broken translation to the related resource. Here's an example of how the HTML content inside the for-translation file looks:

Macaque in the trees

Here is how it should look once translations have been filled in:

Macaque in the trees

XHTML

The XHTML support Transifex provides differs in two ways from the support for HTML:

  • You cannot upload only parts of an XHTML document.

  • The file must be a valid XHTML document.

For these reasons, we recommend always using the HTML file format.

💡 Tip: Big paragraphs in HTML files result in long source strings, which are hard for translators to work with. We recommend breaking them up into smaller ones by adding line breaks whenever possible.


Detected content

When loading a .html or .xhtml file, you'll notice that anything shown to the user will be detected and made available for translation. This includes items like:

  • Content of block-level elements

  • Content appearing inside table cells

  • Attributes such as alt, label, placeholder, title

  • Content inside <a> tags and their href attribute

  • The src attribute of images

Some of this content might be formatted as HTML, which might come as a surprise. However, in some cases, this is necessary. Some of the images you're showing your users might need localization (e.g., a screenshot), so their src must be translatable. Some links might need localization since you might want to point your users to the appropriate URL of a localized page. In most cases, Transifex will present the element to the user so they can translate only what they should be translating, avoiding the risk of breaking the HTML.


If you would like such strings to be excluded from the translation process, you can instruct Transifex to "lock" them and block translators from working on them (or accidentally breaking them) by using Smart tags.


Managing HTML translation files

When you upload an HTML translation file, the parser goes through each HTML element it identifies and matches its content to the source strings, following the order in which they appear in the source HTML file. Differences in the structure of the HTML elements between the two files will prevent the translation file from being uploaded.

Please ensure that the structure and content appear in the same order in the source and translation files. Any difference in order will map the translations to different source strings.

In the following example, the translation file contains one element fewer than the source file, in the middle of the HTML. This results in translations being mapped to the wrong source strings.

```html
<!-- Source file structure & content -->
...
<p>Some block of text</p>
<p>&nbsp;</p>
<p>Another block of text</p>
<p>Third block of text</p>
...
<!-- End of Source file -->

<!-- Uploaded translation file structure & content -->
...
<p>Translation for "Some block of text"</p>
<!-- missing empty paragraph element -->
<p>Translation for "Another block of text"</p>
<p>Translation for "Third block of text"</p>
...
<!-- End of translation file -->
```

In this example, the mapping of translations to strings will be:

  • The translation for the string Some block of text is Translation for "Some block of text"

  • The translation for the string &nbsp; (the empty paragraph) is Translation for "Another block of text"

  • The translation for the string Another block of text is Translation for "Third block of text"

  • The translation for the string Third block of text will be the next element in the HTML translation file.

To avoid this, please double-check the HTML translation files before uploading them to Transifex.
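One way to double-check a pair of files before uploading is to compare their element sequences. The Python sketch below is a hypothetical helper, not a Transifex tool: it assumes that comparing start tags in document order is a good enough proxy for "same structure", and it reports only the first difference it finds.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Records (line, tag) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        line, _column = self.getpos()
        self.tags.append((line, tag))


def tag_sequence(html):
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def first_structure_mismatch(source_html, translation_html):
    """Return a description of the first difference, or None if none is found."""
    source_tags = tag_sequence(source_html)
    translation_tags = tag_sequence(translation_html)
    for (s_line, s_tag), (t_line, t_tag) in zip(source_tags, translation_tags):
        if s_tag != t_tag:
            return (f"source line {s_line} has <{s_tag}>, "
                    f"translation line {t_line} has <{t_tag}>")
    if len(source_tags) != len(translation_tags):
        return (f"source has {len(source_tags)} elements, "
                f"translation has {len(translation_tags)}")
    return None


source = "<p>Some block of text</p>\n<p>&nbsp;</p>\n<p>Another block of text</p>"
translation = "<p>Translation 1</p>\n<p>Translation 2</p>"
print(first_structure_mismatch(source, translation))
```

When every element uses the same tag, as in the example above, only the element count reveals the missing paragraph, so the message cannot say where it was dropped.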

Handling duplicate content

The HTML format ignores duplicate content entries when you upload an HTML file resource. You will see only a single entry in your resource content, and its translation is used for all other instances of that string. Duplicate content is defined as identical text contained in two different HTML elements. In the example below, the resource will have one entry for "Dealing with duplicates" and one for "foo":

```html
<a href="#duplicates" target="_self">Dealing with duplicates</a>
<ul>
<li>foo</li>
<li>bar</li>
</ul>

<!-- here the anchor link from above and the text of the h3 element are the same. -->
<h3><a name="duplicates">Dealing with duplicates</a></h3>

<!-- in the following list `foo` is duplicate -->
<ul>
<li>foo</li>
<li>fooBar</li>
</ul>
```
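The effect on the resource can be sketched in a few lines of Python. The list of extracted strings is written out by hand here, and unique_entries is a hypothetical helper that only mimics the outcome described above.

```python
def unique_entries(strings):
    """Keep the first occurrence of each string, preserving document order."""
    return list(dict.fromkeys(strings))


# Text of each element in the example, in document order.
extracted = [
    "Dealing with duplicates",
    "foo",
    "bar",
    "Dealing with duplicates",
    "foo",
    "fooBar",
]
print(unique_entries(extracted))
```

Six pieces of text in the file become four entries in the resource, because "Dealing with duplicates" and "foo" each appear twice.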

If you want Transifex to treat duplicate strings as separate source entries instead of ignoring them during source file upload (for example, because you want to translate these identical strings differently), you can do that using our API. Specifically, you can create your HTML resource through the API, setting the option "allow_duplicate_strings" to true.

Uploading an HTML translation file that contains duplicates requires additional markup in the HTML to identify the duplicate entries. If the duplicate entries are not identified, parsing the file will raise an error.

To address that, add the data-tx-separate attribute to the elements that contain the duplicated text. Find every occurrence of the duplicated text except the first, and add the data-tx-separate attribute to each element's tag. For the example shared above, the final code should look like this:

```html
<!-- At the element of each second instance of each duplicate text, add the data-tx-separate attribute: -->
<a href="#duplicates" target="_self">Dealing with duplicates</a>
<ul>
<li>foo</li>
<li>bar</li>
</ul>

<!-- here the anchor link from above and the text of the h3 element are the same. Since the text belongs to the h3 element we add the data-tx-separate in it -->
<h3 data-tx-separate="false"><a name="duplicates">Dealing with duplicates</a></h3>

<!-- in the following list `foo` is duplicate so we add in the li element the data-tx-separate attribute -->
<ul>
<li data-tx-separate="false">foo</li>
<li>fooBar</li>
</ul>
```

This needs to be done only in the translated HTML and not in the source language HTML file.
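Finding the repeated occurrences by hand is error-prone in a large file, so a small script can list the candidate elements. The Python sketch below is an assumption-laden helper, not part of Transifex: RepeatFinder and find_repeats are made-up names, the set of inline tags is a short illustrative subset, and text is attributed to the nearest enclosing non-inline element.

```python
from html.parser import HTMLParser

# Illustrative subset of inline tags; their text belongs to the enclosing element.
INLINE = {"a", "b", "i", "em", "strong", "span"}


class RepeatFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.seen = set()
        self.repeats = []   # (line, tag, text) of elements needing the attribute
        self._open = None   # (line, tag) of the current non-inline element
        self._text = []

    def _flush(self):
        text = " ".join("".join(self._text).split())
        self._text = []
        if not text:
            return
        if text in self.seen and self._open is not None:
            self.repeats.append((*self._open, text))
        self.seen.add(text)

    def handle_starttag(self, tag, attrs):
        if tag not in INLINE:
            self._flush()
            self._open = (self.getpos()[0], tag)

    def handle_endtag(self, tag):
        if tag not in INLINE:
            self._flush()
            self._open = None

    def handle_data(self, data):
        self._text.append(data)


def find_repeats(html):
    finder = RepeatFinder()
    finder.feed(html)
    finder.close()
    finder._flush()  # text left over after the last tag
    return finder.repeats


translated = """<a href="#duplicates">Dealing with duplicates</a>
<ul>
<li>foo</li>
<li>bar</li>
</ul>
<h3><a name="duplicates">Dealing with duplicates</a></h3>
<ul>
<li>foo</li>
<li>fooBar</li>
</ul>"""
for line, tag, text in find_repeats(translated):
    print(f"line {line}: add data-tx-separate to <{tag}> ({text!r})")
```

For this input the script points at the h3 element and the second "foo" list item, which are the two elements marked in the example above.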


Parser behavior

The following table outlines what occurs to strings when using the API, CLI, or UI to manipulate translation files depending on download mode.

