If you're translating your website into other languages, you'll likely want those pages to rank on search engines worldwide, too. This guide walks you through a few things to consider regarding URL structures and publishing.
URL structure
Implementing a multi-language SEO strategy takes a significant amount of time and effort. Changing your URL structure after the initial implementation requires even more effort and may hurt your current rankings. So, it's essential to pick a URL structure that can stay in place for a long time.
A quick overview of codes
Be sure you understand the language and country codes you will use on your site. Many people use these identifiers interchangeably, but they mean different things, and mixing them up can send the wrong signal to search engines.
Language codes are the common identifiers used to designate languages. The relevant specifications are ISO 639-1 (two-letter language codes), ISO 3166-1 alpha-2 (two-letter country and region codes), and BCP 47, which combines the two into tags such as 'pt-BR'. In practice, a language code is usually two letters, sometimes followed by a regional designation. For example, 'ja' is the code for the Japanese language, while 'jp' refers to Japan as a country.
Country codes are more standardized from a web perspective, because country-code top-level domains (ccTLDs) such as .jp or .de are assigned per country and largely follow ISO 3166-1 alpha-2. The same codes used as top-level domain suffixes are often reused as subdomain prefixes for localized instances of a website.
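To make the language/country distinction concrete, here is a minimal Python sketch that splits a simple tag such as 'ja-JP' into its language and region parts. The `Locale` type and `parse_locale` function are hypothetical helpers for illustration only; they handle the two-part language-REGION form, not full BCP 47 tags with scripts, variants, or extensions.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str          # ISO 639-1 language code, e.g. "ja"
    region: Optional[str]  # ISO 3166-1 alpha-2 code, e.g. "JP", or None


def parse_locale(tag: str) -> Locale:
    """Split a 'language[-REGION]' tag such as 'ja', 'pt-BR' or 'en_gb'."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    # Only a two-letter second part is treated as a region; script subtags
    # such as 'Hant' in 'zh-Hant' are ignored by this simplified parser.
    region = parts[1].upper() if len(parts) > 1 and len(parts[1]) == 2 else None
    return Locale(language, region)


print(parse_locale("ja"))     # Locale(language='ja', region=None)
print(parse_locale("ja-JP"))  # Locale(language='ja', region='JP')
print(parse_locale("en_gb"))  # Locale(language='en', region='GB')
# Pitfall: parse_locale("jp") happily returns language='jp', but 'jp' is
# Japan's country code, not a language code.
```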
Common approaches to URL structure
There are three common ways you can structure URLs for your multilingual website.
Top-Level Domains
Country-code top-level domains (for example, example.de) are the least flexible option and the most costly from a domain registration perspective. However, they are often considered the easiest to implement. Because you are registering a whole new domain, the locale code you use must be one of the country codes that Internet registrars offer as a top-level domain.
Because each domain is country-specific, you must also decide which language to use as the default for countries with several languages, such as Switzerland or Canada.
Subdomains
Subdomains give greater flexibility because you won't need to register a new domain. However, you'll still need to modify your domain's DNS records to make the subdomain available.
Additionally, subdomains can use a language code, optionally with a region, instead of a country identifier (for example, fr.example.com or fr-ca.example.com). This lets you serve several languages and regional variants, not just one site per country.
Subdirectories
Subdirectories offer even more flexibility than subdomains. A subdirectory simply adds the locale code to the URL path of your site (for example, example.com/fr/). Like subdomains, subdirectories can be country-specific, language-specific, or a mixture of both.
Also, subdirectories don't require any DNS changes, which makes them a good fit for hosting setups where your control over the site is limited.
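The three structures differ only in which part of the URL carries the locale. The Python sketch below maps a source URL to its localized equivalent under each one. `COUNTRY_DOMAINS`, `localized_url`, and the example.com/example.de domains are hypothetical, and the sketch assumes the source site is served from a bare domain rather than www.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: country-code domains you have registered, keyed by country code.
COUNTRY_DOMAINS = {"DE": "example.de", "FR": "example.fr"}


def localized_url(url: str, code: str, strategy: str) -> str:
    """Map a source URL to its localized equivalent.

    strategy "cctld" expects a country code; "subdomain" and "subdirectory"
    accept a language or language-region code such as "fr" or "fr-CA".
    """
    parts = urlsplit(url)
    if strategy == "cctld":
        # Raises KeyError if no domain is registered for that country.
        return urlunsplit(parts._replace(netloc=COUNTRY_DOMAINS[code.upper()]))
    if strategy == "subdomain":
        # Prefix the host: example.com -> fr-ca.example.com (needs a DNS record).
        return urlunsplit(parts._replace(netloc=f"{code.lower()}.{parts.netloc}"))
    if strategy == "subdirectory":
        # Prefix the path: /pricing -> /fr-ca/pricing (no DNS change needed).
        path = parts.path or "/"
        return urlunsplit(parts._replace(path=f"/{code.lower()}{path}"))
    raise ValueError(f"unknown strategy: {strategy!r}")


source = "https://example.com/pricing"
print(localized_url(source, "de", "cctld"))            # https://example.de/pricing
print(localized_url(source, "fr-CA", "subdomain"))     # https://fr-ca.example.com/pricing
print(localized_url(source, "fr-CA", "subdirectory"))  # https://example.com/fr-ca/pricing
```

Note how the country-code variant needs a lookup table of registered domains, while the other two only need the code itself; that is the flexibility difference described above.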
Measurement and tracking
Another concern is how you will measure the traffic to your localized sites and pages.
Using country information is one common way to measure traffic by locale, and this information can be determined in several ways. First, in most cases the browser has a country or region setting, often chosen when the browser or operating system is first installed. The downside of this metric is that it doesn't tell us where the visitor currently is. Therefore, many businesses augment this data with GeoIP lookups. However, GeoIP data is not fully accurate either; visitors behind VPNs or proxies, for example, can appear to be in another country.
Measuring traffic by country is easy and is often built into your analytics package. Designing a URL structure that complements this approach is also simple. However, your accuracy and confidence in the results may suffer, because neither the browser's country setting nor GeoIP reliably tells you where a visitor is or which language they read.
Another way to measure traffic is by language. Language tends to be more accurate because people only set languages they understand. Since this setting directly affects how the browser works, it's safer to assume that most visitors have their language settings 'correct' (remember that 'correctness' is relative to the user). Most analytics packages also report statistics by language by default.
A more complex approach to international traffic measurement is a hybrid of language AND country. Here, a visit is counted for a locale only when the data points agree (e.g., the visitor's browser is set to English with a US region, and GeoIP shows they are likely in the US). While this raises confidence in the measurements, it may also discard important data points, such as visits where the signals disagree.
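If your analytics export gives you one browser language string per visit (an assumption; formats vary by package), grouping by language is a small aggregation. The hypothetical `visits_by_language` helper below counts visits per primary language subtag, so regional variants such as 'en-US' and 'en-GB' roll up into 'en'.

```python
from collections import Counter
from typing import Iterable


def visits_by_language(browser_languages: Iterable[str]) -> Counter:
    """Count visits per primary language subtag ('en-US' and 'en-GB' both count as 'en')."""
    counts: Counter = Counter()
    for tag in browser_languages:
        primary = tag.replace("_", "-").split("-")[0].strip().lower()
        # Visits with no language setting are kept in their own bucket.
        counts[primary or "unknown"] += 1
    return counts


print(visits_by_language(["en-US", "en-GB", "de-DE", "de", ""]))
# Counter({'en': 2, 'de': 2, 'unknown': 1})
```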
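A minimal sketch of the hybrid rule, assuming each visit record carries a browser locale (such as 'en-US') and a GeoIP country code. The `hybrid_locale` helper and the sample records are hypothetical. To avoid silently ignoring data, visits where the signals disagree go into an 'unmatched' bucket instead of being dropped.

```python
from collections import Counter
from typing import Optional


def hybrid_locale(browser_locale: str, geoip_country: Optional[str]) -> Optional[str]:
    """Return 'lang-CC' only when the browser's region matches the GeoIP country.

    Returns None when the signals disagree or either one is missing.
    """
    parts = browser_locale.replace("_", "-").split("-")
    if len(parts) < 2 or not geoip_country:
        return None  # no browser region, or no GeoIP result
    language, region = parts[0].lower(), parts[1].upper()
    if region != geoip_country.upper():
        return None  # e.g. a US-configured browser seen in Germany
    return f"{language}-{region}"


# Hypothetical visit records: (browser locale, GeoIP country).
visits = [("en-US", "US"), ("en-US", "DE"), ("de-DE", "DE"), ("fr", "FR")]
counts = Counter(hybrid_locale(browser, country) or "unmatched" for browser, country in visits)
print(counts)  # Counter({'unmatched': 2, 'en-US': 1, 'de-DE': 1})
```

Watching the size of the 'unmatched' bucket is one way to tell how much traffic the stricter rule is leaving out.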
User experience
Another consideration when designing your internationalized structure is the experience your users see. If a user's browser is set to a specific language, they should see your site in that language. While this seems straightforward, how you achieve it depends on your URL structure.
The most straightforward solution is to detect the browser's language setting and redirect the visitor to a predefined URL. However, visitors must still be able to choose a different language, so we also need a language picker in the UI. If we use asynchronous JavaScript to translate, the language can change immediately without the browser reloading the page. If, instead, we use the visitor's country to control the user experience, things get much trickier and raise several questions:
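Browsers send their language setting in the Accept-Language request header, for example `de-DE,de;q=0.9,en;q=0.8`. The Python sketch below picks a redirect target from that header while letting an explicit choice from the language picker win. `SUPPORTED`, `DEFAULT`, and the idea of storing the picker's choice in a cookie (passed in as `saved_choice`) are hypothetical; the q-value parsing is simplified and ignores malformed parameters.

```python
from typing import Optional, Sequence

SUPPORTED = ("en", "de", "fr", "ja")  # hypothetical list of site languages
DEFAULT = "en"


def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, highest q-value first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        q = 1.0  # q defaults to 1 when absent
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if tag and tag != "*" and q > 0:
            # Sort key: higher q first, then original header order.
            weighted.append((-q, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(accept_language: str, saved_choice: Optional[str] = None,
                    supported: Sequence[str] = SUPPORTED) -> str:
    """Pick the language to redirect to; an explicit user choice always wins."""
    if saved_choice in supported:
        return saved_choice
    for tag in parse_accept_language(accept_language):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # fall back from 'de-de' to 'de'
        if primary in supported:
            return primary
    return DEFAULT


lang = choose_language("de-DE,de;q=0.9,en;q=0.8")
print(f"redirect to /{lang}/")  # redirect to /de/
```

Checking the saved choice first is what keeps the redirect from overriding a visitor who deliberately switched languages with the picker.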
Does it make sense to redirect a user immediately to another page?
How do we match the country to the language?
Is this type of functionality desirable? (This might be true for e-commerce sites where visitors cannot place international orders).
Answers to these questions can help to guide our decisions around URL structure.
Publish your languages
Besides making it clear to users how localized content is organized, we must also tell search engines.
Using sitemaps
The easiest way for indexers to identify where your localized pages reside is a sitemap. A sitemap lists the URLs on your website, and each entry can also list the localized versions of that URL. Sitemaps are easy to generate, since many content management systems build them automatically. If you have a static site, there are also desktop tools that help you build and maintain them.
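Search engines such as Google accept localized alternates in a sitemap as `xhtml:link` elements inside each `url` entry. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `build_sitemap` name and the input shape (one dict per piece of content, mapping hreflang code to URL) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(pages: list[dict[str, str]]) -> str:
    """Build a sitemap in which every URL lists all of its localized alternates.

    `pages` holds one dict per piece of content, e.g.
    {"en": "https://example.com/", "de": "https://example.com/de/"}.
    """
    ET.register_namespace("", SITEMAP_NS)     # serialize <urlset xmlns="...">
    ET.register_namespace("xhtml", XHTML_NS)  # serialize <xhtml:link .../>
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for variants in pages:
        for url in variants.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            # Each <url> lists every variant, including itself.
            for lang, href in variants.items():
                ET.SubElement(entry, f"{{{XHTML_NS}}}link",
                              rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([{"en": "https://example.com/", "de": "https://example.com/de/"}]))
```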
Using HREFLANGs
Another approach is to add hreflang tags to the pages themselves to tell indexers where the localized versions can be found. Every page must list all of its localized versions, because an indexer can reach your site through an inbound link to any one of them. Adding these tags is easy for static sites because the languages and URLs do not change. However, if your site runs on a content management system, this can be more challenging, especially if the CMS does not support internationalization.
If you are using a content management system with no internationalization support, there are a couple of ways to address this. The first is to store the hreflang tags as data on each page within the CMS. This allows a great deal of flexibility, since each URL can be set individually. The downside is that the tags must then be maintained by hand on every page.
A better, more automated solution is possible if your content management system also handles the rules for how localized URLs are built. If you can access those rules, you only need to maintain a list of language codes. Also, if you want to leave the source language out of these tags, you will need a way to detect when the user is on a 'source' page.
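For a static site, the tags are `<link rel="alternate" hreflang="..." href="...">` elements in each page's head. This hypothetical `hreflang_tags` helper renders the full set from a mapping of hreflang code to URL; the same output goes into every localized version of the page, including the page itself.

```python
from html import escape


def hreflang_tags(variants: dict[str, str]) -> str:
    """Render one <link rel="alternate"> tag per localized version of a page."""
    return "\n".join(
        # escape() guards against '&' or quotes in URLs breaking the attribute.
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in variants.items()
    )


print(hreflang_tags({"en": "https://example.com/", "ja": "https://example.com/ja/"}))
# <link rel="alternate" hreflang="en" href="https://example.com/">
# <link rel="alternate" hreflang="ja" href="https://example.com/ja/">
```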
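As an illustration of the rule-based idea, the sketch below assumes a subdirectory rule ('/<lang>/<path>' for translations, the bare path for source content). With that rule, the only hand-maintained data is the list of language codes, and `split_language` is how the sketch tells a source page from a localized one. `SOURCE_LANGUAGE`, `TARGET_LANGUAGES`, `split_language`, and `alternates` are all hypothetical names, not part of any CMS.

```python
SOURCE_LANGUAGE = "en"                 # hypothetical: source pages live at the root
TARGET_LANGUAGES = ["de", "fr", "ja"]  # the only list maintained by hand


def split_language(path: str) -> tuple[str, str]:
    """Return (language, path without its language prefix) for a URL path."""
    segments = path.lstrip("/").split("/", 1)
    if segments[0] in TARGET_LANGUAGES:
        rest = segments[1] if len(segments) > 1 else ""
        return segments[0], "/" + rest
    # No language prefix: this is a source page.
    return SOURCE_LANGUAGE, path or "/"


def alternates(origin: str, path: str, include_source: bool = True) -> dict[str, str]:
    """Derive every localized URL for `path` from the '/<lang>/<path>' rule."""
    _, base = split_language(path)
    urls = {lang: f"{origin}/{lang}{base}" for lang in TARGET_LANGUAGES}
    if include_source:
        urls = {SOURCE_LANGUAGE: origin + base, **urls}
    return urls


print(alternates("https://example.com", "/fr/pricing"))
```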
Using HTTP headers
The last option is to provide hreflang information in HTTP headers. This approach is the trickiest and the least reliable, because proxies, CDNs, and other network filters between your server and the client can modify or strip response headers. If you take this approach, be sure to test your setup with search engine bots.
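In header form, the annotations travel in a `Link` header whose entries look like `<URL>; rel="alternate"; hreflang="de"`, separated by commas. The sketch below builds that value and attaches it in a minimal WSGI app; `hreflang_link_header`, `VARIANTS`, and `app` are hypothetical names, and in a real deployment you would confirm the header still arrives intact after passing through any proxy or CDN.

```python
def hreflang_link_header(variants: dict[str, str]) -> str:
    """Format hreflang annotations as the value of an HTTP Link header."""
    return ", ".join(
        f'<{url}>; rel="alternate"; hreflang="{lang}"' for lang, url in variants.items()
    )


# Hypothetical variants of one page; the same header goes on every variant.
VARIANTS = {"en": "https://example.com/", "de": "https://example.com/de/"}


def app(environ, start_response):
    """Minimal WSGI app that attaches the Link header to its response."""
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Link", hreflang_link_header(VARIANTS)),
    ])
    return [b"<!doctype html><title>Home</title>"]


print(hreflang_link_header(VARIANTS))
```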
Testing your implementation
Testing the solution you implemented is always a good idea. Most search engines provide an interface that lets you confirm your localized pages are detected correctly. If you submit a sitemap, the search engine should warn you about any errors.
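You can also catch a common mistake before submitting anything: missing return links. Google, for example, ignores hreflang annotations between two pages unless both pages point to each other. This minimal sketch assumes you have already crawled your site into a mapping of page URL to the hreflang set found on that page; `missing_return_links` is a hypothetical helper.

```python
def missing_return_links(annotations: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """List (page, target) pairs where `target` does not link back to `page`.

    `annotations` maps each crawled page URL to {hreflang code: URL} found on it.
    Targets that were never crawled are reported too.
    """
    problems = []
    for page, links in annotations.items():
        for target in links.values():
            if target == page:
                continue  # a self-reference needs no return link
            back_links = annotations.get(target, {})
            if page not in back_links.values():
                problems.append((page, target))
    return problems


crawl = {
    "https://example.com/": {"en": "https://example.com/", "de": "https://example.com/de/"},
    "https://example.com/de/": {"de": "https://example.com/de/"},  # forgot the 'en' link
}
print(missing_return_links(crawl))
# [('https://example.com/', 'https://example.com/de/')]
```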
Advanced integrations
The last aspect of international SEO involves advanced integrations that help search engines and other online content indexers detect your translated content.
Content management without internationalization
Many sites run on a content management system that lacks support for internationalization. Without this support built into the CMS, site owners often resort to 'hack' solutions that duplicate the translated content in some way, for example as multiple posts or even multiple sites. This is one case where a JavaScript-based solution like Transifex Live is very helpful: since translations are stored remotely, no 'hacking' is required.
JavaScript and SEO concerns
There is plenty of evidence that Googlebot does an excellent job of indexing JavaScript content, and Google itself has indicated as much.
However, you also want to support all other indexers and bots. Currently, only one option ensures all of your content can be indexed by every bot and indexer: the 'prerendered' approach, which delivers all content as 'source' HTML. This is the most direct approach because it does not rely on any specific indexer technology. The goal is to serve source content to the browser exactly as you would want the search engine to see it. If your content comes from JavaScript, you will need a separate process that 'prerenders' the JavaScript and produces static HTML. The static HTML is then served directly to the search engine, which does not need to be aware of the JavaScript implementation.
Note that an earlier approach to indexing JavaScript, often called 'escaped_fragment' (using '#!' in your URL), was part of Google's AJAX crawling specification. Google officially deprecated it in October 2015, so choose a different approach for new sites.
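One possible way to wire up the serving side, sketched under assumptions: a separate prerendering process has already saved static HTML snapshots keyed by URL path, and the server returns a snapshot when the request comes from a known crawler, otherwise the normal JavaScript app. `CRAWLER_TOKENS` is an illustrative, not exhaustive, list of user-agent substrings; `is_crawler` and `choose_body` are hypothetical names. The snapshots should contain the same content visitors see once the JavaScript has run.

```python
from typing import Mapping

# Illustrative, not exhaustive: user-agent substrings of some major crawlers.
CRAWLER_TOKENS = ("googlebot", "bingbot", "duckduckbot", "yandexbot", "baiduspider")


def is_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_body(path: str, user_agent: str,
                snapshots: Mapping[str, str], app_shell: str) -> str:
    """Serve the prerendered snapshot to crawlers when one exists, else the JS app shell."""
    if is_crawler(user_agent) and path in snapshots:
        return snapshots[path]
    return app_shell


snapshots = {"/de/": "<html lang='de'><body>Willkommen</body></html>"}
googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(choose_body("/de/", googlebot, snapshots, "<div id='app'></div>"))
```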