Thursday, 1 June 2023

Lost in translation: Upleveling Sprout Social’s localization system

Localizing a dynamic application like Sprout Social into multiple languages is a complex undertaking. Translating the text that appears in the application is only one half of the story. It also involves developing our application in a way that makes it easy to extract and swap out that text for the translations. At Sprout, we lean on third-party vendors for translations. But we still need tools to extract, bundle and submit translation requests to those vendors and then serve and render the translations to end users.

For years, the Sprout engineering team got by with a custom localization solution, since open source solutions were still maturing. It allowed us to accommodate our largest customers in our supported languages, but lacked some useful features. In this article, I will outline our new localization system, how it tackles the most complicated localization scenarios, and how we incrementally introduced those changes across the web engineering organization.

Our old system

To understand our new localization system, you first need to understand how our old system worked and the areas where we could improve it.

Message Syntax

Application localization works by abstracting the text that is visible to the end user into string units, called messages. These messages are extracted and submitted to translators. By abstracting these strings, we can easily swap them out depending on the end user’s preferred language.

These messages can be simple static strings like “Hello, world”, have placeholders like “Hello, {name}”, or carry rich text formatting like “Hello, <b>world</b>”. Since these features need to be serialized into strings, you need a syntax that both the translators and the application code understand to properly translate and render the text.

Part of what made our old localization system difficult to use was that we made up our own syntax and maintained a homemade “parser” for it. This code was time-consuming to maintain, and the syntax was minimal. We wanted additional features to help render more complex messages.

Example: In the Sprout application, we need a way of rendering “You have X posts” where X is a dynamic numeric value.

Consider the plural case, “You have 5 posts”. Consider the singular case, “You have 1 post”. Consider the “0” case. Consider languages that might not have a separate grammar for the “1” case, like Chinese and Japanese. Consider languages that have a separate grammar for the case when X is a “large number”, like Arabic, Polish and Russian.
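To make these cases concrete, here is a small sketch that uses the built-in `Intl.PluralRules` API to print the plural category each locale assigns to a handful of counts. It is only an illustration of the problem space, not code from Sprout's old or new system; the `pluralCategory` helper and the chosen locales and counts are assumptions made for the example.

```javascript
// Returns the plural category ("zero", "one", "two", "few", "many" or "other")
// that the given locale assigns to a count.
function pluralCategory(locale, count) {
  return new Intl.PluralRules(locale).select(count);
}

// Print the category for a few sample counts in four locales.
const counts = [0, 1, 2, 5, 21];
for (const locale of ["en", "ja", "ru", "ar"]) {
  const categories = counts.map((n) => `${n}: ${pluralCategory(locale, n)}`);
  console.log(locale, categories.join(", "));
}
```

Running it shows which languages treat 1 specially and which ones add categories such as “few” and “many”. A message syntax has to let translators express every one of those categories.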

Message management

We have messages that we can send to translators and swap out in our application. Our application needs a way of storing these messages and serving them to our end users.

Our old system stored all our messages in JSON files (which we called “lang files”), which were managed manually. We referenced the messages in these files by using IDs in our source JavaScript code. When a user wanted the application in Spanish, we would serve our Spanish language files, and then the JavaScript would render the corresponding Spanish message using the ID.

For performance reasons, we tried to only serve the user messages that were on that page, so we had separate lang files for the different pages of the application. This was a valid system, but as our team and application scaled, it meant more manual developer time creating and managing these IDs and lang files.

Screenshot of JavaScript previously used to manually manage messages and translation in Sprout's codebase.

To add a new message to the application, developers had to manually add it to the correct lang file with a unique ID to reference that message. At times, we ran into ID collisions and ID typos that led to missing text in the application. Adding text to the web application felt tedious, with numerous steps that weren’t intuitive.
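As a rough sketch of this kind of ID-based lookup (the original screenshot is not reproduced here, so the `langFiles` object, the `getLang` helper and the IDs are all hypothetical), consider:

```javascript
// Hypothetical lang files: one object per locale, keyed by hand-written IDs.
const langFiles = {
  en: { "dashboard.greeting": "Hello, world" },
  es: { "dashboard.greeting": "Hola, mundo" },
};

// Looks up a message by ID; returns undefined when the ID is not in the file.
function getLang(locale, id) {
  return langFiles[locale][id];
}

console.log(getLang("es", "dashboard.greeting")); // prints "Hola, mundo"
console.log(getLang("es", "dashboard.greting")); // typo in the ID: prints undefined
```

Nothing connects the ID in the UI code to the entry in the lang file except the developer typing it correctly in both places, which is where collisions and typos creep in.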

Our new solution

Knowing these shortcomings, web engineers from across the Product organization created a localization working group to develop a solution. We met regularly to brainstorm. After an in-depth research process, we decided to migrate the Sprout application from our homemade localization system to FormatJS’s react-intl library and build infrastructure around it for managing our messages. React-intl was the most feature-rich and popular open source localization library in the JavaScript ecosystem and integrated well into our codebase.

Message syntax

We wanted a more robust solution and didn’t want to create something from scratch. We adopted the ICU message syntax, a standardized syntax that is used in Java, PHP and C applications and captures the complexities of dynamic application messages. The react-intl library also supports parsing and rendering messages written in ICU message syntax.

This is an example of how ICU message syntax captures plural cases. This is the message in English and Russian. Notice how when the translators convert this message into other languages, they can add and remove cases as necessary to properly support the language. The Russian translation of this message adds “few” and “many” cases.
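Since the original image is not available here, the following sketch shows what such a pair of messages might look like. The English and Russian strings are illustrative, not Sprout's actual messages, and `formatPlural` is a toy renderer written for this example: it handles a single, non-nested plural argument with keyword cases only. In a real application, react-intl does the parsing and rendering.

```javascript
// Illustrative ICU messages. The Russian version adds "few" and "many" cases.
const messages = {
  en: "You have {count, plural, one {# post} other {# posts}}",
  ru: "У вас {count, plural, one {# публикация} few {# публикации} many {# публикаций} other {# публикации}}",
};

// Toy renderer (an assumption for illustration, not a full ICU parser).
function formatPlural(locale, message, values) {
  const block = /\{(\w+), plural, ((?:\w+ \{[^{}]*\}\s*)+)\}/;
  return message.replace(block, (_match, name, cases) => {
    // Collect the cases, e.g. { one: "# post", other: "# posts" }.
    const options = {};
    for (const [, key, text] of cases.matchAll(/(\w+) \{([^{}]*)\}/g)) {
      options[key] = text;
    }
    // Pick the case matching the locale's plural category, else "other".
    const count = values[name];
    const category = new Intl.PluralRules(locale).select(count);
    const chosen = options[category] ?? options.other;
    // "#" stands for the formatted number.
    return chosen.replace("#", new Intl.NumberFormat(locale).format(count));
  });
}

console.log(formatPlural("en", messages.en, { count: 1 })); // You have 1 post
console.log(formatPlural("en", messages.en, { count: 5 })); // You have 5 posts
console.log(formatPlural("ru", messages.ru, { count: 2 })); // У вас 2 публикации
console.log(formatPlural("ru", messages.ru, { count: 5 })); // У вас 5 публикаций
```

The application code passes the same `count` value in both languages; only the message string changes.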

The ICU message syntax has been battle-tested by many applications in countless languages. We could trust that it could support our sophisticated customer needs, and that there were many solutions and/or educational resources for any localization questions we ran into.

Message management

We developed a system using tooling provided by FormatJS that would automate the process of adding, removing and storing messages. This involved some philosophical changes in how we approached message storing and referencing.

A major change from our old system that FormatJS encourages was using our UI code as the source of truth for messages. In our previous system, the source of the messages and the usage of the messages were in two different places, which meant we had to keep them in sync. Our new system keeps the message sources with the rest of the UI code. We simply need to run a script that extracts all the messages from the UI files to generate our lang files, and each message’s unique ID is derived from its content with the help of a hash function.
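The idea of content-derived IDs can be sketched in a few lines. This is a minimal illustration, not FormatJS's extraction tooling or Sprout's script: the `messageId` and `buildLangFile` helpers, the choice of SHA-512 and the six-character ID length are all assumptions made for the example.

```javascript
const { createHash } = require("crypto");

// Derives a short, stable ID from the message text itself (assumed scheme).
function messageId(defaultMessage) {
  return createHash("sha512").update(defaultMessage).digest("base64").slice(0, 6);
}

// Simulates an extraction step: builds a lang file keyed by content hash.
function buildLangFile(defaultMessages) {
  const langFile = {};
  for (const text of defaultMessages) {
    langFile[messageId(text)] = text;
  }
  return langFile;
}

// Messages as they might be collected from UI files; note the duplicate.
const extracted = ["Hello, {name}", "Your posts", "Hello, {name}"];
console.log(buildLangFile(extracted));
```

Because the ID is a function of the text, the same message always gets the same ID, and a duplicate message used in two UI files collapses into a single lang file entry.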

Screenshot of JavaScript now used to automatically manage messages and translation in Sprout's codebase.

This change colocates the messages with the UI code and has several benefits:

  • More readable: No more IDs that are designed for robots in our UI code. Now we can read the English messages in the UI code and understand what text the user will see.
  • No manual IDs: These IDs, which were only ever used by machines, are now generated by machines and are, by definition, unique per message.
  • No manually managed lang files: Developers should not need to touch these lang files. Our scripts manage the adding and deleting of the messages.

How did we migrate?

But how did we migrate our entire web engineering team and codebase to this new system? We broke this out into four milestones: piloting the new system, educating our team, deprecating the old system and migrating to our new solution.

Piloting the new system

The working group piloted the new system in specific sections of the application to get a sense of its best practices and the full migration scope. This got the new system set up on the client side (polyfills, etc.) and the build side of the application. It also allowed us to iterate on the developer experience and mitigate risk.

Education

We took what we learned from the pilot and used it to educate the entire web engineering team. We developed an FAQ and other educational documentation and presentations to aid developers using the new library. It’s easy to undervalue this step, but this part of a migration is extremely important. It doesn’t matter how good your new system is—people need to know how and why they should use it.

We also developed an ambassador program where each web feature team at Sprout had an appointed Localization Ambassador, who was responsible for helping educate their team on the new system and reporting issues or pain points to the working group.

This allowed us to delegate the education responsibilities and identify issues specific to individual teams.

Deprecating the old system

After we felt confident in the developer experience, shared knowledge and scale potential of the new system, we deprecated the old system. We created some custom ESLint rules and used the linting tool esplint to block new usages of the old system while allowing existing ones. From this point on, web engineers were expected to use the new system when writing new code.
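A custom ESLint rule for this kind of deprecation can be quite small. The sketch below is not Sprout's actual rule: it assumes the legacy system was reached through a helper named `getLang`, which is a hypothetical name. It only shows the general shape of an ESLint rule object; tracking and allowing the existing violations is the part that a tool like esplint would handle and is not shown.

```javascript
// Hypothetical rule: reports every call to a legacy helper named getLang().
const noLegacyLang = {
  meta: {
    type: "problem",
    docs: { description: "disallow the legacy localization helper" },
    messages: {
      legacy: "getLang() is deprecated; use the new localization system instead.",
    },
    schema: [],
  },
  create(context) {
    return {
      // ESLint calls this visitor for each function call in the file.
      CallExpression(node) {
        if (node.callee.type === "Identifier" && node.callee.name === "getLang") {
          context.report({ node, messageId: "legacy" });
        }
      },
    };
  },
};
```

A rule like this would typically be registered through a local ESLint plugin and enabled in the project's ESLint configuration.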

Migrating to our new system

With confidence in our new system and a fixed number of old usages, we started migrating.

A lot of usages had one-to-one equivalents in the new system. Where these equivalents existed, we were able to automate the migration by writing a codemod using jscodeshift. We ran the codemod iteratively over sections of the codebase, learning and fixing issues as we went. The remaining edge cases that could not be easily codemodded were few enough that we felt comfortable fixing them manually.
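To give a sense of what such a codemod looks like, here is a minimal jscodeshift transform. It is a sketch under several assumptions, not the codemod Sprout ran: the legacy helper `getLang`, the `legacyMessages` lookup table and the presence of an `intl` object in scope at each call site are all hypothetical.

```javascript
// Hypothetical mapping from legacy IDs to their English source text.
const legacyMessages = { "posts.title": "Your posts" };

function hasKnownId(path) {
  const [firstArg] = path.node.arguments;
  return (
    firstArg !== undefined &&
    (firstArg.type === "StringLiteral" || firstArg.type === "Literal") &&
    Object.prototype.hasOwnProperty.call(legacyMessages, firstArg.value)
  );
}

// Rewrites getLang("posts.title") to
// intl.formatMessage({ defaultMessage: "Your posts" }).
function transformer(file, api) {
  const j = api.jscodeshift;
  return j(file.source)
    .find(j.CallExpression, { callee: { type: "Identifier", name: "getLang" } })
    .filter(hasKnownId) // leave unknown or dynamic IDs for manual migration
    .replaceWith((path) => {
      const english = legacyMessages[path.node.arguments[0].value];
      return j.callExpression(
        j.memberExpression(j.identifier("intl"), j.identifier("formatMessage")),
        [
          j.objectExpression([
            j.property("init", j.identifier("defaultMessage"), j.literal(english)),
          ]),
        ]
      );
    })
    .toSource();
}

module.exports = transformer;
```

Calls the transform does not recognize are left untouched, which mirrors the approach of automating the one-to-one cases and handling the rest by hand.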

Rollout

Why did we opt for such an iterative approach instead of trying to migrate everything at once? Using an iterative approach is part of Sprout’s Engineering culture, and we believe in constantly learning and improving.

By approaching the migration this way, we were able to learn as we went, adjusting and fixing issues in real time. We could also roll back the changes if the migration started to block application development. Our iterative approach allowed us to make progress while working on other initiatives, and empowered us to feature-flag major changes with a smaller group before rolling them out to everyone. The same principles of feature development for an application apply to the development of internal developer tools.

Learnings and takeaways

Reimagining our localization system was a massive undertaking across the entire web engineering organization. My advice to others facing similar projects or challenges would be to:

  • Use widely adopted standards: Why create a custom message syntax when engineers who have spent years thinking about this problem space have already developed ICU message syntax?
  • Consider colocating related items: It will make adding, changing and deleting them much easier.
  • Embrace an iterative rollout: Design the rollout of your change in a way that allows you to learn as you go. You can’t anticipate everything, so build in space for recourse into your plan.
  • Share your learnings: Education is half of a rollout. It doesn’t matter how good your new system is if people don’t know how to use it or why it is better.

For more information about Sprout’s Engineering culture, check out our careers page today.

The post Lost in translation: Upleveling Sprout Social’s localization system appeared first on Sprout Social.



from Sprout Social https://ift.tt/MNKb37A
via IFTTT
