USING REGEX TO CREATE A CUSTOM QA CHECK IN TRADOS

A regular expression, or regex as it is more commonly known, is a sequence of characters forming a pattern that can be used to find, or find and replace, strings.

Regex is very useful when you are conducting QA on your content. Regex can be used with the following CAT tools: memoQ, Wordfast, and Trados Studio.

The examples provided in this article are the product of a collaboration with my MIIS colleague Eunji Hamnnet, who specializes in Korean translation and localization management. These regex examples can be used in the QA Checker in Trados for Mexican Spanish and Korean.

Regex Examples for Spanish (Mexico)

Identify Date Format

Regex can be used to check date formats. In the US the order is month, day, year; in Mexico, we use day, month, year. We can use the regex below to check the dates in both source and target and trigger a warning when the target keeps the same format as the source. This lets us identify such instances and correct them to the desired format: \d{1,2}\/\d{1,2}\/\d{4}

All our dates are in the same format in the screenshot below:

Once we select Verify, the QA tool will show us the errors/warnings from the REGEX that we applied.
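
To try the pattern outside Trados, the check can be approximated in a few lines of Python. This is a minimal sketch, not how the QA Checker is implemented: the function name unchanged_dates and the sample segments are made up for illustration, and Python's re module is a different regex engine from the one inside Trados.

```python
import re
from typing import List

# Same pattern as in the article: 1-2 digits / 1-2 digits / 4 digits.
DATE_PATTERN = re.compile(r"\d{1,2}\/\d{1,2}\/\d{4}")


def unchanged_dates(source: str, target: str) -> List[str]:
    """Return the dates in the target that also appear verbatim in the source."""
    source_dates = set(DATE_PATTERN.findall(source))
    return [d for d in DATE_PATTERN.findall(target) if d in source_dates]


# Hypothetical segment pair: the translator left the US date order untouched.
source = "The event takes place on 12/25/2023."
target = "El evento se llevará a cabo el 12/25/2023."
print(unchanged_dates(source, target))  # ['12/25/2023']
```

A non-empty result plays the role of the QA warning: the date was carried over unchanged and still needs to be reordered.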

Find and Replace Date Format Using Ctrl+H

Now, let’s replace it! With the find and replace (CTRL+H) feature together with regular expressions, we can easily search and edit any desired content. In this example we continue working with the different date formats. Using the following regex we can search for the US format and replace it with the appropriate date order to match the Mexican date format:

Find what: (\d{1,2})\/(\d{1,2})\/(\d{4})
Replace with: $2/$1/$3

Click Find Next to find the next instance, then click Replace to correct the formatting.

The screenshot below shows all the target dates in the correct format.
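
The same swap can be sketched in Python to see what the replacement does. The function name to_mexican_date_order is hypothetical, and note one syntax difference: Python's re module writes backreferences as \g<2> rather than the $2 used in the Trados dialog.

```python
import re

# Same capture groups as the "Find what" field: (month)/(day)/(year).
US_DATE = re.compile(r"(\d{1,2})\/(\d{1,2})\/(\d{4})")


def to_mexican_date_order(text: str) -> str:
    """Swap the first two groups so month/day/year becomes day/month/year."""
    # \g<2>/\g<1>/\g<3> is Python's spelling of $2/$1/$3.
    return US_DATE.sub(r"\g<2>/\g<1>/\g<3>", text)


print(to_mexican_date_order("El evento se llevará a cabo el 12/25/2023."))
# El evento se llevará a cabo el 25/12/2023.
```

The pattern cannot tell whether a date such as 03/04/2023 has already been swapped, so running the replacement twice puts the original order back. That is a good reason to step through the matches with Find Next and Replace rather than replacing everything blindly.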

Regex Examples for Korean

Punctuation “;”

Punctuation in Korean follows the English style generally, but it is rare to see a semicolon ‘;’ in Korean.

Some translators keep it in the Korean translation, where it does not belong. Depending on the context, there are several ways to translate phrases with a semicolon into Korean: 1) use a comma ‘,’ instead, or 2) use a period ‘.’ and start a new sentence.
Below is the Regex example:

RegEx Source: ;\s.|;.
RegEx Target: ;\s.|;.

This Regex example catches two things by using ‘|’ (meaning "either/or"): a semicolon followed by a space and a character, or a semicolon followed directly by a character. The condition for this Regex can be set so that a warning is reported when both source and target match. This means that the QA Checker will issue a warning if a semicolon in the English file still exists in the Korean translation. The translator can then respond by replacing it with ‘,’ or by ending the sentence or phrase just before the ‘;’ and starting a new one, depending on the context.
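
As a rough illustration of that "both sides match" logic, here is a small Python sketch. The function name semicolon_kept and the sample sentences are hypothetical, and the code only imitates the behavior described above; it is not the QA Checker itself.

```python
import re

# Same pattern for source and target: a semicolon followed by
# whitespace and a character, or by a character directly.
SEMICOLON = re.compile(r";\s.|;.")


def semicolon_kept(source: str, target: str) -> bool:
    """True when both the source and the target still contain a semicolon."""
    return bool(SEMICOLON.search(source)) and bool(SEMICOLON.search(target))


source = "Press start; the game begins."
kept = "시작을 누르세요; 게임이 시작됩니다."   # semicolon carried over
fixed = "시작을 누르세요. 게임이 시작됩니다."  # split into two sentences

print(semicolon_kept(source, kept))   # True
print(semicolon_kept(source, fixed))  # False
```

Because the pattern requires at least one character after the semicolon, a semicolon at the very end of a segment is not reported.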

Find and Replace to Get Rid of Parentheses Using Ctrl+H

Parentheses () are used to add additional nonessential information or an aside to a sentence or word.

When an English title, such as the name of a program, campaign, company, or product, appears for the first time in an official document, it should be translated into Korean, with the English original inserted in parentheses right after the Korean noun. You can see this practice in press releases, product guidelines, pamphlets, etc.

However, when it comes to game localization, this rule should not be applied since the game conversation is very casual and informal. It looks funky if you see special marks like parentheses in conversational texts.

So, to find this potential error during translation or review and correct it, we can use a regex with the Find & Replace function.

Below is the Regex example:

Find: (\W+)([\[({|].+?[|\]})])
Replace with: (blank)
The Replace with field should be left blank, since we don’t want to keep the parentheses or anything inside them.

For the examples, I used conversational text from League of Legends. Let’s say a translator followed the formal convention and added the English names of game characters in parentheses. I want to keep only the translated character names and remove the parentheses that follow them.

Below are the screenshots of Trados:

[Find issue – a total of 3 issues in this example]

[After one issue is replaced properly]

[After all 3 issues are replaced properly] Yay!
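
For readers who want to experiment with the pattern before running it in Trados, here is a hedged Python sketch. It assumes the intended pattern escapes the square brackets inside the character classes, which Python's re module requires; the function name strip_gloss and the sample lines are made up and are not taken from the game files.

```python
import re

# Group 1: the non-word characters (typically a space) before the opener.
# Group 2: an opening bracket, the shortest possible content, a closing bracket.
PAREN_GLOSS = re.compile(r"(\W+)([\[({|].+?[|\]})])")


def strip_gloss(text: str) -> str:
    """Remove a bracketed gloss together with the separator in front of it."""
    # An empty replacement mirrors leaving "Replace with" blank.
    return PAREN_GLOSS.sub("", text)


print(strip_gloss("아리 (Ahri)가 말했다."))  # 아리가 말했다.
print(strip_gloss("가렌 [Garen] 등장"))      # 가렌 등장
```

Since the pattern starts with \W+, it only matches when something like a space separates the Korean name from the opening bracket. A gloss attached directly to the name, as in 아리(Ahri), is left untouched in this sketch.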
