October 8, 2024

Our article translation workflow with the OpenAI API

3-minute read

🇫🇷 This post is also available in French

At Premier Octet, we like to share our technical articles with as many people as possible. To reach an international audience, we wrote a custom script that automatically translates our articles from French into English. Once an article is translated, a banner appears at the top to indicate that it is available in the other language (as seen above this paragraph).

In this article, I will take you through our process.

Our structure

Our website, and therefore this blog, is built with the Next.js framework. Our blog articles are stored as static MDX files, a format that combines Markdown and React components.

Each French article is stored in the /src/pages/blog directory, and English articles are stored in /src/pages/blog/en:

src/
  pages/
    blog/
      nouveautes-react-19.mdx
      en/
        nouveautes-react-19.mdx

Next.js routing simply follows the structure of the folders.
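
To make the mapping concrete, here is a small sketch of how a file path translates to a URL under this layout. The routeFromPagePath helper is hypothetical and only mimics the convention for illustration. It assumes the pages router with .mdx enabled as a page extension.

```typescript
// Hypothetical helper: maps a file under src/pages to the URL it is served at.
// It only illustrates the folder convention; Next.js does this internally.
const routeFromPagePath = (filePath: string): string => {
  const route = filePath
    .replace(/^\/?src\/pages/, '') // drop the pages root
    .replace(/\.mdx$/, '') // drop the extension
    .replace(/\/index$/, '') // index files map to their folder
  return route === '' ? '/' : route
}

console.log(routeFromPagePath('src/pages/blog/nouveautes-react-19.mdx'))
// /blog/nouveautes-react-19
console.log(routeFromPagePath('src/pages/blog/en/nouveautes-react-19.mdx'))
// /blog/en/nouveautes-react-19
```

The English article keeps the same slug as the French one and only gains the en segment.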

Our translation workflow

To translate our French articles into English, we wrote a script that automates the whole task. A single command creates a new English page from the French content:

yarn post:translate src/pages/blog/my-article.mdx

The command is set up in our package.json file:

{
  "scripts": {
    "post:translate": "tsx scripts/translate.ts"
  }
}

Our script, translate.ts, runs an immediately invoked async function that expects the path of the MDX file to translate as its first argument:

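;(async () => {
  // argv[0] is the runtime and argv[1] the script, so the path is argv[2]
  const mdxPath = process.argv[2]

  if (!mdxPath) {
    console.error('❌ no path')
    // Exit with a non-zero code so the failure is visible to the shell
    process.exitCode = 1
    return
  }

  await translate(mdxPath)
})()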
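
The argument check could be made a little stricter. The sketch below is not part of our script: checkArgs and the ArgCheck type are hypothetical names, and rejecting non-.mdx files is an assumption about what would be useful here.

```typescript
// Hypothetical guard, not part of the original script.
type ArgCheck = { ok: true; mdxPath: string } | { ok: false; error: string }

const checkArgs = (argv: string[]): ArgCheck => {
  // argv[0] is the runtime, argv[1] the script, argv[2] the first user argument
  const mdxPath = argv[2]
  if (!mdxPath) return { ok: false, error: 'no path' }
  if (!mdxPath.endsWith('.mdx')) return { ok: false, error: `not an MDX file: ${mdxPath}` }
  return { ok: true, mdxPath }
}

console.log(checkArgs(['node', 'translate.ts', 'README.md']))
// { ok: false, error: 'not an MDX file: README.md' }
```

Returning a result object instead of printing directly keeps the check easy to test on its own.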

Let's now see how the translate function works.

Reading the MDX file

We start by reading the full content of the MDX file:

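import { promises as fs } from 'fs'

export const translate = async (mdxPath: string) => {
  // Read the whole article, frontmatter included, as a UTF-8 string
  const content = await fs.readFile(mdxPath, 'utf-8')
}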

Calling GPT for translation

Once the file is read, the script instantiates the OpenAI client using the API key stored in our environment variables. It then sends a request to translate the content:

import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
})

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    // The system message carries the instructions, the user message the article
    { role: 'system', content: 'Translate the provided MDX text from French to English...' },
    { role: 'user', content },
  ],
  stream: true,
})

With stream: true, the API returns the translation incrementally, chunk by chunk, instead of as a single final response. This lets us display progress while the translation is still being generated.
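
To show what consuming such a stream looks like without calling the API, here is a minimal sketch with a stand-in stream. The Chunk type, fakeStream and collect are hypothetical names. The chunk shape is reduced to the only fields our script reads.

```typescript
// Minimal shape of a streamed chunk, reduced to the fields read by the loop.
type Chunk = { choices: { delta?: { content?: string | null } }[] }

// Stand-in for the API stream: an async generator yielding chunks one by one.
async function* fakeStream(pieces: string[]): AsyncGenerator<Chunk> {
  for (const content of pieces) {
    yield { choices: [{ delta: { content } }] }
  }
  // Assumption: a real stream typically ends with a chunk whose delta is empty
  yield { choices: [{ delta: {} }] }
}

// Concatenate the deltas as they arrive, treating missing content as ''.
const collect = async (stream: AsyncIterable<Chunk>): Promise<string> => {
  let text = ''
  for await (const part of stream) {
    text += part.choices[0]?.delta?.content || ''
  }
  return text
}

collect(fakeStream(['Hello', ', ', 'world'])).then((text) => console.log(text))
// Hello, world
```

The optional chaining and the `|| ''` fallback matter because not every chunk carries text.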

Progress feedback

For a fairly long article, the translation can take several seconds, so we added a progress bar.

The bar updates with each new chunk of translated text. Progress is estimated by comparing the length of the text received so far with the length of the source:

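import ProgressBar from 'progress'

let translatedContent = ''
const bar = new ProgressBar('🇺🇸  Translating [:bar] :percent', { total: 100 })
let currentProgression = 0

for await (const part of stream) {
  translatedContent += part.choices[0]?.delta?.content || ''
  // Estimate progress from output length vs. source length, capped at 100
  const percent = Math.min(100, Math.floor((translatedContent.length * 100) / content.length))

  if (percent > currentProgression) {
    // Advance by the full difference: a single chunk can span several percent
    bar.tick(percent - currentProgression)
    currentProgression = percent
  }
}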

We use the progress package from npm to render the progress bar.

This allows us to track the progress of the translation in real time:

$ tsx scripts/translate.ts src/pages/blog/our-workflow-for-translating-our-articles-with-gpt.mdx
🇺🇸  Translating [░░░░░░----------------------------------------] 13%
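
Since the percentage is only an estimate, it helps to see how it behaves on concrete numbers. The sketch below replays the same length-based estimate on a list of chunk lengths, using a stand-in object in place of the real bar. The simulate function and Bar type are hypothetical. Capping at 100 and completing the bar at the end are additions for illustration, based on the assumption that the English text may be shorter than the French source.

```typescript
// Minimal stand-in for the progress bar: only the tick method is needed here.
type Bar = { tick: (n: number) => void }

// Replay the length-based estimate on chunk lengths and return the final percent.
const simulate = (sourceLength: number, chunkLengths: number[], bar: Bar): number => {
  let translatedLength = 0
  let current = 0
  for (const length of chunkLengths) {
    translatedLength += length
    // Output length relative to source length, capped at 100
    const percent = Math.min(100, Math.floor((translatedLength * 100) / sourceLength))
    if (percent > current) {
      bar.tick(percent - current)
      current = percent
    }
  }
  // The output may be shorter than the source: complete the bar once done
  if (current < 100) {
    bar.tick(100 - current)
    current = 100
  }
  return current
}

const ticks: number[] = []
simulate(1000, [130, 400, 300], { tick: (n) => ticks.push(n) })
console.log(ticks.join(', '))
// 13, 40, 30, 17
```

In this run the stream ends at 83%, so the last tick of 17 closes the gap.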

Adjusting the translated content

Once the translation is finished, the script makes two adjustments. It rewrites the BlogPost component import path, since the English file lives one directory deeper than the French one, and it adds the lang: en attribute to the frontmatter:

const updatedContent = translatedContent
  .replace('../../components/Layout/BlogPost', '../../../components/Layout/BlogPost')
  .replace(/^(---\s*\n)/, '$1lang: en\n')

These modifications guarantee that the English version is correctly configured and follows the directory structure.
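
Here is the effect of these two replacements on a small sample. The adjust wrapper is a hypothetical name, and the sample frontmatter and import line are invented for illustration. Our real articles contain more than this.

```typescript
// Hypothetical wrapper around the two replacements shown above.
const adjust = (translated: string): string =>
  translated
    .replace('../../components/Layout/BlogPost', '../../../components/Layout/BlogPost')
    .replace(/^(---\s*\n)/, '$1lang: en\n')

// Invented sample: a frontmatter block followed by the layout import
const sample = [
  '---',
  'title: What is new in React 19',
  '---',
  '',
  "import BlogPost from '../../components/Layout/BlogPost'",
].join('\n')

console.log(adjust(sample))
// ---
// lang: en
// title: What is new in React 19
// ---
//
// import BlogPost from '../../../components/Layout/BlogPost'
```

Note that the function is not idempotent. Applied twice, it would prepend another ../ to the import and insert a second lang line, so it should run only once on freshly translated content.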

Saving the translated file

Finally, the translated file is saved in an en subdirectory next to the original, which keeps the French and English versions cleanly separated:

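import path from 'path'

// Same file name, inside an en folder next to the French article
const enPath = path.join(path.dirname(mdxPath), 'en', path.basename(mdxPath))
// recursive: true avoids an error when the en folder already exists
await fs.mkdir(path.dirname(enPath), { recursive: true })
await fs.writeFile(enPath, updatedContent)
console.log(`✅ Translation saved to: ${enPath}`)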
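
The path computation can be checked in isolation. In this sketch, englishPathFor and isAlreadyEnglish are hypothetical names, and the second one is an optional guard that is not in our script. path.posix is used so that the example gives the same result on every platform.

```typescript
import * as path from 'node:path'

// Same computation as above, using path.posix for platform-independent output
const englishPathFor = (mdxPath: string): string =>
  path.posix.join(path.posix.dirname(mdxPath), 'en', path.posix.basename(mdxPath))

// Hypothetical guard: detect a file that already lives in an en directory
const isAlreadyEnglish = (mdxPath: string): boolean =>
  path.posix.basename(path.posix.dirname(mdxPath)) === 'en'

console.log(englishPathFor('src/pages/blog/my-article.mdx'))
// src/pages/blog/en/my-article.mdx
console.log(isAlreadyEnglish('src/pages/blog/en/my-article.mdx'))
// true
```

Such a guard would prevent the script from creating an en/en folder if it were run on an English file by mistake.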

Once saved, a confirmation message lets us know that everything went according to plan.

Thanks to this automation, we manage our multilingual articles easily. Translation becomes a quick and smooth process, letting us focus on creating content while reaching a wider audience.

And because we all love inception moments, check out this article translated by the very same script right here!

Finally, you can find the complete script on this gist.

👋
