
23 October 2024

How to Simulate a Streamed Response With the Vercel AI SDK

4-minute read

🇫🇷 This post is also available in French

At Premier Octet, we develop Postula, a SaaS application that lets you import your RSS feeds and easily generate LinkedIn posts from articles.

One of the main components is a widget that generates LinkedIn posts from an RSS feed. We would like to showcase this widget directly on our homepage for demo purposes.

The problem: it makes API calls to OpenAI, which are billed to us.

We had the idea of simulating the response progressively, to give the illusion of a real call to an LLM such as GPT without the associated cost.

The Existing System

In our component, we use the useCompletion hook from the Vercel AI SDK to generate the content of LinkedIn posts:

component.tsx
import { useCompletion } from 'ai/react'

// complete() triggers the completion request against our API route
const { complete } = useCompletion({
  api: '/api/completion/post',
})

On the API side, we have an endpoint /api/completion/post that calls OpenAI:

api/completion/post/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export async function POST() {
  // Stream the completion from OpenAI
  const result = await streamText({
    model: openai('gpt-4o'),
    prompt: `Write a haiku about React.js`,
  })

  // Return the stream in the AI SDK data stream format
  return result.toDataStreamResponse()
}

To see the format of the response, we call this endpoint and inspect it in the browser's Network tab:

Headers
HTTP/1.1 200 OK
vary: RSC, Next-Router-State-Tree, Next-Router-Prefetch
content-type: text/plain; charset=utf-8
x-vercel-ai-data-stream: v1
Date: Tue, 22 Oct 2024 15:41:15 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Transfer-Encoding: chunked
Response
0:"Java"
0:"Script"
0:" dances"
0:","
0:"  \n"
0:"Components"
0:" weave"
0:" through"
0:" the"
0:" DOM"
0:","
0:"  \n"
0:"React"
0:" breath"
0:"es"
0:" new"
0:" life"
0:"."
e:{"finishReason":"stop","usage":{"promptTokens":14,"completionTokens":18},"isContinued":false}
d:{"finishReason":"stop","usage":{"promptTokens":14,"completionTokens":18}}

Pieces of text are sent gradually, line by line, with a "0:" prefix, followed at the end by metadata lines (prefixed with "e:" and "d:") indicating that the response is finished.

We will therefore simulate this response.

Let's start by creating the function that will break the text down into chunks.

Creating Chunks

Now that we know the response's structure, we can create a function that will break down the text into chunks:

export const chunkText = (inputText: string): string[] => {
  // Split the text into an array of words and single whitespace characters
  const chunks = inputText.match(/\S+|\s/g) || []

  // JSON.stringify escapes quotes, backslashes and newlines, so each
  // chunk becomes a valid 0:"..." line of the stream protocol
  return chunks.map((chunk) => `0:${JSON.stringify(chunk)}\n`)
}

This function breaks the text down into chunks (words and whitespace) ready to be sent incrementally. The 0: prefix mimics the API response format we observed previously, and JSON.stringify takes care of escaping quotes, backslashes, and newlines.
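A quick sanity check of what chunkText produces (the input string here is just an example):

chunkText('React breathes\nnew life.')
// => [
//   '0:"React"\n',
//   '0:" "\n',
//   '0:"breathes"\n',
//   '0:"\\n"\n',
//   '0:"new"\n',
//   '0:" "\n',
//   '0:"life."\n',
// ]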

Creating the Endpoint to Stream the Response

The next step is to create an API route that will stream our text chunk by chunk.

api/completion/demo/post/route.ts
export async function POST() {
  const text = `Lorem ipsum dolor sit amet...` // The text to stream
  const predefinedChunks = chunkText(text) // Break down text into chunks

  const encoder = new TextEncoder()

  const stream = new ReadableStream({
    async start(controller) {
      // Pause for 1s to simulate waiting
      await new Promise((resolve) => setTimeout(resolve, 1000))

      let index = 0

      function push() {
        if (index < predefinedChunks.length) {
          const chunk = predefinedChunks[index]
          controller.enqueue(encoder.encode(chunk)) // Send the encoded chunk
          index++

          // Interval of 10ms between each send
          // to simulate a stream with some delay
          setTimeout(push, 10)
        } else {
          controller.close() // Close the stream at the end
        }
      }

      // Start the stream
      push()
    },
  })

  const headers = new Headers({
    'Content-Type': 'text/plain; charset=utf-8',
    // Header set by toDataStreamResponse(), reproduced here for fidelity
    'x-vercel-ai-data-stream': 'v1',
    'Transfer-Encoding': 'chunked',
  })

  return new Response(stream, { headers })
}

This route sends each chunk of text progressively, with a 10 ms delay between chunks (after an initial 1 s pause), giving the impression of a live response.
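To check the raw output without going through the component, we can consume the stream directly. A quick sketch, assuming a dev server running locally on port 3000:

const res = await fetch('http://localhost:3000/api/completion/demo/post', {
  method: 'POST',
})

const reader = res.body!.getReader()
const decoder = new TextDecoder()

// Print each protocol line as it arrives
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  process.stdout.write(decoder.decode(value))
}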

Testing the Endpoint with Our Component

We can now use our demo endpoint to simulate a response from an LLM:

const { complete } = useCompletion({
  api: isDemo ? '/api/completion/demo/post' : '/api/completion/post',
})

This approach makes it easy to switch between the demo and the real endpoint via a prop that activates demo mode, as sketched below.
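Here is a minimal sketch of that wiring (the PostGenerator name and isDemo prop are illustrative, not the actual Postula component):

import { useCompletion } from 'ai/react'

type PostGeneratorProps = {
  // When true, hit the simulated endpoint instead of OpenAI
  isDemo?: boolean
}

export function PostGenerator({ isDemo = false }: PostGeneratorProps) {
  const { complete, completion, isLoading } = useCompletion({
    api: isDemo ? '/api/completion/demo/post' : '/api/completion/post',
  })

  return (
    <div>
      {/* The prompt is ignored by the demo endpoint */}
      <button disabled={isLoading} onClick={() => complete('')}>
        Generate post
      </button>
      <p>{completion}</p>
    </div>
  )
}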

Of course, don't forget to secure your /api/completion/post endpoint by checking that the user is authenticated and has enough credits.

The response streams smoothly, and the component works as expected!

Note that since version 3.4 of the Vercel AI SDK, streamText can be given a mock model to simulate a streamed response:

import { streamText } from 'ai'
import { convertArrayToReadableStream, MockLanguageModelV1 } from 'ai/test'

const result = await streamText({
  model: new MockLanguageModelV1({
    doStream: async () => ({
      stream: convertArrayToReadableStream([
        { type: 'text-delta', textDelta: 'Hello' },
        { type: 'text-delta', textDelta: ', ' },
        { type: 'text-delta', textDelta: 'world!' },
        // Emit a finish event so the stream terminates cleanly
        {
          type: 'finish',
          finishReason: 'stop',
          logprobs: undefined,
          usage: { promptTokens: 3, completionTokens: 10 },
        },
      ]),
      rawCall: { rawPrompt: null, rawSettings: {} },
    }),
  }),
  prompt: 'Hello, test!',
})

However, this helper is intended for unit or end-to-end tests: it does not simulate any delay between chunks, which makes the result less realistic.
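If you still want the mock approach to feel live, one option (a sketch built on the standard Web Streams API, not an AI SDK feature) is to pipe the stream through a transform that waits between chunks:

// Add an artificial delay of `ms` milliseconds between chunks of any stream
const withDelay = <T>(stream: ReadableStream<T>, ms: number): ReadableStream<T> =>
  stream.pipeThrough(
    new TransformStream<T, T>({
      async transform(chunk, controller) {
        await new Promise((resolve) => setTimeout(resolve, ms))
        controller.enqueue(chunk)
      },
    })
  )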

👋
