October 23, 2024
How to Simulate a Streamed Response With the Vercel AI SDK
4 minute read
At Premier Octet, we develop Postula, a SaaS application that lets you import your RSS feeds and easily generate LinkedIn posts from articles.
One of the main components is a widget that generates LinkedIn posts from an RSS feed. We would like to showcase this widget directly on our homepage for demo purposes.
The problem: it makes API calls to OpenAI, which are billed to us.
Our idea: simulate the response progressively, to give the illusion of a real call to an LLM such as GPT, without the associated cost.
The Existing System
In our component, we use the useCompletion hook from the Vercel AI SDK to generate the content of LinkedIn posts:
import { useCompletion } from 'ai/react'
const { complete } = useCompletion({
api: '/api/completion/post',
})
On the API side, we have an endpoint /api/completion/post
that calls OpenAI:
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
export async function POST() {
const result = await streamText({
model: openai('gpt-4o'),
prompt: `Write a haiku about React.js`,
})
return result.toDataStreamResponse()
}
To see the format of the response, we call this endpoint and inspect it in the "Network" tab of our browser:
HTTP/1.1 200 OK
vary: RSC, Next-Router-State-Tree, Next-Router-Prefetch
content-type: text/plain; charset=utf-8
x-vercel-ai-data-stream: v1
Date: Tue, 22 Oct 2024 15:41:15 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Transfer-Encoding: chunked
0:"Java"
0:"Script"
0:" dances"
0:","
0:" \n"
0:"Components"
0:" weave"
0:" through"
0:" the"
0:" DOM"
0:","
0:" \n"
0:"React"
0:" breath"
0:"es"
0:" new"
0:" life"
0:"."
e:{"finishReason":"stop","usage":{"promptTokens":14,"completionTokens":18},"isContinued":false}
d:{"finishReason":"stop","usage":{"promptTokens":14,"completionTokens":18}}
Pieces of text are sent gradually, line by line, prefixed with 0:, and at the end metadata lines (e: and d:) indicate that the response is finished.
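Each 0: line is simply the prefix, a JSON-encoded string, and a trailing newline. As a minimal sketch (the helper name is hypothetical, it is not part of the SDK), producing one such line could look like this:
// Hypothetical helper: encode one piece of text in the format observed above
// ("0:" prefix + JSON-encoded string + newline)
const formatTextPart = (text: string): string => `0:${JSON.stringify(text)}\n`

formatTextPart('Java')    // -> '0:"Java"\n'
formatTextPart(' dances') // -> '0:" dances"\n'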
We will therefore simulate this response.
Let's start by creating the function that will break the text down into chunks.
Creating Chunks
Now that we know the response's structure, we can create a function that will break down the text into chunks:
export const chunkText = (inputText: string): string[] => {
// Split text into an array of words and spaces
const chunks = inputText.match(/\S+|\s/g) || []
return chunks.map((chunk) => {
// Handle new line
if (chunk === '\n') {
return '0:"\\n"\n'
// Handle spaces
} else if (chunk.trim() === '') {
return chunk
.split('')
.map((char) => (char === '\n' ? '0:"\\n"\n' : `0:"${char}"\n`))
.join('')
// Handle word
} else {
return `0:"${chunk}"\n`
}
})
}
This function breaks down the text into chunks (words, spaces) and prepares them to be sent incrementally. The 0:
prefix is added to mimic the API response format we observed previously.
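For example, calling the function on a short string produces lines in the same format as the real response:
chunkText('React breathes')
// -> ['0:"React"\n', '0:" "\n', '0:"breathes"\n']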
Creating the Endpoint to Stream the Response
The next step is to create an API route that will stream our text chunk by chunk.
export async function POST() {
const text = `Lorem ipsum dolor sit amet...` // The text to stream
const predefinedChunks = chunkText(text) // Break down text into chunks
const encoder = new TextEncoder()
const stream = new ReadableStream({
async start(controller) {
// Pause for 1s to simulate waiting
await new Promise((resolve) => setTimeout(resolve, 1000))
let index = 0
function push() {
if (index < predefinedChunks.length) {
const chunk = predefinedChunks[index]
controller.enqueue(encoder.encode(chunk)) // Send the encoded chunk
index++
// Interval of 10ms between each send
// to simulate a stream with some delay
setTimeout(push, 10)
} else {
controller.close() // Close the stream at the end
}
}
// Start the stream
push()
},
})
const headers = new Headers({
'Content-Type': 'text/plain; charset=utf-8',
'Transfer-Encoding': 'chunked',
})
return new Response(stream, { headers })
}
This route sends each chunk of text progressively with a delay of 10ms (with an initial delay of 1s), thus giving the impression of a live response.
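If you want the simulation to match the real response even more closely, you can also send the final metadata lines we observed (e: and d:) just before calling controller.close(). This is optional (the component works without them), and the usage values below are arbitrary:
// Optional: mimic the trailing metadata lines of the real endpoint
// (the token counts are arbitrary since no model is actually called)
const finishParts =
  'e:{"finishReason":"stop","usage":{"promptTokens":0,"completionTokens":0},"isContinued":false}\n' +
  'd:{"finishReason":"stop","usage":{"promptTokens":0,"completionTokens":0}}\n'
controller.enqueue(encoder.encode(finishParts))
controller.close()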
Testing the Endpoint with Our Component
We can now use our demo endpoint to simulate a response from an LLM model:
useCompletion({
api: isDemo ? '/api/completion/demo/post' : '/api/completion/post',
})
This approach makes it easy to switch between the demo and the real endpoint by adding a prop that activates demo mode, as sketched below.
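Here is a minimal sketch of what that wiring might look like (the PostGenerator component and its props are illustrative, not the actual Postula code):
import { useCompletion } from 'ai/react'

// Illustrative component: the isDemo prop switches between
// the simulated endpoint (homepage demo) and the real one
export function PostGenerator({ isDemo = false }: { isDemo?: boolean }) {
  const { complete, completion, isLoading } = useCompletion({
    api: isDemo ? '/api/completion/demo/post' : '/api/completion/post',
  })

  return (
    <div>
      {/* The demo endpoint ignores the prompt; the real endpoint
          would receive the article content here */}
      <button disabled={isLoading} onClick={() => complete('')}>
        Generate post
      </button>
      <p>{completion}</p>
    </div>
  )
}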
Of course, don't forget to secure your /api/completion/post
endpoint by checking if the user is authenticated and has enough credits.
The response streams correctly, and the component works as expected!
Please note that since version 3.4 of the Vercel AI SDK, the ai/test module provides mocking utilities that let you simulate a streamed response with streamText:
import { streamText } from 'ai'
import { convertArrayToReadableStream, MockLanguageModelV1 } from 'ai/test'
const result = await streamText({
  model: new MockLanguageModelV1({
    doStream: async () => ({
      stream: convertArrayToReadableStream([
        { type: 'text-delta', textDelta: 'Hello' },
        { type: 'text-delta', textDelta: ', ' },
        { type: 'text-delta', textDelta: `world!` },
      ]),
      // Metadata about the "raw" call, required by the mock model interface
      rawCall: { rawPrompt: null, rawSettings: {} },
    }),
  }),
  // streamText still expects a prompt, even with a mock model
  prompt: 'Hello, test!',
})
But this utility is intended for unit or E2E tests and therefore does not simulate any delay between chunks, which makes the result less realistic.
👋