October 23, 2024
How to Simulate a Streamed Response With the Vercel AI SDK
4 minute read
At Premier Octet, we develop Postula, a SaaS application that lets you import your RSS feeds and easily generate LinkedIn posts from articles.
One of the main components is a widget that generates LinkedIn posts from an RSS feed. We would like to showcase this widget directly on our homepage for demo purposes.
The problem: it makes API calls to OpenAI, which are billed to us.
Our idea: simulate the response progressively, to give the illusion of a real call to an LLM such as GPT, without the associated cost.
The Existing System
In our component, we use the useCompletion hook from the Vercel AI SDK to generate the content of LinkedIn posts:
import { useCompletion } from 'ai/react'
const { complete } = useCompletion({
api: '/api/completion/post',
})
On the API side, we have an endpoint /api/completion/post
that calls OpenAI:
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
export async function POST() {
const result = await streamText({
model: openai('gpt-4o'),
prompt: `Write a haiku about React.js`,
})
return result.toDataStreamResponse()
}
To see the format of the response, we call this endpoint and inspect it in the "Network" tab of our browser:
HTTP/1.1 200 OK
vary: RSC, Next-Router-State-Tree, Next-Router-Prefetch
content-type: text/plain; charset=utf-8
x-vercel-ai-data-stream: v1
Date: Tue, 22 Oct 2024 15:41:15 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Transfer-Encoding: chunked
0:"Java"
0:"Script"
0:" dances"
0:","
0:" \n"
0:"Components"
0:" weave"
0:" through"
0:" the"
0:" DOM"
0:","
0:" \n"
0:"React"
0:" breath"
0:"es"
0:" new"
0:" life"
0:"."
e:{"finishReason":"stop","usage":{"promptTokens":14,"completionTokens":18},"isContinued":false}
d:{"finishReason":"stop","usage":{"promptTokens":14,"completionTokens":18}}
Pieces of text are sent gradually, line by line, prefixed with 0:, and at the end metadata lines (e: and d:) indicate that the response is finished.
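Each 0: line is simply the prefix, a JSON-encoded string, and a trailing newline. As a minimal sketch (the helper name is hypothetical, it is not part of the SDK), producing one such line could look like this:
// Hypothetical helper: encode one piece of text in the format observed above
// ("0:" prefix + JSON-encoded string + newline)
const formatTextPart = (text: string): string => `0:${JSON.stringify(text)}\n`

formatTextPart('Java')    // -> '0:"Java"\n'
formatTextPart(' dances') // -> '0:" dances"\n'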
We will therefore simulate this response.
Let's start by creating the function that will break the text down into chunks.
Creating Chunks
Now that we know the response's structure, we can create a function that will break down the text into chunks:
export const chunkText = (inputText: string): string[] => {
// Split text into an array of words and spaces
const chunks = inputText.match(/\S+|\s/g) || []
return chunks.map((chunk) => {
// Handle new line
if (chunk === '\n') {
return '0:"\\n"\n'
// Handle spaces
} else if (chunk.trim() === '') {
return chunk
.split('')
.map((char) => (char === '\n' ? '0:"\\n"\n' : `0:"${char}"\n`))
.join('')
// Handle word
} else {
return `0:"${chunk}"\n`
}
})
}
This function breaks down the text into chunks (words, spaces) and prepares them to be sent incrementally. The 0:
prefix is added to mimic the API response format we observed previously.
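For example, calling the function on a short string produces lines in the same format as the real response:
chunkText('React breathes')
// -> ['0:"React"\n', '0:" "\n', '0:"breathes"\n']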
Creating the Endpoint to Stream the Response
The next step is to create an API route that will stream our text chunk by chunk.
export async function POST() {
const text = `Lorem ipsum dolor sit amet...` // The text to stream
const predefinedChunks = chunkText(text) // Break down text into chunks
const encoder = new TextEncoder()
const stream = new ReadableStream({
async start(controller) {
// Pause for 1s to simulate waiting
await new Promise((resolve) => setTimeout(resolve, 1000))
let index = 0
function push() {
if (index < predefinedChunks.length) {
const chunk = predefinedChunks[index]
controller.enqueue(encoder.encode(chunk)) // Send the encoded chunk
index++
// Interval of 10ms between each send
// to simulate a stream with some delay
setTimeout(push, 10)
} else {
controller.close() // Close the stream at the end
}
}
// Start the stream
push()
},
})
const headers = new Headers({
'Content-Type': 'text/plain; charset=utf-8',
'Transfer-Encoding': 'chunked',
})
return new Response(stream, { headers })
}
This route sends each chunk of text progressively with a delay of 10ms (with an initial delay of 1s), thus giving the impression of a live response.
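If you want the simulation to match the real response even more closely, you can also send the final metadata lines we observed (e: and d:) just before calling controller.close(). This is optional (the component works without them), and the usage values below are arbitrary:
// Optional: mimic the trailing metadata lines of the real endpoint
// (the token counts are arbitrary since no model is actually called)
const finishParts =
  'e:{"finishReason":"stop","usage":{"promptTokens":0,"completionTokens":0},"isContinued":false}\n' +
  'd:{"finishReason":"stop","usage":{"promptTokens":0,"completionTokens":0}}\n'
controller.enqueue(encoder.encode(finishParts))
controller.close()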
Testing the Endpoint with Our Component
We can now use our demo endpoint to simulate a response from an LLM model:
useCompletion({
api: isDemo ? '/api/completion/demo/post' : '/api/completion/post',
})
This approach makes it easy to switch between the demo and the real endpoint by adding a prop that activates demo mode, as sketched below.
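Here is a minimal sketch of what that wiring might look like (the PostGenerator component and its props are illustrative, not the actual Postula code):
import { useCompletion } from 'ai/react'

// Illustrative component: the isDemo prop switches between
// the simulated endpoint (homepage demo) and the real one
export function PostGenerator({ isDemo = false }: { isDemo?: boolean }) {
  const { complete, completion, isLoading } = useCompletion({
    api: isDemo ? '/api/completion/demo/post' : '/api/completion/post',
  })

  return (
    <div>
      {/* The demo endpoint ignores the prompt; the real endpoint
          would receive the article content here */}
      <button disabled={isLoading} onClick={() => complete('')}>
        Generate post
      </button>
      <p>{completion}</p>
    </div>
  )
}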
Of course, don't forget to secure your /api/completion/post
endpoint by checking if the user is authenticated and has enough credits.
The response streams correctly, and the component works as expected!
Please note that since version 3.4 of the Vercel AI SDK, the ai/test module provides mocking utilities that let you simulate a streamed response with streamText:
import { streamText } from 'ai'
import { convertArrayToReadableStream, MockLanguageModelV1 } from 'ai/test'
const result = await streamText({
  model: new MockLanguageModelV1({
    doStream: async () => ({
      stream: convertArrayToReadableStream([
        { type: 'text-delta', textDelta: 'Hello' },
        { type: 'text-delta', textDelta: ', ' },
        { type: 'text-delta', textDelta: `world!` },
      ]),
      // Metadata about the "raw" call, required by the mock model interface
      rawCall: { rawPrompt: null, rawSettings: {} },
    }),
  }),
  // streamText still expects a prompt, even with a mock model
  prompt: 'Hello, test!',
})
But this utility is intended for unit or E2E tests and therefore does not simulate any delay between chunks, which makes the result less realistic.
👋