Getting Started with the Vercel AI SDK
Introduction
Vercel's open-source AI SDK is arguably one of the most developer-friendly AI integration libraries available today. When building AI-powered products, we often need to leverage different large language models (LLMs) for various scenarios, which typically requires writing glue code to integrate multiple LLM APIs. The AI SDK directly addresses this pain point by providing a unified API interface that works seamlessly with any model.
Here's a practical example: imagine you're building a jsExpert AI assistant and want to integrate both Google's Gemini and Anthropic's Claude. With the AI SDK, you can abstract the jsExpert function to accept any model and prompt without writing platform-specific conditional logic, since the Vercel AI SDK handles all the underlying complexity for you.
import { google } from "@ai-sdk/google";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const gemini = google("gemini-2.0-flash-001");
const claude = anthropic("claude-sonnet-4-20250514");

const SYSTEM_PROMPT =
  "You are a JavaScript expert. Please answer the user's question concisely.";

const jsExpert = async ({ prompt, model }) => {
  const { text } = await generateText({
    model,
    prompt,
    system: SYSTEM_PROMPT,
  });
  console.log(text);
};

await jsExpert({ prompt: "What's JavaScript?", model: gemini });
await jsExpert({ prompt: "What's JavaScript?", model: claude });
In this article, we'll build a simple CLI chat interface to explore the APIs provided by the AI SDK.
Environment Setup
Installation
Start by navigating to your development directory and running the following setup commands:
> git init
> npx gitignore node
> pnpm init
> pnpm add -D @types/dotenv-safe @types/node @typescript-eslint/eslint-plugin @typescript-eslint/parser eslint tsx typescript
> pnpm add @ai-sdk/google ai dotenv dotenv-safe zod
Add the following scripts to your package.json:
{..."scripts": {"build": "tsc","start": "node dist/main.js","dev": "tsx src/main.ts","lint": "eslint src --ext .ts","clean": "rm -rf dist"},...}
generateText
generateText is straightforward to use: simply provide the model and prompt, and you'll receive the LLM's response.
// main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { generateText } from "ai";

dotenvSafe.config();

const model = google("gemini-2.0-flash-001");

const { text } = await generateText({
  model,
  prompt: "What's ECMAScript? Please answer the question concisely.",
});

console.log(text);
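The result object carries more than just text. Fields such as finishReason and usage can be handy for logging and cost tracking; here's a minimal sketch (the exact shape of the usage object varies between SDK versions, so treat the comments as illustrative):

const result = await generateText({
  model,
  prompt: "What's ECMAScript? Please answer the question concisely.",
});

console.log(result.text); // the generated answer
console.log(result.finishReason); // e.g. "stop" for a normal completion
console.log(result.usage); // token counts for the call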
streamText
While generateText returns the complete response in one go, the inherent computational and transmission latency of LLMs means users still wait between submitting a query and seeing the full response, which can degrade the user experience. streamText addresses this with real-time, token-by-token streaming.
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

dotenvSafe.config();

const model = google("gemini-2.0-flash-001");

const { textStream } = streamText({
  model,
  prompt: "What is ECMAScript?",
  // When the AI must follow a specific behavior no matter what prompt it
  // receives, use `system`. It works with both generateText and streamText.
  system:
    "You are a JavaScript expert. Please answer the user's question concisely.",
});

for await (const text of textStream) {
  process.stdout.write(text);
}
streamText returns results in chunks, allowing the frontend to render partial text immediately and significantly improving perceived responsiveness.
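If you're serving a web frontend rather than a terminal, the stream result can be turned directly into an HTTP streaming response via its toTextStreamResponse helper. A minimal sketch; the POST route shape is a hypothetical (e.g. a Next.js route handler), not part of the CLI we build below:

import { google } from "@ai-sdk/google";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const result = streamText({
    model: google("gemini-2.0-flash-001"),
    prompt,
  });

  // Streams chunks to the client as they are generated instead of
  // buffering the whole answer.
  return result.toTextStreamResponse();
}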
System Prompts
When you need the AI to follow specific behaviors, system prompts provide a way to predefine instructions. We've already seen how to include prompts in the system field, but you can also place them in the messages array.
const { textStream } = streamText({
  model,
  messages: [
    {
      role: "system",
      content:
        "You are a JavaScript expert. Please answer the user's question concisely.",
    },
    { role: "user", content: "What is ECMAScript?" },
  ],
});
CLI Chat Interface
Let's outline the main commands for our CLI chat interface:
- /history: Display your conversation history with the assistant
- /help: Show available commands
- /exit: Exit the chat
The start method in the Chat class continuously reads user input. If the input starts with /, it calls handleCommand; otherwise, it calls streamResponse. This implementation uses the AI SDK as described earlier, with the addition of maintaining the chat history in this.messages.
// main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText, type CoreMessage } from "ai";
import * as readline from "readline";

dotenvSafe.config();

const gemini = google("gemini-2.0-flash-001");

class Chat {
  rl: readline.Interface;
  messages: CoreMessage[] = [
    /** The system prompt can also be placed as the first message in the history. */
    {
      role: "system",
      content:
        "You are a helpful assistant. Please answer the user's questions concisely and helpfully.",
    },
  ];

  constructor() {
    this.rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });
  }

  async streamResponse(prompt: string) {
    this.messages.push({ role: "user", content: prompt });
    try {
      const { textStream } = streamText({
        model: gemini,
        messages: this.messages,
      });
      let assistantResponse = "";
      process.stdout.write(`\nAssistant: `);
      for await (const text of textStream) {
        process.stdout.write(text);
        assistantResponse += text;
      }
      this.messages.push({ role: "assistant", content: assistantResponse });
      console.log("\n");
    } catch (err) {
      console.error(err);
    }
  }

  async handleCommand(input: string) {
    const command = input.slice(1).toLowerCase();
    switch (command) {
      case "help":
        console.log(`Commands:
/help - Show this help
/history - Show conversation history
/exit - Exit chat
Just type your message to chat!`);
        return false;
      case "history":
        console.log(JSON.stringify(this.messages, null, 2));
        return false;
      case "exit":
        console.log("See ya!");
        return true;
      default:
        console.log(
          `Unknown command: ${command}\nType /help for available commands\n`,
        );
        return false;
    }
  }

  async start() {
    while (true) {
      try {
        const input = await new Promise<string>((res) =>
          this.rl.question("You: ", res),
        );
        if (!input.trim()) continue;
        if (input.startsWith("/")) {
          const shouldExit = await this.handleCommand(input);
          if (shouldExit) break;
          continue;
        }
        await this.streamResponse(input);
      } catch (error) {
        console.error(`Error: ${error}`);
        break;
      }
    }
    this.rl.close();
  }
}

async function main() {
  console.log("Type /help for commands\n");
  const chat = new Chat();
  await chat.start();
}

process.on("SIGINT", () => {
  console.log("\nGoodbye!");
  process.exit(0);
});

main().catch(console.error);
You can now interact with the CLI chat interface using pnpm dev:
> pnpm dev

> [email protected] dev /Users/jinghuangsu/workspace/ai/ai-sdk
> tsx src/main.ts

Type /help for commands

You: hi

Assistant: Hi there! How can I help you today?

You: /history
[
  {
    "role": "user",
    "content": "hi"
  },
  {
    "role": "assistant",
    "content": "Hi there! How can I help you today?\n"
  }
]
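As one possible extension (not among the original commands), a hypothetical /clear case in handleCommand could reset the conversation while preserving the system prompt:

case "clear":
  // Keep only the leading system message and drop the rest of the history.
  this.messages = this.messages.filter((m) => m.role === "system");
  console.log("History cleared.\n");
  return false;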
Advanced Features
Let's explore additional APIs. Feel free to extend your Chat CLI with these features as you read along.
generateObject
Using Zod with generateObject ensures the AI returns data in a predefined object structure. For example, when asking an AI for a recipe, receiving a continuous block of text is much harder to parse and utilize compared to structured data.
This is where generateObject shines: it transforms unstructured AI output into structured data.
import { google } from "@ai-sdk/google";
import { generateObject } from "ai";
import { z } from "zod";

const model = google("gemini-2.0-flash-001");

const { object } = await generateObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});

console.log(JSON.stringify(object, null, 2));
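One handy refinement: Zod's .describe() attaches per-field guidance that the SDK forwards along with the schema, which can nudge the model toward better-formed values. A small sketch of the same recipe schema with descriptions (optional, not required for the example above):

import { z } from "zod";

const recipeSchema = z.object({
  recipe: z.object({
    name: z.string().describe("The dish name"),
    ingredients: z
      .array(z.string())
      .describe("Each entry should include a quantity, e.g. '200g tofu'"),
    steps: z.array(z.string()).describe("Short, beginner-friendly steps"),
  }),
});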
You can also use this API to generate specific enum values. For instance, in an era of information overload, if you want an AI to classify an article's sentiment as positive, negative, or neutral, you can predefine this in generateObject.
import { google } from "@ai-sdk/google";
import { generateObject } from "ai";

const model = google("gemini-2.0-flash-001");

const article = "Today is a good day";

const { object } = await generateObject({
  model,
  output: "enum",
  enum: ["positive", "negative", "neutral"],
  system: `You are a professional sentiment analyst. Your task is to determine the overall emotional tone of the article content provided by the user. You must select ONLY the most fitting result from 'positive', 'negative', or 'neutral' as your output, and must NOT include any explanation or additional text.`,
  prompt: article,
});

// "positive"
console.log(object);
streamObject
streamObject addresses the same problem as streamText but for structured data. Here's how you would rewrite the generateObject example using streamObject:
import { google } from "@ai-sdk/google";
import { streamObject } from "ai";
import { z } from "zod";

const model = google("gemini-2.0-flash-001");

const result = streamObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});

for await (const chunk of result.partialObjectStream) {
  console.clear();
  console.dir(chunk, { depth: null });
}

const finalObject = await result.object;
console.log(finalObject);
Non-Text Inputs
Modern LLMs are multimodal, and the AI SDK supports non-text inputs. For example, to ask an AI to describe an image, you can use any of the previously mentioned APIs with slight modifications to the input parameters.
const imgURL =
  "https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png";

const { text } = await generateText({
  model: gemini,
  system: `You will receive an image. Please describe it concisely, ensuring the output length is within 100 words.`,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          image: new URL(imgURL),
        },
      ],
    },
  ],
});

console.log(text);

// You: https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png
// Assistant: The image illustrates the concepts of "Debounce" and "Throttle"
// using simple diagrams. "Debounce" is represented by a series of close arrows
// merging into a single arrow pointing towards a stick figure. "Throttle" shows
// three vertical arrows stemming from a horizontal line and then flowing into a
// single arrow directed at a stick figure. The stick figures are essentially the
// same, each pointing a finger toward the arrows. The background is a plain,
// light color, ensuring the focus remains on the diagrams and text.
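Besides URLs, the image field also accepts raw bytes (a Buffer, Uint8Array, or base64-encoded string), which is handy when the image isn't publicly hosted. A minimal sketch, assuming a local ./og.png and the gemini model from the chat example:

import { readFileSync } from "node:fs";

const { text } = await generateText({
  model: gemini,
  system: `You will receive an image. Please describe it concisely, ensuring the output length is within 100 words.`,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          // Raw bytes from disk work as well as a URL.
          image: readFileSync("./og.png"),
        },
      ],
    },
  ],
});

console.log(text);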
Conclusion
This article provided a basic introduction to the AI SDK's core APIs. I highly recommend checking out the AI SDK official website for more detailed and comprehensive documentation!