AI chat is quickly becoming a standard feature in modern SaaS applications. Whether you're building a customer support tool, a coding assistant, or a content generation platform, users expect real-time, conversational AI interfaces.
But integrating AI chat into a Next.js app involves more than just calling an API. You need streaming responses, authentication, rate limiting, conversation persistence, and proper error handling. Getting all of these pieces working together is where most tutorials fall short.
This guide walks through every layer of a production-ready AI chat integration in Next.js, from the API route to the frontend UI.
This guide uses Claude (Anthropic's API) for the examples, but the architecture applies to any LLM provider.
The core of your AI chat lives in a Next.js API route. This handles receiving the user's message, calling the LLM, and returning the response.
```typescript
// src/app/api/ai/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextRequest, NextResponse } from "next/server";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

export async function POST(req: NextRequest) {
  const { prompt } = await req.json();

  const message = await anthropic.messages.create({
    model: "claude-sonnet-4-5-20250929",
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });

  const text =
    message.content[0].type === "text" ? message.content[0].text : "";

  return NextResponse.json({ response: text });
}
```
This works for prototyping, but it has serious problems in production: no authentication, no rate limiting, no input validation, no streaming (users stare at a blank screen while the full response generates), and nothing is persisted.
Here's what a production-ready version looks like. This is significantly more complex:
```typescript
// src/app/api/ai/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextRequest } from "next/server";
import { auth } from "@/lib/auth";
import { prisma } from "@/lib/prisma";
import { rateLimit } from "@/lib/rate-limit";
// Illustrative path — implement this helper wherever you track usage.
import { getDailyMessageCount } from "@/lib/usage";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

const MAX_PROMPT_LENGTH = 4000;

export async function POST(req: NextRequest) {
  // 1. Authenticate the user
  const session = await auth();
  if (!session?.user?.id) {
    return new Response("Unauthorized", { status: 401 });
  }

  // 2. Rate limit per user
  const { success } = await rateLimit(session.user.id);
  if (!success) {
    return new Response("Rate limited", { status: 429 });
  }

  // 3. Validate input
  const { prompt, systemPrompt } = await req.json();
  if (!prompt || typeof prompt !== "string") {
    return new Response("Invalid prompt", { status: 400 });
  }
  if (prompt.length > MAX_PROMPT_LENGTH) {
    return new Response("Prompt too long", { status: 400 });
  }

  // 4. Check subscription tier for daily limits
  const user = await prisma.user.findUnique({
    where: { id: session.user.id },
    select: { stripePriceId: true },
  });
  const dailyLimit = user?.stripePriceId ? 1000 : 50;
  const todayCount = await getDailyMessageCount(session.user.id);
  if (todayCount >= dailyLimit) {
    return new Response("Daily limit reached", { status: 429 });
  }

  // 5. Stream the response
  const stream = anthropic.messages.stream({
    model: "claude-sonnet-4-5-20250929",
    max_tokens: 1024,
    system: systemPrompt || "You are a helpful assistant.",
    messages: [{ role: "user", content: prompt }],
  });

  // 6. Return as Server-Sent Events
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      let fullText = "";
      try {
        for await (const event of stream) {
          if (
            event.type === "content_block_delta" &&
            event.delta.type === "text_delta"
          ) {
            fullText += event.delta.text;
            controller.enqueue(
              encoder.encode(
                `data: ${JSON.stringify({ text: event.delta.text })}\n\n`
              )
            );
          }
        }

        // 7. Save both sides of the exchange after completion
        await prisma.message.create({
          data: { userId: session.user.id, role: "user", content: prompt },
        });
        await prisma.message.create({
          data: { userId: session.user.id, role: "assistant", content: fullText },
        });

        controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        controller.close();
      } catch (err) {
        // Without this, a mid-stream failure leaves the client hanging.
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```
That's roughly 100 lines of code for the API route alone. And we haven't touched the frontend yet.
Rate limiting prevents a single user from draining your API budget. You need two layers: a short-window limit to stop bursts, and a daily cap tied to the user's subscription tier.
For development, an in-memory rate limiter works. For production, you need Redis or a database-backed counter.
```typescript
// src/lib/rate-limit.ts
const rateLimitMap = new Map<string, { count: number; resetTime: number }>();

export async function rateLimit(userId: string) {
  const now = Date.now();
  const windowMs = 60_000; // 1 minute window
  const maxRequests = 10;

  const entry = rateLimitMap.get(userId);
  if (!entry || now > entry.resetTime) {
    rateLimitMap.set(userId, { count: 1, resetTime: now + windowMs });
    return { success: true };
  }
  if (entry.count >= maxRequests) {
    return { success: false };
  }
  entry.count++;
  return { success: true };
}
```
This in-memory version resets on deployment. For production, use Upstash Redis or track request counts in your PostgreSQL database.
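The two layers can also live in one helper. Here is a minimal sketch of that idea — still in-memory, with an injectable clock so it can be unit-tested; the function and option names are illustrative, not from any library:

```typescript
type LimitResult = { success: boolean; reason?: "burst" | "daily" };

export function createRateLimiter(
  opts = { windowMs: 60_000, maxPerWindow: 10, maxPerDay: 50 },
  now: () => number = Date.now
) {
  const windows = new Map<string, { count: number; resetTime: number }>();
  const days = new Map<string, { count: number; day: string }>();

  return function limit(userId: string): LimitResult {
    const t = now();
    const day = new Date(t).toISOString().slice(0, 10); // calendar day, UTC

    // Layer 1: short window to stop bursts
    const w = windows.get(userId);
    if (!w || t > w.resetTime) {
      windows.set(userId, { count: 1, resetTime: t + opts.windowMs });
    } else if (w.count >= opts.maxPerWindow) {
      return { success: false, reason: "burst" };
    } else {
      w.count++;
    }

    // Layer 2: daily cap (resets when the calendar day changes)
    const d = days.get(userId);
    if (!d || d.day !== day) {
      days.set(userId, { count: 1, day });
    } else if (d.count >= opts.maxPerDay) {
      return { success: false, reason: "daily" };
    } else {
      d.count++;
    }

    return { success: true };
  };
}
```

In production you would back both maps with Redis or your database, but the layering logic stays the same.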
Building a good chat interface is harder than it looks. You need streaming text that renders token by token, auto-scroll that follows new messages, an input that locks while a response is in flight, and visible error states.
```typescript
// src/components/Chat.tsx
"use client";
import { useState, useRef, useEffect } from "react";

type Message = {
  role: "user" | "assistant" | "error";
  content: string;
};

export default function Chat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);
  const scrollRef = useRef<HTMLDivElement>(null);

  async function sendMessage() {
    if (!input.trim() || isStreaming) return;

    const userMsg: Message = { role: "user", content: input };
    setMessages((prev) => [...prev, userMsg]);
    setInput("");
    setIsStreaming(true);

    try {
      const res = await fetch("/api/ai", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt: input }),
      });
      if (!res.ok) throw new Error("API error");

      // Handle SSE streaming
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let aiText = "";
      let buffer = "";

      setMessages((prev) => [...prev, { role: "assistant", content: "" }]);

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // A chunk can end mid-line, so buffer partial lines between reads.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const data = line.slice(6);
          if (data === "[DONE]") continue;

          const parsed = JSON.parse(data);
          aiText += parsed.text;
          setMessages((prev) => {
            const updated = [...prev];
            updated[updated.length - 1] = { role: "assistant", content: aiText };
            return updated;
          });
        }
      }
    } catch (err) {
      setMessages((prev) => [
        ...prev,
        { role: "error", content: "Something went wrong. Try again." },
      ]);
    } finally {
      setIsStreaming(false);
    }
  }

  // Auto-scroll to bottom on new messages
  useEffect(() => {
    scrollRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [messages]);

  return (
    <div className="flex flex-col h-full">
      {/* Messages */}
      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={`flex ${
              msg.role === "user" ? "justify-end" : "justify-start"
            }`}
          >
            <div
              className={`max-w-[75%] rounded-2xl px-4 py-2 ${
                msg.role === "user"
                  ? "bg-black text-white"
                  : "bg-gray-100 text-gray-800"
              }`}
            >
              {msg.content}
            </div>
          </div>
        ))}
        <div ref={scrollRef} />
      </div>

      {/* Input */}
      <form
        onSubmit={(e) => {
          e.preventDefault();
          sendMessage();
        }}
      >
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask anything..."
          disabled={isStreaming}
        />
        <button type="submit" disabled={isStreaming}>
          Send
        </button>
      </form>
    </div>
  );
}
```
This is a simplified version. A production chat UI also needs markdown rendering, code syntax highlighting, copy buttons, conversation history loading, and responsive design adjustments. Expect to spend 2-3 days getting the UX right.
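One detail worth isolating: `reader.read()` gives no guarantee that a chunk ends on a line boundary, so a `data:` payload can arrive split across two reads. The buffering can be factored into a small pure function (illustrative, not from any library), which also makes it easy to test:

```typescript
// Feed raw chunks in; get back complete SSE data payloads plus any
// leftover partial line to carry into the next call.
export function parseSSEChunk(
  buffer: string,
  chunk: string
): { events: string[]; buffer: string } {
  const combined = buffer + chunk;
  const lines = combined.split("\n");
  // The last element may be an incomplete line; keep it for the next chunk.
  const rest = lines.pop() ?? "";

  const events: string[] = [];
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      events.push(line.slice(6));
    }
  }
  return { events, buffer: rest };
}
```

Parsing `JSON.parse(data)` directly on each chunk without this buffering works in local testing (chunks are small) and then fails intermittently in production, which makes it a painful bug to track down.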
Users expect their conversations to be saved. This requires database models for conversations and messages:
```prisma
// prisma/schema.prisma
model Conversation {
  id        String    @id @default(cuid())
  title     String?
  userId    String
  user      User      @relation(fields: [userId], references: [id])
  messages  Message[]
  createdAt DateTime  @default(now())
  updatedAt DateTime  @updatedAt
}

model Message {
  id             String        @id @default(cuid())
  role           String        // "user" | "assistant" | "system"
  content        String        @db.Text
  userId         String        // denormalized so the API route can write before a conversation exists
  conversationId String?
  conversation   Conversation? @relation(fields: [conversationId], references: [id])
  tokenCount     Int?
  createdAt      DateTime      @default(now())
}
```
You also need API routes to list conversations, load a specific conversation's messages, and create new conversations. Each of these needs authentication checks.
Many AI SaaS products offer different "modes" or personas — a code expert, a writing assistant, a business advisor. This is implemented through system prompts:
```typescript
const personas = {
  "code-expert": {
    name: "Code Expert",
    systemPrompt: "You are an expert software engineer...",
  },
  "writer": {
    name: "Creative Writer",
    systemPrompt: "You are a creative writing assistant...",
  },
  "advisor": {
    name: "Business Advisor",
    systemPrompt: "You are a startup business advisor...",
  },
};
```
The tricky part is letting users switch personas mid-conversation while keeping conversation context intact, and sanitizing the system prompt to prevent prompt injection from user input.
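One way to handle both concerns is to keep the persona strictly in the API's `system` parameter and never concatenate user input into it: switching personas then just swaps the system prompt while `messages` carries the full history. A sketch under those assumptions (the helper name and prompt strings are illustrative):

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Illustrative persona map; prompts shortened for the example.
const personaMap: Record<string, { name: string; systemPrompt: string }> = {
  "code-expert": {
    name: "Code Expert",
    systemPrompt: "You are an expert software engineer.",
  },
  "writer": {
    name: "Creative Writer",
    systemPrompt: "You are a creative writing assistant.",
  },
};

// Build the request payload for a given persona. Conversation context
// survives persona switches because the persona lives only in `system`.
export function buildRequest(
  personaId: string,
  history: ChatMessage[],
  newPrompt: string
) {
  const persona = personaMap[personaId];
  // Reject unknown IDs instead of interpolating them — this is what keeps
  // user-supplied strings out of the system prompt entirely.
  if (!persona) throw new Error(`Unknown persona: ${personaId}`);

  return {
    system: persona.systemPrompt,
    messages: [...history, { role: "user" as const, content: newPrompt }],
  };
}
```

Validating the persona ID against a server-side allowlist, rather than accepting a free-form system prompt from the client, is the simplest defense against prompt injection through this path.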
If you're offering tiered access, you need to track how many tokens each user consumes. The Anthropic API returns token counts in the response:
```typescript
const response = await anthropic.messages.create({ ... });

// response.usage contains:
// { input_tokens: 42, output_tokens: 156 }

await prisma.user.update({
  where: { id: userId },
  data: {
    totalTokensUsed: {
      increment: response.usage.input_tokens + response.usage.output_tokens,
    },
  },
});
```
This is straightforward with non-streaming responses. With streaming, you need to count tokens from the final message_delta event, which requires careful event handling.
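In Anthropic's streaming protocol, input tokens arrive on the `message_start` event and each `message_delta` carries the cumulative output-token count, so the last delta wins. The bookkeeping can be folded into a small reducer (the helper itself is illustrative, shown here against mocked event objects):

```typescript
type Usage = { input_tokens: number; output_tokens: number };

// Accumulates token usage from a sequence of Anthropic streaming events.
// `message_start` carries input_tokens; each `message_delta` carries the
// running output_tokens total, so later values overwrite earlier ones.
export function accumulateUsage(events: any[]): Usage {
  const usage: Usage = { input_tokens: 0, output_tokens: 0 };
  for (const event of events) {
    if (event.type === "message_start") {
      usage.input_tokens = event.message.usage.input_tokens;
    } else if (event.type === "message_delta") {
      usage.output_tokens = event.usage.output_tokens;
    }
  }
  return usage;
}
```

In the streaming route shown earlier, you would track these two events inside the `for await` loop alongside `content_block_delta` and write the totals to the database after the stream completes.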
Adding production-ready AI chat to a Next.js app involves:
| Component | Estimated Time |
|---|---|
| API Route (auth, streaming, validation) | 1-2 days |
| Rate Limiting (per-minute + per-day) | 2-3 hours |
| Chat UI (messages, streaming, responsive) | 2-3 days |
| Conversation Persistence (DB + API routes) | 3-4 hours |
| Multi-Persona System | 2-3 hours |
| Token Tracking | 2-3 hours |
| Error Handling + Edge Cases | 1-2 days |
| Total | 5-10 days |
That's 5-10 days of focused development, assuming you already have authentication and a database set up. If you're starting from scratch, add another week for the foundational infrastructure.
If you'd rather skip 2-3 weeks of integration work and start building your actual product features, LaunchFast includes everything described in this guide — already built, tested, and deployed.
What you get out of the box: the streaming API route, two-layer rate limiting, the chat UI, conversation persistence, the multi-persona system, and token tracking, already wired together. One command to clone, five minutes to configure. Deploy to Vercel and start building your product.
LaunchFast Standard ($59) — Everything you need to launch an AI-powered SaaS.
LaunchFast Pro ($89) — Standard plus priority support and advanced examples.
Live demo: launchfast-starter.vercel.app | Source: GitHub
30-day money-back guarantee. If LaunchFast doesn't save you time, get a full refund.