Firething Insights

Building a Modern RAG Application with Next.js, Vercel AI SDK and LangChain

by Edgar Kang

First published on January 3, 2026

Last updated on January 11, 2026

Tags: RAG · Next.js · Vercel AI SDK · Vercel AI Gateway · LangChain · Prisma · PostgreSQL · pgvector · Gemini

Retrieval-Augmented Generation (RAG) has become the gold standard for building AI applications that can “talk” to your data. In this post, I’ll walk you through how we can build a simplified, high-performance chat application that handles document uploads, vector searches, and grounded AI responses.

🚀 Live Demo  | 💻 GitHub Repository 


Table of Contents

  1. The Tech Stack
  2. Architecture Overview
  3. Chat & File Management
  4. Database Design
  5. Deep Dive into Implementation
  6. Frontend: Responsive 3-Column UI
  7. Usage Limits via Vercel AI Gateway API
  8. Conclusion

The Tech Stack

To build a production-grade RAG app, we chose a modern, type-safe stack:

  • App Framework: Next.js  (App Router)
  • AI Orchestration: LangChain.js  and Vercel AI SDK 
  • Database: Prisma  for PostgreSQL  with the pgvector  extension
  • LLM & Embeddings: Google Gemini  via Vercel AI Gateway 
  • Styling: Tailwind CSS  & Shadcn UI 

Architecture Overview

The application follows a classic RAG pipeline but with advanced agentic twists. It separates the “Write Path” (Ingestion) from the “Read Path” (Retrieval & Generation).

Overall System Architecture

The following diagram illustrates how the frontend, backend actions, AI Gateway, and storage layers interact:

RAG Principle & Procedure

The following sequence diagram illustrates the lifecycle of a document and a user query within the system, showing the interaction between the client, server, AI models, and vector database:


Chat & File Management

Managing the lifecycle of chats and documents requires a robust orchestration layer that handles both metadata management and physical storage.

Vercel AI Gateway as Universal AI Engine

One of the key technical simplifications in this project is the use of the Vercel AI Gateway. Instead of managing separate providers and API keys for chat and embedding models, we use a single entry point that gives us:

  • Simplified Configuration: a single BASE_URL and API_KEY for all AI needs.
  • Unified Observability: track credits, usage, and performance for both chatting and embeddings in one dashboard.
  • Model Flexibility: easily swap underlying models (e.g., from Gemini to OpenAI) without changing the core application logic.
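To make the "single entry point" idea concrete, here is a minimal sketch in TypeScript. The endpoint paths, placeholder URL, key, and model identifiers are illustrative assumptions, not the project's actual configuration:

```typescript
// A minimal sketch of the single-entry-point idea: one base URL and one key
// serve every AI capability. (URL, key, and model names are placeholders.)
const GATEWAY_BASE_URL = 'https://example-gateway.local/v1'; // e.g. from process.env
const GATEWAY_API_KEY = 'dummy-key';                          // e.g. from process.env

// One helper builds authenticated requests for every AI capability.
function gatewayRequest(path: string, body: unknown) {
  return {
    url: `${GATEWAY_BASE_URL}${path}`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${GATEWAY_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  };
}

// Chat and embeddings share the same base URL and credentials:
const chatReq = gatewayRequest('/chat/completions', {
  model: 'google/gemini-2.5-flash', // assumed model identifier
  messages: [],
});
const embedReq = gatewayRequest('/embeddings', {
  model: 'google/text-embedding-004', // assumed model identifier
  input: 'hello',
});
```

Swapping providers then means changing a model string, not rewiring credentials or clients.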

Global Documents & Chat References

The system treats every uploaded file as a global document. When a file is uploaded, it is assigned a unique ID, and its text is extracted and split into chunks.

  • Global Availability: Once a document is in the system, it exists independently of any singular chat.
  • Chat-Specific References: A join table (ChatDocument) allows specific chats to “reference” these global documents. This means the RAG agent only searches through documents that have been explicitly added to the current conversation’s context, ensuring high relevance and reduced “noise”.

Physical Files & Embedding Maintenance

The application maintains a dual-storage strategy for documents:

  • Physical Binary Storage: The actual files (PDFs, TXT, etc.) are stored in Vercel Blob. This ensures the files are globally accessible via URL and can be previewed or downloaded by the user.
  • Semantic Storage: The text content is chunked and turned into vector embeddings, which are stored in the embeddings table in PostgreSQL. These embeddings are what enable the “semantic” search capability, allowing the agent to find information based on meaning rather than just keyword matches.

Database Design

We use Prisma with the pgvector extension to handle both relational metadata and high-dimensional vector embeddings in a single database.

Why Prisma ORM?

We chose Prisma to manage our database for several reasons:

  • Type Safety: Prisma generates a TypeScript client that matches our schema, preventing runtime errors when querying chats or documents.
  • Schema Migrations: Managing the complex relationships between chats, messages, and documents is effortless with Prisma’s declarative schema.
  • Vectorization Support: Prisma makes it easy to work with PostgreSQL extensions such as pgvector for similarity searches, allowing us to combine relational queries with vector searches in the same codebase.

ER Diagram

Main Entities

| Entity | Table Name | Description |
|---|---|---|
| Chat | chats | The root of a conversation session. |
| Message | messages | Individual turns in a conversation (user vs. assistant). |
| Document | documents | Physical files uploaded by the user. |
| ChatDocument | chat_documents | A many-to-many join table allowing documents to be reused across different chat contexts. |
| Embedding | embeddings | High-dimensional representations of document chunks, enabling semantic search. |
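The document-related entities above might map to a Prisma schema roughly along these lines. This is a hedged sketch (Chat and Message are omitted, and field names and the vector dimension are assumptions, not the project's actual schema):

```prisma
model Document {
  id         String         @id @default(uuid())
  name       String
  url        String         // Vercel Blob URL of the physical file
  chats      ChatDocument[]
  embeddings Embedding[]

  @@map("documents")
}

model ChatDocument {
  chatId     String
  documentId String
  chat       Chat     @relation(fields: [chatId], references: [id])
  document   Document @relation(fields: [documentId], references: [id])

  @@id([chatId, documentId])
  @@map("chat_documents")
}

model Embedding {
  id         String                      @id @default(uuid())
  content    String                      // the original text chunk
  vector     Unsupported("vector(768)")? // pgvector column; dimension is an assumption
  documentId String
  document   Document @relation(fields: [documentId], references: [id])

  @@map("embeddings")
}
```

Prisma has no native vector type, so the pgvector column is declared via Unsupported(...) and queried through raw SQL or the LangChain integration.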

Deep Dive into Implementation

Document Processing

Handling different file types (e.g., PDF, TXT, etc.) efficiently is crucial. In lib/document-processor.ts, we use a unified interface for processing:

import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf';

export async function extractTextFromFile(file: File): Promise<string> {
  // PDFs go through LangChain's loader; a File is a Blob, which PDFLoader accepts.
  if (file.type === 'application/pdf' || file.name.endsWith('.pdf')) {
    const loader = new PDFLoader(file);
    const docs = await loader.load();
    // The loader returns one Document per page; join the page contents.
    return docs.map((doc) => doc.pageContent).join('\n');
  }
  // Plain-text formats (TXT, MD, etc.) can be read directly.
  return await file.text();
}

We then use RecursiveCharacterTextSplitter to ensure chunks don’t cut off in the middle of important sentences:

import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // target characters per chunk
  chunkOverlap: 200, // characters shared between neighboring chunks
  separators: ['\n\n', '\n', '. ', ' ', ''], // prefer paragraph, then sentence breaks
});
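To build intuition for what chunkSize and chunkOverlap do, here is a toy fixed-size splitter in plain TypeScript. This is an illustration only; the real RecursiveCharacterTextSplitter additionally respects the separator hierarchy so chunks break at natural boundaries:

```typescript
// Toy illustration of chunking with overlap (not LangChain's actual algorithm).
function naiveSplit(text: string, chunkSize: number, chunkOverlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // each chunk starts this far after the previous one
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// 25 characters, chunks of 10 with 4 characters of overlap:
const text = '0123456789'.repeat(3).slice(0, 25);
const chunks = naiveSplit(text, 10, 4);
// The tail of each chunk reappears at the head of the next one, so a sentence
// straddling a boundary is still fully contained in at least one chunk.
```

The overlap is what preserves context across chunk boundaries, at the cost of storing some text twice.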

Vector Storage with Prisma

Instead of a separate vector database, we used pgvector directly within PostgreSQL. This allows us to keep our relational data (Chats, Messages) and vector data in one place.

Using the PrismaVectorStore integration from LangChain, we can perform similarity searches with a familiar API:

// Restrict the search to documents attached to the current chat
const results = await vectorStore.similaritySearch(query, topK, { documentId: { in: docIds } });
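Under the hood, a similarity search ranks stored chunks by vector distance against the query embedding. A toy in-memory version of that ranking in plain TypeScript, using cosine similarity (pgvector performs the equivalent inside PostgreSQL with an indexed distance operator; names and data here are illustrative):

```typescript
// Toy in-memory similarity search: rank chunks by cosine similarity to a query vector.
type Chunk = { documentId: string; content: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function similaritySearch(query: number[], chunks: Chunk[], topK: number, docIds: string[]): Chunk[] {
  return chunks
    .filter((c) => docIds.includes(c.documentId)) // honor the chat's document filter
    .map((c) => ({ chunk: c, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)            // highest similarity first
    .slice(0, topK)
    .map((r) => r.chunk);
}

const chunks: Chunk[] = [
  { documentId: 'doc1', content: 'cats', vector: [1, 0] },
  { documentId: 'doc1', content: 'dogs', vector: [0.9, 0.1] },
  { documentId: 'doc2', content: 'tax law', vector: [0, 1] },
];
const hits = similaritySearch([1, 0], chunks, 2, ['doc1']);
```

The documentId filter is what implements the "chat-specific references" described earlier: chunks from unattached documents never enter the ranking.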

The RAG Agent & Middleware

We used LangChain's agent middleware pattern to build a modular RAG agent. This approach separates the concerns of history management, query transformation, and retrieval.

Middleware Chain Sequence Diagram

Key Middlewares

| Middleware | Responsibility |
|---|---|
| Summarization (built-in middleware) | Automatically compresses long histories to stay within the model's context window. |
| Query Transformation (custom middleware) | Rewrites ambiguous or context-dependent user prompts into standalone search queries based on history. |
| Retrieval (wrapped by dynamic-prompting middleware) | Injects the latest retrieved context directly into the system instructions before the model generates its response. |
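Stripped of LangChain specifics, a middleware chain is just a pipeline of transforms over the request state. A minimal sketch of that composition idea (all names are illustrative, not LangChain's API; the real middlewares are async):

```typescript
// Minimal sketch of a middleware pipeline over a chat request (illustrative only).
type AgentState = { history: string[]; query: string; context: string };
type Middleware = (state: AgentState) => AgentState;

const summarize: Middleware = (s) =>
  // Keep only the last few turns to stay inside the context window.
  s.history.length > 4
    ? { ...s, history: ['[summary of earlier turns]', ...s.history.slice(-2)] }
    : s;

const transformQuery: Middleware = (s) =>
  // Rewrite a follow-up like "what about that?" into a standalone query.
  s.query.includes('that')
    ? { ...s, query: `${s.query} (re: ${s.history[s.history.length - 1]})` }
    : s;

const retrieve: Middleware = (s) =>
  // In the real app this step runs the pgvector similarity search.
  ({ ...s, context: `retrieved chunks for: ${s.query}` });

function runChain(state: AgentState, chain: Middleware[]): AgentState {
  for (const mw of chain) state = mw(state);
  return state;
}

const result = runChain(
  { history: ['a', 'b', 'c', 'd', 'e'], query: 'what about that?', context: '' },
  [summarize, transformQuery, retrieve],
);
```

The ordering matters: summarization runs first so query transformation sees a bounded history, and retrieval runs last so it searches with the rewritten, standalone query.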

Frontend: Responsive 3-Column UI

The UI is built for productivity using a responsive 3-column layout.

  • Left Column: chat history list
  • Center Column: main chat interface
  • Right Column: global knowledge base (file list)

Inter-Component Reactivity

The three columns aren’t isolated; their components respond to each other’s changes via a combination of React state and global events:

  • Header updates: When the agent generates a title in the center column, the currentTitle state updates the header, and a chat-updated event notifies the Left Sidebar to refresh.
  • File uploads: When a file is uploaded in the center chat UI, it triggers a knowledge-base-updated event so the new file appears immediately in the right sidebar (knowledge base).
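The event wiring itself is just the standard EventTarget API. A self-contained sketch, using a bare EventTarget so it also runs outside the browser (in the app, the target would be window and the event names are the ones above):

```typescript
// Sketch of the cross-column event wiring; in the browser, `bus` would be `window`.
const bus = new EventTarget();

// Right sidebar: refresh the file list whenever the knowledge base changes.
let refreshCount = 0;
bus.addEventListener('knowledge-base-updated', () => {
  refreshCount++; // in the real component this re-fetches the document list
});

// Center column: after a successful upload, notify everyone who cares.
function onUploadComplete() {
  bus.dispatchEvent(new Event('knowledge-base-updated'));
}

onUploadComplete();
```

Because the columns communicate through named events rather than shared props, they stay decoupled: a new listener can react to uploads without touching the upload component.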

Vercel AI Elements Integration

The main chat interface uses a set of specialized layout components (Vercel AI Elements) found in components/ai-elements/.

  • Conversation: Manages the scrollable container and message list.
  • Message: Handles the layout for individual user/assistant turns.
  • PromptInput: A sophisticated input area with support for multi-line text and file upload triggers.

Vercel AI SDK Integration

The essential part of the interaction between the chat UI and the backend is the useChat hook from the Vercel AI SDK, which handles the heavy lifting of:

  • Streaming: Automatically parsing and updating the message list as the AI streams its response.
  • Input Persistence: Managing the current prompt state.
  • Lifecycle Callbacks: Invoking onFinish to persist AI responses to the database and trigger automatic titling.

import { useChat } from '@ai-sdk/react';

const { messages, sendMessage, status } = useChat({
  id: chatId,
  onFinish: async ({ message }) => {
    // Handle the AI response: save the message to the DB and generate a title
  },
});

Usage Limits via Vercel AI Gateway API

While this application is a demo, deploying it to a public platform like Vercel requires careful attention to usage limits to manage costs and prevent abuse. We integrated these safeguards by leveraging the Vercel AI Gateway’s balance API.

Real-time Credit Monitoring

The backend interacts with the AI Gateway’s /credits endpoint to retrieve the current account balance. This ensures that the application is always aware of the available resources before initiating expensive operations.

// lib/actions/usage.actions.ts
const response = await fetch(`${gatewayBaseURL}/credits`, {
  headers: { Authorization: `Bearer ${gatewayApiKey}` },
});
const { balance } = await response.json();

Proactive UI Enforcement

We use a UsageProvider context to propagate the limit status across the entire application. When the credit balance falls below a safe threshold (e.g., 3 credits), the UI reactively disables input areas and upload buttons, providing immediate feedback to the user via tooltips.

// components/chat-box.tsx
const { isCreditLimitReached } = useUsage();
 
<PromptInputTextarea
  disabled={isCreditLimitReached}
  placeholder={isCreditLimitReached ? "Usage limit reached..." : "Type your message..."}
/>

Global Storage Safeguards

Beyond individual credits, we also monitor the total storage footprint in PostgreSQL. By tracking the size of all uploaded documents, we enforce a 200MB global limit, ensuring the database remains performant and cost-effective.
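The storage safeguard boils down to summing the sizes of stored documents and comparing against the cap before accepting an upload. A hedged sketch of that check (function and field names are assumptions; only the 200MB figure comes from the project):

```typescript
// Hedged sketch of the global storage check (names are illustrative).
const GLOBAL_STORAGE_LIMIT_BYTES = 200 * 1024 * 1024; // the post's 200MB cap

type StoredDoc = { name: string; sizeBytes: number };

function wouldExceedLimit(existing: StoredDoc[], incomingSizeBytes: number): boolean {
  // Total footprint of everything already stored...
  const used = existing.reduce((sum, d) => sum + d.sizeBytes, 0);
  // ...plus the incoming file must stay under the global cap.
  return used + incomingSizeBytes > GLOBAL_STORAGE_LIMIT_BYTES;
}

// A 150MB library plus a 60MB upload would cross the 200MB line:
const docs: StoredDoc[] = [{ name: 'manual.pdf', sizeBytes: 150 * 1024 * 1024 }];
const blocked = wouldExceedLimit(docs, 60 * 1024 * 1024);
```

Running the check server-side before writing to Vercel Blob keeps the limit enforceable even if the client-side UI guard is bypassed.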

Moving Toward Production

While the current safeguards help manage resources, a true production application should also implement robust authentication and authorization. If you plan to expand this demo further, it is highly recommended to integrate a reliable auth framework such as Better Auth or Auth.js, which allows you to associate chats and documents with specific user accounts, enforce per-user quotas, and ensure that private data stays private.


Conclusion

Building a robust RAG application today is less about “writing the search” and more about orchestrating agents and workflows. The complexity has shifted from the algorithm of search to the orchestration of the conversation flow.

Throughout the development of this project, we leveraged modern AI-powered IDEs like Antigravity. By practicing vibe coding — describing high-level intent and letting the AI handle the boilerplate — we were able to move from a blank screen to a well-organized, pgvector-integrated application in record time. This shift in developer experience allows us to focus on the logic and user experience of the AI, rather than the minutiae of configuration.

Copyright © 2026 Firething.