Serverless RAG on AWS: Step-by-Step PDF Embedding with Lambda, Bedrock, FAISS & LangChain

A step-by-step walkthrough of building an automated PDF embedding application using Lambda, Bedrock, FAISS, and LangChain — with honest feedback and what I'd improve next time.

The Problem This Solves

If you've ever wanted to ask a document a question — not keyword search it, but genuinely query its meaning — you need a Retrieval-Augmented Generation (RAG) pipeline. This lab taught me how to build one from scratch on AWS using fully serverless, event-driven components.

Instead of fine-tuning a model, RAG lets you store your documents as vector embeddings and retrieve the semantically relevant chunks at query time. The AI never "learns" your document — it retrieves and reasons on the fly.

💡 "RAG is the practical alternative to fine-tuning — cheaper, faster to update, and far more flexible."

Architecture Overview

The application is fully event-driven and uses three Lambda functions, each with a single responsibility:

PDF Upload → S3 → ExtractMetadata Lambda → SQS Queue → GenerateEmbeddings Lambda → FAISS Index → S3
                                                                                          ↓
                                                                               GenerateResponse Lambda ← User Query

Function	Trigger	Responsibility	Config
`ExtractMetadata`	S3 ObjectCreated (`.pdf`)	Reads page count, stores metadata in DynamoDB, pushes job to SQS	Default
`GenerateEmbeddings`	SQS message (batch=1)	Downloads PDF, creates FAISS index via Bedrock, uploads index files back to S3	1024 MB · 180s
`GenerateResponse`	Manual / API invocation	Loads FAISS index, runs similarity search, returns the most relevant chunk	30s

AWS Services Used

Amazon S3: stores PDFs and FAISS index files
Amazon SQS: decouples metadata extraction from the heavy embedding job
Amazon DynamoDB: tracks document metadata and processing status
Amazon Bedrock: provides the amazon.titan-embed-text-v1 embedding model
AWS Lambda: serverless compute for all three functions
Amazon ECR: hosts container images for Lambda (avoids the 250 MB unzipped size limit on zip packages)
AWS SAM: infrastructure-as-code and deployment tooling

Step-by-Step Walkthrough

Step 1 — Configure the SAM CLI

The samconfig.toml file stores your deployment parameters so you don't repeat CLI flags on every deploy:

version=0.1
[default.global.parameters]
stack_name = "EmbeddingsStack"

[default.deploy.parameters]
region = "us-west-2"
s3_bucket = "deploy-assets-tkngbx"
s3_prefix = "ca-lab"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
tags = "project=\"ca-labs\" stage=\"development\""

Step 2 — Define Infrastructure in template.yaml

The SAM template wires everything together. The core infrastructure resources are:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Architectures:
      - x86_64
    Environment:
      Variables:
        LOG_LEVEL: INFO

Resources:
  DocumentBucket:
    Type: "AWS::S3::Bucket"
    Properties:
      BucketName: !Sub "documents-${AWS::AccountId}"

  EmbeddingQueue:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 180
      MessageRetentionPeriod: 3600

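  DocumentTable:
    Type: AWS::DynamoDB::Table
    Properties:
      # Every attribute named in KeySchema must also be declared here
      AttributeDefinitions:
        - AttributeName: document_id
          AttributeType: S
        - AttributeName: created
          AttributeType: S
      KeySchema:
        - AttributeName: document_id
          KeyType: HASH
        - AttributeName: created
          KeyType: RANGE
      BillingMode: PAY_PER_REQUEST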

Why container images instead of zip packages? The Lambda functions here are deployed as container images stored in ECR. Zip deployment packages are capped at 250 MB unzipped, while container images can be up to 10 GB. That headroom matters when the dependencies include PyPDF2, LangChain, and FAISS.

The three Lambda functions are added to this same template:

  ExtractMetadata:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: ExtractMetadata
      ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/lambdas:extract-metadata-latest"
      Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
      PackageType: Image
      Environment:
        Variables:
          DOCUMENT_TABLE: !Ref DocumentTable
          QUEUE: !GetAtt EmbeddingQueue.QueueName
          BUCKET: !Sub "documents-${AWS::AccountId}"
      Events:
        S3Event:
          Type: S3
          Properties:
            Bucket: !Ref DocumentBucket
            Events:
              - s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: .pdf

  GenerateEmbeddings:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: GenerateEmbeddings
      ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/lambdas:generate-embeddings-latest"
      Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
      PackageType: Image
      Timeout: 180
      MemorySize: 1024
      Environment:
        Variables:
          DOCUMENT_TABLE: !Ref DocumentTable
          BUCKET: !Ref DocumentBucket
      Events:
        EmbeddingQueueEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt EmbeddingQueue.Arn
            BatchSize: 1

  GenerateResponse:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: GenerateResponse
      Timeout: 30
      ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/lambdas:generate-response-latest"
      Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
      PackageType: Image
      Environment:
        Variables:
          BUCKET: !Ref DocumentBucket

Step 3 — The ExtractMetadata Function

Triggered automatically when any .pdf lands in S3. It downloads the file, reads the page count, writes a DynamoDB record, and dispatches a job to SQS.

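import json
import os
import urllib.parse
from datetime import datetime, timezone

import boto3
import PyPDF2
import shortuuid

# Resource names come from the environment variables set in template.yaml
QUEUE = os.environ["QUEUE"]
BUCKET = os.environ["BUCKET"]
s3 = boto3.client("s3")
sqs = boto3.client("sqs")
document_table = boto3.resource("dynamodb").Table(os.environ["DOCUMENT_TABLE"])

def lambda_handler(event, context):
    s3_object = event["Records"][0]["s3"]["object"]
    key = urllib.parse.unquote_plus(s3_object["key"])  # keys arrive URL-encoded
    file_name = key.split("/")[-1]  # last path segment, so prefixed keys also work
    document_id = shortuuid.uuid()
    # UTC timestamp; the exact format is an assumption
    timestamp_str = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

    s3.download_file(BUCKET, key, f"/tmp/{file_name}")
    with open(f"/tmp/{file_name}", "rb") as f:
        reader = PyPDF2.PdfReader(f)
        pages = str(len(reader.pages))

    document = {
        "document_id": document_id,
        "filename": file_name,
        "created": timestamp_str,
        "pages": pages,
        "filesize": str(s3_object["size"]),
        "docstatus": "UPLOADED",
    }
    document_table.put_item(Item=document)

    # "created" is an assumption: it gives the next function the full table key
    message = {"documentid": document_id, "key": key, "created": timestamp_str}
    sqs.send_message(QueueUrl=QUEUE, MessageBody=json.dumps(message))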

What's happening here:

The S3 event delivers the object key — we parse out the file name
shortuuid generates a unique ID per document
PyPDF2 reads the file to count pages (no heavy ML yet)
DynamoDB records the document with status UPLOADED
SQS receives a lightweight message that identifies the document by its ID and S3 key, rather than carrying the file itself
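
To make the first bullet concrete, here is a minimal sketch of pulling the key, file name, and size out of an S3 notification. The sample event is trimmed to the fields used here, the bucket name and key are made up, and `parse_s3_record` is a hypothetical helper name rather than something from the lab's code. It takes the last path segment as the file name.

```python
import urllib.parse


def parse_s3_record(event):
    """Return (key, file_name, size) for the first record of an S3 event."""
    s3_info = event["Records"][0]["s3"]
    # S3 URL-encodes keys in notifications: spaces arrive as "+".
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])
    file_name = key.split("/")[-1]
    return key, file_name, s3_info["object"]["size"]


# Hand-made event, trimmed to the fields the function reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "documents-123456789012"},
                "object": {"key": "my+report+%282024%29.pdf", "size": 48213},
            }
        }
    ]
}

print(parse_s3_record(sample_event))
# ('my report (2024).pdf', 'my report (2024).pdf', 48213)
```

Without the `unquote_plus` call, the download would look for a key containing literal `+` and `%28` characters and fail for any file whose name has spaces or brackets.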

Step 4 — The GenerateEmbeddings Function (the heavy lifter)

This function is triggered by SQS messages. It does the actual ML work — loading the PDF with LangChain, calling Bedrock to embed text chunks, and saving a FAISS index.

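import json
import os

import boto3
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

BUCKET = os.environ["BUCKET"]
s3 = boto3.client("s3")
document_table = boto3.resource("dynamodb").Table(os.environ["DOCUMENT_TABLE"])

def set_doc_status(document_id, created, status):
    # Sketch of the helper the handler relies on; the real one lives in the repo
    document_table.update_item(
        Key={"document_id": document_id, "created": created},
        UpdateExpression="SET docstatus = :docstatus",
        ExpressionAttributeValues={":docstatus": status},
    )

def lambda_handler(event, context):
    # Parse the SQS message (BatchSize is 1, so there is exactly one record)
    event_body = json.loads(event["Records"][0]["body"])
    document_id = event_body["documentid"]
    key = event_body["key"]
    created = event_body["created"]  # assumption: sent along in the message
    file_name_full = key.split("/")[-1]

    set_doc_status(document_id, created, "PROCESSING")

    # Download and load PDF
    s3.download_file(BUCKET, key, f"/tmp/{file_name_full}")
    loader = PyPDFLoader(f"/tmp/{file_name_full}")

    # Set up Bedrock embeddings
    bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")
    embeddings = BedrockEmbeddings(
        model_id="amazon.titan-embed-text-v1",
        client=bedrock_runtime,
    )

    # Create the FAISS index and save it as /tmp/index.faiss and /tmp/index.pkl
    index_creator = VectorstoreIndexCreator(vectorstore_cls=FAISS, embedding=embeddings)
    index_from_loader = index_creator.from_loaders([loader])
    index_from_loader.vectorstore.save_local("/tmp")

    # Upload index files to S3 under a prefix named after the PDF
    s3.upload_file("/tmp/index.faiss", BUCKET, f"{file_name_full}/index.faiss")
    s3.upload_file("/tmp/index.pkl", BUCKET, f"{file_name_full}/index.pkl")

    set_doc_status(document_id, created, "READY")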

LangChain components used:

Component	Role
`PyPDFLoader`	Parses the PDF and extracts text per page
`BedrockEmbeddings`	Calls Bedrock Titan to convert text chunks into vectors
`VectorstoreIndexCreator`	Orchestrates chunking + embedding + indexing
`FAISS`	Stores vectors locally for fast similarity search

Step 5 — The GenerateResponse Function

This one is invoked on demand. It loads the pre-built FAISS index from S3, runs a similarity search against your query, and returns the most relevant text chunk.

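import json
import os

import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

BUCKET = os.environ["BUCKET"]
s3 = boto3.client("s3")
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")

def lambda_handler(event, context):
    file_name = event["file_name"]
    human_input = event["prompt"]

    # Pull the FAISS index files from S3
    s3.download_file(BUCKET, f"{file_name}/index.faiss", "/tmp/index.faiss")
    s3.download_file(BUCKET, f"{file_name}/index.pkl", "/tmp/index.pkl")

    # Load the index with the same embedding model that built it.
    # index.pkl is a pickle file, so only load indexes you created yourself.
    embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_runtime)
    faiss_index = FAISS.load_local("/tmp", embeddings, allow_dangerous_deserialization=True)

    # Retrieve matching chunks (most similar first) and return the top one
    retriever = faiss_index.as_retriever()
    docs = retriever.invoke(human_input)

    return {
        "statusCode": 200,
        "body": json.dumps(docs[0].page_content),
    }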

Step 6 — Deploy

sam deploy --resolve-image-repos

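The --resolve-image-repos flag lets SAM create and manage the ECR repositories it needs for image-based functions instead of asking for them. Here each function's ImageUri already points at an image in ECR, so nothing is built or packaged locally. Confirm the changeset when prompted. Deployment completes in about 3 minutes.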

Step 7 — Test It

Upload waf-intro.pdf (an AWS Well-Architected Framework page) to the S3 bucket. After a few moments, a new folder appears in the bucket containing index.faiss and index.pkl — the pipeline ran automatically.

Then invoke GenerateResponse from the Lambda console Test tab with:

{
  "file_name": "waf-intro.pdf",
  "prompt": "what is the AWS WA Tool?"
}

Result: The function returned the most semantically similar paragraph from the document — the document was split into 3 chunks, and the correct section about the AWS Well-Architected Tool was retrieved instantly. ✅
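
The same test can be scripted instead of clicked. This is a sketch, assuming boto3 is installed and credentials for the lab account are configured. The helper names are mine, and the live call is kept inside `ask` so the payload handling can be read and run on its own.

```python
import json


def build_payload(file_name, prompt):
    """Serialise the event that GenerateResponse expects."""
    return json.dumps({"file_name": file_name, "prompt": prompt}).encode("utf-8")


def parse_result(raw):
    """Unwrap the handler's return value: the body is itself a JSON string."""
    result = json.loads(raw)
    return result["statusCode"], json.loads(result["body"])


def ask(file_name, prompt, region="us-west-2"):
    """Invoke the deployed function synchronously (needs AWS credentials)."""
    import boto3  # imported here so the helpers above work without it

    client = boto3.client("lambda", region_name=region)
    response = client.invoke(
        FunctionName="GenerateResponse",
        Payload=build_payload(file_name, prompt),
    )
    return parse_result(response["Payload"].read())


# Offline check of the response handling with a hand-made reply.
fake_reply = json.dumps({"statusCode": 200, "body": json.dumps("some chunk text")})
print(parse_result(fake_reply))
# (200, 'some chunk text')
```

With the stack deployed, `ask("waf-intro.pdf", "what is the AWS WA Tool?")` should return the same status code and chunk that the console test shows.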

Key Concepts I Internalized

Embeddings are coordinates. Each text chunk becomes a point in high-dimensional vector space. Similarity search finds the nearest neighbour to your query vector — no keyword matching involved.
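
A toy version of that idea, using hand-made three-dimensional vectors in place of real embeddings. The labels and numbers are invented for illustration. Cosine similarity is one common measure; the metric a real index uses depends on how it was built.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors):
    """Return the label whose vector is most similar to the query vector."""
    return max(vectors, key=lambda label: cosine_similarity(query, vectors[label]))


# Invented "embeddings"; real Titan vectors have far more dimensions.
chunks = {
    "security pillar": [0.9, 0.1, 0.0],
    "cost optimisation": [0.1, 0.9, 0.1],
    "well-architected tool": [0.0, 0.2, 0.9],
}
query = [0.1, 0.1, 0.8]

print(nearest(query, chunks))
# well-architected tool
```

No word in the query has to appear in the winning chunk. Only the direction of the vectors is compared.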

FAISS is fast. Facebook AI's similarity-search library supports both exact and approximate nearest-neighbour search and handles either efficiently. For Lambda's ephemeral environment, where compute time is money, this matters.

LangChain as glue. The abstractions (loaders, embeddings, retrievers) dramatically cut down the boilerplate needed to wire Bedrock and FAISS together. Without it you'd be writing a lot of repetitive plumbing code.

ECR for fat dependencies. Container images can be up to 10 GB, well beyond the 250 MB unzipped limit on zip packages. That is essential for ML workloads that bundle native libraries like FAISS.

SQS for resilience. Decoupling the fast metadata step from the slow embedding step via a queue means uploads are always instant, and failed embedding jobs can be retried automatically without re-triggering the S3 event.
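
The retry behaviour comes from how Lambda's SQS integration treats errors: if the handler raises, the message is not deleted and becomes visible again once the queue's visibility timeout expires. Below is a sketch of a handler shaped around that. `embed` and `set_status` are hypothetical stand-ins for the real work, and the `FAILED` status is my addition, not one the lab defines.

```python
import json


def handle_record(record, embed, set_status):
    """Process one SQS record; re-raise on failure so SQS redelivers it."""
    body = json.loads(record["body"])
    document_id = body["documentid"]
    set_status(document_id, "PROCESSING")
    try:
        embed(body["key"])
    except Exception:
        # Record the failure, then let the error propagate: an unhandled
        # exception tells Lambda not to delete the message.
        set_status(document_id, "FAILED")
        raise
    set_status(document_id, "READY")


# Offline demonstration with fakes in place of Bedrock and DynamoDB.
statuses = []
record = {"body": json.dumps({"documentid": "abc123", "key": "waf-intro.pdf"})}


def failing_embed(key):
    raise RuntimeError("simulated embedding failure")


def record_status(document_id, status):
    statuses.append(status)


try:
    handle_record(record, failing_embed, record_status)  # first delivery fails
except RuntimeError:
    pass

handle_record(record, lambda key: None, record_status)  # redelivery succeeds
print(statuses)
# ['PROCESSING', 'FAILED', 'PROCESSING', 'READY']
```

Swallowing the exception instead of re-raising it would mark the message as processed and silently lose the job.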

What I Would Improve

🔐 The S3 bucket is publicly accessible. The lab disables all public access blocks for simplicity. In production, use pre-signed URLs or IAM-scoped access — never expose a document bucket publicly.

🧩 GenerateResponse returns a raw chunk, not a generated answer. A complete RAG pipeline passes the retrieved chunks as context to a generative model (Claude, Llama, etc.) which synthesizes a human-readable answer. Right now this is just retrieval, not generation.
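
As a rough sketch of the missing step, the retrieved chunks can be folded into a prompt before it is sent to a generative model. `build_prompt` and its `max_chars` budget are hypothetical, the sample chunks are placeholder text rather than quotes from the PDF, and the model call itself is left out because request formats differ between Bedrock models.

```python
def build_prompt(question, chunks, max_chars=4000):
    """Assemble a grounded prompt from retrieved chunks, most relevant first."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        text = chunk.strip()
        if used + len(text) > max_chars:
            break  # stay within a rough context budget
        context_parts.append(f"[{i}] {text}")
        used += len(text)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks standing in for the retriever's output.
retrieved = [
    "Placeholder chunk about the AWS WA Tool.",
    "Placeholder chunk about the framework's pillars.",
]
print(build_prompt("What is the AWS WA Tool?", retrieved))
```

In the handler, `chunks` would be `[d.page_content for d in docs]`, using every document the retriever returned instead of only `docs[0]`.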

📊 No observability. There's no CloudWatch dashboard or alerting for failed embeddings. A dead-letter queue on SQS plus a Lambda error alarm would be the minimum production requirement.

🗂️ FAISS index as flat S3 files doesn't scale. For multi-user or multi-document scenarios, a managed vector store like Amazon OpenSearch Serverless (vector engine) or Pinecone would be far more robust.

🧹 No control over chunking strategy. VectorstoreIndexCreator uses sensible defaults, but for domain-specific documents (legal, medical, financial) you'd want to tune chunk size and overlap for better retrieval precision.
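
To show what those two knobs do, here is a deliberately simple character-window splitter. It is a stand-in for illustration, not LangChain's splitter, which prefers to break on separators such as paragraphs and sentences.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into chunk_size-character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(chunk_text(sample, chunk_size=10, overlap=0))
# ['abcdefghij', 'klmnopqrst', 'uvwxyz']
print(chunk_text(sample, chunk_size=10, overlap=4))
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

With overlap, a sentence that straddles a boundary appears whole in at least one chunk, at the cost of more chunks to embed and store.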

Final Thoughts

This lab gave me a solid mental model for RAG on AWS. The event-driven pattern — S3 → Lambda → SQS → Lambda — is something I'll reuse across many projects. The combination of LangChain's abstractions and Bedrock's managed models means you can go from PDF to queryable knowledge base in under 200 lines of Python.

The biggest lesson: RAG is not magic — it's structured retrieval. Answer quality depends far more on chunking strategy and embedding quality than on which model you pick. Get the retrieval right first; the generation almost takes care of itself.

If you found this useful, drop a reaction and share it with your team. Building in public is how we all get better. 🚀

Repo Link - https://github.com/suryansh639/bedrock-genai-aws.git

Tags: AWS ServerlessComputing GenerativeAI LangChain RAG AmazonBedrock MachineLearning Python CloudComputing AWSCommunity
