Serverless RAG on AWS: Step-by-Step PDF Embedding with Lambda, Bedrock, FAISS & LangChain


3× AWS Certified & GitHub Certified Professional | DevOps & Cloud Engineer passionate about building scalable, secure, and automated cloud solutions. I share insights on AWS, CI/CD, Kubernetes, and cloud-native development — helping developers and engineers level up their cloud journey

A step-by-step walkthrough of building an automated PDF embedding application using Lambda, Bedrock, FAISS, and LangChain — with honest feedback and what I'd improve next time.


The Problem This Solves

If you've ever wanted to ask a document a question — not keyword search it, but genuinely query its meaning — you need a Retrieval-Augmented Generation (RAG) pipeline. This lab taught me how to build one from scratch on AWS using fully serverless, event-driven components.

Instead of fine-tuning a model, RAG lets you store your documents as vector embeddings and retrieve the semantically relevant chunks at query time. The AI never "learns" your document — it retrieves and reasons on the fly.

💡 "RAG is the practical alternative to fine-tuning — cheaper, faster to update, and far more flexible."


Architecture Overview

The application is fully event-driven and uses three Lambda functions, each with a single responsibility:

PDF Upload → S3 → ExtractMetadata Lambda → SQS Queue → GenerateEmbeddings Lambda → FAISS Index → S3
                                                                                          ↓
                                                                               GenerateResponse Lambda ← User Query

| Function | Trigger | Responsibility | Config |
|---|---|---|---|
| ExtractMetadata | S3 ObjectCreated (.pdf) | Reads page count, stores metadata in DynamoDB, pushes job to SQS | Default |
| GenerateEmbeddings | SQS message (batch=1) | Downloads PDF, creates FAISS index via Bedrock, uploads index files back to S3 | 1024 MB · 180s |
| GenerateResponse | Manual / API invocation | Loads FAISS index, runs similarity search, returns the most relevant chunk | 30s |

AWS Services Used

  • Amazon S3 — stores PDFs and FAISS index files

  • Amazon SQS — decouples metadata extraction from the heavy embedding job

  • Amazon DynamoDB — tracks document metadata and processing status

  • Amazon Bedrock — provides the amazon.titan-embed-text-v1 embedding model

  • AWS Lambda — serverless compute for all three functions

  • Amazon ECR — hosts container images for Lambda (bypasses the 250 MB zip limit)

  • AWS SAM — infrastructure-as-code and deployment tooling


Step-by-Step Walkthrough

Step 1 — Configure the SAM CLI

The samconfig.toml file stores your deployment parameters so you don't repeat CLI flags on every deploy:

version=0.1
[default.global.parameters]
stack_name = "EmbeddingsStack"

[default.deploy.parameters]
region = "us-west-2"
s3_bucket = "deploy-assets-tkngbx"
s3_prefix = "ca-lab"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
tags = "project=\"ca-labs\" stage=\"development\""

Step 2 — Define Infrastructure in template.yaml

The SAM template wires everything together. The core infrastructure resources are:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Architectures:
      - x86_64
    Environment:
      Variables:
        LOG_LEVEL: INFO

Resources:
  DocumentBucket:
    Type: "AWS::S3::Bucket"
    Properties:
      BucketName: !Sub "documents-${AWS::AccountId}"

  EmbeddingQueue:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 180
      MessageRetentionPeriod: 3600

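  DocumentTable:
    Type: AWS::DynamoDB::Table
    Properties:
      # Key attributes must be declared; both are stored as strings by the functions
      AttributeDefinitions:
        - AttributeName: document_id
          AttributeType: S
        - AttributeName: created
          AttributeType: S
      KeySchema:
        - AttributeName: document_id
          KeyType: HASH
        - AttributeName: created
          KeyType: RANGE
      BillingMode: PAY_PER_REQUEST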

Why container images instead of zip packages? The Lambda functions here are deployed as container images from ECR. That sidesteps the 250 MB (unzipped) limit on zip deployment packages, since container images can be up to 10 GB. This matters when your dependencies include PyPDF2, LangChain, and FAISS, which together are quite large.

The three Lambda functions are added to this same template:

  ExtractMetadata:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: ExtractMetadata
      ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/lambdas:extract-metadata-latest"
      Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
      PackageType: Image
      Environment:
        Variables:
          DOCUMENT_TABLE: !Ref DocumentTable
          # !Ref on an SQS queue returns its URL, which send_message(QueueUrl=...) expects
          QUEUE: !Ref EmbeddingQueue
          BUCKET: !Sub "documents-${AWS::AccountId}"
      Events:
        S3Event:
          Type: S3
          Properties:
            Bucket: !Ref DocumentBucket
            Events:
              - s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: .pdf

  GenerateEmbeddings:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: GenerateEmbeddings
      ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/lambdas:generate-embeddings-latest"
      Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
      PackageType: Image
      Timeout: 180
      MemorySize: 1024
      Environment:
        Variables:
          DOCUMENT_TABLE: !Ref DocumentTable
          BUCKET: !Ref DocumentBucket
      Events:
        EmbeddingQueueEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt EmbeddingQueue.Arn
            BatchSize: 1

  GenerateResponse:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: GenerateResponse
      Timeout: 30
      ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/lambdas:generate-response-latest"
      Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
      PackageType: Image
      Environment:
        Variables:
          BUCKET: !Ref DocumentBucket

Step 3 — The ExtractMetadata Function

Triggered automatically when any .pdf lands in S3. It downloads the file, reads the page count, writes a DynamoDB record, and dispatches a job to SQS.

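import json
import os
import urllib.parse
from datetime import datetime, timezone

import boto3
import PyPDF2
import shortuuid

BUCKET = os.environ["BUCKET"]
QUEUE = os.environ["QUEUE"]
s3 = boto3.client("s3")
sqs = boto3.client("sqs")
document_table = boto3.resource("dynamodb").Table(os.environ["DOCUMENT_TABLE"])

def lambda_handler(event, context):
    s3_object = event["Records"][0]["s3"]["object"]
    # S3 URL-encodes object keys in event notifications
    key = urllib.parse.unquote_plus(s3_object["key"])
    file_name = key.split("/")[-1]  # last path segment is the file name
    document_id = shortuuid.uuid()
    # Sort-key timestamp; the exact format is an assumption
    timestamp_str = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

    s3.download_file(BUCKET, key, f"/tmp/{file_name}")

    with open(f"/tmp/{file_name}", "rb") as f:
        reader = PyPDF2.PdfReader(f)
        pages = str(len(reader.pages))

    document = {
        "document_id": document_id,
        "filename": file_name,
        "created": timestamp_str,
        "pages": pages,
        "filesize": str(s3_object["size"]),
        "docstatus": "UPLOADED"
    }

    document_table.put_item(Item=document)
    # Field names match what GenerateEmbeddings reads from the message
    message = {"documentid": document_id, "key": key}
    sqs.send_message(QueueUrl=QUEUE, MessageBody=json.dumps(message))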

What's happening here:

  • The S3 event delivers the object key — we parse out the file name

  • shortuuid generates a unique ID per document

  • PyPDF2 reads the file to count pages (no heavy ML yet)

  • DynamoDB records the document with status UPLOADED

  • SQS receives a lightweight message containing just the document ID and key
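
To make the first and last bullets concrete, here is a minimal offline sketch of turning an S3 event into the SQS message body. The helper name build_embedding_job and the sample event are my own illustration, not code from the lab; the field names documentid and key are the ones GenerateEmbeddings reads.

```python
import json
import urllib.parse


def build_embedding_job(s3_event, document_id):
    """Turn an S3 ObjectCreated event into the SQS message body."""
    s3_object = s3_event["Records"][0]["s3"]["object"]
    # S3 URL-encodes keys in event notifications (spaces arrive as "+"),
    # so decode before using the key with other S3 calls.
    key = urllib.parse.unquote_plus(s3_object["key"])
    return json.dumps({"documentid": document_id, "key": key})


# Hypothetical event, trimmed to the fields the handler actually uses.
sample_event = {
    "Records": [{"s3": {"object": {"key": "my+report+%282024%29.pdf", "size": 1024}}}]
}
print(build_embedding_job(sample_event, "doc-123"))
```

Skipping the decode step is a common source of "key not found" errors for file names that contain spaces or parentheses.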


Step 4 — The GenerateEmbeddings Function (the heavy lifter)

This function is triggered by SQS messages. It does the actual ML work — loading the PDF with LangChain, calling Bedrock to embed text chunks, and saving a FAISS index.

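import json
import os

import boto3
from boto3.dynamodb.conditions import Key
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.vectorstores import FAISS

BUCKET = os.environ["BUCKET"]
s3 = boto3.client("s3")
document_table = boto3.resource("dynamodb").Table(os.environ["DOCUMENT_TABLE"])

def set_doc_status(document_id, created, status):
    # Helper not shown in the original snippet; a minimal version that
    # updates only the status attribute on the existing item
    document_table.update_item(
        Key={"document_id": document_id, "created": created},
        UpdateExpression="SET docstatus = :status",
        ExpressionAttributeValues={":status": status},
    )

def lambda_handler(event, context):
    # Parse SQS message (BatchSize is 1, so there is a single record)
    event_body = json.loads(event["Records"][0]["body"])
    document_id = event_body["documentid"]
    key = event_body["key"]
    file_name_full = key.split("/")[-1]

    # The snippet as published never defines `created`; looking it up by
    # partition key is one way to recover it (an assumption, not the lab's code)
    items = document_table.query(
        KeyConditionExpression=Key("document_id").eq(document_id)
    )["Items"]
    created = items[0]["created"]

    set_doc_status(document_id, created, "PROCESSING")

    # Download and load PDF
    s3.download_file(BUCKET, key, f"/tmp/{file_name_full}")
    loader = PyPDFLoader(f"/tmp/{file_name_full}")

    # Set up Bedrock embeddings
    bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")
    embeddings = BedrockEmbeddings(
        model_id="amazon.titan-embed-text-v1",
        client=bedrock_runtime,
    )

    # Create and save FAISS index (writes index.faiss and index.pkl)
    index_creator = VectorstoreIndexCreator(vectorstore_cls=FAISS, embedding=embeddings)
    index_from_loader = index_creator.from_loaders([loader])
    index_from_loader.vectorstore.save_local("/tmp")

    # Upload index files to S3 under a prefix named after the PDF
    s3.upload_file("/tmp/index.faiss", BUCKET, f"{file_name_full}/index.faiss")
    s3.upload_file("/tmp/index.pkl", BUCKET, f"{file_name_full}/index.pkl")

    set_doc_status(document_id, created, "READY")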

LangChain components used:

| Component | Role |
|---|---|
| PyPDFLoader | Parses the PDF and extracts text per page |
| BedrockEmbeddings | Calls Bedrock Titan to convert text chunks into vectors |
| VectorstoreIndexCreator | Orchestrates chunking + embedding + indexing |
| FAISS | Stores vectors locally for fast similarity search |

Step 5 — The GenerateResponse Function

This one is invoked on demand. It loads the pre-built FAISS index from S3, runs a similarity search against your query, and returns the most relevant text chunk.

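import json
import os

import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

BUCKET = os.environ["BUCKET"]
s3 = boto3.client("s3")
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")

def lambda_handler(event, context):
    file_name = event["file_name"]
    human_input = event["prompt"]

    # Pull FAISS index from S3
    s3.download_file(BUCKET, f"{file_name}/index.faiss", "/tmp/index.faiss")
    s3.download_file(BUCKET, f"{file_name}/index.pkl", "/tmp/index.pkl")

    # Load the index; the query must be embedded with the same model used at index time
    embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_runtime)
    # index.pkl is a pickle, so only load index files this pipeline wrote itself
    faiss_index = FAISS.load_local("/tmp", embeddings, allow_dangerous_deserialization=True)

    # Retrieve matching chunks, best match first
    retriever = faiss_index.as_retriever()
    docs = retriever.invoke(human_input)

    return {
        "statusCode": 200,
        "body": json.dumps(docs[0].page_content),
    }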

Step 6 — Deploy

sam deploy --resolve-image-repos

The --resolve-image-repos flag lets SAM automatically create the ECR repositories it needs for image-based functions, so you don't have to pass --image-repository on a non-guided deploy. Confirm the changeset when prompted. Deployment completes in about 3 minutes.


Step 7 — Test It

Upload waf-intro.pdf (an AWS Well-Architected Framework page) to the S3 bucket. After a few moments, a new folder appears in the bucket containing index.faiss and index.pkl — the pipeline ran automatically.

Then invoke GenerateResponse from the Lambda console Test tab with:

{
  "file_name": "waf-intro.pdf",
  "prompt": "what is the AWS WA Tool?"
}

Result: The function returned the most semantically similar paragraph from the document — the document was split into 3 chunks, and the correct section about the AWS Well-Architected Tool was retrieved instantly. ✅
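
The same test can be scripted instead of clicked. Below is a minimal sketch of a helper that invokes the function and unwraps the response; ask_document and FakeLambdaClient are names I made up for illustration, and the fake client only exists so the snippet runs without AWS credentials.

```python
import io
import json


def ask_document(lambda_client, file_name, prompt, function_name="GenerateResponse"):
    """Invoke the function synchronously and return the retrieved chunk text."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps({"file_name": file_name, "prompt": prompt}).encode("utf-8"),
    )
    result = json.loads(response["Payload"].read())
    # The handler JSON-encodes the chunk into "body", so decode it once more.
    return json.loads(result["body"])


class FakeLambdaClient:
    """Offline stand-in for boto3.client("lambda"), for illustration only."""

    def invoke(self, FunctionName, Payload):
        request = json.loads(Payload)
        body = json.dumps(f"chunk matching: {request['prompt']}")
        reply = json.dumps({"statusCode": 200, "body": body}).encode("utf-8")
        return {"StatusCode": 200, "Payload": io.BytesIO(reply)}


print(ask_document(FakeLambdaClient(), "waf-intro.pdf", "what is the AWS WA Tool?"))
```

Against the deployed stack you would pass boto3.client("lambda", region_name="us-west-2") in place of the fake client.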


Key Concepts I Internalized

Embeddings are coordinates. Each text chunk becomes a point in high-dimensional vector space. Similarity search finds the nearest neighbour to your query vector — no keyword matching involved.
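
A toy version of that idea, using made-up three-dimensional vectors rather than real Titan embeddings. Cosine similarity is used here purely for illustration; the metric FAISS actually applies depends on how the index is built.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_chunk(query_vector, chunk_vectors):
    """Return the name of the chunk whose vector is most similar to the query."""
    return max(
        chunk_vectors,
        key=lambda name: cosine_similarity(query_vector, chunk_vectors[name]),
    )


# Hypothetical vectors; real embeddings have far more dimensions.
chunks = {
    "pillars": [0.9, 0.1, 0.0],
    "wa_tool": [0.1, 0.9, 0.2],
    "lenses": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]
print(nearest_chunk(query, chunks))  # prints "wa_tool"
```

The query never has to share a word with the chunk; it only has to land near it in vector space.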

FAISS is fast. Facebook AI's library performs efficient similarity search over dense vectors, exact or approximate depending on the index type. In Lambda's ephemeral environment, where compute time is money, this matters.

LangChain as glue. The abstractions (loaders, embeddings, retrievers) dramatically cut down the boilerplate needed to wire Bedrock and FAISS together. Without it you'd be writing a lot of repetitive plumbing code.

ECR for fat dependencies. Container images replace the 250 MB (unzipped) zip package limit with a 10 GB image limit, which is essential for ML workloads that bundle native libraries like FAISS.

SQS for resilience. Decoupling the fast metadata step from the slow embedding step via a queue keeps the upload path fast, and a failed embedding job is retried automatically when its message becomes visible again, without re-triggering the S3 event.
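
The retry only happens if the handler lets the error escape, because a message is redelivered when the function invocation fails. Here is a rough sketch of that pattern; run_embedding_job is a hypothetical wrapper and "FAILED" is a status value I invented, since the lab only uses UPLOADED, PROCESSING, and READY.

```python
def run_embedding_job(document_id, build_index, set_status):
    """Track status around the slow step and let failures propagate."""
    set_status(document_id, "PROCESSING")
    try:
        build_index()
    except Exception:
        # "FAILED" is a hypothetical status value, not one the lab defines.
        set_status(document_id, "FAILED")
        raise  # re-raise so the invocation fails and SQS redelivers the message
    set_status(document_id, "READY")


history = []


def record_status(document_id, status):
    """In-memory stand-in for the DynamoDB status update."""
    history.append((document_id, status))


def broken_build():
    raise RuntimeError("simulated embedding failure")


try:
    run_embedding_job("doc-123", broken_build, record_status)
except RuntimeError:
    pass
print(history)
```

Swallowing the exception instead would mark the message as processed and quietly drop the document.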


What I Would Improve

🔐 The S3 bucket is publicly accessible. The lab disables all public access blocks for simplicity. In production, use pre-signed URLs or IAM-scoped access — never expose a document bucket publicly.

🧩 GenerateResponse returns a raw chunk, not a generated answer. A complete RAG pipeline passes the retrieved chunks as context to a generative model (Claude, Llama, etc.) which synthesizes a human-readable answer. Right now this is just retrieval, not generation.
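
As a starting point for that missing half, here is a minimal sketch of the prompt-assembly step. build_rag_prompt, its wording, and the placeholder chunks are all my own assumptions; the resulting string would then be sent to whichever generative model you choose, which this sketch deliberately leaves out.

```python
def build_rag_prompt(question, chunks, max_chunks=3):
    """Assemble retrieved chunks and the user question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}"
        for i, chunk in enumerate(chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder text standing in for chunks returned by the retriever.
retrieved = [
    "placeholder chunk about the AWS WA Tool",
    "placeholder chunk about the framework pillars",
]
print(build_rag_prompt("what is the AWS WA Tool?", retrieved))
```

Numbering the chunks makes it easier to ask the model to cite which passage it relied on.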

📊 No observability. There's no CloudWatch dashboard or alerting for failed embeddings. A dead-letter queue on SQS plus a Lambda error alarm would be the minimum production requirement.

🗂️ FAISS index as flat S3 files doesn't scale. For multi-user or multi-document scenarios, a managed vector store like Amazon OpenSearch Serverless (vector engine) or Pinecone would be far more robust.

🧹 No control over chunking strategy. VectorstoreIndexCreator uses sensible defaults, but for domain-specific documents (legal, medical, financial) you'd want to tune chunk size and overlap for better retrieval precision.
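
To show what those two knobs do, here is a simplified character-window splitter. It is a sketch of the general idea only, not LangChain's splitter, which also tries to break on separators such as paragraphs and sentences.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
print(split_with_overlap(text, chunk_size=10, overlap=3))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Larger overlap reduces the chance that a sentence is cut in half at a boundary, at the cost of more chunks to embed and store.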

Final Thoughts

This lab gave me a solid mental model for RAG on AWS. The event-driven pattern — S3 → Lambda → SQS → Lambda — is something I'll reuse across many projects. The combination of LangChain's abstractions and Bedrock's managed models means you can go from PDF to queryable knowledge base in under 200 lines of Python.

The biggest lesson: RAG is not magic — it's structured retrieval. Answer quality depends far more on chunking strategy and embedding quality than on which model you pick. Get the retrieval right first; the generation almost takes care of itself.


If you found this useful, drop a reaction and share it with your team. Building in public is how we all get better. 🚀

Repo Link - https://github.com/suryansh639/bedrock-genai-aws.git


Tags: AWS ServerlessComputing GenerativeAI LangChain RAG AmazonBedrock MachineLearning Python CloudComputing AWSCommunity