Serverless RAG on AWS: Step-by-Step PDF Embedding with Lambda, Bedrock, FAISS & LangChain

3× AWS Certified & GitHub Certified Professional | DevOps & Cloud Engineer passionate about building scalable, secure, and automated cloud solutions. I share insights on AWS, CI/CD, Kubernetes, and cloud-native development — helping developers and engineers level up their cloud journey
A step-by-step walkthrough of building an automated PDF embedding application using Lambda, Bedrock, FAISS, and LangChain — with honest feedback and what I'd improve next time.
The Problem This Solves
If you've ever wanted to ask a document a question — not keyword search it, but genuinely query its meaning — you need a Retrieval-Augmented Generation (RAG) pipeline. This lab taught me how to build one from scratch on AWS using fully serverless, event-driven components.
Instead of fine-tuning a model, RAG lets you store your documents as vector embeddings and retrieve the semantically relevant chunks at query time. The AI never "learns" your document — it retrieves and reasons on the fly.
💡 "RAG is the practical alternative to fine-tuning — cheaper, faster to update, and far more flexible."
Architecture Overview
The application is fully event-driven and uses three Lambda functions, each with a single responsibility:
PDF Upload → S3 → ExtractMetadata Lambda → SQS Queue → GenerateEmbeddings Lambda → FAISS Index → S3
↓
GenerateResponse Lambda ← User Query
| Function | Trigger | Responsibility | Config |
|---|---|---|---|
ExtractMetadata |
S3 ObjectCreated (.pdf) |
Reads page count, stores metadata in DynamoDB, pushes job to SQS | Default |
GenerateEmbeddings |
SQS message (batch=1) | Downloads PDF, creates FAISS index via Bedrock, uploads index files back to S3 | 1024 MB · 180s |
GenerateResponse |
Manual / API invocation | Loads FAISS index, runs similarity search, returns the most relevant chunk | 30s |
AWS Services Used
Amazon S3 — stores PDFs and FAISS index files
Amazon SQS — decouples metadata extraction from the heavy embedding job
Amazon DynamoDB — tracks document metadata and processing status
Amazon Bedrock — provides the
amazon.titan-embed-text-v1embedding modelAWS Lambda — serverless compute for all three functions
Amazon ECR — hosts container images for Lambda (bypasses the 250 MB zip limit)
AWS SAM — infrastructure-as-code and deployment tooling
Step-by-Step Walkthrough
Step 1 — Configure the SAM CLI
The samconfig.toml file stores your deployment parameters so you don't repeat CLI flags on every deploy:
version=0.1
[default.global.parameters]
stack_name = "EmbeddingsStack"
[default.deploy.parameters]
region = "us-west-2"
s3_bucket = "deploy-assets-tkngbx"
s3_prefix = "ca-lab"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
tags = "project=\"ca-labs\" stage=\"development\""
Step 2 — Define Infrastructure in template.yaml
The SAM template wires everything together. The core infrastructure resources are:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Globals:
Function:
Architectures:
- x86_64
Environment:
Variables:
LOG_LEVEL: INFO
Resources:
DocumentBucket:
Type: "AWS::S3::Bucket"
Properties:
BucketName: !Sub "documents-${AWS::AccountId}"
EmbeddingQueue:
Type: AWS::SQS::Queue
Properties:
VisibilityTimeout: 180
MessageRetentionPeriod: 3600
DocumentTable:
Type: AWS::DynamoDB::Table
Properties:
KeySchema:
- AttributeName: document_id
KeyType: HASH
- AttributeName: created
KeyType: RANGE
BillingMode: PAY_PER_REQUEST
Why container images instead of zip packages? Lambda functions here use ECR container images. This sidesteps the 250 MB zip limit — critical when your dependencies include PyPDF2, LangChain, and FAISS which together are quite large.
The three Lambda functions are added to this same template:
ExtractMetadata:
Type: AWS::Serverless::Function
Properties:
FunctionName: ExtractMetadata
ImageUri: !Sub "\({AWS::AccountId}.dkr.ecr.\){AWS::Region}.amazonaws.com/lambdas:extract-metadata-latest"
Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
PackageType: Image
Environment:
Variables:
DOCUMENT_TABLE: !Ref DocumentTable
QUEUE: !GetAtt EmbeddingQueue.QueueName
BUCKET: !Sub "documents-${AWS::AccountId}"
Events:
S3Event:
Type: S3
Properties:
Bucket: !Ref DocumentBucket
Events:
- s3:ObjectCreated:*
Filter:
S3Key:
Rules:
- Name: suffix
Value: .pdf
GenerateEmbeddings:
Type: AWS::Serverless::Function
Properties:
FunctionName: GenerateEmbeddings
ImageUri: !Sub "\({AWS::AccountId}.dkr.ecr.\){AWS::Region}.amazonaws.com/lambdas:generate-embeddings-latest"
Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
PackageType: Image
Timeout: 180
MemorySize: 1024
Environment:
Variables:
DOCUMENT_TABLE: !Ref DocumentTable
BUCKET: !Ref DocumentBucket
Events:
EmbeddingQueueEvent:
Type: SQS
Properties:
Queue: !GetAtt EmbeddingQueue.Arn
BatchSize: 1
GenerateResponse:
Type: AWS::Serverless::Function
Properties:
FunctionName: GenerateResponse
Timeout: 30
ImageUri: !Sub "\({AWS::AccountId}.dkr.ecr.\){AWS::Region}.amazonaws.com/lambdas:generate-response-latest"
Role: !Sub "arn:aws:iam::${AWS::AccountId}:role/LambdaExecutionRole"
PackageType: Image
Environment:
Variables:
BUCKET: !Ref DocumentBucket
Step 3 — The ExtractMetadata Function
Triggered automatically when any .pdf lands in S3. It downloads the file, reads the page count, writes a DynamoDB record, and dispatches a job to SQS.
def lambda_handler(event, context):
key = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"])
file_name = key.split("/")[0]
document_id = shortuuid.uuid()
s3.download_file(BUCKET, key, f"/tmp/{file_name}")
with open(f"/tmp/{file_name}", "rb") as f:
reader = PyPDF2.PdfReader(f)
pages = str(len(reader.pages))
document = {
"document_id": document_id,
"filename": file_name,
"created": timestamp_str,
"pages": pages,
"filesize": str(event["Records"][0]["s3"]["object"]["size"]),
"docstatus": "UPLOADED"
}
document_table.put_item(Item=document)
sqs.send_message(QueueUrl=QUEUE, MessageBody=json.dumps(message))
What's happening here:
The S3 event delivers the object key — we parse out the file name
shortuuidgenerates a unique ID per documentPyPDF2reads the file to count pages (no heavy ML yet)DynamoDB records the document with status
UPLOADEDSQS receives a lightweight message containing just the document ID and key
Step 4 — The GenerateEmbeddings Function (the heavy lifter)
This function is triggered by SQS messages. It does the actual ML work — loading the PDF with LangChain, calling Bedrock to embed text chunks, and saving a FAISS index.
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.vectorstores import FAISS
def lambda_handler(event, context):
# Parse SQS message
event_body = json.loads(event["Records"][0]["body"])
document_id = event_body["documentid"]
key = event_body["key"]
set_doc_status(document_id, created, "PROCESSING")
# Download and load PDF
s3.download_file(BUCKET, key, f"/tmp/{file_name_full}")
loader = PyPDFLoader(f"/tmp/{file_name_full}")
# Set up Bedrock embeddings
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")
embeddings = BedrockEmbeddings(
model_id="amazon.titan-embed-text-v1",
client=bedrock_runtime,
)
# Create and save FAISS index
index_creator = VectorstoreIndexCreator(vectorstore_cls=FAISS, embedding=embeddings)
index_from_loader = index_creator.from_loaders([loader])
index_from_loader.vectorstore.save_local("/tmp")
# Upload index files to S3
s3.upload_file("/tmp/index.faiss", BUCKET, f"{file_name_full}/index.faiss")
s3.upload_file("/tmp/index.pkl", BUCKET, f"{file_name_full}/index.pkl")
set_doc_status(document_id, created, "READY")
LangChain components used:
| Component | Role |
|---|---|
PyPDFLoader |
Parses the PDF and extracts text per page |
BedrockEmbeddings |
Calls Bedrock Titan to convert text chunks into vectors |
VectorstoreIndexCreator |
Orchestrates chunking + embedding + indexing |
FAISS |
Stores vectors locally for fast similarity search |
Step 5 — The GenerateResponse Function
This one is invoked on demand. It loads the pre-built FAISS index from S3, runs a similarity search against your query, and returns the most relevant text chunk.
def lambda_handler(event, context):
file_name = event["file_name"]
human_input = event["prompt"]
# Pull FAISS index from S3
s3.download_file(BUCKET, f"{file_name}/index.faiss", "/tmp/index.faiss")
s3.download_file(BUCKET, f"{file_name}/index.pkl", "/tmp/index.pkl")
# Load the index
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_runtime)
faiss_index = FAISS.load_local("/tmp", embeddings, allow_dangerous_deserialization=True)
# Retrieve top matching chunk
retriever = faiss_index.as_retriever()
docs = retriever.invoke(human_input)
return {
"statusCode": 200,
"body": json.dumps(docs[0].page_content),
}
Step 6 — Deploy
sam deploy --resolve-image-repos
The --resolve-image-repos flag tells SAM to pull Lambda container images directly from ECR, skipping the local packaging step. Confirm the changeset when prompted. Deployment completes in ~3 minutes.
Step 7 — Test It
Upload waf-intro.pdf (an AWS Well-Architected Framework page) to the S3 bucket. After a few moments, a new folder appears in the bucket containing index.faiss and index.pkl — the pipeline ran automatically.
Then invoke GenerateResponse from the Lambda console Test tab with:
{
"file_name": "waf-intro.pdf",
"prompt": "what is the AWS WA Tool?"
}
Result: The function returned the most semantically similar paragraph from the document — the document was split into 3 chunks, and the correct section about the AWS Well-Architected Tool was retrieved instantly. ✅
Key Concepts I Internalized
Embeddings are coordinates. Each text chunk becomes a point in high-dimensional vector space. Similarity search finds the nearest neighbour to your query vector — no keyword matching involved.
FAISS is fast. Facebook AI's library does approximate nearest-neighbour search efficiently. For Lambda's ephemeral environment where compute time is money, this matters.
LangChain as glue. The abstractions (loaders, embeddings, retrievers) dramatically cut down the boilerplate needed to wire Bedrock and FAISS together. Without it you'd be writing a lot of repetitive plumbing code.
ECR for fat dependencies. Container images lift the 250 MB Lambda limit — essential for ML workloads that bundle native libraries like FAISS.
SQS for resilience. Decoupling the fast metadata step from the slow embedding step via a queue means uploads are always instant, and failed embedding jobs can be retried automatically without re-triggering the S3 event.
What I Would Improve
🔐 The S3 bucket is publicly accessible. The lab disables all public access blocks for simplicity. In production, use pre-signed URLs or IAM-scoped access — never expose a document bucket publicly.
🧩 GenerateResponse returns a raw chunk, not a generated answer. A complete RAG pipeline passes the retrieved chunks as context to a generative model (Claude, Llama, etc.) which synthesizes a human-readable answer. Right now this is just retrieval, not generation.
📊 No observability. There's no CloudWatch dashboard or alerting for failed embeddings. A dead-letter queue on SQS plus a Lambda error alarm would be the minimum production requirement.
🗂️ FAISS index as flat S3 files doesn't scale. For multi-user or multi-document scenarios, a managed vector store like Amazon OpenSearch Serverless (vector engine) or Pinecone would be far more robust.
🧹 No control over chunking strategy. VectorstoreIndexCreator uses sensible defaults, but for domain-specific documents (legal, medical, financial) you'd want to tune chunk size and overlap for better retrieval precision..
Final Thoughts
This lab gave me a solid mental model for RAG on AWS. The event-driven pattern — S3 → Lambda → SQS → Lambda — is something I'll reuse across many projects. The combination of LangChain's abstractions and Bedrock's managed models means you can go from PDF to queryable knowledge base in under 200 lines of Python.
The biggest lesson: RAG is not magic — it's structured retrieval. Answer quality depends far more on chunking strategy and embedding quality than on which model you pick. Get the retrieval right first; the generation almost takes care of itself.
If you found this useful, drop a reaction and share it with your team. Building in public is how we all get better. 🚀
Repo Link - https://github.com/suryansh639/bedrock-genai-aws.git
Tags: AWS ServerlessComputing GenerativeAI LangChain RAG AmazonBedrock MachineLearning Python CloudComputing AWSCommunity


