How to change the metadata of S3 objects on the fly using a Python Lambda

Vipin Katiyar
2 min read · Mar 22, 2023


I ran into a problem after we uploaded a set of old PDF files to an S3 bucket: we found that some of the oldest files had the Content-Type “application/octet-stream” instead of “application/pdf”.

The files opened without any issue once they were downloaded from S3. The problem appeared when I linked to a PDF's path from a web page: clicking the link started a download instead of displaying the file in the browser, and our requirement was to display every PDF in the browser rather than download it automatically.

After some research, we learned that browsers only render a PDF inline when it is served with the Content-Type “application/pdf”; with “application/octet-stream” they treat it as a generic binary file and download it. So we needed to change the Content-Type of every PDF file that had any other value to “application/pdf”.
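Before changing anything, it can help to list which files would actually be touched. The sketch below is read-only and assumes a boto3 S3 client is passed in (for example `boto3.client("s3")`); `find_mislabeled_pdfs` is a hypothetical helper name, and it uses the client's `list_objects_v2` paginator plus `head_object` rather than the resource API.

```python
def find_mislabeled_pdfs(s3_client, bucket, prefix, expected="application/pdf"):
    """Return (key, content_type) pairs for .pdf objects whose ContentType
    is not `expected`. Read-only: nothing in the bucket is modified.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    mislabeled = []
    # list_objects_v2 returns at most 1,000 keys per call; the paginator
    # keeps requesting pages until the listing is exhausted.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page (or the whole prefix) is empty.
        for item in page.get("Contents", []):
            key = item["Key"]
            # Match the extension regardless of letter case (".pdf", ".PDF").
            if not key.lower().endswith(".pdf"):
                continue
            # The listing does not include ContentType, so fetch the headers.
            content_type = s3_client.head_object(Bucket=bucket, Key=key).get("ContentType")
            if content_type != expected:
                mislabeled.append((key, content_type))
    return mislabeled
```

For example, `find_mislabeled_pdfs(boto3.client("s3"), "my-bucket", "pdfs/")` (bucket and prefix are placeholders) returns the keys a bulk update would rewrite, which makes it easy to sanity-check the scope first.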

I wrote the following Python Lambda function to do this.

```python
import boto3

s3_resource = boto3.resource("s3")
s3_client = boto3.client("s3")

DESTINATION_CONTENT_TYPE = "application/pdf"


def update_metadata_for_object(key, etag, content_type, bucket_name, user_metadata):
    # Copy the object onto itself with a new ContentType. With
    # MetadataDirective="REPLACE", S3 takes all metadata from this request,
    # so the existing user-defined metadata is passed back in to keep it.
    # CopySourceIfMatch makes the copy fail if the object changed after
    # its metadata was read.
    s3_client.copy_object(
        Bucket=bucket_name,
        Key=key,
        CopySource={"Bucket": bucket_name, "Key": key},
        ContentType=content_type,
        Metadata=user_metadata,
        CopySourceIfMatch=etag,
        MetadataDirective="REPLACE",
    )


def fetch_metadata(key, bucket_name):
    # HEAD request: returns ContentType, ETag, Metadata, ... without the body.
    return s3_client.head_object(Bucket=bucket_name, Key=key)


def handler(event, context):
    print("Initiating PDF metadata update ...")
    bucket_name = event["BUCKET_NAME"]  # bucket to scan, passed in the event
    prefix = event["BUCKET_KEY"]        # key prefix ("folder") to scan
    bucket = s3_resource.Bucket(name=bucket_name)

    # objects.filter() pages through every key under the prefix.
    for obj in bucket.objects.filter(Prefix=prefix):
        # Check only files with a .pdf extension (any letter case).
        if not obj.key.lower().endswith(".pdf"):
            continue
        metadata = fetch_metadata(obj.key, bucket_name)
        source_content_type = metadata.get("ContentType")
        # Skip files whose ContentType is already correct.
        if source_content_type == DESTINATION_CONTENT_TYPE:
            continue
        print("{} file has wrong ContentType: {}".format(obj.key, source_content_type))
        etag = metadata["ETag"]
        print("Updating PDF file {} -- ETag: {}".format(obj.key, etag))
        update_metadata_for_object(
            obj.key, etag, DESTINATION_CONTENT_TYPE, bucket_name,
            metadata.get("Metadata", {}),
        )
```

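The function above iterates over the PDF files under the given prefix, checks the metadata of each one, and, if its Content-Type is not “application/pdf”, copies the object onto itself with the corrected Content-Type. S3 performs the copy server-side, so the file is never downloaded to disk and re-uploaded. A single CopyObject request handles objects up to 5 GB, which covers typical PDF files.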
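One caveat with `MetadataDirective="REPLACE"` is that S3 takes the other system headers (Cache-Control, Content-Disposition, and so on) from the request too, and a copy without a storage class lands in STANDARD. The sketch below builds the `copy_object` keyword arguments so those values are carried over. `build_copy_params` and `PRESERVED_HEADERS` are hypothetical names, and the header list is my assumption of what PDFs typically carry; extend it (for example with `Expires`) if your objects use more.

```python
# Headers that a copy with MetadataDirective="REPLACE" takes from the request
# rather than from the source object (boto3 parameter names). This list is an
# assumption; add any other headers your objects rely on.
PRESERVED_HEADERS = (
    "CacheControl",
    "ContentDisposition",  # note: "attachment" here still forces a download
    "ContentEncoding",
    "ContentLanguage",
)


def build_copy_params(bucket, key, head, content_type="application/pdf"):
    """Build keyword arguments for s3_client.copy_object that rewrite the
    ContentType of `key` in place while carrying the listed headers over.

    `head` is the dict returned by s3_client.head_object for the same key.
    """
    params = {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "MetadataDirective": "REPLACE",
        "ContentType": content_type,
        # Without this, user-defined x-amz-meta-* values would be dropped.
        "Metadata": head.get("Metadata", {}),
        # Abort the copy if the object changed since head_object was called.
        "CopySourceIfMatch": head["ETag"],
    }
    for name in PRESERVED_HEADERS:
        if name in head:
            params[name] = head[name]
    # head_object omits StorageClass for STANDARD objects; any other class
    # is passed through so the copy does not silently move to STANDARD.
    if "StorageClass" in head:
        params["StorageClass"] = head["StorageClass"]
    return params
```

With a boto3 client, the call would look like `s3_client.copy_object(**build_copy_params(bucket, key, s3_client.head_object(Bucket=bucket, Key=key)))`. Keeping the parameter building in a pure function also makes it easy to unit-test without touching a real bucket.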
