How to change the metadata of S3 objects on the fly using a Python Lambda
I ran into a problem after we uploaded a set of old PDF files to an S3 bucket: some of the very old PDF files had the Content Type “application/octet-stream” rather than “application/pdf”.
There was no issue opening these files after downloading them from S3. The problem was that when I used a PDF's path as a link on a web page, clicking the link started a download instead of displaying the file in the browser. Our requirement was to display all PDF files in the browser, not to download them automatically.
After some research, we learned that the browser displays a PDF inline only when the file is served with the Content Type “application/pdf”. So we needed to update the Content Type to “application/pdf” for every PDF file that had any other Content Type.
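To confirm which Content Type S3 has stored for a single file before changing anything, a small check along these lines may help. This is only a sketch: the helper name get_content_type, the bucket name, and the key are hypothetical, and the client passed in is assumed to be a boto3 S3 client.

```python
def get_content_type(s3_client, bucket_name, key):
    """Return the Content Type S3 reports for one object.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    head_object reads only the object's headers, not its body.
    """
    response = s3_client.head_object(Bucket=bucket_name, Key=key)
    # Fall back to an empty string if the field is missing from the response.
    return response.get("ContentType", "")


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   print(get_content_type(boto3.client("s3"), "my-example-bucket", "docs/old-report.pdf"))
```

A result other than “application/pdf” for a PDF file indicates an object that needs the fix.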
I wrote the following Python Lambda function to make that update.
import boto3

s3_resource = boto3.resource("s3")
s3_client = boto3.client("s3")


def update_metadata_for_object(obj, etag, content_type, BUCKET_NAME, user_metadata=None):
    # S3 cannot edit metadata in place, so copy the object onto itself.
    # MetadataDirective="REPLACE" drops the existing metadata, so the
    # user-defined metadata is passed back in to keep it on the object.
    s3_client.copy_object(
        Key=obj.key,
        Bucket=BUCKET_NAME,
        CopySource={"Bucket": BUCKET_NAME, "Key": obj.key},
        ContentType=content_type,
        Metadata=user_metadata or {},
        # Fail the copy if the object changed after its metadata was read.
        CopySourceIfMatch=etag,
        MetadataDirective="REPLACE",
    )


def fetch_metadata(obj, BUCKET_NAME):
    # head_object returns the object's headers without downloading its body.
    metadata = s3_client.head_object(Bucket=BUCKET_NAME, Key=obj.key)
    return metadata


def handler(event, context):
    print("Initiating PDF metadata update ...")
    BUCKET_NAME = event["BUCKET_NAME"]  # Bucket to scan; must be set in the event.
    BUCKET_KEY = event["BUCKET_KEY"]  # Key prefix to scan; must be set in the event.
    bucket = s3_resource.Bucket(name=BUCKET_NAME)
    destination_content_type = "application/pdf"
    # Scan files under the prefix.
    for obj in bucket.objects.filter(Prefix=BUCKET_KEY):
        # Check only files with a PDF extension (case-insensitive).
        if obj.key.lower().endswith(".pdf"):
            # Fetch file metadata.
            metadata = fetch_metadata(obj, BUCKET_NAME)
            # Get file ContentType.
            source_content_type = metadata["ContentType"]
            # Check if ContentType is correct (application/pdf).
            if source_content_type != destination_content_type:
                print("{} file has wrong ContentType: {}".format(obj.key, source_content_type))
                etag = metadata["ETag"]
                target_object = s3_resource.Object(BUCKET_NAME, obj.key)
                print("Updating PDF file {} -- ETag: {}".format(obj.key, etag))
                update_metadata_for_object(
                    target_object, etag, destination_content_type, BUCKET_NAME,
                    user_metadata=metadata.get("Metadata"),
                )
The function above iterates over the PDF files under the given prefix and checks each file's metadata. If a file's Content Type is not “application/pdf”, it updates the Content Type by copying the object onto itself inside S3, so the file never has to be downloaded to disk and re-uploaded.
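The handler reads the bucket name and key prefix from the event, so a test event needs the two keys BUCKET_NAME and BUCKET_KEY. The sketch below shows what such an event might look like, plus a way to try the selection rule locally without touching AWS. The bucket name, the prefix, the sample keys, and the helper keys_needing_update are all hypothetical, and the helper's case-insensitive extension match is my assumption.

```python
PDF_CONTENT_TYPE = "application/pdf"

# Hypothetical test event for the Lambda; replace both values with your own.
event = {
    "BUCKET_NAME": "my-example-bucket",
    "BUCKET_KEY": "docs/",
}


def keys_needing_update(objects, expected=PDF_CONTENT_TYPE):
    """Return the keys of PDF files whose Content Type differs from expected.

    objects is an iterable of (key, content_type) pairs.
    """
    return [
        key
        for key, content_type in objects
        if key.lower().endswith(".pdf") and content_type != expected
    ]


# Made-up listing used only to exercise the rule.
sample = [
    ("docs/old-report.pdf", "application/octet-stream"),
    ("docs/new-report.pdf", "application/pdf"),
    ("docs/readme.txt", "text/plain"),
]
print(keys_needing_update(sample))  # ['docs/old-report.pdf']
```

Running the rule on a listing first gives a rough preview of which objects the Lambda would rewrite.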