Multipart uploads with S3 pre-signed URLs
Breaking an object into pieces, uploading them to S3 securely, and reassembling them into a single object
Originally published on Altostra
Let’s get on the same page
In my previous post, Working with S3 pre-signed URLs, I showed you how and why I used pre-signed URLs. This time I faced another problem: I had to upload a large file to S3 using pre-signed URLs. To do so, I had to use multipart upload, which is uploading a single object as a set of parts, with the added advantage of uploading those parts in parallel. In the following article, I’ll describe how I combined multipart upload with pre-signed URLs.
A normal multipart upload looks like this:
1. Initiate a multipart upload
2. Upload the object’s parts
3. Complete the multipart upload
For that to work with pre-signed URLs, we should add one more stage: generating pre-signed URLs for each part. The final process looks like this:
1. Initiate a multipart upload
2. Create pre-signed URLs for each part
3. Upload the object’s parts
4. Complete the multipart upload
Stages 1, 2, and 4 are server-side stages, which require an AWS access key ID and secret, while stage 3 is client-side.
Let’s explore each stage.
Before we begin
As in my previous post, in the examples below I use the constants BUCKET_NAME and OBJECT_NAME to represent my bucket and object - replace these with your own as you see fit. There are also placeholders for the accessKeyId and secretAccessKey; you can read about them here.
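For reference, here is a minimal sketch of these constants (the values below are made-up placeholders, not real resources):

// Placeholder values only - replace with your own bucket and object key
const BUCKET_NAME = 'my-example-bucket'
const OBJECT_NAME = 'videos/big-file.mp4'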
Stage one — Initiate a multipart upload
At this stage, we ask AWS S3 to initiate a multipart upload. In response, we receive the UploadId, which associates each uploaded part with the object being created.
import AWS from 'aws-sdk'
import cuid from 'cuid'

async function initiateMultipartUpload() {
  const s3 = new AWS.S3({
    accessKeyId: /* Bucket owner access key id */,
    secretAccessKey: /* Bucket owner secret */,
    sessionToken: `session-${cuid()}`
  })

  const params = {
    Bucket: BUCKET_NAME,
    Key: OBJECT_NAME
  }

  const res = await s3.createMultipartUpload(params).promise()
  return res.UploadId
}
Stage Two — Create pre-signed URLs for each part
This stage is responsible for creating a pre-signed URL for each part. Each pre-signed URL is signed with the UploadId and the PartNumber, which is why we have to create a separate pre-signed URL for each part. The pre-signed URLs are generated with the uploadPart method, which grants the user access to, obviously, upload a part.
import AWS from 'aws-sdk'

async function generatePresignedUrlsParts(s3: AWS.S3, uploadId: string, parts: number) {
  const baseParams = {
    Bucket: BUCKET_NAME,
    Key: OBJECT_NAME,
    UploadId: uploadId
  }

  const promises: Promise<string>[] = []

  for (let partNo = 1; partNo <= parts; partNo++) {
    promises.push(s3.getSignedUrlPromise('uploadPart', {
      ...baseParams,
      PartNumber: partNo
    }))
  }

  const res = await Promise.all(promises)

  const urls: Record<number, string> = {}
  for (let partNo = 0; partNo < parts; partNo++) {
    urls[partNo + 1] = res[partNo]
  }

  return urls
}
Stage Three — Upload the object’s parts
At this stage, we upload each part using the pre-signed URLs that were generated in the previous stage. You can see each part is set to be 10MB in size; S3 multipart upload doesn’t support parts smaller than 5MB (except for the last one). After uploading all parts, the ETag of each uploaded part needs to be saved. We will use these ETags in the next stage to complete the multipart upload process.
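One thing the upload function below takes as given is the number of parts. Here is a minimal sketch of how it could be derived from the file size and the 10MB chunk size (getNumberOfParts is a hypothetical helper, not part of the original code):

// Hypothetical helper: derive how many parts a buffer will be split into for a
// given chunk size (10MB here, matching FILE_CHUNK_SIZE below). Every part
// except the last must be at least 5MB.
function getNumberOfParts(file: Buffer, chunkSize = 10_000_000): number {
  return Math.ceil(file.length / chunkSize)
}

With the number of parts in hand, the upload function itself looks like this: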
import Axios from 'axios'

interface Part {
  ETag: string
  PartNumber: number
}

const FILE_CHUNK_SIZE = 10_000_000

async function uploadParts(file: Buffer, urls: Record<number, string>) {
  const axios = Axios.create()
  delete axios.defaults.headers.put['Content-Type']

  const keys = Object.keys(urls)
  const promises = []

  for (const indexStr of keys) {
    const index = parseInt(indexStr) - 1
    const start = index * FILE_CHUNK_SIZE
    const end = (index + 1) * FILE_CHUNK_SIZE
    // Every part gets a full chunk except the last one, which gets the remainder
    const blob = index < keys.length - 1
      ? file.slice(start, end)
      : file.slice(start)

    // The URLs are keyed by part number (1-based)
    promises.push(axios.put(urls[index + 1], blob))
  }

  const resParts = await Promise.all(promises)

  const parts: Part[] = []
  let i = 1
  for (const part of resParts) {
    parts.push({
      ETag: (part as any).headers.etag,
      PartNumber: i
    })
    i++
  }

  return parts
}
Stage Four — Complete multipart upload
The last stage’s job is to inform S3 that all the parts have been uploaded. By giving the completeMultipartUpload function the ETag of each part, S3 knows how to construct the object from the uploaded parts.
import AWS from 'aws-sdk'
import cuid from 'cuid'

interface Part {
  ETag: string
  PartNumber: number
}

async function completeMultiUpload(uploadId: string, parts: Part[]) {
  const s3 = new AWS.S3({
    accessKeyId: /* Bucket owner access key id */,
    secretAccessKey: /* Bucket owner secret */,
    sessionToken: `session-${cuid()}`
  })

  const params = {
    Bucket: BUCKET_NAME,
    Key: OBJECT_NAME,
    UploadId: uploadId,
    MultipartUpload: { Parts: parts }
  }

  await s3.completeMultipartUpload(params).promise()
}
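To tie the four stages together, here is a minimal sketch of how the functions above could be orchestrated (uploadLargeObject is a hypothetical wrapper and getNumberOfParts is the hypothetical helper from stage three; in a real setup stages 1, 2, and 4 run on the server and stage 3 on the client, while here everything is inlined for brevity):

async function uploadLargeObject(s3: AWS.S3, file: Buffer) {
  // Stage 1 - ask S3 for an UploadId
  const uploadId = await initiateMultipartUpload()
  if (!uploadId) throw new Error('S3 did not return an UploadId')

  // Stage 2 - one pre-signed URL per part
  const parts = getNumberOfParts(file)
  const urls = await generatePresignedUrlsParts(s3, uploadId, parts)

  // Stage 3 - upload the parts and collect their ETags
  const uploadedParts = await uploadParts(file, urls)

  // Stage 4 - tell S3 that all parts are in place
  await completeMultiUpload(uploadId, uploadedParts)
}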
Extra Stage — Avoid extra charges
If at any stage you want to abort the upload, you can do it with the abortMultipartUpload function (you can read about it here).
Be aware that after you initiate a multipart upload and upload one or more parts, you must either complete or abort the multipart upload to stop being charged for storing the uploaded parts. Amazon S3 frees up the space used to store the parts, and stops charging you for storing them, only after you either complete or abort the multipart upload.
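For completeness, here is a minimal sketch of an abort, assuming the same constants as above and a previously obtained uploadId (abortMultiUpload is a hypothetical wrapper, not part of the original post):

// Sketch only: abort a multipart upload so its stored parts stop accruing charges
async function abortMultiUpload(s3: AWS.S3, uploadId: string) {
  await s3.abortMultipartUpload({
    Bucket: BUCKET_NAME,
    Key: OBJECT_NAME,
    UploadId: uploadId
  }).promise()
}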
As a best practice, Amazon recommends configuring a lifecycle rule (using the AbortIncompleteMultipartUpload action) to minimize your storage costs.
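Here is a sketch of such a rule, set through the SDK; the rule ID, prefix, and seven-day window are arbitrary choices for illustration, not values from the original post:

// Sketch: automatically abort multipart uploads left incomplete for 7 days
async function configureAbortRule(s3: AWS.S3) {
  await s3.putBucketLifecycleConfiguration({
    Bucket: BUCKET_NAME,
    LifecycleConfiguration: {
      Rules: [{
        ID: 'abort-incomplete-multipart-uploads', // arbitrary rule name
        Status: 'Enabled',
        Filter: { Prefix: '' },                   // apply to the whole bucket
        AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }
      }]
    }
  }).promise()
}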