S3 Multipart Uploading

Building on what I learnt in S3 Uploading, I wanted to implement multipart uploading. This is a technique offered by cloud storage providers such as Amazon and DigitalOcean in which a file or piece of data is split into chunks, the chunks are sent to the cloud individually and asynchronously, and the provider stitches them back together once they have all arrived. Multipart uploading is very useful for large files: it can make uploads much quicker, it lets an upload be paused and resumed much later, and it tolerates a reasonable number of errors because a failed chunk can be retried without restarting the whole process.
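To make the chunking idea concrete, here is a minimal sketch of how a file might be divided into numbered parts before any network code is involved. The names `UploadPart` and `planParts` are hypothetical and not part of any SDK; the 5 MiB default is borrowed from the default part size in the library's `multipartUpload` signature quoted later in this post.

```swift
import Foundation

/// One part of a multipart upload: a 1-based part number and the byte range it covers.
/// (Hypothetical type, for illustration only.)
struct UploadPart: Equatable {
    let number: Int
    let range: Range<Int>
}

/// Splits a file of `fileSize` bytes into consecutive parts of at most `partSize` bytes.
/// The final part holds whatever is left over and may be smaller than the others.
func planParts(fileSize: Int, partSize: Int = 5 * 1024 * 1024) -> [UploadPart] {
    precondition(fileSize >= 0 && partSize > 0, "fileSize must not be negative and partSize must be positive")
    var parts: [UploadPart] = []
    var offset = 0
    while offset < fileSize {
        let end = min(offset + partSize, fileSize)
        parts.append(UploadPart(number: parts.count + 1, range: offset..<end))
        offset = end
    }
    return parts
}

// A 12 MiB file with 5 MiB parts: two full parts and one 2 MiB remainder.
let parts = planParts(fileSize: 12 * 1024 * 1024)
print(parts.count)              // 3
print(parts.last!.range.count)  // 2097152
```

Because each part covers an independent byte range, parts can be sent in any order and a failed part can be retried on its own, which is where the speed and robustness benefits come from.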

The reason for looking into this is that Stamp is mostly used to record full theatre productions. Recordings may be split between the acts of the play, but the video recordings themselves are still likely to be more than an hour long. The idea behind the cloud hosting is that a timeline recording can be uploaded straight after a production. Productions typically finish around 10:30pm, so I would not want the user waiting hours for the upload to complete before they can go home. The aim is to have the fastest and most robust upload process possible.

There are a few steps to the process.

  1. Create an S3 object to interact with an S3 bucket.
     let s3 = S3(accessKeyId: accessKeyId, secretAccessKey: secretAccessKey, sessionToken: sessionToken, region: .euwest2)
    
  2. Create a multipart upload request. The key, in AWS S3 parlance, is the name of the file; it needs to be unique.
     let multipartRequest = S3.CreateMultipartUploadRequest(bucket: bucketName, key: "filename.mov")
    
  3. Create a multipart upload.
     s3.createMultipartUpload(multipartRequest).always { result in
         ...
     }
    
  4. Once the multipart upload has been created successfully, begin the upload.
     s3.multipartUpload(multipartRequest, filename: filename, progress: { progress in
         ...
     })
    

The final very rough code looks like this:

    // Assumes `import Foundation` (for UUID) and the aws-sdk-swift `S3` module are imported.
    // `filename` is the path to the local file, not the name it will have in the bucket.
    private func multipartUpload(with accessKeyId: String, secretAccessKey: String, sessionToken: String, bucketName: String, filename: String) {
        print("Multipart Uploading to \(bucketName)...")
        let s3 = S3(accessKeyId: accessKeyId, secretAccessKey: secretAccessKey, sessionToken: sessionToken, region: .euwest2)
        let uuid = UUID()
        
        // The key, in AWS S3 parlance, is the name of the file; it needs to be unique.
        let multipartRequest = S3.CreateMultipartUploadRequest(bucket: bucketName, key: "\(uuid.uuidString).mov")
        // Note: according to the library's author (see the conversation below), multipartUpload
        // creates the upload itself, so this explicit createMultipartUpload call is probably redundant.
        _ = s3.createMultipartUpload(multipartRequest).always { result in
            switch result {
            case .success(let success):
                print("Created multipart upload...")
                print(success)
                // Upload the file part by part, reporting progress as it goes.
                _ = s3.multipartUpload(multipartRequest, filename: filename, progress: { progress in
                    print(progress)
                }).always { uploadResult in
                    switch uploadResult {
                    case .success(let output):
                        print(output)
                    case .failure(let uploadError):
                        print(uploadError.localizedDescription)
                    }
                }
            case .failure(let error):
                print("Failed to create a multipart upload: \(error.localizedDescription)")
            }
        }
    }

Challenges

I came across a few challenges due to my lack of understanding of the framework I was using. I initially didn’t understand that the filename parameter of the s3.multipartUpload() function is a path to a file rather than a name I might want to give it, but I was quickly put straight by the framework’s creator, Adam Fowler. He also kindly implemented a feature I requested: resuming a multipart upload that has been interrupted, for example because the internet connection dropped or the laptop ran out of battery and shut down.
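One way to avoid mixing up the local path and the object key is to check the path up front and derive the key from it. The sketch below uses only Foundation; `makeObjectKey(forFileAt:)` and `UploadPreflightError` are hypothetical names introduced for illustration and are not part of aws-sdk-swift.

```swift
import Foundation

/// Hypothetical error for problems found before an upload starts.
enum UploadPreflightError: Error, Equatable {
    case fileNotFound(String)
}

/// Checks that `path` points at an existing regular file, then returns a unique
/// object key that keeps the file's extension, in the form "<uuid>.<extension>".
func makeObjectKey(forFileAt path: String) throws -> String {
    var isDirectory: ObjCBool = false
    guard FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory),
          !isDirectory.boolValue else {
        throw UploadPreflightError.fileNotFound(path)
    }
    let ext = URL(fileURLWithPath: path).pathExtension
    let uuid = UUID().uuidString
    return ext.isEmpty ? uuid : "\(uuid).\(ext)"
}

// Usage: create a small temporary file and derive a key for it.
let demoURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("act-one.mov")
FileManager.default.createFile(atPath: demoURL.path, contents: Data([0, 1, 2]))
let demoKey = try makeObjectKey(forFileAt: demoURL.path)
print(demoKey.hasSuffix(".mov"))  // true
```

The path would then be passed as `filename` and the returned key as `key`, keeping the two roles clearly separate.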

Conversation with Adam Fowler on Slack:

Sammy Smallman 12:03 PM Hey would anyone be able to talk me through uploading a file through multipart upload. I think I’ve got to the point where I’ve created a multipartUploadRequest and then receive an uploadId. I understand there is

multipartUpload(_ input: CreateMultipartUploadRequest, 
  partSize: Int = 5*1024*1024, 
  filename: String, 
  on eventLoop: EventLoop? = nil, 
  progress: @escaping (Double) throws->() = { _ in }
) -> EventLoopFuture<CompleteMultipartUploadOutput>

But i’m not quite sure how where I’m telling each upload request the chunked data to be uploaded…

Adam Fowler 12:05 PM Hey @Sammy Smallman that function does it all for you. It creates the multipart upload, and then uploads each part and once finished it completes the multipart upload.

Sammy Smallman 12:05 PM So is filename a path to an actual file?

Adam Fowler 12:05 PM yes

Sammy Smallman 12:05 PM ahh thats where im going wrong, perfect

Sammy Smallman 12:10 PM and for instance lets say that some but not all parts were uploaded as it was interrupted and I wanted to then upload those last parts at a later time, what would be the best way to just do those last few and then complete the upload.

Adam Fowler 12:12 PM err you are on your own there. If you want to support that you would need to write your own version of multipartUpload. Sorry. I can look to add support for that in v5 of aws-sdk-swift

Adam Fowler 12:13 PM Add an issue

Sammy Smallman 12:16 PM It may not be an issue at the moment. I’m mainly thinking of an application that uploads large video files but there might be instances where the connection is lost or maybe the user would want to upload part of it and then the next morning carry on completing it. But if these files can be uploaded quickly then it may not be a problem at all. I’ll add an issue after some testing. Thanks so much for your help @Adam Fowler.

Adam Fowler 12:20 PM I think once v5 is out you should get better support for that. I know in v4 we have had issue with failed uploads, having experienced it myself. v5 has support for retrying failed requests, but there isn’t a resume multipart upload yet. As I said add an issue and I’ll include it in the tasks for v5. It is almost ready for release, so you would see it soon.

Sammy Smallman 12:21 PM Cool, will do then! Thanks
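
The core of "just do those last few" is a simple set difference: S3 can report which part numbers it already holds for an upload (the ListParts operation), and everything else still needs sending. The function below is a rough, library-independent sketch of that step; `remainingParts` is a hypothetical helper, not an aws-sdk-swift API, and it assumes the part size is the same as in the original attempt so that part numbers line up.

```swift
/// Returns the 1-based part numbers that still need uploading, in ascending order.
/// `alreadyUploaded` stands in for the part numbers the storage service reports as received.
func remainingParts(totalParts: Int, alreadyUploaded: Set<Int>) -> [Int] {
    guard totalParts > 0 else { return [] }
    return (1...totalParts).filter { !alreadyUploaded.contains($0) }
}

// Six parts in total; parts 1, 2 and 4 arrived before the connection dropped.
print(remainingParts(totalParts: 6, alreadyUploaded: [1, 2, 4]))  // [3, 5, 6]
```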


Adam Fowler Sep 2nd at 2:52 PM Hi @Sammy Smallman just added resumeMultipartUpload which will resume a multipart upload that previously failed and only upload the parts that haven’t already been uploaded. https://github.com/swift-aws/aws-sdk-swift/pull/358

Sammy Smallman 3 months ago That was fast! I’m just looking at the commit now. Do i understand this right? Multipart upload will now return an error to say its failed but wont abort the upload (If i set abortOnFail to false). If i then use resumeMultipartUpload with the same uploadRequest, or more specifically with an upload request with the same uploadID, the function queries AWS to see which parts have already been uploaded (I think this is what ListPartsRequest is for?) and then will continue to upload what is left.

Adam Fowler 3 months ago Yes you right

Adam Fowler 3 months ago If you check the PR I added a test which shows it being used.

Sammy Smallman 3 months ago Is it possible to create a MultipartUploadRequest with a specified uploadID, or if i use the same key on the same bucket would it return the same uploadID? I’m thinking about whether i need to persist the multipart request if an upload fails, or would i just get a new request?

Adam Fowler 3 months ago You can create a new MultipartUploadRequest object. The uploadId is separate from the MultipartUploadRequest.

Sammy Smallman 3 months ago Ah I now see the uploadID parameter in the resume function! This looks brilliant, I can’t wait to test it out. What would happen if you attempted to upload a different file to the one previously. I presume I should do those checks myself. Does AWS do anything as well?

Adam Fowler 3 months ago It would upload the sections of the other file. I could probably change the API to avoid this issue

Adam Fowler 3 months ago Not sure what you mean by does AWS do anything? If you mean check for uploading the wrong, I believe it would be nothing. Not sure how they would be able to tell.

Sammy Smallman 3 months ago I wondered if there was some checksum but on reflection in the request you’re just telling AWS a key and which bucket so it doesn’t know anything about the file as a whole until you complete the multipart upload, right? I’m more than happy to do the file checking. This is so good, thanks so much for your work on it @Adam Fowler.

Adam Fowler 3 months ago I think I’ll still do the change. It’ll protect against bad use of the functions and save me hassle.

Adam Fowler 3 months ago https://github.com/swift-aws/aws-sdk-swift/pull/362