Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.5k views
in Technique[技术] by (71.8m points)

amazon web services - Technically what is the difference between s3n, s3a and s3?

I'm aware of the existence of https://wiki.apache.org/hadoop/AmazonS3 and the following words:

S3 Native FileSystem (URI scheme: s3n) A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3.

S3A (URI scheme: s3a) A successor to the S3 Native, s3n fs, the S3a: system uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema.

S3 Block FileSystem (URI scheme: s3) A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.

Why a letter change on the URI could make such difference? For example

val data = sc.textFile("s3n://bucket-name/key")

to

val data = sc.textFile("s3a://bucket-name/key")

What is the technical difference underlying this change? Are there any good articles that I can read on this?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The letter change on the URI scheme makes a big difference because it causes different software to be used to interface to S3. Somewhat like the difference between http and https - it's only a one-letter change, but it triggers a big difference in behavior.

The difference between s3 and s3n/s3a is that s3 is a block-based overlay on top of Amazon S3, while s3n/s3a are not (they are object-based).

The difference between s3n and s3a is that s3n supports objects up to 5GB in size, while s3a supports objects up to 5TB and has higher performance (both are because it uses multi-part upload). s3a is the successor to s3n.

If you're here because you want to understand which S3 file system you should use with Amazon EMR, then read this article from Amazon (only available on wayback machine). The net is: use s3:// because s3:// and s3n:// are functionally interchangeable in the context of EMR, while s3a:// is not compatible with EMR.

For additional advice, read Work with Storage and File Systems.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
...