Using AWS Snowball to Move Large (TB-Scale) Data Workloads into an Amazon FSx File System
Short answer: yes. You can use AWS Snowball to move several terabytes of data into an FSx file system. In most cases the path is Snowball → S3 → FSx, with service-specific nuances described below.
1) When Snowball Makes Sense
AWS Snowball is built for offline, terabyte- to petabyte-scale migrations. It’s ideal when:
- Network bandwidth is limited or expensive
- You need to seed large datasets quickly (weeks of transfer time avoided)
- You want a predictable, shippable transfer workflow
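Ordering the device is itself an API call. As a rough sketch, the import-job request can be assembled for boto3's `snowball.create_job`; the bucket ARN, address ID, role ARN, and capacity preference below are placeholder assumptions, not values from this article:

```python
# Sketch: assemble an AWS Snowball import job request (device -> S3).
# All identifiers here are hypothetical placeholders.
def build_snowball_job(bucket_arn: str, address_id: str, role_arn: str) -> dict:
    """Build keyword arguments for boto3's snowball.create_job."""
    return {
        "JobType": "IMPORT",  # data flows from the device into S3
        "Resources": {"S3Resources": [{"BucketArn": bucket_arn}]},
        "AddressId": address_id,  # shipping address registered with AWS
        "RoleARN": role_arn,      # IAM role Snowball assumes for the import
        "SnowballCapacityPreference": "T80",  # 80 TB Edge Storage Optimized
        "ShippingOption": "SECOND_DAY",
    }

# Usage (uncomment and substitute real values):
# import boto3
# snowball = boto3.client("snowball")
# job = snowball.create_job(**build_snowball_job(
#     "arn:aws:s3:::my-seed-bucket", "ADID-xxxx",
#     "arn:aws:iam::123456789012:role/SnowballImport"))
```

Separating parameter assembly from the API call keeps the request easy to review before anything is shipped.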
2) FSx Type-Specific Guidance
| FSx Type | Can You Use Snowball? | Typical Method | Notes |
|---|---|---|---|
| FSx for Windows File Server | ✅ Yes (indirect) | Snowball Edge → S3 → FSx | Load to S3 via Snowball, then copy to FSx using AWS DataSync or Robocopy from an EC2/Windows host. |
| FSx for Lustre | ✅ Yes (optimized) | S3-linked FSx for Lustre | Put data in S3 via Snowball, then link/import with data repository tasks or at file system creation. |
| FSx for NetApp ONTAP | ✅ Yes (indirect) | Snowball → S3 → FSx (NFS/SMB copy) | Copy from S3 to FSx using rsync, robocopy, or leverage SnapMirror if you have a source NetApp. |
| FSx for OpenZFS | ⚠️ Partially | Snowball → EC2 staging → FSx (NFS) | Stage from S3 onto EC2, then write to OpenZFS over NFS; consider parallelization for throughput. |
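For the Lustre row, linking the Snowball-seeded bucket can be done with a data repository association. A minimal sketch of the parameters for `fsx.create_data_repository_association` (the file system ID, mount path, and bucket name are hypothetical):

```python
# Sketch: link an FSx for Lustre file system to the S3 bucket that
# Snowball populated. The file system ID, path, and bucket are placeholders.
def build_lustre_s3_link(file_system_id: str, bucket: str) -> dict:
    """Build keyword arguments for fsx.create_data_repository_association."""
    return {
        "FileSystemId": file_system_id,
        "FileSystemPath": "/ingest",             # where the bucket surfaces in Lustre
        "DataRepositoryPath": f"s3://{bucket}",  # the Snowball-seeded bucket
        "S3": {
            # Keep the namespace in sync as objects are added/changed/deleted in S3.
            "AutoImportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
        },
    }

# Usage (uncomment and substitute real values):
# import boto3
# fsx = boto3.client("fsx")
# fsx.create_data_repository_association(**build_lustre_s3_link("fs-0abc123", "my-seed-bucket"))
```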
3) Reference Workflow (Windows or ONTAP)
- Order an AWS Snowball Edge device sized for your dataset.
- Copy on-prem data to the Snowball device.
- Ship the device back; AWS ingests into your target S3 bucket.
- Provision the FSx file system (Windows, ONTAP, Lustre, or OpenZFS) in the target VPC.
- Move S3 → FSx using:
- AWS DataSync (supports SMB/NFS/Lustre) for managed, parallel transfer and verification
  - Or EC2-hosted tools such as robocopy, xcopy, or rsync
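The DataSync leg of the workflow above can be sketched as a task between two pre-created locations. The location ARNs here are placeholders; in practice you would create them first with `create_location_s3` and `create_location_smb` (or `create_location_nfs`):

```python
# Sketch: parameters for datasync.create_task copying S3 -> FSx.
# Both location ARNs are hypothetical placeholders.
def build_datasync_task(s3_location_arn: str, fsx_location_arn: str) -> dict:
    """Build keyword arguments for boto3's datasync.create_task."""
    return {
        "SourceLocationArn": s3_location_arn,
        "DestinationLocationArn": fsx_location_arn,
        "Name": "s3-to-fsx-seed",
        "Options": {
            "VerifyMode": "POINT_IN_TIME_CONSISTENT",  # checksum verify after copy
            "OverwriteMode": "ALWAYS",
            "TransferMode": "CHANGED",  # re-runs copy only deltas (cutover syncs)
        },
    }

# Usage (uncomment and substitute real values):
# import boto3
# datasync = boto3.client("datasync")
# task = datasync.create_task(**build_datasync_task(src_arn, dst_arn))
```

Setting `TransferMode` to `CHANGED` lets the same task serve both the initial seed and the later delta syncs mentioned in the tips below.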
Example: Event-Driven Auto-Tagging of New EC2 Instances (optional helper for staging hosts)
Use an EventBridge rule on the EC2 Instance State-change Notification (state `running`) to trigger a Lambda that tags staging copy hosts. (A rule on the RunInstances CloudTrail event also works, but that payload nests instance IDs under `responseElements` rather than exposing `detail["instance-id"]` as below.)

```python
import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # EC2 Instance State-change Notification events carry the ID directly.
    instance_id = event["detail"]["instance-id"]
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[
            {"Key": "Purpose", "Value": "FSx-seed"},
            {"Key": "AutoTagged", "Value": "true"},
        ],
    )
```
4) Snowball vs. Online Transfer
| Scenario | Recommended Method |
|---|---|
| < 5 TB and ≥ 1 Gbps sustained | Online via AWS DataSync |
| 5–100 TB (one-time or burst) | AWS Snowball Edge |
| > 100 TB or ongoing ingestion | DataSync + Direct Connect or multiple Snowballs |
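The break-even points in the table can be sanity-checked with simple arithmetic. A sketch assuming 80% sustained link utilization (that efficiency figure is an assumption, not a value from this article):

```python
def online_transfer_days(terabytes: float, gbps: float, efficiency: float = 0.8) -> float:
    """Estimated days to move `terabytes` over a `gbps` link at the given utilization."""
    bits = terabytes * 1e12 * 8                 # decimal TB -> bits
    seconds = bits / (gbps * 1e9 * efficiency)  # effective throughput in bits/s
    return seconds / 86400

# 50 TB over a sustained 1 Gbps link at 80% utilization works out to roughly
# 5.8 days of continuous transfer, which is why the table steers one-time
# 5-100 TB moves toward Snowball Edge once shipping turnaround is acceptable.
```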
5) Practical Tips
- Pre-compress/dedupe to reduce bytes shipped.
- Design a consistent directory layout (S3 → FSx mapping is simpler).
- Use DataSync filtering and incremental jobs for cutover deltas.
- Confirm permissions/ACLs (NTFS for Windows, POSIX/NFS for others) after transfer.
- Plan a final delta sync just before production cutover.
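To support the verification tips above, one quick spot-check is comparing local MD5 digests against S3 ETags. Note the caveat: an S3 ETag equals the object's MD5 only for single-part, non-KMS-encrypted uploads; multipart ETags use a different scheme.

```python
import hashlib

def md5_hex(path: str) -> str:
    """Streamed MD5 of a local file, for spot-checking against S3 ETags.

    Caveat: valid only against single-part, non-SSE-KMS uploads;
    multipart-upload ETags are not plain MD5 digests.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()
```

Streaming in chunks keeps memory flat even when spot-checking multi-gigabyte files on the staging host.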
Bottom line: For a 50 TB migration, Snowball is a great fit. The common pattern is Snowball → S3 → FSx, with FSx for Lustre offering the most streamlined S3 integration and DataSync providing managed, parallelized copies for Windows, ONTAP, and OpenZFS.