If your company uses Amazon Web Services (AWS) S3 buckets for data storage, you have probably encountered times when you want to copy objects from one bucket to another without writing scripts to do it. In our business, moving data within S3 is a frequent request across a variety of use cases and environments. In our journey helping customers use S3 more effectively, we saw a few challenges no one else was solving, including:

  • Moving data between S3 buckets owned by different customers or organizations, each with their own credentials
  • Moving data to and from buckets outside the global S3 namespace, specifically GovCloud (US) and China (Beijing)
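To make the first challenge concrete, here is a minimal sketch of one way to copy an object between buckets that require different credentials, written by us for illustration (it is not B23 Data Platform's internal implementation). The profile and bucket names are hypothetical. Because the object is streamed down with one credential set and back up with another, the same approach works across partitions such as GovCloud, where a server-side copy is not possible.

```python
# Sketch: cross-account / cross-partition object copy with boto3.
# Profile names are hypothetical AWS CLI named profiles.

def copy_across_accounts(src_bucket, dest_bucket, key,
                         src_profile="source-account",
                         dest_profile="dest-account"):
    import boto3  # deferred so the sketch can be defined without AWS installed

    # Two independent sessions, one per credential set.
    src = boto3.Session(profile_name=src_profile).client("s3")
    dest = boto3.Session(profile_name=dest_profile).client("s3")

    # get_object returns a streaming body; upload_fileobj reads it in
    # chunks, so the whole object never has to fit in memory.
    body = src.get_object(Bucket=src_bucket, Key=key)["Body"]
    dest.upload_fileobj(body, dest_bucket, key)
```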

One more feature we added is an “always-on” streaming data pipeline from the source S3 bucket to the destination S3 bucket. This facilitates the automated transfer of fresh data from source to destination: drop new data into your source bucket, and only that new data will be automatically replicated to your destination bucket.
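One simple way to get this "only the new data" behavior (our own sketch, not necessarily how the platform does it) is to diff the two buckets' listings on each pass and copy just the keys the destination is missing:

```python
def keys_to_replicate(source_keys, dest_keys):
    """Return the source keys not yet present in the destination.

    A polling mirror would list both buckets (e.g. with boto3's
    list_objects_v2 paginators), call this, and copy only the
    returned keys, leaving already-replicated objects untouched.
    """
    return sorted(set(source_keys) - set(dest_keys))
```

For example, `keys_to_replicate(["a.csv", "b.csv"], ["a.csv"])` returns `["b.csv"]`, so only the newly arrived object gets copied.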

Our S3 streaming replication is part of a larger capability within our data platform to stream data from any supported B23 Data Platform source to an S3 bucket destination. This means users can automatically set up data pipelines from public data providers and stream that data to an S3 bucket of their choice. From there, ingesting or accessing that data in S3 with our supported data science tools is trivial.

In this post, we present an example of how to configure and enable this capability (no command line needed) in under 30 seconds.


Our source S3 bucket: b23-source
The bucket b23-destination is initially empty.


The steps to link these buckets are so quick and easy that we can present them in a single animated GIF.

The full S3 Mirroring Process on B23 Data Platform in action

Status update shows our S3 mirroring job is successfully running

Now b23-destination is a carbon copy of b23-source.

Going forward, all files added to b23-source will be automatically copied to b23-destination.
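If you wanted to build this kind of "copy on arrival" behavior yourself, a common pattern is to subscribe a small function to the source bucket's `s3:ObjectCreated` event notifications, for instance as an AWS Lambda handler. The sketch below is ours, not B23's implementation, and the destination bucket name is hardcoded for illustration. Note that the server-side `copy_object` call shown here only works within a single partition; a cross-partition mirror would stream the bytes as in the earlier sketch.

```python
DEST_BUCKET = "b23-destination"  # hypothetical destination bucket

def new_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    return [(r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
            for r in event.get("Records", [])]

def handler(event, context):
    """Lambda entry point: mirror each newly created object."""
    import boto3
    s3 = boto3.client("s3")
    for src_bucket, key in new_objects(event):
        # Server-side copy; no data flows through the function itself.
        s3.copy_object(Bucket=DEST_BUCKET, Key=key,
                       CopySource={"Bucket": src_bucket, "Key": key})
```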

In minutes, B23 Data Platform enables data scientists to discover, access, and securely analyze their data in the cloud—with the freedom to use familiar tools like Apache Spark, Apache Zeppelin, RStudio, H2O, and Jupyter. Learn how B23 Data Platform can help your organization at www.b23.io or reach out to us at info@b23.io.

About the Author: Andrew Burkard is a Data Scientist with B23 LLC working on the B23 Data Platform. He has a B.S. in Computer Science and Mathematics from Virginia Tech and is currently pursuing an M.S. in Data Science from Georgetown University. When he’s not writing code, he enjoys watching sports and cracking jokes.