S3 Service
S3 Service Overview
The S3 service is a versatile solution suitable for a wide range of use cases, including basic data storage, automated backups, and various data-handling applications.
Access to the service is managed through virtual organizations and corresponding groups. S3 is ideal for sharing data between individual users and groups, which may include members from different institutions. Tools for managing users and groups are provided by the e-infrastructure.
Users with access to S3 can be individuals or “service accounts” (e.g., backup machines, as many modern backup tools natively support S3 connections). Data in S3 is organized into buckets, which are typically linked to the logical structure of your data workflow, such as different stages of data processing.
For sensitive data, encrypted buckets can be used on the client side, ensuring that even the storage manager does not have access to the data. Client-side encryption also secures data transmission over the network, protecting it from being decrypted in case of eavesdropping.
Basic Terms Definition
- S3 Bucket: A storage container for objects within the Simple Storage Service (S3). Buckets are similar to file folders in object storage.
- Object: The data stored in a bucket, consisting of:
- Content: The data itself.
- Metadata: Information such as size, name, last modified date, and URL.
- Unique Identifier: A unique ID that distinguishes the object.
How to Get the S3 Service?
To connect to the S3 service, contact support at: support@cesnet.cz
S3 Elementary Use Cases
The following section describes various elementary use cases related to the S3 service.
Automated Backup of Large Datasets Using Tools Natively Supporting S3
If you use specialized automated backup tools like Veeam, Bacula, or Restic, many of these tools offer native integration with the S3 service for backups. This means you don’t need to worry about connecting block devices to your infrastructure. Simply request an S3 storage setup and reconfigure your backup process. You can also combine this with the WORM (Write Once, Read Many) model to protect against unwanted overwriting or ransomware attacks.
Data Sharing Across Your Laboratory or Multiple Institutions
For research groups that need to share data, such as data collection and post-processing, S3 can be a powerful solution. The S3 service enables users to share data within a group or between institutions. This use case assumes each user has their own access to the repository. It is also ideal for sharing sensitive data across organizations, especially if you do not have a secure VPN. You can use encrypted S3 buckets (client-side encryption), which ensures that data is encrypted both at rest and during transmission, protecting it from eavesdropping.
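One common way to get client-side encryption on top of an S3 bucket is rclone's crypt backend layered over an S3 remote. The sketch below shows the shape of such a configuration; the remote names, endpoint, bucket, and credentials are illustrative placeholders, so substitute the values issued for your S3 account.

```ini
# ~/.config/rclone/rclone.conf  (names, endpoint, and keys are placeholders)

[cesnet-s3]
type = s3
provider = Ceph
endpoint = https://s3.example.cz
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY

[cesnet-s3-crypt]
type = crypt
remote = cesnet-s3:my-encrypted-bucket
filename_encryption = standard
directory_name_encryption = true
# Generate with `rclone config` or `rclone obscure`; do not store plain text.
password = OBSCURED_PASSWORD
```

With this in place, `rclone copy /data cesnet-s3-crypt:` encrypts file contents and names locally before anything leaves your machine, so neither the storage operator nor a network eavesdropper can read the data.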
Data Management for Systems like Learning Management Systems, Catalogs, and Repositories
If you manage large datasets and operate applications within an e-infrastructure that serves data to your users, S3 can support this use case effectively. This is particularly useful for applications that distribute large datasets (e.g., raw scans, videos, scientific datasets for computational environments) to end users. With S3, there is no need to upload data to the application server itself. Instead, users can directly upload and download data to/from object storage using S3 presigned URLs.
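To make the presigned-URL mechanism concrete, the following standard-library-only Python sketch builds an AWS Signature V4 presigned GET URL. The endpoint, bucket, key, and credentials are hypothetical; in a real application you would typically let an S3 SDK (e.g., boto3's `generate_presigned_url`) do this signing for you.

```python
import datetime
import hashlib
import hmac
import urllib.parse


def presign_get_url(endpoint, bucket, key, access_key, secret_key,
                    region="us-east-1", expires=3600, now=None):
    """Build an AWS Signature V4 presigned GET URL (query-string auth)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = urllib.parse.urlparse(endpoint).netloc
    scope = f"{datestamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    # Canonical request: method, URI, query, headers, signed headers, payload.
    canonical_request = "\n".join([
        "GET",
        f"/{bucket}/{key}",
        canonical_query,
        f"host:{host}\n",      # canonical headers (each ends with a newline)
        "host",                # signed header list
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])

    def _hmac(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()

    # Derive the signing key through the SigV4 HMAC chain.
    signing_key = _hmac(_hmac(_hmac(_hmac(
        ("AWS4" + secret_key).encode(), datestamp), region), "s3"),
        "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return f"{endpoint}/{bucket}/{key}?{canonical_query}&X-Amz-Signature={signature}"
```

Anyone holding such a URL can download the object until the link expires (here, one hour), without needing S3 credentials of their own.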
Personal Storage Space for Your Data
This use case is similar to the VO (Virtual Organization) storage service. It provides a personal space within S3 for your data, which is not shared with any specific user. You can configure public read access for the bucket or use presigned URLs to allow temporary access.
Dedicated S3 Endpoint for Special Applications
This service is designed for selected customers or users. A dedicated S3 endpoint can be created for critical systems to protect against DDoS (Distributed Denial of Service) attacks. The endpoint will be hidden from external users, and only authorized insiders will have access to it.
Any Other Application
If you need a combination of the services listed above or have an innovative idea for utilizing object storage services, don’t hesitate to reach out to us for further assistance.
S3 Data Reliability (Data Redundancy): Replicated vs. Erasure Coding
The section below describes the data-redundancy approaches applied to the object storage pool. The S3 service can be configured with replicated or erasure-coded (EC) redundancy.
Replicated
With replication, your data is stored in three copies within the data center. If one copy is corrupted, the original data remains intact and readable, while the damaged copy is restored in the background. Using replication also allows for faster read speeds since data can be retrieved from all replicas simultaneously. However, the write speed may be slower because the write operation waits for confirmation from all three replicas.
Suitable for?
This method is ideal for smaller volumes of live data where read speed is a priority; it is less well-suited to large data volumes.
Erasure Coding (EC)
Erasure coding (EC) is a data protection technique similar to dynamic RAID found in disk arrays. In EC, data is split into individual fragments and distributed across storage with built-in redundancy. If a disk or an entire storage server fails, the data remains accessible and is automatically restored in the background. This method ensures that your data isn’t stored on a single disk that could fail and result in data loss.
Suitable for?
Erasure coding is well-suited for storing large data volumes and datasets.
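As a toy illustration of the erasure-coding idea (not the actual scheme used by the storage pool, which employs configurable k+m codes), the Python sketch below splits data into two fragments plus one XOR parity fragment: any single lost fragment can be rebuilt from the remaining two.

```python
def split_with_parity(data: bytes):
    """Split data into two fragments and one XOR parity fragment (k=2, m=1)."""
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")  # pad to equal length
    parity = bytes(x ^ y for x, y in zip(a, b))
    return a, b, parity


def recover(a, b, parity, original_len: int) -> bytes:
    """Rebuild the original data when any single fragment is lost (None)."""
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))   # a = b XOR parity
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))   # b = a XOR parity
    return (a + b)[:original_len]
```

A real deployment distributes many such fragments across disks and servers, so the pool tolerates the configured number of simultaneous failures while storing far less than three full copies of the data.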