Keep HPLC data in S3 (AWS, Google Cloud, Azure, etc.)
Chromatography and Mass Spec data can be very heavy, especially spectra. Instead of keeping it all in Postgres, Peaksel can store signals in S3-compatible blob storage (AWS, Google Cloud, Azure Object Storage, Hetzner).
To activate this, add or uncomment the following properties in docker-compose.yml (a sketch of how they might fit into the file follows):
```yaml
s3.blobs.access_key_id: [key id]
s3.blobs.access_key: [key content]
s3.blobs.bucket: [s3 bucket]
s3.blobs.endpoint: [s3 root URL]
s3.blobs.region: [region]
s3.provider: [cloud provider]
```
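Where exactly these properties end up depends on your deployment. A minimal sketch, assuming they are passed as environment entries of the Peaksel service (the service name, image, and all values below are illustrative placeholders, not the official compose file):

```yaml
services:
  peaksel:                          # hypothetical service name
    image: elsci/peaksel:latest     # illustrative image reference
    environment:
      s3.blobs.access_key_id: AKIAEXAMPLEKEYID   # placeholder credentials
      s3.blobs.access_key: example-secret-key
      s3.blobs.bucket: peaksel-blobs
      s3.blobs.region: us-east-1
      s3.provider: aws              # endpoint is omitted because it is ignored for aws
```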
By default, Peaksel first stores binaries in Postgres and migrates them to S3 after some time (e.g. a day). It's possible to control the following (see the example below):
- How old blobs should be before they are migrated to S3: `job.s3.blobs.older_than_seconds`, default is 1 day
- The minimum size of a blob to be considered for migration: `job.s3.blobs.min_size_bytes`, default is 64 kB
- The minimum size after which an object is uploaded directly to S3, bypassing Postgres: `s3.blob.immediate_upload_threshold_bytes`, default is 1 GB
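For example, to migrate blobs larger than 1 MB after one hour, and to upload anything above 100 MB straight to S3, the overrides could look like this (the numbers are illustrative, not recommendations):

```yaml
job.s3.blobs.older_than_seconds: 3600                 # migrate blobs older than 1 hour
job.s3.blobs.min_size_bytes: 1048576                  # only blobs of 1 MB and larger are migrated
s3.blob.immediate_upload_threshold_bytes: 104857600   # 100 MB and larger bypass Postgres entirely
```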
Cloud providers
The cloud provider property (`s3.provider`) has the following possible values:
- `aws` - use AWS cloud. With this option, the `s3.blobs.endpoint` property will be ignored.
- `gcp` - use Google Cloud. For using Google Cloud Storage buckets, a default project must be set in the interoperability settings. With this option, the `s3.blobs.endpoint` property will be ignored.
- `hetzner` - use Hetzner cloud. With this option, the `s3.blobs.endpoint` property will be ignored.
- `azures3` - support for Azure Blob Storage (see the Azure Blob Storage section below).
- `other` - use other S3-compatible cloud providers. `s3.blobs.endpoint` is mandatory with this option; it must be the complete URL, including the region, bucket and anything else that is present, and those parts must match the values in `s3.blobs.bucket` and `s3.blobs.region` (see the example below). Support is not guaranteed; if you have any issues with your provider, please contact us so we can add support for it.

If no value or an invalid value is provided, it defaults to `other`.
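For instance, a self-hosted MinIO deployment would use the `other` provider. The endpoint format and all values below are hypothetical and need to be adapted to your installation:

```yaml
s3.blobs.access_key_id: minio-access-key     # placeholder credentials
s3.blobs.access_key: minio-secret-key
s3.blobs.bucket: peaksel-blobs
s3.blobs.endpoint: https://minio.example.com/peaksel-blobs   # complete URL, matching the bucket above
s3.blobs.region: us-east-1
s3.provider: other
```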
Azure Blob Storage
Peaksel can interact with Azure via S3Proxy. We don't support Azure natively because its API is not S3-compatible; if you have any issues with Azure Blob Storage, please contact us about native support. S3Proxy can be run as a Docker container:
```yaml
services:
  s3proxy:
    restart: always
    container_name: s3proxy
    image: andrewgaul/s3proxy:sha-b6ce601
    ports:
      - "9000:80"
    expose:
      - "80"
    environment:
      - LOG_LEVEL=info                        # can be set to debug or trace for troubleshooting
      - S3PROXY_ENDPOINT=http://0.0.0.0:80
      - S3PROXY_IDENTITY=local-identity       # value that will be used as access_key_id in Peaksel
      - S3PROXY_CREDENTIAL=local-credential   # value that will be used as access_key in Peaksel
      - S3PROXY_AUTHORIZATION=none            # signature authentication; for Peaksel, set 'none'
      - JCLOUDS_PROVIDER=azureblob-sdk
      - JCLOUDS_IDENTITY=serviceaccount       # name of azure service account
      - JCLOUDS_CREDENTIAL=accesskey          # secret key
      - JCLOUDS_ENDPOINT=https://serviceaccount.blob.core.windows.net  # https://<service-account-name>.blob.core.windows.net
```
In Peaksel, the corresponding properties are:
```yaml
s3.blobs.access_key_id: local-identity
s3.blobs.access_key: local-credential
s3.blobs.bucket: [azure blob container name]
s3.blobs.endpoint: http://localhost:9000
s3.blobs.region: [azure blob container region]
s3.provider: azures3
```
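Note that `http://localhost:9000` assumes Peaksel can reach the proxy through the port published on the host. If Peaksel itself runs as a container in the same Compose project, it would typically reach the proxy by its service name on the Compose network instead. A hypothetical sketch (the Peaksel service and image names are illustrative):

```yaml
services:
  s3proxy:
    image: andrewgaul/s3proxy:sha-b6ce601
    # ... environment as shown above ...
  peaksel:
    image: elsci/peaksel:latest      # illustrative image reference
    depends_on:
      - s3proxy
    environment:
      s3.blobs.endpoint: http://s3proxy:80   # service name on the Compose network instead of localhost
      s3.provider: azures3
```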
For more information, refer to a blog post by Microsoft.