Keep HPLC data in S3 (AWS, Google Cloud, etc)

Chromatography and mass spectrometry data can be very large, especially spectra. Instead of keeping it all in Postgres, Peaksel can store signals in S3-compatible blob storage (AWS, Google Cloud, Azure Object Storage).

To activate this, add or uncomment these settings in docker-compose.yml:

s3.blobs.access_key_id: [key id]
s3.blobs.access_key: [key content]
s3.blobs.bucket: [s3 bucket]
s3.blobs.endpoint: [s3 root URL]
s3.blobs.region: [region]

By default, Peaksel first stores binaries in Postgres and migrates them to S3 after some time (e.g. a day). You can control:

  • How old blobs must be before they are migrated to S3: job.s3.blobs.older_than_seconds, default is 1 day

  • The minimum size of a blob to be considered for migration: job.s3.blobs.min_size_bytes, default is 64kB

  • The minimum size at which an object is uploaded directly to S3, bypassing Postgres: s3.blob.immediate_upload_threshold_bytes, default is 1GB
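
To see how the three settings interact, here is a minimal Python sketch of the placement rules described above. It is an illustration only, not Peaksel's actual implementation: the names BlobPolicy, placement_on_write and should_migrate are hypothetical, the byte values for 64kB and 1GB are assumed to be binary (64 * 1024 and 1024 ** 3), and the boundary handling (strictly older than the age limit, at or above the size limits) is an assumption as well.

```python
from dataclasses import dataclass

# Defaults quoted in the text. The exact byte counts are assumptions:
# 64kB is taken as 64 * 1024 and 1GB as 1024 ** 3.
OLDER_THAN_SECONDS = 24 * 60 * 60
MIN_SIZE_BYTES = 64 * 1024
IMMEDIATE_UPLOAD_THRESHOLD_BYTES = 1024 ** 3


@dataclass(frozen=True)
class BlobPolicy:
    """Hypothetical container mirroring the three settings."""
    older_than_seconds: int = OLDER_THAN_SECONDS              # job.s3.blobs.older_than_seconds
    min_size_bytes: int = MIN_SIZE_BYTES                      # job.s3.blobs.min_size_bytes
    immediate_upload_threshold_bytes: int = IMMEDIATE_UPLOAD_THRESHOLD_BYTES  # s3.blob.immediate_upload_threshold_bytes


def placement_on_write(size_bytes: int, policy: BlobPolicy = BlobPolicy()) -> str:
    """Return 's3' for blobs at or above the immediate-upload threshold, else 'postgres'."""
    if size_bytes >= policy.immediate_upload_threshold_bytes:
        return "s3"
    return "postgres"


def should_migrate(size_bytes: int, age_seconds: float,
                   policy: BlobPolicy = BlobPolicy()) -> bool:
    """Return True if a blob held in Postgres is both old enough and large enough to move."""
    old_enough = age_seconds > policy.older_than_seconds   # assumed: strictly older
    big_enough = size_bytes >= policy.min_size_bytes       # assumed: inclusive minimum
    return old_enough and big_enough
```

With the defaults, a 10 MB spectrum is written to Postgres first and qualifies for migration once it is more than a day old, a 1 kB blob stays in Postgres regardless of age, and a 2 GB object skips Postgres entirely.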