Data storage system

  • Storing data in a GCP bucket or S3 is probably more practical than Google Drive: uploading to and downloading from Google Drive is cumbersome, and it is unclear whether a CLI exists that can do either quickly.

  • Need to find a sustainable solution for storing historical preprocessing data and checkpoints (for example, Coldline storage in a GCP Cloud Storage bucket). Requirements:

    • Mainly used for backup and occasional retrieval

    • Has a CLI for download and upload (latency requirements are loose; transfers do not need to be very fast)

    • Download frequency is low (once every 2-3 months?)

    • Large data volume (1 TB+), expected to grow

    • High durability, with a valid disaster recovery plan

    • Low cost

    • Doesn’t need to be multi-region

  • Set up a sustainable plan (which data to transfer, to which location, etc.).
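
To compare candidate storage options against the "low cost" requirement, a rough annual cost estimate can be sketched as below. This is only an illustration: the function name, its parameters, and all prices are hypothetical placeholders (not actual GCP or S3 prices) and should be replaced with values from the provider's current price list. It also ignores network egress, per-operation charges, and minimum storage duration fees, which cold storage classes commonly have.

```python
def estimate_annual_cost(
    stored_gb: float,
    storage_price_per_gb_month: float,
    retrievals_per_year: float,
    retrieved_gb_per_retrieval: float,
    retrieval_price_per_gb: float,
) -> float:
    """Rough yearly cost: 12 months of storage plus retrieval charges.

    All prices are inputs; nothing here is tied to a real price list.
    """
    storage_cost = stored_gb * storage_price_per_gb_month * 12
    retrieval_cost = (
        retrievals_per_year * retrieved_gb_per_retrieval * retrieval_price_per_gb
    )
    return storage_cost + retrieval_cost


if __name__ == "__main__":
    # PLACEHOLDER prices (assumptions for illustration only).
    storage_price = 0.01    # currency units per GB per month
    retrieval_price = 0.02  # currency units per GB retrieved

    # "Once per 2-3 months" corresponds to roughly 4 to 6 downloads per year.
    for retrievals in (4, 6):
        cost = estimate_annual_cost(
            stored_gb=1000,                  # about 1 TB, per the notes
            storage_price_per_gb_month=storage_price,
            retrievals_per_year=retrievals,
            retrieved_gb_per_retrieval=100,  # assumed partial restore
            retrieval_price_per_gb=retrieval_price,
        )
        print(f"{retrievals} retrievals/year: {cost:.2f}")
```

Running the same function with each provider's real per-GB prices gives a like-for-like comparison, and re-running it with a larger `stored_gb` shows how the cost scales as the data grows.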
