# Data storage system

* Storing data in a GCP bucket or S3 is probably more practical than Google Drive: uploading to and downloading from Google Drive is cumbersome, and it is unclear whether a CLI exists that can do this quickly.
* Need to find a sustainable solution for storing historical preprocessing data and checkpoints (for example, Coldline storage in a GCP storage bucket):
  * Mainly used for backup and occasional retrieval
  * Has a CLI for download and upload (latency requirements are loose; it does not need to be very fast)
  * Download frequency is low (once every 2-3 months?)
  * Large data volume (1 TB+), expected to grow
  * High durability and a valid disaster recovery plan
  * Low cost
  * Does not need to be multi-region
* Set up a sustainable plan (which data to transfer, where to store it, etc.).
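
As a rough illustration of the storage requirements above (CLI-based upload and download, a Coldline default class, a single region), here is a minimal Python sketch that assembles `gcloud storage` commands. The bucket name, region, and local paths are hypothetical placeholders, and the script only prints the commands unless `dry_run=False` is passed, so nothing is created by accident.

```python
import shlex
import subprocess

# Hypothetical values for illustration only; replace with real ones.
BUCKET = "example-archive-bucket"
REGION = "us-central1"  # a single region, since multi-region is not required


def create_bucket_cmd(bucket, region):
    """Build the command that creates a bucket with Coldline as its default class."""
    return [
        "gcloud", "storage", "buckets", "create", f"gs://{bucket}",
        f"--location={region}",
        "--default-storage-class=COLDLINE",
    ]


def upload_cmd(local_dir, bucket, prefix):
    """Build the command that recursively copies a local directory into the bucket."""
    return ["gcloud", "storage", "cp", "--recursive", local_dir,
            f"gs://{bucket}/{prefix}/"]


def download_cmd(bucket, prefix, local_dir):
    """Build the command that recursively copies a bucket prefix to a local directory."""
    return ["gcloud", "storage", "cp", "--recursive",
            f"gs://{bucket}/{prefix}", local_dir]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(create_bucket_cmd(BUCKET, REGION))
    run(upload_cmd("checkpoints", BUCKET, "checkpoints"))
    run(download_cmd(BUCKET, "checkpoints", "restore"))
```

Keeping the commands as plain argument lists makes them easy to review before anything is executed. An S3-based setup would need different commands; this sketch covers only the GCP option mentioned above.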

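For the transfer plan (which data to move, and where), one possible starting point is a small declarative rule list plus a helper that selects files old enough to archive. Everything here is an assumption made for illustration: the `ArchiveRule` class, the directory names, the destination prefixes, and the age thresholds are hypothetical and would need to be agreed on.

```python
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArchiveRule:
    source_dir: str    # local directory to scan
    dest_prefix: str   # prefix inside the archive bucket
    min_age_days: int  # archive only files not modified for at least this long


# Hypothetical plan; directories and thresholds are placeholders.
PLAN = [
    ArchiveRule("data/preprocessed", "preprocessing", min_age_days=30),
    ArchiveRule("checkpoints", "checkpoints", min_age_days=90),
]


def files_to_archive(rule, now=None):
    """Return the files under rule.source_dir last modified at least min_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - rule.min_age_days * 86400
    root = Path(rule.source_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime <= cutoff
    )


if __name__ == "__main__":
    for rule in PLAN:
        for path in files_to_archive(rule):
            print(f"{path} -> {rule.dest_prefix}/")
```

Writing the plan down as data keeps the decision of what goes where reviewable and separate from the upload mechanism.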