Chap 7. Deployment and model serving

Checklist before deployment:

  • What's the input/output? Which parameters should be exposed in the config file? (See the config sketch after this list.)

  • What are the minimum RAM and the runtime environment required to run inference?

  • What are the hard constraints on the deployment? For example, fixed GPU types, strict inference-time budgets, low latency, etc.

  • Is it a cloud, on-premises, or edge/mobile deployment?

  • Security: does the code need to be encrypted?

  • Set up the license.
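
To make the first item concrete, here is a minimal sketch of how those checklist answers could be captured in one config object. Everything here is illustrative: the field names, defaults, and the image-classification framing are assumptions, not part of any particular service.

```python
from dataclasses import dataclass

@dataclass
class InferenceConfig:
    # I/O contract: what the service accepts and what it returns
    input_format: str = "jpeg"       # accepted payload type (assumption)
    output_format: str = "json"      # response type (assumption)
    # Parameters worth exposing in the config file
    model_path: str = "models/classifier.pt"   # hypothetical path
    batch_size: int = 8
    device: str = "cuda:0"           # fixed GPU type, if the deployment requires one
    max_latency_ms: int = 200        # strict inference-time budget
    min_ram_gb: int = 16             # minimum RAM needed to run inference
```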

Then comes the deployment lifecycle:

  • develop, for testing

  • staging, for pre-release

  • production, for release
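
One common way to wire these three stages together is to key the runtime settings on an environment variable. The variable name (`APP_ENV`) and the settings themselves are placeholders; this is a sketch, not a prescription:

```python
import os

# Hypothetical stage-to-settings mapping; adjust names and values to your setup.
STAGES = {
    "develop":    {"debug": True,  "model_path": "models/candidate.pt"},
    "staging":    {"debug": True,  "model_path": "models/release_rc.pt"},
    "production": {"debug": False, "model_path": "models/release.pt"},
}

stage = os.environ.get("APP_ENV", "develop")  # default to the testing stage
settings = STAGES[stage]
```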

Now you have set up a SaaS or RESTful API that serves your deep learning model, and everything looks good. What's next? Which tests should be included?

This could be a good starting point:

  • Profile and record resource utilization in your local container, including RAM, GPU memory, GPU utilization, etc. (see the profiling sketch after this list).

  • Run a toy dataset to make sure the pipeline works and the outputs are correct (see the smoke-test sketch after this list).

  • Test with very large data (be mindful of not only the inference step but also uploading, compressing, uncompressing, sending results back, etc.).

  • Set up an email reminder or other notifications for when a run fails (a combined logging-and-notification sketch follows this list).

  • Add a logging system to monitor each step (easier to debug, and the per-step timings double as profiling data).

  • If it involves a multi-GPU setup, also check the multi-GPU scaling behavior (see the scaling sketch after this list).
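
For the profiling item, here is a minimal sketch of a utilization snapshot, assuming `psutil` is installed and an NVIDIA GPU with `nvidia-smi` on the PATH; it records RAM, GPU memory, and GPU utilization so you can compare runs:

```python
import subprocess
import psutil  # assumed to be installed in the container

def snapshot_utilization():
    """Record RAM, GPU memory, and GPU utilization for later comparison."""
    ram_gb = psutil.Process().memory_info().rss / 1e9
    # nvidia-smi ships with the NVIDIA driver; this assumes a single GPU
    # (multi-GPU machines return one CSV line per device).
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    gpu_util, gpu_mem_mb = (float(x) for x in out.split(", "))
    return {"ram_gb": ram_gb, "gpu_util_pct": gpu_util, "gpu_mem_mb": gpu_mem_mb}
```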
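
For the toy-dataset item, a smoke test against the running API. The endpoint, payload schema, and expected label are all hypothetical; the point is to send one tiny, known input and assert on the answer:

```python
import requests  # any HTTP client works; requests is assumed here

API_URL = "http://localhost:8000/predict"  # hypothetical endpoint

def smoke_test():
    """Send one known input and check the pipeline returns the expected output."""
    toy_payload = {"image_path": "tests/data/cat.jpg"}  # illustrative schema
    resp = requests.post(API_URL, json=toy_payload, timeout=30)
    assert resp.status_code == 200, f"pipeline broken: HTTP {resp.status_code}"
    assert resp.json()["label"] == "cat", "pipeline ran but the output is wrong"
```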
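
For the notification and logging items, one sketch that covers both: each pipeline step is timed and logged, and a failure triggers an email. The SMTP host and addresses are placeholders:

```python
import logging
import smtplib
import time
from email.message import EmailMessage

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("serving")

def notify_failure(step, err):
    """Email the on-call address when a step fails (placeholder host/addresses)."""
    msg = EmailMessage()
    msg["Subject"] = f"[inference] step '{step}' failed"
    msg["From"], msg["To"] = "bot@example.com", "oncall@example.com"
    msg.set_content(repr(err))
    with smtplib.SMTP("smtp.example.com") as s:
        s.send_message(msg)

def timed_step(step, fn, *args):
    """Run one pipeline step; log its duration (doubles as profiling data)."""
    start = time.perf_counter()
    try:
        result = fn(*args)
        log.info("%s took %.3fs", step, time.perf_counter() - start)
        return result
    except Exception as err:
        log.exception("%s failed", step)
        notify_failure(step, err)
        raise
```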
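
For the multi-GPU item, a rough harness for checking scaling: run the same fixed workload on 1, 2, and 4 GPUs and compare wall-clock time. `run_batch` is a hypothetical hook into your own inference code:

```python
import time

def scaling_check(run_batch, gpu_counts=(1, 2, 4)):
    """Compare wall-clock time for a fixed workload as GPUs are added.

    `run_batch(n_gpus)` is a hypothetical hook that runs one fixed batch
    on n_gpus devices; near-linear speedup is the goal, not a guarantee.
    """
    base = None
    for n in gpu_counts:
        start = time.perf_counter()
        run_batch(n)
        elapsed = time.perf_counter() - start
        speedup = base / elapsed if base else 1.0
        base = base or elapsed
        print(f"{n} GPU(s): {elapsed:.2f}s, speedup x{speedup:.2f}")
```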

After release, most of your bandwidth will go into maintenance, on two fronts:

  • the pipeline side (serving infrastructure, data transfer, scaling)

  • the model side (model quality and updates)

If a new issue pops up, you can apply a patch and release a new minor version, or fold the fix into the next major release.
