Chapter 7. Deployment and model serving
A checklist before deployment:
What are the inputs and outputs? Which parameters belong in the config file? (A config sketch follows this checklist.)
What are the minimum RAM and runtime environment required to run inference?
What are the hard requirements for the deployment, e.g., fixed GPU types, a strict inference-time budget, or low latency?
Is the target the cloud, on-site servers, or edge/mobile devices?
Security: encrypt the code if required.
Set up the license.
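To make the first two questions concrete, a deployment config can pin down the I/O contract and resource requirements in one place. Below is a minimal sketch; all field names and values are illustrative assumptions, not a standard:

```python
# deploy_config.py -- a minimal, illustrative deployment config.
# Every field name and value here is an assumption for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class DeployConfig:
    # I/O contract
    input_shape: tuple = (1, 3, 224, 224)   # batch, channels, height, width
    output_classes: int = 1000
    # Resource requirements
    min_ram_gb: int = 8
    gpu_type: str = "T4"                    # fixed GPU type, if required
    max_latency_ms: int = 100               # strict inference-time budget
    # Deployment target: "cloud", "on-site", or "edge"
    target: str = "cloud"

CONFIG = DeployConfig()
```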
Then comes the deployment lifecycle (a minimal environment-switch sketch follows this list):
development for testing
staging for pre-release
production for release
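A common way to wire these three stages together is an environment variable that selects per-stage settings. A sketch, where the variable name and the settings are assumptions:

```python
import os

# Per-stage settings; all values are illustrative assumptions.
ENVIRONMENTS = {
    "development": {"debug": True,  "model_path": "models/dev.pt"},
    "staging":     {"debug": True,  "model_path": "models/candidate.pt"},
    "production":  {"debug": False, "model_path": "models/release.pt"},
}

# DEPLOY_ENV picks the stage; default to development for safety.
stage = os.environ.get("DEPLOY_ENV", "development")
settings = ENVIRONMENTS[stage]
```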
Now you have set up a SaaS or RESTful API that serves your deep learning model, and everything looks good. What's next? Which tests should be included?
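For concreteness, here is a minimal sketch of such an endpoint using Flask (one of the frameworks in the references below); the model loader and payload format are illustrative assumptions, not a fixed recipe:

```python
# serve.py -- minimal Flask inference endpoint (illustrative sketch).
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_model():
    """Placeholder: load your trained model once at startup."""
    return lambda x: {"label": "cat", "score": 0.97}  # dummy model

model = load_model()

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)   # expects {"input": ...}
    prediction = model(payload["input"])
    return jsonify(prediction)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

With a server like this running, the toy-dataset check below reduces to a single POST request against /predict.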
The following is a good starting set of tests:
Profile and record resource usage in your local container, including RAM, GPU memory, and GPU utilization (a profiling sketch follows this list).
Run a toy dataset through the pipeline to make sure it works end to end and the outputs are correct.
Test with very large data (be mindful not only of inference but also of uploading, compressing, uncompressing, sending results back, etc.).
Set up an email reminder or other notifications for failures (a logging-and-alert sketch also follows this list).
Add a logging system that monitors each step (easier to debug; records the time spent for profiling).
If the setup involves multiple GPUs, also verify how the service scales across them.
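A profiling snapshot can be as simple as sampling process RAM and GPU stats. A sketch using psutil and nvidia-smi, assuming both are available in the container:

```python
import subprocess

import psutil

def snapshot():
    """Record RAM and GPU usage at one point in time (illustrative)."""
    ram_mb = psutil.Process().memory_info().rss / 1e6
    # Query GPU memory and utilization via nvidia-smi, if present.
    gpu = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,utilization.gpu",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip()
    print(f"RAM: {ram_mb:.0f} MB | GPU: {gpu}")
```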
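The logging and notification items can be combined in a small decorator that times each step and alerts on failure. In this sketch, notify is a placeholder for whatever channel you use (an SMTP call, a webhook, etc.):

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def notify(message: str) -> None:
    """Placeholder: send an email or webhook alert here."""
    log.error("ALERT: %s", message)

def monitored(step):
    """Log the duration of each pipeline step; alert on failure."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = step(*args, **kwargs)
        except Exception as exc:
            notify(f"{step.__name__} failed: {exc}")
            raise
        log.info("%s took %.2fs", step.__name__, time.perf_counter() - start)
        return result
    return wrapper
```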
After release, most of the bandwidth goes to maintenance, on two sides:
the pipeline side
the model side
If a new issue pops up, you can apply a patch and release a new minor version, or fold the fix into the next major version's release.
References:
Flask / Streamlit / Starlette: https://towardsdatascience.com/10-minutes-to-deploying-a-deep-learning-model-on-google-cloud-platform-13fa56a266ee
AWS Lambda is still a RESTful API with preset protocols.