What it takes to be an ML infra engineer
Coding:
Reviewing our tools and code to help us continue moving at state-of-the-art speed
Spark/Dataproc for experiments at scale, MLflow for experiment management
Substantial experience with multiple technologies from the following list: Arrow, Bazel, Docker, Kibana, MPI, MySQL, Redis, Spark, Zookeeper.
Ability to build full-stack web applications/services for internal tooling.
Experience using Cython, Numba, C, or similar to speed up analytical code
Experience with GPU acceleration (CUDA and cuDNN)
Experience with Flink and MLflow
Integration:
Building deep integration between NVIDIA's GPU-backed RAPIDS frameworks and all of the major cloud and on-premises machine learning platforms
Deployment:
Automate, deliver, monitor, and improve machine learning solutions while ensuring data and models are secure
Experience with Terraform and Puppet for infrastructure management and automation
Experience with Kubernetes deployments and cluster management