Hi! Great article with a clear presentation of how the entire system works. Just one question about the model repository for the inference servers: from "The model repository is embedded in the container image of the inference server", does that mean you copy the serialized models into the image when you build it? If so, I imagine the image could become quite large depending on the number of models you want available; did you encounter any significant increase in boot time when spinning up a new container?
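For context, here's roughly what I'm picturing from that sentence (base image tag and paths are just illustrative, not from the article):

```dockerfile
# Sketch of baking the model repository into the image:
FROM nvcr.io/nvidia/tritonserver:24.01-py3

# Every serialized model copied here grows the image layer,
# and with it the pull time on a fresh node.
COPY model_repository/ /models/

CMD ["tritonserver", "--model-repository=/models"]
```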
Also, did you consider using cloud storage for the model repository and letting Triton take care of loading/updating the models?
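I'm thinking of something along these lines, assuming an S3 bucket (bucket name hypothetical); Triton would read the repository remotely and, in poll mode, pick up added or updated models without rebuilding the image:

```sh
# Sketch of the cloud-storage alternative:
tritonserver \
  --model-repository=s3://my-model-bucket/models \
  --model-control-mode=poll \
  --repository-poll-secs=60
```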
Thank you!