BentoML is a framework for building reliable, scalable, and cost-efficient AI applications. It comes with everything you need for model serving, application packaging, and production deployment.

Highlights

🍱 Bento is the container for AI apps
Open standard and SDK for AI apps: pack your code, inference pipelines, model files, dependencies, and runtime configurations into a Bento. Auto-generate API servers supporting REST API, gRPC, and long-running inference jobs. Auto-generate Docker container images.
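As a sketch of the packaging flow, a minimal `bentofile.yaml` declares what goes into a Bento; the service import path and package list below are placeholders, not part of this document:

```yaml
service: "service:svc"    # import path of the Service object (placeholder)
include:
  - "*.py"                # source files to pack into the Bento
python:
  packages:
    - scikit-learn        # runtime dependencies (placeholder)
```

Running `bentoml build` against a file like this produces a Bento, and `bentoml containerize` turns that Bento into a Docker image.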
🏄 Freedom to build with any AI models
Import from any model hub or bring your own models built with frameworks like PyTorch, TensorFlow, Keras, Scikit-Learn, XGBoost, and many more. Native support for LLM inference, generative AI, embedding creation, and multi-modal AI apps. Run and debug your BentoML apps locally on Mac, Windows, or Linux.
🍭 Simplify modern AI application architecture
Python-first! Effortlessly scale complex AI workloads. Enable GPU inference without the headache. Compose multiple models to run concurrently or sequentially, over multiple GPUs or on a Kubernetes cluster. Natively integrates with MLflow, LangChain, Kubeflow, Triton, Spark, Ray, and many more to complete your production AI stack.
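The concurrent-versus-sequential composition described above can be sketched in plain Python with asyncio. The two stub models and the helper functions here are hypothetical stand-ins for illustration, not BentoML APIs; in a real app each would call out to a separately scaled inference worker:

```python
import asyncio

# Hypothetical stand-in for an embedding model.
async def embed(text: str) -> list[float]:
    await asyncio.sleep(0.01)  # simulate inference latency
    return [float(len(text))]

# Hypothetical stand-in for a text classifier.
async def classify(text: str) -> str:
    await asyncio.sleep(0.01)
    return "positive" if "good" in text else "neutral"

async def run_concurrently(text: str) -> list:
    # Independent models run in parallel on the event loop.
    return await asyncio.gather(embed(text), classify(text))

async def run_sequentially(text: str) -> str:
    # The output of one model feeds the next.
    vector = await embed(text)
    return await classify(text if vector else "")

if __name__ == "__main__":
    embedding, label = asyncio.run(run_concurrently("good service"))
    print(embedding, label)
```

The same fan-out/fan-in pattern generalizes to real runners: concurrent calls hide latency for independent models, while sequential awaits express pipelines.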
🚀 Deploy Anywhere
One-click deployment to ☁️ BentoCloud, the serverless platform made for hosting and operating AI apps. Scalable BentoML deployment with 🦄️ Yatai on Kubernetes. Deploy auto-generated container images anywhere Docker runs.