PyData Global 2023

Xorbits Inference: Model Serving Made Easy
12-07, 13:00–13:30 (UTC), General Track

In the rapidly evolving landscape of AI and machine learning, deploying and serving models has become as crucial as developing them. Xinference is an open-source library that makes model serving straightforward: it deploys language, speech recognition, and multimodal models behind a unified interface while managing hardware resources intelligently. By choosing an appropriate inference runtime for the available hardware and assigning models to devices according to their current usage, Xinference delivers strong performance and efficient resource utilization.
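
To ground this, here is a minimal sketch of serving a model with Xinference's Python client. The endpoint, model name, format, and quantization values are illustrative, and the calls follow the client API as documented around the time of this talk.

    # Start a local Xinference server first (command per the installed
    # version), e.g.:  xinference-local --host 0.0.0.0 --port 9997
    from xinference.client import Client

    # Connect to the running endpoint (port is illustrative).
    client = Client("http://localhost:9997")

    # Launch a model; Xinference picks a suitable runtime for the hardware.
    model_uid = client.launch_model(
        model_name="llama-2-chat",
        model_format="ggmlv3",
        model_size_in_billions=7,
        quantization="q4_0",
    )

    # Query the deployed model through a unified chat interface.
    model = client.get_model(model_uid)
    print(model.chat("What is the largest animal?"))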


Introduction

  • Brief overview of the challenges in model serving, particularly around hardware resource management and deployment.
  • Introduction to Xinference and its significance in the current AI ecosystem.

Optimized Hardware Resource Management

  • Deep dive into how Xinference manages heterogeneous hardware resources so that GPUs and CPUs are used to their full potential.
  • Discussion of how resources are allocated intelligently to match each model's needs with the available hardware (see the launch sketch after this list).
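
As a concrete illustration of this allocation, placement hints can be passed when a model is launched. This is a minimal sketch: the n_gpu parameter and model details are assumptions about the client API and may differ across versions.

    from xinference.client import Client

    client = Client("http://localhost:9997")

    # "auto" (assumed default) lets Xinference decide device placement;
    # an integer pins the model to that many GPUs instead.
    model_uid = client.launch_model(
        model_name="chatglm2",
        model_format="pytorch",
        model_size_in_billions=6,
        n_gpu="auto",
    )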

Intelligent Inference Runtime Selection

  • Exploration of how Xinference selects the most suitable inference runtime for the hardware it runs on.
  • Real-world examples of the performance and efficiency gains this selection delivers (a simplified sketch follows this list).
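
To make the idea concrete, below is a simplified, hypothetical sketch of this kind of decision. The function and backend names are illustrative only, not Xinference's actual internals.

    def pick_runtime(has_cuda_gpu: bool, model_format: str) -> str:
        """Illustrative runtime selection (not Xinference's real code).

        The idea: quantized GGML weights favor a CPU-friendly backend
        (including Apple silicon), while CUDA devices favor a
        GPU-optimized serving path.
        """
        if model_format.startswith("ggml"):
            return "llama-cpp-python"  # CPU/Metal-friendly quantized runtime
        if has_cuda_gpu:
            return "vllm"              # GPU-optimized serving path
        return "pytorch-cpu"           # fallback for plain CPU inference

    print(pick_runtime(has_cuda_gpu=False, model_format="ggmlv3"))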

Dynamic Model Loading Based on Device Usage

  • Insight into Xinference's ability to assign models to specific devices based on their current usage and workload.
  • A demonstration of how this dynamic allocation yields smoother serving, lower latency, and a better user experience (see the placement sketch below).
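
The placement idea can be sketched in a few lines. Everything here is hypothetical: Xinference tracks device usage internally, and this helper only illustrates choosing the least-loaded GPU.

    def least_loaded_device(gpu_mem_used: dict[int, float]) -> int:
        """Pick the GPU with the lowest memory use (hypothetical helper)."""
        return min(gpu_mem_used, key=gpu_mem_used.get)

    # Fraction of memory in use per GPU index (illustrative numbers).
    usage = {0: 0.85, 1: 0.30, 2: 0.55}
    print(least_loaded_device(usage))  # -> 1, the least-loaded device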

Real-World Applications and Case Studies

  • Presentation of real-world scenarios where Xinference has been deployed.
  • Discussion of the benefits realized, the challenges faced, and how Xinference addressed them.

Future Roadmap and Enhancements

  • A sneak peek at upcoming features and enhancements planned for Xinference.

Prior Knowledge Expected

No previous knowledge expected

Jon Wang has a deep understanding of large-model inference systems, related ecosystems such as LangChain, and their practical applications. With more than four years of experience in distributed system design and development, he has a proven track record of building, testing, and delivering products from scratch. He is well acquainted with the open-source ecosystem and has been an active contributor to Apache IoTDB, contributing key features and bug fixes. Reliable and an adept communicator, he is a team player with a strong passion for technology.