Jon Wang
Jon Wang has a deep understanding of large-model inference systems, related ecosystems such as LangChain, and their practical applications. With over four years of experience in distributed-system design and development, he has a proven track record of building, testing, and delivering products from scratch. He is well acquainted with the open-source ecosystem and is an active contributor to Apache IoTDB, where he has contributed key features and bug fixes. Reliable and an adept communicator, Wang is a team player with a strong passion for technology.
Sessions
In the rapidly evolving landscape of AI and machine learning, deploying and serving models has become as crucial as developing them. Xinference, a state-of-the-art library, emerges as a game-changer in this domain, offering seamless model-serving capabilities. This talk delves into how Xinference not only simplifies the deployment of language, speech-recognition, and multimodal models but also intelligently manages hardware resources: by choosing an appropriate inference runtime for the available hardware and allocating models to devices according to their usage, Xinference ensures optimal performance and resource utilization.
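To give a feel for the workflow the talk covers, here is a minimal sketch of serving a model locally with Xinference. The commands are illustrative and version-dependent (model names and flags are placeholders, not from this abstract); consult the Xinference documentation for the exact CLI of your installed release:

```shell
# Install the library (assumed standard installation path)
pip install xinference

# Start a local Xinference server; host/port values here are examples
xinference-local --host 0.0.0.0 --port 9997

# In another terminal, launch a model on that server.
# "<model-name>" is a placeholder -- pick one from the built-in model list.
xinference launch --model-name <model-name>
```

Once a model is launched, Xinference picks an inference runtime suited to the detected hardware and places the model on an available device, which is the resource-management behavior the talk examines.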