Large GenAI models are taking the world by storm. However, recent developments show that smaller GenAI models can be equally performant, with the added benefit of being agile, efficient, and far less resource-hungry than their larger counterparts.
In this talk, we’ll discuss intriguing new dilemmas in the GenAI space: Will Large Language Models (LLMs) fit on small form factor machines? How do smaller language models stack up against LLMs? Where is the sweet spot for local inference? We’ll discuss the era of LLM compression, including INT8, INT4, and 1-bit LLMs, and how compression makes it practical to run complex deep learning models and big data processing on GPUs, NPUs, and CPUs.
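As a taste of what we’ll cover, here is a minimal sketch of INT4 weight compression at export time using optimum-intel on top of OpenVINO and NNCF. The checkpoint name, output directory, and compression parameters are illustrative assumptions, not a prescribed recipe; any causal-LM checkpoint can be substituted.

```python
# Sketch: export a Hugging Face LLM to OpenVINO IR with 4-bit weight
# compression via optimum-intel (model name is an illustrative placeholder).
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder checkpoint

# Compress linear-layer weights to INT4; `ratio` controls how many layers
# get 4-bit weights (the remainder fall back to 8-bit).
quant_config = OVWeightQuantizationConfig(bits=4, ratio=0.8, group_size=64)

model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=quant_config
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("tinyllama-int4-ov")
tokenizer.save_pretrained("tinyllama-int4-ov")
```

The resulting IR folder can then be loaded on CPU, GPU, or NPU without any further conversion, which is exactly the trade-off space (footprint vs. accuracy vs. target device) the talk explores.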
To address the challenges of GenAI inference, we’ll demonstrate how GenAI deployment on AI PCs and edge devices can be fast and accurate with the OpenVINO™ toolkit. We’ll explore exciting new developments you can try today with the latest OpenVINO release, including broader LLM and Mixture of Experts architecture support, new model compression and memory optimization techniques for LLMs, expanded model deployment on Intel hardware, deploying AI models with OpenVINO Model Server, and new GenAI workloads!
Join us for an exciting conversation with our AI Evangelists and Fellow from Intel on the future of AI and roadmaps in GenAI, see live demos of running LLMs locally for faster, smarter inference, and learn how to do this on your own machine today with OpenVINO.
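For a sense of what "running LLMs locally" looks like in practice, here is a minimal sketch using the OpenVINO GenAI API. It assumes the INT4 model directory from the earlier sketch; the device string and prompt are illustrative.

```python
# Sketch: local LLM inference with the OpenVINO GenAI API,
# loading the INT4 model exported in the previous example.
import openvino_genai as ov_genai

# "CPU" works everywhere; swap in "GPU" or "NPU" on AI PCs with those devices.
pipe = ov_genai.LLMPipeline("tinyllama-int4-ov", "CPU")

print(pipe.generate("What is the sweet spot for local LLM inference?",
                    max_new_tokens=128))
```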