Workshop Date: Aug 3 or 4 (to be confirmed soon)

Workshop Venue: Toronto, ON, Canada

Inference optimization represents a critical frontier in data mining and machine learning research, particularly as Large Language Models (LLMs) become central to modern AI applications. This emerging field combines machine learning, systems engineering, and optimization theory to address one of the most pressing challenges in practical AI: how to efficiently generate predictions from increasingly massive language models.

The research challenges are multifaceted, with a focus on real-world impact and scalable solutions. They range from novel algorithmic and modeling approaches, such as model distillation, quantization, sparsification, and pruning, to ML systems engineering, which requires a deep understanding of the hardware characteristics of AI accelerators (e.g., GPUs) in order to optimize trade-offs among key performance metrics such as latency, throughput, and model accuracy.

Despite the field's rapid advancement and interdisciplinary nature, the exchange of ideas and methodologies between production-facing practitioners and researchers seeking to experiment quickly with new GenAI concepts remains limited. To bridge this gap, we are introducing the first KDD workshop on Inference Optimization for Generative AI. Our goal is to create a collaborative platform where researchers and practitioners working across different use cases and layers of the efficient-inference stack can come together to exchange research ideas, establish connections between disciplines, and identify the challenges and research questions that will shape future work.

Call for Papers (CFP)

Submission Guidelines

Authors should submit a short paper in PDF format using the Standard ACM Conference Proceedings Template. Submissions are limited to 4 content pages, including all figures and tables but excluding references. References and supplementary material have no page limit, but reviewers are not required to read the supplementary material, so all key claims should be supported within the main 4-page body. The main paper and any supplementary material must be submitted as a single PDF.

We welcome unpublished work and work currently under submission, as well as recently published papers (2024/2025), including those from KDD 2025 or other venues. All accepted papers will be presented as posters, and selected papers will also be given oral presentations, subject to scheduling constraints. Accepted papers will be posted on the workshop website.

Following the KDD conference submission policy, reviews are double-blind, and author names and affiliations must NOT be included in the submission.

Topics of Interest

We welcome submissions in one or more of the following areas: