Registration Desk: Registration (East & West) Tue 10 Dec 07:30 a.m.
Affinity Event: Women in Machine Learning - Overflow Streaming Tue 10 Dec 08:00 a.m.
Affinity Event: Women in Machine Learning Tue 10 Dec 08:15 a.m.
The 19th Workshop for Women in Machine Learning (WiML) will be co-located with NeurIPS 2024 in Vancouver, BC, Canada.
The WiML workshop 2024 will be held in person on December 10th with invited speakers, oral presentations, and posters. The event brings together members of the academic and industry research communities for an opportunity to connect, exchange ideas, and learn from each other. There will be a mentoring session to discuss current research trends and career choices in machine learning. Underrepresented minorities and undergraduates interested in pursuing machine learning research are encouraged to participate. All presenters should be women or non-binary, and all genders are invited to attend.
https://sites.google.com/wimlworkshop.org/wiml-2024/
Expo Talk Panel: Core Technical Interpretation and Best Practices of Responsible AI Tue 10 Dec 08:30 a.m.
Nowadays, large-scale models are developing rapidly, and the application fields of GenAI have expanded to an unprecedented degree, attracting the interest of numerous AI developers and researchers. While users are exploring the efficiency of AI, they are also increasingly concerned about ensuring the security of data, AI algorithms, and models. This talk presents Alibaba Cloud's technical interpretation of and best practices for Responsible AI: providing security detection and protection capabilities in key stages such as data preparation, model training, model fine-tuning, and model inference, to achieve end-to-end data security, content compliance, and reliable models. Built on industry-leading core technologies, the approach protects data and privacy through end-to-end data confidentiality and privacy-enhanced computing; supports model deployment and authorized use in various cloud-edge environments to protect model intellectual property; and optimizes and governs model inference results to ensure the availability, robustness, and confidentiality of the model, collectively building Responsible AI. We invite you to see how, in this era of rapidly evolving AI, a secure, stable, and trustworthy technological foundation can be established to support the robust adoption of AI technology, benefit society, and lead the future. Speaker: Wei Lin, Researcher at Alibaba Cloud Intelligence Group, Senior Director of the Platform for Artificial Intelligence (PAI) and the big-data development and governance platform DataWorks.
Expo Talk Panel: TikTok Symphony: AIGC Solution for Advertising Tue 10 Dec 08:30 a.m.
At the heart of TikTok's vibrant community lies a simple truth: creative content is the engine of growth. Recognizing this, we introduced TikTok Symphony, our new suite of creative solutions powered by generative AI. This new suite is designed to make content creation easier than ever for marketers and creators alike, simplifying the creative process while scaling it to new dimensions. TikTok Symphony is designed to elevate your content creation journey every step of the way. Blending human imagination with AI-powered efficiency, TikTok Symphony enables businesses of all sizes, creators, and agencies to level the playing field, boost productivity, and uncover valuable insights. From scripting to video production to asset optimization, Symphony transforms the complex into the simple—if you can imagine it, you can create it.
Expo Talk Panel: AI Verification & Validation: Trends, Applications, and Challenges Tue 10 Dec 08:30 a.m.
AI-enabled Engineered Systems are increasingly being developed in sectors such as aerospace, automotive, and manufacturing, where ensuring reliability and safety is paramount. Verifying AI systems, especially those utilizing complex models like neural networks, presents unique challenges distinct from traditional software verification. This is due to their data-driven, inherently non-deterministic nature and the opacity of their decision-making processes. As AI permeates safety-critical industries, there is an urgent need to establish robust verification methodologies that can ensure these systems operate safely and as intended.
In this talk, we will explore the current trends, applications, and challenges in AI verification, highlighting key methodologies such as abstract interpretation, which makes it possible to analyze the mathematical properties of AI models to ensure they meet specified constraints and safety requirements. We will discuss seminal contributions, including the FoRMuLA report by Collins Aerospace and the European Union Aviation Safety Agency (EASA), which highlights opportunities for adopting formal methods in the design assurance process of machine-learning-enabled systems. Additionally, we will examine the role of constrained deep learning, an approach that incorporates domain-specific constraints into the training of deep neural networks. This technique ensures desirable behavior in safety-critical scenarios by embedding constraints directly into the model construction and training processes. The importance of runtime monitors, which dynamically assess the performance and safety of AI systems during operation, will also be highlighted. These topics will be contextualized within a broader discussion of the applications and challenges faced by industries integrating AI technologies into their systems, offering insights into potential solutions and future directions to enhance safety and reliability.
Affinity Event: LatinX in AI Tue 10 Dec 08:30 a.m.
The workshop is a one-day event with invited speakers, oral presentations, and posters. The event brings together faculty, graduate students, research scientists, and engineers for an opportunity to connect and exchange ideas. There will be a panel discussion and a mentoring session to discuss current research trends and career choices in Artificial Intelligence and Machine Learning. While all presenters will identify primarily as LatinX, all are invited to attend.
Expo Talk Panel: Onboarding Generative AI at the Edge: Challenges and Solutions from Industrial Research Tue 10 Dec 08:30 a.m.
Generative AI has emerged as a transformative force capable of creating new multimodal content - including text, speech, images, video, and 3D - while handling complex dialogues and problem-solving tasks. This disruptive technology is reshaping traditional methodologies across various application domains, redefining the user interfaces for computing devices. Its impact transcends industries, promising substantial advancements in utility, productivity, and efficiency.
As the adoption of generative AI accelerates, its computational demands are increasing dramatically, making on-device processing more essential than ever. Currently, most generative AI applications operate in the cloud, placing significant strain on resources and incurring high equipment and operational costs. These workloads are motivating a reevaluation of effective strategies for deploying AI models. One promising approach is to shift AI workloads to edge devices, such as phones, laptops, and XR headsets, where some, albeit limited, processing capabilities are available. This transition not only reduces the cost of cloud operations, but also enhances privacy, reduces communication bandwidth needs, and facilitates more streamlined access. However, enabling generative AI on resource-limited devices requires AI models to be optimized for edge devices, leveraging their available AI accelerators. In this talk, we will explore the pivotal role of deploying generative AI on-device and the full-stack optimizations necessary to facilitate this shift. The presentation will feature hands-on demonstrations, showcasing live-action, industrial-grade examples of generative AI models operating on edge devices. Highlights include:
Self-Speculative Decoding
Visual Content Generation: Generative 3D Diffusers
Multimodal Generative Models on Edge
Parameter-Efficient Personalization on Edge
Expo Talk Panel: AI for Autonomous Driving at Scale Tue 10 Dec 08:30 a.m.
Speaker: Vincent Vanhoucke, Distinguished Engineer, Waymo
Abstract: Waymo’s mission is to “Be the world’s most trusted driver.” In order to accomplish this mission, we combine unique sensing capabilities with real-time perception, prediction and planning architectures that leverage data at scale, and enable us to safely deliver millions of autonomous rides in a variety of urban areas and driving conditions. In this talk, I’ll describe some of the key components of this stack, how model scaling laws inform improving driving performance, and how we build the foundations for efficiently scaling our approach to autonomy.
Speaker's website: https://vincent.vanhoucke.com/ Waymo: http://www.waymo.com
Expo Talk Panel: Incentivizing Collaborative AI: A Decentralized Approach to Scaling Machine Learning Tue 10 Dec 08:30 a.m.
A talk by Bittensor founder and creator, Jacob Steeves.
As AI rapidly evolves, scaling machine learning models while ensuring data diversity and efficiency is a significant challenge. This talk introduces a decentralized learning framework leveraging distributed networks and incentive mechanisms to address these issues.
Our approach aligns with the demand for scalable, efficient, and ethical AI solutions. By employing a novel incentive mechanism, it facilitates training complex models across distributed networks, ensuring quality contributions from diverse sources.
In this session, we will:
Explore the Architecture: How contributors and validators interact to create a robust decentralized AI training ecosystem.
Explain the Incentive Mechanism: The mathematical foundations of the reward system aligning individual contributions with collective improvement.
Discuss Practical Challenges: Real-world obstacles in implementing decentralized AI systems, including data privacy, network latency, and preventing malicious activities.
Showcase Applications: Case studies demonstrating the impact on AI scalability and efficiency in sectors like finance and healthcare.
Highlight Open-Source Benefits: How the framework's open-source nature fosters global innovation and collaboration.
This talk offers invaluable insights for practitioners seeking to implement AI at scale. Attendees will gain a comprehensive understanding of:
Practical challenges and solutions in decentralized AI training.
How incentive mechanisms drive collaborative innovation.
Broader implications for the AI/ML industry.
Intended Audience: AI/ML practitioners, researchers, and industry professionals interested in large-scale AI deployment, decentralized systems, and collaborative frameworks. Ideal for those seeking innovative solutions to real-world AI challenges.
Expo Talk Panel: Commerce Foundation Model Tue 10 Dec 08:30 a.m.
This talk focuses on our vision for a Commerce Foundation Model: a model that can unify and understand all the different aspects of the commerce journey, from merchants to buyers, and everyone in between. We'll share our motivation, discuss analogies to foundation models in other modalities (text, vision), and get the audience excited about the wide-ranging impacts and challenges of such a model. Then we'll go deep on the attributes of the kinds of models/architectures we might consider, and the special, new challenges posed by a modality of this nature. Next we'll pick one or two architectures to explore in greater depth (e.g., HSTU, TIGER), and discuss the pros and cons of different approaches. Finally, we'll link this theoretical discussion back to the work we've been doing at Shopify – or as much of it as we can ;). Think things like unifying our logging to get consistent deep and wide event streams, applications we've applied it to, and results we've achieved so far. The audience will leave excited to learn more about how Shopify is laying the "foundation" to make commerce better for everyone.
Expo Talk Panel: Compute: Past, Present & Future Tue 10 Dec 08:30 a.m.
A 20-minute talk on how the history of Compute has evolved into what we envision as Decentralized Computing today.
Meetup: Streaming Hangout Tue 10 Dec 09:00 a.m.
Meetup: Quiet Streaming Hangout Tue 10 Dec 09:00 a.m.
Creative AI Session 1 Tue 10 Dec 09:00 a.m.
[ East Ballroom C ]
Tutorial: Michael Mozer · Katherine Hermann · Jennifer Hu
Experimental Design and Analysis for AI Researchers
Simulation comparisons are often used in machine learning to argue for the superiority of one model or method over another. However, the conclusions that can be drawn from such studies are only as robust as the forethought that is put into their design and analysis. We discuss core techniques used in the experimental sciences (e.g., medicine and psychology) that are too often sidestepped by AI researchers. Topics include: classical statistical inference, hypothesis testing, one-way and multi-factor ANOVA, within- and between-subjects designs, planned vs. post-hoc contrasts, visualization of outcomes and uncertainty, and modern standards of experimental practice. We then focus on two topics of particular interest to AI researchers: (1) human evaluations of foundation models (LLMs, MLMs), e.g., in domains like intelligent tutoring; and (2) psycholinguistic explorations of foundation models, in which models are used as subjects of behavioral studies in order to reverse engineer their operation, just as psychologists and psycholinguists have done with human participants over the past century.
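As a flavor of the analyses the tutorial covers, here is a minimal sketch of a within-subjects comparison of two methods across matched random seeds, using a paired t-test and a confidence interval on the mean difference; the accuracy numbers are made-up placeholders, not results from the tutorial.

```python
# Hypothetical within-subjects comparison: the same 8 seeds are used for both
# methods, so a paired test is appropriate.
import numpy as np
from scipy import stats

model_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80])
model_b = np.array([0.84, 0.82, 0.85, 0.83, 0.86, 0.81, 0.84, 0.83])

t_stat, p_value = stats.ttest_rel(model_a, model_b)      # paired t-test
diff = model_b - model_a
ci = stats.t.interval(0.95, df=len(diff) - 1,
                      loc=diff.mean(), scale=stats.sem(diff))
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print(f"mean improvement = {diff.mean():.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Reporting the effect size and its uncertainty, rather than only a p-value, is in the spirit of the experimental practices the tutorial emphasizes.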
Tutorial: Zhijing Jin · Sergio Garrido
Causality for Large Language Models
In this tutorial, we will explore the intersection of causality and large language models (LLMs). Our goal is to provide a comprehensive understanding of how causal inference can enhance the performance, interpretability, and robustness of LLMs. The tutorial will cover foundational concepts in both fields, discuss emerging trends, present three paradigms for causality for LLM research, and corresponding practical applications. We also include a panel of experts with diverse backgrounds, including Yoshua Bengio, to engage the NeurIPS community with a comprehensive overview and diverse perspectives.
Tutorial: Yu-Xiang Wang · Lei Li · Xuandong Zhao
Watermarking for Large Language Models
Generative AI has significantly advanced, particularly in natural language processing, exemplified by models like ChatGPT, but these advancements have raised concerns about misuse, such as generating fake news or plagiarizing content. This tutorial explores text watermarking as a solution, embedding detectable patterns within AI-generated text to verify its origin. We will cover the evolution of text watermarking, its modern techniques, and challenges, along with model watermarking for copyright protection. Participants will gain a solid understanding of watermarking methods, their practical applications, and future research directions in this critical field.
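To make the idea concrete, below is a heavily simplified sketch of a "green list" logit-bias watermark in the spirit of schemes the tutorial surveys (e.g., Kirchenbauer et al., 2023). The logits, vocabulary size, and sampling loop are placeholders; this is an illustration, not the tutorial's reference implementation.

```python
# Green-list watermark sketch: seed a vocabulary partition on the previous
# token, bias sampling toward the "green" half, and detect by counting how
# often generated tokens land in their predecessor's green list.
import torch

def greenlist_mask(prev_token_id: int, vocab_size: int, gamma: float = 0.5) -> torch.Tensor:
    """Partition the vocabulary into a 'green' subset seeded by the previous token."""
    gen = torch.Generator().manual_seed(hash(prev_token_id) % (2**31))
    perm = torch.randperm(vocab_size, generator=gen)
    green = torch.zeros(vocab_size, dtype=torch.bool)
    green[perm[: int(gamma * vocab_size)]] = True
    return green

def watermarked_next_token(logits: torch.Tensor, prev_token_id: int, delta: float = 2.0) -> int:
    """Bias sampling toward green-list tokens by adding delta to their logits."""
    green = greenlist_mask(prev_token_id, logits.shape[-1])
    probs = torch.softmax(logits + delta * green.float(), dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# Detection (sketch): count generated tokens falling in their predecessor's
# green list and run a z-test against the expected fraction gamma.
```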
Tutorial: Edoardo Maria Ponti · André Martins
Dynamic Sparsity in Machine Learning: Routing Information through Neural Pathways
Recent advancements in machine learning have caused a shift from traditional sparse modeling, which focuses on static feature selection in neural representations, to dynamic sparsity, where different neural pathways are activated depending on the input. This line of work is fueling, among other directions, new architectures for foundation models, such as sparse Mixtures of Experts. In this tutorial, we explore how dynamic sparsity provides several advantages, especially: i) incorporating structural constraints in model representations and predictions; ii) performing conditional computation, adaptively adjusting the model size based on the input complexity; iii) attaining the performance of dense models while accelerating training and inference. This tutorial connects these lines of work through a unified perspective, including pedagogical materials with concrete examples in a wide array of applications (including Natural Language Processing, Computer Vision, and Reinforcement Learning) to familiarize general research audiences with this new, emerging paradigm and to foster future research. The tutorial information is available at https://dynamic-sparsity.github.io/
Tutorial: Ricky T. Q. Chen · Yaron Lipman · Heli Ben-Hamu
Flow Matching for Generative Modeling
Flow matching is a simple yet effective generative modeling paradigm that has found widespread adoption in diverse domains and large-scale applications. It is inspired by the efficient training of diffusion models, but offers a simpler perspective and enables easy implementation and generalization. At its core, flow matching follows a simple blueprint: regress onto conditional velocities that generate single data examples, and the result is a model that generates the full distribution.
Our objective in this tutorial is to provide a comprehensive yet self-contained introduction to flow matching, beginning with the continuous Euclidean setting. Afterwards, we will explore extensions and generalizations, including adaptations to non-Euclidean geometries, as well as generalizations to discrete domains and even arbitrary Markov processes. Lastly, we will discuss post-training and fine-tuning methodologies for improved inference and conditioning. The tutorial will survey applications of flow matching ranging from image and video generation to molecule generation and language modeling, and will be accompanied by coding examples and a release of an open source flow matching library. We hope this tutorial will serve as a soft entry point for researchers, as well as provide all attendees with both a theoretical and practical understanding of flow matching with an outlook for future advancements.
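As a concrete illustration of "regressing onto conditional velocities", here is a minimal, self-contained training loop for the linear (optimal-transport) path on toy 2-D data; the toy architecture and synthetic data are placeholder assumptions, not the tutorial's code or library.

```python
# Conditional flow matching on a toy 2-D target: sample noise x0 and data x1,
# interpolate along the linear path, and regress the network onto x1 - x0.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity field v_theta(x_t, t) for 2-D data (placeholder architecture)."""
    def __init__(self, dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(1000):
    x1 = torch.randn(256, 2) * 0.5 + 2.0      # stand-in for a batch of data
    x0 = torch.randn_like(x1)                  # source noise
    t = torch.rand(256, 1)                     # random times in [0, 1]
    xt = (1 - t) * x0 + t * x1                 # point on the linear path
    target_v = x1 - x0                         # conditional velocity for this path
    loss = ((model(xt, t) - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Sampling then amounts to integrating dx/dt = v_theta(x, t) from t = 0 to t = 1, for example with a simple Euler solver.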
Tutorial: Bo Li · Irina Sigler · Yuan Xue
Evaluating Large Language Models - Principles, Approaches, and Applications
This tutorial delves into the critical and complex domain of evaluating large language models (LLMs), focusing on the unique challenges presented when assessing generative outputs. Despite the difficulty in assigning precise quality scores to such outputs, our tutorial emphasizes the necessity of rigorous evaluation throughout the development process of LLMs. This tutorial will provide an extensive presentation of evaluation scopes, from task-specific metrics to broader performance indicators such as safety and fairness. Participants will be introduced to a range of methodological approaches, including both computation and model-based assessments. The session includes hands-on coding demonstrations, providing the tools and knowledge needed to refine model selection, prompt engineering, and inference configurations. By the end of this tutorial, attendees will gain a comprehensive understanding of LLM evaluation frameworks, contributing to more informed decision-making and ensuring the responsible deployment of these models in real-world applications.
Tutorial: Kyle Lo · Akshita Bhagia · Nathan Lambert
Opening the Language Model Pipeline: A Tutorial on Data Preparation, Model Training, and Adaptation
Language models (LMs) have become a critical technology for tackling a wide range of natural language processing tasks, making them ubiquitous in both AI research and commercial products. As their commercial importance has surged, the most powerful models have become more secretive, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. In this tutorial, we provide a detailed walkthrough of the language model development pipeline, including pretraining data, model architecture and training, and adaptation (e.g., instruction tuning, RLHF). For each of these development stages, we provide examples using open software and data, and discuss tips, tricks, pitfalls, and otherwise often inaccessible details about the full language model pipeline that we've uncovered in our own efforts to develop open models. We have opted not to have the optional panel given the extensive technical details and examples we need to include to cover this topic exhaustively.
Affinity Event: Global South in AI Tue 10 Dec 10:00 a.m.
The workshop runs from 12 PM to 3 PM.
Affinity Event: Black in AI Tue 10 Dec 10:00 a.m.
Get ready for an inspiring experience; it's definitely getting ready for you. The 8th Black in AI Workshop will be co-located with NeurIPS 2024. The workshop will feature invited talks from prominent researchers and practitioners, a poster session, and a startup showcase. We invite all members of the AI community to attend the workshop.
Expo Workshop: Accelerating AI Research and Development with AMD Instinct Systems Tue 10 Dec 12:00 p.m.
There has been growing enthusiasm in the research community and industry around AMD systems for large AI model training and inference. This workshop aims to provide a forum for AI researchers, practitioners, and AMD engineers to come together to share experiences and explore how to leverage AMD Instinct accelerators, including MI200 and MI300 GPUs, to develop scalable, energy-efficient, and cost-effective systems for AI model training and inference. The workshop will feature presentations from AMD as well as universities and industry. The presentations will be highly interactive to foster active engagement between the audience and presenters.
Expo Workshop: AI for Enhanced Spacecraft Orientation Tue 10 Dec 12:00 p.m.
Successful space rendezvous missions rely upon accurate pose estimation of a target spacecraft. In this session, we will explore AI/machine learning workflows through hands-on and code-along exercises. You will gain insights into building a successful pose estimation algorithm using the new, state-of-the-art commercially available dataset known as Speed-UE-Cube. This workshop will cover the complete workflow from image pre-processing to deploying deep learning algorithms on hardware and was created in collaboration with Stanford University’s Space Rendezvous Laboratory (SLAB).
In this interactive hands-on workshop, you will:
Familiarize yourself with, write, and run code entirely in the browser using MATLAB® Online™.
Create and evaluate the components needed to succeed in AI modeling by implementing an aircraft classification example.
Take a deep dive into an advanced, domain-specific application that showcases a complete workflow for spacecraft pose estimation.
MathWorks® instructors and teaching assistants (TAs) will be available throughout the session to guide you. If the event is being held onsite, please bring your laptop, and install the Google Chrome™ browser beforehand.
Expo Workshop: Knowledge-enhanced LLMs for Industry Verticals Tue 10 Dec 12:00 p.m.
Hosted by Ant Group, this event will explore how large language models (LLMs) can be adapted and enhanced with domain-specific knowledge to address the unique challenges of various industries. Attendees will discover cutting-edge applications and research, ranging from healthcare and finance to manufacturing and retail, showcasing the potential of LLMs to transform industry-specific workflows. Join us for insightful discussions, live demonstrations, and opportunities to engage with leading experts and practitioners in the field.
Expo Workshop: Accelerating Edge AI: Optimizing and Deploying AI Models with Qualcomm AI Hub Tue 10 Dec 12:00 p.m.
Abstract:
In the rapidly evolving landscape of artificial intelligence, the shift from cloud-centric to edge-centric AI deployments presents unique challenges and opportunities. Qualcomm is at the forefront of this transformation, aiming to democratize AI technology by simplifying the development and deployment of AI models to edge devices. This workshop is designed to empower developers with the knowledge and tools necessary to transition AI workloads efficiently from the cloud to the edge.
Key Objectives:
Understanding Edge AI Deployment: We will address the common challenges developers face when migrating AI workloads to edge devices, including compatibility with existing frameworks and data types.
Performance Optimization: Learn how to enhance the performance of your AI models by up to 4X using Qualcomm AI Hub. This session will demonstrate the ease of testing model performance in less than 5 minutes with just a few lines of code (see the sketch after this list).
Access to Optimized Models: Gain exclusive access to over 100 pre-optimized AI models suitable for a variety of applications across IoT, smartphones, compute platforms, and automotive sectors.
Practical Hands-On Guidance: Follow step-by-step instructions to deploy optimized models on real devices. We will guide you through the initial setup using Qualcomm AI Hub, iterating on your model, and meeting performance benchmarks necessary for on-device deployment.
Integration Techniques: Learn how to optimize your models further and integrate the downloadable target asset into your application effectively.
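The sketch below illustrates what that "few lines of code" workflow can look like with the Qualcomm AI Hub Python client. The function names (submit_compile_job, submit_profile_job), device string, and input shape reflect our understanding of the client and are assumptions to verify against the AI Hub documentation.

```python
# Rough sketch: compile and profile a PyTorch model on a hosted device via
# Qualcomm AI Hub. Device name and input spec are placeholder assumptions.
import torch
import torchvision
import qai_hub as hub  # pip install qai-hub; requires a configured API token

model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))

device = hub.Device("Samsung Galaxy S24 (Family)")
compile_job = hub.submit_compile_job(
    model=traced,
    device=device,
    input_specs=dict(image=(1, 3, 224, 224)),
)
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=device,
)
# On-device latency and memory results appear in the AI Hub dashboard once
# the profile job completes.
```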
Workshop Benefits:
Participants will leave the workshop equipped with practical experience and ready-to-use tools provided by Qualcomm AI Hub. By the end of the session, developers will be proficient in leveraging the platform to not only meet but exceed the performance requirements of their edge AI applications.
Expo Talk Panel: Demystify Financial Textual Data with LLMs Tue 10 Dec 12:00 p.m.
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities across a wide array of natural language processing tasks. However, their potential for solving complex financial trading-related problems has been largely unexplored, primarily due to the scarcity of publicly available proprietary data and the inherently noisy nature of financial textual information. In this talk, we present how we leverage LLMs to address these challenges, offering advanced assistance in extracting insights from diverse financial documents, interpreting market information, and guiding trading judgments. We will explore potential methodologies, model adaptations, and unique strategies to harness LLMs' strengths in overcoming the ambiguities of financial text data, ultimately contributing to more informed and strategic trading decisions.
Expo Workshop: AutoGluon 1.2: Advancing AutoML with Foundational Models and LLM Agents Tue 10 Dec 12:00 p.m.
Automated Machine Learning (AutoML) continues to revolutionize how machine learning models are developed, making it accessible to practitioners with varying levels of expertise. In this workshop, we present the latest advancements in AutoGluon 1.2, an open-source AutoML toolkit developed by Amazon, which empowers users to achieve state-of-the-art performance across diverse machine learning tasks with minimal coding effort.
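As a reminder of how little code AutoGluon requires, here is a minimal tabular example; the CSV paths and the "class" label column are placeholders, and API details for the newest 1.2 features covered in the workshop may differ.

```python
# Minimal AutoGluon Tabular sketch: fit an ensemble with one call, then
# compare models and predict. train.csv / test.csv and the "class" column
# are placeholders for your own dataset.
from autogluon.tabular import TabularDataset, TabularPredictor

train = TabularDataset("train.csv")
test = TabularDataset("test.csv")

predictor = TabularPredictor(label="class").fit(train, time_limit=300)

print(predictor.leaderboard(test))            # per-model scores on held-out data
predictions = predictor.predict(test.drop(columns=["class"]))
```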
Expo Workshop: Active Training: Building Agentic Apps with Llama 3.2 and Llama Stack Tue 10 Dec 12:00 p.m.
This session aims to provide hands-on, engaging content that gives developers and researchers a basic understanding of the Llama 3 models, how to access and use them, and how to build an agentic app using Llama Stack. The audience will also learn core concepts around prompt engineering and fine-tuning and implement them programmatically using Responsible AI principles. Lastly, we will conclude the talk by explaining how attendees can leverage this powerful technology, different use cases, and what the future looks like.
Understanding Llama 3 and its usage: Familiarize yourself with the Llama 3 models, how to download, install, and access them, and the basic use cases they can accomplish. Additionally, we will review basic completion, system prompts, and responses in different formats.
Generative AI Application Architecture: We will walk through the basic Gen AI and chatbot architecture, including implementing chat requests and responses, prompt engineering concepts to get the best out of Llama, hallucinations and how to prevent them, and augmenting external data using Retrieval Augmented Generation (RAG). We will also review advanced concepts around fine-tuning.
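For orientation, here is a generic sketch of the RAG pattern discussed in this part of the session: embed a small document store, retrieve the most relevant chunks for a question, and build a grounded prompt. The embedding model and the final Llama generation call are placeholder assumptions, not the session's exact stack.

```python
# Generic RAG sketch: embed documents, retrieve top-k by cosine similarity,
# and assemble a grounded prompt for the language model.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Llama 3 models were released by Meta in 2024.",
    "Retrieval Augmented Generation grounds answers in retrieved documents.",
    "Prompt engineering shapes model behavior without changing weights.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                      # cosine similarity (vectors normalized)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "What is RAG and why use it?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# response = llama_generate(prompt)           # placeholder for the actual Llama call
```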
Llama Stack: Llama Stack is an attempt to provide standard interfaces (APIs) to streamline innovation in the highly fragmented OSS ecosystem by providing, for the first time, a credible alternative developer experience to the closed-source models available via simple APIs. We will explore Llama Stack to build an agentic application using our Llama model.
Background Knowledge: Basic knowledge of LLMs and Python is assumed. Attendees will leave with a deeper understanding of Llama 3 and Llama Stack, and will be able to access Llama 3 and use it in their day-to-day generative AI projects and applications.
Technology: Large language models (open source)
Live Action: We will provide hands-on, engaging content that gives developers and researchers a basic understanding of our Llama 3 models, and leverage Llama Stack to build an agentic app.
Expo Workshop: Sony’s Efficient Content Creation and Editing through Deep Generative Models Tue 10 Dec 12:00 p.m.
We present a two-part workshop on Efficient Content Creation and Editing using deep generative models. The first part covers our latest advancements in general-purpose deep generative models. The second part offers interactive demos on content restoration and editing with generative modeling tools. These applications meet professional music industry standards and have been integrated into commercial AI-driven music production tools. Participants are invited to engage in demos, where implementation codes are provided, discuss real-world applications, and explore the potential of these models.
Expo Workshop: AutoGen 0.4: Redefining Agentic AI Systems Tue 10 Dec 12:00 p.m.
AutoGen is an open-source framework for building agentic AI systems. Since its release in October 2023, it has gained remarkable popularity, with over 1.6M downloads and 31.8k GitHub stars. Agents built with AutoGen have demonstrated SOTA performance on multiple agentic benchmarks. Enterprises, academic groups, and other communities have been using the framework to create multi-agent solutions for a wide range of scenarios, including automating business processes and driving novel AI-based systems.
In October 2024, we released AutoGen 0.4, a complete redesign from the foundation up, based on lessons learned from the AutoGen community's diverse use cases and contributions. AutoGen 0.4 simplifies the creation of event-driven, distributed, scalable, and resilient agentic applications. It allows developers to quickly build systems where AI agents collaborate and perform tasks autonomously or with human oversight.
This workshop will introduce AutoGen’s new design concepts and library components that make AutoGen 0.4 a powerful tool for agentic development and research. Following the introduction, attendees will engage in a hands-on tutorial, guiding them through the process of customizing a multi-agent application. This practical session will provide valuable experience in leveraging the framework’s features to build robust, scalable AI systems.
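To give a feel for the hands-on portion, below is a rough sketch of a two-agent AgentChat script in the style of AutoGen 0.4. The module paths, class names, and the OpenAI model client shown are our assumptions about the 0.4 API and should be checked against the released documentation.

```python
# Sketch (assumed AutoGen 0.4 API): two agents take turns until the critic
# approves, coordinated by a round-robin team.
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")  # placeholder model
    writer = AssistantAgent("writer", model_client=model_client,
                            system_message="Draft a short answer to the task.")
    critic = AssistantAgent("critic", model_client=model_client,
                            system_message="Critique the draft; reply APPROVE when satisfied.")
    team = RoundRobinGroupChat([writer, critic],
                               termination_condition=TextMentionTermination("APPROVE"))
    result = await team.run(task="Summarize what event-driven agents are.")
    print(result)

asyncio.run(main())
```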
Affinity Event: Muslims in ML Tue 10 Dec 01:00 p.m.
The 3rd Muslim in Machine Learning (Muslim in ML) Workshop will be co-located with NeurIPS 2024 in Vancouver, BC, Canada.
Scheduled for December 10th, the Muslim in ML Workshop will showcase an inspiring program featuring invited talks, oral presentations, and poster sessions. This event provides a vibrant platform for researchers, practitioners, and students from the Muslim community to connect, exchange ideas, and foster collaborations in the field of machine learning. The workshop aims to celebrate and amplify the contributions of Muslim individuals in machine learning while promoting inclusivity and community engagement. While the workshop highlights contributions from Muslim individuals, people of all backgrounds and faiths are welcome to attend.
Creative AI Session 2 Tue 10 Dec 01:00 p.m.
[ East Ballroom C ]
Tutorial: Gillian Hadfield · Dylan Hadfield-Menell · Joel Leibo · Rakshit Trivedi
Cross-disciplinary insights into alignment in humans and machines
Aligning the behavior of AI systems and agents with human goals and values continues to be a major challenge. But the problem is not novel: many disciplines, such as economics, political science, legal theory, and cultural evolutionary theory, have grappled for decades if not centuries with the question of how to align the behaviors of individuals with the well-being of other individuals and entire societies. Markets, legal institutions and rules, and political processes are mechanisms on which human societies rely to achieve goals such as well-being, fair treatment, and economic innovation and growth. In this tutorial, we will provide an introduction to these mechanisms: how they work and how they can inform a more robust approach to AI alignment. For example, a key misunderstanding in the current alignment literature is the idea that AI alignment can be achieved by fine-tuning AI agents and systems with a pre-defined set of human preferences; this is the principle underlying reinforcement learning from human feedback (RLHF) for large language models. But regulated market systems take a different approach to alignment: they encourage self-interested firms and individuals to take actions that generate wealth and do not impose excessive costs (externalities) on others, and they use a variety of mechanisms to shape behavior. They focus on the alignment of the system, not the individual agent per se. In this tutorial we'll introduce participants to core ideas from economics, law, political science, and cultural evolutionary theory to inform the next generation of thinking in AI safety and alignment.
Tutorial: Bingbin Liu · Ashok Vardhan Makkuva · Jason Lee
Sandbox for the Blackbox: How LLMs Learn Structured Data?
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, from both theoretical perspectives such as representation and optimization, and scientific approaches such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g. Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g. induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two predominant classes of language models that have driven the AI revolution: transformers and recurrent models such as state-space models (SSMs). For these models, we discuss several concrete results, including their representational capacities, optimization landscape, and mechanistic interpretability. Building upon these perspectives, we outline several important future directions in this field, aiming to foster a clearer understanding of language models and to aid in the creation of more efficient architectures.
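As a toy illustration of the "structured data" setup mentioned above, the sketch below generates sequences from a random first-order Markov chain and recovers the empirical bigram statistics that an in-context learner (e.g., an induction-head-like mechanism) would need to estimate; the sizes and the experiment itself are illustrative assumptions, not the tutorial's material.

```python
# Markov-chain sandbox: sample a long sequence from a random transition matrix
# and check how well empirical bigram counts recover the true transitions.
import numpy as np

rng = np.random.default_rng(0)
V = 8                                             # vocabulary size
P = rng.dirichlet(np.ones(V), size=V)             # transition matrix P[i, j] = p(j | i)

def sample_chain(length: int) -> np.ndarray:
    seq = [rng.integers(V)]
    for _ in range(length - 1):
        seq.append(rng.choice(V, p=P[seq[-1]]))
    return np.array(seq)

seq = sample_chain(10_000)

counts = np.zeros((V, V))
np.add.at(counts, (seq[:-1], seq[1:]), 1)         # empirical bigram counts
P_hat = counts / counts.sum(axis=1, keepdims=True)
print("max abs error vs. true transitions:", np.abs(P_hat - P).max())
```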
Tutorial: Levi Lelis · Xinyun Chen · Shao-Hua Sun
In this tutorial, we will present recent advances in program synthesis that enable the generation of programmatic policies for reinforcement learning and production software programs that satisfy user intent. The tutorial consists of two parts. In the first part of this tutorial, we consider the reinforcement learning (RL) setting, where the goal is to learn a policy that observes environments and acts optimally. Instead of representing policies using deep neural networks, programmatic RL (PRL) methods aim to synthesize program policies structured in a human-readable domain-specific language. PRL reformulates the RL into learning to write a program that can be executed in an environment and maximize the return, potentially yielding improved interpretability and generalizability. We will cover different families of algorithms that rely on search and learning-based methods, including those using large language models to help with the search for programmatic policies. In the second part of the tutorial, we consider code generation problems, where users provide their intent as input to a program synthesizer, which generates a program attempting to satisfy that intent. With the advancement of deep learning, neural networks and large language models (LLMs), with their impressive capabilities of understanding and reasoning over natural language and code, have revolutionized code generation. We will first discuss representative work on neural program synthesis, foundational techniques for developing LLMs for code generation, and emerging use cases of LLM-based coding agents. We will conclude this part of the tutorial with a discussion on the challenges and opportunities of LLMs for code generation.
Tutorial: Maggie Makar · Aahlad Manas Puli · Yoav Wald
Out-of-Distribution Generalization: Shortcuts, Spuriousness, and Stability
Machine learning models often face challenges due to distribution shifts, leading to compromised performance during testing and limiting their use in high-stakes applications. For example, vision models have mistakenly relied on the height of shoulders in images to classify radiographs of COVID-19 patients, influenced by specific scanning techniques used during the pandemic's onset. Similarly, language models exhibit susceptibility to misleading syntactic patterns in natural language inference tasks like determining entailment, persisting as models grow in size. Addressing these issues requires characterizing relevant distribution shifts and establishing desired model behaviors under them.
This tutorial aims to provide a holistic perspective on distribution shifts due to spurious correlations and shortcut learning, as exemplified by the aforementioned instances. We situate existing research within a unified formal framework, discuss challenges in practical application of methods, and delineate the evolving landscape of research on spurious correlations in the era of foundation models. This tutorial serves as a compact and self-contained resource for students and researchers learning the topic, as well as practitioners seeking to deepen their understanding of the issues and of the tools to tackle them.
We will provide an overview of research trends, discuss available benchmarks, and propose best practices for future endeavors. The tutorial's final segment will focus on exploring spurious correlations in large models, culminating in a panel discussion on the subject.
Tutorial: Matthew Finlayson · Hailey Schoelkopf · Sean Welleck
Beyond Decoding: Meta-Generation Algorithms for Large Language Models
One of the most striking findings in modern research on large language models (LLMs) is that, given a model and dataset of sufficient scale, scaling up compute at training time leads to better final results. However, there is also another lesser-mentioned scaling phenomenon, where adopting more sophisticated methods and/or scaling compute at inference time can result in significantly better output from LLMs. We will present a tutorial on past and present classes of generation algorithms for generating text from autoregressive LLMs, ranging from greedy decoding to sophisticated meta-generation algorithms used to power compound AI systems. We place a special emphasis on techniques for making these algorithms efficient, both in terms of token costs and generation speed. Our tutorial unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems. In turn, we aim to make attendees aware of (meta-)generation algorithms as a promising direction for improving quality, increasing diversity, and enabling resource-constrained research on LLMs.
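As one concrete example of the meta-generation algorithms the tutorial covers, here is a minimal best-of-N re-ranking sketch: draw N candidate completions and keep the one preferred by an external scorer. The generate() and score() functions are placeholders for an actual LLM sampling call and a reward or verifier model.

```python
# Best-of-N sketch: sample N candidates, re-rank with a scorer, return the best.
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder: one sampled completion from an autoregressive LM."""
    return f"candidate answer {random.random():.3f}"

def score(prompt: str, completion: str) -> float:
    """Placeholder: a reward model, verifier, or heuristic quality score."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("Explain why the sky is blue in one sentence."))
```

Token cost grows linearly in N, which is exactly the efficiency trade-off the tutorial emphasizes when comparing inference-time scaling strategies.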
Tutorial: Mimee Xu · Dmitrii Usynin · Fazl Barez
PrivacyML: Meaningful Privacy-Preserving Machine Learning and How To Evaluate AI Privacy
In the world of large model development, model details and training data are increasingly closed down, pushing privacy to the forefront of machine learning – how do we protect privacy of the data used to train the model, permitting more widespread data sharing collaborations? How will individuals trust these technologies with their data? How do we verify that the integration of individual’s data is both useful to the rest of the participating federation, and, more importantly - safe for the data owner? How do the regulations integrate into this complex infrastructure?
These open questions require a multitude of considerations between the incentives of model development, the data owning parties, and the overseeing agencies. Many cryptographic solutions target these incentives problems, but are they covering all essential components of trustworthy data sharing? Are they practical, or likely to be practical soon?
In this tutorial, we attempt to answer questions regarding specific capabilities of privacy technologies in three parts: (1) overarching incentive issues with respect to data and evaluations; (2) where cryptographic and optimisation solutions can help (for evaluations, we delve deep into secure computation and machine unlearning); and (3) cultural, societal, and research agendas relating to practically implementing these technologies.
Our website is here: https://privacyml.github.io/
We hope that, by identifying the boundaries of the use of privacy technologies, and providing a technical and structured framework for reasoning over these issues, we could empower the general audience to integrate these principles (and practical solutions) into their existing research. Those already interested in applying the technology can gain a deeper, hands-on understanding of implementation useful for modeling and developing incentive-compatible solutions for their own work.
Tutorial: Jiachen (Tianhao) Wang · Ludwig Schmidt · Ruoxi Jia
Advancing Data Selection for Foundation Models: From Heuristics to Principled Methods
Data selection is a critical step in training and fine-tuning foundation models, significantly impacting model performance and training efficiency. Existing approaches deployed in foundation models' data curation pipelines have primarily relied on heuristic methods, which, while practical, often lack a theoretical basis and can lead to suboptimal performance. This tutorial aims to bridge the gap between heuristic practices and emerging principled methods that offer systematic, theoretically grounded approaches to data selection.
We will begin by discussing the algorithmic foundations for data selection. This includes attribution-based approaches, diversity-based approaches, and methods that directly optimize for final model performance. These techniques will be introduced as instantiations of the unified framework of utility function maximization. Next, we will review the data selection techniques currently deployed in the foundation model training pipeline, such as rule-based data filtering, examining their strengths and limitations. Finally, we will introduce recent advances in developing principled data selection methods for foundation models, including both data point-level and source-level data selection. By the end of this tutorial, attendees will gain a deeper understanding of the theoretical underpinnings of data selection, practical knowledge of current data selection heuristics for foundation models, and insights into the research frontier in principled data selection techniques.
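A toy sketch of the utility-maximization framing described above: score each candidate example with a utility proxy and keep the top-k. The quality_score function is a placeholder for the attribution-, diversity-, or model-performance-based utilities the tutorial discusses, and the corpus is synthetic.

```python
# Data selection as utility maximization (toy): rank documents by a utility
# proxy and keep the highest-scoring subset.
import heapq
import random

def quality_score(example: str) -> float:
    """Placeholder utility, e.g., a data-attribution value or a filter-model score."""
    tokens = example.split()
    return len(set(tokens)) / (len(tokens) + 1) + 0.01 * random.random()

corpus = [f"document {i} with text of varying quality quality" for i in range(10_000)]

k = 1_000
selected = heapq.nlargest(k, corpus, key=quality_score)   # top-k by utility
print(f"kept {len(selected)} of {len(corpus)} candidate documents")
```

Point-wise top-k selection like this ignores interactions between examples, which is one motivation for the source-level and optimization-based methods covered later in the tutorial.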
Affinity Event: New in ML Tue 10 Dec 01:45 p.m.
With the booming research in artificial intelligence, the community welcomes many newcomers every day. A lack of mentoring and of an inclusive environment is becoming increasingly significant. Our goal is to welcome new researchers into the community and provide them with guidance so they can contribute to machine learning research fully and effectively.
Mentorship: Science Communication for AI Researchers: An Introduction Tue 10 Dec 02:00 p.m.
Science communication is essential. It helps demystify AI for a broad range of people including policy makers, business leaders, and the public. As a researcher, mastering this skill can not only enhance your communication abilities but also expand your network and increase the visibility and impact of your work.
In this brief tutorial, we will teach you how to clearly and concisely explain your research to non-specialists. You'll learn how to avoid hype, how to find suitable images to illustrate your work, and where to start with social media.
The first hour of this session will comprise a talk. The second hour will be an informal drop-in session where you can discuss your sci-comm questions, ideas and stories one-on-one.
Expo Demonstration: ProLLM: Program analysis driven, LLM assisted application modernization Tue 10 Dec 03:00 p.m.
Large Language Models for Code (or code LLMs) are increasingly gaining popularity and capabilities, offering a wide array of application modernization use cases such as code explanation, test generation, code repair, refactoring, translation, code generation, code completion, and more. To leverage code LLMs to their full potential, developers must provide code-specific contextual information to the models. We will demonstrate generic pipelines we built that incorporate static analysis to guide LLMs in generating code explanations at various levels (application, method, class) and automated test generation that produces compilable, high-coverage, and natural-looking test cases. We will also demonstrate how these pipelines can be built using “codellm-devkit”, an open-source library that significantly simplifies performing program analysis at various levels of granularity, making it easier to integrate detailed, code-specific insights that enhance the operational efficiency and effectiveness of LLMs in coding tasks, and how these use cases can be extended to different programming languages, specifically Java and Python.
Expo Demonstration: Open-Sora: Democratizing Efficient Video Production for All Tue 10 Dec 03:00 p.m.
Video Ocean is an innovative web-based platform that transforms video production, setting a new standard for ease, speed, and creativity. Using cutting-edge AI technology, Video Ocean allows users to generate high-quality videos in a matter of seconds from simple text, image, or character inputs—no technical expertise or expensive equipment required.
In this demo, we will showcase the full capabilities of Video Ocean, walking you through its user-friendly interface and demonstrating how anyone can create professional-grade videos in just a few clicks. You'll see firsthand how our platform excels in text-to-video, image-to-video, and character-to-video generation, and how it opens up endless creative possibilities.
We’ll also highlight the powerful AI models behind Video Ocean, engineered to deliver stunning results while minimizing resource usage. Whether it's generating breathtaking landscapes, lifelike animations, or cinematic footage, Video Ocean is designed to cater to creators of all levels—from professionals to beginners.
Join us for an exclusive demo and discover how Video Ocean is revolutionizing the way videos are created. With its unparalleled efficiency and accessibility, this platform is truly the future of video production, and we are excited to share it with the world.
Expo Demonstration: Deploying Cached Conditional Mixture-of-Experts LLMs on Mobile Devices with Memory Constraints Tue 10 Dec 03:00 p.m.
The Mixture-of-Experts (MoE) architecture has emerged as a powerful approach for enhancing the capacity of large language models (LLMs) while maintaining computational efficiency. This is achieved by activating only a subset of the model's parameters during inference, which allows for substantial scaling without a proportional increase in computation costs. Despite these advantages, deploying MoE models on memory-constrained devices remains challenging due to their large number of parameters, which often exceed the available DRAM capacity of a typical smartphone.
In this demonstration, we present a Qwen-MoE model with 14 billion parameters running on a mobile device that lacks sufficient DRAM to store the entire model. To address this limitation, we implement an expert caching strategy that selectively stores only a subset of experts in DRAM. During dynamic routing, if the required experts are already cached, computation is expedited. However, cache misses necessitate loading experts from flash memory, resulting in increased latency. Furthermore, by conditioning the router on the current cache state, our approach significantly improves cache hit rates and reduces deployment latency without compromising model accuracy on downstream tasks. For further acceleration, we shrink the model from FP32 to INT4 with post-training quantization. Our solution showcases the potential of running large-scale MoE models on mobile devices by dynamically managing memory constraints through intelligent caching mechanisms.
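To make the caching idea concrete, here is a simplified, self-contained sketch of an LRU expert cache with hit/miss accounting. The expert count, cache size, and the uniform mock router are illustrative assumptions; in particular, it ignores the cache-aware routing described above.

```python
# LRU expert cache sketch for MoE inference: hits are served from (simulated)
# DRAM, misses stand in for slow loads from flash memory.
from collections import OrderedDict
import random

class ExpertCache:
    def __init__(self, capacity: int):
        self.capacity, self.cache = capacity, OrderedDict()
        self.hits = self.misses = 0

    def fetch(self, expert_id: int):
        if expert_id in self.cache:                 # hit: expert already in DRAM
            self.cache.move_to_end(expert_id)
            self.hits += 1
        else:                                       # miss: load from flash (slow path)
            self.misses += 1
            self.cache[expert_id] = f"weights_{expert_id}"
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)      # evict least-recently-used expert
        return self.cache[expert_id]

cache = ExpertCache(capacity=16)                    # DRAM holds 16 of 64 experts
for _ in range(10_000):
    for expert in random.sample(range(64), 2):      # top-2 routing per token (mocked)
        cache.fetch(expert)
print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.2%}")
```

Conditioning the router on the cache state, as the demonstration does, raises this hit rate well above what a cache-oblivious router achieves.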
"This Proposal is provided for review and evaluation purposes only. Do not redistribute to any third party without the express prior written consent of Qualcomm Technologies, Inc."
Expo Demonstration: Deploying Personal AI Assistants with Voice, LLMs, RAG via OpenVINO™ on the AI PC Tue 10 Dec 03:00 p.m.
Explore how to build and deploy local LLM-based assistants on the AI PC or at the edge. Our pipeline leverages a real-time speech transcription model (Distil-Whisper) and Large Language Model (Llama 3)-powered chatbots leveraging Retrieval Augmented Generation (RAG) to personalize user interactions via text generation and summarization over prior interaction history. We discuss how the Intel® Core™ Ultra enables efficient deployment of LLMs on the CPU, iGPU, and NPU through optimization techniques such as quantization and OpenVINO™ compression libraries. Live demos will be presented throughout the session to ensure developers can see the work in action.
Expo Demonstration: HPC-AI.COM: Unleashing the Power of High-Performance Cloud Computing for Large-Scale AI Model Training Tue 10 Dec 03:00 p.m.
We are thrilled to present HPC-AI.COM, a revolutionary cloud platform tailored for the next generation of AI model training and inference. As AI continues to transform industries, the demand for accessible, high-performance computing has never been greater. HPC-AI.COM delivers this through an intuitive, highly optimized platform designed to meet the needs of researchers, developers, and enterprises alike. Our solution simplifies large-scale AI workflows, accelerates model training, and unlocks new possibilities for innovation. In this demo, we will showcase the power of HPC-AI.COM in transforming the AI landscape by offering cutting-edge infrastructure and seamless user experiences.
Expo Demonstration: Automatically Deploying a Sequence-to-Sequence Transformer for Accelerated Discovery on the IBM HERMES Project Chip Tue 10 Dec 03:00 p.m.
Analog in-memory computing (AIMC) using resistive memory devices has the potential to increase the energy efficiency of deep neural network inference by multiple orders of magnitude. This is enabled by performing matrix vector multiplications – one of the key operations in deep neural network inference – directly within the memory, avoiding expensive weight fetching from external memory such as DRAM. The IBM HERMES Project Chip is a state-of-the-art, 64-core mixed-signal AIMC chip based on Phase Change Memory that makes this concept a reality. Using this chip, we demonstrate automatic deployment and inference of a Transformer model capable of predicting chemical compounds that are formed in a chemical reaction.
Expo Demonstration: EvalAssist - An LLM-as-a-Judge Framework Tue 10 Dec 03:00 p.m.
Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and time-consuming given the large amounts of data. LLMs are increasingly used as evaluators to filter training data, evaluate model performance, detect harms and risks, or assist human evaluators with detailed assessments. To support this process, effective front-end tools are critical for evaluation. EvalAssist abstracts the LLM-as-a-judge evaluation process into a library of parameterizable evaluators (the criterion being the parameter), allowing the user to focus on criteria definition. EvalAssist consists of a web-based user experience, an API, and a Python toolkit, and is based on the UNITXT open-source library. The user interface provides users with a convenient way of iteratively testing and refining LLM-as-a-judge criteria, and supports both direct (rubric-based) and pairwise assessment paradigms, the two most prevalent forms of LLM-as-a-judge evaluation available. In our demo, we will showcase different types of evaluator LLMs for general-purpose evaluation and also the latest Granite Guardian model (released October 2024) to evaluate harms and risks.
Expo Demonstration: IBM AI Agent SWE-1.0 Tue 10 Dec 03:00 p.m.
Resolving issues from an issue tracker on a source-code repository is tedious and expensive when done by hand. Recently, the SWE-bench Lite leaderboard has seen submissions by several LLM-based agents that do this automatically. Unfortunately, these agents rely on closed-source frontier models, making them expensive and raising data-sharing concerns for industrial use. In contrast, we built Agent-102, which works with a variety of open-source models such as Llama, Granite, and Mistral. Agent-102 uses sub-agents that are specialized for the sub-tasks of localization, editing, and testing. Each sub-task is within reach of the capabilities of an open-source model. Furthermore, Agent-102 uses automated checking and repair of various common mistakes made by models, uses structured formats for data passed between sub-agents, and uses ensembling at multiple levels. Overall, Agent-102 achieves issue resolution rates similar to those of closed-source frontier models, but with open-source models.
Expo Demonstration: Moving beyond chat: Enabling LLMs with intrinsic functions that give fine-grained control in application development Tue 10 Dec 03:00 p.m.
We aim to reframe how developers create LLM applications. Instead of iterating on verbose, complex prompts to achieve a desired complex behavior, we break down complex tasks into a series of standard computing elements that a developer can call in a programmatic way. In this demonstration we will explore how leveraging an LLM trained with key intrinsic functions, such as hallucination detection, uncertainty quantification, and topic scoping, could unlock a new way of building and working with LLMs.
Expo Demonstration: IBM Granite Vision Model for Enterprise AI Tue 10 Dec 03:00 p.m.
Enterprise applications present unique challenges for vision and language foundation models, as they frequently involve visual data that diverges significantly from the typical distribution of web images and require understanding of nuanced details such as small text in scanned documents, or tiny defects in industrial equipment images. Motivated by these challenges, we will showcase our IBM Granite Vision model, a foundation model with state-of-the-art performance in document image understanding tasks, such as the analysis of charts, plots, infographics, tables, flow diagrams, and more. We will provide a detailed overview of our methodology and present a live demonstration of our model's capabilities, illustrating its key features and applications. Our model will be open-sourced, allowing the community to access and contribute to its development.
Expo Demonstration: Diffusion Video Editing on Mobile Tue 10 Dec 03:00 p.m.
We demonstrate the first diffusion-based video editing model running on a smartphone powered by Qualcomm Technologies’ latest Snapdragon Mobile Platform. Given an input video at 512x384 resolution and a textual prompt instructing the edit, we generate the edited video at 5 frames per second on a smartphone, using full-stack AI optimizations and running on the Qualcomm Hexagon NPU for accelerated and efficient inference.
Our model is built on top of an efficient image generation backbone fine-tuned on editing instructions. The image generation backbone is extended to video by introducing cross-frame attention from key-frames to enforce temporal consistency while remaining efficient in terms of memory and computation overhead. We further increase the frame rate by using a novel extension of classifier-free guidance distillation to the multi-modal setting, where the three denoising functions (unconditional, text-conditioned, and frame-conditioned) are all distilled into a single denoising function, reducing the diffusion sampling cost by a factor of 3. Additionally, we extend the adversarial distillation of diffusion models to editing while preserving the guidance scale, which is essential to control the editing strength. This novel extension allows us to perform diffusion sampling with a single step. Finally, we rely on a distilled autoencoder to efficiently extract the latents and pixels required for latent diffusion models. For further acceleration, we shrink the model from FP32 to INT8 with the post-training quantization technique AdaRound, using the AI Model Efficiency Toolkit (AIMET) from the Qualcomm AI Stack. Our quantization scheme is agnostic to the iteration/denoising stage, with INT16 bit-width for activations.
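For context, a common way to combine the three denoising predictions under multi-modal classifier-free guidance (an InstructPix2Pix-style formulation; the abstract does not specify the exact weighting that is distilled) is

\[
\tilde{\epsilon}_\theta(z_t, c_F, c_T) \;=\; \epsilon_\theta(z_t, \varnothing, \varnothing)
\;+\; s_F\big[\epsilon_\theta(z_t, c_F, \varnothing) - \epsilon_\theta(z_t, \varnothing, \varnothing)\big]
\;+\; s_T\big[\epsilon_\theta(z_t, c_F, c_T) - \epsilon_\theta(z_t, c_F, \varnothing)\big],
\]

where \(c_F\) is the key-frame condition, \(c_T\) the text condition, and \(s_F, s_T\) the guidance scales. Guidance distillation trains a single student network to reproduce this combined output directly, so each sampling step needs one denoising evaluation instead of three.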
"This Proposal is provided for review and evaluation purposes only. Do not redistribute to any third party without the express prior written consent of Qualcomm Technologies, Inc."
Expo Demonstration: Large Multimodal Model running on a mobile device Tue 10 Dec 03:00 p.m.
In this demo, we show on-device inference of a Large Multi-modal Model (LMM) that interactively answers questions asked by the user about high-resolution images, on an Android smartphone powered by Qualcomm Technologies’ latest Snapdragon Mobile Platform. The overall latency to process a high-resolution image of 768*768 pixels and prefill the KV-cache of the LLaMA-3 language model is just 0.2 seconds on the smartphone, which uses Qualcomm AI Stack optimizations and runs on the Qualcomm Hexagon NPU for accelerated and efficient inference.
Scientific Challenge that we tackle
Efficiently running a large multimodal model requires concurrently on-boarding multiple deep models, including a LLaMA-3-8B LLM, a vision encoder, a speech-to-text model, and a text-to-speech model. Large model sizes (both model parameters and activations) and high-resolution visual data pose significant challenges to efficient execution and to enabling an interactive conversational experience for the user, with all required computation done on the edge device itself.
How we solve it
To efficiently run the interactive LMM on a mobile device, we design, develop, train, and on-board an LMM with an efficient visual backbone, streaming Automatic Speech Recognition (ASR), streaming Text-to-Speech (TTS), and the LLaMA-3 language model. Unlike most existing methods that split a high-resolution image into multiple sub-images, we can directly ingest a high-resolution image in a single forward pass. Further, our visual backbone is based on a hierarchical network design that runs efficiently on a mobile device and provides a compact 144 visual tokens per 768*768 image for the LMM. Our on-device LLM can handle a context length of up to 4096 tokens, enabling multi-turn conversations over multiple interleaved images. The streaming ASR and TTS further provide a natural speech I/O interface with reduced end-to-end latency.
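For a rough sense of what the compact visual representation buys within the 4096-token context, here is a small illustrative calculation (the number of tokens reserved for text is an assumption, not a figure from the demo):

```python
# Illustrative token-budget arithmetic using the figures quoted above:
# 144 visual tokens per 768x768 image, 4096-token context window.
CONTEXT_LEN = 4096
TOKENS_PER_IMAGE = 144

def max_images(reserved_text_tokens: int) -> int:
    """How many interleaved images fit once some context is reserved for text."""
    return (CONTEXT_LEN - reserved_text_tokens) // TOKENS_PER_IMAGE

# Reserving ~1024 tokens for the conversation still leaves room for 21 images.
print(max_images(reserved_text_tokens=1024))  # -> 21
```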
Expo Demonstration: Low-Rank Adaptation (LoRA) for Large Vision Model on a Mobile Device Tue 10 Dec 03:00 p.m.
As the demand for deploying fine-tuned and customized generative models on edge devices grows, the challenge of fully fine-tuning generative models remains due to its high cost and computational intensity. Parameter Efficient Fine-Tuning (PEFT) provides an effective solution by minimizing the number of fine-tuned parameters and reducing memory usage. This demo showcases Low Rank Adaptation (LoRA), an efficient PEFT technique for a Large Vision Model (LVM) on an Android smartphone powered by Qualcomm Technologies’ latest Snapdragon Mobile Platform.
Scientific Challenge that we tackle
Efficiently running large generative models on a resource-constrained mobile device requires methods that manage compute complexity and memory usage. Further, users need to be able to switch adapters quickly on the device. Given the size and complexity of the generative models and the many on-target optimizations used, it is a challenge to perform this switch quickly while retaining the required on-target performance and accuracy.
How we solve it
To accommodate all modules of Stable Diffusion and LoRA adapters on a mobile device, they are efficiently quantized using the post-training quantization technique, AdaRound, with the AI Model Efficiency Toolkit (AIMET) from the Qualcomm AI Stack. To efficiently run Stable Diffusion with adapters on device, and to support rapid switching of the adapters, we statically compile and quantize the model and the adapters once, and support switching with fast parameter updates directly to a small fraction of the model parameters. This retains the optimization of the model execution, supports fast switching, and retains the low memory footprint of the adapters on device. To ensure we retain model accuracy across adapter switches, we also enable updates of certain metadata (such as quantization parameters) to a small fraction of the on-device model to best match the requirements of each adapter.
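For readers unfamiliar with LoRA, the sketch below shows the generic low-rank update and an adapter switch that overwrites only the small adapter tensors while the base weights stay frozen. It illustrates the general technique only; it is not Qualcomm's quantized on-device implementation.

```python
# Generic LoRA sketch in PyTorch (illustrative; not the on-device pipeline).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha/r) * B A x : only A and B are fine-tuned.
        return self.base(x) + self.scale * (x @ self.lora_A.T) @ self.lora_B.T

    def load_adapter(self, state: dict) -> None:
        """Fast adapter switch: overwrite only the small A/B tensors in place."""
        with torch.no_grad():
            self.lora_A.copy_(state["A"])
            self.lora_B.copy_(state["B"])

layer = LoRALinear(nn.Linear(512, 512), rank=8)
y = layer(torch.randn(1, 512))
layer.load_adapter({"A": torch.randn(8, 512) * 0.01, "B": torch.zeros(512, 8)})
```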
Expo Demonstration: Augmenting Driving Scenes by High Fidelity Generative Models Tue 10 Dec 03:00 p.m.
Training and evaluation of automotive perception models, e.g., 3D object detection and semantic segmentation, require high-quality annotated images. Collecting annotated data is costly and, in some cases, inherently hard, e.g., for rare objects or events such as emergency vehicles, accidents, or odd driving behaviors. The rapid advancements of generative models in terms of realism, diversity, and controllability have recently shown great promise for generating samples to be consumed by perception models for better training or evaluation.
We demonstrate an application of state-of-the-art generative models, namely diffusion models and differentiable renderers, to create data for automotive perception models. One key requirement for this application is tight fidelity: the generated data must precisely match the intended annotations in 3D space. This is especially challenging for standard diffusion models, e.g., Stable Diffusion or ControlNet, which are trained solely on images with no explicit understanding of 3D geometry. As a result, most existing image editing tools built on top of standard diffusion models fail to meet the high-fidelity requirements of automotive applications. Using explicit 3D conditioning of diffusion models and post-generation fidelity filters, we reinforce the object pose fidelity of diffusion models when editing the shape and appearance of vehicles. Additionally, we showcase an application of dynamic Gaussian splatting for 3D reconstruction of driving scenes that allows us to seamlessly remove, translate, or add new vehicles to the scene with high fidelity. We demonstrate these capabilities on multi-camera driving videos. We rely on cross-scene and cross-dataset actor transfer to increase the diversity of the edits beyond the objects already present in the scene.
Expo Demonstration: Site Reliability Engineering Agent-101 for Incident Management Tue 10 Dec 03:00 p.m.
IT failures are increasingly costly, with even brief outages leading to millions in losses as more business moves online. Incident management has become more complex than ever due to a combination of technological advancements, infrastructure heterogeneity, and evolving business needs. Resolving IT incidents is as complex as, if not more complex than, fixing software bugs, and it is a very tedious and expensive task. Several advancements have been made, including IBM’s Intelligent Incident Remediation, which uses LLMs and generative AI to streamline incident resolution by identifying probable causes and using AI-guided remediation steps. In this demo, we describe how we are advancing the state of the art in incident remediation using agentic generative AI approaches. We demonstrate SRE-Agent-101, a ReAct-style LLM-based agent, along with a benchmark to standardize the assessment of the effectiveness of analytical solutions for incident management. SRE-Agent-101 uses several custom-built tools, namely anomaly detection, causal topology extraction, NL2Traces, NL2Metrics, NL2Logs, NL2TopologyTraversal, and NL2Kubectl. These tools take natural language as input and fetch target data gathered by the observability stack. Given the verbosity of such data, even powerful models can quickly exhaust their context length. We have implemented a methodology to dynamically discover the more specific context using domain knowledge. The target context is then analyzed by the underlying LLM to infer the root-cause entity and fault and to perform actions; this process continues iteratively until the incident is resolved.
Expo Demonstration: Practices of Alibaba Cloud PAI (Platform for AI): An AI Native Large Models and AIGC Engineering Platform Tue 10 Dec 03:00 p.m.
Alibaba Cloud PAI (Platform for AI) is an AI-native large-model and AIGC engineering platform, providing functionalities such as dataset management, computing power management, model toolchains, model development, model training, model inference, and AI asset management. It includes 100+ best practices for large models, providing users with high-performance and highly stable large-model engineering capabilities.
Expo Demonstration: MemoMagnet: A fridge-magnet memo agent based on LLM Tue 10 Dec 03:00 p.m.
We will demo MemoMagnet, a fridge-magnet-like memo device based on a large language model (LLM). MemoMagnet can chat with users by voice and fulfill tasks such as memorizing the items in the refrigerator and recommending recipes based on those items. The LLM is deployed on the device (without calling the cloud) as an agent that communicates with users and fulfills the tasks above.
Expo Demonstration: Trace: An LLM-powered End-to-end Optimization Framework of AI Workflow Tue 10 Dec 03:00 p.m.
In this tutorial, we introduce Trace, a groundbreaking AutoDiff-like framework designed to train AI workflows end-to-end with rich feedback. Trace leverages numerical rewards, losses, natural language text, compiler errors, and more to achieve autonomous interactive optimization.
Key Takeaways:
Understanding the concept and vision behind Trace.
Exploring how Trace generalizes back-propagation for AI workflows.
Learning about how to use Trace to train Python workflows (a conceptual sketch follows this list).
Discovering practical applications of autonomous interactive optimization (such as training LLM multi-agent systems, learning robot control policies, and autonomous prompt optimization).
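As a rough conceptual sketch of the idea (this is not the Trace API; all names below are invented for exposition), an end-to-end "text feedback" loop over a Python workflow might look like:

```python
# Conceptual sketch only: a text parameter (a prompt) is updated from
# natural-language feedback, analogous to updating weights from gradients.
# In a real system, propose_update would call an optimizer LLM.
def run_workflow(prompt_template: str, question: str) -> str:
    """Forward pass of the AI workflow (placeholder for real LLM/tool calls)."""
    return f"{prompt_template}\nQ: {question}\nA: ..."

def propose_update(prompt_template: str, feedback: str) -> str:
    """Placeholder 'backward' step: rewrite the parameter given the feedback."""
    return prompt_template + f"  # revised per feedback: {feedback}"

prompt = "Answer concisely."
for step in range(3):
    output = run_workflow(prompt, "What is 2 + 2?")
    feedback = "OK" if len(output) < 80 else "Too verbose"
    if feedback == "OK":
        break
    prompt = propose_update(prompt, feedback)  # feedback flows back to the parameter
```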
Expo Talk Panel: Sequence Modeling in Financial Services Tue 10 Dec 04:00 p.m.
In this presentation, we explore the challenges and opportunities of applying sequential modeling to tabular data in financial services. We discuss the unique characteristics of this type of data, including its heterogeneous schemas, mixed data types, and temporal nature. We then review current research in the field, highlighting the limitations of existing approaches. We propose a novel data-centric approach to transforming tabular data into tokens that enables simple, interpretable techniques for data mining and leverages the success of transformers in large language models to build generalizable pre-trained models for tabular data. We demonstrate promising results on open-source datasets and conclude by discussing future research directions, including new encoding methods for numerical features, neural point process approaches, and additional tokenization methods. Our work aims to contribute to the development of more effective and generalizable models for understanding behavior in financial services and other industries.
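As a toy illustration of the kind of data-centric tokenization described above (field names and binning thresholds are assumptions made up for this example, not the speakers' actual scheme):

```python
# Toy sketch: map one heterogeneous transaction record to discrete tokens
# that a transformer can consume; categorical, numeric, and temporal fields
# each get their own simple encoding.
from datetime import datetime

def tokenize_row(row: dict, amount_bins=(10, 100, 1000, 10000)) -> list[str]:
    amount_token = f"amount_bin_{sum(row['amount'] > b for b in amount_bins)}"
    return [
        f"mcc_{row['merchant_category']}",   # categorical -> vocabulary token
        amount_token,                        # numeric -> threshold-bin token
        f"hour_{row['timestamp'].hour}",     # temporal -> hour-of-day token
    ]

row = {"merchant_category": "grocery", "amount": 42.5,
       "timestamp": datetime(2024, 12, 10, 9, 30)}
print(tokenize_row(row))  # ['mcc_grocery', 'amount_bin_1', 'hour_9']
```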
Expo Talk Panel: Improving Foundation Models Using Human Data Tue 10 Dec 04:00 p.m.
Foundation models including LLMs and multi-modal models released by OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama), and others have shown impressive capabilities across a range of tasks. Some key drivers of this performance — such as investments in GPUs/compute, model size, and pre-training data — are relatively well understood.
This talk will focus on a less understood, yet extremely powerful, lever that creates significant differentiation and competitive advantage among state-of-the-art models: the use of expert human data for Evaluations (“Evals”), Supervised Fine-Tuning (“SFT”), Reinforcement Learning from Human Feedback (“RLHF”), and Direct Preference Optimization (“DPO”).
The talk will also outline some best practices for maximizing returns on financial investments in human data to achieve optimal model performance. This includes effective strategies for sourcing, vetting, hiring, and managing expert human data teams, as well as task design for Evals, SFT, RLHF, and DPO, along with processes and tooling to optimize team performance, data quality and throughput.
Expo Talk Panel: Colossal-AI: Breakthroughs in Efficient AI Training and Implementation Tue 10 Dec 04:00 p.m.
The Colossal-AI system is designed for fast training and inference of AI models on diverse hardware. We aim to minimize the gap between fast-growing model sizes and limited hardware capacity. For efficient memory management, we support heterogeneous training that facilitates CPU and NVMe offloading. To save activation memory during training, we implement activation checkpointing strategies that recompute inexpensive activations during the backward pass. With these, we can successfully pre-train a 3-billion-parameter (3B) transformer-based model on 4 A100 40GB GPUs and an 8B model on 4 A100 80GB GPUs, which are 5.9x and 10.3x the model sizes otherwise supported without our strategies. To optimize both speed and memory savings, we provide N-dimensional parallelism together with the ZeRO redundancy optimizer and mixed-precision training. The N-dimensional parallelism includes tensor, pipeline, sequence, and data parallelism. These parallelism strategies are carefully designed and can be combined to speed up model training, overcome memory bottlenecks, and increase model performance. When combined, they let us use longer input sequences and achieve up to 7.73x faster single-server training and 1.42x faster single-GPU inference. For recent large language models such as LLaMA, we obtain a 38% training speedup compared to other state-of-the-art deep learning systems. Colossal-AI also excels at large-scale model inference, using dynamic axial parallelism and other techniques; with it, you can predict 3D structure from DNA sequences of 2-3K length with an inference speedup of up to 11.6x. More information about Colossal-AI is available at https://github.com/hpcaitech/ColossalAI.
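As a point of reference for the recomputation idea mentioned above, the snippet below shows generic activation checkpointing in plain PyTorch; it illustrates the concept only and is not Colossal-AI's own API.

```python
# Generic activation checkpointing: activations inside each block are not
# stored during the forward pass; they are recomputed in the backward pass,
# trading extra compute for a much smaller activation memory footprint.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class Net(nn.Module):
    def __init__(self, depth: int = 12, dim: int = 1024):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = checkpoint(blk, x, use_reentrant=False)  # recompute in backward
        return x

net = Net()
loss = net(torch.randn(2, 16, 1024, requires_grad=True)).mean()
loss.backward()
```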
Expo Talk Panel: Building industry-grade NLP applications for the financial domain Tue 10 Dec 04:00 p.m.
Bloomberg's AI Group has developed a range of natural language processing (NLP) applications that transform how our clients interact with financial data and news. From solutions such as IB (Instant Bloomberg) NLP, which enables the extraction of key information from trader dialogue, to News Summarization, which provides concise and accurate summaries of market-moving news, our NLP applications are providing business insights and enabling financial professionals to make more informed business and investment decisions.
However, building industry-grade NLP applications for the financial domain is a complex task. In this talk, we will highlight some of the challenging technical requirements we have encountered while developing these applications, including: efficiently deploying high-precision models while meeting our clients' stringent latency requirements; ensuring the factuality and accuracy of LLM outputs, something that is particularly important in the high-stakes financial domain; and evaluating and maintaining the accuracy of our NLP models over time, as market conditions and financial data evolve.
Expo Talk Panel: AgentInstruct: Agentic flows are effective synthetic-data generators Tue 10 Dec 04:00 p.m.
Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers have also raised concerns about model collapse and the drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually requires significant human effort in curating the data. We focus on using synthetic data for post-training, specifically creating data with powerful models to teach a new skill or behavior to another model; we refer to this setting as Generative Teaching.
AgentInstruct is an agentic solution for synthetic-data generation. By leveraging an agentic framework, AgentInstruct can generate tailored datasets, comprising both prompts and responses, from raw data sources. The efficacy of this approach is exemplified by the substantial improvement observed when fine-tuning a base 7-billion-parameter model on a 25-million-pair dataset generated with AgentInstruct. The fine-tuned model (which we refer to as Orca-3-Mistral) shows a notable performance gain across multiple benchmarks. For example, it shows a 40% improvement on AGIEval, 19% on MMLU, 54% on GSM8K, 38% on BBH, and 45% on AlpacaEval, along with a 31.34% reduction in inaccurate or unreliable results across multiple summarization benchmarks.
Expo Talk Panel: Granite Time Series Foundation Models Tue 10 Dec 04:00 p.m.
While Foundation Models (FMs) have revolutionized AI for language and vision, they often fall short when it comes to handling sensor and numerical time-series data, which are crucial in many industries. At IBM Research, our team is dedicated to advancing time-series foundation models, making significant contributions with influential papers presented at top AI conferences (over 1700 citations) and numerous open-source contributions establishing the Granite Time Series family on Hugging Face. In 2024, we introduced Granite-TimeSeries-Tiny Time Mixer (TTM), the first lightweight foundation model tailored for time-series forecasting. With just 1M parameters, TTM redefines efficiency, speed, and accuracy in zero-shot and few-shot forecasting, outperforming existing SOTA models that demand hundreds of millions to billions of parameters. Since its launch, TTM has amassed over one million downloads on the Hugging Face platform, generating widespread excitement within the time-series community. It delivers up to 40% better performance in zero/few-shot forecasting, all while drastically reducing computational demands. TTM’s lightweight architecture also enables it to run efficiently on CPU machines, driving broader adoption in resource-constrained environments. In this session, we will explore our latest advancements in Granite Time Series models and their applications in forecasting, imputation, anomaly detection, and many other downstream tasks across various industries.
Expo Talk Panel: Graph Reasoning in Large Language Models Tue 10 Dec 04:00 p.m.
Large language models (LLMs) have demonstrated impressive capabilities in text generation, but their ability to reason over complex data remains an area of ongoing research. In this talk, we present three distinct approaches to improve LLM reasoning over complex structures.
First, we leverage graph algorithms to analyze and understand the reasoning capabilities of transformer models. Our results establish a representational hierarchy, revealing the necessary Transformer capacity (number of layers, embedding dimension size) for solving different classes of reasoning tasks.
Next, we exploit the topology of temporal reasoning to generate novel synthetic problem instances. This allows for a more robust evaluation of LLM reasoning capabilities.
Finally, we introduce a method for improving in-context representations of structured data for pretrained LLMs, facilitating more effective reasoning over complex information.
Expo Talk Panel: Training the Unprecedented Decart Diffusion World Simulator at scale on the Crusoe Cluster Tue 10 Dec 04:00 p.m.
Diffusion-based video generation has attracted significant interest across both academia and industry as the next exciting step in demonstrating the capabilities of large-scale deep learning models. Recently, the goal of world simulator models that enable real-time video generation based on user input has begun to take shape, as it may introduce a new paradigm of human interaction with deep-learning models. In this talk, we tackle the training of the largest world simulator model on millions of hours of data at scale across thousands of GPUs. The training of this large-scale diffusion model was possible due to two fundamental pillars developed by Decart and Crusoe, which are crucial to its success and are highlighted in this talk. The first pillar involves the adaptation and optimization of the model training infrastructure to enable fast training of large-scale models. This is integrated with the Crusoe cluster to provide a high-throughput, reliable training operation that is resilient to GPU failures. We also rely on optimized data pipelines that process millions of hours of video data at scale. The second pillar involves new model architectures we propose that are at the forefront of diffusion-based video generation models and enable real-time conditioning and inference of models at scale. Together, these enable the training of massive world simulator models that are at the forefront of advancing the human-model interaction landscape.
Expo Talk Panel: Symbolic AI and Foundation Models Integration towards Reliable and Trustworthy Industry-grade AI Systems Tue 10 Dec 04:00 p.m.
The integration of symbolic AI and foundation models is a promising direction for developing reliable and trustworthy domain-specific industry-grade AI systems. Symbolic AI offers strong reasoning capabilities, interpretability, and the ability to incorporate domain knowledge, whilst foundation models provide powerful learning and generalisation abilities. However, effectively combining these two paradigms presents significant challenges. This panel will bring together research, AI systems, and industry strategy perspectives into discussion and debate about the current state of symbolic AI. Topics covered include foundation model integration, key technical and practical hurdles, and suggested paths forward for deploying such hybrid systems in real-world industrial applications. The panelists will share their perspectives on the complementary strengths of symbolic AI and foundation models, and how they can be seamlessly integrated to create AI solutions that are more robust, transparent, and aligned with human values. They will also explore the business implications of this technological convergence, including the impact on product development, deployment, and customer trust. Through an interactive discussion, the panel aims to provide the audience with a comprehensive understanding of the opportunities and challenges in symbolic AI and foundation models integration, as well as practical insights that can guide the development of the next generation of reliable and trustworthy industrial AI systems.
Opening Remarks Tue 10 Dec 05:00 p.m.
Invited Talk: Alison Gopnik
The Golem vs. Stone Soup: Understanding How Children Learn Can Help Us Understand And Improve AI
A common model of AI suggests that there is a single measure of intelligence, often called AGI, and that AI systems are agents who can possess more or less of this intelligence. Cognitive science, in contrast, suggests that there are multiple forms of intelligence, that these intelligences trade off against each other, and that they have distinctive developmental profiles. The adult ability to accomplish goals and maximize utilities is often seen as the quintessential form of intelligence. However, this ability to exploit is in tension with the ability to explore. Children are particularly adept at exploration, though at the cost of competent action and decision-making. Human intelligence also relies heavily on cultural transmission, passing on information from one generation to the next, and children are also particularly adept at such learning. Thinking about exploration and transmission can change our approach to AI systems. Large language models and similar systems are best understood as cultural technologies, like writing, pictures, and print, that enable information transmission. In contrast, our empirical work suggests that RL systems employing an intrinsic objective of empowerment gain can help capture the exploration we see in children.
Bio: