WORKSHOPS

W1 Energy-Efficient Hardware Accelerators for Edge AI and Data-Intensive Applications

09:30 - 17:30

ROOM 0670

CHAIRS

Manuel Le Gallo (IBM, CH)

Takashi Maeda (Kioxia, JP)

Hongyang Jia (Tsinghua University, CN)

Antoine Frappé (University of Lille, FR)

Marisa Lopez-Vallejo (Polytechnic University of Madrid, ES)

Ruzica Jevtic (Polytechnic University of Madrid, ES)

ABSTRACT

The unique data access patterns and parallel computation demands of AI, along with the need for processing at the edge and scalable solutions in the cloud, drive the development of specialized hardware architectures. This workshop explores a range of solutions, including digital and analog in-memory computing, near-memory computing, and domain-optimized general-purpose units. Experts will discuss circuit-level and architectural innovations, design methodologies, and accuracy tradeoffs for standalone accelerators and near/in-sensor computing. Attendees will gain valuable insights into new architectures and design strategies that prioritize energy efficiency and address the evolving requirements of AI applications.

PROGRAM

Session 1: Analog In-Memory Computing Accelerators

9:30 - 10:00

Heterogeneous Analog In-Memory Computing Accelerators for AI

Irem Boybat (IBM Research Europe, CH)

AI workloads are inherently data-intensive and demand massive parallel computation, placing significant strain on conventional computing architectures. Traditional von Neumann systems struggle to meet these demands due to the increasing overhead of data movement between memory and processing units. Analog In-Memory Computing (AIMC) offers a promising alternative by performing computations directly within memory arrays, significantly improving energy efficiency. This workshop will explore heterogeneous AIMC architectures that combine different types of accelerator nodes to enable efficient deployment of deep learning inference at the edge. A specialized embedded neural processing unit will be presented, supporting diverse operation types and precision levels through this architectural mix. The design incorporates AIMC tiles based on Phase-Change Memory (PCM) to perform energy-efficient matrix-vector multiplications while providing high non-volatile on-chip weight capacity. The workshop will also examine how such heterogeneous architectures can be adapted to specific application domains, with focused case studies from physics and bioinformatics.

10:00 - 10:30

Area and Energy-Efficient Data Converters for Analog In-Memory Computing

Taekwang Jang (ETH Zurich, CH)

As the demand for energy-efficient and high-performance computing continues to grow, analog in-memory computing (AIMC) has emerged as a promising solution to overcome the limitations of digital computing. By performing computations directly within memory arrays, AIMC significantly reduces data movement and overall energy consumption. However, the area and power overhead of data converters accessing the memory crossbar array remains a critical concern, especially in large-scale deployments. This talk presents recent advancements in the design of area- and energy-efficient data converters for AIMC systems. Various ADC architectures, such as successive approximation, charge injection, and flash ADCs, along with related circuit techniques, will be discussed.

10:30 - 11:00

Accuracy Simulation of Analog In-Memory Computing Accelerators

Bipin Rajendran (King's College London, UK)

Analog In-Memory Computing (AIMC) has emerged as a promising solution to address the growing computational demands of deep neural networks (DNNs), offering significant reductions in latency and energy consumption. However, nanoscale memory devices used in AIMC hardware exhibit non-idealities such as programming noise, read noise, and conductance drift, which can degrade model accuracy if software-trained networks are directly deployed on hardware. In this talk, we will discuss strategies for hardware-aware training that help mitigate these effects, enabling AIMC accelerators to achieve accuracy levels comparable to software baselines. Additionally, we introduce the IBM Analog Hardware Acceleration Kit (AIHWKit), an open-source Python library that facilitates the simulation of DNN training and inference on AIMC hardware, providing a valuable tool for researchers to evaluate model accuracy, hardware non-idealities, and deployment strategies for AI applications.

11:00 - 11:30

Coffee break

Session 2: Near-Memory Computing for Data-Intensive Applications

11:30 - 12:00

A Tale of Two: Architecture and System Approaches for Exploring Compute Memories

Giovanni Ansaloni (EPFL, CH)

The demise of scaling laws and the unabated increase in AI workloads is fostering disruptive approaches for next-generation computing architectures. Among those, compute memories (CMs) are especially attractive: they have the potential to reduce every-more costly data movements, while leveraging the massive parallelism offered by the hierarchical organization of memory arrays.

Nonetheless, the development of CM solutions is hampered by the paucity of exploration frameworks. In this talk, I illustrate two approaches which aim to address this challenge, based on open hardware and system simulation frameworks, respectively. The talk details the architecture of two CMs (both resulting in >100X performance increase compared to traditional processor-centric execution) using each of the strategies, highlighting differences in capabilities, target scenarios and implementation philosophies.

12:00 - 12:30

Scalable Compute-in/near-Memory Systems with 2.5D/3D/3.5D Integration

Chixiao Chen (Fudan University, CN)

The rapid expansion of large AI models, such as ChatGPT, has placed significant chanllenges on traditional computing architectures. To overcome the memory wall memory-centric architectures such as Computing in/near Memory (CIM/PNM) have emerged, integrating computation and memory to reduce both latency and energy consumption. This talk explores the scalability of CIM/PNM systems through 2.5D/3D/3.5D heterogeneous integration enabled by advanced packaging techniques. In 2.5D integration, a layer-wise pipeline parallelism mapping is discussed, where inter-chiplet communication is minimized to improve efficiency. In 3D integration, the stacking interface promises enhanced bandwidth, reduced interconnect delays, and scalable performance for AI workloads. An active interposer-based 3D CIM system is developed to enable flexible 3D communication. The talk will also address the architectural and circuit-level challenges associated with designing active interposer-based systems. In 3.5D integration, an architecture-algorithm co-design approach is developed for decoder-only large language model (LLM) systems. The recent 3.5D PNM technology effectively addresses the extremely large memory access and tens of gigabytes of weight data required by these systems. These 2.5D/3D/3.5D approaches offer a viable path for sustaining advancements in scaling laws in the post-transistor-scaling era, with significant implications for AI infrastructure, edge computing, and high-performance systems.

12:30 - 13:00

Near/In-sensor Computation: from Processing to Generating

Yongpan Liu (Tsinghua University, CN)

The rapid development of emerging applications, such as embodied intelligence, has driven unprecedented demand for data sensing. Compared with traditional architectures that separate sensing and computing, near/in sensor computing, demonstrates significant advantages in reducing energy consumption, minimizing data transfer, and enhancing sensor performance. Recent advancements in near/in sensor computing reveal a transformative shift from data processing to data generation. Prior researches have covered topics from processing architectures for single-modal data (e.g., visual or text signals) and those leveraging multi-modal sensor fusion for complementary cross-modal perception. However, these efforts remain constrained by the physical limitations of sensors, such as limited sensing dimensions or information loss caused by object occlusion. The latest breakthroughs overcome these constraints by integrating sensing with AI-driven data generation (e.g., via NeRF or 3D Gaussian Splatting). We have been focused on designing energy-efficient architectures for data processing and generating. This presentation will introduce our designed chips for multimodal fusion and generative perception. We will also show our avalanche-photodiode (APD) based pixel-level in-sensor computing system. Finally, we will discuss the implications of near/in sensor computation for future intelligent systems.

13:00 - 14:00

Lunch

Session 3: Dedicated Accelerators for Edge AI

14:00 - 14:30

In-Memory Computing-Based Embedded Neural Processing Units for AI

Thomas Boesch (STMicroelectronics, CH)

Today’s SoC devices for consumer, industrial, and automotive applications often require integrated AI processing capabilities whereas Neural Processing Units (NPUs) based on classical digital architectures provide only limited compute density and energy efficiency of a few TOPs per Watt. With the advent of In-Memory Computing (IMC), those limitations can be lifted further and a boost in terms of energy and cost efficiency can be achieved. IMC functionality can be supported by on-chip SRAM tiles as well as Non-Volatile Memories (NVM) such as embedded Phase Change Memories (PCM) using digital or analog computing methods. While analog computing potentially provides better power efficiency it may generates implications for computation accuracy due to process, temperature, age, and voltage variations. Furthermore, techniques to improve storage density like multi-level memory cells can provide opportunities to further improve the key metrics of TOPs/Watt or TOPS/mm². In general, the characteristics of the different IMC computing technologies are of great importance for the selection of the best edge AI device architecture and this workshop session will provide a deep dive into multiple aspects of using IMC in the context of AI edge computing.

14:30 - 15:00

Leveraging Domain-Specialized RISC-V Multi-core Processors for Heterogeneous AI Acceleration at the Edge

Angelo Garofalo (University of Bologna, IT)

Deploying modern AI workloads – including Transformers – to resource-constrained edge devices remains a significant challenge. To meet performance requirements within tight energy budgets, edge systems must rely on fast, low-bitwidth arithmetic. This talk explores how specializing RISC-V processors with domain-specific ISA extensions provides a low-cost solution for executing AI workloads on tiny edge devices. It also analyzes how specialized RISC-V multi-core architectures can complement the computing capabilities of analog and digital AI accelerators in tightly-coupled, scaled-up heterogeneous fabrics. As accelerators boost performance on linear kernels, specialized programmable cores play a key role in mitigating performance bottlenecks that arise per Amdahl’s effect in non-linear or fast-evolving functions – tasks less suitable to custom hardware acceleration. The talk addressed these challenges and provides design guidelines for next-generation edge AI platforms that balance flexibility, energy efficiency, and scalability.

15:00 - 15:30

A NeuroAI framework for Edge Intelligence

Charlotte Frenkel (TU Delft, NL)

Can key organizing principles of the brain be identified and exploited toward breakthroughs for intelligent devices at the edge? While the brain is still the gold standard in aspects ranging from adaptability to energy and data efficiency, exploiting brain insight is missing a clear framework. Should we start from the brain's computational primitives and figure out how to apply them to real-world problems (bottom-up approach), or should we build on working AI solutions and fine-tune them to increase their biological plausibility (top-down approach)? In this talk, we will (i) adopt a NeuroAI framework revealing how to reconcile both approaches, and (ii) show concrete outcomes, from event-based edge vision systems that can react within microseconds to tiny wearables that can autonomously learn how to classify gestures and keywords in real time within microwatts.

15:30 - 16:00

Coffee break

Session 4: Energy-Efficient Digital Accelerators

16:00 - 16:30

Efficient Design Flows for Decoupled Dataflow Accelerators

Marian Verhelst (KU Leuven, BE)

This talk presents efficient design flows for decoupled dataflow accelerators, enabling orthogonal optimization of data flow, scheduling, and memory layout in edge AI processors. By decoupling these components, each can be independently optimized to meet the demands of AI workloads.

We introduce automated generation of Multiply-Accumulate (MAC) arrays for high-performance computation and streamers for efficient data fetching from memory, reducing latency and maximizing bandwidth. Additionally, we explore automated data layout transformations that optimize memory organization for varying workloads. These hardware components are integrated with a customized compiler, enabling seamless translation of AI models into optimized hardware configurations. Our approach aims to achieve scalable, energy-efficient accelerators for edge AI applications.

16:30 - 17:00

Enhancing the Efficiency of Heterogeneous Edge‐AI Systems: from HW/SW Accelerator Co‐Design to Workload Off‐Loading

Marina Zapater (HEIG‐VD University, CH)

Heterogeneous systems are a key enabler for moving Artificial Intelligence (AI) from the datacenter to the edge. Novel accelerators, such as in-cache and in-memory accelerators, and standard interconnects (such as PCIe, CXL and UCIe) are key for enhancing efficiency for both edge (energy-optimized) and HPC (performance-optimized) systems. A seamless path from software (SW) to hardware (HW) needs to be enabled to assess the overall SW-HW stack in terms of performance, power and thermal constraints. Architectural exploration using full-system simulations, such as gem5, turns out to be a key enabler to understand the power/performance trade-offs of Edge-AI systems and how to better exploit its capabilities. At the same time, the most resource-hungry workloads (such as LLMs) cannot be solely executed at the edge, and therefore required to be off-loaded either to edge servers or to data centers. In this talk I will be describe the challenges encountered to simulate full-stack state-of-the-art heterogeneous with novel accelerators and standard interconnects. I will highlight how a full-stack simulation framework enables to provide design guidelines and assess efficiency at all levels, enabling to exploit the benefits of hardware-software co-design, and describe machine-learning based techniques enabling workload off-loading from the edge to the cloud.

17:00 - 17:30

Q&A

BIOSKETCHES

Giovanni Ansaloni

Giovanni Ansaloni is a researcher and lecturer at the Embedded Systems Laboratory of EPFL (Lausanne, Switzerland). He previously worked as a Post-Doc at the University of Lugano (USI, Switzerland) between 2015 and 2020, and at EPFL between 2011 and 2015. He received a MS degree in electronic engineering from University of Ferrara (Italy) in 2003, an executive master in embedded systems design from the ALaRI institute (Switzerland) in 2005 and a PhD degree from USI in 2011.

His research focuses on domain-specific and ultra-low-power architectures and algorithms for edge computing systems, including hardware and software optimization techniques. On these topics, Dr. Ansaloni is the co-author of more than 80 papers in peer-reviewed conferences and journals.

Thomas Boesch

Thomas Boesch graduated in electrical engineering from the ETH in Zurich, Switzerland, in 2000 and received his PhD degree in microelectronics from the ETH in 2004 for his work on data-flow oriented accelerator architectures for multimedia applications. 2004 he joined STMicroelectronics doing research on VLIW architectures with a special focus on embedded multithreaded and multicore microprocessor systems. Now he works as a technical director and regional fellow at STMicroelectronics Geneva and leads a team of hardware design engineers and architects to design accelerator architectures for embedded AI applications. He coauthored various scientific publications on computing architectures and contributed to more than 40 patent filings in the field of data processing architectures and artificial intelligence applications.

Irem Boybat

Irem Boybat is a Staff Research Scientist at IBM Research Europe, Zurich, Switzerland. She received her Ph.D. degree in Electrical Engineering from Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, in 2020. Previously, she had obtained an M.Sc. degree in Electrical Engineering from EPFL, Switzerland, in 2015, and a B.Sc. degree in Electronics Engineering from Sabanci University, Turkey, in 2013. Her research is primarily centered around analog in-memory computing for accelerating deep neural networks using non-volatile memory devices. She has co-authored over 50 scientific papers in journals and conferences, received four best conference presentation/paper/poster awards and holds 8 granted patents. She was a co-recipient of the 2018 IBM Pat Goldberg Memorial Best Paper Award and 2020 EPFL PhD Thesis Distinction in Electrical Engineering.

Chixiao Chen

Chixiao Chen received the B.S. and Ph.D. degrees in microelectronics from Fudan University, Shanghai, China, in 2010 and 2015, respectively. In 2015, he worked at Calterah Inc. as an Analog/Mixed Signal Circuit Design Engineer. From 2016 to 2018, he was a Post-Doctoral Research Associate with the University of Washington, Seattle. Since 2019, he joined Fudan University as an Assistant Professor. Currently, he is an associate professor with the Frontier Institute of Chips & Systems, and the director of chiplet innovation center of the State Key Laboratory of Integrated Chips, Fudan University. He has served as a TPC member of A-SSCC since 2022 and outstanding reviewer of IEEE SSCL society. He was the co-awardee of ASSCC 2024 Distinguished Design Award. His research interests include mixed-signal integrated circuit design, intelligent computing circuits & systems, and 2.5D/3D/3.5D integration technology.

Charlotte Frenkel

Charlotte Frenkel is an Assistant Professor at Delft University of Technology, The Netherlands, since July 2022, and a Visiting Faculty Researcher at Google since October 2024. She received her Ph.D. from Université catholique de Louvain in 2020 and was a post-doctoral researcher at the Institute of Neuroinformatics, UZH and ETH Zurich, Switzerland. Her research aims at bridging bio-inspired and engineering-driven design approaches toward neuro-inspired AI systems, with a focus on digital neuromorphic processor design, embedded machine learning, and on-device learning. Dr. Frenkel serves or has served in the technical program committee of various conferences such as DATE, ESSERC, and NICE, co-leads the NeuroBench initiative for benchmarks in neuromorphic computing, and is an associate editor for the IEEE Transactions on Biomedical Circuits and Systems.

Angelo Garofalo

Angelo Garofalo received his B.Sc. and M.Sc. degrees in Electronic Engineering from the University of Bologna, Italy, in 2016 and 2018, respectively, and a Ph.D. degree from the same institution in 2022. He is currently an Assistant Professor at the University of Bologna and a Postdoctoral Researcher at ETH Zurich, Switzerland. His research focuses on heterogeneous computing architectures for edge AI, energy-efficient multi-core processors, and reliable, time-predictable mixed criticality Systems-on-Chip. He has co-authored more than 40 peer-reviewed conference and journal papers. He is the recipient of the Best Paper Award at IEEE ISVLSI 2023 and the Outstanding Forum Speaker Award at IEEE ISSCC 2023.

Taekwang Jang

Taekwang Jang received his B.S. and M.S. in electrical engineering from KAIST, Korea, in 2006 and 2008, respectively. From 2008 to 2013, he worked at Samsung Electronics Company Ltd. In 2017, he received his Ph.D. from the University of Michigan and worked as a post-doctoral research fellow at the same institution. Currently, he is an associate professor at ETH Zürich and is leading the Energy-Efficient Circuits and Intelligent Systems group. He focuses on circuits and systems for highly energy-constrained applications such as wireless sensors and biomedical interfaces. He holds 15 patents and has (co)authored more than 80 peer-reviewed conferences and journal articles. He is the recipient of the 2024 IEEE Solid-State Circuits Society New Frontier Award, the IEEE ISSCC 2021 and 2022 Jan Van Vessem Award for Outstanding European Paper, the IEEE ISSCC 2022 Outstanding Forum Speaker Award, and the 2009 IEEE CAS Guillemin-Cauer Best Paper Award. Since 2022, he has been a TPC member of the IEEE International Solid-State Circuits Conference, and IEEE Asian Solid-State Circuits Conference. Since 2023, he has been serving as an Associate Editor for the Journal of Solid-State Circuits and was appointed as a Distinguished Lecturer for the Solid-State Circuits Society in 2024.

Yongpan Liu

Yongpan Liu received the B.S., M.S., and Ph.D. degrees from Tsinghua University, Beijing, China, in 1999, 2002, and 2007, respectively. He is currently a Full Professor with the Department of Electronic Engineering, Tsinghua University, China. Prof. Liu served as a technical program committee member for ISSCC, A-SSCC and CICC. He has received Under-40 Young Innovators Award DAC 2017, Best Paper/Poster Award from ASPDAC 2021 and 2017, Micro Top Pick 2016, HPCA 2015, and Design Contest Awards of ISLPED in 2012, 2013 and 2019. He served as General Secretary for ASPDAC 2021 and Technical Program Chair for ICAC2023. He was an Associate Editor of the IEEE Transactions on Computer-Aided Design of Integrated Circuits, the IEEE Transactions on Circuits and Systems II, and the IET Cyber-Physical Systems. He also served as A-SSCC2020/AICAS2022 tutorial speaker and IEEE CASS Distinguished Lecturer 2021.

Bipin Rajendran

Bipin Rajendran is currently Professor of intelligent computing systems and an EPSRC Fellow at King’s College London (KCL). He received the B.Tech. degree from IIT Kharagpur in 2000 and the M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 2003 and 2006, respectively. From 2006 to 2012, he was a Master Inventor and a Research Staff Member with the IBM Thomas J. Watson Research Center, New York, and has since held faculty positions in India and USA. At King’s, he co-directs the Centre for Intelligent Information Processing Systems at the Department of Engineering and the King’s Institute for Human and Synthetic Minds. He has co-authored more than 100 peer-reviewed articles, one monograph, an edited book, and holds 59 U.S. patents. His research interests include brain-inspired computing, neuromorphic systems, and hardware accelerators.

Marian Verhelst

Marian Verhelst is a professor at the MICAS lab of KU Leuven and a research director at imec. Her research focuses on embedded machine learning, hardware accelerators, and low-power edge processing. She received a PhD from KU Leuven in 2008, and worked as a research scientist at Intel Labs from 2008 till 2010. Marian is the ML track chair of ESSERC, the program vice-chair of ISSCC, and a scientific advisor to multiple startups. She is a science communication enthusiast as an IEEE SSCS Distinguished Lecturer, as a regular member of the Nerdland science podcast (in Dutch), and as the founding mother of KU Leuven’s InnovationLab high school program. Marian received the laureate prize of the Royal Academy of Belgium in 2016, the 2021 Intel Outstanding Researcher Award, and the André Mischke YAE Prize for Science and Policy in 2021.

Marina Zapater

Marina Zapater is Associate Professor in the REDS Institute at the School of Engineering and Management of Vaud (HEIG-VD) of the University of Applied Sciences Western Switzerland (HES-SO) since 2020. She is also a guest professor at the Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland, for the co-supervision of PhD thesis. She was post-doctoral Research Associate in the Embedded System Laboratory (ESL) at EPFL, from 2016 to 2020. She received her Ph.D. degree in Electronic Engineering from Universidad Politécnica de Madrid, Spain, in 2015. Her research interests include thermal, power and performance design and optimization of complex heterogeneous architectures, from embedded Edge-AI systems to High-Performance Computing processors, servers and data centers. In these fields, she has co-authored more than 75 papers in top-notch conferences and journals and acted as PI and co-PI in projects with both academia and industry.

BACK