This session will provide an overview of the seminar and touch on HLS' growth in the markets that will be covered today.
Speaker: Stuart Clubb
The focus of this seminar is to have real-world customers present their successes using Catapult High-Level Synthesis (HLS) in markets such as Automotive, 5G/Communications, Video/Imaging, AI/ML, and MEMS Sensors. The companies who will be presenting are:
HLS enables designers to rapidly go from a high-level description in C++/SystemC to optimized RTL. This introduction will show the basics around how High-Level Synthesis and Catapult HLS can be used to synthesize to optimal RTL for a production design flow.
Speaker: Michael Fingeroff
This presentation explains why we decided to move our methodology to High-Level Synthesis with Catapult. We outline our chosen design and verification flow, together with the power estimation and optimization steps we use. We show examples from a real-life design and share our experiences, then conclude with our planned next steps.
Speaker: NXP - Reinhold Schmidt
Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and - with the slowing of Moore’s law - specialized hardware accelerators to deliver more computing at higher efficiencies. This session describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our approach to using High-Level Synthesis to design our hardware for deeper architecture evaluation and verification, including a new accelerator building block - the video coding unit (VCU) - and discuss key design trade-offs for balanced systems at data center scale and co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks and improved failure management, and new workload capabilities not otherwise possible with prior systems.
Speaker: Google - Aki Kuusela
MatchLib is an open-source library written in SystemC and C++, originally created by NVIDIA Research, that enables much faster design and verification of SoCs using HLS. One of the primary objectives of MatchLib is easier performance-accurate modeling of SoCs, which enables designers to find system-level performance bottlenecks far sooner in their design cycle. This session will introduce MatchLib and show how it enables designers to identify and resolve issues such as bus and memory contention, arbitration strategies, and optimal interconnect structure at a much higher level of abstraction than RTL.
Speaker: Michael Fingeroff
Interconnect design is a critical part of many highly complex SoCs, yet HLS has not historically been used for chip-level interconnect. One major limiter is that interconnect architecture and physical floorplan are tightly coupled and can be difficult to estimate early in the design process.
We demonstrate IPA (Interconnect Prototyping Assistant) to help address this gap. IPA is an open-source framework (available at https://github.com/NVlabs/IPA) for interconnect prototyping and implementation in HLS-based SoC flows, written in SystemC and Python. IPA is used during early architectural prototyping by abstracting specifics of interconnect implementation. IPA then generates interconnect models, including interfaces, for cycle-accurate SystemC simulations. If the design requires long wires between communication units, IPA automatically inserts retiming stages to meet clock frequency targets. IPA’s SystemC code is fully HLS-compatible for RTL creation, and thus can be used within a full-chip HLS flow for pushbutton interconnect generation once a design point is selected.
Speaker: NVIDIA Research - Nathaniel Pinckney
Day 2 introduction.
Speaker: Stuart Clubb
Switching the whole IP design to an HLS methodology brought the design team substantial savings in coding effort and simulation runtime. However, it also posed challenges, such as how to handle such a large design efficiently and achieve the same design quality as handwritten RTL. This presentation will cover how we met these challenges and magnified the benefits of HLS for our designs.
Speaker: NVIDIA - Hai Lin
STMicroelectronics has been using Catapult High-Level Synthesis (HLS) for the last 10 years. It was first used for large digital-on-top ASIC implementations, with the aim of designing complex arithmetic structures quickly and efficiently. In this presentation we show that Catapult also delivers great benefits for AMS applications with growing digital content. STMicroelectronics is a leader in adding intelligence to its sensor and actuator products, but the challenge is now moving to analog-native products, where adding processing to the analog domain paves the way to new applications and allows system optimization from a computation and power perspective. This presentation shows two examples of blocks where Catapult can be a booster for AMS products: a contactless thermometer formula and a general-purpose ASK demodulator. The use of HLS enabled last-minute functional changes without impacting the timeline. The digital block can be integrated on silicon easily, quickly, and effectively.
Speaker: STMicroelectronics - Sandro Dalle Feste
To understand which Catapult HLS flow is better suited to our needs at JPL, we implemented a Harris Corner Detector image processing core in both untimed C++ and timed SystemC. Untimed C++ was easy to get started with. After modifying the algorithmic model so that it could be synthesized with Catapult HLS, we plugged the design back into the algorithmic regression and verified that we had not introduced bugs. However, with this flow we encountered unexpected behaviors in the synthesized RTL; the C++ design had masked these problems because of its untimed nature. Wanting a higher level of control over the synthesized hardware, we implemented the same design in SystemC. The SystemC version synthesized to our expectations, but it required more effort to design and verify. In this presentation I will share the lessons learned from each flow.
Speaker: NASA-JPL - Ashot Hambardzumyan
The 3GPP Radio Layer 1 Protocol Stack, also known as RAN1, encodes and modulates signals and performs MIMO and beamforming, along with other compute-intensive functions such as error correction, rate matching, mapping, and RF processing. L1 functions can be hosted on general-purpose compute, FPGA, or ASIC, with respective trade-offs in power, performance, density, and cost. Implementing L1 functions in a common high-level language lets designers explore these trade-offs and arrive at the best deployment for their network capacity, technical, and market constraints.
Speaker: Viosoft - Hieu Tran
Summary, what we learned, where to get started, and prize giveaway!
Speaker: Stuart Clubb