Interfacing DDR with Programmable Logic on the AMD Versal NoC

Moving large amounts of data efficiently between the processing system, AI Engines, and programmable logic is a core requirement of modern FPGA designs and a persistent challenge for the engineers who build them. The AMD Versal architecture addresses this with a dedicated Network on Chip (NoC) that enables high-speed, deterministic communication, including direct access to DDR memory controllers. This architecture improves performance while reducing resource overhead, allowing engineers to focus on application logic.

Understanding the Versal NoC and DDR Integration

The NoC is a hard silicon fabric that interconnects the scalar engines, adaptable engines, and intelligent engines of the Versal architecture. By design, it allows data to move rapidly across the chip without consuming programmable fabric resources, making DDR memory accessible to any engine or peripheral.

Key features include:

  • High-speed packetized data transfer: Transactions are broken into “flits” (flow control units) and routed through horizontal (H-NoC) and vertical (V-NoC) spines, supporting AXI memory-mapped and AXI4-Stream protocols.
  • Built-in DDR Memory Controllers (DDRMCs): Devices provide one to four integrated DDRMCs, enabling multiple engines or peripherals to share DDR memory efficiently.
  • Hardened timing support: The dedicated silicon fabric is designed to simplify timing closure. AMD documentation shows NoC clocks around 1 GHz in performance examples, but achievable frequency depends on device and configuration.
  • Master and slave units: NoC master units (NMUs) accept transactions from AXI masters at the NoC ingress and packetize them, while NoC slave units (NSUs) depacketize incoming data back to AXI interfaces at the egress.
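
To make the packetization idea concrete, here is a minimal Python sketch that splits a transaction payload into fixed-size flits and reassembles it, loosely mirroring what the master and slave units do at each end of the NoC. The 16-byte flit size, the Flit fields, and the function names are illustrative assumptions, not the actual NoC packet format.

```python
from dataclasses import dataclass
from typing import List

FLIT_BYTES = 16  # assumed flit payload size, for illustration only


@dataclass
class Flit:
    dest_id: int    # hypothetical routing destination
    seq: int        # position of this flit within the packet
    last: bool      # True on the final flit of the packet
    payload: bytes


def packetize(dest_id: int, data: bytes) -> List[Flit]:
    """Split one transaction payload into flits (ingress side)."""
    chunks = [data[i:i + FLIT_BYTES] for i in range(0, len(data), FLIT_BYTES)] or [b""]
    return [
        Flit(dest_id, seq, seq == len(chunks) - 1, chunk)
        for seq, chunk in enumerate(chunks)
    ]


def depacketize(flits: List[Flit]) -> bytes:
    """Reassemble the payload from flits (egress side)."""
    ordered = sorted(flits, key=lambda f: f.seq)
    assert ordered[-1].last, "packet is missing its final flit"
    return b"".join(f.payload for f in ordered)
```

A 40-byte payload becomes three flits under these assumptions (16, 16, and 8 bytes), and depacketize restores the original bytes regardless of the order in which the flits are handed to it.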

Tip: The NoC handles all clock domain crossings internally. Designers only need to specify the clock domains for each master and slave, simplifying multi-clock designs.

Resource: https://docs.amd.com/r/en-US/pg313-network-on-chip/Versal-Programmable-NoC-Overview


What Is DDR in FPGAs and SoCs?

Before going further, it helps to define DDR. In FPGAs, DDR refers to Double Data Rate memory, most commonly DDR SDRAM (Synchronous Dynamic Random-Access Memory).

Key points:

  • Double Data Rate: Data is transferred on both the rising and falling edges of the clock signal, effectively doubling the throughput without increasing clock frequency.

  • Types: DDR (older), DDR2, DDR3, DDR4, and DDR5, as well as LPDDR variants for low-power applications.

  • Use in FPGAs:

    • Provides large, external memory storage for buffering, data logging, high-speed I/O, video/image processing, and AI/ML workloads.

    • Accessed through FPGA memory interfaces (often hardened controllers or soft IP).

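    • AMD devices harden parts of the memory interface to simplify design: Versal devices integrate hard DDR memory controllers, Zynq devices include a hard DDR controller in the processing system, and UltraScale and UltraScale+ programmable logic typically pairs hardened PHY resources with soft memory controller IP.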

    • Can be connected via NoC (Network on Chip) or AXI interfaces for efficient access between processing system (PS) and programmable logic (PL).
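
The "double data rate" point above translates directly into a peak-bandwidth calculation: two transfers per clock cycle, multiplied by the width of the data bus. The short Python sketch below works through that arithmetic. The function name is made up for illustration, and the result is a theoretical peak only; sustained bandwidth is lower because of refresh, bank conflicts, and read/write turnaround.

```python
def ddr_peak_bandwidth_gbps(io_clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    transfers_per_second = io_clock_mhz * 1e6 * 2  # two transfers per clock cycle
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# DDR4-3200: 1600 MHz I/O clock (3200 MT/s) on a 64-bit data bus
print(ddr_peak_bandwidth_gbps(1600, 64))  # 25.6
```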

What Are the Performance and Resource Benefits of DDR / NoC Integration?

Integrating DDR through the NoC offers substantial advantages compared to traditional AXI interconnect approaches. By moving interconnect logic into dedicated hardware, engineers can save resources while maintaining high data throughput.

Highlights include:

  • Traditional AXI interconnect designs can consume up to 50% of LUTs and flip-flops just for routing between peripherals.
  • With the NoC, these interconnect resources are handled by dedicated hardware, freeing resources for application design.
  • In AMD’s performance examples the NoC is clocked at around 1 GHz (achievable frequency depends on device and configuration), providing high-bandwidth access to DDR while the hardened fabric eases timing closure.

For Managers: Reduced interconnect overhead translates to faster time to market, fewer design iterations, and more capacity for critical application features.

Resource: https://docs.amd.com/r/en-US/pg313-network-on-chip/NoC-Functions

Interfacing DDR through Vivado IP Integrator

Designers can quickly connect peripherals to DDR using the Vivado IP Integrator. The NoC abstracts much of the complexity, allowing engineers to focus on application-specific logic.

Steps for integration include:

  1. Instantiate the NoC IP: Use IP search to select AXI NoC and configure it for memory-mapped or streaming traffic.
  2. Enable DDR interfaces: Connect DDRMCs to the NoC and assign relevant clock domains.
  3. Configure peripherals: Connect AXI traffic generators, AI Engines, or Block RAM controllers.
  4. Validate the design: Run design validation so that Vivado checks address mapping and clock domain connections.
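
One of the things address-map validation has to catch is two memory segments claiming the same addresses. The Python sketch below shows the underlying overlap check on a list of address segments. It is a conceptual illustration only, not how Vivado implements validation; the Segment type, the segment names, and the addresses are all made up.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    name: str
    base: int
    size: int

    @property
    def end(self) -> int:
        return self.base + self.size  # exclusive end address


def find_overlaps(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Return pairs of segment names whose address ranges overlap."""
    ordered = sorted(segments, key=lambda s: s.base)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.base >= a.end:
                break  # sorted by base, so no later segment can overlap a
            overlaps.append((a.name, b.name))
    return overlaps


segments = [
    Segment("ddr_low", 0x0000_0000, 0x8000_0000),    # hypothetical 2 GiB DDR window
    Segment("bram_ctrl", 0x8000_0000, 0x0001_0000),  # hypothetical 64 KiB BRAM
    Segment("scratch", 0x7FFF_0000, 0x0002_0000),    # deliberately collides with both
]
print(find_overlaps(segments))
```

In this made-up map, the scratch segment straddles the boundary between the DDR window and the BRAM controller, so both collisions are reported.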

Tip: Using block automation, you can connect DDR and PL peripherals in minutes instead of manually wiring multiple AXI interfaces.

Creating NoC Designs from Scratch

For the most demanding applications, designers may need to configure the NoC manually. This allows precise control over clock domains, master/slave assignments, and memory access, ensuring optimal performance for complex systems.

Steps include:

  • Add AXI Block RAM controllers connected to embedded memory generators.
  • Instantiate multiple NoC IPs; the compiler consolidates them into the device’s physical NoC.
  • Assign clock domains for each master and slave interface (PS, PL, AI Engines).
  • Configure cache-coherent ports: Use the CCI for coherency. The A72 cores connect via ACE ports, while other masters, including NoC ingress, use ACE-Lite for I/O coherency.
  • Use connection automation to map clocks, DDR reference signals, and traffic generators.

Resource: https://docs.amd.com/r/en-US/am011-versal-acap-trm/NoC-Interconnect

AI Engines and DDR Access

AI Engines (AIEs) in the Versal architecture can leverage DDR memory via the NoC for high-speed data operations. Integrating AI Engines with DDR ensures low-latency memory access for AI and DSP workloads.

Key points:

  • Connect AIE GMIO ports to the NoC to reach DDR memory via DDRMCs.
  • Supports high-performance AI workloads with low-latency memory access.
  • Works alongside PL and PS peripherals sharing the same DDR controllers.

Tip: Ensure correct cache-coherent connections when software running on the PS accesses shared DDR.

Resource: https://docs.amd.com/r/en-US/am011-versal-acap-trm/NoC-DDRMC-PL-Subsystems

What Are System-Level Considerations for DDR / NoC Integrations?

Designing a system with DDR and the NoC requires planning for clock domains, cache coherence, memory addressing, and traffic priorities. These considerations help prevent data bottlenecks and ensure predictable performance.

Considerations include:

  1. Clock Domains: Assign the correct clock to each master and slave. The NoC handles crossings internally.
  2. Cache Coherence: Use the CCI for coherency. A72 cores use ACE; NoC ingress and other masters use ACE-Lite.
  3. Quality of Service (QoS): Configure NoC QoS settings, along with routing groups and virtual channels (VCs), to prioritize traffic and avoid deadlocks.
  4. Memory Addressing: Use the Vivado Address Editor and the SMMU to manage DDR access, and rely on tool-based mapping to avoid address conflicts.
  5. Bandwidth Management: Keep PL-only working data in on-chip BRAM or UltraRAM where it fits, which relieves pressure on shared DDR.
  6. Optional DDR for PL Only: When the PL needs high-bandwidth external memory, dedicate a separate DDRMC to it rather than sharing one with the PS.
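
The bandwidth and dedicated-DDR considerations above come down to a budgeting exercise: add up what each master needs from each controller and compare it with what that controller can realistically sustain. The Python sketch below shows the idea. Every name and number in it is hypothetical; real usable bandwidth depends on the memory device, access pattern, and QoS configuration.

```python
from typing import Dict, List, Tuple


def check_budget(
    requests: List[Tuple[str, str, float]],  # (master, controller, GB/s requested)
    usable_gbps: Dict[str, float],           # controller -> usable GB/s
) -> Dict[str, float]:
    """Return remaining headroom per controller; negative means oversubscribed."""
    headroom = dict(usable_gbps)
    for _master, controller, gbps in requests:
        headroom[controller] -= gbps
    return headroom


usable = {"ddrmc0": 18.0, "ddrmc1": 18.0}  # hypothetical sustained figures

# Everything shares one controller: oversubscribed by 2 GB/s.
shared = [("ps", "ddrmc0", 4.0), ("aie", "ddrmc0", 10.0), ("pl_dma", "ddrmc0", 6.0)]
print(check_budget(shared, usable))

# Moving the PL DMA to its own controller restores headroom on both.
split = [("ps", "ddrmc0", 4.0), ("aie", "ddrmc0", 10.0), ("pl_dma", "ddrmc1", 6.0)]
print(check_budget(split, usable))
```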

Key Takeaway: Proper planning ensures DDR access is optimized for performance, reliability, and scalability.

Resource: https://docs.amd.com/r/en-US/am011-versal-acap-trm/NoC-DDRMC-PL-Subsystems

Multi-DDR and Scalability with Versal Devices

The Versal architecture supports multiple DDR controllers, enabling scalable high-performance systems. Careful planning ensures all masters get sufficient bandwidth without overloading memory resources.

Key points:

  • Many Versal devices provide one to four DDR memory controllers. Check the device datasheet for the exact count.
  • Multiple NoC masters can access the same DDR, but bandwidth limitations may require arbitration to prevent peripheral starvation.
  • Designers can separate DDR usage between PS and PL for high-bandwidth applications, such as AI processing or high-speed data acquisition.
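
To illustrate why arbitration matters when several masters share one DDR controller, the Python sketch below implements a generic weighted round-robin arbiter. It is a textbook scheme chosen for illustration, not the arbitration the NoC or DDRMC actually uses, and the master names and weights are hypothetical. The point is that every master with pending requests is served in each round, so a heavy master cannot starve a light one.

```python
from collections import deque
from typing import Deque, Dict, List


def weighted_round_robin(
    queues: Dict[str, Deque[str]], weights: Dict[str, int]
) -> List[str]:
    """Drain per-master request queues, granting up to weights[m] requests per round."""
    if any(w < 1 for w in weights.values()):
        raise ValueError("weights must be positive so every master makes progress")
    grants = []
    while any(queues.values()):  # an empty deque is falsy
        for master, queue in queues.items():
            for _ in range(weights[master]):
                if not queue:
                    break
                grants.append(queue.popleft())
    return grants


queues = {
    "aie": deque(["a0", "a1", "a2", "a3"]),
    "pl_dma": deque(["d0", "d1"]),
    "ps": deque(["p0"]),
}
weights = {"aie": 2, "pl_dma": 1, "ps": 1}
order = weighted_round_robin(queues, weights)
print(order)
```

With these weights the AI Engine traffic gets twice the share of the other masters, yet the PS request is granted in the first round instead of waiting behind the entire AIE queue.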

Resource: https://docs.amd.com/r/en-US/ug1273-versal-acap-design/NoC

Summary

The Versal NoC simplifies DDR integration, reduces interconnect resource overhead, and accelerates design cycles. Engineers gain a high-performance, deterministic fabric, while managers can see clear value in faster deployment, scalable architecture, and reliable memory access across PL, PS, and AI Engines.

Whether using automated block designs or manual instantiation, the NoC enables high-speed DDR communication without adding system complexity, making it a cornerstone of modern Versal-based designs.
