
On-chip BUS [Part 1]
When designing a System-on-Chip (SoC), connecting components such as the processor, memory, and peripherals (e.g., UART, I2C, SPI) is a critical challenge. In this article, I will introduce how these components communicate with each other through an On-chip BUS, as well as explain why the bus architecture is significant and designed the way it is. We’ll start with basic concepts, move on to memory organization, and explore how the processor interacts with multiple peripherals using the Memory-Mapped I/O mechanism.
Overview
1. General Structure of an SoC
A System-on-Chip typically consists of three main components:
- Processor: The central element of the system, responsible for controlling and processing data. The processor can communicate with both input components and output components.
- Memory: Used to store programs and temporary data. This can include RAM, ROM, or cache memory integrated within the processor.
- I/O Components (Peripherals): These are components that interface with the external world, such as UART, I2C, SPI, GPIO, etc., enabling the SoC to exchange data with external devices.
The figure below illustrates the Von Neumann architecture, where these three components are interconnected via a communication BUS.

Von Neumann Architecture
2. Role of the Bus in the System
The BUS serves as the component that connects all elements to the processor and facilitates the transfer of data packets between them. Modern buses often adhere to standard protocols such as AXI, AHB, APB (part of the AMBA standard), Wishbone, Avalon, and others.
In this discussion of on-chip buses, I’ll cover the core ideas behind most standard on-chip bus designs today and explain why they are structured as they are.
Foundation
1. Interface
If you’ve studied computer architecture or examined the design of a general-purpose processor, you’ll notice that its interface isn’t directly connected to input or output components (as shown in the figure below).

Simplified Illustration of a MIPS Processor Architecture
So, in your opinion, when an SoC integrates input and output components, does the processor need to incorporate multiple different interfaces to connect to these components?
(Take a moment to consider the best answer before reading on.)
a. Approach 1:
If your answer is “Yes,” you’ll encounter two key challenges:
- Problem 1: Each input or output component has unique characteristics, meaning that when designing peripherals for integration into the system, the interfaces connecting to the processor may differ.
For example, let’s compare two peripheral protocols: a UART (TX) peripheral and an I2C (Master) peripheral.
- A UART frame requires only one data field. Thus, when designing a UART (TX) peripheral, its interface only needs one field for the processor to send TX data to the peripheral.
- In contrast, an I2C frame requires 2-3 data fields: a Slave Address frame and Data Frame 1/2 (as shown below). Therefore, the interface for an I2C peripheral needs three fields for the processor to send the Address frame and Data frames to the peripheral.
→ This increases the complexity of the processor’s interface because each I/O component has distinct characteristics. And that’s before considering the second problem below.
- Problem 2: If an SoC includes 10-15 input and output components, would we need to add 10-15 corresponding interfaces to the processor?
⇒ If the answer is “Yes,” then integrating a new peripheral would require redesigning the entire processor, significantly increasing complexity.
Therefore, this approach is not viable for SoCs with many peripherals. As a result, we will now consider the second approach described below.
b. Approach 2:
If your answer is “No,” then where does the processor’s interface to the input and output components reside?
Here’s the answer:

The processor treats input and output components as memory regions, where it can exchange data through read and write operations. Naturally, each memory region has a specific address, meaning each input and output component is assigned its own distinct address range.
This mechanism is known as Memory-Mapped I/O.
As a result, the interface between the processor and the system’s components uses the same type as the interface with data memory, consisting of the following signals:
- Address
- WriteData
- ReadData
- MemRead
- MemWrite
These five signals can be grouped into three main categories:
- Address Lines - Includes the Address bus → Used to map to the desired address region for execution.
- Data Lines - Includes the WriteData bus and ReadData bus → Used for data transfer.
- Control Lines - Includes MemRead and MemWrite → Used to specify operations.

→ Thus, as long as the address is mapped to a specific component, that component is responsible for:
- Receiving data from WriteData when a MemWrite signal is present.
- Sending ReadData back to the processor when a MemRead signal is present.
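To make that behavior concrete, here is a minimal C sketch of the decode-and-respond idea: a single Address/WriteData/ReadData/MemWrite/MemRead access is routed to whichever component owns the address. This is only a behavioral model written for illustration; the base addresses and region size are assumptions, not taken from any real SoC.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical base addresses and region size, chosen only for illustration */
#define UART_BASE   0x20000000u
#define I2C_BASE    0x30000000u
#define REGION_SIZE 0x1000u

/* Each component owns a small block of registers behind its base address */
static uint32_t uart_regs[REGION_SIZE / 4];
static uint32_t i2c_regs[REGION_SIZE / 4];

/* One bus access as seen by the components: the Address selects exactly one
 * component, which latches WriteData on MemWrite or drives ReadData on MemRead. */
static uint32_t bus_access(uint32_t address, uint32_t write_data,
                           bool mem_write, bool mem_read)
{
    uint32_t *regs;
    uint32_t index;

    if (address >= UART_BASE && address < UART_BASE + REGION_SIZE) {
        regs  = uart_regs;
        index = (address - UART_BASE) / 4;
    } else if (address >= I2C_BASE && address < I2C_BASE + REGION_SIZE) {
        regs  = i2c_regs;
        index = (address - I2C_BASE) / 4;
    } else {
        return 0;   /* unmapped address: no component responds */
    }

    if (mem_write)
        regs[index] = write_data;   /* the selected component receives WriteData */
    if (mem_read)
        return regs[index];         /* the selected component returns ReadData   */
    return 0;
}

int main(void)
{
    bus_access(UART_BASE, 0x41, true, false);                  /* write to the UART region */
    printf("read back: 0x%02X\n",
           (unsigned)bus_access(UART_BASE, 0, false, true));   /* read it back             */
    return 0;
}
```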
As you can see, with just these three fields, the processor can manage numerous components, as illustrated below:

Each component has a separate address region, and the starting address of a region is called the base address of that component.
- Here are some examples of memory mapping used to manage peripherals (I/O components) in microcontroller families:
- Address mapping table for peripherals in an STM32 family:
Memory Map table of an ARM STM32 family
- Address mapping table for peripherals in an ESP32 family:
Memory Map table of an ESP32 family
So far, we’ve addressed one challenge: “How can the processor manage 10-15 input and output components with a single interface?” (Problem 2 above). But what about the first problem: “Each component has different characteristics, resulting in varying numbers of data fields to exchange (see Problem 1). How does the processor handle this while maintaining a consistent interface?”
The answer: Further subdivide the address space into smaller sub-regions, each representing specific functions of a component.
For example, let’s say the processor manages two peripherals: UART and I2C. (In this example, I’ll temporarily fix the configuration parameters for UART and I2C and focus only on transmitted and received data.)
- For UART, there are two main data fields: TX Data and RX Data.
- For I2C, there are three main data fields: Slave Address, Write Data, and Read Data.
For each type of data field of a component, I’ll divide it into sub-address regions, which must still reside within the base address region of the corresponding component.
Here’s an illustration of subdividing into smaller sub-address regions:


Based on the subdivided address regions in the illustration, if you want the UART to send data via TX Data, you simply place the data into the memory region at address 0x2000_0000 (TX Data). Similarly, if you want the I2C to write data to an external slave via the I2C protocol, you place the data into two memory regions at addresses 0x3000_0000 (Slave Address) and 0x3000_0001 (Write Data).
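From the firmware side, those subdivided regions are just memory locations. Here is a minimal C sketch of that idea, using the example addresses from the illustration; the byte-wide register width is an assumption made only for this example.

```c
#include <stdint.h>

/* Register addresses taken from the illustration above; the byte-wide
 * access width is an assumption made for this example. */
#define UART_TX_DATA   (*(volatile uint8_t *)0x20000000u)
#define I2C_SLAVE_ADDR (*(volatile uint8_t *)0x30000000u)
#define I2C_WRITE_DATA (*(volatile uint8_t *)0x30000001u)

/* Sending a byte over UART is just a store to the TX Data region. */
void uart_send_byte(uint8_t byte)
{
    UART_TX_DATA = byte;
}

/* Writing to an external I2C slave fills the two regions in turn. */
void i2c_write_byte(uint8_t slave_addr, uint8_t byte)
{
    I2C_SLAVE_ADDR = slave_addr;   /* Slave Address region */
    I2C_WRITE_DATA = byte;         /* Write Data region    */
}
```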
Each address field, such as TX Data, RX Data, Slave Address, Write Data, or Read Data, is called a data register. Each component has its own set of data registers specific to that component.
Each data register is offset from the base address of its corresponding component by a certain amount, known as the offset.
- Data Registers
As mentioned, these registers are used for the primary data fields of a component.
For example, UART might have functional registers such as:
- UART_TX_DATA register: Holds the transmit data the user wants to send via the TX port.
- UART_RX_DATA register: Holds the received data from the RX port.
- And so on.
- Configuration Registers
Beyond functional registers, there are also configuration registers for each component. These are used to set parameters and operating modes for the component.
For example, UART might include configuration registers like:
- UART_EN register: Enables or disables the UART.
- UART_IRQ_MASK register: Enables or disables interrupts for the peripheral.
- UART_BAUDRATE register: Sets the baud rate for the UART protocol.
- UART_LEN register: Defines the size of a UART data frame.
- UART_PARITY register: Selects Odd, Even, or No parity bit.
- And so on.
- Status Registers
Status registers allow users to monitor the operational state of a component. These are typically read-only.
For example, UART might have status registers such as:
- UART_TX_STATE register: Indicates the status of the UART TX controller (IDLE or BUSY).
- UART_RX_STATE register: Indicates the status of the UART RX controller (IDLE or BUSY).
- UART_IRQ_SOURCE register: Identifies the source of an interrupt signal (useful when multiple interrupt events share a single interrupt pin for that peripheral).
- And so on.
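Putting the three register groups together, a component's register map is often expressed in firmware as a struct overlaid on the component's base address. The sketch below follows the UART register names used above; the offsets, the 0x2000_0000 base, and the BUSY/IDLE encoding are assumptions chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical UART register map; offsets and base address are assumptions. */
typedef struct {
    volatile uint32_t TX_DATA;    /* offset 0x00: data register          */
    volatile uint32_t RX_DATA;    /* offset 0x04: data register          */
    volatile uint32_t EN;         /* offset 0x08: configuration register */
    volatile uint32_t BAUDRATE;   /* offset 0x0C: configuration register */
    volatile uint32_t TX_STATE;   /* offset 0x10: status register (RO)   */
    volatile uint32_t RX_STATE;   /* offset 0x14: status register (RO)   */
} UART_Regs;

#define UART ((UART_Regs *)0x20000000u)

/* Typical firmware flow: configure, then poll status and move data. */
void uart_init(uint32_t baud_divider)
{
    UART->BAUDRATE = baud_divider;
    UART->EN = 1u;
}

void uart_send(uint8_t byte)
{
    while (UART->TX_STATE != 0u)   /* assume 0 = IDLE, nonzero = BUSY */
        ;
    UART->TX_DATA = byte;
}
```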
For the software layer: Users classify these registers into three types:
- Data registers
- Configuration registers
- Status registers
For the hardware layer (RTL design): Designers categorize registers into four types:
- Read-write (RW) registers
- Read-only (RO) registers
- Write-1-to-set (RW1S) registers
- Write-1-to-clear (RW1C) registers
Each type has unique characteristics, which I’ll explore in a future article.
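Those hardware-level types will be covered in detail later, but as a quick illustration of why they matter to software, here is a hedged C sketch of how a write-1-to-clear (RW1C) interrupt-status register is typically used; the register address and bit position are hypothetical.

```c
#include <stdint.h>

/* Hypothetical RW1C interrupt-status register and event bit */
#define UART_IRQ_SOURCE   (*(volatile uint32_t *)0x20000020u)
#define UART_IRQ_TX_DONE  (1u << 0)

void uart_irq_handler(void)
{
    uint32_t pending = UART_IRQ_SOURCE;   /* read which events fired */

    if (pending & UART_IRQ_TX_DONE) {
        /* For an RW1C register, writing 1 to a bit clears only that bit;
         * bits written as 0 are left untouched, so other pending events
         * are not lost. A plain read-modify-write could clear them by accident. */
        UART_IRQ_SOURCE = UART_IRQ_TX_DONE;
    }
}
```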
Thus, the Memory-Mapped approach offers several advantages:
- Reduces complexity for the processor: Since the interface is unified with the data memory, adding more components doesn’t impact the initial interface.
- Enhances SoC scalability: When expanding the number of peripherals, designers only need to define additional address regions without significantly affecting the existing system.
- Simplifies firmware/software development and expansion: When using the SoC, developers only need to focus on the address region corresponding to each component.
2. Behavior
As noted earlier, the processor communicates with components via the data memory interface, which includes the following signals:
- Address Lines: Address
- Data Lines: WriteData, ReadData
- Control Lines: MemWrite, MemRead
However, this is a one-way interface (from processor to components). If a component isn’t ready to receive WriteData or provide ReadData, it could result in WriteData being dropped or ReadData being undefined.
This “not ready” scenario occurs frequently, with specific cases such as:
- For WriteData (write operation): Most component designs include buffers (e.g., FIFOs) at the interface to queue data for processing. However, these buffers have limits, and processing one piece of data can take much longer than sending it. Without a control mechanism, WriteData drops are common.
- For ReadData (read operation): Typically, ReadData is paired with an interrupt mechanism: when data is ready, the component signals the processor. However, not all components enable interrupts. Some rely on polling, raising the question: “When is ReadData ready inside the component?”
To address this, on-chip protocols add signals from the component to the processor (e.g., component_ready or component_busy) so the processor knows when WriteData can be received and when ReadData is ready to be sent.
→ This mechanism is collectively known as the Handshaking protocol.
Handshaking Protocol
In the handshaking protocol, components are divided into two roles:
- Master: The entity that initiates transactions (transfers).
- Slave: The entity that receives transactions and responds to the master (if applicable).
Each transfer falls into two main categories (a small behavioral sketch of the handshake follows after the two descriptions below):
- Write Transfer: The master requests to write data to a slave.
Direction of all ports in Write channel
- Step 1: The master places the address and data into the WR_ADDR and WR_DATA ports, then sets the write request pin (WR_REQ) to level 1.
- Step 2: When the slave is ready, it sets the WR_ACK port to level 1 to indicate it has received the data.
- Step 3: Once the transaction is confirmed (both WR_REQ and WR_ACK are high simultaneously), it’s complete, and the master can proceed with a new transaction or pause.
- Read Transfer: The master requests to read data from a slave.
Direction of all ports in Read channel
- Step 1: The master places the address into the RD_ADDR port and sets the read request pin (RD_REQ) to level 1.
- Step 2: The slave retrieves the data from the corresponding address region. Once ready, it places the data into the RD_DATA port and sets the RD_ACK port to level 1.
- Step 3: Once the transaction is confirmed (both RD_REQ and RD_ACK are high simultaneously), it’s complete, and the master can proceed with a new transaction or pause.
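To tie the steps together, here is a minimal C sketch that models the write-channel handshake cycle by cycle; the read channel mirrors it with the RD_* ports and the slave driving RD_DATA before raising RD_ACK. This is a behavioral model for illustration only (not RTL), and the slave's artificial busy time is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Write-channel wires, named after the ports in the figures above */
typedef struct {
    uint32_t wr_addr;
    uint32_t wr_data;
    bool     wr_req;   /* driven by the master */
    bool     wr_ack;   /* driven by the slave  */
} WriteChannel;

static uint32_t slave_mem[16];   /* the slave's register space          */
static int      slave_busy = 0;  /* cycles the slave still needs        */

/* Slave side: acknowledge the transfer only when it is ready. */
static void slave_cycle(WriteChannel *ch)
{
    ch->wr_ack = false;
    if (ch->wr_req) {
        if (slave_busy > 0) {
            slave_busy--;                          /* not ready yet: no ACK        */
        } else {
            slave_mem[ch->wr_addr & 0xF] = ch->wr_data;
            ch->wr_ack = true;                     /* latch WR_DATA and raise ACK  */
            slave_busy = 2;                        /* pretend processing takes time */
        }
    }
}

/* Master side: hold WR_REQ (and the address/data) until WR_ACK is seen. */
static void master_write(WriteChannel *ch, uint32_t addr, uint32_t data)
{
    int cycle = 0;

    ch->wr_addr = addr;
    ch->wr_data = data;
    ch->wr_req  = true;

    do {
        slave_cycle(ch);                 /* advance one bus clock                 */
        cycle++;
    } while (!ch->wr_ack);               /* transfer ends when REQ and ACK are high */

    ch->wr_req = false;
    printf("write 0x%X to 0x%X completed after %d cycle(s)\n", data, addr, cycle);
}

int main(void)
{
    WriteChannel ch = {0};
    master_write(&ch, 0x3, 0xAB);   /* slave is idle: completes immediately */
    master_write(&ch, 0x4, 0xCD);   /* master waits while the slave is busy */
    return 0;
}
```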
Summary
In this first part of the series on On-chip Bus, you’ve learned how an SoC can connect and control multiple components (e.g., UART, I2C) using a single interface thanks to the Memory-Mapped I/O mechanism. This highlights the benefits of standardizing communication within the system, reducing complexity when integrating numerous peripherals.
By mapping each component to a distinct address region and subdividing it into smaller registers, the processor can effortlessly exchange data, configure settings, and monitor the status of each component.