
On-chip BUS [Part 1]
When designing a System-on-Chip (SoC), connecting components such as the processor, memory, and peripherals (e.g., UART, I2C, SPI) is a critical challenge. In this article, I will introduce how these components communicate with each other through an On-chip BUS, as well as explain why the bus architecture is significant and designed the way it is. We’ll start with basic concepts, move on to memory organization, and explore how the processor interacts with multiple peripherals using the Memory-Mapped I/O mechanism.
Overview
1. General Structure of an SoC
A System-on-Chip typically consists of three main components:
- Processor: The central element of the system, responsible for controlling and processing data. The processor can communicate with both input components and output components.
- Memory: Used to store programs and temporary data. This can include RAM, ROM, or cache memory integrated within the processor.
- I/O Components (Peripherals): These are components that interface with the external world, such as UART, I2C, SPI, GPIO, etc., enabling the SoC to exchange data with external devices.
The figure below illustrates the Von Neumann architecture, where these three components are interconnected via a communication BUS.

Von Neumann Architecture
2. Role of the Bus in the System
The BUS serves as the component that connects all elements to the processor and facilitates the transfer of data packets between them. Modern buses often adhere to standard protocols such as AXI, AHB, APB (part of the AMBA standard), Wishbone, Avalon, and others.
In this discussion of on-chip buses, I’ll cover the core ideas behind most standard on-chip bus designs today and explain why they are structured as they are.
Foundation
1. Interface
If you’ve studied computer architecture or examined the design of a general-purpose processor, you’ll notice that its interface isn’t directly connected to input or output components (as shown in the figure below).

Simplified Illustration of a MIPS Processor Architecture
So, in your opinion, when an SoC integrates input and output components, does the processor need to incorporate multiple different interfaces to connect to these components?
(Take a moment to consider the best answer before reading on.)
a. Approach 1:
If your answer is “Yes,” you’ll encounter two key challenges:
- Problem 1: Each input or output component has unique characteristics, meaning that when designing peripherals for integration into the system, the interfaces connecting to the processor may differ.
For example, let’s compare two peripheral protocols: a UART (TX) peripheral and an I2C (Master) peripheral.
- A UART frame requires only one data field. Thus, when designing a UART (TX) peripheral, its interface only needs one field for the processor to send TX data to the peripheral.
- In contrast, an I2C frame requires 2-3 data fields: a Slave Address frame and Data Frame 1/2 (as shown below). Therefore, the interface for an I2C peripheral needs three fields for the processor to send the Address frame and Data frames to the peripheral.
→ This increases the complexity of the processor’s interface because each I/O component has distinct characteristics. And that’s before considering the second problem below.
- Problem 2: If an SoC includes 10-15 input and output components, would we need to add 10-15 corresponding interfaces to the processor?
⇒ If the answer is “Yes,” then integrating a new peripheral would require redesigning the entire processor, significantly increasing complexity.
Therefore, this approach is not viable for SoCs with many peripherals. As a result, we will now consider the second approach described below.
b. Approach 2:
If your answer is “No,” then where does the processor’s interface to the input and output components reside?
Here’s the answer:

The processor treats input and output components as memory regions, where it can exchange data through read and write operations. Naturally, each memory region has a specific address, meaning each input and output component is assigned its own distinct address range.
This mechanism is known as Memory-Mapped I/O.
As a result, the interface between the processor and the system’s components uses the same type as the interface with data memory, consisting of the following signals:
- Address
- WriteData
- ReadData
- MemRead
- MemWrite
These five signals can be grouped into three main categories:
- Address Lines - Includes the Address bus → Used to map to the desired address region for execution.
- Data Lines - Includes the WriteData bus and ReadData bus → Used for data transfer.
- Control Lines - Includes MemRead and MemWrite → Used to specify operations.

→ Thus, as long as the address is mapped to a specific component, that component is responsible for:
- Receiving data from WriteData when a MemWrite signal is present.
- Sending ReadData back to the processor when a MemRead signal is present.
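To make that behavior concrete, here is a minimal C sketch of the decode-and-respond idea: a single Address/WriteData/ReadData/MemWrite/MemRead access is routed to whichever component owns the address. This is only a behavioral model written for illustration; the base addresses and region size are assumptions, not taken from any real SoC.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical base addresses and region size, chosen only for illustration */
#define UART_BASE   0x20000000u
#define I2C_BASE    0x30000000u
#define REGION_SIZE 0x1000u

/* Each component owns a small block of registers behind its base address */
static uint32_t uart_regs[REGION_SIZE / 4];
static uint32_t i2c_regs[REGION_SIZE / 4];

/* One bus access as seen by the components: the Address selects exactly one
 * component, which latches WriteData on MemWrite or drives ReadData on MemRead. */
static uint32_t bus_access(uint32_t address, uint32_t write_data,
                           bool mem_write, bool mem_read)
{
    uint32_t *regs;
    uint32_t index;

    if (address >= UART_BASE && address < UART_BASE + REGION_SIZE) {
        regs  = uart_regs;
        index = (address - UART_BASE) / 4;
    } else if (address >= I2C_BASE && address < I2C_BASE + REGION_SIZE) {
        regs  = i2c_regs;
        index = (address - I2C_BASE) / 4;
    } else {
        return 0;   /* unmapped address: no component responds */
    }

    if (mem_write)
        regs[index] = write_data;   /* the selected component receives WriteData */
    if (mem_read)
        return regs[index];         /* the selected component returns ReadData   */
    return 0;
}

int main(void)
{
    bus_access(UART_BASE, 0x41, true, false);                  /* write to the UART region */
    printf("read back: 0x%02X\n",
           (unsigned)bus_access(UART_BASE, 0, false, true));   /* read it back             */
    return 0;
}
```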
As you can see, with just these three fields, the processor can manage numerous components, as illustrated below:

Each component has a separate address region, and the starting address of a region is called the base address of that component.
- Here are some examples of memory mapping used to manage peripherals (I/O components) in microcontroller families:
- Address mapping table for peripherals in an STM32 family:
Memory Map table of an ARM STM32 family
- Address mapping table for peripherals in an ESP32 family:
Memory Map table of an ESP32 family
So far, we’ve addressed one challenge: “How can the processor manage 10-15 input and output components with a single interface?” (Problem 2 above). But what about the first problem: “Each component has different characteristics, resulting in varying numbers of data fields to exchange (see Problem 1). How does the processor handle this while maintaining a consistent interface?”
The answer: Further subdivide the address space into smaller sub-regions, each representing specific functions of a component.
For example, let’s say the processor manages two peripherals: UART and I2C. (In this example, I’ll temporarily fix the configuration parameters for UART and I2C and focus only on transmitted and received data.)
- For UART, there are two main data fields: TX Data and RX Data.
- For I2C, there are three main data fields: Slave Address, Write Data, and Read Data.
For each type of data field of a component, I’ll divide it into sub-address regions, which must still reside within the base address region of the corresponding component.
Here’s an illustration of subdividing into smaller sub-address regions:


Based on the subdivided address regions in the illustration, if you want the UART to send data via TX Data, you simply place the data into the memory region at address 0x2000_0000 (TX Data). Similarly, if you want the I2C to write data to an external slave via the I2C protocol, you place the data into two memory regions at addresses 0x3000_0000 (Slave Address) and 0x3000_0001 (Write Data).
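From the firmware side, those subdivided regions are just memory locations. Here is a minimal C sketch of that idea, using the example addresses from the illustration; the byte-wide register width is an assumption made only for this example.

```c
#include <stdint.h>

/* Register addresses taken from the illustration above; the byte-wide
 * access width is an assumption made for this example. */
#define UART_TX_DATA   (*(volatile uint8_t *)0x20000000u)
#define I2C_SLAVE_ADDR (*(volatile uint8_t *)0x30000000u)
#define I2C_WRITE_DATA (*(volatile uint8_t *)0x30000001u)

/* Sending a byte over UART is just a store to the TX Data region. */
void uart_send_byte(uint8_t byte)
{
    UART_TX_DATA = byte;
}

/* Writing to an external I2C slave fills the two regions in turn. */
void i2c_write_byte(uint8_t slave_addr, uint8_t byte)
{
    I2C_SLAVE_ADDR = slave_addr;   /* Slave Address region */
    I2C_WRITE_DATA = byte;         /* Write Data region    */
}
```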
Each address field, such as TX Data, RX Data, Slave Address, Write Data, or Read Data, is called a data register. Each component has its own set of data registers specific to that component.
Each data register is offset from the base address of its corresponding component by a certain amount, known as the offset.
- Data Registers
As mentioned, these registers are used for the primary data fields of a component.
For example, UART might have functional registers such as:
- UART_TX_DATA register: Holds the transmit data the user wants to send via the TX port.
- UART_RX_DATA register: Holds the received data from the RX port.
- And so on.
- Configuration Registers
Beyond functional registers, there are also configuration registers for each component. These are used to set parameters and operating modes for the component.
For example, UART might include configuration registers like:
- UART_EN register: Enables or disables the UART.
- UART_IRQ_MASK register: Enables or disables interrupts for the peripheral.
- UART_BAUDRATE register: Sets the baud rate for the UART protocol.
- UART_LEN register: Defines the size of a UART data frame.
- UART_PARITY register: Selects Odd, Even, or No parity bit.
- And so on.
- Status Registers
Status registers allow users to monitor the operational state of a component. These are typically read-only.
For example, UART might have status registers such as:
- UART_TX_STATE register: Indicates the status of the UART TX controller (IDLE or BUSY).
- UART_RX_STATE register: Indicates the status of the UART RX controller (IDLE or BUSY).
- UART_IRQ_SOURCE register: Identifies the source of an interrupt signal (useful when multiple interrupt events share a single interrupt pin for that peripheral).
- And so on.
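Putting the three register groups together, a component's register map is often expressed in firmware as a struct overlaid on the component's base address. The sketch below follows the UART register names used above; the offsets, the 0x2000_0000 base, and the BUSY/IDLE encoding are assumptions chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical UART register map; offsets and base address are assumptions. */
typedef struct {
    volatile uint32_t TX_DATA;    /* offset 0x00: data register          */
    volatile uint32_t RX_DATA;    /* offset 0x04: data register          */
    volatile uint32_t EN;         /* offset 0x08: configuration register */
    volatile uint32_t BAUDRATE;   /* offset 0x0C: configuration register */
    volatile uint32_t TX_STATE;   /* offset 0x10: status register (RO)   */
    volatile uint32_t RX_STATE;   /* offset 0x14: status register (RO)   */
} UART_Regs;

#define UART ((UART_Regs *)0x20000000u)

/* Typical firmware flow: configure, then poll status and move data. */
void uart_init(uint32_t baud_divider)
{
    UART->BAUDRATE = baud_divider;
    UART->EN = 1u;
}

void uart_send(uint8_t byte)
{
    while (UART->TX_STATE != 0u)   /* assume 0 = IDLE, nonzero = BUSY */
        ;
    UART->TX_DATA = byte;
}
```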
For the software layer: Users classify these registers into three types:
- Data registers
- Configuration registers
- Status registers
For the hardware layer (RTL design): Designers categorize registers into four types:
- Read-write (RW) registers
- Read-only (RO) registers
- Write-1-to-set (RW1S) registers
- Write-1-to-clear (RW1C) registers
Each type has unique characteristics, which I’ll explore in a future article.
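Those hardware-level types will be covered in detail later, but as a quick illustration of why they matter to software, here is a hedged C sketch of how a write-1-to-clear (RW1C) interrupt-status register is typically used; the register address and bit position are hypothetical.

```c
#include <stdint.h>

/* Hypothetical RW1C interrupt-status register and event bit */
#define UART_IRQ_SOURCE   (*(volatile uint32_t *)0x20000020u)
#define UART_IRQ_TX_DONE  (1u << 0)

void uart_irq_handler(void)
{
    uint32_t pending = UART_IRQ_SOURCE;   /* read which events fired */

    if (pending & UART_IRQ_TX_DONE) {
        /* For an RW1C register, writing 1 to a bit clears only that bit;
         * bits written as 0 are left untouched, so other pending events
         * are not lost. A plain read-modify-write could clear them by accident. */
        UART_IRQ_SOURCE = UART_IRQ_TX_DONE;
    }
}
```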
Thus, the Memory-Mapped approach offers several advantages:
- Reduces complexity for the processor: Since the interface is unified with the data memory, adding more components doesn’t impact the initial interface.
- Enhances SoC scalability: When expanding the number of peripherals, designers only need to define additional address regions without significantly affecting the existing system.
- Simplifies firmware/software development and expansion: When using the SoC, developers only need to focus on the address region corresponding to each component.
2. Behavior
As noted earlier, the processor communicates with components via the data memory interface, which includes the following signals:
- Address Lines: Address
- Data Lines: WriteData, ReadData
- Control Lines: MemWrite, MemRead
However, this is a one-way interface (from processor to components). If a component isn’t ready to receive WriteData or provide ReadData, it could result in WriteData being dropped or ReadData being undefined.
This “not ready” scenario occurs frequently, with specific cases such as:
- For WriteData (write operation): Most component designs include buffers (e.g., FIFOs) at the interface to queue data for processing. However, these buffers have limits, and processing one piece of data can take much longer than sending it. Without a control mechanism, WriteData drops are common.
- For ReadData (read operation): Typically, ReadData is paired with an interrupt mechanism: when data is ready, the component signals the processor. However, not all components enable interrupts. Some rely on polling, raising the question: “When is ReadData ready inside the component?”
To address this, on-chip protocols add signals from the component to the processor (e.g., component_ready or component_busy) so the processor knows when WriteData can be received and when ReadData is ready to be sent.
→ This mechanism is collectively known as the Handshaking protocol.
Handshaking Protocol
In the handshaking protocol, components are divided into two roles:
- Master: The entity that initiates transactions (transfers).
- Slave: The entity that receives transactions and responds to the master (if applicable).
Each transfer falls into two main categories (a small behavioral sketch of the handshake follows after the two descriptions below):
- Write Transfer: The master requests to write data to a slave.
Direction of all ports in Write channel
- Step 1: The master places the address and data into the WR_ADDR and WR_DATA ports, then sets the write request pin (WR_REQ) to level 1.
- Step 2: When the slave is ready, it sets the WR_ACK port to level 1 to indicate it has received the data.
- Step 3: Once the transaction is confirmed (both WR_REQ and WR_ACK are high simultaneously), it’s complete, and the master can proceed with a new transaction or pause.
- Read Transfer: The master requests to read data from a slave.
Direction of all ports in Read channel
- Step 1: The master places the address into the RD_ADDR port and sets the read request pin (RD_REQ) to level 1.
- Step 2: The slave retrieves the data from the corresponding address region. Once ready, it places the data into the RD_DATA port and sets the RD_ACK port to level 1.
- Step 3: Once the transaction is confirmed (both RD_REQ and RD_ACK are high simultaneously), it’s complete, and the master can proceed with a new transaction or pause.
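To tie the steps together, here is a minimal C sketch that models the write-channel handshake cycle by cycle; the read channel mirrors it with the RD_* ports and the slave driving RD_DATA before raising RD_ACK. This is a behavioral model for illustration only (not RTL), and the slave's artificial busy time is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Write-channel wires, named after the ports in the figures above */
typedef struct {
    uint32_t wr_addr;
    uint32_t wr_data;
    bool     wr_req;   /* driven by the master */
    bool     wr_ack;   /* driven by the slave  */
} WriteChannel;

static uint32_t slave_mem[16];   /* the slave's register space          */
static int      slave_busy = 0;  /* cycles the slave still needs        */

/* Slave side: acknowledge the transfer only when it is ready. */
static void slave_cycle(WriteChannel *ch)
{
    ch->wr_ack = false;
    if (ch->wr_req) {
        if (slave_busy > 0) {
            slave_busy--;                          /* not ready yet: no ACK        */
        } else {
            slave_mem[ch->wr_addr & 0xF] = ch->wr_data;
            ch->wr_ack = true;                     /* latch WR_DATA and raise ACK  */
            slave_busy = 2;                        /* pretend processing takes time */
        }
    }
}

/* Master side: hold WR_REQ (and the address/data) until WR_ACK is seen. */
static void master_write(WriteChannel *ch, uint32_t addr, uint32_t data)
{
    int cycle = 0;

    ch->wr_addr = addr;
    ch->wr_data = data;
    ch->wr_req  = true;

    do {
        slave_cycle(ch);                 /* advance one bus clock                 */
        cycle++;
    } while (!ch->wr_ack);               /* transfer ends when REQ and ACK are high */

    ch->wr_req = false;
    printf("write 0x%X to 0x%X completed after %d cycle(s)\n", data, addr, cycle);
}

int main(void)
{
    WriteChannel ch = {0};
    master_write(&ch, 0x3, 0xAB);   /* slave is idle: completes immediately */
    master_write(&ch, 0x4, 0xCD);   /* master waits while the slave is busy */
    return 0;
}
```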
Summary
In this first part of the series on On-chip Bus, you’ve learned how an SoC can connect and control multiple components (e.g., UART, I2C) using a single interface thanks to the Memory-Mapped I/O mechanism. This highlights the benefits of standardizing communication within the system, reducing complexity when integrating numerous peripherals.
By mapping each component to a distinct address region and subdividing it into smaller registers, the processor can effortlessly exchange data, configure settings, and monitor the status of each component.