July 16, 2021
Editor’s Note: This content is republished from the MicroZed Chronicles, with permission from the author.
It’s interesting how often I seem to go through similar development projects with different clients. Within a short period of time recently, I had several clients who were looking at DDR3/4 implementations or struggling with their implementation. As it just so happens, I now have several clients who are doing interesting things with MicroBlaze.
These projects range from Triple Modular Redundant versions flying in space to TinyML and machine learning applications analyzing sensor data on an IoT board.
Of course, that means we want our MicroBlaze solution to be as optimal as possible to increase performance in these applications and others. This is where our approach to memory deployment and organization becomes critical to achieving the desired performance. At the basic level, the MicroBlaze program execution and data storage can be either internal, from Block RAM (BRAM), or external, from DDR memory. However, there is always a little more to it than that.
Executing from BRAM gives the highest performance because there is no need to go off chip. However, BRAM in FPGAs is not infinite, and it is also needed for other elements in the design. It stands to reason, then, that any reasonably sized application requires external memory to execute from. In many systems, this is DDR3 or DDR4. External memory allows for much larger applications and even for operating systems such as PetaLinux or popular real-time operating systems like FreeRTOS.
Using external memory adds some complexity to the design. If we execute the application only from BRAM, the application ELF can be merged with the FPGA bit file. Following programming, the MicroBlaze is configured in the logic and the program executes from BRAM, which makes the MicroBlaze boot process straightforward.
If, however, we wish to execute the program from DDR3/4, the application software must be stored in non-volatile memory, often an unused section of the configuration device. A boot loader application, which runs from BRAM following device configuration, is then merged with the bit file. The boot loader cross loads the application from the configuration memory into the DDR3/4 memory before execution starts.
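The cross-load step can be sketched in host-runnable C. This is a minimal illustration only, not the Xilinx boot loader: the image header layout, the `IMAGE_MAGIC` value, and the `cross_load` function are hypothetical, and two byte arrays stand in for the configuration flash and the DDR3/4 memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image header placed at the start of the application
   region in the configuration flash. The magic value is illustrative. */
#define IMAGE_MAGIC 0x4D424C44u

typedef struct {
    uint32_t magic;        /* identifies a valid application image */
    uint32_t length;       /* payload size in bytes */
    uint32_t load_offset;  /* destination offset within DDR */
    uint32_t entry_offset; /* entry point offset within DDR */
} image_header_t;

/* Host-side stand-ins for the flash region and the DDR3/4 memory. */
static uint8_t flash[4096];
static uint8_t ddr[8192];

/* Validates the header, copies the payload from flash into DDR, and
   returns the entry offset, or -1 if the image is invalid or too large. */
long cross_load(const uint8_t *src, size_t src_size,
                uint8_t *dst, size_t dst_size)
{
    image_header_t hdr;

    assert(src != NULL && dst != NULL);
    if (src_size < sizeof hdr)
        return -1;
    memcpy(&hdr, src, sizeof hdr);
    if (hdr.magic != IMAGE_MAGIC)
        return -1;
    if (hdr.length > src_size - sizeof hdr)
        return -1; /* payload extends past the flash region */
    if (hdr.load_offset > dst_size || hdr.length > dst_size - hdr.load_offset)
        return -1; /* payload would not fit in DDR */
    memcpy(dst + hdr.load_offset, src + sizeof hdr, hdr.length);
    return (long)hdr.entry_offset;
}
```

On real hardware, a boot loader along these lines would typically also flush the data cache and invalidate the instruction cache after copying, then jump to the entry address; those platform-specific steps are left out of this sketch.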
To get the best performance from the MicroBlaze when external DDR3/4 memory is used, we need to use a cache. Caching both the data and instruction memory spaces enables higher system performance because critical instructions and data are held closer to the processor. If a cache hit occurs, the value is read directly from the cache; a cache miss means the processor has to read the value from external memory and update the cache. Caching and performance is a complex subject, especially in multiprocessor systems where cache coherence must be maintained across the processors.
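To make the hit and miss behavior concrete, the following sketch models a small direct-mapped cache. The direct-mapped organization, the 16-byte line, the 64-line size, and the `cache_access` function are assumptions chosen for illustration; the model tracks only tags, not the cached data itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 16u /* illustrative: four 32-bit words per line */
#define NUM_LINES  64u /* illustrative: 64 lines x 16 bytes = 1 KB cache */

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits;
    unsigned misses;
} dm_cache_t;

/* Looks up addr in the cache. On a hit, counts it and returns true.
   On a miss, models reading the line from external memory by storing
   its tag in the indexed slot (evicting whatever was there). */
bool cache_access(dm_cache_t *c, uint32_t addr)
{
    assert(c != NULL);
    uint32_t line  = addr / LINE_BYTES; /* which memory line */
    uint32_t index = line % NUM_LINES;  /* which cache slot it maps to */
    uint32_t tag   = line / NUM_LINES;  /* identifies the line in that slot */

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index] = tag;
    c->misses++;
    return false;
}
```

In this model, two addresses exactly one cache size apart map to the same slot, so alternating between them evicts the line on every access. Patterns like this are one reason cache size and line length are worth tuning for the application.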
In a MicroBlaze system, the cache is implemented in BRAM and provides much faster access than off-chip memory. The address range allocated for cache coverage must be outside the Local Memory Bus (LMB) address range. Because both the cache and the LMB memory are implemented in BRAM, caching the LMB range would bring little performance benefit.
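The requirement that the cacheable range stay outside the LMB range amounts to a simple interval check. The addresses below are hypothetical placeholders; in a real design they come from the address map of the block design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address map, for illustration only. */
#define LMB_BASE   0x00000000u
#define LMB_HIGH   0x0001FFFFu /* 128 KB of local BRAM on the LMB */
#define CACHE_BASE 0x80000000u /* start of the cacheable DDR window */
#define CACHE_HIGH 0xBFFFFFFFu /* 1 GB cacheable window */

/* Returns true if the inclusive ranges [a_lo, a_hi] and [b_lo, b_hi] overlap. */
bool ranges_overlap(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi)
{
    assert(a_lo <= a_hi && b_lo <= b_hi);
    return a_lo <= b_hi && b_lo <= a_hi;
}

/* Returns true if an access to addr falls inside the cacheable window. */
bool is_cacheable(uint32_t addr)
{
    return addr >= CACHE_BASE && addr <= CACHE_HIGH;
}
```

A check such as `ranges_overlap(LMB_BASE, LMB_HIGH, CACHE_BASE, CACHE_HIGH)` returning false confirms that no LMB address is routed through the cache.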
Configuring the cache in the MicroBlaze is straightforward once we have enabled it in the configuration wizard. We can then configure the data and instruction caches in detail. Configurable elements include:
To achieve the best performance, we can also use distributed RAM to hold the tags for each cache line. This has the advantage of reducing the BRAM required and increasing the maximum frequency.
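A rough estimate of the tag store size helps explain why distributed RAM is a practical option for it. This sketch assumes a simplified direct-mapped model with one valid bit per line and a power-of-two cacheable range; the actual MicroBlaze tag format may differ, and `size_tags` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 for exact powers of two. */
static unsigned log2_pow2(uint64_t x)
{
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

typedef struct {
    unsigned lines;        /* number of cache lines */
    unsigned tag_bits;     /* tag width per line */
    uint64_t tag_ram_bits; /* lines * (tag_bits + 1 valid bit) */
} tag_sizing_t;

/* In this model the tag only has to distinguish addresses within the
   cacheable range, so its width depends on the range size rather than
   the full 32-bit address space. */
tag_sizing_t size_tags(uint64_t range_bytes, uint64_t cache_bytes,
                       uint64_t line_bytes)
{
    tag_sizing_t s;
    unsigned offset_bits = log2_pow2(line_bytes);
    unsigned index_bits  = log2_pow2(cache_bytes / line_bytes);

    s.lines        = (unsigned)(cache_bytes / line_bytes);
    s.tag_bits     = log2_pow2(range_bytes) - index_bits - offset_bits;
    s.tag_ram_bits = (uint64_t)s.lines * (s.tag_bits + 1u);
    return s;
}
```

For a 1 GB cacheable range, an 8 KB cache, and 16-byte lines, this model gives 512 lines with 17 tag bits each, or 9,216 bits of tag storage in total. A store of that size fits comfortably in LUT-based distributed RAM, leaving the BRAM free for other uses.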
Implementing an optimal MicroBlaze memory architecture will help you meet the performance requirements placed upon your development, reduce the time it takes to optimize the solution for performance, and leave more time to focus on application development.
Keep an eye out for an upcoming blog where we will look at the differences that using a cache can make versus not using a cache. We will also explore different cache configurations.