Simulation Save/Restore Checkpointing with CXXRTL
- Introduction
- Save/Restore Checkpoint Use Cases
- The CXXRTL Data Model
- Save/Restore of the VexRiscv Simulation Example
- Adding Save/Restore to a CXXRTL Testbench
- Design Introspection to Capture the UART TX Writes
- More Complex Designs and Potential Improvements
- Conclusion
- References
Introduction
In my initial blog post about CXXRTL, I wrote about how the underlying data model of CXXRTL allows for design introspection, and how this could be used to save the state of a design to a file, and later restore it.
I wanted to try this out on a real example, so that’s what I’ll be discussing here.
The design is not a toy example: it contains a VexRiscv RISC-V CPU with memories, LEDs, and a UART to print out status messages.
Save/Restore Checkpoint Use Cases
Before diving into the details, let’s talk about some potential use cases.
- Accelerated debugging of long-running simulations
In most RTL regression setups, thousands of simulations are run day in, day out on a simulation farm with all debugging features disabled for maximum simulation speed.
When a regression simulation fails, the whole simulation gets rerun with wave dumping enabled.
That’s a problem when the simulation fails after many hours or even days. Dumping waveforms at all times is not an option, because it slows down the simulation by an order of magnitude.
With a checkpoint save/restore option, one could simulate without dumping waveforms but instead save the state of the design at fixed intervals, say, every 5 minutes, while deleting the previous snapshot to save disk space.
After a simulation failure, one can quickly get waveforms by restarting the simulation from the last saved checkpoint with waveform dumping enabled.
The additional run time for a checkpoint save operation is minimal.
- Aggressive waveform format compression
This is an expansion of the previous use case.
Instead of dumping the values of signals whenever they change, one could save checkpoints at regular intervals, together with the simulation model. The checkpoints themselves could even be incremental from one checkpoint to the next.
When zooming in on a waveform, the waveform viewer would have to simulate on-the-fly, but that might be an acceptable trade-off.
There are all kinds of optimizations possible: while simulating, you could track for each signal whether its value has stayed constant, which would give immediate visual feedback in the waveform viewer about whether something interesting has happened on that signal.
Siloti by Synopsys uses this kind of method to reduce the bulk of waveform data.
- Bypassing a fixed long-running configuration sequence
Imagine simulating an SoC that runs Linux or some other piece of software that requires a long boot sequence.
One could save a checkpoint after the initialization sequence has completed, but before a specific HW driver has started executing.
With a bit of planning, it’s possible to restart a simulation from the checkpoint, even when the HW driver is different for each run, thus allowing rapid driver development iterations on the simulation model.
It’s even possible to do this when the RTL of the HW under test changes between runs: all one needs to do is keep the HW under test in reset up to the checkpoint.
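For the first use case, the "save every 5 minutes and delete the previous snapshot" policy could look roughly like the sketch below. CheckpointTimer and replace_checkpoint are hypothetical helpers, not part of CXXRTL: the timer is based on wall-clock time rather than simulated cycles, and the new checkpoint is written to a temporary file that is then renamed over the old one, so that only one checkpoint occupies disk space and a crash during saving doesn't destroy the last good one.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <functional>
#include <ostream>
#include <string>

// Hypothetical helper: tells the testbench loop when the next periodic
// checkpoint is due, based on wall-clock time rather than simulated cycles.
class CheckpointTimer {
public:
    using clock = std::chrono::steady_clock;

    explicit CheckpointTimer(std::chrono::seconds interval)
        : interval_(interval), last_(clock::now()) {
        assert(interval.count() > 0);
    }

    // Returns true, and restarts the interval, once 'interval' has elapsed.
    bool due(clock::time_point now = clock::now()) {
        if (now - last_ < interval_) return false;
        last_ = now;
        return true;
    }

private:
    std::chrono::seconds interval_;
    clock::time_point last_;
};

// Hypothetical helper: writes the new checkpoint to "<path>.tmp" and then
// renames it over the previous checkpoint. On POSIX systems, rename()
// replaces an existing file atomically; on Windows, std::rename may fail
// when the target already exists.
bool replace_checkpoint(const std::string &path,
                        const std::function<void(std::ostream &)> &write_state) {
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp);
        if (!out) return false;
        write_state(out);
        out.flush();
        if (!out) return false;
    }  // the file is closed here, before the rename
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

Inside the simulation loop, this could be used as `if (timer.due()) replace_checkpoint("checkpoint.val", [&](std::ostream &os) { /* write the design state */ });`. The save_state function shown later in this post takes a std::ofstream&, so it would have to accept a std::ostream& to be called from such a lambda.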
The CXXRTL Data Model
Creating a simulation checkpoint requires an understanding of how a CXXRTL model stores the data of all state-holding objects. This can be derived from cxxrtl.h, a file that gets included in any CXXRTL-generated model.
At the lowest level, CXXRTL has templated value, wire, and memory classes.
These are used to create the basic primitives that contain simulated data values.
- value
Used for objects that contain a single simulation value.
A value always represents a combinatorial signal in your design, but not every combinatorial signal is represented by a value.
- wire
This awkwardly named class (reg would have been a better name IMO) is used for objects that contain both the current simulation value and the next simulation value. In most cases, this will be an object that stores the contents of a flip-flop or a latch. There are some cases where a wire is used for a combinatorial signal, such as for output signals of a module. For our save/restore purpose, it’s not important to understand these low-level implementation details.
- memory
Self-explanatory: when using Verilog, this would be used to store an object that’s declared like this: reg [7:0] memory[0:1023].
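To make the three kinds of objects concrete, here is a heavily simplified sketch. The names sim_value, sim_wire and sim_memory are hypothetical, and these are not the definitions from cxxrtl.h; the sketch only shows where the state of each kind of object lives: a value holds one copy of its bits, a wire holds a current and a next copy, and a memory holds one value per location.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a value: Bits bits stored in 32-bit chunks,
// with the LSB in bit 0 of data[0].
template <size_t Bits>
struct sim_value {
    static constexpr size_t chunks = (Bits + 31) / 32;  // chunks needed
    std::array<uint32_t, chunks> data{};
};

// Hypothetical stand-in for a wire: a current and a next value.
template <size_t Bits>
struct sim_wire {
    sim_value<Bits> curr;  // what the rest of the design sees
    sim_value<Bits> next;  // what the current evaluation step computed

    // Makes 'next' visible; returns true if the value changed.
    bool commit() {
        const bool changed = curr.data != next.data;
        curr = next;
        return changed;
    }
};

// Hypothetical stand-in for a memory: one value per location.
template <size_t Width>
struct sim_memory {
    std::vector<sim_value<Width>> words;
    explicit sim_memory(size_t depth) : words(depth) { assert(depth > 0); }
};
```

With this picture in mind, sim_memory<8> mem(1024) corresponds to the reg [7:0] memory[0:1023] declaration above: 1024 locations of one 32-bit chunk each.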
While one could use these objects directly to access the internal simulation values of a design, it wouldn’t be very practical: they don’t share a common base class, and the way they store the simulation data differs per class.
But that’s OK, because there’s a much better way: the debug_item class exists specifically to allow external code to access the simulation values in a uniform way. It also makes it possible to write CXXRTL testbenches with introspection in pure C, rather than C++. (You still need to compile the CXXRTL model itself with a C++ compiler.) This is useful when you want to embed your simulation model into a program that’s written in C, such as CPython, the standard Python interpreter.
A debug_item exposes the following aspects of the simulation data holding objects:
- Whether the item is a value, wire, memory, or an alias (which maps one netlist item to another one with an identical value).
- All kinds of attributes of the simulation object: the direction of module ports, whether or not a wire is driven by a storage cell, etc.
- The size in bits of the value (width), and the bit number of its LSB (lsb_at).
These values are essential to interpret the simulation data values correctly.
- For memories, the number of memory locations (depth) and the index of the first word (zero_at).
For a wire or value, these fields are set to 1 and 0 respectively. Since a debug_item has a uniform interface for all simulation data, one doesn’t need a special case for each of the 3 storage classes: you can treat all debug_items as memories, some of which have only 1 location.
- Pointers to the actual simulation data: curr and next.
For a value and a memory, next is a null pointer. curr and next are uint32_t pointers because the C++ simulation model ultimately stores all simulation data as uint32_t words. It’s all pretty straightforward: the LSB of a vector is stored at bit 0 of the first uint32_t word, and as many uint32_t words as needed are allocated to store all the bits of the vector.
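Because every debug_item can be treated as a memory, one loop covers all three storage classes. The sketch below uses a hypothetical item_view struct that mirrors the fields described above (it is not the real CXXRTL type) to show the chunk arithmetic: each location takes ceil(width / 32) uint32_t words, and the whole item takes depth times that. A 5-bit register therefore uses 1 word, and a 40-bit register uses 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in mirroring the debug_item fields discussed above.
struct item_view {
    size_t width;    // bits per location
    size_t lsb_at;   // bit number of the LSB
    size_t depth;    // number of locations: 1 for a value or wire
    size_t zero_at;  // index of the first location: 0 for a value or wire
    uint32_t *curr;  // current simulation data
    uint32_t *next;  // next-state data for wires, otherwise nullptr
};

// Number of uint32_t chunks used to store one location of 'width' bits.
inline size_t chunks_per_word(size_t width) { return (width + 31) / 32; }

// Total number of uint32_t chunks behind 'curr', treating every item as a memory.
inline size_t total_chunks(const item_view &item) {
    return item.depth * chunks_per_word(item.width);
}

// Returns bit 'bit' (0 = LSB) of storage location 'index' (0 .. depth-1,
// i.e. the memory address minus zero_at).
inline bool get_bit(const item_view &item, size_t index, size_t bit) {
    assert(index < item.depth && bit < item.width);
    const uint32_t *word = item.curr + index * chunks_per_word(item.width);
    return (word[bit / 32] >> (bit % 32)) & 1u;
}
```

A save routine built on this only has to copy total_chunks(item) words starting at item.curr, regardless of whether the item is a value, a wire, or a memory.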
Save/Restore of the VexRiscv Simulation Example
I updated my main CXXRTL example (in the ./cxxrtl directory) to add save/restore checkpointing.
The design runs the following program on a VexRiscv CPU:
int main()
{
    uart_init();
    uart_tx_str("\nHello World!\n");
    REG_WR(LED_DIR, 0xff);
    for(int i=0;i<1500;++i){
        REG_WR(LED_WRITE, 0x01);
        wait_cycles(100);
        REG_WR(LED_WRITE, 0x02);
        wait_cycles(100);
        REG_WR(LED_WRITE, 0x04);
        wait_cycles(100);
    }
    uart_tx_str("\nLEDs done!\n");
    while(1);
}
The testbench has two new debug levels on the command line, one to save a checkpoint and one to restore it:
// <executable> <debug level> <vcd filename>
// debug level:
// 0 -> No dumping, no save/restore
// 1 -> dump everything
// 2 -> dump everything except memories
// 3 -> dump custom (only wires)
// 4 -> save to checkpoint
// 5 -> restore from checkpoint
Simulation Save
./example_Og_clang9 4 dumps the design state after 10000 simulation cycles into the checkpoint.val file:
ubuntu@ubuntu-xenial:~/projects/cxxrtl_eval/cxxrtl$ ./example_Og_clang9 4
UART TX:
UART TX:
UART TX: H
UART TX: e
UART TX: l
UART TX: l
UART TX: o
UART TX:
UART TX: W
UART TX: o
UART TX: r
UART TX: l
UART TX: d
UART TX: !
UART TX:
UART TX:
led_red: 1 0
led_red: 0 1
...
led_green: 1
led_green: 0
led_blue: 1
led_red: 1 18
led_blue: 0
Saving checkpoint...
Note how the simulation starts with the CPU writing “Hello World!” to the UART of the design, and then goes through the LED toggle sequence 18 times before the checkpoint is saved.
Simulation Restore
./example_Og_clang9 5 restores the design from checkpoint.val and continues where the design was saved earlier:
ubuntu@ubuntu-xenial:~/projects/cxxrtl_eval/cxxrtl$ ./example_Og_clang9 5
Restoring from checkpoint...
Restore done...
All items:
_zz_ExampleTop_1_ : type = 0 ; width = 5 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 31
_zz_ExampleTop_2_ : type = 0 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 0
button : type = 0 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 0
clk_cpu : type = 0 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 0
clk_cpu_reset_ : type = 3 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 1
clk_cpu_reset_gen_reset_cntr : type = 1 ; width = 5 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 31
clk_cpu_reset_gen_reset_unbuffered_ : type = 0 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 0
clk_cpu_reset_gen_reset_unbuffered__regNext : type = 1 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 1
cpu_u_cpu _zz_CpuTop_1_ : type = 0 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 0
...
uart_rxd : type = 0 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 0
uart_txd : type = 3 ; width = 1 ; depth = 1 ; lsb_at = 0 ; zero_at = 0 ; value = 1
led_red: 1 0
led_red: 0 1
led_green: 1
...
led_red: 0 1482
led_green: 1
led_green: 0
led_blue: 1
UART TX:
UART TX:
UART TX: L
UART TX: E
UART TX: D
UART TX: s
UART TX:
UART TX: d
UART TX: o
UART TX: n
UART TX: e
UART TX: !
UART TX:
UART TX:
There are 2 important things to note here:
- The CPU does NOT print out “Hello World!”
- The LED toggle sequence happens 1482 times, not 1500
That’s because the simulation picked up where it was saved after 18 LED toggle sequences.
Adding Save/Restore to a CXXRTL Testbench
Through the design introspection feature of CXXRTL, you can get the simulation values of all value, wire, and memory objects of the design that link back to an original named Verilog object. The reverse is not necessarily true: depending on the CXXRTL optimization level, or on optimization steps that were performed by Yosys, named objects of the Verilog source code may not exist in the simulation model anymore.
To avoid race conditions, CXXRTL expects that values are set by a testbench after the clock has been simulated into a low state, and that values are read by the testbench after a high state has been simulated. I used the same convention when dumping and restoring the state.
While dumping state, I only save the contents of wire and memory objects. The simulation value of value objects can be derived by executing a simulation step.
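The clock convention described above can be illustrated with a toy stand-in. toy_design and run_cycle are hypothetical; a real CXXRTL model exposes its inputs as members such as top.p_osc__clk__in and is advanced with top.step(), as in the testbench code in this post. The sketch only shows the ordering: inputs change while the clock is low, and outputs (or the complete design state, for a checkpoint) are sampled after the high phase has been simulated.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy design standing in for a generated CXXRTL model: a
// counter that increments on every rising clock edge while 'enable' is set.
struct toy_design {
    bool clk = false;
    bool enable = false;    // an input driven by the testbench
    bool prev_clk = false;  // remembered clock level for edge detection
    uint32_t count = 0;     // state-holding register

    void step() {
        if (clk && !prev_clk && enable) ++count;  // rising edge
        prev_clk = clk;
    }
};

// One full clock cycle following the convention: drive inputs after the low
// phase, sample (or save) after the high phase.
template <typename Design, typename Drive, typename Sample>
void run_cycle(Design &top, Drive drive_inputs, Sample sample_outputs) {
    top.clk = false;
    top.step();
    drive_inputs(top);    // no clock edge is being evaluated here
    top.clk = true;
    top.step();           // the rising edge updates the registers
    sample_outputs(top);  // stable values: a good moment for a checkpoint
}
```

In the article’s testbench, the save_state call sits at the equivalent of the sample_outputs step.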
The full process is as follows:
- Prepare the design for introspection
This is no different than preparing the design for VCD waveform dumping, which I covered in my earlier blog post.
cxxrtl_design::p_ExampleTop top;
cxxrtl::debug_items all_debug_items;
top.debug_info(all_debug_items);
- Dump the design state when the simulation hits 10000 clock cycles
In the main testbench, I call the save_state function:
if (dump_level == 4 && i==10000){
    cout << "Saving checkpoint..." << endl;
    std::ofstream checkpoint("checkpoint.val");
    save_state(all_debug_items, checkpoint);
    exit(0);
}
This code is called after the design has been simulated with the clock set to 1:
top.p_osc__clk__in.set<bool>(true);
top.step();
By default, save_state only stores the value of wire and memory objects:
void save_state(cxxrtl::debug_items &items, std::ofstream &save_file, uint32_t types = (CXXRTL_WIRE | CXXRTL_MEMORY));
The implementation is straightforward but naive:
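void save_state(cxxrtl::debug_items &items, std::ofstream &save_file, uint32_t types)
{
    // Number of items, so that restore_state knows how many names to expect.
    save_file << items.table.size() << '\n';
    for(auto &it : items.table){
        // The name is written for every item, even when none of its parts match 'types'.
        save_file << it.first << '\n';
        for(auto &part: it.second){
            // The CXXRTL type codes are enum values (value = 0, wire = 1, memory = 2,
            // alias = 3), not bit flags, so aliases also pass this test. Because
            // restore_state uses the same test, save and restore stay in sync.
            if (part.type & types){
                uint32_t *mem_data = part.curr;
                // Treat every item as a memory: depth locations of ceil(width / 32) words each.
                for(size_t a=0;a<part.depth;++a){
                    for(size_t n=0;n<part.width;n+=32){
                        save_file << *mem_data << '\n';  // '\n' instead of endl: no flush per value
                        ++mem_data;
                    }
                }
            }
        }
    }
    // Flush once at the end, because the caller may exit() right after this call.
    save_file.flush();
}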
Note how it saves the name (it.first) of all simulation objects, even when they don’t match the requested types argument. In practice, this means that it dumps the name of all value objects as well, but not the associated data. This is a place where the code can be improved… But more importantly, notice how easy it is to fetch and save the simulation values of the requested simulation objects.
- Restore the design state at the start of a simulation
The testbench calls the restore_state function for this:
if (dump_level == 5){
    cout << "Restoring from checkpoint..." << endl;
    std::ifstream checkpoint("checkpoint.val");
    restore_state(all_debug_items, checkpoint);
    cout << "Restore done..." << endl;
    dump_all_items(all_debug_items);
}
restore_state is just as straightforward as save_state:
void restore_state(cxxrtl::debug_items &items, std::ifstream &restore_file, uint32_t types)
{
    size_t size;
    restore_file >> size;
    for(size_t i=0;i<size;++i){
        std::string name;
        uint32_t value;
        // Skip the newline left behind by the previous '>>'. Without this, getline()
        // returns an empty string and the names get out of step with the file.
        restore_file >> std::ws;
        std::getline(restore_file,name);
        vector<cxxrtl::debug_item> &item_parts = items.table[name];
        for(auto &part: item_parts){
            // Same type test as in save_state, so the same items are read back.
            if (part.type & types){
                uint32_t *mem_data = part.curr;
                for(size_t a=0;a<part.depth;++a){
                    for(size_t n=0;n<part.width;n+=32){
                        restore_file >> value;
                        *mem_data = value;
                        ++mem_data;
                    }
                }
            }
        }
    }
}
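After restoring the state, it’s important to run a simulation step with the clock set to 1, so that the value objects, which aren’t part of the checkpoint, are recomputed from the restored state.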
And that’s all there is to it!
Design Introspection to Capture the UART TX Writes
To better illustrate that save/restore actually worked, the testbench captures writes to the TX register of a SpinalHDL UART that is connected to the CPU through a standard APB3 bus.
The individual signals are referenced as follows:
cxxrtl::debug_item psel = all_debug_items.at("cpu_u_cpu u_uart io_apb_PSEL");
cxxrtl::debug_item penable = all_debug_items.at("cpu_u_cpu u_uart io_apb_PENABLE");
cxxrtl::debug_item pwrite = all_debug_items.at("cpu_u_cpu u_uart io_apb_PWRITE");
cxxrtl::debug_item pwdata = all_debug_items.at("cpu_u_cpu u_uart io_apb_PWDATA");
cxxrtl::debug_item paddr = all_debug_items.at("cpu_u_cpu u_uart io_apb_PADDR");
The testbench intercepts writes to the UART TX register at address 0 and prints out the transmitted character:
if (debug_item_get_value32(psel) &&
debug_item_get_value32(penable) &&
debug_item_get_value32(pwrite) &&
debug_item_get_value32(paddr) == 0
){
// APB write to UART RXTX register
cout << "UART TX: " << (char)debug_item_get_value32(pwdata) << endl;
}
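The debug_item_get_value32 helper isn’t defined in this post. A minimal sketch of what such a helper might do, using a hypothetical item_ref stand-in with only the width and curr fields, is to return the first uint32_t word of an item that is at most 32 bits wide; the APB check above then becomes a small predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in holding just the two debug_item fields used here.
struct item_ref {
    size_t width;          // number of bits
    const uint32_t *curr;  // current value, LSB in bit 0 of curr[0]
};

// Sketch of a 32-bit getter: only valid for items of 1 to 32 bits.
inline uint32_t get_value32(const item_ref &item) {
    assert(item.width >= 1 && item.width <= 32);
    const uint32_t mask = item.width == 32 ? 0xffffffffu : (1u << item.width) - 1u;
    return item.curr[0] & mask;
}

// True during the access phase of an APB write to register address 0.
inline bool is_uart_tx_write(const item_ref &psel, const item_ref &penable,
                             const item_ref &pwrite, const item_ref &paddr) {
    return get_value32(psel) && get_value32(penable) &&
           get_value32(pwrite) && get_value32(paddr) == 0;
}
```

The character itself is then (char)get_value32(pwdata), as in the testbench code above.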
More Complex Designs and Potential Improvements
While non-trivial, the example is only a proof of concept that illustrates the basics. It doesn’t deal with complexities that can make save/restore operations a lot harder.
Asynchronous clock domains
A design with multiple, asynchronous clock domains will require careful timing of when to capture the data to avoid mismatches. I haven’t tried this out myself.
Taking care of external state
More fundamentally, testbenches that have their own state that influences the design under simulation will need to either save/restore their state as well, or accept that a restored design might not simulate exactly the same way as the design would have simulated if it hadn’t been interrupted. This doesn’t have to be a problem, but it’s something to be aware of.
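In the example, the testbench’s own counters appear to restart from 0 after a restore: compare the led_red counter at the end of the save run (18) with the start of the restore run (0). A minimal sketch of saving such testbench state alongside the design state, assuming a hypothetical tb_state struct and a marker line appended after the design data, could look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical testbench-side state that the design checkpoint doesn't cover.
struct tb_state {
    uint64_t cycle = 0;          // loop counter of the testbench
    uint32_t led_sequences = 0;  // LED toggle sequences seen so far
};

// Appends the testbench state behind a marker line.
void save_tb_state(std::ostream &os, const tb_state &s) {
    os << "tb_state\n" << s.cycle << '\n' << s.led_sequences << '\n';
}

// Reads it back; returns false (leaving 's' untouched) if the marker or
// the values are missing.
bool restore_tb_state(std::istream &is, tb_state &s) {
    std::string marker;
    is >> std::ws;  // skip a newline left by a previous '>>'
    std::getline(is, marker);
    if (marker != "tb_state") return false;
    tb_state tmp;
    if (!(is >> tmp.cycle >> tmp.led_sequences)) return false;
    s = tmp;
    return true;
}
```

Writing this after the design data in the same file keeps one checkpoint file per snapshot; a std::stringstream works just as well for experimenting.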
Dealing with changed design
The current code expects that the CXXRTL simulation model remains the same between save and restore. A change to a single register is enough to make it fail horribly, hopefully with a core dump.
A robust system should deal with these cases gracefully. It could simply issue a fatal but informative error. Or it could even decide to just warn and continue, for the third use case in my introduction, where the RTL has changed between simulations but the changed RTL was held in reset at the time of the checkpoint.
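A minimal sketch of the "fatal, informative error" option, using a hypothetical stand-in map instead of cxxrtl::debug_items: the checkpoint records each item’s width and depth next to its name, and the restore works on a copy so that a mismatch leaves the design untouched. save_checked, restore_checked, stored_item and design_state are all illustrative names, not CXXRTL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one state-holding item: its shape plus its data,
// stored as depth * ceil(width / 32) uint32_t chunks.
struct stored_item {
    size_t width = 0;
    size_t depth = 1;
    std::vector<uint32_t> chunks;
};
using design_state = std::map<std::string, stored_item>;

// Like save_state, but also records width and depth next to each name.
void save_checked(std::ostream &os, const design_state &state) {
    os << state.size() << '\n';
    for (const auto &entry : state) {
        const stored_item &item = entry.second;
        os << entry.first << '\n' << item.width << ' ' << item.depth << '\n';
        for (uint32_t c : item.chunks) os << c << '\n';
    }
}

// Returns "" on success, otherwise a description of the first problem found.
std::string restore_checked(std::istream &is, design_state &state) {
    design_state staged = state;  // only committed if everything matches
    size_t count = 0;
    if (!(is >> count)) return "cannot read item count";
    if (count != staged.size())
        return "checkpoint has " + std::to_string(count) + " items, design has " +
               std::to_string(staged.size());
    for (size_t i = 0; i < count; ++i) {
        std::string name;
        is >> std::ws;  // skip the newline left by the previous '>>'
        std::getline(is, name);
        auto found = staged.find(name);
        if (found == staged.end()) return "not in design: " + name;
        size_t width = 0, depth = 0;
        if (!(is >> width >> depth)) return "cannot read shape of " + name;
        stored_item &item = found->second;
        if (width != item.width || depth != item.depth) return "shape mismatch for " + name;
        for (uint32_t &c : item.chunks)
            if (!(is >> c)) return "truncated data for " + name;
    }
    state = std::move(staged);
    return "";
}
```

The "warn and continue" variant would record the problem and skip the item instead of returning, which requires the data of a skipped item to be skippable: that is one more reason to store the shape next to the name.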
Optimized checkpoint file format
The save_state routine is very inefficient, since it just dumps all the hierarchical names in full, as well as the data itself, as ASCII strings. This can probably be optimized by 2 orders of magnitude!
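As an illustration of a more compact format, the sketch below writes every uint32_t chunk as 4 raw bytes (instead of up to 10 decimal digits plus a newline) and replaces the per-item names with a single 64-bit FNV-1a fingerprint of all names and chunk counts, which is used to detect a changed design. raw_item, design_fingerprint, save_binary and restore_binary are hypothetical. The file is only readable on a machine with the same byte order, and save and restore must visit the items in the same order; that should hold for an unchanged design if, as assumed here, the item table is a name-sorted std::map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in: one state-holding item, as a name plus its chunks.
struct raw_item {
    std::string name;
    std::vector<uint32_t> chunks;
};

// 64-bit FNV-1a over all names (including the terminating '\0') and chunk counts.
uint64_t design_fingerprint(const std::vector<raw_item> &items) {
    uint64_t h = 14695981039346656037ull;  // FNV offset basis
    auto mix = [&h](const void *p, size_t n) {
        const unsigned char *b = static_cast<const unsigned char *>(p);
        for (size_t i = 0; i < n; ++i) { h ^= b[i]; h *= 1099511628211ull; }
    };
    for (const raw_item &it : items) {
        mix(it.name.c_str(), it.name.size() + 1);
        const uint64_t n = it.chunks.size();
        mix(&n, sizeof n);  // native byte order
    }
    return h;
}

// Fingerprint first, then all chunks back to back, without names or separators.
void save_binary(std::ostream &os, const std::vector<raw_item> &items) {
    const uint64_t fp = design_fingerprint(items);
    os.write(reinterpret_cast<const char *>(&fp), sizeof fp);
    for (const raw_item &it : items)
        os.write(reinterpret_cast<const char *>(it.chunks.data()),
                 it.chunks.size() * sizeof(uint32_t));
}

// Returns false if the design changed or the file is truncated.
bool restore_binary(std::istream &is, std::vector<raw_item> &items) {
    uint64_t fp = 0;
    if (!is.read(reinterpret_cast<char *>(&fp), sizeof fp) || fp != design_fingerprint(items))
        return false;
    for (raw_item &it : items)
        if (!is.read(reinterpret_cast<char *>(it.chunks.data()),
                     it.chunks.size() * sizeof(uint32_t)))
            return false;
    return true;
}
```

For two items with 5 chunks in total, the file is 8 + 5 × 4 = 28 bytes. Unlike the checked text format, a fingerprint mismatch only says that something changed, not what.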
Conclusion
CXXRTL makes it easy to save and restore a simulation. It’s not something you’ll need often, but one day you might run into a use case where it saves a major amount of simulation time.
References
My other blog posts about CXXRTL: