



# Software Simulation Technologies in Virtual Platforms





#### Context

- HW/SW Embedded Systems Design Flow
  - HW/SW Simulation
  - Performance Analysis
    - Avoiding slow design iterations
  - Design Verification
    - At the different abstraction levels





## Agenda

- Motivation: Why SW performance analysis
- Software Simulation Technologies in Virtual Platforms
  - SCoPE: SW performance analysis for DSE
- Conclusions



### Motivation

- The MPSoC
  - Multi-processing platform
    - ASIC
    - FPGA
    - Commercial multi-processing platform
  - SW-centric design methodology
    - Most of the functionality implemented as Embedded SW
    - With 'some' application-specific HW



### **Motivation**

- Computing needs Time
  - Edward A. Lee
  - Communications of the ACM, 52(5):70-79, May 2009

- Computing needs Energy
  - Eugenio Villar
  - Still to be published





#### **Motivation**

- Embedded SW performance analysis
  - SW performance analysis based on SW simulation
    - At any abstraction level
  - As an integral part of the MPSoC simulation
  - Essential for MPSoC verification
    - At any abstraction level
  - Essential for DSE
    - During architectural design











HDL simulation







ISS simulation







Virtualization







- Virtualization (QEMU)
  - Detailed model
    - High modeling cost
    - Late design steps
  - Faster than ISS

#### PowerPC (200 MHz)

- `addi r1,r1,-16` (r1 = r1 - 16)

#### Intel Core i5 (2.40 GHz)

- `add $0xfffffff0,%ebx` (`addl_T0_im -16`: ebx = ebx - 16)
- `mov %ebx,0x4(%ebp)` (`movl_r1_T0`: env->regs[1] = ebx)





- Virtualization
  - Functional emulation
  - Rough timed simulation
    - i.e. 1 cycle per instruction
  - Additional effort needed for more accurate modeling
    - Execution times
    - Power consumption
    - Caches
    - ...
  - Requires a specific Virtual Model for each processor
- Commercial tools
  - OVP, FastModels, Cadence, Carbon, Synopsys (CoWare), etc.
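The "rough timed simulation" bullet can be illustrated with a minimal sketch. It assumes an emulator that only counts executed target instructions and converts the count to time with a fixed cycles-per-instruction factor. The function name and its parameters are hypothetical and do not belong to QEMU or to any of the tools listed above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rough timing from an instruction count.
 * cpi      = assumed fixed cycles per instruction (1 in the slide's example)
 * freq_mhz = target clock frequency in MHz, must be nonzero
 */
uint64_t rough_time_ns(uint64_t instr_count, uint32_t cpi, uint32_t freq_mhz)
{
    /* cycles = instructions * CPI; one cycle lasts 1000 / freq_mhz ns */
    uint64_t cycles = instr_count * cpi;
    return cycles * 1000u / freq_mhz;
}
```

With a 200 MHz target and 1 cycle per instruction, 200 instructions map to 1000 ns of simulated time. Cache misses, pipeline stalls, and power are invisible to such a model, which is why the additional modeling effort mentioned above is needed.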





- Native simulation
  - Embedded code directly executed by the host
  - Good accuracy by back-annotation
  - Fast execution time





Native simulation based on HAL API

Virtual Model







Native simulation based on OS API

Virtual Model







Basic code annotation in native simulation

```
int Sim_Time = 0;                    /* global variable: accumulated simulated time */

Overflow = 0;
s = 1L;
Sim_Time += 20;                      /* cost of the first basic block */
for (i = 0; i < L_subfr; i++) {
    Carry = 0;
    s = L_macNs(s, xn[i], y1[i]);
    Sim_Time += 25;                  /* cost of the loop body */
    if (Overflow != 0) {
        Sim_Time += 15;
        break;
    }
}
if (Overflow == 0) {
    exp_xy = norm_l(s);
    Sim_Time += 10;
    if (exp_xy <= 0) {
        xy = round(L_shr(s, -exp_xy));
        Sim_Time += 10;
    } else {
        xy = round(L_shl(s, exp_xy));
        Sim_Time += 10;
    }
}
mq_send(queue1, &xy, p, t);          /* OS call: wait included */
```
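The annotation scheme above can be tried out with a small self-contained sketch. It assumes that each basic block adds a precomputed cost to a global counter and that the accumulated time is handed over to the simulation kernel at an OS call (the "wait included" note). The per-block costs, `os_call_sync`, and `dot_annotated` are hypothetical names and values chosen for illustration. They are not SCoPE's API.

```c
#include <stdint.h>

/* Global simulated-time counter, in target cycles. */
static uint64_t Sim_Time = 0;

/* Time already handed over to the (here imaginary) simulation kernel. */
static uint64_t synced_time = 0;

/* Hypothetical stand-in for an OS call that includes the wait. */
static void os_call_sync(void)
{
    synced_time += Sim_Time;   /* a real model would advance simulated time here */
    Sim_Time = 0;
}

/* Dot product with back-annotated, made-up basic-block costs. */
int32_t dot_annotated(const int16_t *x, const int16_t *y, int n)
{
    int32_t s = 0;
    Sim_Time += 20;                      /* entry block */
    for (int i = 0; i < n; i++) {
        s += (int32_t)x[i] * y[i];
        Sim_Time += 25;                  /* loop body */
    }
    Sim_Time += 10;                      /* exit block */
    os_call_sync();                      /* synchronize at the OS call */
    return s;
}
```

For a 3-element input the model accumulates 20 + 3 * 25 + 10 = 105 cycles, while the functional code runs at native host speed.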





Functional simulation based on code

Virtual Model







- Power estimation based on traces
  - Accurate but slow

#### Virtual Model







Power estimation based on back-annotation

[Figure: Virtual Model based on Native Simulation. Application code (Task 1 ... Task n) runs on the OS API and HdS API; the HdS and an abstract model of the OS & CPU connect through a TLM bus model to DMA, NoC interface, application-specific HW, and memory; a NoC model links the nodes.]

- Same technique as with execution times
  - Global variable `int Sim_Energy = 0;`
- Best ratio accuracy/speed
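Since energy is annotated with the same technique as execution time, one annotation point can update both counters. The following is a minimal sketch under that assumption. The `ANNOTATE` macro, the picojoule unit, and all cost values are hypothetical, and the counters use 64-bit types rather than the `int` shown on the slide.

```c
#include <stdint.h>

static uint64_t Sim_Time = 0;     /* target cycles */
static uint64_t Sim_Energy = 0;   /* picojoules (assumed unit) */

/* Hypothetical annotation: one call per basic block. */
#define ANNOTATE(cycles, pj)      \
    do {                          \
        Sim_Time += (cycles);     \
        Sim_Energy += (pj);       \
    } while (0)

/* Sum of a vector with made-up per-block time and energy costs. */
int32_t sum_annotated(const int16_t *v, int n)
{
    int32_t acc = 0;
    ANNOTATE(20, 400);            /* entry block */
    for (int i = 0; i < n; i++) {
        acc += v[i];
        ANNOTATE(5, 90);          /* loop body */
    }
    return acc;
}
```

Keeping both counters behind one macro means the time and energy estimates always refer to the same executed path.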





Performance/Error comparison

| Technology           | Metric      | Time estimation    | Time & power estimation |
|----------------------|-------------|--------------------|-------------------------|
| Functional           | Performance | 5,000              | N.A.                    |
|                      | Error       | N.A.               | N.A.                    |
| Native               | Performance | 1,000              | 500                     |
|                      | Error       | 1.3                | 1.4                     |
| Virtualization       | Performance | 200                | T.B.M.                  |
|                      | Error       | 1.5                | T.B.M.                  |
| ISS (cycle-accurate) | Performance | 10                 | 1                       |
|                      | Error       | 1.1 (DT)           | 1.1                     |
| HDL                  | Performance | 1                  | 0.1                     |
|                      | Error       | 1 (DE)             | 1                       |

Rough approximate figures





- Key features
  - Abstract OS modeling
  - Instruction cache modeling
  - Data cache modeling
  - System power estimation
- Novel features
  - Physical memory accesses
  - Separate memory spaces
  - Configurability for Design-Space Exploration
  - Dynamic Voltage-Frequency Scaling
  - Thermal modeling
  - System composition from IP-XACT components
  - Win32 API








#### **SCoPE: SW Performance Estimation for DSE**

- Instruction cache modeling
  - Similar to time modeling

```
struct icache_line { char num_set; char hit; };

static struct icache_line line_124 = {0};
static struct icache_line line_125 = {0};
static struct icache_line line_126 = {0};

/* Preceding code omitted. On a miss, the line is inserted into the modeled cache. */
if (line_124.hit == 0) insert_line(&line_124);
if (line_125.hit == 0) insert_line(&line_125);
if (exp_xy <= 0)
    xy = round(L_shr(s, -exp_xy));
else
    xy = round(L_shl(s, exp_xy));
if (line_126.hit == 0) insert_line(&line_126);
```
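What `insert_line` might do is not shown on the slide. The sketch below assumes a tiny direct-mapped instruction cache: each set remembers the line currently resident, and inserting a new line clears the `hit` flag of the line it evicts. `NUM_SETS`, `resident`, `icache_misses`, `run_block`, and the set assignments are hypothetical.

```c
#include <stddef.h>

#define NUM_SETS 2   /* assumed, deliberately tiny */

struct icache_line { char num_set; char hit; };

/* Line currently resident in each set (NULL = empty). */
static struct icache_line *resident[NUM_SETS];
static unsigned long icache_misses = 0;

/* Called on a miss: evict the old line of the set, install the new one. */
static void insert_line(struct icache_line *line)
{
    unsigned set = (unsigned char)line->num_set;
    if (resident[set] != NULL)
        resident[set]->hit = 0;      /* evicted line will miss next time */
    resident[set] = line;
    line->hit = 1;
    icache_misses++;
}

static struct icache_line line_124 = {0, 0};
static struct icache_line line_125 = {1, 0};
static struct icache_line line_126 = {0, 0};   /* same set as line_124 */

/* Annotations for a code region spanning three instruction-cache lines. */
void run_block(void)
{
    if (line_124.hit == 0) insert_line(&line_124);
    if (line_125.hit == 0) insert_line(&line_125);
    if (line_126.hit == 0) insert_line(&line_126);
}
```

Because `line_124` and `line_126` share a set in this configuration, they keep evicting each other, while `line_125` misses only once. The common case (a hit) costs a single comparison on the host, which keeps the overhead close to that of plain time annotation.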





- Data cache modeling
  - Use modified native addresses to get data variable addresses
  - Global array with all memory line status
```
/* L2 modeling */
bool cache[dcache_size / line_size];
```

```
Overflow = 0;
s = 1L;
if (cache[GET_TAG(&Overflow)] == 0) insert_line(GET_TAG(&Overflow));
if (cache[GET_TAG(&s)] == 0)        insert_line(GET_TAG(&s));
for (i = 0; i < L_subfr; i++) {
    Carry = 0;
    s = L_macNs(s, xn[i], y1[i]);
    if (cache[GET_TAG(&Carry)] == 0) insert_line(GET_TAG(&Carry));
    if (cache[GET_TAG(&i)] == 0)     insert_line(GET_TAG(&i));
    if (cache[GET_TAG(&xn[i])] == 0) insert_line(GET_TAG(&xn[i]));
    /* ... */
    if (Overflow != 0) {
        break;
    }
}
if (Overflow == 0) {
    exp_xy = norm_l(s);
    if (cache[GET_TAG(&exp_xy)] == 0) insert_line(GET_TAG(&exp_xy));
    if (exp_xy <= 0)
        xy = round(L_shr(s, -exp_xy));
    else
        xy = round(L_shl(s, exp_xy));
}
```
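The data-cache annotations can be exercised with a small sketch. It assumes that `GET_TAG` maps a host address to a line index and that the global array holds one presence bit per line. The sizes, the `DCACHE_ACCESS` macro, and the miss counter are hypothetical. This simplified version never evicts a line and uses raw host addresses, whereas the slide mentions modified native addresses.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE   32u                       /* bytes per line (assumed) */
#define DCACHE_SIZE 1024u                     /* bytes in the cache (assumed) */
#define NUM_LINES   (DCACHE_SIZE / LINE_SIZE)

/* Global array with the status of every line. */
static bool cache[NUM_LINES];
static unsigned long dcache_misses = 0;

/* Hypothetical mapping from a host address to a line index. */
#define GET_TAG(addr) ((unsigned)(((uintptr_t)(addr) / LINE_SIZE) % NUM_LINES))

static void insert_line(unsigned idx)
{
    cache[idx] = true;
    dcache_misses++;
}

/* One annotation per data access. */
#define DCACHE_ACCESS(addr)                   \
    do {                                      \
        unsigned idx_ = GET_TAG(addr);        \
        if (!cache[idx_]) insert_line(idx_);  \
    } while (0)

/* 32 halfwords = 64 bytes = exactly two lines when aligned to LINE_SIZE. */
static _Alignas(32) int16_t xn[32];

int32_t sum_xn(void)
{
    int32_t s = 0;
    for (int i = 0; i < 32; i++) {
        DCACHE_ACCESS(&xn[i]);
        s += xn[i];
    }
    return s;
}
```

The first call to `sum_xn` records two misses, one per line touched, and later calls record none. Only the accesses to `xn` are annotated here to keep the sketch short.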





- System power estimation
  - Application code
    - Instruction counting from binary
  - OS & HW-dependent SW
    - Function power estimation
  - Caches
    - Counting memory accesses
    - Cache misses
  - Bus
    - Actual bandwidth
      - Cache misses
      - DMA accesses
      - HW accesses
  - HW & NoC
    - SystemC power models
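One way to read the breakdown above is as a sum of event counts weighted by per-event energy costs. The sketch below assumes such a linear model. The struct layouts, field names, and the picojoule unit are hypothetical, and the OS and HW/NoC contributions are left out for brevity.

```c
#include <stdint.h>

/* Event counts collected during simulation (hypothetical layout). */
struct activity {
    uint64_t instructions;     /* counted from the application binary */
    uint64_t cache_accesses;
    uint64_t cache_misses;
    uint64_t dma_accesses;
    uint64_t hw_accesses;
};

/* Assumed per-event energy costs, in picojoules. */
struct energy_cost_pj {
    uint32_t per_instruction;
    uint32_t per_cache_access;
    uint32_t per_cache_miss;
    uint32_t per_bus_transfer;
};

uint64_t system_energy_pj(const struct activity *a,
                          const struct energy_cost_pj *c)
{
    /* Bus traffic is driven by cache misses, DMA accesses and HW accesses. */
    uint64_t bus_transfers = a->cache_misses + a->dma_accesses + a->hw_accesses;

    return a->instructions   * c->per_instruction
         + a->cache_accesses * c->per_cache_access
         + a->cache_misses   * c->per_cache_miss
         + bus_transfers     * c->per_bus_transfer;
}
```

For example, 1000 instructions, 300 cache accesses, 10 misses, 1 DMA access, and 1 HW access with costs of 50, 20, 200, and 400 pJ give 62800 pJ.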









- Design-Space Exploration
  - Configurable model









Dynamic Voltage-Frequency Scaling









## SCoPE+: Improvements from Complex

Performance estimation before partitioning





#### Conclusions

- SW simulation and performance analysis
  - Essential Design Technology
  - HW/SW Embedded Systems
  - At different design steps
    - Different modeling and simulation technologies
    - Various performance\*accuracy products

#### SCoPE

- SystemC Native Co-Simulation Technology
- Specially tuned to performance analysis
  - Design-Space Exploration





# Thank you for your attention

- Slides available at:
  - www.teisa.unican.es/en/publicaciones
- Open-source SCoPE available at:
  - www.teisa.unican.es/scope