Transient Fault Protection (TFP) for Application Processor
Introduction
The Cortex®‑A720AE core used in the Application Processor supports the Transient Fault Protection (TFP) feature. TFP improves reliability by adding logic that checks the integrity of flip-flops in the functional (non-debug) logic. The mechanism is designed to detect a single transient fault affecting a group of functional flip-flops. In safety-critical applications, it significantly increases the transient fault detection capability of the core and can be a key contributor to meeting the Single Point Fault Metric (SPFM) (transient) and Safe Failure Fraction (SFF) (transient) goals at the core level.
Transient Fault Detection Mechanism
- When TFP is enabled in hardware, additional logic is instantiated to calculate the parity of each group of functional flops that share a common clock, reset, and enable term. 
- The parity information is stored in an additional flip-flop, called the parity-flop. 
- The output of this flop is checked against the parity of the data stored in the associated group of functional flops. A difference between the two indicates that a fault has occurred in the functional flops or in the parity-flop itself. 
- The error signals from the parity groups are combined per functional unit using a logical OR reduction and routed to the RAS registers for error reporting and signaling. 
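The detection path described in this list can be illustrated with a small behavioral model. The C sketch below is not the hardware implementation: it assumes a parity group of at most 64 flops, and the names `parity_group`, `group_write`, `group_error` and `unit_error` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Behavioral model only: one parity group of up to 64 functional flops
 * sharing clock, reset and enable, plus its parity-flop. */
typedef struct {
    uint64_t flops;        /* functional flip-flops in the group */
    unsigned parity_flop;  /* extra flop holding the stored parity */
} parity_group;

/* XOR of all bits in v (1 if the number of set bits is odd). */
static unsigned parity64(uint64_t v)
{
    unsigned p = 0;
    while (v) {
        p ^= 1u;
        v &= v - 1;  /* clear the lowest set bit */
    }
    return p;
}

/* On an enabled clock edge, data and its parity are captured together. */
static void group_write(parity_group *g, uint64_t data)
{
    g->flops = data;
    g->parity_flop = parity64(data);
}

/* Mismatch: a functional flop or the parity-flop itself has flipped. */
static bool group_error(const parity_group *g)
{
    return parity64(g->flops) != g->parity_flop;
}

/* Per functional unit, the group error signals are OR-reduced into the
 * single indication that is forwarded to the RAS error record. */
static bool unit_error(const parity_group *groups, size_t n)
{
    bool err = false;
    for (size_t i = 0; i < n; i++)
        err = err || group_error(&groups[i]);
    return err;
}
```

For example, after `group_write` on every group, `unit_error` returns false; flipping one bit in any group's `flops` or its `parity_flop` makes it return true.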
Fig. 40 Transient Fault Protection Mechanism
Fault Detection Constraints
- The flop parity mechanism is capable of detecting a single transient fault within a parity group. 
- A fault that causes an even number of bit-flips within a parity group cannot be detected by the transient fault protection logic. 
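The even bit-flip limitation follows from how parity works: every upset flop, whether functional or the parity-flop, inverts the result of the comparison once, so pairs of upsets cancel. A minimal sketch, assuming a hypothetical `flip_mask` with one bit per flop in the group (including the parity-flop) and a hypothetical helper name `tfp_detects`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the pattern of upsets in flip_mask is visible to the
 * parity check of a single group. */
static bool tfp_detects(uint64_t flip_mask)
{
    unsigned flips = 0;

    /* Count upset flops (clear the lowest set bit on each pass). */
    for (; flip_mask; flip_mask &= flip_mask - 1)
        flips++;

    /* Each upset toggles the comparison once; an even count cancels out. */
    return (flips % 2u) == 1u;
}
```

For example, a mask of `0x1` (one upset) is detected, while `0x3` (two upsets in the same group) goes unnoticed.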
Fault Reaction
- Errors detected by the transient fault protection logic cannot be contained, and the mechanism provides no specific features for hardware recovery. 
- Errors detected by the flop parity mechanism are reported in the ERXSTATUS_EL1 register.
- The detected errors are reported as Uncorrected Errors of type Uncontainable: 
| Register Bit | Value | Description | 
|---|---|---|
|  | 1'b1 | Uncorrected Error | 
|  | 2'b00 | Uncontainable Type | 
- Additional diagnostic information is provided by the IERR field of the ERXSTATUS_EL1 register. The IERR code indicates which TFP chunk (functional unit) detected the parity error. The IERR codes for the Cortex®‑A720AE core are as follows:
| IERR Code | Affected Protection Unit | 
|---|---|
| 0b00100 | Data side (Dside) | 
| 0b00101 | Vector Unit (VX) | 
| 0b00110 | Memory Management Unit (MMU) | 
| 0b00111 | Level 2 Cache | 
| 0b01000 | GIC CPU Interface (INTC) | 
| 0b01001 | Debug Trace | 
| 0b01010 | Instruction side (Iside) | 
| 0b01011 | Decode | 
| 0b01100 | Rename | 
| 0b01101 | Commit | 
| 0b01110 | Issue | 
| 0b01111 | Iexecute | 
| 0b10000 | Axis Bridge | 
Note
The IERR field is valid only when ERXSTATUS_EL1.V is 0b1 and ERXSTATUS_EL1.SERR is 0x1A. In all other cases, the field is reported as UNKNOWN.
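Decoding a status value against the two tables and the validity rule can be sketched in C. The tables above do not give bit positions, so this sketch assumes the Arm RAS architecture layout of ERXSTATUS_EL1 (V at bit 30, UE at bit 29, UET at bits [21:20], IERR at bits [15:8], SERR at bits [7:0]); confirm these against the Cortex®‑A720AE TRM. The names `is_tfp_error` and `tfp_unit` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field positions (Arm RAS ERR<n>STATUS layout). */
#define ERXSTATUS_V        (UINT64_C(1) << 30)
#define ERXSTATUS_UE       (UINT64_C(1) << 29)
#define ERXSTATUS_UET(s)   ((unsigned)(((s) >> 20) & 0x3u))
#define ERXSTATUS_IERR(s)  ((unsigned)(((s) >> 8) & 0xFFu))
#define ERXSTATUS_SERR(s)  ((unsigned)((s) & 0xFFu))

/* Protection units indexed by IERR code, copied from the IERR table. */
static const char *const tfp_units[] = {
    [0x04] = "Data side (Dside)",
    [0x05] = "Vector Unit (VX)",
    [0x06] = "Memory Management Unit (MMU)",
    [0x07] = "Level 2 Cache",
    [0x08] = "GIC CPU Interface (INTC)",
    [0x09] = "Debug Trace",
    [0x0A] = "Instruction side (Iside)",
    [0x0B] = "Decode",
    [0x0C] = "Rename",
    [0x0D] = "Commit",
    [0x0E] = "Issue",
    [0x0F] = "Iexecute",
    [0x10] = "Axis Bridge",
};

/* Valid record, Uncorrected (UE = 1), Uncontainable (UET = 0b00), and
 * SERR = 0x1A, the only case in which IERR is meaningful. */
static bool is_tfp_error(uint64_t s)
{
    return (s & ERXSTATUS_V) && (s & ERXSTATUS_UE) &&
           ERXSTATUS_UET(s) == 0u && ERXSTATUS_SERR(s) == 0x1Au;
}

/* Protection unit for a TFP record, or NULL if not a TFP error or if
 * the IERR code is not listed in the table. */
static const char *tfp_unit(uint64_t s)
{
    unsigned ierr = ERXSTATUS_IERR(s);

    if (!is_tfp_error(s) || ierr >= sizeof tfp_units / sizeof tfp_units[0])
        return NULL;
    return tfp_units[ierr];  /* entries 0x00..0x03 are NULL */
}
```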
Implementation in Software
The software implementation of the TFP feature comprises the following elements:
Enabling TFP
To enable detection and reporting of errors by the transient fault protection mechanism, software sets the following fields in the RAS registers:
| Register | Bit | Description | 
|---|---|---|
|  | 0 | ED: Enable error detection and reporting globally | 
|  | 33 | TFPEN: Enable TFP error detection and reporting | 
Enabling TFP error reporting is recommended in the Mixed-Configuration Hybrid mode, which is typically used in line with the Aspen specifications and in which all cores operate in Hybrid split mode.
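The enable step amounts to a read-modify-write that sets two bits. The table does not name the register, so this sketch assumes it is the control register of the selected error record (ERXCTLR_EL1); `RAS_CTLR_ED`, `RAS_CTLR_TFPEN` and `tfp_enable_value` are hypothetical names. Only the value computation is shown so that it runs on a host.

```c
#include <stdint.h>

/* Bit positions from the enable table; the register is assumed. */
#define RAS_CTLR_ED    (UINT64_C(1) << 0)   /* ED: global error detection and reporting */
#define RAS_CTLR_TFPEN (UINT64_C(1) << 33)  /* TFPEN: TFP error detection and reporting */

/* Keep all other control bits and set ED and TFPEN. On target, the
 * result would be written back to the (assumed) control register. */
static uint64_t tfp_enable_value(uint64_t ctlr)
{
    return ctlr | RAS_CTLR_ED | RAS_CTLR_TFPEN;
}
```

A read-modify-write is used instead of a plain write so that other enable bits already configured in the record are preserved; the operation is idempotent.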
Error Handling
When a transient fault is detected by the flop parity mechanism:
- The RAS error record is updated. 
- A fault handling interrupt (FHI) is raised. 
- In the TF-A RAS error handler, the ERXSTATUS_EL1 register is examined to confirm a transient fault. The IERR field identifies the source of the fault, which is reported in a debug print, for example:
WARNING: CPU RAS: TFP Error Detected : AXIS_BRIDGE
- Similar processing is implemented in the scp-firmware running on SI-CL0, as described in Safety Island error processing, where a diagnostic message is logged, for example: 
AP detected TFP Error : AXIS_BRIDGE
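The handler step that produces the TF-A debug print can be sketched as follows. This is not the TF-A source: the function `tfp_report` and every unit label except `AXIS_BRIDGE` (the only one shown in the log output) are hypothetical, the field positions assume the Arm RAS layout of ERXSTATUS_EL1, and the `WARNING:` prefix is assumed to be added by the caller's logging facility.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed field positions (Arm RAS ERR<n>STATUS layout). */
#define STATUS_V        (UINT64_C(1) << 30)
#define STATUS_UE       (UINT64_C(1) << 29)
#define STATUS_UET(s)   ((unsigned)(((s) >> 20) & 0x3u))
#define STATUS_IERR(s)  ((unsigned)(((s) >> 8) & 0xFFu))
#define STATUS_SERR(s)  ((unsigned)((s) & 0xFFu))

/* Hypothetical labels for IERR 0b00100..0b10000, in table order. */
static const char *const tfp_labels[] = {
    "DSIDE", "VX", "MMU", "L2", "INTC", "DEBUG_TRACE", "ISIDE",
    "DECODE", "RENAME", "COMMIT", "ISSUE", "IEXECUTE", "AXIS_BRIDGE",
};

/* Returns 1 and writes the diagnostic into msg if the record is a TFP
 * error; returns 0 otherwise so that other handling can take over. */
static int tfp_report(uint64_t status, char *msg, size_t len)
{
    unsigned ierr = STATUS_IERR(status);

    /* Confirm a valid, Uncorrected, Uncontainable record with SERR 0x1A. */
    if (!(status & STATUS_V) || !(status & STATUS_UE) ||
        STATUS_UET(status) != 0u || STATUS_SERR(status) != 0x1Au)
        return 0;
    if (ierr < 0x04u || ierr > 0x10u)
        return 0;  /* not a listed TFP unit */

    snprintf(msg, len, "CPU RAS: TFP Error Detected : %s",
             tfp_labels[ierr - 0x04u]);
    return 1;
}
```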
Validation
TFP enablement is validated in the Primary Compute CPU RAS tests.