..
 # SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its
 # affiliates
 #
 # SPDX-License-Identifier: MIT

.. _rd-aspen_design_tfp:

##########################################################
Transient Fault Protection (TFP) for Application Processor
##########################################################

************
Introduction
************

The Cortex®‑A720AE core used in the Application Processor supports the
Transient Fault Protection (TFP) feature, which enhances reliability by adding
logic that checks the integrity of flip-flops in the functional (non-debug)
logic. This mechanism is designed to detect single transient faults affecting a
group of functional flip-flops.

This feature can significantly boost the transient fault detection capability
of the core in safety-critical applications and can be a key component towards
achieving the Single Point Fault Metric (SPFM) (transient) and Safe Failure
Fraction (SFF) (transient) goals at the core level.

***********************************
Transient Fault Detection Mechanism
***********************************

- When TFP is enabled in hardware, additional logic is instantiated to
  calculate the parity of each group of functional flops that share a common
  clock, reset, and enable term.
- The parity information is stored in an additional flip-flop, called the
  parity-flop.
- The output of the parity-flop is checked against the parity of the data
  stored in the associated group of functional flops. A difference between the
  two indicates that a fault has occurred in the functional flops or in the
  parity-flop itself.
- The error signals from the parity groups are combined per functional unit
  using a logical OR reduction and routed to the RAS registers for reporting
  and error signaling.

.. figure:: ../images/tfp_mechanism.*
   :align: center
   :alt: Transient Fault Protection Mechanism

   Transient Fault Protection Mechanism

|

***************************
Fault Detection Constraints
***************************

- The flop parity mechanism is capable of detecting a single transient fault
  within a parity group.
- A fault that causes an even number of bit-flips within a parity group cannot
  be detected by the transient fault protection logic.

**************
Fault Reaction
**************

- Errors detected by the transient fault protection logic cannot be contained,
  and no specific hardware recovery features are provided for them.
- The errors detected by the flop parity mechanism are reported in the
  ``ERXSTATUS_EL1`` register.
- The detected errors are reported as Uncorrected Errors of type
  Uncontainable:

.. list-table:: Error Reporting Fields
   :widths: 30 20 20
   :header-rows: 1

   * - Register Field
     - Value
     - Description
   * - ``ERXSTATUS_EL1.UE``
     - ``0b1``
     - Uncorrected Error
   * - ``ERXSTATUS_EL1.UET``
     - ``0b00``
     - Uncontainable Type

- Additional diagnostic information is provided by the IERR field of the
  ``ERXSTATUS_EL1`` register. The IERR code indicates which TFP chunk (or
  functional unit) detected the parity error. The IERR codes for the
  Cortex®‑A720AE core are as follows:

.. list-table:: IERR Codes for TFP Protection Units
   :widths: 20 60
   :header-rows: 1

   * - IERR Code
     - Affected Protection Unit
   * - 0b00100
     - Data side (Dside)
   * - 0b00101
     - Vector Unit (VX)
   * - 0b00110
     - Memory Management Unit (MMU)
   * - 0b00111
     - Level 2 Cache
   * - 0b01000
     - GIC CPU Interface (INTC)
   * - 0b01001
     - Debug Trace
   * - 0b01010
     - Instruction side (Iside)
   * - 0b01011
     - Decode
   * - 0b01100
     - Rename
   * - 0b01101
     - Commit
   * - 0b01110
     - Issue
   * - 0b01111
     - Iexecute
   * - 0b10000
     - Axis Bridge

.. note::

   The IERR field is valid only when ``ERXSTATUS_EL1.V`` is ``0b1`` and
   ``ERXSTATUS_EL1.SERR`` is ``0x1A``. In all other cases, the field is
   reported as ``UNKNOWN``.
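
The decoding rules above can be illustrated with a minimal C sketch. It is not
taken from TF-A or SCP-firmware. The bit positions (``V`` at bit 30, ``UE`` at
bit 29, ``UET`` at bits [21:20], ``IERR`` at bits [15:8], ``SERR`` at bits
[7:0]) are assumed from the generic Arm RAS ``ERR<n>STATUS`` layout and should
be checked against the core documentation. The function names and the unit
name strings are hypothetical and chosen only for illustration.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Assumed field layout (generic Arm RAS ERR<n>STATUS); verify per core. */
   #define ERXSTATUS_V_BIT       (UINT64_C(1) << 30)
   #define ERXSTATUS_UE_BIT      (UINT64_C(1) << 29)
   #define ERXSTATUS_UET_SHIFT   20
   #define ERXSTATUS_UET_MASK    UINT64_C(0x3)
   #define ERXSTATUS_IERR_SHIFT  8
   #define ERXSTATUS_IERR_MASK   UINT64_C(0xff)
   #define ERXSTATUS_SERR_MASK   UINT64_C(0xff)

   #define TFP_SERR_CODE         0x1aU  /* SERR value that marks IERR valid */
   #define TFP_IERR_FIRST        0x04U  /* 0b00100, first code in the table */

   /* Hypothetical names, ordered to match IERR codes 0b00100..0b10000. */
   static const char *const tfp_unit_names[] = {
       "DSIDE", "VX", "MMU", "L2", "INTC", "DEBUG_TRACE", "ISIDE",
       "DECODE", "RENAME", "COMMIT", "ISSUE", "IEXECUTE", "AXIS_BRIDGE",
   };

   /* Return the unit name for a valid TFP error record, or NULL otherwise. */
   const char *tfp_decode_unit(uint64_t status)
   {
       unsigned int ierr;
       size_t idx;

       /* IERR is only meaningful when V is set and SERR is 0x1A. */
       if ((status & ERXSTATUS_V_BIT) == 0) {
           return NULL;
       }
       if ((status & ERXSTATUS_SERR_MASK) != TFP_SERR_CODE) {
           return NULL;
       }

       ierr = (unsigned int)((status >> ERXSTATUS_IERR_SHIFT) &
                             ERXSTATUS_IERR_MASK);
       if (ierr < TFP_IERR_FIRST) {
           return NULL;
       }

       idx = ierr - TFP_IERR_FIRST;
       if (idx >= sizeof(tfp_unit_names) / sizeof(tfp_unit_names[0])) {
           return NULL;
       }
       return tfp_unit_names[idx];
   }

   /* True when the record reports an Uncorrected Error of type Uncontainable. */
   bool tfp_is_uncontainable(uint64_t status)
   {
       uint64_t uet = (status >> ERXSTATUS_UET_SHIFT) & ERXSTATUS_UET_MASK;

       return ((status & ERXSTATUS_UE_BIT) != 0) && (uet == 0);
   }

A handler built along these lines would first read ``ERXSTATUS_EL1``, call
``tfp_decode_unit()`` to obtain the unit name for logging, and treat a ``NULL``
result as "not a TFP error record".
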

**************************
Implementation in Software
**************************

The software implementation of the TFP feature comprises the following
elements:

**Enabling TFP**

To enable detection and reporting of errors via the transient fault protection
mechanism, software sets the following fields in the RAS registers:

.. list-table:: TFP Control Registers
   :widths: 30 20 70
   :header-rows: 1

   * - Register
     - Bit
     - Description
   * - ``ERXCTLR_EL1``
     - 0
     - **ED** Enable error detection and reporting globally
   * - ``ERXCTLR_EL1``
     - 33
     - **TFPEN** Enable TFP error detection and reporting

It is recommended to enable TFP error reporting in the Mixed-Configuration
Hybrid mode, which is the configuration typically used per the Aspen
specifications, with all cores operating in Hybrid split mode.

**Error Handling**

When a transient fault is detected by the flop parity mechanism:

- The RAS error record is updated.
- A fault handling interrupt (FHI) is raised.
- In the TF-A RAS error handler, the ``ERXSTATUS_EL1`` register is examined to
  confirm a transient fault. The error information (IERR) identifies the
  source of the fault, which is reported in a debug print, for example:

  .. code-block:: text

     WARNING: CPU RAS: TFP Error Detected : AXIS_BRIDGE

- Similar processing is implemented in SCP-firmware running on SI-CL0, as
  described in :ref:`rd-aspen_ras_si_error_processing`, where a diagnostic
  message is logged, for example:

  .. code-block:: text

     AP detected TFP Error : AXIS_BRIDGE

**********
Validation
**********

TFP enablement is validated in the :ref:`rd-aspen_design_pc_cpus_ras_tests`.