Platform Fault Detection Interface (PFDI)

Overview

The Platform Fault Detection Interface (PFDI) is a modular framework designed to detect and report hardware faults.

PFDI integrates with low-level firmware, the operating system kernel, and user space to enable robust fault monitoring and system health diagnostics. It is primarily intended for use in safety-critical automotive environments, where early detection of hardware anomalies is crucial for maintaining system integrity and safety.

By default, PFDI is enabled with example reference implementations in place of actual firmware test libraries. These serve as integration placeholders for early development and bring-up. Arm Software Test Libraries (STL) can be integrated as the PFDI firmware test backend. To enable STL support and obtain access, please contact Arm or visit Arm Software Test Libraries.

Architecture

The PFDI framework consists of the following key components:

  1. Trusted Firmware-A

    • SMC Service Handlers: Secure Monitor Call (SMC) handlers expose PFDI services to non-secure world software such as the Linux kernel. These SMC interfaces are defined in the Arm PFDI specification and follow the Arm SMC Calling Convention (SMCCC).

    • PFDI Driver: Executes in EL3 (platform firmware) and interfaces with the platform’s fault detection logic to initiate fault checks.

    • Reference PFDI Firmware Test Implementation: A reference implementation of the PFDI firmware test APIs is provided. It is intended for use in simulation, bring-up, or in environments where platform-specific logic is not yet integrated. This implementation does not perform actual fault detection but provides structural integration to validate the PFDI framework.

  2. Linux Kernel PFDI Driver

    • A miscellaneous character device driver responsible for interacting with the firmware via Secure Monitor Calls (SMC).

    • It serves as the bridge between user space and the firmware.

  3. User Space PFDI Library

    • The Platform Fault Detection Interface (PFDI) library provides a standardized API for interacting with the PFDI kernel driver in a Linux environment.

    • It supports running platform-specific tests, querying versions, and forcing errors, and hides the underlying Input/Output Control (ioctl) operations from the caller.

  4. User Space PFDI Tool

    • The Platform Fault Detection Interface (PFDI) Tool helps developers analyze, correct, generate, and pack the YAML configuration files that define CPU task ranges.

  5. User Space PFDI Sample Application

    • Command-line utility or background service that initiates tests and logs their results.

    • Useful for demonstration, diagnostics, or integration with larger health monitoring systems.

  6. User Space Command Line Interface

    • Command-line utility to:

      1. Query the userspace library version.

      2. Query the firmware library version.

      3. Query the Out-of-Reset (OoR) PFDI results.

      4. Inject PFDI errors.

      5. Query the PFDI test count.
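
Because the SMC service handlers listed under Trusted Firmware-A follow the SMCCC, every PFDI service is selected by a 32-bit function identifier. The following is a minimal Python sketch of how such an identifier is composed and decoded. The bit layout and the PSCI CPU_ON reference value come from the SMCCC and PSCI specifications; the function number EXAMPLE_PFDI_FUNC and the choice of the Standard Secure Service range for PFDI are assumptions made for illustration only. The actual PFDI function identifiers are defined in the Arm PFDI specification.

```python
"""Sketch: composing and decoding SMCCC function identifiers."""

FAST_CALL = 1 << 31   # bit 31: 1 = fast call, 0 = yielding call
SMC64 = 1 << 30       # bit 30: 1 = SMC64 convention, 0 = SMC32
OEN_SHIFT = 24        # bits 29:24 hold the owning entity number (OEN)
OEN_MASK = 0x3F
FUNC_MASK = 0xFFFF    # bits 15:0 hold the function number

OEN_STANDARD_SECURE = 4  # Standard Secure Service calls, e.g. PSCI


def smccc_function_id(oen, func, fast=True, smc64=True):
    """Compose a 32-bit function identifier; all other bits stay zero."""
    if not 0 <= oen <= OEN_MASK:
        raise ValueError("owning entity number must fit in 6 bits")
    if not 0 <= func <= FUNC_MASK:
        raise ValueError("function number must fit in 16 bits")
    fid = (oen << OEN_SHIFT) | func
    if fast:
        fid |= FAST_CALL
    if smc64:
        fid |= SMC64
    return fid


def decode_function_id(fid):
    """Split a function identifier back into its fields."""
    return {
        "fast": bool(fid & FAST_CALL),
        "smc64": bool(fid & SMC64),
        "oen": (fid >> OEN_SHIFT) & OEN_MASK,
        "func": fid & FUNC_MASK,
    }


# Sanity check against a well-known identifier: PSCI CPU_ON (SMC64).
assert smccc_function_id(OEN_STANDARD_SECURE, 0x0003) == 0xC4000003

# HYPOTHETICAL function number, not taken from the PFDI specification.
EXAMPLE_PFDI_FUNC = 0x0123
example_id = smccc_function_id(OEN_STANDARD_SECURE, EXAMPLE_PFDI_FUNC)
print(hex(example_id), decode_function_id(example_id))
```

A kernel driver would place such an identifier in the first argument register before issuing the SMC; the remaining registers carry the call's parameters.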

Interaction Flow

The PFDI framework provides a multi-layered interaction between user space applications and platform firmware. The typical flows are described below.

Out-of-Reset PFDI

  1. Primary Core

    • During early cold boot, the primary core executes the OoR PFDI.

    • If the primary core fails the OoR PFDI, the boot is aborted.

  2. Secondary Cores

    • The primary core sequentially pulls the secondary cores out of reset.

    • Each secondary core runs its OoR PFDI, reports the result, and re-enters an off state.

  3. Boot Blocking on Failure

    • Linux is prevented from turning on any secondary core whose OoR PFDI failed.
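
The Out-of-Reset sequence above can be sketched as a small Python model. This is an illustration of the described policy only, not firmware code: the class OorBootModel, the exception BootAborted, and the method names are hypothetical, and the run_oor_test callback stands in for the real per-core test.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


class BootAborted(Exception):
    """Raised when the primary core fails its Out-of-Reset PFDI."""


@dataclass
class OorBootModel:
    # Hypothetical stand-in for the per-core OoR test: core id -> passed?
    run_oor_test: Callable[[int], bool]
    results: Dict[int, bool] = field(default_factory=dict)

    def cold_boot(self, primary: int, secondaries: Iterable[int]) -> None:
        # Step 1: the primary core tests itself; a failure aborts the boot.
        if not self.run_oor_test(primary):
            raise BootAborted(f"core {primary} failed OoR PFDI")
        self.results[primary] = True
        # Step 2: secondaries are released one at a time; each one runs
        # its test, reports the result, and returns to the off state.
        for core in secondaries:
            self.results[core] = self.run_oor_test(core)

    def cpu_on(self, core: int) -> bool:
        # Step 3: a later power-on request succeeds only for cores with
        # a recorded passing result.
        return self.results.get(core, False)


# Usage: inject a failure on core 2 and boot a four-core system.
model = OorBootModel(run_oor_test=lambda core: core != 2)
model.cold_boot(primary=0, secondaries=[1, 2, 3])
for core in (1, 2, 3):
    print(core, "allowed" if model.cpu_on(core) else "blocked")
```

Recording the result per core rather than aborting on the first secondary failure mirrors the text: only the primary core's failure stops the boot, while a failed secondary is simply kept offline.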

Online PFDI

  1. User Space Initiation

    • A user space application invokes a function from the libpfdi library to request a fault detection test.

    • The library constructs a control request and sends it via an ioctl call to the PFDI kernel driver.

  2. Kernel-Level Mediation

    • The Linux kernel driver receives the ioctl call and translates it into a Secure Monitor Call (SMC).

    • An SMC is issued to transition from non-secure EL1 (Linux) to secure EL3 (firmware).

  3. Platform Firmware Execution

    • The Trusted Firmware SMC handler receives the request and delegates it to the internal PFDI driver.

    • The PFDI driver registers the appropriate handlers with the firmware test library and invokes the necessary test routines.

  4. Test Execution in EL3

    • The firmware test library performs low-level validation of CPU functional logic.

    • The result is captured and returned through the PFDI driver.

  5. Result Propagation

    • The result is passed back through the SMC call to the Linux kernel driver.

    • The kernel driver makes the test result available to the user space application via the original ioctl return or a subsequent query.
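
The five steps of the online flow can be modelled end to end in a few lines of Python. This is a sketch under stated assumptions: the command code CMD_RUN_TEST, the status values, the request layout in REQUEST, and all three class names are hypothetical and do not reflect the real libpfdi API, ioctl numbers, or SMC function identifiers. Each class stands in for one layer, and method calls stand in for the ioctl and SMC transitions.

```python
import struct

# Hypothetical command and status codes, for illustration only.
CMD_RUN_TEST = 1
STATUS_PASS = 0
STATUS_FAIL = 1
STATUS_UNSUPPORTED = -1

# Hypothetical request layout: cpu (u32), test id (u32), status (i32).
REQUEST = struct.Struct("<IIi")


class FirmwareModel:
    """Stands in for the EL3 SMC handler, PFDI driver, and test library."""

    def __init__(self, test_library):
        self.test_library = test_library  # callable: (cpu, test_id) -> bool

    def smc(self, function, cpu, test_id):
        # Steps 3 and 4: dispatch the request and run the test routine.
        if function != CMD_RUN_TEST:
            return STATUS_UNSUPPORTED
        return STATUS_PASS if self.test_library(cpu, test_id) else STATUS_FAIL


class KernelDriverModel:
    """Stands in for the Linux PFDI driver."""

    def __init__(self, firmware):
        self.firmware = firmware

    def ioctl(self, cmd, buf):
        # Step 2: unpack the user request and issue the (modelled) SMC.
        cpu, test_id, _ = REQUEST.unpack_from(buf)
        status = self.firmware.smc(cmd, cpu, test_id)
        # Step 5: write the result back into the caller's buffer.
        REQUEST.pack_into(buf, 0, cpu, test_id, status)
        return 0


class LibraryModel:
    """Stands in for the user space PFDI library."""

    def __init__(self, driver):
        self.driver = driver

    def run_test(self, cpu, test_id):
        # Step 1: build the control request and hand it to the driver.
        buf = bytearray(REQUEST.size)
        REQUEST.pack_into(buf, 0, cpu, test_id, 0)
        if self.driver.ioctl(CMD_RUN_TEST, buf) != 0:
            raise OSError("ioctl failed")
        return REQUEST.unpack_from(buf)[2]


# Usage: a test library stub that reports a fault on CPU 3 only.
firmware = FirmwareModel(test_library=lambda cpu, test_id: cpu != 3)
lib = LibraryModel(KernelDriverModel(firmware))
print(lib.run_test(cpu=0, test_id=5), lib.run_test(cpu=3, test_id=5))
```

The model returns the result in the same buffer that carried the request, which corresponds to the "original ioctl return" path in step 5; a design that stores results for a subsequent query would add a second command instead.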

The following diagram shows the components and interaction flow that implement the Platform Fault Detection Interface.


Figure: Platform Fault Detection Interface