Feature Name

Fan control policy for Ragile devices.

High Level Design Document

Rev 0.1

Revision

| Rev | Date | Author | Change Description |
|-----|------|--------|--------------------|
| 0.1 | 09/22/2021 | Ragile Team | Initial version |

Scope

This document describes the fan control design for Ragile devices.

Definitions/Abbreviations

| Definitions/Abbreviation | Description |
|--------------------------|-------------|
| MAC | Medium access control chip |
| BOARD | Motherboard |
| CPU | Central processing unit |
| INLET | Air inlet |
| OUTLET | Air outlet |
| CPLD | Complex programmable logic device |
| FPGA | Field-programmable gate array |

| Definitions/Abbreviation | Description |
|--------------------------|-------------|
| INLET_T | Temperature detection point of INLET |
| OUTLET_T | Temperature detection point of OUTLET |
| CPU_T | Temperature detection point of CPU |
| BOARD_T | Temperature detection point of BOARD |
| MAC_T | Temperature detection point of MAC |

Overview

To keep the network switch running stably at an appropriate temperature, this document describes a fan control structure based on the temperature detection points.

Requirements

The functional requirements include:

  • A stable method of obtaining the temperature detection points.
  • A method to control the fans, such as a CPLD or an FPGA.
  • A common platform API that encapsulates the above.

Design

Platform capabilities

  • Fan speed is described by levels 0 to 255 (0x00 to 0xff): 0 means stopped and 255 means maximum. The default level is 96 (0x60).
  • The temperature detection points involved are CPU_T, INLET_T, OUTLET_T, BOARD_T, MAC_T.
  • Support fan redundancy.

Platform restrictions

  • For safety reasons, level 0 is not allowed; the minimum level is limited to 51 (0x33).
  • Fans with opposite directions are not allowed.
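
The level limits above can be illustrated with a small helper. This is a minimal sketch, not the platform implementation: the constant and function names are hypothetical, and only the numeric values (51, 96, 255) come from this document.

```python
# Hypothetical names; only the numeric values are taken from this document.
FAN_LEVEL_MIN = 0x33      # 51, lowest level allowed for safety reasons
FAN_LEVEL_DEFAULT = 0x60  # 96, default level
FAN_LEVEL_MAX = 0xFF      # 255, full speed


def clamp_fan_level(level: int) -> int:
    """Force a requested level into the allowed range [51, 255]."""
    return max(FAN_LEVEL_MIN, min(FAN_LEVEL_MAX, int(level)))


if __name__ == "__main__":
    for requested in (0, FAN_LEVEL_DEFAULT, 300):
        print(requested, "->", clamp_fan_level(requested))
```

Clamping in one place means every caller, including the control policy, cannot accidentally stop the fans by writing level 0.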

Policy

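  • When the inlet temperature is below the minimum allowable temperature, keep level 96 (0x60).

  • When the inlet temperature is at or above the minimum allowable temperature and the device temperature is rising, calculate the fan speed with the following formula:

    fan speed = minimum fan speed + slope × (inlet temperature − minimum allowable temperature)

  • When the inlet temperature is at or above the minimum allowable temperature and the device temperature is falling, there are two policies:
    • When the drop from the last measured inlet temperature to the current one reaches the fuse value, use the formula above.
    • Otherwise, keep the original speed.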
Definitions Description
Inlet temperature
Minimum allowable temperature
Maximum allowable temperature
Speed of fan
Minimum speed of fan
Maximum speed of fan
Slope of fan speed and temperature
Inlet temperature measured last time
Inlet temperature measured this time
Fuse that determines whether to trigger fan control
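
The policy above can be sketched in Python as follows. This is only an illustration under stated assumptions: the relation between speed and inlet temperature is taken to be linear with slope k = (S_max − S_min) / (T_max − T_min), the minimum speed is taken to be the default level 96, the fuse is compared against the drop between the last and the current inlet reading, and all class names, field names and numeric temperatures are hypothetical placeholders rather than values from the platform.

```python
from dataclasses import dataclass

DEFAULT_LEVEL = 0x60  # 96, default level from this document


@dataclass
class FanPolicy:
    t_min: float        # minimum allowable temperature (placeholder value)
    t_max: float        # maximum allowable temperature (placeholder value)
    s_min: int = 0x60   # assumed minimum speed of fan
    s_max: int = 0xFF   # maximum speed of fan
    fuse: float = 2.0   # assumed fuse, in degrees Celsius

    def __post_init__(self) -> None:
        if self.t_max <= self.t_min:
            raise ValueError("t_max must be greater than t_min")

    def linear_speed(self, t_inlet: float) -> int:
        """S = S_min + k * (T_inlet - T_min), clamped to [S_min, S_max]."""
        raw = self.s_min + (self.s_max - self.s_min) * (t_inlet - self.t_min) / (
            self.t_max - self.t_min
        )
        return int(max(self.s_min, min(self.s_max, round(raw))))

    def next_speed(self, t_last: float, t_curr: float, current_speed: int) -> int:
        """Pick the next fan level from the last and current inlet readings."""
        if t_curr < self.t_min:
            return DEFAULT_LEVEL                 # below range: keep level 96
        if t_curr > t_last:
            return self.linear_speed(t_curr)     # temperature is rising
        if t_last - t_curr >= self.fuse:
            return self.linear_speed(t_curr)     # cooled by at least the fuse
        return current_speed                     # small drop: keep the speed


if __name__ == "__main__":
    policy = FanPolicy(t_min=30, t_max=50)
    print(policy.next_speed(t_last=30, t_curr=35, current_speed=96))
```

Keeping the speed unchanged for small drops acts as hysteresis, so the fans do not oscillate when the inlet temperature hovers around one value.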

Emergency policy

  • When the device status check fails three times in a row, set the level to 187 (0xbb) until the status returns to normal, then restart the control policy.

  • There are two ways to detect a device status failure:

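    1. An error occurs while reading a temperature point.
    2. A temperature reading is below the error temperature low threshold or above the error temperature high threshold.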
  • When the MAC, OUTLET, INLET, CPU or BOARD temperature reaches its warning temperature threshold, enter the warning alarm state: print the corresponding log, turn the state LED amber and set all fans to full speed.

  • When

    or and

    enter the critical alarm state, print the corresponding log and turn SYS_LED red. When any one of the following two conditions is met, reset the machine.

    • and and and
  • To avoid jitter in the measurements, the warning state and the critical state must each be verified twice.
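
The "three failures in a row" rule from the first item of this list can be sketched with a small counter. This is a hedged illustration: the class and method names are hypothetical, and the caller is assumed to supply the result of each device status check.

```python
from typing import Optional

EMERGENCY_LEVEL = 0xBB  # 187, level used while the device status is failing
FAILURE_LIMIT = 3       # consecutive failures needed to enter emergency mode


class DeviceStatusMonitor:
    """Counts consecutive failed status checks (hypothetical helper)."""

    def __init__(self) -> None:
        self.failures = 0

    def update(self, status_ok: bool) -> Optional[int]:
        """Return 187 while in emergency, or None to run the normal policy."""
        if status_ok:
            self.failures = 0          # back to normal: restart normal control
            return None
        self.failures += 1
        if self.failures >= FAILURE_LIMIT:
            return EMERGENCY_LEVEL
        return None


if __name__ == "__main__":
    monitor = DeviceStatusMonitor()
    for ok in (False, False, False, True):
        print(ok, monitor.update(ok))
```

A single successful check resets the counter, so isolated read errors never force the emergency level.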

Definitions Description
Outlet temperature
MAC temperature
BOARD temperature
CPU temperature
Error temperature low threshold, default is -50 °C
Error temperature high threshold, default is 50 °C
MAC warning temperature threshold
MAC critical temperature threshold
OUTLET warning temperature threshold
OUTLET critical temperature threshold
INLET warning temperature threshold
INLET critical temperature threshold
CPU warning temperature threshold
CPU critical temperature threshold
BOARD warning temperature threshold
BOARD critical temperature threshold
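
The warning check and the "verified twice" rule can be sketched as below. This is an assumption-laden illustration, not the platform logic: the threshold numbers are made-up placeholders, a point is assumed to trigger when its reading is greater than or equal to its threshold, and "verified twice" is interpreted as two consecutive samples. The function and class names are hypothetical.

```python
from typing import Dict

# Placeholder warning thresholds in degrees Celsius (not real platform values).
WARNING_THRESHOLDS: Dict[str, float] = {
    "MAC_T": 100.0,
    "OUTLET_T": 70.0,
    "INLET_T": 50.0,
    "CPU_T": 95.0,
    "BOARD_T": 70.0,
}


def warning_exceeded(temps: Dict[str, float], limits: Dict[str, float]) -> bool:
    """True if any temperature point is at or above its warning threshold."""
    return any(temps[name] >= limit for name, limit in limits.items())


class Debouncer:
    """Confirms a condition only after it holds for consecutive samples."""

    def __init__(self, required: int = 2) -> None:
        self.required = required
        self.count = 0

    def update(self, raw: bool) -> bool:
        self.count = self.count + 1 if raw else 0
        return self.count >= self.required


if __name__ == "__main__":
    debounce = Debouncer()
    hot = {"MAC_T": 101.0, "OUTLET_T": 40.0, "INLET_T": 30.0,
           "CPU_T": 60.0, "BOARD_T": 45.0}
    for _ in range(2):
        print(debounce.update(warning_exceeded(hot, WARNING_THRESHOLDS)))
```

The same `Debouncer` could be reused for the critical state with its own thresholds, so a single noisy reading never resets the machine.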

SAI API

NA

Configuration and management

NA

CLI/YANG model Enhancements

NA

Config DB Enhancements

NA

Warmboot and Fastboot Design Impact

NA

Restrictions/Limitations

NA

Testing Requirements/Design

NA

Unit Test cases

Run the command show platform fan status to check the current fan speed, referred to below as FAN_SPEED.

  • Unplug one or more fans, then check whether FAN_SPEED is at full speed.
  • Heat the chips up to the warning threshold, then check whether FAN_SPEED is at full speed.
  • Heat the chips up to the critical threshold, then check whether the system resets.

System Test cases

NA

Open/Action items - if any

NA