# Feature Name #

Fan control policy for Ragile devices.

## High Level Design Document

Rev 0.1

## Table of Content

* [Feature Name](#feature-name)
  * [High Level Design Document](#high-level-design-document)
  * [Table of Content](#table-of-content)
  * [Revision](#revision)
  * [Scope](#scope)
  * [Definitions/Abbreviations](#definitionsabbreviations)
  * [Overview](#overview)
  * [Requirements](#requirements)
  * [Design](#design)
    * [Platform capabilities](#platform-capabilities)
    * [Platform restrictions](#platform-restrictions)
    * [Policy](#policy)
    * [Emergency policy](#emergency-policy)
  * [SAI API](#sai-api)
  * [Configuration and management](#configuration-and-management)
    * [CLI/YANG model Enhancements](#cliyang-model-enhancements)
    * [Config DB Enhancements](#config-db-enhancements)
  * [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact)
  * [Restrictions/Limitations](#restrictionslimitations)
  * [Testing Requirements/Design](#testing-requirementsdesign)
    * [Unit Test cases](#unit-test-cases)
    * [System Test cases](#system-test-cases)
  * [Open/Action items - if any](#openaction-items---if-any)

### Revision

| Rev  | Date       | Author      | Change Description |
| ---- | ---------- | ----------- | ------------------ |
| 0.1  | 09/22/2021 | Ragile Team | Initial version    |

### Scope

This document describes the fan control design for Ragile devices.

### Definitions/Abbreviations

| Definitions/Abbreviation | Description                       |
| ------------------------ | --------------------------------- |
| MAC                      | Medium access control chip        |
| BOARD                    | Motherboard                       |
| CPU                      | Central processing unit           |
| INLET                    | Air inlet                         |
| OUTLET                   | Air outlet                        |
| CPLD                     | Complex programmable logic device |
| FPGA                     | Field-programmable gate array     |

| Definitions/Abbreviation | Description                           |
| ------------------------ | ------------------------------------- |
| INLET_T                  | Temperature detection point of INLET  |
| OUTLET_T                 | Temperature detection point of OUTLET |
| CPU_T                    | Temperature detection point of CPU    |
| BOARD_T                  | Temperature detection point of BOARD  |
| MAC_T                    | Temperature detection point of MAC    |

### Overview

This document describes a fan control structure driven by the temperature detection points. Its goal is to keep the network switch running stably at an appropriate temperature.

### Requirements

The functional requirements include:

- A stable method of obtaining four temperature points. Like etc.
- A method to control the fans, like CPLD or FPGA etc.
- A common platform API that encapsulates the above.

### Design

#### Platform capabilities

- Fan speed is expressed as a level from 0 to 255 (0x00 to 0xff). Level 0 means stopped and 255 means maximum speed. The default level is `96(0x60)`.
- The temperature detection points involved are `CPU_T`, `INLET_T`, `OUTLET_T`, `BOARD_T` and `MAC_T`.
- Fan redundancy is supported.

#### Platform restrictions

- For safety reasons, level `0` is not allowed; the minimum level is `51(0x33)`.
- Mixing fans with opposite airflow directions is not allowed.

#### Policy

- When keep level `96(0x60)`.

- When and the device temperature is rising, calculate the fan speed with the following formula:

- When and the device temperature is falling, there are two cases:
  - When use the formula above.
  - Otherwise, keep the current speed.

| Definitions | Description                                          |
| ----------- | ---------------------------------------------------- |
|             | Inlet temperature                                    |
|             | Minimum allowable temperature                        |
|             | Maximum allowable temperature                        |
|             | Fan speed                                            |
|             | Minimum fan speed                                    |
|             | Maximum fan speed                                    |
|             | Slope of fan speed versus temperature                |
|             | Inlet temperature measured last time                 |
|             | Inlet temperature measured this time                 |
|             | Fuse that determines whether to trigger fan control  |

#### Emergency policy

- When the device status check fails three times in a row, set the fan level to `187(0xbb)` until the status returns to normal. Then restart the control policy.

- The device status check fails in either of two cases:
  1. A temperature point cannot be read.
  2. or

- When or or or or enter the warning alarm state. Print the corresponding log, turn the state LED amber and set all fans to full speed.

- When or and enter the critical alarm state. Print the corresponding log and turn SYS_LED red. Reset the machine when either of the following two conditions is met:

  -
  - and and and

- To avoid jitter in the measurements, both the warning state and the critical state must be verified twice before they take effect.
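
The normal (non-emergency) policy from the Policy section can be sketched in Python as follows. The formula and the cooling-down condition did not survive in this copy, so the sketch makes two assumptions. First, it assumes a linear interpolation built from the policy definitions table: speed = speed_min + k * (T_inlet - T_min), with slope k = (speed_max - speed_min) / (T_max - T_min). Second, it assumes the cooling-down branch recomputes only when the inlet temperature has dropped by at least the fuse value since the last measurement. The names `FanPolicy`, `clamp_level`, `fmt_level` and all tuning values are hypothetical. Only the level limits 51, 96 and 255 come from this document.

```python
from dataclasses import dataclass

# Level limits stated in this document.
LEVEL_MAX = 0xFF           # 255: maximum fan speed
LEVEL_MIN_ALLOWED = 0x33   # 51: level 0 is forbidden for safety reasons
LEVEL_DEFAULT = 0x60       # 96: default level


def clamp_level(level: int) -> int:
    """Keep a fan level inside the platform limits [51, 255]."""
    return max(LEVEL_MIN_ALLOWED, min(LEVEL_MAX, level))


def fmt_level(level: int) -> str:
    """Render a level in this document's style, e.g. 96(0x60)."""
    return f"{level}({level:#04x})"


@dataclass
class FanPolicy:
    # Hypothetical tuning values; the real per-platform numbers are not given here.
    t_min: float = 25.0    # minimum allowable inlet temperature, degrees C
    t_max: float = 45.0    # maximum allowable inlet temperature, degrees C
    speed_min: int = 0x60  # fan level at t_min
    speed_max: int = 0xFF  # fan level at t_max
    fuse: float = 2.0      # assumed: temperature drop needed before slowing down

    @property
    def slope(self) -> float:
        # k: fan levels per degree C between (t_min, speed_min) and (t_max, speed_max)
        return (self.speed_max - self.speed_min) / (self.t_max - self.t_min)

    def linear_level(self, t_inlet: float) -> int:
        # Assumed formula: speed = speed_min + k * (T_inlet - T_min),
        # with T_inlet limited to [t_min, t_max] so the result stays in range.
        t = min(max(t_inlet, self.t_min), self.t_max)
        return clamp_level(round(self.speed_min + self.slope * (t - self.t_min)))

    def next_level(self, t_last: float, t_curr: float, current_level: int) -> int:
        if t_curr < self.t_min:
            return LEVEL_DEFAULT              # assumed: below t_min keep the default
        if t_curr >= t_last:
            return self.linear_level(t_curr)  # temperature rising (or flat)
        if t_last - t_curr >= self.fuse:
            return self.linear_level(t_curr)  # cooled by at least the fuse: follow the formula
        return current_level                  # small drop: keep the current speed
```

With these example values, a rising inlet temperature of 30 °C gives level `136(0x88)`. A later drop from 30 °C to 29 °C is smaller than the 2 °C fuse and leaves the speed unchanged. Because `t_last` is the previous sample, as in the definitions table, a slow drift of less than the fuse per sample never lowers the speed. An implementation may prefer to compare against the temperature at the last speed change instead.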
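
The emergency rules above can be sketched as a small state machine. The exact conditions are missing from this copy, so the sketch makes two assumptions. A status check fails when any monitored temperature cannot be read or falls outside the error low/high thresholds. The warning state is reached when any temperature is at or above its warning threshold. The sketch covers only the three-strikes fail-safe level `187(0xbb)` and the twice-verified warning state. The critical-state and reset conditions are left out because they cannot be recovered here. `EmergencyMonitor`, `evaluate` and the sensor-name keys are hypothetical. The error thresholds are constructor arguments; the definitions table in this section lists their defaults. Logging, LED and reset actions are left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LEVEL_FAIL_SAFE = 0xBB  # 187: held while the device status keeps failing
LEVEL_FULL = 0xFF       # 255: warning state drives all fans to full speed
FAIL_LIMIT = 3          # consecutive failed status checks before the fail-safe level
CONFIRM_COUNT = 2       # a warning must be seen twice in a row to filter jitter


@dataclass
class EmergencyMonitor:
    err_low: float              # error temperature low threshold
    err_high: float             # error temperature high threshold
    warning: Dict[str, float]   # sensor name (e.g. "INLET_T") -> warning threshold
    fail_streak: int = 0
    warn_streak: int = 0

    def status_ok(self, temps: Dict[str, Optional[float]]) -> bool:
        """False if any monitored sensor is unreadable (None or missing) or out of range."""
        for name in self.warning:
            t = temps.get(name)
            if t is None or t < self.err_low or t > self.err_high:
                return False
        return True

    def evaluate(self, temps: Dict[str, Optional[float]]) -> Optional[int]:
        """Return a fan level that overrides the normal policy, or None to run it."""
        if not self.status_ok(temps):
            self.fail_streak += 1
            self.warn_streak = 0
            return LEVEL_FAIL_SAFE if self.fail_streak >= FAIL_LIMIT else None
        self.fail_streak = 0  # status back to normal: the normal policy resumes
        in_warning = any(temps[name] >= limit for name, limit in self.warning.items())
        self.warn_streak = self.warn_streak + 1 if in_warning else 0
        return LEVEL_FULL if self.warn_streak >= CONFIRM_COUNT else None
```

The sketch leaves open how the caller treats the first two failed checks, for example by keeping the current level.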

| Definitions | Description                                      |
| ----------- | ------------------------------------------------ |
|             | Outlet temperature                               |
|             | MAC temperature                                  |
|             | BOARD temperature                                |
|             | CPU temperature                                  |
|             | Error temperature low threshold, default -50 °C  |
|             | Error temperature high threshold, default 50 °C  |
|             | MAC warning temperature threshold                |
|             | MAC critical temperature threshold               |
|             | OUTLET warning temperature threshold             |
|             | OUTLET critical temperature threshold            |
|             | INLET warning temperature threshold              |
|             | INLET critical temperature threshold             |
|             | CPU warning temperature threshold                |
|             | CPU critical temperature threshold               |
|             | BOARD warning temperature threshold              |
|             | BOARD critical temperature threshold             |

### SAI API

NA

### Configuration and management

NA

#### CLI/YANG model Enhancements

NA

#### Config DB Enhancements

NA

### Warmboot and Fastboot Design Impact

NA

### Restrictions/Limitations

NA

### Testing Requirements/Design

NA

#### Unit Test cases

Run `show platform fan status` to check the current fan speed, referred to below as **FAN_SPEED**.

- Unplug one or more fans, then check that FAN_SPEED rises to full speed.
- Heat the chips up to the warning threshold, then check that FAN_SPEED rises to full speed.
- Heat the chips up to the critical threshold, then check that the system resets.

#### System Test cases

NA

### Open/Action items - if any

NA