Add back the option f to the reboot script #21667
Closed
209b7ddec109587ddeb90071ca23ae6a288b1442 (HEAD -> 201911, origin/201911) Fixed the possibility of using uninitialized variable in route_check.py (sonic-net#1551) e30387cbebaaccbf9385059b1e501955c40be338 route_check: Fix hanging & logging level (sonic-net#1520) 3c8de6950615a4608a80e3d47ea678f8e8487186 Add self timeout and crash if exceeded. (sonic-net#1502) Signed-off-by: Abhishek Dosi <[email protected]>
Fix show interface status Ethernet* (sonic-net#1559) Signed-off-by: Abhishek Dosi <[email protected]>
Fix Bad Merge Signed-off-by: Abhishek Dosi <[email protected]>
…tainers. (sonic-net#7340) Signed-off-by: Yong Zhao [email protected]
Why I did it
This PR aims to monitor the critical processes in the router advertiser and dhcp_relay containers with Monit.
How I did it
The router advertiser container runs only on T0 devices, and the T0 device must have at least one VLAN interface configured with an IPv6 address. The router advertiser container also does not run on devices whose deployment type is 8. I therefore created a service that dynamically generates the Monit configuration file for the router advertiser from a template. The Monit configuration file for dhcp_relay is likewise generated from a template, since the number of dhcrelay processes in the dhcp_relay container depends on the number of VLANs.
How to verify it
I verified this implementation on a DuT.
…o poll mode (sonic-net#7334)
#### Why I did it
- An xcvrd crash was seen in the latest 201811 images.
- For Dell S6100, API 2.0 uses poll mode while 1.0 was still using interrupt mode.
#### How I did it
- Modified get_transceiver_change_event in 1.0 to use poll mode in all the related branches.
Backport of sonic-net#7309 to the 201911 branch
…dhcp_relay (sonic-net#7378)
#### Why I did it
Since there will be multiple `dhcrelay` processes if different VLANs exist in the `VLAN_INTERFACE` table of `CONFIG_DB`, each `dhcrelay` process needs a unique service name in the Monit configuration file. Otherwise, the Monit service will fail to work.
#### How I did it
I append the VLAN name to the end of each service name so that the names are unique.
Signed-off-by: Yong Zhao <[email protected]>
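The naming idea in this commit can be sketched as follows. This is only an illustration, not the PR's Monit template: the function name `monit_service_names`, the base name `dhcp_relay|dhcrelay`, and the `_` separator are assumptions made for the example.

```python
def monit_service_names(vlan_names, base="dhcp_relay|dhcrelay"):
    """Return one Monit service name per VLAN by appending the VLAN name.

    The base name and the "_" separator are hypothetical; the point is only
    that each dhcrelay process gets its own, distinct service name.
    """
    names = [f"{base}_{vlan}" for vlan in vlan_names]
    # Monit needs unique service names, so reject duplicate input early.
    if len(set(names)) != len(names):
        raise ValueError("duplicate VLAN names would produce duplicate service names")
    return names


if __name__ == "__main__":
    for name in monit_service_names(["Vlan1000", "Vlan2000"]):
        print(name)
```

With two VLANs the sketch yields two different service names, which is the property the commit relies on.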
a364614 2021-04-22 | [201911][acl] Use a list instead of a comma-separated string for ACL port list (sonic-net#1576) [Danny Allen]
391e524 2021-04-15 | [201911] Fix Multi-ASIC show specific recursive route (sonic-net#1563) [gechiang]
[techsupport] Update show ip interface command (sonic-net#1562) Signed-off-by: Abhishek Dosi <[email protected]>
Issue: get-pip.py has moved to pip 21.1 (https://github.com/pypa/get-pip/commits/main), which is not compatible with Python 3.6. The issue in pip itself is fixed in 21.1.1 by the pip community (pypa/pip#9835), but get-pip.py has not yet been updated to the latest pip. Also, get-pip.py does not explicitly support Python 3.6 (pypa/get-pip#88).
Step 15/29 : RUN curl https://bootstrap.pypa.io/get-pip.py | python3.6
 ---> Running in bece31f49267
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 1891k  100 1891k    0     0  9564k      0 --:--:-- --:--:-- --:--:-- 9600k
Traceback (most recent call last):
  File "<stdin>", line 24298, in <module>
  File "<stdin>", line 139, in main
  File "<stdin>", line 115, in bootstrap
  File "<stdin>", line 96, in monkeypatch_for_cert
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/commands/__init__.py", line 9, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/base_command.py", line 12, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/cmdoptions.py", line 30, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/utils/hashes.py", line 2, in <module>
ImportError: cannot import name 'NoReturn'
The command '/bin/sh -c curl https://bootstrap.pypa.io/get-pip.py | python3.6' returned a non-zero code: 1
How I did it: Got the file from https://github.com/pypa/get-pip/tree/21.0 and added it to the buildimage to pin pip to the previous release, 21.0.1. (Something similar is done in other public repos, e.g. grpc/grpc-java#8115.)
Signed-off-by: Abhishek Dosi <[email protected]>
New features and fixes in the new SDK/FW:
SN4600C | AN/LT support
SN2700 | AN/LT bug fixes
WJH | FID_MISS support
Signed-off-by: Kebo Liu <[email protected]>
…net#7438) Signed-off-by: Yong Zhao [email protected]
Why I did it
This PR aims to monitor the critical processes in the PMon container with Monit in the 201911 branch.
How I did it
I created a Monit template configuration file; the service generate_monit_config.service renders it to generate the Monit configuration file of the PMon container.
How to verify it
I verified this on a Mellanox device str-msn2700-03 and an Arista device str-a7050-acs-1.
Which release branch to backport (provide reason below if selected)
201811 [x ] 201911 202006 202012
…net#7426) - Fix ACL ANY debug counter to correctly track ACL drops - Add VXLAN source port hard coded range, controlled by K/V Signed-off-by: Dror Prital <[email protected]>
…ly (sonic-net#7501) Signed-off-by: Abhishek Dosi <[email protected]>
…et#7491) 20e1589 [Mellanox] [201911] backport kernel patches for hw-management 7.0100.2303 (sonic-net#210)
* Set monitoring VLAN hostif up by default (for VNET ping tool) Signed-off-by: Volodymyr Samotiy <[email protected]>
- Update hw-mgmt pointer - Remove unused patches - Fix existing patch to make sure it applies successfully
…ollect process (sonic-net#7308) Recently, we found that on some of our testbeds the entropy collection process finishes more than 60 seconds after the system starts. As a result, swss sporadically fails to start. Installing haveged accelerates the entropy collection process. Signed-off-by: Stephen Sun <[email protected]>
…et#7394) Enable VXLAN src port range configuration via SAI profile
[201911]: add show bgp neigh/network support for multi asic (sonic-net#1587) Signed-off-by: Abhishek Dosi <[email protected]>
Why I did it
Added soft-reboot plugin support.
Added SSD version s16425cq check.
Added an error message to display in console/SSH in case reboot is called on faulty/non-upgraded devices.
1f249282e8066a5837f2b34478eb4e0f6b4a654c (HEAD -> 201911, origin/201911) [201911] soft-reboot - support ssd_fw_update (sonic-net#1518) 30a3cb3c085a7f208a44b58060ba797e4299214a [route_check] Filter out VNET routes (sonic-net#1582) Signed-off-by: Abhishek Dosi <[email protected]>
dd01491e4d167993b3a80517f737188151443a75 (HEAD -> 201911, origin/201911) [Monitor Vlan] Fix a typo in hostif (sonic-net#1722) Signed-off-by: Abhishek Dosi <[email protected]>
…-net#7514) Add downstreamsubrole parsing to minigraph.py So that downstreamsubrole values can be used for policies. Backport PR, same as sonic-net#7193
e438b0db6a8912b50f7acddf93d4dc2157f53ecf (HEAD -> 201911, origin/201911) Increase Syncd operation timeout from 1 min to 6 min. (sonic-net#828) 17974adb369111b44dd56837547806918ed4b1ed Update syncd_flex_counter.cpp (sonic-net#798) Signed-off-by: Abhishek Dosi <[email protected]>
d898b03e4ec91f964f0e1fcba535ea33a78c838e (HEAD -> 201911, origin/201911) Create mappings using existing tunnel (sonic-net#1593) Signed-off-by: Abhishek Dosi <[email protected]>
…onic-net#7536)
#### Why I did it
MSN4700 A1 and A0 use different sensor chips but keep the existing platform name *x86_64-mlnx_msn4700-r0*. This is a workaround to replace the sensor conf on MSN4700 A1/A0.
#### How I did it
Use a shell script to get the sensor conf path and copy that file to /etc/sensors.d/sensors.conf
…#7558) Signed-off-by: Yong Zhao [email protected]
Why I did it
The service file generate_monit_config.service is used to generate the Monit configuration file from a template. This service file also needs to be installed and enabled.
How I did it
I appended this service file name to the end of /etc/sonic/generated_services.conf.
How to verify it
I verified this on the device str2-7260cx3-acs-1.
Which release branch to backport (provide reason below if selected)
201811 [x ] 201911 202006 202012
Signed-off-by: Guohan Lu <[email protected]>
Signed-off-by: Guohan Lu <[email protected]>
* Create Vxlan and Vnet default configs
- Why I did it
Added BIOS upgrade infra
- How I did it
Added new make target
- How to verify it
Copy msn3800_bios.tar.gz to platform/mellanox/bios
make configure PLATFORM=mellanox
make target/files/stretch/msn3800_bios.tar.gz
Signed-off-by: Nazarii Hnydyn <[email protected]>
…c-net#7719)
#### Why I did it
To allow SSH connections from IPv6 addresses
Resolves sonic-net#7668
#### How I did it
In build_debian.sh, modify the sshd_config file to enable listening for IPv6 connections
*Fix a typo introduced as part of sonic-net#13403
) Why I did it
docker.com's gpg key started working on 2023-02-23, while debian.org's gpg key expired in 2022-11. We used a workaround for the security check of the Debian gpg keys. Now we need to exclude docker.com's gpg key from it.
How I did it
Update docker.com's gpg key without faketime. Update the other gpg keys with faketime '2022-11'.
How to verify it
Change to use the snapshot mirror http://packages.trafficmanager.net/snapshot. Warning: The Jessie distribution is EOL; please avoid using it if you can. The snapshot mirror will also be removed in the near future.
Why I did it
Some products might experience an occasional IO failure in the communication between the CPU and the SSD. Based on some research, it could be attributable to some devices not handling ATA NCQ (Native Command Queue). This issue currently affects 4 products: DCS-7170-32C* DCS-7170-64C DCS-7060DX4-32 DCS-7260CX3-64 DCS-7050CX3-32S
How I did it
This change disables NCQ on the affected drive for a small set of products.
How to verify it
When the fix is applied, these 2 patterns can be found in dmesg:
ata[0-9]+.00: FORCE: horkage modified (noncq)
NCQ (not used)
Test results using:
fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)
READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))
READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
Which release branch to backport (provide reason below if selected)
) Improve sudo cat command for RO user. Manually cherry-pick for sonic-net#14428
… of squashfs (sonic-net#14270) 202211 and above use a different squashfs compression type that the 201911 kernel cannot handle. Therefore, we avoid mounting squashfs altogether with this change.
Upgrade BRCM SAI to Debian package SAI 3.7.6.1-3.
…t#15083) [Build] Fix the stretch/jessie mirror removed issue.
ISSU version check fails due to inability to mount squashfs from 202211 on 201911
This PR makes two changes: - Store Jinja2 cache in LOGLEVEL DB instead of STATE DB - Store bytecode cache encoded in base64 Tested with the following command: "redis-dump -d 3 -k JINJA2_CACHE" Signed-off-by: Stepan Blyschak <[email protected]>
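The second change, storing the bytecode cache as base64, can be illustrated with a small round trip. This is a sketch under assumptions: the helper names `encode_bytecode` and `decode_bytecode` are made up for the example, and the code does not touch Redis or Jinja2 at all. It only shows how raw bytes become an ASCII string and back.

```python
import base64


def encode_bytecode(raw: bytes) -> str:
    """Encode raw cache bytes as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(raw).decode("ascii")


def decode_bytecode(stored: str) -> bytes:
    """Decode a base64 string back into the original bytes (hypothetical helper)."""
    return base64.b64decode(stored.encode("ascii"))


if __name__ == "__main__":
    blob = b"\x00\xffj2"  # stand-in for compiled template bytecode
    stored = encode_bytecode(blob)
    assert decode_bytecode(stored) == blob
    print(stored)
```

The encoded value contains only printable characters, so it can be stored and dumped as an ordinary string value.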
Why I did it
Fix: sonic-net#16086
The faketime package URL expired, which breaks the 201911 build. Update the package URL.
Work item tracking
Microsoft ADO (number only): 24930879
ef2a0cd0 [201911] [multi_asic] Script to monitor errors on internal links (sonic-net#2971)
1252e31b Changes to separate UT data for internal link monitor (sonic-net#2976)
3e6654e [201911] [multi-asic] Unit test fix for internal link monitoring (sonic-net#2977)
… script (sonic-net#16393) Monit changes to enable script to monitor SAI_PORT_STAT_IF_IN_ERRORS & SAI_PORT_STAT_IF_OUT_ERRORS on internal (backend) ports of multi-asic device.
Why I did it
Back port sonic-net#6478 and sonic-net#6519 to the 201911 branch.
Work item tracking
Microsoft ADO (number only): 24978836
How I did it
Add a check of the connection between zebra and bgp during bgpd start.
How to verify it
Modify start.h, add debug logs, and check the syslog.
Sep 22 02:41:29.716356 str-a7060cx-acs-10 INFO bgp#root: ####: start zebra
Sep 22 02:41:30.815341 str-a7060cx-acs-10 INFO bgp#root: ####: start check connection
Sep 22 02:41:30.868784 str-a7060cx-acs-10 INFO bgp#root: ####: It took 0.029979 seconds to wait for zebra to be ready to accept connections
Sep 22 02:41:30.873685 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpd
Sep 22 02:41:35.270569 str-a7060cx-acs-10 INFO bgp#root: ####: done

Sep 22 03:28:02.423438 str-a7060cx-acs-10 INFO bgp#root: ####: start zebra
Sep 22 03:28:03.731320 str-a7060cx-acs-10 INFO bgp#root: ####: start check connection
Sep 22 03:28:33.749152 str-a7060cx-acs-10 INFO bgp#root: ####: Error: zebra is not ready to accept connections
Sep 22 03:28:33.752490 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpd
Sep 22 03:28:34.259735 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpd done
Sep 22 03:28:34.755538 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpcfgd
Sep 22 03:28:35.800906 str-a7060cx-acs-10 INFO bgp#root: ####: done
…onic-net#16907) Fix a Monit false alarm in process_checker, which missed the "disk-sleep" status check, so some 201911 SONiC boxes intermittently report a "pmon|sensord" error.
#### Why I did it
Currently the psutil library returns the following process statuses:
running: The process is currently running.
sleeping: The process is sleeping or waiting for an event to occur.
disk-sleep: The process is waiting for I/O operations to complete.
stopped: The process has been stopped (e.g. via the SIGSTOP signal).
zombie: The process has terminated but is still listed in the process table.
dead: The process has terminated and has been removed from the process table.
We should regard running/sleeping/disk-sleep as normal cases and not alert in the Monit process. Currently, once disk-sleep occurs during a Monit cycle, the syslog below is paged, so this change also gets rid of that syslog output.
syslog.2.gz:Feb 24 06:12:17.394619 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host
syslog.2.gz:Feb 24 06:13:17.932531 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host
syslog.2.gz:Feb 24 06:14:18.502505 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host
Then I tried to reproduce the issue by triggering process_checker for sensord frequently and observed that the process is in "disk-sleep" status when the alert is raised.
##### Work item tracking
- Microsoft ADO **(number only)**: 17663589
#### How I did it
Fix the process_checker script by adding handling for the "disk-sleep" case.
#### How to verify it
Verified on a local DUT.
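The status rule described in this commit can be sketched as a small predicate. This is not the actual process_checker code: the names `NORMAL_STATUSES` and `is_status_normal` are hypothetical, and the function works on plain status strings (the values listed above) rather than calling psutil, so that it stays self-contained.

```python
# Statuses treated as healthy, per the commit description above.
# "disk-sleep" is the case the fix adds.
NORMAL_STATUSES = frozenset({"running", "sleeping", "disk-sleep"})


def is_status_normal(status: str) -> bool:
    """Return True if a process status string should not raise an alert."""
    return status in NORMAL_STATUSES


if __name__ == "__main__":
    for status in ("running", "sleeping", "disk-sleep", "stopped", "zombie", "dead"):
        verdict = "ok" if is_status_normal(status) else "alert"
        print(f"{status}: {verdict}")
```

In a real checker the status string would come from the process table (for example, from psutil); here it is passed in directly to keep the rule itself easy to test.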
8b9cab7 2023-10-26 [201911] Fix IfHighSpeed UT issue on 201911 (sonic-net#299)
622b771 2023-10-13 | Fix backup port rfc2863 UT to 202012 branch issue (sonic-net#298) [Hua Liu]
fa94798 2023-10-11 | Add ifhighspeed UT (sonic-net#296) [Hua Liu]
41789ca 2023-09-14 | Support interface speed for PortChannels (sonic-net#262) [Lukas Stockner]
Signed-off-by: Stepan Blyschak <[email protected]>
…onic-net#18205) * [build] Use public storage for public resources. (sonic-net#18038) * fix * fix
Why I did it
Add support for graceful reboot instead of the sysfs power cycle to avoid filesystem corruption
Work item tracking
Microsoft ADO (number only):
How I did it
Rename the platform_reboot script to pre_reboot_hook. Remove the sysfs power cycle function; from now on the Debian reboot (/sbin/reboot) is executed instead of the sysfs power cycle.
How to verify it
Start watching logs by using show log -f and journalctl -p debug -f
Execute the reboot command from the switch CLI
Check in the logs that all systemd services terminated
Signed-off-by: Jianyue Wu <[email protected]>
/azp run Azure.sonic-buildimage |
Back port from sonic-net/sonic-utilities@544584e
What I did
The same PR for 202311 doesn't have this mistake - sonic-net/sonic-utilities#3204
Why I did it
Fix missing -f option issue.
How I did it
Add back -f option.
How to verify it
Call reboot -f.
Which release branch to backport (provide reason below if selected)