Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research fallout from usbhid-ups fix for LP64 bit maths #1138

Closed
wants to merge 6 commits into from

Conversation

jimklimov
Copy link
Member

@jimklimov jimklimov commented Oct 20, 2021

Follows up from #1023 (comment) and PRs #1040 and #1055

This PR increases logging detail for report posted in the issue (apparently impacted development on #616), and brings back code of implementation obsoleted by PR #1040 so it is easier to compare the two in real time.

Hopefully this branch will get tested for different endiannesses by the QEMU side of NUT CI farm.

CC @nbriggs @aquette @blecher-at

@jimklimov
Copy link
Member Author

jimklimov commented Oct 20, 2021

Adapting from original issue report in #1023:

Without the changes to drivers/hidparser.c (done by PR #1040), I see 0x1709 converted to 232,7 and 0x14 to 20, which is correct:

0.315012 [D3] Report[get]: (3 bytes) => 0f 17 09
0.315152 [D1] Path: UPS.Input.Voltage, Type: Feature, ReportID: 0x0f, Offset: 0, Size: 16, Value: 232.7
0.229773 [D3] Report[get]: (2 bytes) => 13 14
0.229936 [D1] Path: UPS.Output.PercentLoad, Type: Feature, ReportID: 0x13, Offset: 0, Size: 8, Value: 20
0.228399 [D3] Report[get]: (3 bytes) => 1c f6 01
0.228577 [D1] Path: UPS.Input.Frequency, Type: Feature, ReportID: 0x1c, Offset: 0, Size: 16, Value: 50.2

And after the changes made there, I see wrong values of -0.1, -1 respectively:

0.309297 [D3] Report[get]: (3 bytes) => 0f 17 09
0.309590 [D1] Path: UPS.Input.Voltage, Type: Feature, ReportID: 0x0f, Offset: 0, Size: 16, Value: -0.1
0.229773 [D3] Report[get]: (2 bytes) => 13 14
0.225024 [D1] Path: UPS.Output.PercentLoad, Type: Feature, ReportID: 0x13, Offset: 0, Size: 8, Value: -1

From Jim: On my side, so far I did not reproduce this at least in the test program - with both implementations of GetValue(), a mock of this report is parsed the same way. It may be that some other routine in the real code path expects e.g. an inverted byte order, or the real pData lacks Unit(Exp), LogMin/LogMax and other such settings.

@blecher-at
Copy link
Contributor

I ran the new unittests, and also ran the more verbose version against my ups.

in the unittest, the min/max values are correct:
Test #6 - NEW - buf "0f 17 09" offset 0 size 16 logmin 0 (0x0) logmax 65535 (0xffff) value 2327 PASS
Test #6 - OLD - buf "0f 17 09" offset 0 size 16 logmin 0 (0x0) logmax 65535 (0xffff) value 2327 PASS

on the real hardware however those values are wrong, but I don't know how min/max are derived. if log in logmin/logmax means historical (logged) values, it will always be wrong as we don't have any previous values...

0.298930 [D3] Report[get]: (3 bytes) => 12 fe 08
0.299061 [D1] Path: UPS.Output.Voltage, Type: Feature, ReportID: 0x12, Offset: 0, Size: 16, Unit: 15782177, UnitExp: 6, LogMin: 0, LogMax: -1, PhyMin(notused): 0, PhyMax(notused): 0, Value: -0.1
0.299418 [D3] Report[get]: (2 bytes) => 13 14

@jimklimov
Copy link
Member Author

jimklimov commented Oct 20, 2021

Code comments say it is "logical" Min/Max, and at/near https://github.com/networkupstools/nut/pull/1138/files#diff-8dc52da32d935c64c234bbd7955b6db39136d4d21ed30b6e0eca1a58ffb1311cL400 the parsed long undergoes conversion to double, then conversion "from logical to physical" (whatever that means), and base-10 exponentiation.

In your report, the "Unit" seems weird; wondering if it is at all initialized or random-memory bits, and how it did not impact in older code. And also the LogMin/LogMax seem bogus, but less bogus so to say.

I'm away from computer now for the day, so can only guess these fields were ignored by older code, maybe?..

@blecher-at
Copy link
Contributor

min max were always 0, -1 (which can't be correct) on the wrongly reported values.
see the full report issue_1138_full.txt

@blecher-at
Copy link
Contributor

blecher-at commented Oct 20, 2021

min/max 0,-1 triggered the following (old) code:

	/* figure out how many bits are significant */
	range = pData->LogMax - pData->LogMin + 1;
	if (range <= 0) {
		/* makes no sense, give up */
		*pValue = value;
		return;
	}

If I restore this (
on the main branch: https://github.com/networkupstools/nut/pull/1140/files
(or on my legrand issue: https://github.com/networkupstools/nut/pull/1139/files#diff-fd503d938f5fdf52c584c8643bacca0ed7b9dbdd2fc24c9f7c19abda651313deR467) it works again as expected, but maybe there is a better, cleaner solution to handle this.

@jimklimov
Copy link
Member Author

jimklimov commented Oct 20, 2021 via email

@nbriggs
Copy link
Contributor

nbriggs commented Oct 20, 2021

Thanks for the report. The thing to look for is where the Logical Minimum, and Logical Maximum are being derived. Units isn't involved at this stage (though it looks odd and it would be good to know if that value is actually correct). Your UPS is reporting back a 16-bit value, which is being correctly extracted, but because the logical maximum value is reported as a signed value (-1) rather than unsigned (65535) the newer code is going to honor that rather than throwing up its hands.

@jimklimov
Copy link
Member Author

jimklimov commented Oct 20, 2021 via email

@nbriggs
Copy link
Contributor

nbriggs commented Oct 20, 2021

Thinking out loud here: LogMin and LogMax get set up in HIDParse (hidparser.c:305 et seq), which come from the report descriptor -- which I think was in your diagnostic output so I'll have a look. I'm a little suspicious of hidparser.c:FormatValue, which is ONLY used to extract the logical/physical x min/max values, and doesn't seem to be capable of returning +65535 from a 2-byte value... time to go read the spec. In the cases I was looking at for extracting item values correctly (based on the report descriptor + bytes read) all the report descriptors were correct. You've got a case here where the report descriptor is either plain wrong, or misinterpreted.

@jimklimov
Copy link
Member Author

jimklimov commented Oct 20, 2021 via email

@nbriggs
Copy link
Contributor

nbriggs commented Oct 20, 2021

Don't know... that's why I need to read the spec and look at the data transmitted in the report descriptor. We may be chasing code that's throwing away information and then later trying to guess (sometimes incorrectly) what that information was.

@nbriggs
Copy link
Contributor

nbriggs commented Oct 20, 2021

The values that are pulled out of the report descriptor (for the min/max) can be 1, 2, or 4 bytes long. That's independent of the size of the values in the items that are being sent later. For example, the descriptor could say in 4 bytes that the range is 0..65535 and then later the item value could be transmitted as 1 or 2 bytes as necessary to represent the instantaneous reading. I haven't got as far as reading the spec yet. Regardless, the code base could use a test interface to call the HIDParse() code (if there isn't already one...) so that we can take the bytes that are the report descriptors from the log and see whether they make sense, and test fixes.

@nbriggs
Copy link
Contributor

nbriggs commented Oct 21, 2021

First: With respect to the Unit code, 15782177, (aka 0xF0D121) -- that is indeed the unit code for Volts.

Second: I fed the report descriptor to USB Report Descriptor Parser -- an independent implementation of the HIDParser -- and this is the result:
report-decoded.txt

To pick one of the problematic items:

   0.297229     [D1] Path: UPS.Input.Voltage, Type: Feature, ReportID: 0x0f, Offset: 0, Size: 16, Unit: 15782177, UnitExp: 6, LogMin: 0, LogMax: -1, PhyMin(notused): 0, PhyMax(notused): 0, Value: -0.1

we see that:

0x85, 0x0F,        //     Report ID (15)
0x75, 0x10,        //     Report Size (16)
0x09, 0x30,        //     Usage (0x30)
0x67, 0x21, 0xD1, 0xF0, 0x00,  //     Unit (System: SI Linear, Length: Radians, Mass: Gram)
0x55, 0x06,        //     Unit Exponent (6)
0x26, 0xFF, 0xFF,  //     Logical Maximum (-1)
0x16, 0x00, 0x00,  //     Logical Minimum (0)

which indeed says that the acceptable range is 0..-1. If they meant to say 0..65535, they should have described the LogicalMaximum in a 4-byte field (as they did in Report ID 9) instead of a 2-byte field:

   0.292855     [D1] Path: UPS.PowerSummary.ConfigVoltage, Type: Feature, ReportID: 0x09, Offset: 0, Size: 16, Unit: 15782177, UnitExp: 6, LogMin: 0, LogMax: 65535, PhyMin(notused): 0, PhyMax(notused): 0, Value: 12

which had a report descriptor of

0x85, 0x09,        //     Report ID (9)
0x75, 0x10,        //     Report Size (16)
0x27, 0xFF, 0xFF, 0x00, 0x00,  //     Logical Maximum (65534)

(One could be forgiven for wondering why this report decoder thinks that FFFF0000 (little-endian 32-bit value) is 65534 not 65535!! -- I'm not going to read his Javascript to find out)

I do notice that since I last visited the specs there has been an update to the USB Usage Tables for HID Power Devices (May 29, 2020) -- unfortunately I think the extent of the update is to newly disclaim responsibility for using the information it contains. :-(

So... I think there are a few things to be done

  1. Tell the UPS manufacturer that they got the report descriptor values wrong and ought to fix it in some newer revision of their firmware (fat chance of that happening, I would guess!)
  2. Put a hack in the code that parses the Report Descriptor (not the code that I fixed to extract the values) so that if it is parsing a LogicalMaximum value, and that value is negative and less than the LogicalMinimum (or perhaps only if it's all-bits-set for the size it's using), then it should reinterpret it as an unsigned value and log it as buggy data from the device. Since this happens only when the initial connection is made to the device, not each time it reports the actual value (e.g., for UPS.Input.Voltage), you don't need to do any special handling to reduce the chatter. I think It might be reasonable to put this at Warning level.
  3. Have a look at all the report descriptors from any devices you can get your hands on and see if they make sense.

Comments?

@jimklimov
Copy link
Member Author

Thanks @nbriggs for the fine research, I guess the "hack in the code" makes sense for the time being; at least it would make devices usable again even if with lesser peace of mind that the values are trustworthy protocol-wise.

Unfortunately I do not have USB-connected UPSes currently, so might not be able to help with data-point collections in short term.

@jimklimov jimklimov mentioned this pull request Oct 22, 2021
@jimklimov
Copy link
Member Author

jimklimov commented Oct 26, 2021

I think as a "hotfix" I'll go with merging #1146 (based on #1040 contents) to master for now, but a smarter solution would be welcome now that we are aware there is a problem and a solution is possible :) Even if it would be a hack like number-two above.

@jimklimov jimklimov force-pushed the issue_1023_fallout_qemu branch from dc3bec9 to cc47d78 Compare November 2, 2021 09:45
@jimklimov
Copy link
Member Author

NOTE: Rebased over recent master with its bunch of recipe fixes; should make it easier to ./autogen.sh && ./configure && (cd tests && make all) (or cd drivers etc.) without necessarily rebuilding all of NUT, as one would iterate on "proper-hack" fixes.

@jimklimov
Copy link
Member Author

Thinking out loud: are you aware of any software emulators of UPS protocols - scripts, VMs, etc. that would talk on a Unix or network socket to act like a serial/snmp/... UPS? Maybe even something that would even "be" an USB device (maybe as a kernel driver emulating one, or something that can be "attached" to a VM vUSB port)?

I suppose that would be very helpful for NUT driver debugging and unit-testing, especially around different CPU endianness and word size definitions (while interacting with the same bit stream on the cable).

@jimklimov
Copy link
Member Author

Closing the PR, since this is probably not something we would actually merge, and to avoid rebuilds whenever master changes.

The branch remains available for testing if would be needed (and invoking CI automation to test in QEMU machines with many architectures).

@jimklimov jimklimov closed this Dec 28, 2021
@jimklimov jimklimov added the USB-HID encoding/LogMin/LogMax Issues and solutions (PRs) specifically about incorrect values in bitstream label Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
USB-HID encoding/LogMin/LogMax Issues and solutions (PRs) specifically about incorrect values in bitstream
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants