Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support Armac O/850E/PSW #2153 #2154

Merged
merged 9 commits into from
Jan 18, 2024

Conversation

vytautassurvila
Copy link
Contributor

@vytautassurvila vytautassurvila commented Nov 7, 2023

This pull request is to ease discussion for #2153 issue. I have to admit that I have no experience neither with USB development neither with UPS communication protocols. It would be great to get some guidance either these changes can be incorporated into current subdriver or they do require new one?

With these changes nut can read the ups:

   2.269141     [D4] armac command Q1
   2.529908     [D4] read: ret 64 buf ef: 28 32 33 33 2e  >(233.<
   2.529949     [D3] found null byte in status bits at 45 byte, assuming 0.
   2.529970     [D3] trailing bytes in serial transmission found: 47  copied out of 63
   2.529988     [D3] armac command Q1 response read: '(233.0 000.0 234.0 020 50.0 13.7 20.8 00001000'
   2.530031     [D5] send_to_all: SETINFO input.voltage "233.0"
   2.530064     [D4] armac command I
   2.750945     [D4] read: ret 64 buf a7: 23 20 20 20 20  >#    <
   2.750988     [D3] trailing bytes in serial transmission found: 39  copied out of 63
   2.751013     [D3] armac command I response read: '#                           V2.63     '
   2.751054     [D5] send_to_all: SETINFO ups.firmware "V2.63"
   2.751071     Using protocol: Megatec 0.07
   2.751112     [D5] send_to_all: SETINFO driver.state "init.quiet"
   2.751141     [D5] send_to_all: SETINFO driver.version "2.8.1-31-ge451a97bc"
   2.751164     [D5] send_to_all: SETINFO driver.version.internal "0.36"
   2.751188     [D5] send_to_all: SETINFO driver.name "nutdrv_qx"
   2.751201     [D5] send_to_all: SETINFO driver.state "init.info"
   2.751211     [D1] upsdrv_initinfo...
   2.751237     [D5] send_to_all: SETINFO driver.version.data "Megatec 0.07"
   2.751267     [D4] armac command Q1
   3.011883     [D4] read: ret 64 buf ef: 28 32 33 33 2e  >(233.<
   3.011912     [D3] found null byte in status bits at 45 byte, assuming 0.
   3.011922     [D3] trailing bytes in serial transmission found: 47  copied out of 63
   3.011934     [D3] armac command Q1 response read: '(233.0 000.0 234.0 020 50.0 13.8 20.8 00001000'
   3.011978     [D5] send_to_all: SETINFO input.voltage.fault "0.0"
   3.012022     [D5] send_to_all: SETINFO output.voltage "234.0"
   3.012059     [D5] send_to_all: SETINFO ups.load "20"
   3.012084     [D5] send_to_all: SETINFO input.frequency "50.0"
   3.012102     [D5] send_to_all: SETINFO battery.voltage "13.8"
   3.012136     [D5] send_to_all: SETINFO ups.temperature "20.8"

Closes: #2099
Closes: #2153
Closes: #2219

drivers/nutdrv_qx.c Outdated Show resolved Hide resolved
@@ -1952,10 +1951,10 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
break;
}

if (bytes_available > ARMAC_READ_SIZE - 1) {
// if (bytes_available > ARMAC_READ_SIZE - 1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read it right with current implementation bytes_available cannot be greater than 15. But we get more bytes for Q1 response and this always truncates final response

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your "screenshots" in the PR and issue suggest that 0xEF is that first byte (tmpbuf[0]) so yeah, tmpbuf[0] & 0x0f chops it somewhat.

Curiously, further parsing apparently finds 47 useful bytes in the reply (until nul-bytes), which is 0x2F. So maybe just the 0x0F mask should be larger (here it seems the two high bits are set 0xC0 for some other meaning, and the 6 lower bits - allowing for your 64-byte buffer - are the reply length?), e.g. like tmpbuf[0] & 0x3F? CC @blaa ?

BTW, maybe 64+1 for the string's final null byte could be a safer choice, remaining pre-zeroed from that memset() above, just in case :)

Overall, looking at code and trying to remember earlier visits to its review, IIRC the original logic could only read in chunks up to 6 bytes (the ARMAC_READ_SIZE value before this PR) and collected those short reads from tmpbuf into the final buf buffer, sanity-checking it byte-by-byte and detecting the end of reply. With the interrupt approach it makes sense that it uses one interaction to send the whole response, so this sanity-checking copy would complete within the first loop cycle, I suppose.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we've never figured out the meaning of the high nibble, but it can encode longer messages.

Question is - how do one distinguish between those two protocols. The USB ID is probably the same. Trying message and falling back to interrupt might be slower and might confuse the UPS firmware. Maybe one can do it ONCE, and store the result somewhere (global static? uhm. Maybe driver has some state that can be used).

Maybe having two drivers armac and armac2 is viable, they could share some/most of the code. But the user will have to distinguish and select one.

As for the details - it certainly can be done. Would be best to have a PCAP. We have one here too:
#2099
I had some local ugly mock that allowed me to run the code against stored messages. It's getting somewhat hard to have a nice regression testing without a proper test suite and USB mocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmpbuf[0] & 0x3F works great with current ups - thanks! c395c7568572163eea62bfd67eeaec737ceb34af

BTW, maybe 64+1 for the string's final null byte could be a safer choice, remaining pre-zeroed from that memset() above, just in case :)

Not sure if I understand it correctly but probably 877913a4a4673d02f5d46b29dfd0b02d8d355410 might be the fix?

Copy link
Contributor Author

@vytautassurvila vytautassurvila Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is - how do one distinguish between those two protocols

If lsusb -v actually shows correct endpoint address bEndpointAddress and transfer type Transfer Type then probably we can use use them in code. And then we might survive with single subdriver. libusb should expose them too https://stackoverflow.com/a/16438105

If you agree with this approach I could try to implement it. Maybe you could advice in which function would be best to retrieve device descriptors (upsdrv_initups, upsdrv_initinfo, something else)?

Copy link
Member

@jimklimov jimklimov Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking of it, this would either way be a toggle to choose one of two otherwise similar behaviors. For the first approximation (and even later, for when smart ideas misfire) a manual toggle as a driver config option is a reasonable choice.

Just that later if it is unspecified, we can try to guess the best setting at run-time.

Copy link

@blaa blaa Nov 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use libusb functions direct there then maybe we need to read struct: https://libusb.sourceforge.io/api-1.0/structlibusb__endpoint__descriptor.html#a111d087a09cbeded8e15eda9127e23d2

bmAttributes: Bits 0:1 determine the transfer type and correspond to libusb_endpoint_transfer_type

So we either have it somewhere, or need to read it. Probably starting with libusb_get_device_descriptor() or maybe libusb_get_active_config_descriptor?

I'm unsure though where to retrieve it. You could do it in current armac subdriver function, but that would need some caching. Start with structure that describes the subdriver, I'd rather try to keep it in those functions. Maybe it was possible to create an subdriver init function and I've just used a ready one - I don't remember. But that would be perfect spot.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked; there's only subdriver 'command' function. So to keep it "contained" you could store it there but there's no proper state to save it.

Drivers work as separate processes AFAIK. So there should be no situation in which one creates subdriver for one UPS and then reuses it in another. So static field in armac_command... could be fine maybe?

Otherwise you'd need to change qx driver API somehow. Create subdriver-specific state pass it around... I don't think that's a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added libusb_get_active_config_descriptor and cache it in static variable. I enable new interrupt logic only when config fully matches new Armac ups. This way we hopefully won't break older ups models that works with usb_control_msg.

I have not added the toggle as driver config option. Cannot figure what would be correct name for it. And now I'm not sure should it be just boolean or also it should describe in and out endpoint addresses and max packet size?

@jimklimov
Copy link
Member

CC @blaa : WDYT?

@jimklimov jimklimov added bug USB Qx protocol driver Driver based on Megatec Q<number> such as new nutdrv_qx, or obsoleted blazer and some others impacts-release-2.8.1 Issues reported against NUT release 2.8.1 (maybe vanilla or with minor packaging tweaks) labels Nov 9, 2023
Copy link
Member

@jimklimov jimklimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation and PoC.

Like you in the initial posts, I am worried that different devices could behave differently (e.g. maybe not all would like the interrupt), so I think it is better to make it a configurable toggle - see upsdrv_makevartable() and something like langid_fix for the example precedent.

It does seem that your fix can be streamlined to get merged. In particular, maybe extending that bitmask for chunk length from 0x0F to 0x3F could be safe (though better to check against devices that worked with the original control messages), as well as double-checking to revise if ARMAC_READ_SIZE macro change echoes elsewhere... At least, many comments refer to it as a 6 (which could remain valid, with a comment that this is for control-message interactions, and maybe using a macro to mean 6 in that toggle-selected codepath).

USB_ENDPOINT_OUT + USB_TYPE_CLASS + USB_RECIP_INTERFACE,
0x09, 0x200, 0,
ret = usb_interrupt_write(udev,
USB_ENDPOINT_OUT | 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering where the "2" came from (is there a macro for it to document the meaning)? Is it just for control vs. interrupt, or a hardware-specific number (e.g. on other modules could be different)?..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking in code for any interrupt write usages and found one in nutdrv_qx.c: https://github.com/networkupstools/nut/blob/master/drivers/nutdrv_qx.c#L1504-L1505 It uses same USB_ENDPOINT_OUT | 2. As it worked for me I did not put too much thought into it.

Not sure if it could be related but lsusb -v has following fragment

      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1

And bEndpointAddress 0x02 EP 2 OUT probably explains USB_ENDPOINT_OUT | 2. And I'm not use if we can trust that but Transfer Type Interrupt probably also could be used to detect if we wan't to use interrupts or commands?

@@ -1915,7 +1914,7 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
size_t bytes_available;

/* Read data in 6-byte chunks */
ret = usb_interrupt_read(udev, 0x81,
ret = usb_interrupt_read(udev, 0x82,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change correspond with USB_ENDPOINT_OUT | 2 used in usb_interrupt_write() change above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I figured this value out via wireshark by comparing trace with manufacturer's app.

Another lsusb -v fragment:

      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               1

bEndpointAddress 0x82 EP 2 IN probably could be it

@@ -1952,10 +1951,10 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
break;
}

if (bytes_available > ARMAC_READ_SIZE - 1) {
// if (bytes_available > ARMAC_READ_SIZE - 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your "screenshots" in the PR and issue suggest that 0xEF is that first byte (tmpbuf[0]) so yeah, tmpbuf[0] & 0x0f chops it somewhat.

Curiously, further parsing apparently finds 47 useful bytes in the reply (until nul-bytes), which is 0x2F. So maybe just the 0x0F mask should be larger (here it seems the two high bits are set 0xC0 for some other meaning, and the 6 lower bits - allowing for your 64-byte buffer - are the reply length?), e.g. like tmpbuf[0] & 0x3F? CC @blaa ?

BTW, maybe 64+1 for the string's final null byte could be a safer choice, remaining pre-zeroed from that memset() above, just in case :)

Overall, looking at code and trying to remember earlier visits to its review, IIRC the original logic could only read in chunks up to 6 bytes (the ARMAC_READ_SIZE value before this PR) and collected those short reads from tmpbuf into the final buf buffer, sanity-checking it byte-by-byte and detecting the end of reply. With the interrupt approach it makes sense that it uses one interaction to send the whole response, so this sanity-checking copy would complete within the first loop cycle, I suppose.

@AppVeyorBot
Copy link

@vytautassurvila vytautassurvila force-pushed the armac-2023 branch 3 times, most recently from 332407d to 8eb03e0 Compare November 13, 2023 23:11
@AppVeyorBot
Copy link

Support for Armac UPS that uses interrupt instead of control msg.
It will be enabled automatically only for devices that reports
0x82 as in endpoint address with interrupt transfer type

Signed-off-by: Vytautas Survila <[email protected]>
@vytautassurvila vytautassurvila changed the title proof of concept Armac O/850E/PSW #2153 support Armac O/850E/PSW #2153 Nov 21, 2023
@vytautassurvila vytautassurvila marked this pull request as ready for review November 21, 2023 21:48
…n-after-statement" situations

Also log if effectively skipping the method (not libusb-1.x build)

Signed-off-by: Jim Klimov <[email protected]>
…ement"

Also minor cleanup with comments markup

Signed-off-by: Jim Klimov <[email protected]>
@jimklimov
Copy link
Member

Quickly addressed a few issues highlighted by CI so far, hope to read into the substance of the changes a bit later, maybe weekend.

@AppVeyorBot
Copy link

@jimklimov jimklimov added this to the 2.8.2 milestone Dec 10, 2023
@blaa
Copy link

blaa commented Jan 13, 2024

I've built this version and confirmed it (seemingly) still works with my Armac.

@@ -1911,17 +2036,17 @@ static int armac_command(const char *cmd, char *buf, size_t buflen)
memset(buf, 0, buflen);

bufpos = 0;
while (bufpos + ARMAC_READ_SIZE < buflen) {
while (bufpos + read_size + 1 < buflen) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This +1 along 0x3f instead of 0x0f is pretty much the only change I'm not certain won't affect all other versions.

We'd need to just test it on hardware I guess. Or create a database of recorded communications and do a regression testing on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the +1 is for ensuring a zero-byte in the end (of buffer to be provided big enough by the caller).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a bit more about this, it may be prudent to check that buflen > read_size in advance: if the caller's buffer is too small to fit even the first read eventually (if usb_interrupt_read() would return the full requested read_size and not fewer bytes for some reason), this is more of a programmer error to upsdebugx() as such than a device/protocol error. I think it can help long-term troubleshooting as this code evolves.

Then there's a chance that we expect to read say 64 bytes (due to if use_interrupt ... above) but only get say 6 in reality (guessing here, no idea if this reality is possible - WDYT?), and the caller's buffer is big enough for 6 but not for 64. With current while-clause, we would not even bother trying a query. Is this a correct thing to do?

Just trying to think of things that could go wrong, so might brainstorm up some garbage ideas here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be prudent to check that buflen > read_size in advance

If I understand it right we already have this check - we won't enter loop at all in that case while (bufpos + read_size + 1 < buflen). And bellow the loop we emit the error for this specific case upsdebugx(2, "Protocol error, too much data read.");. It still had hardcoded read_size that was fixed via 66ff781

Then there's a chance that we expect to read say 64 bytes (due to if use_interrupt ... above) but only get say 6 in reality

At least in my case I do always get 64 bytes from usb_interrupt_read. From them we have only 47 meaningful bytes that will be copied to buf. This is handled via bytes_available = (unsigned char)tmpbuf[0] & 0x3f;. In theory this function could be changed to work when caller's buffer would be less than 64 bytes. But then we are risking that all data we read from interrupt won't fit to buf. I'm not sure which way is better?


/* Any errors here mean that we are unable to read a reply
* (which will happen after successfully writing a command
* to the UPS) */
if (ret != ARMAC_READ_SIZE) {
if (ret != read_size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we loop here (in current code and the earlier one), would it be less misleading to return not the latest ret (if positive, just too short presumably), but a count of successfully read bytes e.g. bufpos + ret? If ret is negative (for error), probably pass it through as is and disregard any earlier success? Or return bufpos anyway, if positive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Fixed via dc3299e

Copy link
Member

@jimklimov jimklimov Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vytautassurvila @blaa : any particular preference for bufpos vs. bufpos + ret in this case?

Generally I think we can fall into this situation if the device has less info to send than exactly the read_size. Like, it has 8 bytes for some message and we read in 6-byte chunks, so our buffer gets ret=6 in one cycle and ret=2 in the other quite validly - and if ret is always non-negative, we can tell the caller that we have collected 8 bytes.

But this is a general train of thought, you may be better aware about what does and does not happen in these particular interactions :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken here we just received data to tmpbuf. It will be copied to buf only at line 2107. I guess idea is to copy data to buf only when some validation is performed on data we received.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good catch you too :")

So if we abort here, buf has only received bufpos finished bytes indeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose, here and elsewhere we declare that correct reads always return exactly read_size and may not be shorter, to the best of our practical knowledge at this time at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can speak only about my UPS - usb_interrupt_read and read_size always match. Within those 64 bytes not all bytes are meaningful - that is determined by bit masking first byte.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the insights. LGTM I guess. The caller's buffer is large enough for the tested devices, I suppose?

@jimklimov jimklimov merged commit b583c96 into networkupstools:master Jan 18, 2024
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug impacts-release-2.8.1 Issues reported against NUT release 2.8.1 (maybe vanilla or with minor packaging tweaks) Qx protocol driver Driver based on Megatec Q<number> such as new nutdrv_qx, or obsoleted blazer and some others USB
Projects
None yet
4 participants