rpad: value mismatch between Presto and Velox #12343

Open
peterenescu opened this issue Feb 14, 2025 · 2 comments
Labels: bug, fuzzer-found

Comments

peterenescu (Contributor) commented Feb 14, 2025

Description

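rpad produces mismatched values between Presto and Velox, and in the fuzzer run this mismatch propagates into an invalid to_base64 value. In the reproduction below, both engines return the same 20-byte hmac_sha1 value followed by the same repeating 8-byte pad, but the Velox result is 92 bytes long while the Presto result is 60 bytes long.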
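For reference, the Presto documentation describes `rpad(binary, size, padbinary)` as right-padding `binary` to `size` bytes with `padbinary` (repeated as needed), truncating when `size` is smaller than the input, with `size` non-negative and `padbinary` non-empty. The Python sketch below models those documented semantics so the two engine outputs can be checked against them; the helper `rpad_varbinary` is hypothetical and not part of either engine.

```python
def rpad_varbinary(data: bytes, size: int, pad: bytes) -> bytes:
    """Hypothetical model of Presto's documented rpad(binary, size, padbinary)."""
    if size < 0:
        raise ValueError("size must not be negative")
    if not pad:
        raise ValueError("padbinary must be non-empty")
    if size <= len(data):
        # Truncate when the target size does not exceed the input length.
        return data[:size]
    # Repeat the pad bytes cyclically until exactly `size` bytes are produced.
    fill = size - len(data)
    repeats = fill // len(pad) + 1
    return data + (pad * repeats)[:fill]


# 20-byte hmac_sha1 value and 8-byte pad, copied from the outputs in this issue.
hmac_prefix = bytes.fromhex("aa57d92adaaec68333a175453d9d0554d2e03565")
pad = bytes.fromhex("985caaaa46295d21")
print(rpad_varbinary(hmac_prefix, 60, pad).hex(" "))
```

Under this model, both outputs in the reproduction are well-formed rpad results: they correspond to a target size of 92 for Velox and 60 for Presto with otherwise identical inputs.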

Reproduction

(Velox query and result first, followed by Presto)

presto:di> select rpad(hmac_sha1(c1, to_big_endian_32((INTEGER '520535002' + (- c2)))), levenshtein_distance(reverse(varchar '{V$~ZG)uW|?Z?1W'), c3), spooky_hash_v2_64(c4)) from (values (varbinary '(D', varbinary 'XSmo6^)oy''Zhx#G{G5%1/&)9kAd%1tvqrHqclkc&AmEoXxj4?.$bb%?^''y~^EZ)t@p)kH3_()%{rg=y[3M%,e0BfY?R[)xO>', 1772733589, varchar '?VV?v4Mt28hy0^qX-wG[tc0Ejf,xn1oK~npx2)$CB5_BfD~9>uG2AY0L''bBaQch8', varbinary 'Y')) t(c0, c1, c2, c3, c4);
                      _col0                      
-------------------------------------------------
 aa 57 d9 2a da ae c6 83 33 a1 75 45 3d 9d 05 54 
 d2 e0 35 65 98 5c aa aa 46 29 5d 21 98 5c aa aa 
 46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa 
 46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa 
 46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa 
 46 29 5d 21 98 5c aa aa 46 29 5d 21             
(1 row)
presto:di> select rpad(hmac_sha1(varbinary 'XSmo6^)oy''Zhx#G{G5%1/&)9kAd%1tvqrHqclkc&AmEoXxj4?.$bb%?^''y~^EZ)t@p)kH3_()%{rg=y[3M%,e0BfY?R[)xO>', to_big_endian_32((INTEGER '520535002' + (- 1772733589)))), levenshtein_distance(reverse(varchar '{V$~ZG)uW|?Z?1W'), varchar '?VV?v4Mt28hy0^qX-wG[tc0Ejf,xn1oK~npx2)$CB5_BfD~9>uG2AY0L''bBaQch8'), spooky_hash_v2_64(varbinary 'Y'));
                      _col0                      
-------------------------------------------------
 aa 57 d9 2a da ae c6 83 33 a1 75 45 3d 9d 05 54 
 d2 e0 35 65 98 5c aa aa 46 29 5d 21 98 5c aa aa 
 46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa 
 46 29 5d 21 98 5c aa aa 46 29 5d 21             
(1 row)
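One way to quantify the difference is to parse both hex dumps and compare them byte by byte. The Python sketch below does that; the dumps are copied from the two results above, and the helper names `parse_dump` and `common_prefix_len` are illustrative only.

```python
# Hex dumps copied from the Velox and Presto results above.
VELOX_DUMP = """
aa 57 d9 2a da ae c6 83 33 a1 75 45 3d 9d 05 54
d2 e0 35 65 98 5c aa aa 46 29 5d 21 98 5c aa aa
46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa
46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa
46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa
46 29 5d 21 98 5c aa aa 46 29 5d 21
"""

PRESTO_DUMP = """
aa 57 d9 2a da ae c6 83 33 a1 75 45 3d 9d 05 54
d2 e0 35 65 98 5c aa aa 46 29 5d 21 98 5c aa aa
46 29 5d 21 98 5c aa aa 46 29 5d 21 98 5c aa aa
46 29 5d 21 98 5c aa aa 46 29 5d 21
"""


def parse_dump(dump: str) -> bytes:
    """Turn a whitespace-separated CLI hex dump into raw bytes."""
    return bytes.fromhex(" ".join(dump.split()))


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count the leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


velox = parse_dump(VELOX_DUMP)
presto = parse_dump(PRESTO_DUMP)
print(len(velox), len(presto), common_prefix_len(velox, presto))
```

On these dumps the Presto result is a 60-byte prefix of the 92-byte Velox result, so the two values differ only in how many times the pad sequence is repeated.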

Relevant logs

Some additional runs showing that, among the nested calls, rpad is the source of the value differences. hmac_sha1 on its own returns matching values (Velox followed by Presto):

presto:di> select hmac_sha1(c1, to_big_endian_32((INTEGER '520535002' + (- c2)))) from (values (varbinary '(D', varbinary 'XSmo6^)oy''Zhx#G{G5%1/&)9kAd%1tvqrHqclkc&AmEoXxj4?.$bb%?^''y~^EZ)t@p)kH3_()%{rg=y[3M%,e0BfY?R[)xO>', 1772733589)) t(c0, c1, c2);
                      _col0                      
-------------------------------------------------
 aa 57 d9 2a da ae c6 83 33 a1 75 45 3d 9d 05 54 
 d2 e0 35 65                                     
(1 row)
presto:di> select hmac_sha1(varbinary 'XSmo6^)oy''Zhx#G{G5%1/&)9kAd%1tvqrHqclkc&AmEoXxj4?.$bb%?^''y~^EZ)t@p)kH3_()%{rg=y[3M%,e0BfY?R[)xO>', to_big_endian_32((INTEGER '520535002' + (- 1772733589))));
                      _col0                      
-------------------------------------------------
 aa 57 d9 2a da ae c6 83 33 a1 75 45 3d 9d 05 54 
 d2 e0 35 65                                     
(1 row)
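To pin down the exact rpad input bytes outside either engine, the hmac_sha1 argument can be rebuilt in Python. This is a sketch under stated assumptions: Presto's `hmac_sha1(binary, key)` takes the message first and the key second, `to_big_endian_32` yields the 4-byte big-endian two's-complement encoding, and a `varbinary 'text'` literal is the UTF-8 bytes of the text with SQL's `''` escape reduced to a single quote. The helper names are hypothetical.

```python
import hashlib
import hmac
import struct

# c1 exactly as written in the SQL; '' is the SQL escape for a single quote.
C1_SQL = "XSmo6^)oy''Zhx#G{G5%1/&)9kAd%1tvqrHqclkc&AmEoXxj4?.$bb%?^''y~^EZ)t@p)kH3_()%{rg=y[3M%,e0BfY?R[)xO>"


def to_big_endian_32(value: int) -> bytes:
    """Assumed model: 4-byte big-endian two's-complement encoding of a 32-bit int."""
    return struct.pack(">i", value)


def hmac_sha1(message: bytes, key: bytes) -> bytes:
    """Assumed model of Presto's hmac_sha1(binary, key)."""
    return hmac.new(key, message, hashlib.sha1).digest()


# Assumption: the varbinary literal is the UTF-8 encoding of the unescaped text.
message = C1_SQL.replace("''", "'").encode("utf-8")
# INTEGER '520535002' + (- 1772733589) stays within the 32-bit range.
key = to_big_endian_32(520535002 + (-1772733589))
digest = hmac_sha1(message, key)

# Compare with the 20-byte value that both engines printed.
print(digest.hex(" "))
print(digest == bytes.fromhex("aa57d92adaaec68333a175453d9d0554d2e03565"))
```

Because both engines already agree on this value, the sketch is mainly useful for confirming the argument order and literal encoding before isolating rpad.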

to_base64 also mismatches, but only because its input is already mismatched, cascading from the rpad call (Velox followed by Presto):

presto:di> SELECT to_base64(hmac_sha256(c0, rpad(hmac_sha1(c1, to_big_endian_32((INTEGER '520535002' + (- c2)))), levenshtein_distance(reverse(varchar '{V$~ZG)uW|?Z?1W'), c3), spooky_hash_v2_64(c4)))) from (values (varbinary '(D', varbinary 'XSmo6^)oy''Zhx#G{G5%1/&)9kAd%1tvqrHqclkc&AmEoXxj4?.$bb%?^''y~^EZ)t@p)kH3_()%{rg=y[3M%,e0BfY?R[)xO>', 1772733589, varchar '?VV?v4Mt28hy0^qX-wG[tc0Ejf,xn1oK~npx2)$CB5_BfD~9>uG2AY0L''bBaQch8', varbinary 'Y')) t(c0, c1, c2, c3, c4);
                    _col0                     
----------------------------------------------
 4fH6sTrTZQrRf9VXR9Gx15rY/clorhuuiWWoRb3mHsQ= 
(1 row)
presto:di> SELECT to_base64(hmac_sha256(varbinary '(D', rpad(hmac_sha1(varbinary 'XSmo6^)oy''Zhx#G{G5%1/&)9kAd%1tvqrHqclkc&AmEoXxj4?.$bb%?^''y~^EZ)t@p)kH3_()%{rg=y[3M%,e0BfY?R[)xO>', to_big_endian_32((INTEGER '520535002' + (- 1772733589)))), levenshtein_distance(reverse(varchar '{V$~ZG)uW|?Z?1W'), varchar '?VV?v4Mt28hy0^qX-wG[tc0Ejf,xn1oK~npx2)$CB5_BfD~9>uG2AY0L''bBaQch8'), spooky_hash_v2_64(varbinary 'Y'))));
                    _col0                     
----------------------------------------------
 ZvCM1VjZw2OiZN19hNlVbd5PzMujGtykCr6caST+KCY= 
(1 row)
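The cascade can be illustrated by feeding the two observed rpad results into an HMAC-SHA256 and base64 step. This Python sketch assumes Presto's `hmac_sha256(binary, key)` takes the message first and the key second, and that `varbinary '(D'` is the two ASCII bytes 0x28 0x44; the rpad results are rebuilt from the hex dumps in the reproduction, and the helper name is hypothetical.

```python
import base64
import hashlib
import hmac

# 20-byte hmac_sha1 value and 8-byte pad, copied from the reproduction dumps.
HMAC_SHA1 = bytes.fromhex("aa57d92adaaec68333a175453d9d0554d2e03565")
PAD = bytes.fromhex("985caaaa46295d21")

# Observed rpad results: the pad repeated 9 times (92 bytes, Velox)
# and 5 times (60 bytes, Presto).
velox_key = HMAC_SHA1 + PAD * 9
presto_key = HMAC_SHA1 + PAD * 5

# c0 = varbinary '(D', assumed to be the ASCII bytes 0x28 0x44.
C0 = b"(D"


def to_base64_hmac_sha256(message: bytes, key: bytes) -> str:
    """Assumed model of to_base64(hmac_sha256(message, key))."""
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


velox_b64 = to_base64_hmac_sha256(C0, velox_key)
presto_b64 = to_base64_hmac_sha256(C0, presto_key)

# Values reported in this issue, for side-by-side comparison.
REPORTED_VELOX = "4fH6sTrTZQrRf9VXR9Gx15rY/clorhuuiWWoRb3mHsQ="
REPORTED_PRESTO = "ZvCM1VjZw2OiZN19hNlVbd5PzMujGtykCr6caST+KCY="
print(velox_b64, velox_b64 == REPORTED_VELOX)
print(presto_b64, presto_b64 == REPORTED_PRESTO)
```

Since the message is the same in both runs and only the key differs, any difference in the rpad output is enough to produce different to_base64 strings, which is consistent with to_base64 itself not being at fault.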
peterenescu added the bug, fuzzer, and fuzzer-found labels on Feb 14, 2025.
kagamiori (Contributor) commented:

Hi @peterenescu, thank you for creating this issue. Could the example queries be simplified further, e.g., by removing irrelevant subexpressions or replacing subexpressions with their results as literal constants? Also, there are multiple example queries; which one should I look at if I'd like to understand the difference between rpad in Velox and rpad in Presto?

peterenescu removed the fuzzer and fuzzer-found labels on Feb 20, 2025.

peterenescu (Contributor, Author) commented:

Just updated the issue and moved the extraneous queries to Relevant logs, in case whoever reproduces this would like the full fuzzer-run queries.
