Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28673: Fix issues in JSON SerDe implementations related to Decimal #5584

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

armitage420
Copy link
Contributor

@armitage420 armitage420 commented Dec 19, 2024

What changes were proposed in this pull request?

Fixing jackson parser code to increase precision of decimal values during deserialisation.

Why are the changes needed?

Jackson tree parser incorrectly parses decimal values, that leads to data issues. [Please see jira to reproduce steps and see the wrong values]

Does this PR introduce any user-facing change?

Yes

Is the change a dependency upgrade?

No

How was this patch tested?

My test case and current testclasses

@@ -131,7 +143,7 @@ POSTHOOK: type: QUERY
POSTHOOK: Input: default@json_serde3_2
#### A masked pattern was here ####
-2 FALSE NULL TRUE 1.23E45 value
-2 false NULL true 1.23E45 value
-2 false NULL true 1.23E+45 value
Copy link
Contributor

@okumin okumin Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this expression change? Is it a valid and expected change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okumin Thank you so much for the review!
This was not an expected change but definitely a valid one. It is a java issue as we are now using java.math.BigDecimal for float values. BigDecimal parses scientific notations and adds a '+' to our number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned that all applications using Hive, e.g., BI, can decode both formats or not, especially when they are developed not by Java. I could be too defensive.

@@ -172,7 +173,7 @@ public HiveJsonReader(ObjectInspector oi) {
public HiveJsonReader(ObjectInspector oi, TimestampParser tsParser) {
this.tsParser = tsParser;
this.oi = oi;
this.objectMapper = new ObjectMapper();
this.objectMapper = new ObjectMapper().enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this change be made applicable only for Primitive decimal type ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Indhumathi27 Thank you so much for your review!
The changes are required for all decimal types as jackson is not using BigDecimal even for big numbers. Hence the numbers are getting rounded off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants