Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selection of boundaries for the default implementation of is_case #21

Open
r-bk opened this issue Jul 18, 2024 · 3 comments
Open

Selection of boundaries for the default implementation of is_case #21

r-bk opened this issue Jul 18, 2024 · 3 comments

Comments

@r-bk
Copy link

r-bk commented Jul 18, 2024

@rutrum Hi!
Firstly, thanks for such a useful crate!

I have found the following example which IMO has a not intuitive result.

#[test]
fn test_upper_snake() {
    use convert_case::{Case, Casing};
    assert!("UPPER_CASE_WITH_DIGIT1".is_case(Case::UpperSnake));
}

This test fails, even though UpperSnake case has only a single boundary _ and the string is in capital letters.

This happens because the implementation of is_case initializes the StateConverter with Boundary::defaults() which includes UpperDigit boundary.

Is this by design, or a bug?

If a bug, would the correct implementation of is_case be something like this?

fn is_case(&self, case: Case) -> bool {
    &self.from_case(case).to_case(case) == self
}
r-bk added a commit to r-bk/tighterror that referenced this issue Jul 19, 2024
This commit fixes an issue with `convert_case` crate reported in [1].

The issue is that the default implementation of `is_case` provided by
that crate initializes the converter with almost all available
boundaries.

As a result the following conversion takes place:

"ERR1".to_case(Case::UpperSnake) -> "ERR_1"

Furthermore, the following check fails, while it seems like it
shouldn't because the string is in capital letters.

"ERR1".is_case(Case::UpperSnake)

This commit introduces a new primitive `convert_case` which receives two
case parameters, one for the source case and one of the target case.
This way the converter is initialized with the correct source case
boundaries. And it allows implementation of a new `is_case` primitive
that builds on `convert_case` to select the source case boundaries
correctly.

[1] rutrum/convert-case#21
r-bk added a commit to r-bk/tighterror that referenced this issue Jul 19, 2024
This commit fixes an issue with `convert_case` crate reported in [1].

The issue is that the default implementation of `is_case` provided by
that crate initializes the converter with almost all available
boundaries.

As a result the following conversion takes place:

"ERR1".to_case(Case::UpperSnake) -> "ERR_1"

Furthermore, the following check fails, while it seems like it
shouldn't because the string is in capital letters.

"ERR1".is_case(Case::UpperSnake)

This commit introduces a new primitive `convert_case` which receives two
case parameters, one for the source case and one of the target case.
This way the converter is initialized with the correct source case
boundaries. And it allows implementation of a new `is_case` primitive
that builds on `convert_case` to select the source case boundaries
correctly.

[1] rutrum/convert-case#21
@rutrum
Copy link
Owner

rutrum commented Dec 17, 2024

Hi @r-bk . Thanks for reaching out. I'm glad you like the crate.

I agree with you, I think this is a non-intuitive result. I thought your suggestion made sense, but it failed some tests. Take the following:

assert!(!"kebab-case-string".is_case(Case::Snake))

Only splitting based on underscore in this case would be counter to the intent. We try generously to split on all boundaries, so we return false when we join back with the case-specific delimiter.

I think the unexpected part of this code is that UpperDigit boundary is being used by default. Digits are still an ongoing problem I'm trying to formalize (see http://stringcase.org, site WIP). Depending on context, they are sometimes boundaries, sometimes not.

An example brought to my attention from another use was "2d_transformation". This is case of digit first, then letter. In this case, I wouldn't want to split here. But for "Transformation2D" I might split on "n2". But then, the unlikely scenario of "TRANSFORMATION2D"...this doesn't feel common, and I don't even think it's obvious that you'd want to split on UpperDigit all the time.

I think I'll remove UpperDigit from the list of defaults. Do you think that makes sense? Should another digit boundary be removed from the defaults?

@r-bk
Copy link
Author

r-bk commented Dec 18, 2024

Hi @rutrum !
Thanks for the response and the explanation why my snippet doesn't work in all cases.

There are no easy choices :) So I am thinking loud here.

Intuitively removing LowerDigit too (for consistency) seems like a good idea.
But the example that you gave with Transformation2D kills it.
Maybe the definition of Acronym should be expanded in a way that would allow removal of LowerDigit too? But then there is an example of json2text. Now it becomes json_2_text, and it would become json2_text (which is wrong). But should it be split at all?

Specifically to fix the is_case function, maybe instead of splitting and comparing to self
it should (a) check if self conforms to case without splitting it and (b) check that self doesn't contain "other" delimiters ? Are there other conditions that must be checked?

@rutrum
Copy link
Owner

rutrum commented Dec 20, 2024

I think you are correct, this isn't an issue of default boundaries (that might be a separate issue, however.) It's really that the mechanism of performing the transformation isn't good enough. This function would actually require manual checks.

I agree with your outline. The is_case function should first check what delimiters are present. If they are the correct delimiters for a certain case, then they can be used to split the identifier into words. Then, each word can be checked to see if they follow a particular "word case", which might be all lower, upper then lower, or all upper.

As I get back into this project, I will put this item on the todo list of improvements to make. I don't know if it will make it to 0.7, but this is clearly something that should be fixed. Thanks for bringing it to my attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants