Selection of boundaries for the default implementation of `is_case` #21

r-bk · 2024-07-18T08:22:33Z

@rutrum Hi!
Firstly, thanks for such a useful crate!

I have found the following example which IMO has a not intuitive result.

#[test]
fn test_upper_snake() {
    use convert_case::{Case, Casing};
    assert!("UPPER_CASE_WITH_DIGIT1".is_case(Case::UpperSnake));
}

This test fails, even though UpperSnake case has only a single boundary _ and the string is in capital letters.

This happens because the implementation of is_case initializes the StateConverter with Boundary::defaults() which includes UpperDigit boundary.

Is this by design, or a bug?

If a bug, would the correct implementation of is_case be something like this?

fn is_case(&self, case: Case) -> bool {
    &self.from_case(case).to_case(case) == self
}

The text was updated successfully, but these errors were encountered:

This commit fixes an issue with `convert_case` crate reported in [1]. The issue is that the default implementation of `is_case` provided by that crate initializes the converter with almost all available boundaries. As a result the following conversion takes place: "ERR1".to_case(Case::UpperSnake) -> "ERR_1" Furthermore, the following check fails, while it seems like it shouldn't because the string is in capital letters. "ERR1".is_case(Case::UpperSnake) This commit introduces a new primitive `convert_case` which receives two case parameters, one for the source case and one of the target case. This way the converter is initialized with the correct source case boundaries. And it allows implementation of a new `is_case` primitive that builds on `convert_case` to select the source case boundaries correctly. [1] rutrum/convert-case#21

rutrum · 2024-12-17T21:14:44Z

Hi @r-bk . Thanks for reaching out. I'm glad you like the crate.

I agree with you, I think this is a non-intuitive result. I thought your suggestion made sense, but it failed some tests. Take the following:

assert!(!"kebab-case-string".is_case(Case::Snake))

Only splitting based on underscore in this case would be counter to the intent. We try generously to split on all boundaries, so we return false when we join back with the case-specific delimiter.

I think the unexpected part of this code is that UpperDigit boundary is being used by default. Digits are still an ongoing problem I'm trying to formalize (see http://stringcase.org, site WIP). Depending on context, they are sometimes boundaries, sometimes not.

An example brought to my attention from another use was "2d_transformation". This is case of digit first, then letter. In this case, I wouldn't want to split here. But for "Transformation2D" I might split on "n2". But then, the unlikely scenario of "TRANSFORMATION2D"...this doesn't feel common, and I don't even think it's obvious that you'd want to split on UpperDigit all the time.

I think I'll remove UpperDigit from the list of defaults. Do you think that makes sense? Should another digit boundary be removed from the defaults?

r-bk · 2024-12-18T09:06:50Z

Hi @rutrum !
Thanks for the response and the explanation why my snippet doesn't work in all cases.

There are no easy choices :) So I am thinking loud here.

Intuitively removing LowerDigit too (for consistency) seems like a good idea.
But the example that you gave with Transformation2D kills it.
Maybe the definition of Acronym should be expanded in a way that would allow removal of LowerDigit too? But then there is an example of json2text. Now it becomes json_2_text, and it would become json2_text (which is wrong). But should it be split at all?

Specifically to fix the is_case function, maybe instead of splitting and comparing to self
it should (a) check if self conforms to case without splitting it and (b) check that self doesn't contain "other" delimiters ? Are there other conditions that must be checked?

rutrum · 2024-12-20T19:23:23Z

I think you are correct, this isn't an issue of default boundaries (that might be a separate issue, however.) It's really that the mechanism of performing the transformation isn't good enough. This function would actually require manual checks.

I agree with your outline. The is_case function should first check what delimiters are present. If they are the correct delimiters for a certain case, then they can be used to split the identifier into words. Then, each word can be checked to see if they follow a particular "word case", which might be all lower, upper then lower, or all upper.

As I get back into this project, I will put this item on the todo list of improvements to make. I don't know if it will make it to 0.7, but this is clearly something that should be fixed. Thanks for bringing it to my attention.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Selection of boundaries for the default implementation of `is_case` #21

Selection of boundaries for the default implementation of `is_case` #21

r-bk commented Jul 18, 2024

rutrum commented Dec 17, 2024

r-bk commented Dec 18, 2024

rutrum commented Dec 20, 2024

Selection of boundaries for the default implementation of is_case #21

Selection of boundaries for the default implementation of is_case #21

Comments

r-bk commented Jul 18, 2024

rutrum commented Dec 17, 2024

r-bk commented Dec 18, 2024

rutrum commented Dec 20, 2024

Selection of boundaries for the default implementation of `is_case` #21

Selection of boundaries for the default implementation of `is_case` #21