Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace atoi() with strtoi_with_tail() #1646

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 8 additions & 4 deletions builtin/patch-id.c
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
#include "git-compat-util.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Mohit Marathe via GitGitGadget" <[email protected]> writes:

>  	static const char digits[] = "0123456789";
>  	const char *q, *r;
> +	char *endp;
>  	int n;
>  
>  	q = p + 4;
>  	n = strspn(q, digits);
>  	if (q[n] == ',') {
>  		q += n + 1;
> -		*p_before = atoi(q);
> +		if (strtol_i2(q, 10, p_before, &endp) != 0)
> +			return 0;
>  		n = strspn(q, digits);
>  	} else {
>  		*p_before = 1;
>  	}

Looking at this code again, because we upfront run strspn() to make
sure q[] begins with a run of digits *and* followed by a comma
(which is not a digit), I think it is safe to use atoi() and assume
it would slurp all the digits.  So the lack of another check the use
of new helper allows us to do, namely

		if (endp != q + n)
			return 0;

is probably OK, but that is one of the two reasons why you would
favor the use of new helper over atoi(), so the upside of this
change is not all that great as I originally hoped for X-<.

Not your fault, of course.  We would still catch when the digit
string that starts q[] is too large to fit in an int, which is an
upside.

> -	if (n == 0 || q[n] != ' ' || q[n+1] != '+')
> +	if (q[n] != ' ' || q[n+1] != '+')
>  		return 0;

When we saw q[] that begins with ',' upon entry to this function, we
used to say *p_before = 1 and then saw n==0 and realized it is not a
good input and returned 0 from the function.

Now we instead peek q[0] and the check says q[0] is not SP so we
will return 0 the same way so there is no behaviour change from the
upper hunk?  The conversion may be correct, but it wasn't explained
in the proposed commit log message.

How are the change to stop caring about n==0 here ...

>  	r = q + n + 2;
>  	n = strspn(r, digits);
>  	if (r[n] == ',') {
>  		r += n + 1;
> -		*p_after = atoi(r);
> -		n = strspn(r, digits);
> +		if (strtol_i2(r, 10, p_after, &endp) != 0)
> +			return 0;
>  	} else {
>  		*p_after = 1;
>  	}
> -	if (n == 0)
> -		return 0;

... and this change here, linked to the switch from atoi() to
strtul_i2()[*]?

It looks like an unrelated behaviour change that is left
unexplained.

>  	return 1;
>  }

Thanks for working on this one.


[Footnote]

 * by the way, what a horrible name for a public function.  Yuck.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Mohit Marathe wrote (reply to this):

On Tuesday, January 23rd, 2024 at 1:02 AM, Junio C Hamano <[email protected]> wrote:

> "Mohit Marathe via GitGitGadget" [email protected] writes:
> 
> > static const char digits[] = "0123456789";
> > const char *q, *r;
> > + char *endp;
> > int n;
> > 
> > q = p + 4;
> > n = strspn(q, digits);
> > if (q[n] == ',') {
> > q += n + 1;
> > - *p_before = atoi(q);
> > + if (strtol_i2(q, 10, p_before, &endp) != 0)
> > + return 0;
> > n = strspn(q, digits);
> > } else {
> > *p_before = 1;
> > }
> 
> 
> Looking at this code again, because we upfront run strspn() to make
> sure q[] begins with a run of digits and followed by a comma
> (which is not a digit), I think it is safe to use atoi() and assume
> it would slurp all the digits. So the lack of another check the use
> of new helper allows us to do, namely
> 
> if (endp != q + n)
> return 0;
> 
> is probably OK, but that is one of the two reasons why you would
> favor the use of new helper over atoi(), so the upside of this
> change is not all that great as I originally hoped for X-<.
> 
> Not your fault, of course. We would still catch when the digit
> string that starts q[] is too large to fit in an int, which is an
> upside.
> 
> > - if (n == 0 || q[n] != ' ' || q[n+1] != '+')
> > + if (q[n] != ' ' || q[n+1] != '+')
> > return 0;
> 
> 
> When we saw q[] that begins with ',' upon entry to this function, we
> used to say *p_before = 1 and then saw n==0 and realized it is not a
> good input and returned 0 from the function.

Uh oh, I just looked at the `if` block and concluded that it was just 
to check if it has numbers after the ',', which`strtol_i2()` already 
does. But I totally missed this one. 

> Now we instead peek q[0] and the check says q[0] is not SP so we
> will return 0 the same way so there is no behaviour change from the
> upper hunk? The conversion may be correct, but it wasn't explained
> in the proposed commit log message.
> 
> How are the change to stop caring about n==0 here ...
> 
> > r = q + n + 2;
> > n = strspn(r, digits);
> > if (r[n] == ',') {
> > r += n + 1;
> > - *p_after = atoi(r);
> > - n = strspn(r, digits);
> > + if (strtol_i2(r, 10, p_after, &endp) != 0)
> > + return 0;
> > } else {
> > *p_after = 1;
> > }
> > - if (n == 0)
> > - return 0;
> 
> 
> ... and this change here, linked to the switch from atoi() to
> strtul_i2()[*]?
> 
> It looks like an unrelated behaviour change that is left
> unexplained.
> 
> > return 1;
> > }
> 
> 
> Thanks for working on this one.
> 
> 
> [Footnote]
> 
> * by the way, what a horrible name for a public function. Yuck.

Yeah, I thought so too /:D How does `strtol_i_updated` sounds?

Thanks for you feedback! I will send v2 with the corrections soon.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Mohit Marathe via GitGitGadget" <[email protected]> writes:

>  	q = p + 4;
>  	n = strspn(q, digits);
>  	if (q[n] == ',') {
>  		q += n + 1;

So, we saw "@@ -" and skipped over these four bytes, skipped the
digits from there, and found a comma.  

For "@@ -29,14 +30,18 @@", for example, our q is now "14 +30,18 @@"
as we have skipped over that comma after 29.

> -		*p_before = atoi(q);
> +		if (strtol_i_updated(q, 10, p_before, &endp) != 0)
> +			return 0;

We parse out 14 and store it to *p_before.  endp points at " +30..."
now.

>  		n = strspn(q, digits);
> +		if (endp != q + n)
> +			return 0;

Is this necessary?  By asking strtol_i_updated() where the number ended,
we already know endp without skipping the digits in q with strspn().
Shouldn't these three lines become more like

		n = endp - q;

instead?  

After all, we are not trying to find a bug in strtol_i_updated(),
which would be the only reason how this "return 0" would trigger.

>  	} else {
>  		*p_before = 1;
>  	}
> @@ -48,8 +53,11 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
>  	n = strspn(r, digits);
>  	if (r[n] == ',') {
>  		r += n + 1;
> -		*p_after = atoi(r);
> +		if (strtol_i_updated(r, 10, p_after, &endp) != 0)
> +			return 0;
>  		n = strspn(r, digits);
> +		if (endp != r + n)
> +			return 0;

Likewise.

>  	} else {
>  		*p_after = 1;
>  	}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Mohit Marathe wrote (reply to this):

On Thursday, January 25th, 2024 at 2:32 AM, Junio C Hamano <[email protected]> wrote:

> "Mohit Marathe via GitGitGadget" [email protected] writes:
> 
> > q = p + 4;
> > n = strspn(q, digits);
> > if (q[n] == ',') {
> > q += n + 1;
> 
> 
> So, we saw "@@ -" and skipped over these four bytes, skipped the
> digits from there, and found a comma.
> 
> For "@@ -29,14 +30,18 @@", for example, our q is now "14 +30,18 @@"
> as we have skipped over that comma after 29.
> 
> > - *p_before = atoi(q);
> > + if (strtol_i_updated(q, 10, p_before, &endp) != 0)
> > + return 0;
> 
> 
> We parse out 14 and store it to *p_before. endp points at " +30..."
> now.
> 
> > n = strspn(q, digits);
> > + if (endp != q + n)
> > + return 0;
> 
> 
> Is this necessary? By asking strtol_i_updated() where the number ended,
> we already know endp without skipping the digits in q with strspn().
> Shouldn't these three lines become more like
> 
> n = endp - q;
> 
> instead?
> 
> After all, we are not trying to find a bug in strtol_i_updated(),
> which would be the only reason how this "return 0" would trigger.
> 

I was confused about how an invalid hunk header of a corrupted would
look like. This was just an attempt of making a sanity check. But after
taking another look, I agree that its unnecessary.

> > } else {
> > *p_before = 1;
> > }
> > @@ -48,8 +53,11 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
> > n = strspn(r, digits);
> > if (r[n] == ',') {
> > r += n + 1;
> > - *p_after = atoi(r);
> > + if (strtol_i_updated(r, 10, p_after, &endp) != 0)
> > + return 0;
> > n = strspn(r, digits);
> > + if (endp != r + n)
> > + return 0;
> 
> 
> Likewise.
> 
> > } else {
> > *p_after = 1;
> > }

#include "builtin.h"
#include "config.h"
#include "diff.h"
Expand Down Expand Up @@ -29,14 +30,16 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
{
static const char digits[] = "0123456789";
const char *q, *r;
char *endp;
int n;

q = p + 4;
n = strspn(q, digits);
if (q[n] == ',') {
q += n + 1;
*p_before = atoi(q);
n = strspn(q, digits);
if (strtoi_with_tail(q, 10, p_before, &endp) != 0)
return 0;
n = endp - q;
} else {
*p_before = 1;
}
Expand All @@ -48,8 +51,9 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
n = strspn(r, digits);
if (r[n] == ',') {
r += n + 1;
*p_after = atoi(r);
n = strspn(r, digits);
if (strtoi_with_tail(r, 10, p_after, &endp) != 0)
return 0;
n = endp - r;
} else {
*p_after = 1;
}
Expand Down
18 changes: 14 additions & 4 deletions git-compat-util.h
Original file line number Diff line number Diff line change
Expand Up @@ -1296,14 +1296,24 @@ static inline int strtoul_ui(char const *s, int base, unsigned int *result)
return 0;
}

static inline int strtol_i(char const *s, int base, int *result)
#define strtol_i(s,b,r) strtoi_with_tail((s), (b), (r), NULL)
static inline int strtoi_with_tail(char const *s, int base, int *result, char **endp)
{
long ul;
char *p;
char *dummy = NULL;

if (!endp)
endp = &dummy;
errno = 0;
ul = strtol(s, &p, base);
if (errno || *p || p == s || (int) ul != ul)
ul = strtol(s, endp, base);
if (errno ||
/*
* if we are told to parse to the end of the string by
* passing NULL to endp, it is an error to have any
* remaining character after the digits.
*/
(dummy && *dummy) ||
*endp == s || (int) ul != ul)
return -1;
*result = ul;
return 0;
Expand Down
Loading