
Perl_is_utf8_string reads out of bounds #7553

Closed
p5pRT opened this issue Oct 21, 2004 · 4 comments

Comments

@p5pRT

p5pRT commented Oct 21, 2004

Migrated from rt.perl.org#32080 (status was 'resolved')

Searchable as RT32080$

@p5pRT

p5pRT commented Oct 21, 2004

From derhoermi@gmx.net

Hi,

  In Perl 5.9.1 and earlier, Perl_is_utf8_string(...) reads past the end
of the buffer if an incomplete UTF-8 sequence is passed to it, for example

  Perl_is_utf8_string("\xF0\x9D\x80", 3)

Here the string lacks the final \xAD of U+1D02D. Perl_is_utf8_string
passes the string essentially unchecked to Perl_is_utf8_char(...), which
has no STRLEN argument and therefore reads UTF8SKIP() == 4 bytes from
the string, one more than it contains. The documentation of the routine says

  Returns true if first C<len> bytes of the given string form a valid
  UTF-8 string, false otherwise.

This is not accurate: the routine actually examines the first len bytes
plus up to 3 more (assuming UTF8SKIP(x) returns at most 4), so it cannot
be relied on for its stated purpose. A possible fix would be to compare
UTF8SKIP() against the number of remaining bytes before calling
is_utf8_char.
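A minimal C sketch of the suggested fix, for illustration only. The names skip_len(), char_valid() and string_valid() are hypothetical stand-ins for UTF8SKIP(), is_utf8_char() and is_utf8_string(). The sketch assumes standard UTF-8 with sequences of at most 4 bytes (not Perl's extended encoding) and checks structure only, without rejecting overlong forms or surrogates. The point is the order of operations: the length implied by the start byte is compared with the bytes left before the unbounded per-character check runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP(): the sequence length implied by a
 * start byte. Assumes standard UTF-8 (at most 4 bytes), not Perl's extended
 * encoding. Returns 1 for bytes that cannot start a sequence. */
static size_t skip_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical analogue of is_utf8_char(): it takes no length, so it reads
 * skip_len(*s) bytes unconditionally. Only structure is checked (no
 * overlong or surrogate rejection). */
static bool char_valid(const unsigned char *s)
{
    size_t n = skip_len(s[0]);
    if (n == 1)
        return s[0] < 0x80;              /* ASCII only; a stray byte fails */
    for (size_t i = 1; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)       /* trailing bytes must be 10xxxxxx */
            return false;
    return true;
}

/* Bounded check: compare the skip length with the bytes left before
 * calling char_valid(), so it never reads beyond s[len - 1]. */
static bool string_valid(const unsigned char *s, size_t len)
{
    assert(s != NULL);
    const unsigned char *end = s + len;
    while (s < end) {
        size_t n = skip_len(*s);
        if (n > (size_t)(end - s))       /* incomplete sequence at the end */
            return false;
        if (!char_valid(s))
            return false;
        s += n;
    }
    return true;
}
```

With the report's input, string_valid() returns false for the bytes { 0xF0, 0x9D, 0x80 } without touching a fourth byte, and returns true once the final 0xAD is present.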

Further, I think the documentation for Perl_is_utf8_char() should note
something to the effect of

  WARNING: use only if you *know* that s has at least UTF8SKIP(s)
  bytes.

regards.

@p5pRT

p5pRT commented Nov 22, 2011

From @khwilliamson

Fixed (finally) in commit e032854 in blead.

I modified both this function and is_utf8_string_loclen() to check
before overstepping the end of the string.


Karl Williamson
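The kind of check described in this fix can be sketched for a loclen-style routine as follows. This is a hypothetical, simplified analogue, not the code from commit e032854: string_valid_loclen() and seq_len() are illustrative names. The sketch assumes standard UTF-8 of at most 4 bytes and does not reject overlong forms or surrogates. As with is_utf8_string_loclen(), it reports where validation stopped (*ep) and how many complete characters it saw (*el).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sequence length implied by a start byte; 0 means the byte cannot start
 * a sequence (a continuation byte or an invalid lead byte). */
static size_t seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical, simplified loclen-style validator over the first len bytes.
 * *ep receives the end of input or the start of the offending sequence;
 * *el receives the number of complete characters validated. */
static bool string_valid_loclen(const unsigned char *s, size_t len,
                                const unsigned char **ep, size_t *el)
{
    assert(s != NULL);
    const unsigned char *end = s + len;
    size_t count = 0;
    bool ok = true;

    while (s < end) {
        size_t n = seq_len(*s);
        /* Bounds check first: a truncated sequence is rejected without
         * reading past end. */
        if (n == 0 || n > (size_t)(end - s)) { ok = false; break; }
        size_t i;
        for (i = 1; i < n; i++)
            if ((s[i] & 0xC0) != 0x80)   /* trailing bytes must be 10xxxxxx */
                break;
        if (i < n) { ok = false; break; }
        s += n;
        count++;
    }
    if (ep) *ep = s;
    if (el) *el = count;
    return ok;
}
```

For the bytes { 'a', 0xF0, 0x9D, 0x80 }, this sketch returns false with *ep pointing at the 0xF0 and *el == 1, and never reads a fifth byte.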

@p5pRT

p5pRT commented Nov 22, 2011

@khwilliamson - Status changed from 'new' to 'resolved'
