Re: [LONG] Possible utf8 implementation #562

Closed
p5pRT opened this issue Sep 20, 1999 · 4 comments

Comments

@p5pRT

p5pRT commented Sep 20, 1999

Migrated from rt.perl.org#1413 (status was 'resolved')

Searchable as RT1413$

@p5pRT

p5pRT commented Sep 20, 1999

From The RT System itself

Perhaps. But there are quite a few of them, and they are used a lot.
Such a change is likely to make patches to 5.005_62 even more tricky
to keep in step with 5.005_0X etc. than it is now.

hv_store()/hv_fetch() as they are currently implemented should be #define's
to: (intentionally extra verbose for the purpose of example)

hv_store__key_is_charp_latin1(...)
hv_fetch__key_is_charp_latin1(...)
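A minimal sketch of that renaming, assuming a toy fixed-size table (`ToyHV`, a hypothetical stand-in, not Perl's `HV`) and simplified signatures (no precomputed hash argument, `int` values instead of `SV*`). The existing names become macros for explicitly Latin-1 variants, so current callers compile unchanged while the new names make the key encoding visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy hash: a fixed-size table with linear search.
 * Not Perl's HV; it only illustrates the naming scheme. */
#define TOY_MAX 16

typedef struct {
    const char *keys[TOY_MAX];  /* borrowed: caller keeps keys alive */
    size_t      klens[TOY_MAX];
    int         vals[TOY_MAX];
    size_t      count;
} ToyHV;

/* Key is a char* holding Latin-1 bytes. Returns 0 if the table is full. */
static int hv_store__key_is_charp_latin1(ToyHV *hv, const char *key,
                                         size_t klen, int val)
{
    for (size_t i = 0; i < hv->count; i++) {
        if (hv->klens[i] == klen && memcmp(hv->keys[i], key, klen) == 0) {
            hv->vals[i] = val;      /* existing key: overwrite */
            return 1;
        }
    }
    if (hv->count == TOY_MAX)
        return 0;
    hv->keys[hv->count]  = key;
    hv->klens[hv->count] = klen;
    hv->vals[hv->count]  = val;
    hv->count++;
    return 1;
}

/* Returns a pointer to the stored value, or NULL if the key is absent. */
static const int *hv_fetch__key_is_charp_latin1(const ToyHV *hv,
                                                const char *key, size_t klen)
{
    for (size_t i = 0; i < hv->count; i++)
        if (hv->klens[i] == klen && memcmp(hv->keys[i], key, klen) == 0)
            return &hv->vals[i];
    return NULL;
}

/* The old names survive as aliases for the explicitly Latin-1 variants. */
#define hv_store(hv, key, klen, val) \
    hv_store__key_is_charp_latin1((hv), (key), (klen), (val))
#define hv_fetch(hv, key, klen) \
    hv_fetch__key_is_charp_latin1((hv), (key), (klen))
```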

How does this help?
How is that different from just leaving hv_store
as is and defining hv_store_key_is_charp_utf8()?

How does what this leads to, i.e.:

  if (is_utf8)
  result = hv_store_key_is_charp_utf8(hv, )
  else
  result = hv_store_key_is_charp_latin1();

improve anything at all?
IMHO it just smears out the abstraction all over the sources.
It would be better in my view to change to
  result = hv_store_ent(hv, sv, ...)

Which may happen where it is easy.
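One reason an SV key helps is that it carries its own utf8 flag, so the comparison logic can live in one place instead of at every call site. A minimal sketch, assuming a hypothetical `ToyKey` stand-in for an SV (bytes, length, utf8 flag) and well-formed UTF-8: a Latin-1 key and its UTF-8 spelling compare equal without any allocation. A real hash would also need both spellings to land in the same bucket, for example by hashing one canonical form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an SV used as a hash key. */
typedef struct {
    const char *pv;
    size_t      len;
    int         is_utf8;   /* 1: pv is UTF-8, 0: pv is Latin-1 bytes */
} ToyKey;

/* Compare two keys by the characters they represent, not their bytes.
 * Assumes any UTF-8 side is well formed. */
static int toy_keys_equal(const ToyKey *a, const ToyKey *b)
{
    if (a->is_utf8 == b->is_utf8)
        return a->len == b->len && memcmp(a->pv, b->pv, a->len) == 0;

    const ToyKey *l = a->is_utf8 ? b : a;   /* Latin-1 side */
    const ToyKey *u = a->is_utf8 ? a : b;   /* UTF-8 side */
    size_t j = 0;

    for (size_t i = 0; i < l->len; i++) {
        unsigned char c = (unsigned char)l->pv[i];
        if (c < 0x80) {
            /* ASCII is spelled identically in both encodings */
            if (j >= u->len || (unsigned char)u->pv[j] != c)
                return 0;
            j += 1;
        } else {
            /* U+0080..U+00FF takes two bytes in UTF-8 */
            if (j + 1 >= u->len
                || (unsigned char)u->pv[j]     != (0xC0 | (c >> 6))
                || (unsigned char)u->pv[j + 1] != (0x80 | (c & 0x3F)))
                return 0;
            j += 2;
        }
    }
    return j == u->len;   /* UTF-8 side must not have extra characters */
}
```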

Option 3 really does suck.

It was never really an option.

As much as it is a performance hit, for native
hashes, there should be zero issues with converting all keys to utf8 the
moment a utf8 string is passed in which contains a character with a
value > 255.

Yes, the new idea since I was stalled on Saturday is the per-hash
flag and the "when 1st needed" semantics.
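A minimal sketch of the per-hash flag with "when 1st needed" semantics, assuming a hypothetical `ToyHV` that owns its key buffers (not Perl's `HV`). Keys stay Latin-1 until a UTF-8 key arrives containing a character above U+00FF; then every existing key is converted once and the flag is set. It deliberately leaves out the other half of the scheme: while the hash is still Latin-1, an incoming UTF-8 key whose characters all fit would be downgraded before the store or lookup, and once the hash is UTF-8, incoming Latin-1 keys would be upgraded.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy hash with a per-hash utf8 flag (not Perl's HV). */
#define TOY_MAX 16

typedef struct {
    char  *keys[TOY_MAX];   /* owned buffers, not NUL-terminated */
    size_t klens[TOY_MAX];
    size_t count;
    int    keys_are_utf8;   /* 0: every key is Latin-1; 1: every key is UTF-8 */
} ToyHV;

/* In well-formed UTF-8 every code point >= U+0100 has a lead byte >= 0xC4,
 * so a byte scan tells us whether the key would fit in Latin-1. */
static int utf8_has_wide_char(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)s[i] >= 0xC4)
            return 1;
    return 0;
}

/* Latin-1 -> UTF-8; bytes >= 0x80 become two bytes. Caller frees. */
static char *latin1_to_utf8(const char *s, size_t len, size_t *outlen)
{
    char *out = malloc(2 * len + 1);
    size_t o = 0;
    if (!out)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 0x80) {
            out[o++] = (char)c;
        } else {
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    *outlen = o;
    return out;
}

/* Call before storing a key. The first time a UTF-8 key needs a character
 * above U+00FF, convert every existing key once and set the flag.
 * Returns 0 on allocation failure, leaving the hash unchanged. */
static int toy_prepare_for_key(ToyHV *hv, const char *key, size_t klen,
                               int key_is_utf8)
{
    char  *tmp[TOY_MAX];
    size_t tlen[TOY_MAX];

    if (hv->keys_are_utf8 || !key_is_utf8 || !utf8_has_wide_char(key, klen))
        return 1;                       /* current representation is fine */

    for (size_t i = 0; i < hv->count; i++) {
        tmp[i] = latin1_to_utf8(hv->keys[i], hv->klens[i], &tlen[i]);
        if (!tmp[i]) {
            while (i-- > 0)             /* roll back the copies made so far */
                free(tmp[i]);
            return 0;
        }
    }
    for (size_t i = 0; i < hv->count; i++) {
        free(hv->keys[i]);
        hv->keys[i]  = tmp[i];
        hv->klens[i] = tlen[i];
    }
    hv->keys_are_utf8 = 1;
    return 1;
}
```

Converting into temporary copies first means a failed allocation never leaves the hash half Latin-1 and half UTF-8 behind a flag that claims otherwise.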

So - that is hash keys resolved (for now).

Now if "you lot" could discuss the implications

of Perl_croak("Oops %s ...",charptr)
 
- can we have (say) %ls to say string is utf8 - is this too weird
  or already "taken" for wchar_t ?
- always use %_
- another nifty idea from the list?
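To make the "always use %_" option concrete, here is a minimal sketch of a formatter in which `%s` keeps today's meaning (a `char*`, assumed Latin-1) while `%_` takes an SV-like argument whose own flag says how to read it. `toy_format` and `ToyStr` are hypothetical and are not `Perl_croak()` or `Perl_mess()`; the sketch also assumes the formatted result is always UTF-8.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an SV: bytes plus a utf8 flag. */
typedef struct {
    const char *pv;
    size_t      len;
    int         is_utf8;
} ToyStr;

/* Append Latin-1 bytes to out[o..] as UTF-8, never passing cap - 1. */
static size_t put_latin1(char *out, size_t o, size_t cap,
                         const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 0x80) {
            if (o + 1 >= cap) break;
            out[o++] = (char)c;
        } else {
            if (o + 2 >= cap) break;
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}

/* Formats into out (UTF-8, NUL-terminated) and returns the byte length.
 *   %s  takes a char*, assumed to be Latin-1
 *   %_  takes a ToyStr*, whose own flag says how to read it
 * Everything else, including the format text, is copied as Latin-1. */
static size_t toy_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t o = 0;

    if (cap == 0)
        return 0;
    va_start(ap, fmt);
    for (const char *p = fmt; *p; p++) {
        if (p[0] == '%' && p[1] == 's') {
            const char *s = va_arg(ap, char *);
            o = put_latin1(out, o, cap, s, strlen(s));
            p++;
        } else if (p[0] == '%' && p[1] == '_') {
            const ToyStr *sv = va_arg(ap, ToyStr *);
            if (sv->is_utf8) {
                size_t n = sv->len;
                if (n > cap - 1 - o)
                    n = cap - 1 - o;    /* may cut a character short */
                memcpy(out + o, sv->pv, n);
                o += n;
            } else {
                o = put_latin1(out, o, cap, sv->pv, sv->len);
            }
            p++;
        } else {
            o = put_latin1(out, o, cap, p, 1);
        }
    }
    va_end(ap);
    out[o] = '\0';
    return o;
}
```

The point of the design is that the caller never has to say whether an argument is UTF-8; a `%s` string from existing C code still formats correctly, just under the Latin-1 assumption.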

$@ is an SV so there is no issue with its holding the
value, what is less clear is Perl_warner() & other spots where
Perl_mess() is playing with strings.

There is a minor issue when an error message replete with Unicode hits
the STDERR IO filter that wants bytes. We cannot die printing the
error message printing the error message printing the error message ...

die("Non byte character in %_",ERRSV); // :-(

So do we go for \x{feed} style?
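A minimal sketch of the \x{...} fallback, assuming the message is well-formed UTF-8 and the target stream accepts any byte 0x00..0xFF. `escape_wide_chars` is a hypothetical helper, not an existing Perl function. Because the result contains only bytes, writing it to a byte-only STDERR layer cannot itself trigger another wide-character error, which is what breaks the recursion described above.

```c
#include <stdio.h>
#include <string.h>

/* Copy a UTF-8 message into out for a stream that only takes bytes:
 * characters up to U+00FF become one Latin-1 byte, anything wider becomes
 * the plain ASCII text \x{...}. Assumes well-formed UTF-8; stops early
 * rather than overflow out. Returns the length written, excluding the NUL. */
static size_t escape_wide_chars(const char *msg, size_t len,
                                char *out, size_t outsize)
{
    size_t i = 0, o = 0;

    if (outsize == 0)
        return 0;
    while (i < len) {
        unsigned char c = (unsigned char)msg[i];
        unsigned long cp;
        size_t n;
        char buf[16];
        size_t blen;

        /* decode one UTF-8 sequence */
        if (c < 0x80)      { cp = c;        n = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; n = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; n = 3; }
        else               { cp = c & 0x07; n = 4; }
        if (i + n > len)
            break;                      /* truncated sequence */
        for (size_t k = 1; k < n; k++)
            cp = (cp << 6) | ((unsigned char)msg[i + k] & 0x3F);
        i += n;

        /* encode it for the byte stream */
        if (cp <= 0xFF) {
            buf[0] = (char)cp;
            blen = 1;
        } else {
            blen = (size_t)snprintf(buf, sizeof buf, "\\x{%lx}", cp);
        }
        if (o + blen >= outsize)
            break;                      /* keep room for the NUL */
        memcpy(out + o, buf, blen);
        o += blen;
    }
    out[o] = '\0';
    return o;
}
```

For example, a message containing "café €" comes out as the bytes `caf`, 0xE9, a space, and the text `\x{20ac}`.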

--
Nick Ing-Simmons

@p5pRT

p5pRT commented Sep 20, 1999

From The RT System itself

It all depends how badly you want full support for utf8... and how complete
you want to be about it...

hv_store()/hv_fetch() as they are currently implemented should be #define's
to: (intentionally extra verbose for the purpose of example)
hv_store__key_is_charp_latin1(...)
hv_fetch__key_is_charp_latin1(...)
How does this help?
How is that different from just leaving hv_store
as is and defining hv_store_key_is_charp_utf8()?

How does what this leads to, i.e.:
if (is_utf8)
result = hv_store_key_is_charp_utf8(hv, )
else
result = hv_store_key_is_charp_latin1();
improve anything at all?

  result = hv_store_key_is_sv(....);
  result = hv_store_key_is_charp_maybe_utf8(..., utf8flag);

hv_store would then be discouraged.

IMHO it just smears out the abstraction all over the sources.
It would be better in my view to change to
result = hv_store_ent(hv, sv, ...)
Which may happen where it is easy.

I wouldn't mind if hv_store/hv_store_ent took sv's, as shown above... the
problem is source code compatibility with older modules. As long as the old
interfaces keep working and simply treat the strings they are given as raw
byte strings, the majority of older modules should continue to work.
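A minimal sketch of the layering proposed above, assuming hypothetical `ToySV`/`ToyHV` stand-ins and simplified signatures (no precomputed hash, `int` values). `hv_store_key_is_charp_maybe_utf8()` is the one function that sees the flag, `hv_store_key_is_sv()` reads the flag from the key itself, and the legacy `hv_store()` name passes 0, which is how older modules would keep working on the assumption that their strings are raw bytes.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not Perl's SV and HV. */
typedef struct {
    const char *pv;
    size_t      len;
    int         is_utf8;
} ToySV;

#define TOY_MAX 16
typedef struct {
    const char *keys[TOY_MAX];   /* borrowed pointers */
    size_t      klens[TOY_MAX];
    int         key_utf8[TOY_MAX];
    int         vals[TOY_MAX];
    size_t      count;
} ToyHV;

/* The single entry point that knows about the flag. A real version would
 * normalize and compare keys here; this one only appends (no duplicate check). */
static int hv_store_key_is_charp_maybe_utf8(ToyHV *hv, const char *key,
                                            size_t klen, int val, int utf8flag)
{
    if (hv->count == TOY_MAX)
        return 0;
    hv->keys[hv->count]     = key;
    hv->klens[hv->count]    = klen;
    hv->key_utf8[hv->count] = utf8flag != 0;
    hv->vals[hv->count]     = val;
    hv->count++;
    return 1;
}

/* SV keys: the flag travels with the key, so callers never branch on it. */
static int hv_store_key_is_sv(ToyHV *hv, const ToySV *key, int val)
{
    return hv_store_key_is_charp_maybe_utf8(hv, key->pv, key->len, val,
                                            key->is_utf8);
}

/* Old modules keep calling hv_store(); their keys are taken as raw bytes. */
#define hv_store(hv, key, klen, val) \
    hv_store_key_is_charp_maybe_utf8((hv), (key), (klen), (val), 0)
```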

As much as it is a performance hit, for native
hashes, there should be zero issues with converting all keys to utf8 the
moment a utf8 string is passed in which contains a character with a
value > 255.
Yes, the new idea since I was stalled on Saturday is the per-hash
flag and the "when 1st needed" semantics.
So - that is hash keys resolved (for now).
Now if "you lot" could discuss the implications
of Perl_croak("Oops %s ...",charptr)
- can we have (say) %ls to say string is utf8 - is this too weird
or already "taken" for wchar_t ?
- always use %_
- another nifty idea from the list?
$@ is an SV so there is no issue with its holding the
value, what is less clear is Perl_warner() & other spots where
Perl_mess() is playing with strings.

Umm.. hmm... I don't have an opinion here until somebody brings up some more
of the issues involved here...

Including utf8 strings in your C code... :-)

The "always use %_" should satisfy the majority of the cases, but I don't
see why utf8 charp's should be "not allowed."

There is a minor issue when an error message replete with Unicode hits
the STDERR IO filter that wants bytes. We cannot die printing the
error message printing the error message printing the error message ...
die("Non byte character in %_",ERRSV); // :-(
So do we go for \x{feed} style?

I think that would make sense. I certainly don't want
recursive error messages... :-) "Error displaying Error displaying Error ... which contains the string "..."" :-)

mark

--
markm@nortelnetworks.com/mark@mielke.cc/markm@ncf.ca
CUE Development (4Y21), Nortel Networks
Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
  and in the darkness bind them...

  http://mark.mielke.cc/

@p5pRT

p5pRT commented Apr 22, 2003

@iabyn - Status changed from 'stalled' to 'resolved'

@p5pRT p5pRT closed this as completed Apr 22, 2003