Rakudo silently loses information on .encode for non-representable characters #3654

Closed
p6rt opened this issue Jan 25, 2015 · 4 comments

Comments

@p6rt

p6rt commented Jan 25, 2015

Migrated from rt.perl.org#123673 (status was 'resolved')

Searchable as RT123673$

@p6rt
Author

p6rt commented Jan 25, 2015

From @moritz

<moritz> m: say 'ö'.encode('ASCII')
<camelia> rakudo-moar 7e8d8a: OUTPUT«Blob[uint8]:0x<3f>␤»
<moritz> that looks like a bug to me

Since ö isn't representable in ASCII, this should throw an exception.
Currently it encodes to 0x3f, which is the question mark / replacement
character. Using a replacement character silently loses information,
which is a dangerous default.

Here is my wish list:

1) the default is to throw an exception, let's say

  X::Str::NotEncodable.new(
  source => 'ö',
  index => '0',
  destination => 'ASCII',
  );

2) if a replacement is desired, indicate that through an adverb in the
.encode call, either

  'ö'.encode('ASCII', :replacement)

to get the default, or

  'ö'.encode('ASCII', :replacement(Buf.new(42)))

to be able to choose a replacement byte or byte sequence.

If somebody implements it, I'll add it to the design docs :-)

Cheers,
Moritz

@p6rt
Author

p6rt commented Nov 6, 2015

From @jnthn

The default is now to throw an exception (not typed for now), and the replacement functionality is available (though you specify it as a string, to ensure you can't replace with something that is bogus in the target encoding). ilmari++ for working on this; I'm just closing the ticket! Tests in S32-str/encode.t.

/jnthn
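
For readers landing here later, the resolved behaviour can be sketched roughly as follows. This is a minimal illustration, not taken from the ticket or from S32-str/encode.t: it assumes a reasonably recent Rakudo, the lowercase encoding name 'ascii', and that a bare `:replacement` selects the default replacement character. The exact exception message is not shown because it may differ between versions.

```raku
# Default: a character with no ASCII representation makes .encode fail.
# `try` evaluates to an undefined value on failure and sets $!.
my $strict = try 'ö'.encode('ascii');
if $strict.defined {
    say "encoded: ", $strict;
}
else {
    say "encode failed: ", $!.message;
}

# Opt in to lossy encoding with the default replacement character.
say 'ö'.encode('ascii', :replacement);

# Or pick the replacement yourself. It is given as a string (not a Buf),
# so it is itself encoded in the target encoding.
say 'ö'.encode('ascii', :replacement('*'));
```

Passing the replacement as a string rather than raw bytes matches the rationale given above: the replacement goes through the same encoder, so it cannot be bogus in the target encoding.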

@p6rt
Author

p6rt commented Nov 6, 2015

The RT System itself - Status changed from 'new' to 'open'

@p6rt
Author

p6rt commented Nov 6, 2015

@jnthn - Status changed from 'open' to 'resolved'

@p6rt p6rt closed this as completed Nov 6, 2015