strangness with utf8 and warn #5194

Closed
p5pRT opened this issue Mar 4, 2002 · 11 comments

p5pRT commented Mar 4, 2002

Migrated from rt.perl.org#8760 (status was 'resolved')

Searchable as RT8760$

p5pRT commented Mar 4, 2002

From jfriedl@yahoo.com

Created by jfriedl@yahoo-icn.com

Resubmitting this with perlbug, so it gets tracked....
----------

I've run into a very weird problem that seems to be related to utf8.

I would expect this program:

  use charnames ':full';

  my $text = "x\N{LATIN SMALL LETTER SHARP S}Yz";

  warn (($text =~ m/SSY/i) ? "okay" : "bad");

to produce either
  okay at /tmp/foo line 5.
or
  bad at /tmp/foo line 5.
depending on whether the SHARP S support was there or not.

However, with bleedperl as of an hour or so ago, it produces simply

  bad.

(no source file or line number).

If \N{LATIN SMALL LETTER SHARP S} is replaced by \xDF, the warn() works
properly, so it does seem to be related to utf8.

But now for some uber-strangeness: if you leave \N{LATIN SMALL LETTER SHARP S}
as is, but prepend .* to the regex, the regex still fails as before but now
the warn() works. Odd.

  Jeffrey

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.7.2:

Configured by jfriedl at Thu Feb 28 22:31:59 PST 2002.

Summary of my perl5 (revision 5.0 version 7 subversion 2 patch 14919) configuration:
  Platform:
    osname=linux, osvers=2.4.17, archname=i686-linux
    uname='linux fummy 2.4.17 #5 smp thu feb 14 15:21:38 pst 2002 i686 unknown '
    config_args='-Dusedevel -d -e -s -O -D optimize=-O2 -g'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=define
  Compiler:
    cc='cc', ccflags ='-DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-DDEBUGGING -fno-strict-aliasing -I/usr/local/include -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='2.95.4  (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    DEVEL14897


@INC for perl v5.7.2:
    /home/jfriedl/lib/perl
    /home/jfriedl/lib/perl/yahoo
    /usr/local/lib/perl5/5.7.2/i686-linux
    /usr/local/lib/perl5/5.7.2
    /usr/local/lib/perl5/site_perl/5.7.2/i686-linux
    /usr/local/lib/perl5/site_perl/5.7.2
    /usr/local/lib/perl5/site_perl
    .


Environment for perl v5.7.2:
    HOME=/home/jfriedl
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/jfriedl/bin:/home/jfriedl/common/bin:.:/usr/local/pgsql/bin:/usr/local/bin:/usr/X11R6/bin:/bin:/usr/bin:/usr/sbin:/sbin:/home/jfriedl/src/rvplayer5.0:/usr/local/prod/bin:/usr/local/java/bin
    PERLLIB=/home/jfriedl/lib/perl:/home/jfriedl/lib/perl/yahoo
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh


p5pRT commented Feb 3, 2003

From @jhi

Still no resolution for this, but I created a cut-down version that does not
use the charnames pragma:

$_ = "foo";
utf8::upgrade($_);
warn (/bar/i ? "bar" : "no bar");

I also spent some time in the debugger, and it seems that
Perl_pp_warn -> Perl_warner -> Perl_vmess -> S_closest_cop
thinks that CopLINE is zero, and that's why the line and the file
are not printed. I have no idea why case-insensitively matching against
a UTF-8-upgraded scalar would make CopLINE zero. (Yes, the /i
is required.)
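
To make the symptom concrete, here is a small C sketch of the rule described above: the location is appended to the message only when the line number is nonzero. It is a toy model, not the interpreter's code. ToyCop and toy_mess are names invented for this illustration, and the only behaviour taken from the report is "line 0 means no file and no line".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a COP: only the location fields (hypothetical type). */
typedef struct {
    const char *file;
    unsigned    line;
} ToyCop;

/* Format a warning into buf.
 * - A message that already ends in a newline is left as it is.
 * - Otherwise " at FILE line N" is appended only if the line is nonzero,
 *   and ".\n" is appended in both cases. */
static void toy_mess(char *buf, size_t size, const char *msg, const ToyCop *cop)
{
    size_t len;

    assert(buf != NULL && msg != NULL && cop != NULL);
    len = strlen(msg);
    if (len > 0 && msg[len - 1] == '\n')
        snprintf(buf, size, "%s", msg);
    else if (cop->line != 0)
        snprintf(buf, size, "%s at %s line %u.\n", msg, cop->file, cop->line);
    else
        snprintf(buf, size, "%s.\n", msg);   /* line 0: location silently dropped */
}
```

With a ToyCop of { "/tmp/foo", 5 } and the message "bad", toy_mess writes "bad at /tmp/foo line 5." plus a newline. With line 0 it writes just "bad." plus a newline, which is the shape of the output in the original report.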

p5pRT commented Feb 3, 2003

From perl5-porters@perl.org

Here's a weird optree problem for someone, from Jeffrey Friedl...
the old style id was ID 20020304.004,
http://rt.perl.org/rt2/Ticket/Display.html?id=8760
I just added a new message on the issue:

Still no resolution for this but I created a cutdown version without using
the charnames pragma:

  $_ = "foo";
  utf8::upgrade($_);
  warn (/bar/i ? "bar" : "no bar");

and I also spent some time in the debugger and it seems that
Perl_pp_warn -> Perl_warner -> Perl_vmess -> S_closest_cop
thinks that CopLINE is zero, and that's why the line and the file
are not printed. I have no idea why CopLINE would be zero because
of case-insensitively matching against a UTF-8-ed scalar (Yes, the /i
is required.)

--
Jarkko Hietaniemi <jhi@iki.fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

p5pRT commented Feb 4, 2003

From enache@rdslink.ro

On Mon, Feb 03, 2003 at 05:52:17PM +0200, Jarkko Hietaniemi wrote:

> $_ = "foo";
> utf8::upgrade($_);
> warn (/bar/i ? "bar" : "no bar");
>
> and I also spent some time in the debugger and it seems that
> Perl_pp_warn -> Perl_warner -> Perl_vmess -> S_closest_cop
> thinks that CopLINE is zero, and that's why the line and the file
> are not printed. I have no idea why CopLINE would be zero because
> of case-insensitively matching against a UTF-8-ed scalar (Yes, the /i
> is required.)

This fixes it for me​:


Inline Patch
--- /arc/perl-current/utf8.c	2003-01-07 12:34:01.000000000 +0200
+++ perl-current/utf8.c	2003-02-04 11:35:41.000000000 +0200
@@ -1517,6 +1517,7 @@ Perl_swash_init(pTHX_ char* pkg, char* n
     SV* errsv_save;
 
     if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) {	/* demand load utf8 */
+	COP *ocurcop = PL_curcop;
 	ENTER;
 	errsv_save = newSVsv(ERRSV);
 	Perl_load_module(aTHX_ PERL_LOADMOD_NOIMPORT, newSVpv(pkg,0), Nullsv);
@@ -1524,6 +1525,7 @@ Perl_swash_init(pTHX_ char* pkg, char* n
 	    sv_setsv(ERRSV, errsv_save);
 	SvREFCNT_dec(errsv_save);
 	LEAVE;
+	PL_curcop = ocurcop;
     }
     SPAGAIN;
     PUSHSTACKi(PERLSI_MAGIC);
--------------------------------------------------------------------

The demand loading of utf8 clobbers PL_curcop at Perl_scalarseq (op.c:856).

This is some kind of reversed backtrace:
Perl_pp_match at pp_hot.c:1255
  Perl_regexec_flags at regexec.c:1933
  S_find_byclass at regexec.c:1111
  Perl_to_utf8_fold at utf8.c:1504
  Perl_to_utf8_case at utf8.c:1349
  Perl_swash_init at utf8.c:1522
  Perl_load_module at op.c:2922
  Perl_vload_module at op.c:2970
  Perl_utilize at op.c:2877
  Perl_newATTRSUB at op.c:4152
  Perl_scalarseq at op.c:856

Maybe the save/retrieve should be made even deeper
(in load_module, utilize, etc ?)
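
The effect of the patch can be illustrated with a toy model in C. This is only a sketch: ToyCop, toy_curcop, toy_compiling and the toy_* functions are invented stand-ins for COP, PL_curcop, PL_compiling and the real call chain. The "module load" is reduced to the one side effect that matters here, namely leaving the global pointer aimed at the compile-time cop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a COP (hypothetical type). */
typedef struct {
    const char *file;
    unsigned    line;
} ToyCop;

static ToyCop  toy_compiling = { "/tmp/foo", 0 }; /* compile-time cop, line 0 */
static ToyCop *toy_curcop;                         /* plays the role of PL_curcop */

/* Stand-in for compiling a demand-loaded module: as a side effect the
 * "current cop" pointer is left aimed at the compile-time cop. */
static void toy_load_module(void)
{
    toy_curcop = &toy_compiling;
}

/* Unprotected caller: the clobbered pointer leaks back to the caller. */
static void toy_swash_init_unfixed(void)
{
    toy_load_module();
}

/* Protected caller, same shape as the patch: remember the pointer
 * before the load and put it back afterwards. */
static void toy_swash_init_fixed(void)
{
    ToyCop *ocurcop = toy_curcop;

    toy_load_module();
    toy_curcop = ocurcop;
}
```

If toy_curcop points at a run-time cop with line 5 before the call, it points at toy_compiling (line 0) after toy_swash_init_unfixed(), and still at the run-time cop after toy_swash_init_fixed().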

The /i flag is really confusing. When it's left out, Perl aggressively
optimizes away code and the bug doesn't creep out. Check with this:

$ perl -Dt -e '$_="foo"; utf8::upgrade($_); warn (/bar/i ? "bar" : "no bar");' 2>~/a
$ perl -Dt -e '$_="foo"; utf8::upgrade($_); warn (/bar/m ? "bar" : "no bar");' 2>~/b
$ diff -y -W80 ~/a ~/b | less -S
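
Going by the backtrace above, the module load is reached only through the case-folding path (Perl_to_utf8_fold), which a match without /i does not enter. The following C sketch models just that shape, under the assumption that the lazy load is the only relevant difference between the two runs. toy_match, toy_demand_load_fold and toy_fold_loads are invented names, and the matcher is a naive substring search, not Perl's regex engine.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static int toy_fold_loads = 0;  /* how often the "fold table" was demand-loaded */

/* Stand-in for the lazy load behind case folding. In the real interpreter
 * this is where the module load, and its side effect, would happen. */
static void toy_demand_load_fold(void)
{
    static int loaded = 0;

    if (!loaded) {
        loaded = 1;
        toy_fold_loads++;
    }
}

/* Naive substring search; returns 1 on a match, 0 otherwise.
 * Only the case-insensitive path ever touches the loader. */
static int toy_match(const char *text, const char *pat, int ignore_case)
{
    size_t i, j;

    assert(text != NULL && pat != NULL);
    if (ignore_case)
        toy_demand_load_fold();

    for (i = 0; ; i++) {
        for (j = 0; pat[j] != '\0'; j++) {
            unsigned char a = (unsigned char)text[i + j];
            unsigned char b = (unsigned char)pat[j];

            if (a == '\0')
                return 0;               /* text exhausted: no match possible */
            if (ignore_case ? tolower(a) != tolower(b) : a != b)
                break;                  /* mismatch at this offset */
        }
        if (pat[j] == '\0')
            return 1;                   /* whole pattern matched */
    }
}
```

Calling toy_match("foo", "bar", 0) leaves toy_fold_loads at 0, while the same call with ignore_case set to 1 triggers the one-time load. Neither call matches, as in the cut-down test case, yet only the second has the side effect.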

Regards
Adi

p5pRT commented Feb 4, 2003

From @rgs

Enache Adrian <enache@rdslink.ro> wrote:

> On Mon, Feb 03, 2003 at 05:52:17PM +0200, Jarkko Hietaniemi wrote:
>
> > $_ = "foo";
> > utf8::upgrade($_);
> > warn (/bar/i ? "bar" : "no bar");
> >
> > and I also spent some time in the debugger and it seems that
> > Perl_pp_warn -> Perl_warner -> Perl_vmess -> S_closest_cop
> > thinks that CopLINE is zero, and that's why the line and the file
> > are not printed. I have no idea why CopLINE would be zero because
> > of case-insensitively matching against a UTF-8-ed scalar (Yes, the /i
> > is required.)
>
> This fixes it for me:

Thanks for the patch, but I've already fixed it in a less intrusive
way (I restore only the cop_line of the current cop), and I patched
the function Perl_vload_module.
See: http://archive.develooper.com/perl5-changes%40perl.org/msg06629.html
(Strangely, the message I sent to P5P hasn't shown up yet.)

However, I have the impression, without any evidence, that your patch is
more correct than mine. (I don't restore curcop but only cop_line,
and it appears that I get the correct cop_file(gv)? with and without
threads.) If you can work out any proof that your patch avoids problems
that my patch doesn't address, I'll be happy to apply it. I'll look
at it more closely in the meantime.

> --------------------------------------------------------------------
> --- /arc/perl-current/utf8.c	2003-01-07 12:34:01.000000000 +0200
> +++ perl-current/utf8.c	2003-02-04 11:35:41.000000000 +0200
> @@ -1517,6 +1517,7 @@ Perl_swash_init(pTHX_ char* pkg, char* n
>      SV* errsv_save;
>  
>      if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) {	/* demand load utf8 */
> +	COP *ocurcop = PL_curcop;
>  	ENTER;
>  	errsv_save = newSVsv(ERRSV);
>  	Perl_load_module(aTHX_ PERL_LOADMOD_NOIMPORT, newSVpv(pkg,0), Nullsv);
> @@ -1524,6 +1525,7 @@ Perl_swash_init(pTHX_ char* pkg, char* n
>  	    sv_setsv(ERRSV, errsv_save);
>  	SvREFCNT_dec(errsv_save);
>  	LEAVE;
> +	PL_curcop = ocurcop;
>      }
>      SPAGAIN;
>      PUSHSTACKi(PERLSI_MAGIC);
> --------------------------------------------------------------------
>
> The demand loading of utf8 clobbers PL_curcop at Perl_scalarseq (op.c:856).
>
> This is some kind of reversed backtrace:
> Perl_pp_match pp_hot.c:1255
>   Perl_regexec_flags at regexec.c:1933
>   S_find_byclass at regexec.c:1111
>   Perl_to_utf8_fold at utf8.c:1504
>   Perl_to_utf8_case at utf8.c:1349
>   Perl_swash_init at utf8.c:1522
>   Perl_load_module at op.c:2922
>   Perl_vload_module at op.c:2970
>   Perl_utilize at op.c:2877
>   Perl_newATTRSUB at op.c:4152
>   Perl_scalarseq op.c:856
>
> Maybe the save/retrieve should be made even deeper
> (in load_module, utilize, etc ?)
>
> The /i flag is really confusing. When it's left out, Perl aggressively
> optimizes away code and the bug doesn't creep out. Check with this:

p5pRT commented Feb 4, 2003

From sky@nanisky.com

On Tuesday, Feb 4, 2003, at 11:15 Europe/Stockholm, Enache Adrian wrote:

> On Mon, Feb 03, 2003 at 05:52:17PM +0200, Jarkko Hietaniemi wrote:
>
> > $_ = "foo";
> > utf8::upgrade($_);
> > warn (/bar/i ? "bar" : "no bar");
> >
> > and I also spent some time in the debugger and it seems that
> > Perl_pp_warn -> Perl_warner -> Perl_vmess -> S_closest_cop
> > thinks that CopLINE is zero, and that's why the line and the file
> > are not printed. I have no idea why CopLINE would be zero because
> > of case-insensitively matching against a UTF-8-ed scalar (Yes, the /i
> > is required.)
>
> This fixes it for me:
> --------------------------------------------------------------------
> --- /arc/perl-current/utf8.c	2003-01-07 12:34:01.000000000 +0200
> +++ perl-current/utf8.c	2003-02-04 11:35:41.000000000 +0200
> @@ -1517,6 +1517,7 @@ Perl_swash_init(pTHX_ char* pkg, char* n
>      SV* errsv_save;
>  
>      if (!gv_fetchmeth(stash, "SWASHNEW", 8, -1)) {	/* demand load utf8 */
> +	COP *ocurcop = PL_curcop;
>  	ENTER;
>  	errsv_save = newSVsv(ERRSV);
>  	Perl_load_module(aTHX_ PERL_LOADMOD_NOIMPORT, newSVpv(pkg,0), Nullsv);
> @@ -1524,6 +1525,7 @@ Perl_swash_init(pTHX_ char* pkg, char* n
>  	    sv_setsv(ERRSV, errsv_save);
>  	SvREFCNT_dec(errsv_save);
>  	LEAVE;
> +	PL_curcop = ocurcop;
>      }
>      SPAGAIN;
>      PUSHSTACKi(PERLSI_MAGIC);
> --------------------------------------------------------------------
>
> The demand loading of utf8 clobbers PL_curcop at Perl_scalarseq
> (op.c:856).
>
> This is some kind of reversed backtrace:
> Perl_pp_match pp_hot.c:1255
>   Perl_regexec_flags at regexec.c:1933
>   S_find_byclass at regexec.c:1111
>   Perl_to_utf8_fold at utf8.c:1504
>   Perl_to_utf8_case at utf8.c:1349
>   Perl_swash_init at utf8.c:1522
>   Perl_load_module at op.c:2922
>   Perl_vload_module at op.c:2970
>   Perl_utilize at op.c:2877
>   Perl_newATTRSUB at op.c:4152
>   Perl_scalarseq op.c:856
>
> Maybe the save/retrieve should be made even deeper
> (in load_module, utilize, etc ?)
>
> The /i flag is really confusing. When it's left out, Perl aggressively
> optimizes away code and the bug doesn't creep out. Check with this:
>
> $ perl -Dt -e '$_="foo"; utf8::upgrade($_); warn (/bar/i ? "bar" : "no bar");' 2>~/a
> $ perl -Dt -e '$_="foo"; utf8::upgrade($_); warn (/bar/m ? "bar" : "no bar");' 2>~/b
> $ diff -y -W80 ~/a ~/b | less -S
>
> Regards
> Adi

Nice one, but I think you are correct that the fix should be in
Perl_load_module.

Arthur

p5pRT commented Feb 4, 2003

From enache@rdslink.ro

On Tue, Feb 04, 2003 at 11:25:41AM +0100, Rafael Garcia-Suarez wrote:

> Enache Adrian <enache@rdslink.ro> wrote:
>
> > On Mon, Feb 03, 2003 at 05:52:17PM +0200, Jarkko Hietaniemi wrote:
> >
> > > $_ = "foo";
> > > utf8::upgrade($_);
> > > warn (/bar/i ? "bar" : "no bar");
> > >
> > > and I also spent some time in the debugger and it seems that
> > > Perl_pp_warn -> Perl_warner -> Perl_vmess -> S_closest_cop
> > > thinks that CopLINE is zero, and that's why the line and the file
> > > are not printed. I have no idea why CopLINE would be zero because
> > > of case-insensitively matching against a UTF-8-ed scalar (Yes, the /i
> > > is required.)
> >
> > This fixes it for me:
>
> Thanks for the patch, but I've already fixed it in a less intrusive
> way (I restore only the cop_line of the current cop), and I patched
> the function Perl_vload_module.
> See: http://archive.develooper.com/perl5-changes%40perl.org/msg06629.html
> (Strangely, the message I sent to P5P hasn't shown up yet.)

I didn't know about your patch. I only read the P5P list.

> However, I have the impression, without any evidence, that your patch is
> more correct than mine. (I don't restore curcop but only cop_line,
> and it appears that I get the correct cop_file(gv)? with and without
> threads.) If you can work out any proof that your patch avoids problems
> that my patch doesn't address, I'll be happy to apply it. I'll look
> at it more closely in the meantime.

cop_file(gv) is correct because Perl_scalarseq() sets PL_curcop to
PL_compiling, which of course has the correct ->cop_file field.
This isn't true of the other fields.

You may consider saving/restoring the PL_curcop pointer instead of
the cop_file field, but I don't think that's mission-critical :)
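
The difference between the two repairs can be sketched with a toy model in C. Everything here is invented for illustration: ToyCop stands in for COP, toy_curcop for PL_curcop, toy_compiling for PL_compiling, and the seq member is a made-up placeholder for "the other fields". The sketch assumes, as described above, that the compile step leaves the pointer aimed at the compile-time cop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a COP (hypothetical type and fields). */
typedef struct {
    const char *file;
    unsigned    line;
    unsigned    seq;    /* placeholder for "the other fields" */
} ToyCop;

static const char toy_file[] = "/tmp/foo";

static ToyCop  toy_compiling = { toy_file, 0, 99 }; /* same file, line 0 */
static ToyCop *toy_curcop;                           /* plays the role of PL_curcop */

/* Stand-in for the compile step that moves the pointer. */
static void toy_compile(void)
{
    toy_curcop = &toy_compiling;
}

/* Repair 1: save and restore only the line number. The pointer has
 * moved by the time of the restore, so the saved line is written into
 * the compile-time cop and the pointer stays where the compiler left it. */
static void toy_load_restoring_line(void)
{
    unsigned ocopline = toy_curcop->line;

    toy_compile();
    toy_curcop->line = ocopline;
}

/* Repair 2: save and restore the pointer itself. */
static void toy_load_restoring_pointer(void)
{
    ToyCop *ocurcop = toy_curcop;

    toy_compile();
    toy_curcop = ocurcop;
}
```

Starting from a run-time cop { toy_file, 5, 1 }, the first repair ends with toy_curcop pointing at toy_compiling: file and line look right (the line because 5 was copied into toy_compiling), but seq is 99 rather than 1. The second repair ends with toy_curcop pointing at the original cop and leaves toy_compiling untouched.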

Regards
Adi

p5pRT commented Feb 4, 2003

From sky@nanisky.com

On Tuesday, Feb 4, 2003, at 18:35 Europe/Stockholm, Enache Adrian wrote:

> cop_file(gv) is correct because Perl_scalarseq() sets PL_curcop to
> PL_compiling, which of course has the correct ->cop_file field.
> This isn't true of the other fields.
>
> You may consider saving/restoring the PL_curcop pointer instead of
> the cop_file field, but I don't think that's mission-critical :)
>
> Regards
> Adi

I think that we should restore the interpreter state to the correct
state after we have been in the compiler.

Arthur

p5pRT commented Feb 4, 2003

From @rgs

A. Bergman wrote:

> On Tuesday, Feb 4, 2003, at 18:35 Europe/Stockholm, Enache Adrian wrote:
>
> > cop_file(gv) is correct because Perl_scalarseq() sets PL_curcop to
> > PL_compiling, which of course has the correct ->cop_file field.
> > This isn't true of the other fields.
> >
> > You may consider saving/restoring the PL_curcop pointer instead of
> > the cop_file field, but I don't think that's mission-critical :)
>
> I think that we should restore the interpreter state to the correct
> state after we have been in the compiler.

Thus I applied the following (the change is in Perl_vload_module()):

Change 18656 by rgs@rgs-home on 2003/02/04 20:02:56

  Better version of change #18648, by Enache Adrian
  Message-ID: <20030204101533.GA11817@ratsnest.hole>

Affected files ...

... //depot/perl/op.c#537 edit

Differences ...

==== //depot/perl/op.c#537 (text) ====

@@ -2964,14 +2964,14 @@
  }
  {
  line_t ocopline = PL_copline;
- line_t ocopline2 = PL_curcop->cop_line;
+ COP *ocurcop = PL_curcop;
  int oexpect = PL_expect;

  utilize(!(flags & PERL_LOADMOD_DENY), start_subparse(FALSE, 0),
  veop, modname, imop);
  PL_expect = oexpect;
  PL_copline = ocopline;
- PL_curcop->cop_line = ocopline2;
+ PL_curcop = ocurcop;
  }
}

p5pRT commented Feb 5, 2003

From @jhi

Since the issue has been resolved, I have now added the original test case
and will mark this ticket as resolved.

p5pRT commented Feb 5, 2003

@jhi - Status changed from 'open' to 'resolved'
