Unusual tell results with :crlf layer and multibyte input #8338

Closed
p5pRT opened this issue Feb 17, 2006 · 12 comments

Comments

@p5pRT

p5pRT commented Feb 17, 2006

Migrated from rt.perl.org#38587 (status was 'new')

Searchable as RT38587$

@p5pRT
Author

p5pRT commented Feb 17, 2006

From adavies@ptc.com

Created by adavies@ptc.com

The presence or absence of a :crlf layer on a multibyte-encoded input
stream causes tell() to give different (incorrect?) results.

Below is a test case using utf16le.

# %<
my $file = './foobar';
print "#"x30, "\n";
print ">> using encoding utf16le <<\n";
# Create a file in the given encoding
open FOUT, ">:raw:encoding(utf16le):crlf", $file
    or die "can't write file: $!\n";
print FOUT "aaa\nbbb\nccc\nddd\neee\n";
print "At end of write tell = ", tell(FOUT), "\n";
close FOUT;
print "File size = ", (-s $file), "\n";
open FIN, "<:raw:encoding(utf16le)", $file
    or die "can't read file(2): $!\n";
while (<FIN>) {
    s/\015?\012\z/\n/;
    print "No crlf => $.:", tell(FIN), ":$_";
}
close FIN;
print "#"x30, "\n";
open FIN, "<:raw:encoding(utf16le):crlf", $file
    or die "can't read file(1): $!\n";
while (<FIN>) {
    print "With crlf => $.:", tell(FIN), ":$_";
}
close FIN;

END { unlink $file }
__END__
>> using encoding utf16le <<
At end of write tell = 25
File size = 50
No crlf => 1:10:aaa
No crlf => 2:20:bbb
No crlf => 3:30:ccc
No crlf => 4:40:ddd
No crlf => 5:50:eee
##############################
With crlf => 1:5:aaa
With crlf => 2:10:bbb
With crlf => 3:15:ccc
With crlf => 4:20:ddd
With crlf => 5:25:eee
# >%

Is this a bug?

In the documentation for tell:
"Returns the current position in bytes for FILEHANDLE..."
I suppose this could refer to the position in the internal representation
of the FILEHANDLE's contents, in which case both outputs could
be construed to be correct, but it certainly caught me by surprise.

Cheers, alex.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.7:

Configured by Andy_Cuss at Thu Aug 11 14:02:10 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT  -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -release -libpath:"c:\perl3\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -release -libpath:"c:\perl3\lib\CORE"  -machine:x86'

Locally applied patches:
    


@INC for perl v5.8.7:
    C:/perl3/lib
    C:/perl3/site/lib
    .


Environment for perl v5.8.7:
    HOME=C:\alex
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\perl3\bin;C:\Program Files\Tcl-8.4.8.0\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\alex\bin;C:\Program Files\ATI Technologies\ATI Control Panel;C:\Program Files\Perforce;C:\cygwin\bin;C:\Program Files\Crimson Editor;C:\Program Files\Microsoft Visual Studio\VC98\Bin;C:\Program Files\Microsoft Visual Studio\Common\Tools;C:\Program Files\Vantive32;C:\Program Files\QuickTime\QTSystem\;C:\Program Files\cvsnt;C:\Program Files\Sun\AppServer\bin;C:\ghc\ghc-6.2.2\bin
    PERL_BADLANG (unset)
    SHELL (unset)

@toddr
Member

toddr commented Feb 13, 2020

@Leont can you comment on this pls?

@Leont
Contributor

Leont commented Feb 15, 2020

It's not even a :crlf issue, stacking :perlio on top reveals exactly the same issue.

I'm not quite sure how to fix it; the whole concept of tell is rather dubious when there are several layers of transforming and buffering code between the user and the file.
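
A minimal sketch of the comparison described in this comment, assuming a perl built with PerlIO and the Encode module available. The temporary file and the `%positions` hash are illustrative names, not part of the original report. It reads the same UTF-16LE file once with only `:encoding` and once with `:perlio` stacked on top, recording tell() after each line. No expected values are listed, since they may differ between perl versions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-16LE file (illustrative data, no :crlf involved).
my ($tmp, $file) = tempfile(UNLINK => 1);
close $tmp;
open my $out, '>:raw:encoding(UTF-16LE)', $file
    or die "can't write $file: $!";
print {$out} "aaa\nbbb\nccc\n";
close $out or die "can't close $file: $!";

# Read it back with and without an extra buffering layer on top,
# recording what tell() reports after each line.
my %positions;
for my $layers ('<:raw:encoding(UTF-16LE)', '<:raw:encoding(UTF-16LE):perlio') {
    open my $in, $layers, $file or die "can't read $file: $!";
    while (defined(my $line = <$in>)) {
        push @{ $positions{$layers} }, tell($in);
    }
    close $in;
    print "$layers => @{ $positions{$layers} }\n";
}
```

Comparing the two printed lists on a given perl shows whether the extra layer changes what tell() reports there.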

@toddr
Member

toddr commented Feb 15, 2020

If I were designing this, I might say that it is the responsibility of each layer to track their position in relation to their parent layer. I assume that mechanism isn't there?

@Leont
Contributor

Leont commented Feb 15, 2020

Buffering complicates everything. Layers don't read byte-for-byte from the preceding layers but in chunks; therefore the different layers are not necessarily at the same location. The top layer knows how many bytes it has consumed from the preceding stream, but it doesn't necessarily know how to translate that to a physical position. Reading from :unix:perlio:encoding(...):crlf is really more like a pipe system.

@toddr
Member

toddr commented Apr 15, 2020

Perhaps the other option would be to die and indicate that tell is unsupported except on binary? This would potentially also break for a simple UTF8 layer.

@Leont
Contributor

Leont commented Apr 17, 2020

> Perhaps the other option would be to die and indicate that tell is unsupported except on binary?

It isn't, that's the confusing thing. We have a hack that makes it work fine as long as a single transforming layer is involved. :crlf or :encoding is fine, it's the combination that isn't.
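
One way to notice the risky combination at run time is to inspect the layer stack with `PerlIO::get_layers`. The sketch below is only a heuristic: `transforming_layers` is a hypothetical helper, it matches just `:encoding(...)` and `:crlf` by name, it assumes a platform whose default layers do not already include `:crlf`, and it does not cover the `:perlio` stacking case mentioned earlier in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: list the layers on a handle that rewrite the byte
# stream. Only :encoding(...) and :crlf are matched; the "utf8" entries
# that get_layers reports are flags and are deliberately ignored.
sub transforming_layers {
    my ($fh) = @_;
    return grep { /^(?:encoding\(|crlf\z)/ } PerlIO::get_layers($fh);
}

# An empty temporary file is enough to inspect the layer stacks.
my ($tmp, $file) = tempfile(UNLINK => 1);
close $tmp;

my %count;
for my $mode ('<:raw:encoding(UTF-16LE)', '<:raw:encoding(UTF-16LE):crlf') {
    open my $fh, $mode, $file or die "can't read $file: $!";
    my @found = transforming_layers($fh);
    $count{$mode} = scalar @found;
    print "$mode: tell() may be unreliable (layers: @found)\n" if @found > 1;
    close $fh;
}
```

A caller could use such a check to refuse to rely on tell() for a handle, rather than discovering the discrepancy later.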

@toddr
Member

toddr commented Apr 17, 2020

Right but the combination of :crlf and :utf8 doesn't actually seem uncommon at this point.

@Leont
Contributor

Leont commented Jul 6, 2020

> Right but the combination of :crlf and :utf8 doesn't actually seem uncommon at this point.

:utf8 isn't affected, because it isn't a true layer

@toddr
Member

toddr commented Jul 6, 2020

If that's the case, then I wonder what the actual use case is and if it's common enough that we need to worry about it? or should we just close this as: "Yes. Yes it is broken. Don't do that!"

@Leont
Contributor

Leont commented Jul 7, 2020

> If that's the case, then I wonder what the actual use case is and if it's common enough that we need to worry about it?

:utf8 isn't affected, :encoding() is.

> or should we just close this as: "Yes. Yes it is broken. Don't do that!"

Perhaps we should. I mean, if you're using two layers that each transform the data stream, it becomes pretty much impossible to reliably determine which point in the original data stream corresponds to the transformed data that you observe.
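
For code that needs physical byte offsets anyway, one possible workaround (not something proposed in this thread) is to avoid tell() on the layered handle and compute offsets from the raw bytes. The sketch below assumes the file fits in memory and that re-encoding a decoded line yields the same number of bytes as the original, which holds for well-formed UTF-16LE; `line_end_offsets` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Hypothetical helper: byte offset of the end of each line in $path,
# computed from the raw bytes instead of tell() on a layered handle.
sub line_end_offsets {
    my ($path, $encoding) = @_;
    open my $fh, '<:raw', $path or die "can't read $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    $bytes = '' unless defined $bytes;

    my $text   = decode($encoding, $bytes, Encode::FB_CROAK);
    my $offset = 0;
    my @ends;
    # Split after each newline so every piece keeps its line terminator.
    for my $line (split /(?<=\n)/, $text) {
        # Re-encode the line to learn how many bytes it occupies on disk.
        $offset += length(encode($encoding, $line));
        push @ends, $offset;
    }
    return @ends;
}

# Demo: five CRLF-terminated lines written as raw UTF-16LE bytes.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
my $content = '';
$content .= "$_\r\n" for qw(aaa bbb ccc ddd eee);
print {$tmp} encode('UTF-16LE', $content);
close $tmp or die "can't close $file: $!";

my @ends = line_end_offsets($file, 'UTF-16LE');
print "@ends\n";    # prints "10 20 30 40 50"
```

These values match the "No crlf" column of the original report, which are the offsets a reader working on the physical file would expect.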

@toddr
Member

toddr commented Jul 7, 2020

Barring further comments from the original reporter, I'm closing this case.

@toddr toddr closed this as completed Jul 7, 2020