binmode() affects seek() and tell() ? #8149

Open
p5pRT opened this issue Oct 11, 2005 · 8 comments

Comments

@p5pRT

p5pRT commented Oct 11, 2005

Migrated from rt.perl.org#37419 (status was 'open')

Searchable as RT37419$

@p5pRT
Author

p5pRT commented Oct 11, 2005

From pfusik@op.pl

This is a bug report for perl from pfusik@​op.pl,
generated with the help of perlbug 1.35 running under perl v5.8.7.


The last paragraph of "perldoc -f binmode" says that binmode()
affects seek() and tell(). But the docs for seek() and tell()
state that these functions always operate on bytes.
Which one is correct?



Flags​:
  category=docs
  severity=medium


Site configuration information for perl v5.8.7​:

Configured by builder at Mon Jun 6 13​:36​:05 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration​:
  Platform​:
  osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
  uname=''
  config_args='undef'
  hint=recommended, useposix=true, d_sigaction=undef
  usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DBUILT_BY_ACTIVESTATE -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
  optimize='-MD -Zi -DNDEBUG -O1',
  cppflags='-DWIN32'
  ccversion='12.00.8804', gccversion='', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='link', ldflags ='-nologo -nodefaultlib -debug -opt​:ref,icf -libpath​:"C​:\j\Perl\lib\CORE" -machine​:x86'
  libpth=\lib
  libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
  perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
  libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
  gnulibc_version='undef'
  Dynamic Linking​:
  dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt​:ref,icf -libpath​:"C​:\j\Perl\lib\CORE" -machine​:x86'

Locally applied patches​:
  ACTIVEPERL_LOCAL_PATCHES_ENTRY
  # if !defined(PERL_DARWIN)
  Iin_load_module moved for compatibility with build 806
  # endif
  # if defined(__hpux)
  Avoid signal flag SA_RESTART for older versions of HP-UX
  # endif
  PerlEx hacks for CGI​::Carp
  Less verbose ExtUtils​::Install and Pod​::Find
  instmodsh upgraded from ExtUtils-MakeMaker-6.25
  24699 ICMP_UNREACHABLE handling in Net​::Ping
  21540 Fix backward-compatibility issues in if.pm


@​INC for perl v5.8.7​:
  C​:/j/Perl/lib
  C​:/j/Perl/site/lib
  .


Environment for perl v5.8.7​:
  HOME (unset)
  LANG (unset)
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)

  PATH=C:\WINDOWS;C:\WINDOWS\COMMAND;C:\U;C:\JAVA\JDK\BIN;C:\C\DJGPP\BIN;C:\J\PERL\BIN;C:\J\MYSQL\BIN;C:\U\TEXMF\MAIN\MIKTEX\BIN;C:\U\GS\GS8.11\BIN;C:\C\CYGWIN\BIN;C:\C\CYGWIN\USR\BIN;C:\WINDOWS;C:\WINDOWS\COMMAND;C:\U\ANT\BIN;C:\U\SVN\BIN
  PERL_BADLANG (unset)
  SHELL (unset)

@p5pRT
Author

p5pRT commented Oct 15, 2005

From pfusik@op.pl

I wrote​:

The last paragraph of "perldoc -f binmode" says that binmode()
affects seek() and tell(). But the docs for seek() and tell()
state that these functions always operate on bytes.
Which one is correct?

No-one here knows that?

Piotr

@p5pRT
Author

p5pRT commented Oct 16, 2005

From @ysth

On Sat, Oct 15, 2005 at 03​:22​:20PM +0200, Piotr Fusik wrote​:

I wrote​:

The last paragraph of "perldoc -f binmode" says that binmode()
affects seek() and tell(). But the docs for seek() and tell()
state that these functions always operate on bytes.
Which one is correct?

I don't see any specific mention of binmode actually affecting seek and
tell; it just says it is "important", and points you at perlport.
What is it that you are worried about?

No-one here knows that?

The seek and tell doc are referring to bytes vs. characters, not
anything to do with crlf.

@p5pRT
Author

p5pRT commented Oct 16, 2005

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Author

p5pRT commented Oct 17, 2005

From pfusik@op.pl

The last paragraph of "perldoc -f binmode" says that binmode()
affects seek() and tell(). But the docs for seek() and tell()
state that these functions always operate on bytes.
Which one is correct?

I don't see any specific mention of binmode actually affecting seek and
tell; it just says it is "important", and points you at perlport.

If binmode() cannot affect seek() and tell() in any way, it is a nonsense
to say that it is "important" for these.
Why is sysseek() not mentioned?

What is it that you are worried about?

Whether it is fine to do seek() before calling binmode() and reading data.

No-one here knows that?

The seek and tell doc are referring to bytes vs. characters, not
anything to do with crlf.

Do you mean bytes/utf8 mode doesn't matter but crlf does?

Piotr

@p5pRT
Author

p5pRT commented Oct 20, 2005

From @ysth

On Sun, Oct 16, 2005 at 11​:48​:37PM +0200, Piotr Fusik wrote​:

The last paragraph of "perldoc -f binmode" says that binmode()
affects seek() and tell(). But the docs for seek() and tell()
state that these functions always operate on bytes.
Which one is correct?

I don't see any specific mention of binmode actually affecting seek and
tell; it just says it is "important", and points you at perlport.

If binmode() cannot affect seek() and tell() in any way, it is a nonsense
to say that it is "important" for these.

It's important in the sense that with it, you should be able to seek
to sequential locations and read a byte at each and get the same bytes
as if you just read them without seeking. Without binmode, that's not
true.
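
As a rough illustration of this point (a sketch, not code from the thread), the script below writes a small file with CRLF line endings and compares one sequential read against a loop that seeks to every byte offset and reads one unit there. The sample data, the helper names `read_sequential` and `read_by_seeking`, and the explicit `:crlf` layer (used so that text-mode translation can be tried on any platform) are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: two lines with DOS-style CRLF terminators.
my $bytes = "a\r\nb\r\n";

my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);                       # write the bytes exactly as given
print {$tmp} $bytes;
close($tmp) or die "close: $!";

# Read the whole file in one pass through the given layer.
sub read_sequential {
    my ($layer) = @_;
    open(my $fh, "<$layer", $path) or die "open: $!";
    local $/;                        # slurp mode
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Seek to every byte offset in turn and read one unit at each.
sub read_by_seeking {
    my ($layer) = @_;
    open(my $fh, "<$layer", $path) or die "open: $!";
    my $out = '';
    for my $offset (0 .. length($bytes) - 1) {
        seek($fh, $offset, 0) or die "seek: $!";
        my $n = read($fh, my $chunk, 1);
        $out .= $chunk if $n;
    }
    close($fh);
    return $out;
}

for my $layer (':raw', ':crlf') {
    my $same = read_sequential($layer) eq read_by_seeking($layer);
    printf "%-6s sequential read and seek-per-offset %s\n",
        $layer, $same ? 'agree' : 'differ';
}
```

With `:raw` (the effect of binmode) the two reads should match byte for byte. The `:crlf` result is printed rather than asserted, since how a translating layer behaves after a seek into the middle of a CRLF pair is exactly what this thread is uncertain about.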

Why is sysseek() not mentioned?

What is it that you are worried about?

Whether it is fine to do seek() before calling binmode() and reading data.

AFAIK that should work, unless there's some OS that makes tell()
return something other than a simple byte offset into the file
in the presence of CRLFs. OS/2?
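
The order asked about here (seek first, then binmode, then read) can be tried with a short script. This is only a sketch under assumptions: the sample bytes and the offset 3 are made up for the example, and it shows what happens on a byte-stream file system, not on the record-oriented ones discussed later in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data; "b" sits at byte offset 3 on disk.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "a\r\nb\r\n";
close($tmp) or die "close: $!";

open(my $fh, '<', $path) or die "open: $!";   # platform default layers
seek($fh, 3, 0) or die "seek: $!";            # position by byte offset
binmode($fh);                                 # switch to raw bytes afterwards
read($fh, my $byte, 1);                       # read one byte at that offset
close($fh);

printf "byte read after seek + binmode: 0x%02X\n", ord($byte);
```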

No-one here knows that?

The seek and tell doc are referring to bytes vs. characters, not
anything to do with crlf.

Do you mean bytes/utf8 mode doesn't matter but crlf does?

crlf is an in-between case; it's one byte in memory but two on the disk.
I would expect any sane implementation of seek/tell to count it as two
positions, but I don't believe that's 100% guaranteed to be the case.

@p5pRT
Author

p5pRT commented Oct 20, 2005

From pfusik@op.pl

If binmode() cannot affect seek() and tell() in any way, it is a nonsense
to say that it is "important" for these.

It's important in the sense that with it, you should be able to seek
to sequential locations and read a byte at each and get the same bytes
as if you just read them without seeking. Without binmode, that's not
true.

I wouldn't say this means that binmode() is important
for seek() and tell(), it's just for read(), getc() etc.
(there isn't any difference if you only use seek() and tell())

Whether it is fine to do seek() before calling binmode() and reading data.

AFAIK that should work, unless there's some OS that makes tell()
return something other than a simple byte offset into the file
in the presence of CRLFs. OS/2?

I doubt it. If that were so, the descriptions of tell() and seek()
would be erroneous.

Do you mean bytes/utf8 mode doesn't matter but crlf does?

crlf is an in-between case; it's one byte in memory but two on the disk.
I would expect any sane implementation of seek/tell to count it as two
positions, but I don't believe that's 100% guaranteed to be the case.

Now I wonder how tell() and relative-seek() can reliably
work on byte offsets when combined with buffering.
Will they work fine if I do a seek(), read some characters
(I mean CRLF or UTF8) and then call tell()?
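
For the UTF-8 half of this question, a small experiment is possible (a sketch, not an answer from the thread). The sample string and the use of the `:utf8` layer are assumptions chosen for the example; the script seeks, reads one character that occupies two bytes on disk, and then asks tell() where it is.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample: U+00E9 encoded as UTF-8 (bytes 0xC3 0xA9), then "x\n".
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "\xC3\xA9x\n";
close($tmp) or die "close: $!";

open(my $fh, '<:utf8', $path) or die "open: $!";
seek($fh, 0, 0) or die "seek: $!";
read($fh, my $char, 1);             # on a :utf8 handle this reads one character
my $chars_read = length($char);     # counted in characters
my $pos        = tell($fh);         # counted in bytes
close($fh);

print "read $chars_read character(s); tell() reports byte offset $pos\n";
```

Here one character is read while the reported offset advances by two, which matches the seek() and tell() documentation saying that positions are in bytes. The CRLF half of the question is left out because its behaviour is platform dependent.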

@p5pRT
Author

p5pRT commented Oct 20, 2005

From @wb8tyw

I am jumping in here from my faulty memory because I do not have time to
look up the precise information:

Piotr Fusik wrote​:

If binmode() cannot affect seek() and tell() in any way, it is a nonsense
to say that it is "important" for these.

It's important in the sense that with it, you should be able to seek
to sequential locations and read a byte at each and get the same bytes
as if you just read them without seeking. Without binmode, that's not
true.

I wouldn't say this means that binmode() is important
for seek() and tell(), it's just for read(), getc() etc.
(there isn't any difference if you only use seek() and tell())

Unless Perl does something special on OpenVMS, tell() can have the
correct information only in binary mode, or if the file was created in
a STREAM format.

For the native text file format, OpenVMS does not store the "LF",
"CRLF", or "CR" in the file. Instead, each record on disk starts with
a byte count for that record. A value in the directory entry for the
file tells the file system how the file is to be rendered when it is
displayed.

When the OpenVMS C library reads that file in, unless the file is opened
in a "raw" binary form, it appends a "LF" if needed to make it look like
a UNIX file.

Note though that because it is a counted field, any character value can
be present in the data, including NULL, CR, and LF.

Because of that, tell() only knows for sure the record offset into the
file to save for a subsequent seek(). The seek() call cannot be used
to position to a specific byte offset, because the hidden record
information is on disk and the "LF" is not.

There are fgetpos() and fsetpos() functions that know how to find and
return to the exact place in a file that they saved, because they do
not assume that a file is merely a stream of bytes.
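
In Perl, the same save-and-restore idea is exposed through the `getpos` and `setpos` methods of IO::Seekable. The sketch below uses made-up sample lines and treats the saved position as an opaque value. Whether these methods track record boundaries on OpenVMS the way the C functions do is an assumption this example does not verify.

```perl
use strict;
use warnings;
use IO::File;                        # IO::File inherits getpos/setpos from IO::Seekable
use File::Temp qw(tempfile);

# Hypothetical sample data: three short lines.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first\nsecond\nthird\n";
close($tmp) or die "close: $!";

my $fh = IO::File->new($path, 'r') or die "open: $!";
my $first  = <$fh>;
my $cookie = $fh->getpos;            # opaque position, not treated as a byte count
defined $cookie or die "getpos: $!";
my $second = <$fh>;

$fh->setpos($cookie) or die "setpos: $!";   # return to the saved position
my $again = <$fh>;                   # should be the second line again
$fh->close;

print "re-read after setpos: $again";
```

The saved value is only handed back to `setpos` and never used in arithmetic, which is the property the paragraph above relies on.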

I have tried to find a way for seek() and tell() to be accurate on a
native format text file that was open for read and write, and I could
not find a way to do it.

The only way I could find was to reread the entire file if a seek() was
ever done after a write(). There is also the problem that a write() on
a record-oriented file writes more binary bytes to the file than a UNIX
program expects. If you are simulating a UNIX text file and writing to
the middle of it, or writing a partial line, you have to do a lot of
fancy tricks with buffering.

When you start having to do that on a file that is several megabytes in
size, the cost starts to show up in program run time.

Whether it is fine to do seek() before calling binmode() and reading data.

AFAIK that should work, unless there's some OS that makes tell()
return something other than a simple byte offset into the file
in the presence of CRLFs. OS/2?

With any operating system that has record-oriented files and is
simulating the UNIX LF terminators, a seek() to anything other than a
start-of-record position returned by tell() may not give the expected
result.

I doubt it. If that were so, the descriptions of tell() and seek()
would be erroneous.

I would have to read their precise wording to see that.

Do you mean bytes/utf8 mode doesn't matter but crlf does?

crlf is an in-between case; it's one byte in memory but two on the disk.
I would expect any sane implementation of seek/tell to count it as two
positions, but I don't believe that's 100% guaranteed to be the case.

Actually, it could be 0 bytes on disk, with what should be presented
indicated only by a flag in the file header.

Now I wonder how tell() and relative-seek() can reliably
work on byte offsets when combined with buffering.
Will they work fine if I do a seek(), read some characters
(I mean CRLF or UTF8) and then call tell()?

It is platform dependent unless the on disk format is just a stream of
bytes and is never "massaged" through the C library.

This is the case on UNIX, but it is not the case on other platforms.

OpenVMS has several record formats where reading them in binary mode
will give a totally different result than reading them in a normal mode.

If I get a file from a DOS system with CR-LF terminators and the file
header is properly tagged, all native OpenVMS programs will be able to
read it in normal mode without getting extra blank lines or garbage at
the end of lines.

The same is true with a file from a UNIX system on VMS. As long as the
file header information is correct, it can be read by all native OpenVMS
programs, without them having special code to deal with the record
terminators.

-John
wb8tyw@​qsl.net
Personal Opinion Only
