Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO::Socket async connect() logic is broken on Solaris #11862

Open
p5pRT opened this issue Jan 11, 2012 · 6 comments
Open

IO::Socket async connect() logic is broken on Solaris #11862

p5pRT opened this issue Jan 11, 2012 · 6 comments

Comments

@p5pRT
Copy link

p5pRT commented Jan 11, 2012

Migrated from rt.perl.org#107962 (status was 'open')

Searchable as RT107962$

@p5pRT
Copy link
Author

p5pRT commented Jan 11, 2012

From carson@taltos.org

Created by carson@taltos.org

The async connect() logic in IO​::Socket is broken for Solaris (and almost
certainly other OS variants). It assumes you're allowed to call connect() a
second time, which is not at all guaranteed. Solaris correctly returns EINVAL
when IO​::Socket does so (as does Windows, but they spell it WSAEINVAL, and
there is code that handles that already).

There was a recent commit to change the select() logic, but it's still wrong.

Below is some very lightly tested code that tries to do the right thing on all platforms.

sub connect {
  @​_ == 2 or croak 'usage​: $sock->connect(NAME)';
  my $sock = shift;
  my $addr = shift;
  my $timeout = ${*$sock}{'io_socket_timeout'};
  my $err;
  my $blocking;

  $blocking = $sock->blocking(0) if $timeout;
  if (!connect($sock, $addr)) {
  if (defined $timeout && ($!{EINPROGRESS} || $!{EWOULDBLOCK})) {
  require IO​::Select;

  my $sel = new IO​::Select $sock;

  undef $!;
  my($r,$w,$e) = IO​::Select​::select($sel,$sel,$sel,$timeout);
  if (!defined($r) && !defined($w) && !defined($e)) {
  # select returns an empty list on timeout or other error
  $err = $!;
  if ($err == 0) {
  # timeout
  $err = (exists &Errno​::ETIMEDOUT ? &Errno​::ETIMEDOUT : 1);
  $@​ = "connect​: timeout";
  }
  }
  elsif(@​$w[0] || @​$r[0]) {
  # select() returns writable when connect() completes
  # select() returns readable if there's an error
  # if both, we either received data very quickly, or it's
  # an odd OS specific error behaviour.
  if (@​$r[0]) {
  if (exists(&Socket​::SO_ERROR)) {
  # The best way to get the error state, if supported
  $err = $sock->getsockopt(SOL_SOCKET,SO_ERROR);
  if ($err == 0) {
  undef $err;
  }
  }
  else {
  # If SO_ERROR isn't supported, the options to get
  # the error code are all bad. sysread($fd, $junk, 0)
  # works on some systems. Calling connect() again
  # works on others. But neither is reliable in
  # portable code. The only portable option is
  # getpeername(), which returns a generic error.
  if (! getpeername($sock)) {
  $err = $!;
  }
  }
  }
  }
  elsif(@​$e[0]) {
  # We really shouldn't ever get here. Exception means
  # OOB data is waiting on most UNIX platforms, and
  # should never be set without read or write.
  # Windows return from select after the timeout in case of
  # WSAECONNREFUSED(10061) if exception set is not used.
  # This behavior is different from Linux.
  # Using the exception
  # set we now emulate the behavior in Linux
  # - Karthik Rajagopalan
  $err = $sock->getsockopt(SOL_SOCKET,SO_ERROR);
  $@​ = "connect​: $err";
  }
  }
  elsif ($blocking || !($!{EINPROGRESS} || $!{EWOULDBLOCK})) {
  $err = $!;
  $@​ = "connect​: $!";
  }
  }

  $sock->blocking(1) if $blocking;

  $! = $err if $err;

  $err ? undef : $sock;
}

Perl Info

Flags:
    category=library
    severity=high
    module=IO::Socket

Site configuration information for perl 5.14.2:

Configured by carson at Tue Jan 10 19:32:20 PST 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
   
  Platform:
    osname=solaris, osvers=2.11, archname=i86pc-solaris-64
    uname='sunos gandalf.local.taltos.org 5.11 11.0 i86pc i386 i86pc '
    config_args='-Dprefix=/Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2 -Doptimize=-xO5 -xchip=sandybridge -xarch=sse4_2 -xarch=avx -xarch=aes -Dcf_email=carson@taltos.org -Dinstallsitelib=/var/tmp/site_perl/5.14.2 -Dotherlibdirs=/usr/local/lib/site_perl/5.14.2:/usr/local/lib/site_perl/5.14 -Dccflags=-m64 -d'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-m64 -I/usr/local/include -m64 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DPERL_USE_SAFE_PUTENV',
    optimize='-xO5 -xchip=sandybridge -xarch=sse4_2 -xarch=avx -xarch=aes',
    cppflags='-m64 -I/usr/local/include'
    ccversion='Sun C 5.12 SunOS_i386 Spica 2011/07/21', gccversion='', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/lib -L/usr/ccs/lib -L/opt/solstudiodev/prod/lib -L/lib -L/usr/local/lib -L/usr/gnu/lib -m64 '
    libpth=/usr/lib /usr/ccs/lib /opt/solstudiodev/prod/lib /lib /usr/local/lib /usr/gnu/lib
    libs=-lsocket -lnsl -lgdbm -ldb -ldl -lm -lc
    perllibs=-lsocket -lnsl -ldl -lm -lc
    libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-KPIC', lddlflags=' -G -m64 -L/usr/lib -L/usr/ccs/lib -L/opt/solstudiodev/prod/lib -L/lib -L/usr/local/lib -L/usr/gnu/lib'

Locally applied patches:
    


@INC for perl 5.14.2:
    /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/site_perl/5.14.2/i86pc-solaris-64
    /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/site_perl/5.14.2
    /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/5.14.2/i86pc-solaris-64
    /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/5.14.2
    /usr/local/lib/site_perl/5.14.2/i86pc-solaris-64
    /usr/local/lib/site_perl/5.14.2
    /usr/local/lib/site_perl/5.14
    .


Environment for perl 5.14.2:
    HOME=/home/carson
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_ALL=
    LC_COLLATE=
    LC_CTYPE=
    LC_MESSAGES=
    LC_MONETARY=
    LC_NUMERIC=
    LC_TIME=
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/tools/perl/bin:/tools/python/bin:/opt/solstudiodev/bin:/usr/ccs/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/openwin/bin:/usr/dt/bin:/usr/local/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented Jan 12, 2012

From @jkeenan

On Wed Jan 11 03​:39​:04 2012, carson@​taltos.org wrote​:

This is a bug report for perl from carson@​taltos.org,
generated with the help of perlbug 1.39 running under perl 5.14.2.

-----------------------------------------------------------------
[Please describe your issue here]
The async connect() logic in IO​::Socket is broken for Solaris (and
almost
certainly other OS variants). It assumes you're allowed to call
connect() a
second time, which is not at all guaranteed. Solaris correctly returns
EINVAL
when IO​::Socket does so (as does Windows, but they spell it WSAEINVAL,
and
there is code that handles that already).

There was a recent commit to change the select() logic, but it's still
wrong.

Below is some very lightly tested code that tries to do the right
thing on all platforms.

Thank you for the report. Two requests​: Can you provide an example of
how your subroutine should be called? And what do you mean by "tries to
do the right thing", i.e., how will I know whether it's "working" on the
platforms I have available?

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

p5pRT commented Jan 12, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Jan 12, 2012

From carson@taltos.org

On 1/11/12 4​:33 PM, James E Keenan via RT wrote​:

Thank you for the report. Two requests​: Can you provide an example of
how your subroutine should be called? And what do you mean by "tries to
do the right thing", i.e., how will I know whether it's "working" on the
platforms I have available?

This is a replacement for the existing connect method in IO/Socket.pm. I
do _not_ suggest it be checked in as-is. It's a 3am "why is this !@​#$%
code not working!" rough hack that Works for Me(tm). It needs test
cases, and especially testing on Windows (I'm confident I understand the
UNIX semantics).

And "tries to do the right thing" means​:
- It will always succeed or fail correctly (no more false negatives)
- It will return the correct error from an async connect() IFF the
platform supports SO_ERROR
- If the platform does not support SO_ERROR, it will return a generic
error (whatever getpeername() returns on that platform). It does not
attempt the 2 non-portable ways to try and get the correct error code
(zero byte read() and calling connect() again), as their behaviour is
unpredictable.

This whole mess is caused by async connect() not having well defined
semantics. When I implemented support for async TCP socket operations in
the NEC SOCKS5 code in the 90s (to get squid to work), I asked W.
Richard Stevens about the corner cases, and he responded "that's a good
question, let me know what you find out". And I knew I was in trouble ;-)

--
Carson

@p5pRT
Copy link
Author

p5pRT commented Jan 12, 2012

From @nwc10

On Wed, Jan 11, 2012 at 04​:33​:51PM -0800, James E Keenan via RT wrote​:

On Wed Jan 11 03​:39​:04 2012, carson@​taltos.org wrote​:

This is a bug report for perl from carson@​taltos.org,
generated with the help of perlbug 1.39 running under perl 5.14.2.

-----------------------------------------------------------------
[Please describe your issue here]
The async connect() logic in IO​::Socket is broken for Solaris (and
almost
certainly other OS variants). It assumes you're allowed to call
connect() a
second time, which is not at all guaranteed. Solaris correctly returns
EINVAL
when IO​::Socket does so (as does Windows, but they spell it WSAEINVAL,
and
there is code that handles that already).

There was a recent commit to change the select() logic, but it's still
wrong.

Below is some very lightly tested code that tries to do the right
thing on all platforms.

Thank you for the report. Two requests​: Can you provide an example of
how your subroutine should be called? And what do you mean by "tries to
do the right thing", i.e., how will I know whether it's "working" on the
platforms I have available?

By "tries to do the right thing" he means "not call connect a second time".

I'm not sure if we have any regression tests to check non-blocking connect.
It's also rather hard to check - it's all going to be timing related. For
example, (and I was working with software that was bitten by this)​:

Solaris 2.7 refactored the IP stack to be multithreaded. 2.6 was not, hence
non-blocking connect to a socket on localhost would always return
EWOULDBLOCK. This code was written to assume this, and so failed on 2.7,
where the connect would return 0 (for success), because the new multithreaded
IP stack *was* able to complete the connect fast enough.

I suspect that to really test the logic here, one would have to mock the
socket functions such as connect(). Which is do-able at a language level,
but is going to get messy because it won't test platform specific
differences, such as the EINVAL vs WSAEINVAL codes which the bug reporter
mentions (and I wasn't fully aware of).

I think in summary, if it's not currently "broken" on the platforms you
have access to, all you can verify is that it remains working.

Nicholas Clark

@p5pRT
Copy link
Author

p5pRT commented Jan 12, 2012

From @leonerd

On Thu, Jan 12, 2012 at 08​:35​:54AM +0000, Nicholas Clark wrote​:

I suspect that to really test the logic here, one would have to mock the
socket functions such as connect(). Which is do-able at a language level,
but is going to get messy because it won't test platform specific
differences, such as the EINVAL vs WSAEINVAL codes which the bug reporter
mentions (and I wasn't fully aware of).

I think in summary, if it's not currently "broken" on the platforms you
have access to, all you can verify is that it remains working.

If it's any help, I observe that IO​::Socket​::IP manages to do blocking
and non-blocking connects OK on both Solaris and MSWin32 in particular​:

  http​://matrix.cpantesters.org/?dist=IO-Socket-IP%200.08;reports=1;os=solaris
  (no MSWin32 smoke results on 0.08, but 0.07_003 worked fine)

Admittedly some of the logic in IO​::Socket​::IP has to end-run around
IO​::Socket's methods, to get the right behavioural semantics out of it.
Most notably​:

  # It seems that IO​::Socket hides EINPROGRESS errors, making them look
  # like a success. This is annoying here.
  # Instead of putting up with its frankly-irritating intentional
  # breakage of useful APIs I'm just going to end-run around it and
  # call CORE​::connect() directly
  if( CORE​::connect( $self, $addr ) ) {
  $! = 0;
  return 1;
  }

--
Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk
ICQ# 4135350 | Registered Linux# 179460
http​://www.leonerd.org.uk/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants