Win32 command line parsing explained #326

Closed
p5pRT opened this issue Aug 3, 1999 · 7 comments

Comments

p5pRT commented Aug 3, 1999

Migrated from rt.perl.org#1151 (status was 'resolved')

Searchable as RT1151$

p5pRT commented Aug 3, 1999

From JB@Danware.dk

Created by jb@danware.dk

The README.win32 file in the perl distribution seems to indicate
that the porter does not fully understand the rules of command
line parsing on this platform, so here is my explanation for your
benefit.

The Win32 command shells *do not* parse the command line into
arguments. With two exceptions noted below, the command line
is passed unedited and unparsed as a single NUL-terminated string
to perl.exe (or any other program). Quotes, spaces, wildcards,
etc. are simply forwarded to the application without any checking
or enforcement of rules such as quote balancing.

This string is returned from the GetCommandLine(void) system call.

Thus the notion of UNIX-like argc/argv passed to main() is an
effect of the C runtime library and may be overridden by simply
providing your own implementation. If you can squeeze a call to
your own implementation into perl's main(), you don't even have to
figure out how to get rid of the C compiler's parser; just ignore
its results and provide your own.

Also note that most C compilers do come with source code for their
command line parser.
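
To make the point about the C runtime concrete, here is a minimal sketch in Python of the kind of splitting a runtime performs on the raw string. It follows the rules Microsoft documents for parsing C command-line arguments (whitespace separates arguments, double quotes group, backslashes are special only in front of a double quote). It is a simplified illustration, not the actual CRT source: the name split_crt_args is hypothetical, the program name (argv[0]) gets no special treatment, and the doubled-quote rule of newer runtimes is ignored.

```python
def split_crt_args(cmdline: str) -> list[str]:
    """Split a raw command-line string roughly the way the MSVC runtime does.

    Simplified sketch: no special handling of argv[0] and no support for
    the doubled-quote ("") rule found in newer runtimes.
    """
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    have_arg = False  # lets "" produce an empty argument
    i, n = 0, len(cmdline)

    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            # Measure the whole run of backslashes.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # Before a quote, each pair of backslashes yields one.
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    # Odd run: the quote is escaped and taken literally.
                    current.append('"')
                    j += 1
                # Even run: the quote is handled as a delimiter next turn.
            else:
                # Not before a quote: backslashes are ordinary characters.
                current.append("\\" * count)
            have_arg = True
            i = j
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1

    if have_arg:
        args.append("".join(current))
    return args


print(split_crt_args(r'-e "print \"hi there\"" *.txt'))
```

Note that the wildcard in the last argument comes through untouched: in this model nothing expands it, which matches the observation that wildcards are simply forwarded to the application.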

The two exceptions are:
  1. Redirection: The command shell parses for redirections and
pipes and deletes the associated characters from the command line.
  2. Choice of executable: The command shell interprets the first
(possibly quoted) argument as the program to be located in the
path, but does not delete it from the command line.

Neither of these two exceptions applies to the CreateProcess system
call.

Hope this helps you make an even better Win32 port of perl.

Perl Info


Site configuration information for perl 5.00503:

Summary of my perl5 (5.0 patchlevel 5 subversion 03) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86
    uname=''
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cl.exe', optimize='-Od -MD -DNDEBUG', gccversion=
    cppflags='-DWIN32'
    ccflags ='-Od -MD -DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT   '
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -release -machine:x86'
    libpth=\lib
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib  winspool.lib
comdlg32.lib advapi32.lib shell32.lib ole32.lib  oleaut32.lib
netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib  version.lib
odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl.lib
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -release
-machine:x86'

Locally applied patches:
    


@INC for perl 5.00503:
    D:\BAT\perl\lib/MSWin32-x86
    D:\BAT\perl\lib
    D:\BAT\perl\site\lib
    .


Environment for perl 5.00503:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
 
PATH=D:\BAT\perl\bin\MSWin32-x86;D:\BAT\PERL\bin;F:\NETOP.SRC\SHARE;D:\B
AT;D:\XDT\OFFICE\OFFICE;C:\WINNT40;C:\WINNT40\SYSTEM32;e:\DT\MSDEV\BIN;e
:\DT\MSDEV\BIN\WINNT
    PERL_BADLANG (unset)
    SHELL (unset)


DANWARE HAS MOVED!, NEW ADDRESS/PHONE BELOW

Jakob Bøhm Jensen     e-mail:jb@danware.dk
M.Sc.Eng.             http://www.danware.com
Danware Data A/S      phone: +45 45 90 25 25
Kongevejen 62         fax:   +45 45 90 25 26
DK-3460 Birkerod

Information in this e-mail does not constitute a binding
commitment on behalf of me or Danware Data A/S.



p5pRT commented Jul 13, 2000

From @vanstyn

While trying to clear out some ancient cruft from the bug database,
I found this message from Jacob Bo/hm:
  http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-08/msg00098.html

I have no idea whether his assertions about win32 command-line parsing
are correct, but the information does not appear to have changed in
5.6.0 (looking at the paragraph that starts 'The crucial thing to
understand about the "cmd" shell'). Can someone that knows clarify?

Hugo

p5pRT commented Jul 13, 2000

From [Unknown Contact. See original ticket]

> I have no idea whether his assertions about win32 command-line parsing
> are correct

They look mostly correct to me, but I've never tried to figure out
what happens to %environment% references, and there are still rules
that cmd.exe uses for things like quoting and escaping because it
has to parse the line to find the redirection characters (thus leaving
exciting areas for the application to parse things differently
than the shell :-).

p5pRT commented Jul 13, 2000

From @gsar

On Thu, 13 Jul 2000 17:13:39 BST, Hugo wrote:

> While trying to clear out some ancient cruft from the bug database,
> I found this message from Jacob Bo/hm:
> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-08/msg00098.html
>
> I have no idea whether his assertions about win32 command-line parsing
> are correct,

They're correct, but he's talking about it from the POV of the low-level
Win32 API. Perl uses the CRT's argv/argc abstraction that is built
(precariously) on top of that low-level support, so describing the lower
level is not much good to the end user.

> but the information does not appear to have changed in
> 5.6.0 (looking at the paragraph that starts 'The crucial thing to
> understand about the "cmd" shell'). Can someone that knows clarify?

I don't see anything that needs changing there.

However, we do need to fix various amounts of brokenness that prevent
system(@args) from working right on Windows. Search the archives for
"spawnvp" if you're interested in pursuing this.
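
The difficulty with list-form system() follows from the single-string interface described earlier in the thread: CreateProcess accepts one command-line string, so an argument list has to be quoted back into a string that the child's C runtime will split into the same list again. A minimal sketch of that quoting step, using Python's subprocess.list2cmdline purely as a stand-in (a helper in the standard subprocess module that follows the Microsoft C runtime rules but is not part of its documented public API). The wrapper name build_command_line is hypothetical, this is not how Perl implements system(), and it adds none of the escaping that cmd.exe itself would need for characters such as ">" or "|".

```python
import subprocess


def build_command_line(args: list[str]) -> str:
    """Join an argument list into one string for a CreateProcess-style API.

    Quoting targets the C runtime's splitter only; shell metacharacters
    are not escaped here.
    """
    return subprocess.list2cmdline(args)


# An argument containing spaces and double quotes is wrapped in quotes,
# and its embedded quotes are backslash-escaped.
print(build_command_line(["-e", 'print "hi there"', "*.txt"]))
# prints: -e "print \"hi there\"" *.txt
```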

Sarathy
gsar@ActiveState.com

p5pRT commented Jul 13, 2000

From @vanstyn

In <200007131720.KAA12643@molotok.activestate.com>, Gurusamy Sarathy writes:
:On Thu, 13 Jul 2000 17:13:39 BST, Hugo wrote:
:>While trying to clear out some ancient cruft from the bug database,
:>I found this message from Jacob Bo/hm:
:> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-08/msg00098.html
:>
:>I have no idea whether his assertions about win32 command-line parsing
:>are correct,
:
:They're correct, but he's talking about it from the POV of the low-level
:Win32 API. Perl uses the CRT's argv/argc abstraction that is built
:(precariously) on top of that low-level support, so describing the lower
:level is not much good to the end user.
:
:>but the information does not appear to have changed in
:>5.6.0 (looking at the paragraph that starts 'The crucial thing to
:>understand about the "cmd" shell'). Can someone that knows clarify?
:
:I don't see anything that needs changing there.

Ok, I'll mark the bugid closed.

Hugo

p5pRT commented Jul 14, 2000

From [Unknown Contact. See original ticket]

From: Gurusamy Sarathy [mailto:gsar@ActiveState.com]

> They're correct, but he's talking about it from the POV of
> the low-level Win32 API. Perl uses the CRT's argv/argc
> abstraction that is built (precariously) on top of that
> low-level support, so describing the lower level is not much
> good to the end user.
>
> > but the information does not appear to have changed in
> > 5.6.0 (looking at the paragraph that starts 'The crucial thing to
> > understand about the "cmd" shell'). Can someone that knows clarify?
>
> I don't see anything that needs changing there.

I could argue that the user needs to know that there are two forces at work
- the command shell (which may be CMD, but may also be something like 4NT,
or even something obscure like zsh) and the CRT argc/argv implementation
that Perl uses.

A particular gotcha with the 4NT shell is that if you fail to double %
characters, even when quoted, they get eaten by the environment variable
expansion process. This is not a problem with CMD.

I don't think it's README.win32's place to document the idiosyncrasies of
all the various Windows shells which could be used (any more than we do for
Unix), but it probably *is* worth explaining that the shell and the CRT are
both involved.

Does the attached patch explain things any better?

[BTW, note that I changed the comment about quoting redirection characters -
based on my experiments, it *does* work, at least in CMD.EXE]

Paul.

---- Patch for perlwin32.pod ----

--- perlwin32.pod.orig	Thu Jun 29 10:48:02 2000
+++ perlwin32.pod	Fri Jul 14 10:43:30 2000
@@ -283,29 +283,38 @@
 shells found in UNIX environments, you will be less than pleased
 with what Windows offers by way of a command shell.
 
-The crucial thing to understand about the "cmd" shell (which is
-the default on Windows NT) is that it does not do any wildcard
-expansions of command-line arguments (so wildcards need not be
-quoted).  It also provides only rudimentary quoting.  The only
-(useful) quote character is the double quote (").  It can be used to
-protect spaces in arguments and other special characters.  The
-Windows NT documentation has almost no description of how the
-quoting rules are implemented, but here are some general observations
-based on experiments:  The shell breaks arguments at spaces and
-passes them to programs in argc/argv.  Doublequotes can be used
-to prevent arguments with spaces in them from being split up.
-You can put a double quote in an argument by escaping it with
-a backslash and enclosing the whole argument within double quotes.
-The backslash and the pair of double quotes surrounding the
-argument will be stripped by the shell.
-
-The file redirection characters "<", ">", and "|" cannot be quoted
-by double quotes (there are probably more such).  Single quotes
-will protect those three file redirection characters, but the
-single quotes don't get stripped by the shell (just to make this
-type of quoting completely useless).  The caret "^" has also
-been observed to behave as a quoting character (and doesn't get
-stripped by the shell also).
+The crucial thing to understand about the Windows environment is that
+the command line you type in is processed twice before Perl sees it.
+First, your command shell (usually CMD.EXE on Windows NT, and
+COMMAND.COM on Windows 9x) preprocesses the command line, to handle
+redirection, environment variable expansion, and location of the
+executable to run. Then, the perl executable splits the remaining
+command line into individual arguments, using the C runtime library
+upon which Perl was built.
+
+It is particularly important to note that neither the shell nor the C
+runtime do any wildcard expansions of command-line arguments (so
+wildcards need not be quoted).  Also, the quoting behaviours of the
+shell and the C runtime are rudimentary at best (and may, if you are
+using a non-standard shell, be inconsistent).  The only (useful) quote
+character is the double quote (").  It can be used to protect spaces in
+arguments and other special characters.  The Windows NT documentation
+has almost no description of how the quoting rules are implemented, but
+here are some general observations based on experiments:  The C runtime
+breaks arguments at spaces and passes them to programs in argc/argv.
+Doublequotes can be used to prevent arguments with spaces in them from
+being split up.  You can put a double quote in an argument by escaping
+it with a backslash and enclosing the whole argument within double
+quotes.  The backslash and the pair of double quotes surrounding the
+argument will be stripped by the C runtime.
+
+The file redirection characters "<", ">", and "|" can be quoted by
+double quotes (although there are suggestions that this may not always
+be true).  Single quotes are not treated as quotes by the shell or the C
+runtime.  The caret "^" has also been observed to behave as a quoting
+character, but this appears to be a shell feature, and the caret is not
+stripped from the command line, so Perl still sees it (and the C runtime
+phase does not treat the caret as a quote character).
 
 Here are some examples of usage of the "cmd" shell:
 
@@ -344,6 +353,13 @@
 
 Discovering the usefulness of the "command.com" shell on Windows 9x
 is left as an exercise to the reader :)
+
+One particularly pernicious problem with the 4NT command shell for
+Windows NT is that it (nearly) always treats a % character as indicating
+that environment variable expansion is needed.  Under this shell, it is
+therefore important to always double any % characters which you want
+Perl to see (for example, for hash variables), even when they are
+quoted.
 
 =item Building Extensions
 

p5pRT commented Oct 10, 2000

From @jhi

> Does the attached patch explain things any better?
>
> [BTW, note that I changed the comment about quoting redirection characters -
> based on my experiments, it *does* work, at least in CMD.EXE]

Applied, thanks.
