Improved autosplit mode #14647

Open
p5pRT opened this issue Apr 13, 2015 · 6 comments

p5pRT commented Apr 13, 2015

Migrated from rt.perl.org#124290 (status was 'open')

Searchable as RT124290$

p5pRT commented Apr 13, 2015

From @epa

Created by @epa

The -a and -F options to perl are useful to work on column-formatted data.
Often you will want to change one column, for example to uppercase the
first column of a CSV file:

  perl -F, -aE '$F[0] = uc $F[0]; print join ",", @F'

Note that 'say' at the end. Perl takes care of splitting the input
string at commas, but you have to manually join it back. Another awkwardness
is that the newline character gets included in the last element of @F,
so you have to pick either 'say' or 'print', depending on whether you
expect the newline to have been eliminated by your work. (So if you
pop the last element from @F you have to 'say', but if you shift off
the first element you 'print'.)

Clearly the -a flag could be more useful but it is too much of a backwards
compatibility break to change it. Perhaps an improved flag -A would chomp,
autosplit, run the given code, and then join the string back together and 'say'
it? Then you can say

  perl -F, -AE '$F[0] = uc $F[0]'

  perl -AE '@F = reverse @F' # reverse words on each line

and so on. This -A flag would be more useful than -a IMHO,
and would make operating on CSV, TSV, passwd files etc a bit easier.

If you feel that perl's command line is busy enough already I
understand.

Perl Info

Flags:
    category=core
    severity=wishlist

Site configuration information for perl 5.18.4:

Configured by Red Hat, Inc. at Fri Feb 13 16:10:58 UTC 2015.

Summary of my perl5 (revision 5 version 18 subversion 4) configuration:
   
  Platform:
    osname=linux, osvers=3.18.5-201.fc21.x86_64, archname=x86_64-linux-thread-multi
    uname='linux buildvm-09.phx2.fedoraproject.org 3.18.5-201.fc21.x86_64 #1 smp mon feb 2 21:00:58 utc 2015 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Dccdlflags=-Wl,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wl,-z,relro  -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.18.4 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.8.3 20140911 (Red Hat 4.8.3-7)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.18'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro '

Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch3: support for libdir64
    Fedora Patch4: use libresolv instead of libbind
    Fedora Patch5: USE_MM_LD_RUN_PATH
    Fedora Patch6: Skip hostname tests, due to builders not being network capable
    Fedora Patch7: Dont run one io test due to random builder failures
    Fedora Patch9: Fix find2perl to translate ? glob properly (RT#113054)
    Fedora Patch10: Update h2ph(1) documentation (RT#117647)
    Fedora Patch11: Update pod2html(1) documentation (RT#117623)
    Fedora Patch12: Disable ornaments on perl5db AutoTrace tests (RT#118817)
    Fedora Patch14: Do not use system Term::ReadLine::Gnu in tests (RT#118821)
    Fedora Patch15: Define SONAME for libperl.so
    Fedora Patch16: Install libperl.so to -Dshrpdir value
    Fedora Patch18: Fix crash with \\&$glob_copy (RT#119051)
    Fedora Patch19: Fix coreamp.t rand test (RT#118237)
    Fedora Patch20: Reap child in case where exception has been thrown (RT#114722)
    Fedora Patch21: Fix using regular expressions containing multiple code blocks (RT#117917)
    Fedora Patch22: Create site paths by cpan for the first time (CPAN RT#99905)
    Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
    Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux


@INC for perl 5.18.4:
    /home/eda/lib/perl5/
    /home/eda/lib64/perl5/
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .


Environment for perl 5.18.4:
    HOME=/home/eda
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_GB.UTF-8
    LC_MESSAGES=en_GB.UTF-8
    LC_MONETARY=en_GB.UTF-8
    LC_NUMERIC=en_GB.UTF-8
    LC_TIME=en_GB.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
    PERL5LIB=/home/eda/lib/perl5/:/home/eda/lib64/perl5/
    PERL_BADLANG (unset)
    SHELL=/bin/bash


______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

p5pRT commented Apr 13, 2015

From @Tux

On Mon, 13 Apr 2015 02:09:37 -0700, "Ed Avis" (via RT)
<perlbug-followup@perl.org> wrote:

> The -a and -F options to perl are useful to work on column-formatted data.
> Often you will want to change one column, for example to uppercase the
> first column of a CSV file:
>
>   perl -F, -aE '$F[0] = uc $F[0]; print join ",", @F'
>
> Note that 'say' at the end. Perl takes care of splitting the input

I note the *absence* of say at the end :)

> string at commas, but you have to manually join it back. Another awkwardness
> is that the newline character gets included in the last element of @F,
> so you have to pick either 'say' or 'print', depending on whether you
> expect the newline to have been eliminated by your work. (So if you
> pop the last element from @F you have to 'say', but if you shift off
> the first element you 'print'.)

indeed

> Clearly the -a flag could be more useful but it is too much of a backwards
> compatibility break to change it. Perhaps an improved flag -A would chomp,
> autosplit, run the given code, and then join the string back together and 'say'
> it? Then you can say

I like your proposal, but I hate your example

>   perl -F, -AE '$F[0] = uc $F[0]'
>
>   perl -AE '@F = reverse @F' # reverse words on each line
>
> and so on. This -A flag would be more useful than -a IMHO,
> and would make operating on CSV, TSV, passwd files etc a bit easier.

Giving CSV as an example is very dangerous. Your example already fails
drastically on fields that legitimately contain embedded commas (or
embedded newlines). Though many CSV files would work as proposed, a
whole lot will fail, leaving the poor user scratching their head.

$ cat test.csv
a,b,c
1,2,3
"ah, well",,D'uh!
$ perl -F, -aE '$F[0] = uc $F[0]; print join ",", @F' test.csv
A,b,c
1,2,3
"AH, well",,D'uh!
$ perl -MText::CSV_XS=csv -wE'csv(in=>"test.csv",on_in=>sub{$_[1][0]=uc$_[1][0]})'
A,b,c
1,2,3
"AH, WELL",,D'uh!
$ perl -MText::CSV_XS=csv -wE'csv(in=>"test.csv",filter=>{1=>sub{$_[1][0]=uc}})'
A,b,c
1,2,3
"AH, WELL",,D'uh!

The next version will be even shorter:

$ perl -MText::CSV_XS=csv -wE'csv(in=>"test.csv",filter=>{1=>sub{$_=uc}})'
A,b,c
1,2,3
"AH, WELL",,D'uh!

$ perl -Mblib -MText::CSV_XS=csv -wE'csv(in=>"test.csv",filter=>{1=>sub{$_=uc},3=>sub{s/D.uh/Yeah/;1}})'
A,b,c
1,2,3
"AH, WELL",,Yeah!

> If you feel that perl's command line is busy enough already I
> understand.

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.21 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented Apr 13, 2015

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Apr 13, 2015

From @epa

Indeed, I confused myself with say vs print, which kind of proves my point.
I accept that in general you cannot manipulate CSV files with autosplit, since there are complex quoting rules for fields containing spaces and "".
In my line of work, I am mostly dealing with simple numeric CSV files.

I have a slightly revised proposal. The new -A flag would

  - first chomp the string
  - then split into @F (as -a does)
  - finally, modify the behaviour of -p to 'say' instead of 'print', if used with -p or with flags that imply -p

That lets you use -Ap or -i -A to filter column data and just -A to simply read it (but without having to care about the final newline).

--
Ed Avis <eda@waniasset.com>

p5pRT commented Apr 13, 2015

From @rjbs

* Ed Avis <eda@waniasset.com> [2015-04-13T05:59:15]

> - first chomp the string
> - then split into @F (as -a does)
> - finally, modify the behaviour of -p to 'say' instead of 'print', if used with -p or with flags that imply -p

Why not use -l for the chomp-and-append-newline-later behavior?

--
rjbs

p5pRT commented Apr 13, 2015

From @epa

Thanks Ricardo S., you are right that -l would deal with chomp and adding back newline.
I was not aware of that flag.
Then the missing feature is just to autojoin on output by the same field separator.

--
Ed Avis <eda@waniasset.com>
