Different results using while/each and for/key when processing a dbm while running Perl 5.16.3 under Windows 7 #13124

Open
p5pRT opened this issue Jul 25, 2013 · 11 comments

Comments

p5pRT commented Jul 25, 2013

Migrated from rt.perl.org#119007 (status was 'open')

Searchable as RT119007$

p5pRT commented Jul 25, 2013

From josephwall99@comcast.net

Hi,
I have attached a perlbug document detailing the fact that I get different results when processing a dbm depending on whether I use while/each or for/keys under Windows 7.
Unfortunately, no one on the Perl forum I use was able to be of much help. This, together with the fact that both constructs yield identical results under Linux, led me to contact you.

Sincerely,
Joe Wall
josephwall99@comcast.net

p5pRT commented Jul 25, 2013

From josephwall99@comcast.net

Created by josephwall99@comcast.net

I have a simple script that reads through the iTunes music library looking for duplicate songs.
I have created a version of the script that processes a subset of the library. The input to the script is --

01 - Time Spent In Los Angeles.mp3
02 - A Whiter Shade Of Pale.mp3
jerry jeff walker - my old man (1).mp3
Jerry Jeff Walker - My Old Man (2).mp3
Waitresses - Christmas Wrapping (1).mp3
Waitresses - Christmas Wrapping (2).mp3
Waitresses - Christmas Wrapping.mp3.

The code that processes this input is --

#! /usr/bin/perl
  use strict;
  use Win32;
# use IO::Tee;

  my %COUNT;

  my $total_count = 0;
  my $dupe_count = 0;
  my $total_dbms = 0;
  my $dupes = 0;

  chdir('i:/') || die "Failed to change drive to I $!";
  open(SAMPLE, 'My_Music_Sample.txt') || die "Failed to open Music_Sample $!";
  dbmopen(%COUNT, "Count_Songs", 644) || die "Failed to open DBM $!";

  while (my $song_name = (<SAMPLE>)) {
    chomp($song_name);
    if (-d $song_name) {
      print "Skipping directory $song_name\n";
      next;
    }
    $song_name =~ tr/A-Z/a-z/;
    next if ($song_name =~ /albumart/);
    $song_name =~ s/[\\\/:\*\?"<>|]/ /g;
    $song_name =~ s/\.mp3$//i;
    $song_name =~ s/ *\(\d+\)//g;
    $song_name =~ s/-/_/g;
    $song_name =~ s// /g;
    my $song_count = $COUNT{$song_name};
    $song_count = 0 if ($song_count eq '');
    ++$song_count;
    $COUNT{$song_name} = $song_count;
    ++$total_count;
  }

  dbmclose(%COUNT);
  dbmopen(%COUNT, "Count_Songs", 644) || die "Failed to open DBM $!";
  open DUPERPT, ">", "Dupe_Report.txt" || die "Failed to open Dupe_Report $!";

# while (my ($song_name, $song_count) = each(%COUNT)) {
  foreach my $song_name (keys %COUNT) {
    my $song_count = $COUNT{$song_name};
    ++$total_dbms;
    if ($song_count > 1) {
      print "Song $song_name has a count of $song_count\n";
      print DUPERPT "$song_name has a count of $song_count\n";
      ++$dupe_count;
      $dupes += $song_count;
    }
    $COUNT{$song_name} = 0;
  }

  print "Total number of songs = $total_count\n";
  print "Total number of dbm records read = $total_dbms\n";
  print "Total number of duplicate songs = $dupe_count\n";
  $dupes -= $dupe_count;
  print "Total number of duplicates = $dupes\n";
  print DUPERPT "Total number of songs read = $total_count\n";
  print DUPERPT "Total number of dbm records read = $total_dbms\n";
  print DUPERPT "Total number of duplicate songs = $dupe_count\n";
  print DUPERPT "Total number of duplicates = $dupes\n";

  closedir(MUSIC);
  close(DUPERPT);
  dbmclose(%COUNT);

If I run the script with the for/keys code operative, I get the following correct output --

jerry jeff walker _ my old man has a count of 2
waitresses _ christmas wrapping has a count of 3
Total number of songs read = 7
Total number of dbm records read = 4
Total number of duplicate songs = 2
Total number of duplicates = 3

If I run the script with the while/each code operative, I consistently get the following incorrect output --

jerry jeff walker _ my old man has a count of 2
Total number of songs read = 7
Total number of dbm records read = 4
Total number of duplicate songs = 1
Total number of duplicates = 1

Note that the number of records read back from the dbm is the same in both instances.

When run under Linux, the two methods give identical results, equal to what for/keys returns under Windows.

A further complication is that the following code --

# usr/bin/perl -w
  use strict;
  use Fcntl;
  use AnyDBM_File;

  my $preferred_dbm = $AnyDBM_File::ISA[0];
  my %COUNT;
  tie(%COUNT, $preferred_dbm, "Count_Songs", O_RDONLY, 0644) || die "Failed to open dbm $!";

  if (%COUNT) {
    my $total_missed = 0;
    for my $song_name (keys %COUNT) {
      my $song_count = $COUNT{$song_name};
      if ($song_count > 0) {
        print "Song $song_name has a count of $song_count\n";
        ++$total_missed;
      }
    }
    print "Total number of dupes missed = $total_missed\n";
  } else {
    print "COUNT IS EMPTY\n";
  }

  untie(%COUNT);

when run against the closed dbm, always yields "COUNT IS EMPTY". This code also works as expected under Linux.

Sincerely,
Joe Wall

Perl Info
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl 5.16.3:

Configured by cyg_server at Wed Mar 13 13:27:24 2013.

Summary of my perl5 (revision 5 version 16 subversion 3) configuration:
   
  Platform:
    osname=MSWin32, osvers=5.2, archname=MSWin32-x64-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -Ox -GL -fp:precise -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE -DPERL_TEXTMODE_SCRIPTS -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO',
    optimize='-MD -Zi -DNDEBUG -Ox -GL -fp:precise',
    cppflags='-DWIN32'
    ccversion='14.00.40310.41', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='__int64', ivsize=8, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg  -libpath:"C:\Perl64\lib\CORE"  -machine:AMD64'
    libpth=\lib
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib comctl32.lib bufferoverflowU.lib msvcrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib comctl32.lib bufferoverflowU.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl516.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg  -libpath:"C:\Perl64\lib\CORE"  -machine:AMD64'

Locally applied patches:
    ACTIVEPERL_LOCAL_PATCHES_ENTRY

---
@INC for perl 5.16.3:
    C:/Perl64/site/lib
    C:/Perl64/lib
    .

---
Environment for perl 5.16.3:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\Perl64\site\bin;C:\Perl64\bin;C:\Program Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\Common Files\Microsoft Shared\Windows Live;c:\Program Files (x86)\Intel\iCLS Client\;c:\Program Files\Intel\iCLS Client\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Windows Live\Shared;C:\Program Files (x86)\QuickTime\QTSystem\;C:\Program Files (x86)\Common Files\Roxio Shared\DLLShared\;C:\Program Files (x86)\Common Files\Roxio Shared\10.0\DLLShared\
    PERL_BADLANG (unset)
    SHELL (unset)


p5pRT commented Jul 30, 2013

From @ap

And what happens when you comment out the line that says

  $COUNT{$song_name} = 0;

?

And what is the output of this on all your machines?

  perl -MAnyDBM_File -E "say for @AnyDBM_File::ISA"

--
Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented Jul 30, 2013

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Jul 31, 2013

From josephwall99@comcast.net

Hi, Aristotle, thanks for your reply.

When I remove the line that resets each count to 0, the "Total number of duplicates" increases by 7 with each run.
Also, the code snippet that reads the dbm looking for "missed" entries works as expected.

When I enter

perl -MAnyDBM_File -E "say for @AnyDBM_File::ISA",

I get SDBM_File.

Thanks again,
Joe Wall
josephwall99@comcast.net

p5pRT commented Jul 31, 2013

From @ap

On Tue Jul 30 20:11:28 2013, josephwall99@comcast.net wrote:

> When I remove the line that resets each count to 0, the "Total number
> of duplicates" increases by 7 with each run.

Yes, obviously…

> Also, the code snippet that reads the dbm looking for "missed" entries
> works as expected.

Apparently writing to that tied hash at all, even just changing stored values, messes up its key
iterator at least under some circumstances.

Do you get the same bug on your Linux system if you put the following line near the top of the
script?

BEGIN { @AnyDBM_File::ISA = 'SDBM_File' }

--
Aristotle Pagaltzis // <http://plasmasturm.org/>
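
For illustration, a minimal sketch of what that `BEGIN` line does: it pins the backend before `AnyDBM_File` is loaded, so the module has only one candidate to pick. The variable name `$backend` is just for this example; SDBM_File is assumed to be installed, which is normally the case since it ships with Perl.

```perl
use strict;
use warnings;

# The override has to run before AnyDBM_File is compiled,
# hence the BEGIN block placed above the "use" line.
BEGIN { @AnyDBM_File::ISA = ('SDBM_File') }
use AnyDBM_File;

# After loading, the first element of @ISA names the backend
# that AnyDBM_File actually selected.
my $backend = $AnyDBM_File::ISA[0];
print "AnyDBM_File is using: $backend\n";
```

Without the override, the same `print` shows whichever backend was found first on that machine, which is how the reporter ended up with different DBM libraries on Linux and Windows.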

p5pRT commented Jul 31, 2013

From josephwall99@comcast.net

Aristotle,

Are you able to replicate my results?

Because I found - much to my amazement - that when I exited Linux and rebooted into Windows, the problem disappeared.
I cannot reproduce the error and get the same, correct results whether I run the script with while/each or for/keys.

I'm wondering what this says about the stability of Perl under Windows.

And I use NDBM_File under Linux, a module that was not available under Windows, which led me to use $AnyDBM_File::ISA[0], a construct I used without understanding it.

Sincerely,
Joe Wall

p5pRT commented Jul 31, 2013

From josephwall99@comcast.net

Ok....
I have to retract a portion of what I wrote this morning.
I cannot reproduce the problem using the stripped-down 7-record file, but I still see the discrepancy when I run the script against the full music database.
Sorry for the confusion,
Joe

p5pRT commented Jul 31, 2013

From josephwall99@comcast.net

Aristotle,
At the risk of losing all credibility, I have to report that after deleting the DBM files, rather than clearing the DBM with delete, the problem is back.
Running the script with while/each now yields --

jerry jeff walker _ my old man has a count of 2
Total number of songs read = 7
Total number of dbm records read = 4
Total number of duplicate songs = 1
Total number of duplicates = 1

Once again, christmas wrapping is not processed as a duplicate.
By the way, the same is true when I run the script against the entire database.

Again, sorry for the confusion,
Joe

p5pRT commented Aug 1, 2013

From @ap

I have run the script now.

I do not have a Windows machine, certainly not one with an I: partition,
so I had to comment out the `use Win32` and chdir lines.

Then I had to fix the dbmopen permissions, which you gave as 644 when
you meant 0644.
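
A quick sketch of why the leading zero matters: Perl reads `644` as a decimal number and `0644` as an octal literal, so the two describe different permission bits. The variable names are illustrative only.

```perl
use strict;
use warnings;

# 644 is decimal; 0644 is an octal literal (rw-r--r--).
my $decimal_mode = 644;
my $octal_mode   = 0644;

printf "644  viewed as octal:   %o\n", $decimal_mode;   # 1204
printf "0644 viewed as octal:   %o\n", $octal_mode;     # 644
printf "0644 viewed as decimal: %d\n", $octal_mode;     # 420
```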

Then it ran, and yes it works with the `foreach` loop.

Then I tried the `while` loop version. That one *never terminates*, on
my machine. It gets stuck in an infinite loop.

Then I tried it without the `$COUNT{$song_name} = 0` line and it worked
correctly (modulo reporting the wrong duplicate counts on subsequent
runs).

I repeated this with every DBM implementation I have on this machine
(NDBM_File, DB_File, SDBM_File) and got identical results.

Then I commented out all of the dbmopen and dbmclose lines, and as
I conjectured it would, the script then worked 100% correctly – both the
`foreach` version and the `while` version, with or without the counter
reset line.

In other words, this has to do with your use of DBM-tied hashes and has
nothing to do with `while` and `foreach` (nor even `each`).

So all the signs point to a “don’t do that then” situation – that is, do
not modify a DBM-tied hash while you are iterating over it with `each`.

This might be a bug in every single DBM implementation shipped by
Perl… which is not impossible, as I can not so far find this particular
limitation mentioned in any documentation.

As a workaround you could reset your database by doing a

  $COUNT{$_} = 0 for keys %COUNT;

*after* the dupe-counting loop.
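
A minimal sketch of that two-pass shape, assuming SDBM_File and a throwaway database in a temporary directory (the path, the sample song names, and the `@dupes` and `$nonzero` variables are made up for this example). The `each` loop only reads; the reset happens in a separate statement afterwards. Whether this avoids the problem on the reporter's Windows setup has not been confirmed in this thread.

```perl
use strict;
use warnings;
use Fcntl;                       # exports O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

# Throwaway database so the sketch does not touch real data.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'Count_Songs');

tie(my %COUNT, 'SDBM_File', $path, O_RDWR | O_CREAT, 0644)
    or die "Failed to open DBM: $!";

# Populate with a few already-normalized sample names.
my @songs = (
    'my old man', 'my old man',
    'christmas wrapping', 'christmas wrapping', 'christmas wrapping',
    'time spent in los angeles',
);
$COUNT{$_} = ($COUNT{$_} // 0) + 1 for @songs;

# Pass 1: iterate with each, reading only.
my @dupes;
while (my ($song, $count) = each %COUNT) {
    push @dupes, "$song has a count of $count" if $count > 1;
}

# Pass 2: reset the counts once iteration has finished.
# keys builds the full key list before any store happens.
$COUNT{$_} = 0 for keys %COUNT;

print "$_\n" for sort @dupes;

# Sanity check: how many counts are still nonzero after the reset.
my $nonzero = grep { $COUNT{$_} != 0 } keys %COUNT;
untie %COUNT;
```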

But then I am baffled why you are using a DBM hash in the first place if
you reset all the counts anyway. You could just use a normal hash with
no persistence between program runs and get correct results, always.

(For that matter, your careful gyrations with $song_count are
unnecessary. You can increment an undefined variable just fine in Perl
and it will do the right thing for you, so the whole $song_count dance
in the first loop should be replaceable with just this:

  ++$COUNT{$_};

Unless, that is, this hits another DBM implementation peculiarity.

But of course that too seems easily solved, in your case, by simply not
using a DBM-tied hash in the first place…)
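
To make both suggestions concrete, here is a rough sketch of the dupe count using an ordinary hash and `++` on a not-yet-existing entry. The file list is the sample from the original report; the normalization is a simplified subset of the reporter's substitutions, and the names `@files`, `%count`, `$dupe_songs` and `$extra_copies` are hypothetical.

```perl
use strict;
use warnings;

# Sample input from the original report.
my @files = (
    '01 - Time Spent In Los Angeles.mp3',
    '02 - A Whiter Shade Of Pale.mp3',
    'jerry jeff walker - my old man (1).mp3',
    'Jerry Jeff Walker - My Old Man (2).mp3',
    'Waitresses - Christmas Wrapping (1).mp3',
    'Waitresses - Christmas Wrapping (2).mp3',
    'Waitresses - Christmas Wrapping.mp3',
);

my %count;    # plain in-memory hash, nothing persists between runs
for my $file (@files) {
    my $name = lc $file;
    $name =~ s/\.mp3$//;          # drop the extension
    $name =~ s/ *\(\d+\)//g;      # drop copy markers such as " (2)"
    $name =~ s/-/_/g;
    ++$count{$name};              # undef increments to 1 without a warning
}

my ($dupe_songs, $extra_copies) = (0, 0);
for my $name (sort keys %count) {
    next unless $count{$name} > 1;
    print "$name has a count of $count{$name}\n";
    ++$dupe_songs;
    $extra_copies += $count{$name} - 1;
}

printf "Total number of songs read = %d\n", scalar @files;
printf "Total number of distinct names = %d\n", scalar keys %count;
print  "Total number of duplicate songs = $dupe_songs\n";
print  "Total number of duplicates = $extra_copies\n";
```

Because nothing is stored between runs, there is no reset step and therefore no write to the hash while it is being iterated.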

--
Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented Aug 1, 2013

From josephwall99@comcast.net

Thanks for your time and reply, Aristotle.

So..... to summarize --

The critical line of code here is $COUNT{$song_name} = 0, which updates the dbm values within the read loop.

Thus --

Under Windows, both while/each and for/keys work with a simple hash, even when the values are updated within the loop;
Under Windows, for/keys works with a dbm, even when the values are updated within the loop;
Under Linux, both while/each and for/keys work with a dbm for me, even when the values are updated within the loop;
Under Windows, while/each works with a dbm, unless you try to update the values within the loop, in which case results are in error;

So the undocumented error is attempting to update dbm values while reading a dbm with while/each under Windows.

The answer to why I'm using a dbm when a hash would suffice is just because, I guess.

Thanks, once again,
Joe
