New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with thousands separators in perllocale #16686
Comments
From rdiezmail-perl@yahoo.deAdding thousands separators to a number is a rather common operation. Most languages use a printf format string like "%'d", but Perl does not support it. I suggest that you add this feature and be done with it. Otherwise, searching the Internet will throw up a few alternatives with different degrees of trouble. The following official Perl documentation page has example code for that task: https://perldoc.perl.org/perllocale.html Search for "Format command line params for current locale" in that page to find the example code I am talking about. My first impression when looking at that example was: A "1 while" loop with a cryptic regex inside... what an alarming Perl nerdiness sign... A traditional loop without regular expressions is easier to understand. Search for "This is for manually testing of AddThousandsSeparators()" in the following Perl script I recently wrote for an alternative implementation. https://github.com/rdiez/Tools/blob/master/AnnotateWithTimestamps/AnnotateWithTimestamps.pl I have done a quick benchmark, and my loop version is much faster, at least with Perl v5.22.1. One bad practice that stands out in the perllocale example is the usage of 2 variables with the same name but with different types, namely $grouping and @grouping. Furthermore, I would not encourage implicit usage of special variable $_ anymore. Using named variables makes the code easier to grasp. But more seriously, the perllocale example code does not work correctly with the Spanish locale, where the thousands separator is the period ('.') character, probably because that character has a special meaning in regular expressions. I guess some quoting is necessary. |
From @jkeenanOn Sat, 08 Sep 2018 11:00:00 GMT, rdiezmail-perl@yahoo.de wrote:
There are many locales which in some sense are "Spanish". For example, on Ubuntu: ##### Can you be more specific? Thank you very much. -- |
The RT System itself - Status changed from 'new' to 'open' |
From @jkeenanOn Sat, 08 Sep 2018 11:00:00 GMT, rdiezmail-perl@yahoo.de wrote:
OP and list: Please review patch attached. Thank you very much. |
From @jkeenan0001-Improve-example-using-thousands-separators.patchFrom f5412a7716568219caf56c8019bedb9498598ea1 Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Sat, 8 Sep 2018 11:11:36 -0400
Subject: [PATCH] Improve example using thousands separators.
For: RT #133504
---
pod/perllocale.pod | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index 207aea0916..5ee6f0c30e 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -748,6 +748,11 @@ Here's a simple-minded example program that rewrites its command-line
parameters as integers correctly formatted in the current locale:
use POSIX qw(locale_h);
+ use locale qw(:numeric);
+
+ my $old_locale;
+ $old_locale = setlocale(LC_NUMERIC);
+ setlocale(LC_NUMERIC, "es_ES.utf8");
# Get some of locale's numeric formatting parameters
my ($thousands_sep, $grouping) =
@@ -767,20 +772,24 @@ parameters as integers correctly formatted in the current locale:
# right to left (low to high digits). In the
# below we cheat slightly by never using anything
# else than the first grouping (whatever that is).
+
+ my @grouping_list;
if ($grouping) {
- @grouping = unpack("C*", $grouping);
+ @grouping_list = unpack("C*", $grouping);
} else {
- @grouping = (3);
+ @grouping_list = (3);
}
# Format command line params for current locale
- for (@ARGV) {
- $_ = int; # Chop non-integer part
+ for my $input (@ARGV) {
+ my $i = int($input);
1 while
- s/(\d)(\d{$grouping[0]}($|$thousands_sep))/$1$thousands_sep$2/;
- print "$_";
+ $i =~ s<(\d)(\d{$grouping_list[0]}($|\Q$thousands_sep\E))>
+ <$1$thousands_sep$2>x;
+ print "$i\n";
}
- print "\n";
+
+ setlocale(LC_NUMERIC, $old_locale);
Note that if the platform doesn't have C<LC_NUMERIC> and/or
C<LC_MONETARY> available or enabled, the corresponding elements of the
--
2.17.1
|
From @SmylersJames E Keenan via RT writes:
I don't accept that $_ shouldn't be used “any more”, nor that using This example happens to be written using $_ as the loop variable, to (If it'd already been using $i, I don't think changing it to $_ would've
That's a good point. $grouping is a list of integers in a string, and If you're going to improve this, I think better options would be either • Instead of saving a hash slice of the return value of localeconv into my $number_prop = localconv(); then use $number_prop->{grouping} in place of $grouping. • Since only the first element of @grouping is ever used, don't bother my $group_size = Then use $group_size in place of $grouping[0]. Using $grouping and @grouping_list meets the ‘don't use the same name (Obviously I'm happy with making this work for Spanish.) Smylers |
From @GrinnzOn Mon, Sep 10, 2018 at 5:01 AM Smylers <smylers@stripey.com> wrote:
I disagree. In addition to named variables providing better opportunities -Dan |
From rdiezmail-perl@yahoo.deIn my opinion, this is the wrong way to fix this issue in the official Perl documentation. I would just implement the quasi-standard way with printf. The new code still encourages using special variable $_ , which I would not do anymore. Say you use your example code to write a subroutine. Is \Q...\E safe? What if the user manages to inject an \E from the outside? If you are going to quote for a regular expression, do it right with a separate variable and quotemeta(). The loop with a regular expression is hard to understand, for no gain. In fact, the implementation with a regular expression is very inefficient compared with a standard loop that indexes on a string. In fact, for an official FAQ, I would implement a full loop that uses the whole @grouping_list array, not just the first element. Perl always wanted to have a full locale implementation, didn't it? Actually, half of my comments above are repetitions of my first post. If you have a different opinion, please state it. I took my time writing all that, and I find it rude just being ignored. Best regards, |
From rdiezmail-perl@yahoo.de
declare -x LANG="es_ES.UTF-8" But you do not need any locale to test the bug, just set $thousands_sep to '.' before using the regular expression in the perllocale example code. The code should be robust, no matter what the locale specifics are. You do not want your example code to break if a future locale uses some character which has a special meaning in a regular expression. Best regards, |
From @GrinnzOn Tue, Sep 11, 2018 at 12:50 AM R. Diez via perl5-porters <
This is not possible. \Q and \E are interpolated directly in the string -Dan |
From @tonycozOn Sat, 08 Sep 2018 08:22:24 -0700, jkeenan wrote:
@@ -748,6 +748,11 @@ Here's a simple-minded example program that rewrites its command-line use POSIX qw(locale_h); You changed the locale, so it's no longer using what was the current locale. + As others have said, @grouping_list isn't great, maybe rename $grouping instead, eg. $packed_grouping. # Format command line params for current locale In this case I think $_ is just fine, since it's acting as a current "thing" to work on. If it were to be renamed, I'd rather the "named" version ($input) be used inside the loop rather than the "not really named" $i. Unrelated to the patch, the grouping code doesn't interpret the grouping list correctly, though I don't know if there's any locales anywhere that use the full power of the grouping list. From POSIX: The elements of grouping and mon_grouping are interpreted according to the following: {CHAR_MAX} Tony |
Migrated from rt.perl.org#133504 (status was 'open')
Searchable as RT133504$
The text was updated successfully, but these errors were encountered: