Discussion:
Using the GNU GMP Library for Bignums in Emacs
(too old to reply)
Siraphob (Ben) Phipathananunth
2018-04-21 14:15:08 UTC
Permalink
Emacs Calc was written many years ago, and as a result its current
implementation implements bignums purely in Emacs Lisp. Its bignum
operations also use a lot of hacks (such as performing carry
operations). Arbitrary precision arithmetic could be faster if Emacs
had GNU GMP linked to it, with the relevant Emacs Lisp functions added
in C.

What is the consensus on linking the GNU GMP library to Emacs so that
packages such as Emacs Calc (and others) could benefit from using
native types (i.e. "mpz_t") rather than reinventing the wheel?

The benefits of linking GNU GMP are clear. It would make it easier to
extend the math capabilities of Emacs, while making it faster and less
error-prone. However, the downsides, if any exist, should be discussed
as well, to gauge whether pursuing such a task would be fruitful.
Eli Zaretskii
2018-04-21 14:34:10 UTC
Permalink
Date: Sat, 21 Apr 2018 21:15:08 +0700
Emacs Calc was written many years ago, and as a result its current
implementation implements bignums purely in Emacs Lisp. Its bignum
operations also use a lot of hacks (such as performing carry
operations). Arbitrary precision arithmetic could be faster if Emacs
had GNU GMP linked to it, with the relevant Emacs Lisp functions added
in C.
What is the consensus on linking the GNU GMP library to Emacs so that
packages such as Emacs Calc (and others) could benefit from using
native types (i.e. "mpz_t") rather than reinventing the wheel?
I think the consensus is we want that, it's just a matter of someone
doing the job of making it happen.

The design should IMO be discussed up front, because we want not only
to be able to use bignums and arbitrary-precision floating-point
numbers in C, but also in Lisp. How exactly to expose them to Lisp is
something we should talk about, I think.

Thanks.
Siraphob (Ben) Phipathananunth
2018-04-21 15:01:07 UTC
Permalink
One approach would be to implement a minimal subset of functions
exposed in Emacs Lisp that would allow one to recreate the following
functions that are defined in calc.el :

Function Name Arguments
============================================
math-bignum (a)
math-bignum-big (a)
math-div10-bignum (a)
math-scale-left-bignum (a n)
math-scale-right-bignum (a n)
math-add-bignum (a b)
math-sub-bignum (a b)
math-mul-bignum (a b)
math-mul-bignum-digit (a d c)
math-div-bignum (a b)
math-div-bignum-digit (a b)
math-div-bignum-big (a b alen blen)
math-div-bignum-part (a b blen)
math-div-bignum-try (a b c guess)
math-format-bignum (a)
math-format-bignum-decimal (a)
math-read-bignum (s)

There are other functions scattered throughout the calc sources, but
this is the rough idea of the type of functions we would need. This
list would have to be shortened, as when we migrate bignums we would
no longer need the following functions (at least, could be more):

math-mul-bignum-digit (a d c)
math-div-bignum-digit (a b)

These functions exist because bignums are implemented as a list of
digits.

I think that keeping the function names the same would be good (for
some backwards compatibility). The /actual/ value of the bignum
(internally, in C) would be a tagged pointer to the mpz_t data type
stored somewhere so that it could be marked for GC (very important to
mark for GC).

With respect to floating points, though, things get a little hairy.

As quoted from the GMP Library
The exponent of each float has fixed precision, one machine word on
most systems. In the current implementation the exponent is a count of
limbs, so for example on a 32-bit system this means a range of roughly
2^-68719476768 to 2^68719476736, or on a 64-bit system this will be
much greater. Note however that mpf_get_str can only return an
exponent which fits an mp_exp_t and currently mpf_set_str doesn’t
accept exponents bigger than a long.
Fortunately, it links to another project called MPFR
http://www.mpfr.org/sample.html which allows us to computing floating
points at arbitrary precision.

However, I'm concerned that adding two different libraries could lead
to issues regarding interoperability (for example, a floating point
number that needs to be converted to a bignum, and vice versa). If
anyone has used MPFR + GMP in the past, please chime in.

GC shouldn't be ignored as well, because as the name implies, these
data objects would be very large.

Thanks,

Siraphob (Ben) Phipathananunth
Date: Sat, 21 Apr 2018 21:15:08 +0700
Emacs Calc was written many years ago, and as a result its current
implementation implements bignums purely in Emacs Lisp. Its bignum
operations also use a lot of hacks (such as performing carry
operations). Arbitrary precision arithmetic could be faster if Emacs
had GNU GMP linked to it, with the relevant Emacs Lisp functions added
in C.
What is the consensus on linking the GNU GMP library to Emacs so that
packages such as Emacs Calc (and others) could benefit from using
native types (i.e. "mpz_t") rather than reinventing the wheel?
I think the consensus is we want that, it's just a matter of someone
doing the job of making it happen.
The design should IMO be discussed up front, because we want not only
to be able to use bignums and arbitrary-precision floating-point
numbers in C, but also in Lisp. How exactly to expose them to Lisp is
something we should talk about, I think.
Thanks.
Paul Eggert
2018-04-21 15:23:33 UTC
Permalink
Post by Siraphob (Ben) Phipathananunth
One approach would be to implement a minimal subset of functions
exposed in Emacs Lisp that would allow one to recreate the following
Surely we should just add bignum support to existing functions +, -, etc.
Wouldn't that suffice for 'calc'? If not, why not? (Of course 'calc' would need
to be changed to exploit the bignums properly, no matter how we add bignums.)
Post by Siraphob (Ben) Phipathananunth
With respect to floating points, though, things get a little hairy.
This should be a separate task. Bignums alone are quite a large-enough project.
I'm not even sure we should do rationals.
Post by Siraphob (Ben) Phipathananunth
The /actual/ value of the bignum
(internally, in C) would be a tagged pointer to the mpz_t data type
stored somewhere so that it could be marked for GC
For bignums I would think that Emacs shouldn't use the mpz_t data type, as this
would complicate garbage collection. Emacs can use the mp_limb_t data type and
stick with the mpn_* functions.
Eli Zaretskii
2018-04-21 15:36:59 UTC
Permalink
Date: Sat, 21 Apr 2018 08:23:33 -0700
Post by Siraphob (Ben) Phipathananunth
One approach would be to implement a minimal subset of functions
exposed in Emacs Lisp that would allow one to recreate the following
Surely we should just add bignum support to existing functions +, -, etc.
Exactly my thoughts.
Post by Siraphob (Ben) Phipathananunth
With respect to floating points, though, things get a little hairy.
This should be a separate task. Bignums alone are quite a large-enough project.
I'm not even sure we should do rationals.
Right.
Siraphob (Ben) Phipathananunth
2018-04-21 15:40:05 UTC
Permalink
Post by Paul Eggert
Surely we should just add bignum support to existing functions +, -,
etc. Wouldn't that suffice for 'calc'? If not, why not? (Of course
'calc' would need to be changed to exploit the bignums properly, no
matter how we add bignums.)
Of course we could to do that. Hopefully there isn't existing
Emacs Lisp code that relies on unsafe arithmetic /anywhere/. If the
functions + - * / operate on bignums (instead of dedicated bignum
functions), would that mean we drop 32/64 bit integers entirely?
Post by Paul Eggert
The design should IMO be discussed up front, because we want not only
to be able to use bignums and arbitrary-precision floating-point
numbers in C, but also in Lisp.
I thought that Eli was talking about how we should interface bignums
to Emacs Lisp; the + - * / operators are defined in C source
code. Bignums would decrease performance in areas where the usual
32/64 bit integers are sufficient, and lead to higher memory usage. It
would make much more sense to have separate math functions for 32/64
bit numbers and for bignums. In doing so, it should be obvious to the
Emacs Lisp programmer when to use what.
Post by Paul Eggert
This should be a separate task. Bignums alone are quite a large-enough
project. I'm not even sure we should do rationals.
Rationals would be a part of Emacs Calc, once we have bignums it
should be trivial to reimplement rationals.
Eli Zaretskii
2018-04-21 15:54:00 UTC
Permalink
Date: Sat, 21 Apr 2018 22:40:05 +0700
I thought that Eli was talking about how we should interface bignums
to Emacs Lisp; the + - * / operators are defined in C source
code.
I never thought we could consider not using + - * etc. with bignums,
so I didn't even raise that issue.
Bignums would decrease performance in areas where the usual
32/64 bit integers are sufficient, and lead to higher memory usage. It
would make much more sense to have separate math functions for 32/64
bit numbers and for bignums. In doing so, it should be obvious to the
Emacs Lisp programmer when to use what.
When the arguments are bignums, these operators should automatically
invoke the relevant GMP functions. That is how we behave elsewhere in
Emacs; Emacs Lisp does support polymorphism on the level of primitive
functions. E.g., that's how these operators support both Lisp
integers and Lisp floats (and markers, for that matter).
Paul Eggert
2018-04-21 16:08:03 UTC
Permalink
Post by Siraphob (Ben) Phipathananunth
Hopefully there isn't existing
Emacs Lisp code that relies on unsafe arithmetic /anywhere/.
I'm afraid your hope will be in vain....

And it depends on what we mean by "unsafe". Is it safe to assume that (eq (1+ 0)
1) returns t, for example? The Scheme standard says "no" but we might decide
that 'eq' should "work" for fixnums in Emacs Lisp. That sort of thing.
Post by Siraphob (Ben) Phipathananunth
If the functions + - * / operate on bignums (instead of dedicated bignum
functions), would that mean we drop 32/64 bit integers entirely?
No, it'd mean we'd still have fixnums vs bignums internally, and most programs
wouldn't care whether an integer is represented via fixnum or bignum, as it'd be
an issue merely of efficiency. Some programs would probably still care though
(for efficiency reasons), and GNU Calc quite possibly would be one such program.
Post by Siraphob (Ben) Phipathananunth
It would make much more sense to have separate math functions for 32/64
bit numbers and for bignums. In doing so, it should be obvious to the
Emacs Lisp programmer when to use what.
That's not how other Lisps work, by and large. They have just one integer type
and one set of functions. The goal is to simplify the job of writing a program.
Tom Tromey
2018-04-26 03:17:39 UTC
Permalink
Paul> And it depends on what we mean by "unsafe". Is it safe to assume that
Paul> (eq (1+ 0) 1) returns t, for example? The Scheme standard says "no"
Paul> but we might decide that 'eq' should "work" for fixnums in Emacs
Paul> Lisp. That sort of thing.

I would like the "egal" idea to be at least considered: that is, for
immutable objects like bignums (and floats), have "eq" do a value
comparison, not an identity comparison. Ideally we could dispense with
eql entirely.

The upside is that this makes programs simpler to reason about -- to my
mind the eq/eql distinction is letting an implementation leak through
the abstraction.

The downside is that this is less efficient. (Currently my JIT emits
"eq" as a simple comparison, which is nice; but I think on the whole I'd
personally rather trade a bit of performance for the cleaner semantics.)

Tom
Stefan Monnier
2018-04-26 03:33:50 UTC
Permalink
Post by Tom Tromey
Paul> And it depends on what we mean by "unsafe". Is it safe to assume that
Paul> (eq (1+ 0) 1) returns t, for example? The Scheme standard says "no"
Paul> but we might decide that 'eq' should "work" for fixnums in Emacs
Paul> Lisp. That sort of thing.
I would like the "egal" idea to be at least considered: that is, for
immutable objects like bignums (and floats), have "eq" do a value
comparison, not an identity comparison. Ideally we could dispense with
eql entirely.
I think I agree, tho I'd describe it differently: the way `eq` is
defined in Common-Lisp, I believe it is valid to make it an alias for
`eql`.

I think Emacs should move in this direction: define `eq` to be the same
as `eql`, then check every use of EQ in the C code to see if it will
necessarily behave identically to "EQL" and where it doesn't, replace it
with EQL.

Obviously, this will cost us some code size and performance, so we'd
also want to try and measure the cost to make sure it's not too serious.


Stefan
Richard Stallman
2018-04-27 15:56:03 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Stefan Monnier
I think Emacs should move in this direction: define `eq` to be the same
as `eql`, then check every use of EQ in the C code to see if it will
necessarily behave identically to "EQL" and where it doesn't, replace it
with EQL.
This is a lot of work and will slow down some loops.
I would rather try the obvious approach: eq treats bignums like floats.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Stefan Monnier
2018-04-27 16:08:15 UTC
Permalink
Post by Richard Stallman
Post by Stefan Monnier
I think Emacs should move in this direction: define `eq` to be the same
as `eql`, then check every use of EQ in the C code to see if it will
necessarily behave identically to "EQL" and where it doesn't, replace it
with EQL.
This is a lot of work and will slow down some loops.
It's hard to tell without measuring it first. And I'd expect that
many/most of those loops can be made fast anyway (e.g. member/memql can
delegate to `memq` when the argument is such that eq/eql/equal behave
identically).
Post by Richard Stallman
I would rather try the obvious approach: eq treats bignums like floats.
While I'm writing this within the GMP thread, it's orthogonal.


Stefan
Richard Stallman
2018-04-21 22:42:39 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Siraphob (Ben) Phipathananunth
Of course we could to do that. Hopefully there isn't existing
Emacs Lisp code that relies on unsafe arithmetic /anywhere/. If the
functions + - * / operate on bignums (instead of dedicated bignum
functions), would that mean we drop 32/64 bit integers entirely?
To eliminate the current types for small integers would
require rewriting of much of the C code in Emacs.
It would be better to represent small integers as now,
and have a different structure for larger integers.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
d***@dancol.org
2018-04-22 02:48:55 UTC
Permalink
Post by Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Siraphob (Ben) Phipathananunth
Of course we could to do that. Hopefully there isn't existing
Emacs Lisp code that relies on unsafe arithmetic /anywhere/. If the
functions + - * / operate on bignums (instead of dedicated bignum
functions), would that mean we drop 32/64 bit integers entirely?
To eliminate the current types for small integers would
require rewriting of much of the C code in Emacs.
It would be better to represent small integers as now,
and have a different structure for larger integers.
I'd love to see Emacs get transparent bigint support. Python semantics are
fine, as is using a normal int representation at the C level. Adding
transparent bigints as Lisp types doesn't require us to increase various
Emacs core limits right away.
Philipp Stephani
2018-04-22 13:00:27 UTC
Permalink
Post by d***@dancol.org
Post by Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Siraphob (Ben) Phipathananunth
Of course we could to do that. Hopefully there isn't existing
Emacs Lisp code that relies on unsafe arithmetic /anywhere/. If the
functions + - * / operate on bignums (instead of dedicated bignum
functions), would that mean we drop 32/64 bit integers entirely?
To eliminate the current types for small integers would
require rewriting of much of the C code in Emacs.
It would be better to represent small integers as now,
and have a different structure for larger integers.
I'd love to see Emacs get transparent bigint support. Python semantics are
fine, as is using a normal int representation at the C level. Adding
transparent bigints as Lisp types doesn't require us to increase various
Emacs core limits right away.
We need to be very careful with transparent bigint support though, as a
naive design will break many invariants. For example, integers are
currently documented to use modular arithmetic (
https://www.gnu.org/software/emacs/manual/html_node/elisp/Integer-Type.html
,
https://www.gnu.org/software/emacs/manual/html_node/elisp/Integer-Basics.html),
and they are documented to be pure value types, i.e. same-valued integers
are the same object (
https://www.gnu.org/software/emacs/manual/html_node/elisp/Equality-Predicates.html).
Especially the second property is widely used.
Paul Eggert
2018-04-22 17:43:15 UTC
Permalink
integers are currently documented to use modular arithmetic (
We'll need to change the documentation for modular arithmetic, since integers
will no longer overflow.
and they are documented to be pure value types, i.e. same-valued integers
are the same objects. Especially the second property is widely used.
This property is used only for fixnums now (since bignums do not exist yet), and
we can keep this property for fixnums. We could also keep it for bignums, by
hashing all bignums, though I doubt whether it's worth the expense.
Daniel Colascione
2018-04-22 18:04:38 UTC
Permalink
Post by Paul Eggert
integers are currently documented to use modular arithmetic (
We'll need to change the documentation for modular arithmetic, since
integers will no longer overflow.
Bugs will inevitably arise. One possibility is just dealing with them;
another might be to predicate bignum support on a lexical-binding-like flag.
Post by Paul Eggert
and they are documented to be pure value types, i.e. same-valued integers
are the same objects. Especially the second property is widely used.
This property is used only for fixnums now (since bignums do not exist
yet), and we can keep this property for fixnums. We could also keep it
for bignums, by hashing all bignums, though I doubt whether it's worth
the expense.
That's a good idea, IMHO. It preserves identity without complicating eq
itself.
Clément Pit-Claudel
2018-04-22 18:34:34 UTC
Permalink
integers are currently documented to use modular arithmetic (
We'll need to change the documentation for modular arithmetic, since integers will no longer overflow.
Bugs will inevitably arise. One possibility is just dealing with them; another might be to predicate bignum support on a lexical-binding-like flag.
Or we could make them different types: the usual operators on fixnums would return fixnums, and the usual operators on bignums would return bignums. Mixing both would always return a bignum.

The difficulty is whether we should change existing Emacs function to return bignums.

Clément.
Richard Stallman
2018-04-23 03:39:26 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Philipp Stephani
https://www.gnu.org/software/emacs/manual/html_node/elisp/Integer-Basics.html),
and they are documented to be pure value types, i.e. same-valued integers
are the same object (
https://www.gnu.org/software/emacs/manual/html_node/elisp/Equality-Predicates.html).
Especially the second property is widely used.
This will continue to be true for all the integers that are currently
representable. If we document that, maybe we won't have a lot of
places to fix.
Post by Philipp Stephani
Or we could make them different types: the usual operators on
fixnums would return fixnums, and the usual operators on bignums
would return bignums. Mixing both would always return a bignum.
Having two data types for integers would be a nuisance for many Lisp
programs. It won't simplify anything, and it won't reduce the changes
needed in Lisp programs because of bignums.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Siraphob (Ben) Phipathananunth
2018-04-22 08:00:49 UTC
Permalink
Post by Richard Stallman
To eliminate the current types for small integers would
require rewriting of much of the C code in Emacs.
It would be better to represent small integers as now,
and have a different structure for larger integers.
From what I understand, we would want to use fixnums by default in the
C code, and convert to bignums automatically (in lisp) when the number
exceeds the range of a fixnum, while retaining behavior as before,
using the regular math operations + - * / (and more) to interface this
to Lisp.
Post by Richard Stallman
programs wouldn't care whether an integer is represented via fixnum or
bignum, as it'd be an issue merely of efficiency.
Would it not slow down computation to have to constantly convert
between the two types? (especially if the computation is switching
above/below the fixnum/bignum boundary). In such a case, a fix could
be to convert lisp numbers exceeding fixnum limits to bignums for the
rest of the number's life (until GC). This ensures memory usage is kept
low for fixnum computations.
Paul Eggert
2018-04-22 09:06:11 UTC
Permalink
Post by Siraphob (Ben) Phipathananunth
Would it not slow down computation to have to constantly convert
between the two types?
A bit perhaps, but it shouldn't be that big a deal. Most integer computation
will likely be fixnum only, and the only slowdown there will be integer overflow
checking that we currently aren't doing. With decent hardware and compiler, I'd
guess this would cost us three machine instructions per Lisp arithmetic
operation, including the conditional branch that is typically not taken. Hardly
anybody will notice.
Post by Siraphob (Ben) Phipathananunth
In such a case, a fix could
be to convert lisp numbers exceeding fixnum limits to bignums for the
rest of the number's life (until GC). This ensures memory usage is kept
low for fixnum computations.
Yes, the idea is to use bignums to represent numbers outside of fixnum range,
and to use fixnums to represent numbers inside fixnum range. Bignums require
garbage collection; fixnums do not.
Helmut Eller
2018-04-23 05:19:22 UTC
Permalink
Post by Paul Eggert
Most integer
computation will likely be fixnum only, and the only slowdown there
will be integer overflow checking that we currently aren't doing. With
decent hardware and compiler, I'd guess this would cost us three
machine instructions per Lisp arithmetic operation, including the
conditional branch that is typically not taken. Hardly anybody will
notice.
Out of curiousity: what's the quickest way to detect overflow on
multiplication in ANSI C? E.g.

int64_t x, y, result;
result = x * y;
if (<how to detect overflow?>)

Helmut
Andreas Schwab
2018-04-23 08:39:32 UTC
Permalink
Post by Helmut Eller
Out of curiousity: what's the quickest way to detect overflow on
multiplication in ANSI C? E.g.
int64_t x, y, result;
result = x * y;
if (<how to detect overflow?>)
There is no way to detect overflow after the fact because overflow
invokes undefined behaviour. You need to either check the range
beforehand, or use special builtins offered by the compiler
(eg. __builtin_smul_overflow in GCC).

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Paul Eggert
2018-04-23 14:36:39 UTC
Permalink
Post by Andreas Schwab
There is no way to detect overflow after the fact because overflow
invokes undefined behaviour. You need to either check the range
beforehand, or use special builtins offered by the compiler
(eg. __builtin_smul_overflow in GCC).
All true, and Emacs lib/intprops.h has an INT_MULTIPLY_OVERFLOW macro that
arranges for all that. On the x86-64 with GCC, it costs one additional
instruction (typically a conditional branch that is not taken) to check for
overflow in machine-word integer multiplication. Checking for fixnum overflow
(as opposed to machine-word overflow) requires one more conditional branch after
some quick bit-twiddling.
Helmut Eller
2018-04-23 19:22:26 UTC
Permalink
Post by Paul Eggert
Post by Andreas Schwab
There is no way to detect overflow after the fact because overflow
invokes undefined behaviour. You need to either check the range
beforehand, or use special builtins offered by the compiler
(eg. __builtin_smul_overflow in GCC).
All true, and Emacs lib/intprops.h has an INT_MULTIPLY_OVERFLOW macro
that arranges for all that. On the x86-64 with GCC, it costs one
additional instruction (typically a conditional branch that is not
taken) to check for overflow in machine-word integer
multiplication.
If I read the code in in data.c correctly, than Emacs uses the
INT_MULTIPLY_WRAPV macro which boils down to the __builtin_mul_overflow.
Which indeed produces nice machine code. But the original question was
about ANSI C, which seems to require a division and 3 conditional jumps
for the range check (with gcc 6).
Post by Paul Eggert
Checking for fixnum overflow (as opposed to machine-word overflow)
requires one more conditional branch after some quick bit-twiddling.
Does INT_MULTIPLY_WRAPV macro even perform fixnum overflow tests?

Anyway, I find it curios that the following two expression yield
different values:

(* (* most-positive-fixnum 2) 1.0) => -2.0

(* most-positive-fixnum 2 1.0) => 4.611686018427388e+18

Helmut
Paul Eggert
2018-04-23 20:26:22 UTC
Permalink
Post by Helmut Eller
If I read the code in in data.c correctly, than Emacs uses the
INT_MULTIPLY_WRAPV macro which boils down to the __builtin_mul_overflow.
Which indeed produces nice machine code. But the original question was
about ANSI C, which seems to require a division and 3 conditional jumps
for the range check (with gcc 6).
Yes, in general INT_MULTIPLY_WRAPV does require a division on platforms
that do not support overflow-checking builtins. In practice, though,
this tends to not be an issue, as the major compilers are moving in the
direction of supporting such builtins.
Post by Helmut Eller
Does INT_MULTIPLY_WRAPV macro even perform fixnum overflow tests?
No, it checks only for machine-level overflow. We could come up with a
variant that does both machine-level and fixnum-level checking
simultaneously; I'm not sure it's worth the trouble, though, as most
arithmetic operations are addition and subtraction and for these,
fixnum-level checking suffixes.
Post by Helmut Eller
Anyway, I find it curios that the following two expression yield
(* (* most-positive-fixnum 2) 1.0) => -2.0
(* most-positive-fixnum 2 1.0) => 4.611686018427388e+18
Cool, huh? That's done on purpose; see this Emacs commit:

https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=0ae6bdee0085b05028108325b0a4ce979eadb24e

The idea is to produce a floating-point answer that is more
mathematically-correct, while sticking to Common Lisp-style rules for
contagion and argument processing.
Richard Stallman
2018-04-23 03:36:31 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Siraphob (Ben) Phipathananunth
From what I understand, we would want to use fixnums by default in the
C code, and convert to bignums automatically (in lisp) when the number
exceeds the range of a fixnum, while retaining behavior as before,
using the regular math operations + - * / (and more) to interface this
to Lisp.
Yes, exactly.
Post by Siraphob (Ben) Phipathananunth
Would it not slow down computation to have to constantly convert
between the two types? (especially if the computation is switching
above/below the fixnum/bignum boundary). In such a case, a fix could
be to convert lisp numbers exceeding fixnum limits to bignums for the
rest of the number's life (until GC). This ensures memory usage is kept
low for fixnum computations.
I am not sure that makes sense. Emacs works with numbers, and
computes (or otherwise generates) new numbers, but it never changes an
existing number.

Numbers that are in the range of non-bignum integers MUST be
represented as non-bignum integers so that the existing C code can
work on them without change. Some places ought to be able to handle
larger numbers, and we will need to change them, However, many places
could continue to handle only small numbers.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Helmut Eller
2018-04-22 12:43:56 UTC
Permalink
Post by Paul Eggert
Surely we should just add bignum support to existing functions +, -, etc.
I'm not so sure of that. E.g. (format "%x" -1) => "3fffffffffffffff".
That doesn't make any sense on if -1 is a bignum.

Helmut
Paul Eggert
2018-04-22 17:47:10 UTC
Permalink
Post by Helmut Eller
(format "%x" -1) => "3fffffffffffffff".
Any program that assumes that behavior is already unportable, since the
expression returns "3fffffff" on 32-bit platforms.

The usual approach for this is to format negative numbers with a leading minus
sign, so that (format "%x" -1) returns "-1". This is what Emac Lisp should have
been doing anyway as it is more intuitive and more portable. Obviously there
will be compatibility concerns here, and we'll need to address them.
Richard Stallman
2018-04-23 03:39:28 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
The usual approach for this is to format negative numbers with a leading minus
sign, so that (format "%x" -1) returns "-1". This is what Emac Lisp should have
I think that change is likely to break things. If you WANT a hex
string, you probably want it to be all hex, regardless of value.

I would expect that each use of %x probably has a certain number of bits
in mind. Perhaps %x should take a number of bits and output modularly.
So %24x would output 24 bits' worth. Always unsigned.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-04-23 04:41:02 UTC
Permalink
Post by Richard Stallman
Post by Paul Eggert
The usual approach for this is to format negative numbers with a leading minus
sign, so that (format "%x" -1) returns "-1". This is what Emac Lisp should have
I think that change is likely to break things. If you WANT a hex
string, you probably want it to be all hex, regardless of value.
No doubt people occasionally want C-like formatting (i.e., print a negative
number modulo 2**W where W is the word width), but once we have bignums the
notion of the "word width" becomes dubious, and in practice it's cleaner and
more useful to print hexadecimal integers in the usual mathematical way. That's
the tradition in Common Lisp, in Python, and in every other language I know that
has bignums and has the %x format or something similar.
Post by Richard Stallman
I would expect that each use of %x probably has a certain number of bits
in mind. Perhaps %x should take a number of bits and output modularly.
So %24x would output 24 bits' worth.
We could invent a special syntax along those lines, though it would need to
differ from "%24x" which already means pad with spaces to 24 characters.
However, there's little point to a special syntax since one can easily achieve
the same effect without it. For example, to format the low-order 24 bits of N,
one can already use (format "%x" (logand N #xffffff)) and this will continue to
work even if bignums are introduced.
Richard Stallman
2018-04-24 02:54:23 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
We could invent a special syntax along those lines, though it would need to
differ from "%24x" which already means pad with spaces to 24 characters.
However, there's little point to a special syntax since one can easily achieve
the same effect without it. For example, to format the low-order 24 bits of N,
one can already use (format "%x" (logand N #xffffff)) and this will continue to
The reason to invent a special syntax is to get the commonly desired
result in a simpler way. If I am right that this is what people
want, why make it hard?
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Richard Stallman
2018-04-24 02:54:23 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
No doubt people occasionally want C-like formatting (i.e., print a
negative number modulo 2**W where W is the word width), but once
we have bignums the notion of the "word width" becomes dubious,
In a theoretical sense, yes -- but I doubt that that matters in
practice. In practice, I think, the cases where one wants hexadecimal
are cases where there is a word width.
Post by Paul Eggert
and in practice it's cleaner and more useful to print hexadecimal
integers in the usual mathematical way.
Would you tell me about some of the cases where you wanted hexadecimal
output and you wanted negative numbers to use a minus sign?

I ask because I am skeptical that such cases exist.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-04-24 04:35:31 UTC
Permalink
Post by Richard Stallman
In practice, I think, the cases where one wants hexadecimal
are cases where there is a word width.
Although that is true in traditional usage where word width is assumed, it's not
true for most applications using bignums, the original context of this thread.
In such applications, it's routine to have hexadecimal numbers that are wider
than 32- or 64-bit words. And these numbers (when printed without a sign) are
routinely treated as nonnegative, not as negative. For an example of this sort
of application, see the OpenSSL C code that deals with bignums: e.g., BN_bn2hex
generates a leading "-" when converting a negative number to a hexadecimal string.

Even in the standard C library (which lacks bignums), the %x printf format is
supposed to be used only with unsigned integers. Nowadays GCC even optionally
warns about using %x on signed integers.
Post by Richard Stallman
The reason to invent a special syntax is to get the commonly desired
result in a simpler way.
If it turns out to be useful to have a shorthand printf format for formatting
the least N bits of an argument, we can add it as needed. I have my doubts,
though, as other bignum formatters seem to get along fine without such a feature.
Helmut Eller
2018-04-24 05:45:30 UTC
Permalink
Post by Paul Eggert
Even in the standard C library (which lacks bignums), the %x printf
format is supposed to be used only with unsigned integers. Nowadays
GCC even optionally warns about using %x on signed integers.
Maybe %x should then print the (shortest) two's complement
representation for bignums. In Common Lisp:

(defun 2comp (x)
(format nil "~x" (logand x (1- (ash 1 (+ 1 (integer-length x)))))))

(2comp -1) => "1"
(2comp -2) => "2"
(2comp -3) => "5"
(2comp -15) => "11"
(2comp -16) => "10"

In practice, only integers smaller than most-negative-fixnum would use
this rule.

Helmut
Richard Stallman
2018-04-25 01:05:55 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
In such applications, it's routine to have hexadecimal numbers that are wider
than 32- or 64-bit words.
What is the motive for using hexadecimal in a case like that?
Post by Paul Eggert
Even in the standard C library (which lacks bignums), the %x printf format is
supposed to be used only with unsigned integers.
I think that's a misleading statement of what it does in C. The rule
is to use it with an unsigned _type_. The type, not the value, is
supposed to be unsigned.

In practice, what this means is that the hex output is done _treating
the number as unsigned_. So -1, which is 0xffffffff as a 32-bit int,
will print as ffffffff, not as -1.

I contend that we want the same behavior in Emacs Lisp, too.
Post by Paul Eggert
If it turns out to be useful to have a shorthand printf format for formatting
the least N bits of an argument, we can add it as needed. I have my doubts,
though, as other bignum formatters seem to get along fine without such a feature.
What conclusions we can draw from them, about Emacs Lisp, depends on
the number of bits in a non-bignum integer for each of them. In
Emacs, 0xffffffff will be a bignum. Is it a bignum in those other
systems? If they can treat 32-bit ints as short, many programs that
want to use hex will never operate on a bignum.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-04-25 01:19:29 UTC
Permalink
Post by Richard Stallman
Post by Paul Eggert
In such applications, it's routine to have hexadecimal numbers that are wider
than 32- or 64-bit words.
What is the motive for using hexadecimal in a case like that?
It varies. One motivation is a desire to communicate a number via text
more-efficiently than base 10 would provide.
Post by Richard Stallman
Post by Paul Eggert
Even in the standard C library (which lacks bignums), the %x printf format is
supposed to be used only with unsigned integers.
I think that's a misleading statement of what it does in C. The rule
is to use it with an unsigned _type_. The type, not the value, is
supposed to be unsigned.
In C, if the type is unsigned then the corresponding value is
nonnegative. That is, in C there is no such thing as a negative value
with an unsigned type. The %x format is supposed to be used only with
unsigned types, i.e., only with nonnegative values.
Post by Richard Stallman
I contend that we want the same behavior in Emacs Lisp, too.
I certainly wouldn't want the behavior you suggest. Among other things,
Emacs Lisp does not have unsigned types, and I'd rather not introduce
such a concept into the language as it'd be needless complexity and an
unnecessary divergence from other Lisps.
Post by Richard Stallman
In Emacs, 0xffffffff will be a bignum.
Whether 0xffffffff is a bignum will depend on the platform. On my 64-bit
Emacs, 0xffffffff already is supported as a fixnum, and that wouldn't
change if bignums were introduced to Emacs.
Post by Richard Stallman
Is it a bignum in those other systems?
It depends on the system, I expect (just as it would in Emacs).
Richard Stallman
2018-04-25 22:40:38 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
In C, if the type is unsigned then the corresponding value is
nonnegative. That is, in C there is no such thing as a negative value
with an unsigned type.
I know that (of course). It looks like we are miscommunicating.
Post by Paul Eggert
I certainly wouldn't want the behavior you suggest. Among other things,
Emacs Lisp does not have unsigned types, and I'd rather not introduce
such a concept into the language
I think you have misunderstood what I suggest. I am talking about changing
the behavior of %x, nothing else.

I responded to a statement about what happens in C. Did you misread that
as a proposal to change Emacs?
Post by Paul Eggert
Post by Richard Stallman
In Emacs, 0xffffffff will be a bignum.
Whether 0xffffffff is a bignum will depend on the platform. On my 64-bit
Emacs, 0xffffffff already is supported as a fixnum, and that wouldn't
change if bignums were introduced to Emacs.
You're right about 64-bit systems. I often forget they exist, since
I never use them.

My argument is valid nonetheless, because that quantity is a bignum on
_some_ systems.

If we were to support _only_ 64-but systems, this argument would cease
to be valid.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-04-25 23:29:56 UTC
Permalink
If we were to support_only_ 64-bit systems, this argument would cease
to be valid.
That is easy to arrange; just configure --with-wide-int. This causes
Emacs to use 64-bit words on all platforms (or wider, at least in
theory). If we made --with-wide-int the default (which I'm more and more
thinking would be a good thing, for this and other reasons), then
#xffffffff would work and would format the same on all platforms by
default, and this would happen regardless of whether we also add bignums
to Emacs.

However, I don't see why assuming 64 bits would mean that the argument
would cease to be valid. Even with that assumption, we would continue
have the same problem with numbers like #xffffffffffffffff that exceed
Emacs fixnum range when words are 64 bits.
I responded to a statement about what happens in C. Did you misread that
as a proposal to change Emacs?
Yes, because after your response (which talked about what happens in C),
you wrote "I contend that we want the same behavior in Emacs Lisp, too."
Sorry if I misunderstood you.

I think part of the problem here, is that once we have bignums then
there will no problem representing unsigned values that are too large to
fit into (say) 30-bit signed integers, so there will be no problem in
having %x treat integers like Common Lisp and Python do: programs that
want to do I/O of numbers as hexadecimal strings will be able to read
and print them with %x and they'll all work just fine even if the
numbers contain 8 or 16 or more hexadecimal digits. The problem arises
in Emacs Lisp only because it is limited to fixnums now.
Richard Stallman
2018-04-30 03:07:04 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
If we were to support_only_ 64-bit systems, this argument would cease
to be valid.
That is easy to arrange; just configure --with-wide-int. This causes
Emacs to use 64-bit words on all platforms
That may be correct, but it doesn't respond to the point at hand.
The point at hand is that Emacs does support 32-bit Lisp objecs in
some cases. and that a 32-bit number _in 32-bit builds_ will not
be a fixnum.

If you're suggesting that I personally change to a 64-bit build, that
doesn't respond to the issue at hand. It's not about what happens for
me in particular.

If you're suggesting to eliminate support for 32-bit builds,
I agree that that would eliminate the issue. But I think that change
would have substantial disadvantages.
Post by Paul Eggert
However, I don't see why assuming 64 bits would mean that the argument
would cease to be valid. Even with that assumption, we would continue
have the same problem with numbers like #xffffffffffffffff that exceed
Emacs fixnum range when words are 64 bits.
I expect that %x is used only for values meant to interact
with the operating system in certain ways, and that _most of_ those
values will never be more than 32 bits.

I could be mistaken in this, but it is a factual question.
Where else do people use %x?
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Michael Welsh Duggan
2018-04-30 05:00:59 UTC
Permalink
Post by Richard Stallman
Post by Paul Eggert
However, I don't see why assuming 64 bits would mean that the argument
would cease to be valid. Even with that assumption, we would continue
have the same problem with numbers like #xffffffffffffffff that exceed
Emacs fixnum range when words are 64 bits.
I expect that %x is used only for values meant to interact
with the operating system in certain ways, and that _most of_ those
values will never be more than 32 bits.
I could be mistaken in this, but it is a factual question.
Where else do people use %x?
Well, although I admit I most often use hexadecimal numbers to represent
machine addresses (including on 64-bit addresses, of course), I also use
them to represent memory offsets. I only bring this up because this is
a case when a negative hexadecimal number (-#x1000 or -0x1000) is a more
useful representation than two's complement.

I admit that this may be considered a niche use, however, and I am
seldom working with offsets of sizes greater than 32 bits in size. I
will further admit that I have never needed this in Emacs, but have run
into this problem with writing C and C++ code, since I have been doing a
lot of work lately on static binary analysis.
--
Michael Welsh Duggan
(***@md5i.com)
Stefan Monnier
2018-04-30 12:34:42 UTC
Permalink
Post by Michael Welsh Duggan
a case when a negative hexadecimal number (-#x1000 or -0x1000) is
^^^^^^^
#x-1000

-- Stefan
Richard Stallman
2018-05-01 03:01:19 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Michael Welsh Duggan
Well, although I admit I most often use hexadecimal numbers to represent
machine addresses (including on 64-bit addresses, of course), I also use
them to represent memory offsets.
I think that implies we need a way to specify for %x to use the proper
width for an address on the machine in use.
Post by Michael Welsh Duggan
I only bring this up because this is
a case when a negative hexadecimal number (-#x1000 or -0x1000) is a more
useful representation than two's complement.
If it's useful in Emacs, we should support that. But we may as well take
stock of the need before deciding whether to implement this.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-04-30 07:04:12 UTC
Permalink
Post by Richard Stallman
I expect that %x is used only for values meant to interact
with the operating system in certain ways
Hexadecimal is used for many other purposes. For example, the shell command 'git
log' at the top level of Emacs currently outputs a first line containing the
hexadecimal string bca6c4348077c8c0b368503b16378867b6d49659 which represents an
integer containing 160 bits, the integer width of the SHA-1 checksums used by
Git. Although Emacs cannot now process such a number directly, with bignums it
will be able to and %x is the natural way to format such numbers.
Richard Stallman
2018-05-01 03:01:26 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
log' at the top level of Emacs currently outputs a first line containing the
hexadecimal string bca6c4348077c8c0b368503b16378867b6d49659 which represents an
integer containing 160 bits, the integer width of the SHA-1 checksums used by
Git. Although Emacs cannot now process such a number directly, with bignums it
will be able to and %x is the natural way to format such numbers.
If the first 20 bits were zero, would you want the output to start
with 00000, or would you want it to be shorter?
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-05-01 21:45:38 UTC
Permalink
Post by Richard Stallman
If the first 20 bits were zero, would you want the output to start
with 00000, or would you want it to be shorter?
For Git checksums I'd want leading zeros without other decorations like
#x, since that's how Git itself prints those numbers and I'd prefer to
be compatible.
Richard Stallman
2018-05-03 03:34:10 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
For Git checksums I'd want leading zeros without other decorations like
#x, since that's how Git itself prints those numbers and I'd prefer to
be compatible.
That means the feature would need to specify the number of hex digits
desired.

Is that the case for all use of hex output?
Is there any counterexample?
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-05-03 05:53:11 UTC
Permalink
Post by Richard Stallman
That means the feature would need to specify the number of hex digits
desired.
We already have such a feature. For example, (format "%09x" 257) returns
"000000101", which is 257 hexadecimal, expressed using leading zeros as needed
so that at least 9 digits are used. Once Emacs supports bignums, the same
feature can print 40-hex-digit SHA-1 checksums by using (format "%040x" n).
Helmut Eller
2018-05-03 06:26:05 UTC
Permalink
Post by Paul Eggert
Post by Richard Stallman
That means the feature would need to specify the number of hex digits
desired.
We already have such a feature. For example, (format "%09x" 257) returns
"000000101", which is 257 hexadecimal, expressed using leading zeros
as needed so that at least 9 digits are used. Once Emacs supports
bignums, the same feature can print 40-hex-digit SHA-1 checksums by
using (format "%040x" n).
I think the bitwise operations and the resulting negative numbers are
the problematic part:

E.g.

(format "%016x" (lognot 257)) => "3ffffffffffffefe"

but the 16-hex-digit string should be "fffffffffffffefe".

Helmut
Eli Zaretskii
2018-05-03 17:49:38 UTC
Permalink
Date: Thu, 03 May 2018 08:26:05 +0200
I think the bitwise operations and the resulting negative numbers are
E.g.
(format "%016x" (lognot 257)) => "3ffffffffffffefe"
but the 16-hex-digit string should be "fffffffffffffefe".
But a 16-hex-digit fffffffffffffefe is not a fixnum, right?
Paul Eggert
2018-05-03 18:26:31 UTC
Permalink
Post by Eli Zaretskii
Post by Helmut Eller
(format "%016x" (lognot 257)) => "3ffffffffffffefe"
but the 16-hex-digit string should be "fffffffffffffefe".
But a 16-hex-digit fffffffffffffefe is not a fixnum, right?
Exactly right. If we have bignums, this will be straightforward; it'll
work like Common Lisp, Python, etc. and it'll be obvious what to do. The
problems mentioned in this thread mainly come from the fact that Emacs
does not have bignums now and that some code may depend on fixnum
overflow wrapping around.

Here's another way to think about it. (format "%016x" (lognot 257)) is
already nonportable, as it yields "000000003ffffefe" on 32-bit platforms
and "3ffffffffffffefe" on 64-bit platforms. Once we add bignums, the
same expression should yield "-000000000000102" on all platforms. Code
intended to be portable between old and new regimes already needs to
worry about nonportability in this area, and it shouldn't be hard to
adjust (if it needs adjusting at all). Code intended to run only under
the new regime will be able to safely assume the expression yields
"-000000000000102" on all platforms, regardless of whether the
underlying platform is 32- or 64- or some other number of bits, and in
the long run this will reduce the number of portability issues in Elisp
code.
Richard Stallman
2018-05-04 04:26:15 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I don't think people want (format "%x" (lognot 257)) to output "-102".
Not on any platform. It isn't what code expects, and it isn't useful.

Perhaps, with bignums, lognot should take a width operand, measured in bits.
So (lognot 257 32) would produce 0xfffffefe (a positive number)
and (lognot 257 64) would produce 0xfffffffffffffefe (a positive number).

Another idea: %Nx with a negative number should output its 2's
complement representation in 4N bits.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Ken Raeburn
2018-05-05 05:03:30 UTC
Permalink
Post by Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
I don't think people want (format "%x" (lognot 257)) to output "-102".
Not on any platform. It isn't what code expects, and it isn't useful.
Perhaps, with bignums, lognot should take a width operand, measured in bits.
So (lognot 257 32) would produce 0xfffffefe (a positive number)
and (lognot 257 64) would produce 0xfffffffffffffefe (a positive number).
Another idea: %Nx with a negative number should output its 2's
complement representation in 4N bits.
That's already taken: A leading “-“ means to left-justify the output if the specified width is wider than required.

ELISP> (format "%-16x" 300)
"12c "
ELISP> (format "%16x" 300)
" 12c"
ELISP>
Richard Stallman
2018-05-06 03:12:27 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Ken Raeburn
Post by Richard Stallman
Another idea: %Nx with a negative number should output its 2's
complement representation in 4N bits.
That's already taken: A leading “-“ means to left-justify the output if the specified width is wider than required.
We are miscommunicating. I'm talking about what %Nx should do when
outputting an argument that is negative, as in (format "%16x" -63).

Sorry that wasn't clear.

You could try p/x -63 in GDB to see what I mean.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Ken Raeburn
2018-05-07 17:24:58 UTC
Permalink
Post by Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Ken Raeburn
Post by Richard Stallman
Another idea: %Nx with a negative number should output its 2's
complement representation in 4N bits.
That's already taken: A leading “-“ means to left-justify the output if the specified width is wider than required.
We are miscommunicating. I'm talking about what %Nx should do when
outputting an argument that is negative, as in (format "%16x" -63).
Sorry that wasn't clear.
You could try p/x -63 in GDB to see what I mean.
Oh, I see. Yes, I misunderstood. Apologies for my confusion.

So, in this case, would it truncate the output or use a wider field, if the value couldn’t “properly” be shown in the specified size?
E.g., -5000 would display in hex as …ffec78; with a format “%2x” should it return “78” or should it get enough digits to show the minimum number of bits needed to express the real value, much like formatting with “%2d” would still give you “-5000” instead of chopping off relevant digits?

Maybe we can supply both bit count and minimum field width in a format string… “%32!8x” to display the lowest 32 bits in 8 columns but with leading spaces rather than zeros? Or swap the positions of the numbers. And maybe one could be optional; “%32!x” could mean either “%32!8x” (eight columns, leading spaces) or “%32!1x” (minimum one column, i.e., no leading spaces or zeros added).

It would be logical to consider whether the same format extensions would be useful with %o or %d (or others?) being used to format bignums.

Ken
Andreas Schwab
2018-05-07 18:40:58 UTC
Permalink
Post by Richard Stallman
Another idea: %Nx with a negative number should output its 2's
complement representation in 4N bits.
That's what the precision is for.

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Helmut Eller
2018-05-03 18:51:43 UTC
Permalink
Post by Eli Zaretskii
Post by Helmut Eller
(format "%016x" (lognot 257)) => "3ffffffffffffefe"
but the 16-hex-digit string should be "fffffffffffffefe".
But a 16-hex-digit fffffffffffffefe is not a fixnum, right?
Well, (lognot 257) is fixnum. And if we explicity specify the length,
then the 16-hex-digit string "fffffffffffffefe" represents the same
fixnum as the 4-hex-digit string "fefe".

Helmut
Helmut Eller
2018-05-03 20:30:43 UTC
Permalink
On Thu, May 03 2018, Eli Zaretskii wrote:

[...]
Maybe we are miscommunicating. fffffffffffffefe cannot be a fixnum
because it invades the bits reserved for the type tag. Emacs fixnums
are limited to smaller values, as I'm sure you know.
If we interpret "fffffffffffffefe" as the 64 bit two's complement
representation of an integer then it corresponds to the integer -258.
If we interpret "fefe" as the 16 bit two's complement representation of
an integer then we also get the integer -258. And -258 is a fixnum.
Still convinced that "fffffffffffffefe cannot be a fixnum"?.

Helmut
Paul Eggert
2018-05-03 21:48:56 UTC
Permalink
Post by Helmut Eller
Still convinced that "fffffffffffffefe cannot be a fixnum"?.
No, Eli's right. Emacs has never interpreted fffffffffffffefe as a
fixnum on any practical platform, as far as I know. In Elisp code on
practical platforms, #xfffffffffffffefe is a fixnum neither in master
(where it is an overflow error) nor in Emacs 25 (where it is typically
converted silently to 1.8446744073709552e+19, with some loss of
information). If Emacs gets bignums, #xfffffffffffffefe will be lossless
and will have the same meaning as 18446744073709551358, which is the
standard interpretation of this hexadecimal number in C and in most
other languages.
Richard Stallman
2018-05-04 04:22:51 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
We already have such a feature. For example, (format "%09x" 257) returns
"000000101", which is 257 hexadecimal, expressed using leading zeros as needed
so that at least 9 digits are used. Once Emacs supports bignums, the same
feature can print 40-hex-digit SHA-1 checksums by using (format "%040x" n).
Perhaps this way everything would work with all-positive hex numbers.

However, I think we also want a feature to signal an error if the
number doesn't fall in the range that the %x construct can represent.
If the Git hash code is 40 hex digits, any attempt to use as a Git hash code
a number that won't fit in 40 hex digits implies that there has already been
an error.

I suggest %!040x for this. The ! would mean, "Signal an error if the number
doesn't fit in the specified number of digits." It could be implemented
for all %-specs, but if that is a pain in the neck, it could be implemented
only for %x.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Stefan Monnier
2018-04-23 03:03:18 UTC
Permalink
Post by Helmut Eller
Post by Paul Eggert
Surely we should just add bignum support to existing functions +, -, etc.
I'm not so sure of that. E.g. (format "%x" -1) => "3fffffffffffffff".
That doesn't make any sense on if -1 is a bignum.
Note that (format "%x" -1) does not necessarily use functions
in "+, -, etc..."

There are several steps to adding GMP support. Some of those steps
might involve non-trivial decisions. But the first few steps should be
straightforward enough:
A- add a new "bignum" type
B- add new operations on them (this might be enough to start using them in Calc)
C- add support for bignums to some of the pre-existing functions (e.g. +)
D- add support for bignums to more of the pre-existing functions
E- add support for bignums "everywhere" where numbers are usually allowed.

Anything before D/E should be fairly straightforward.
I consider "support for read&print" to fall somewhere in D or maybe E.


Stefan
Markus Triska
2018-04-21 16:46:07 UTC
Permalink
Post by Siraphob (Ben) Phipathananunth
The benefits of linking GNU GMP are clear. It would make it easier to
extend the math capabilities of Emacs, while making it faster and less
error-prone. However, the downsides, if any exist, should be discussed
as well, to gauge whether pursuing such a task would be fruitful.
Using GMP has a significant downside if you run out of memory: There’s
currently no defined way for the allocation functions to recover from an
error such as out of memory, they must *terminate* program execution.

Please see the following page for details:

https://gmplib.org/manual/Custom-Allocation.html

Thus, situtations where you can currently throw an Emacs error that can
be handled in user code would instead terminate the Emacs process. This
can for example arise from malicious input that involves very large
integers as results (7^7^7^7 etc.), and of course also elsewhere.

Also, it may become hard to stop long GMP calculations, whereas you can
easily stop computations that are written in Elisp. Thus, long GMP
calculations may lead to denial of editing attacks.

All the best,
Markus
Eli Zaretskii
2018-04-21 17:09:55 UTC
Permalink
Date: Sat, 21 Apr 2018 18:46:07 +0200
Using GMP has a significant downside if you run out of memory: There’s
currently no defined way for the allocation functions to recover from an
error such as out of memory, they must *terminate* program execution.
https://gmplib.org/manual/Custom-Allocation.html
Thus, situtations where you can currently throw an Emacs error that can
be handled in user code would instead terminate the Emacs process. This
can for example arise from malicious input that involves very large
integers as results (7^7^7^7 etc.), and of course also elsewhere.
Also, it may become hard to stop long GMP calculations, whereas you can
easily stop computations that are written in Elisp. Thus, long GMP
calculations may lead to denial of editing attacks.
How are those dangers different from using any other external
library. Like the JSON library, for excample, or libxml2?
Markus Triska
2018-04-21 17:27:13 UTC
Permalink
Post by Eli Zaretskii
How are those dangers different from using any other external
library. Like the JSON library, for excample, or libxml2?
Please note that in particular the first issue I mentioned is quite
specific to GMP: It could be solved by providing a different API, or by
generalizing the existing API to let applications safely handle such
situations. Some of the libraries you mention may support this.

The second issue I mentioned is also quite specific to GMP, since short
arithmetic expressions may cause long running times for calculations. Of
course, this issue may in principle also arise in other libraries.

All the best,
Markus
Paul Eggert
2018-04-21 18:37:43 UTC
Permalink
Post by Markus Triska
Using GMP has a significant downside if you run out of memory: There’s
currently no defined way for the allocation functions to recover from an
error such as out of memory, they must *terminate* program execution.
This should not be a problem if we use GMP's mpn_* functions. They never
allocate memory; it is the user's responsibility to allocate it. This should fit
better with how Emacs does things internally.
Post by Markus Triska
Also, it may become hard to stop long GMP calculations, whereas you can
easily stop computations that are written in Elisp. Thus, long GMP
calculations may lead to denial of editing attacks.
Yes, this could be a problem. Perhaps we'll need to extend GMP to fix it. On the
other hand, no doubt there are other, similar problems in GNU Emacs (just look
for calls to memcpy :-), and perhaps this problem could just be added to the list.
Richard Stallman
2018-04-21 22:42:52 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
Yes, this could be a problem. Perhaps we'll need to extend GMP to fix it.
The only things that might interrupt Lisp code are signals.
If C-g is detected by a signal, it can interrupt almost anything in C,
including computation in a GMP function,
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Eli Zaretskii
2018-04-22 02:43:37 UTC
Permalink
Date: Sat, 21 Apr 2018 18:42:52 -0400
The only things that might interrupt Lisp code are signals.
If C-g is detected by a signal, it can interrupt almost anything in C,
including computation in a GMP function,
We've changed how signals are processed in Emacs several years ago.
Nowadays, a signal handler just sets a flag, and the Lisp interpreter
tests that flag "when appropriate" (normally, as part of maybe_quit).

This change was done to avoid non-trivial processing inside signal
handlers.
Richard Stallman
2018-04-23 03:34:58 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Eli Zaretskii
We've changed how signals are processed in Emacs several years ago.
Nowadays, a signal handler just sets a flag, and the Lisp interpreter
tests that flag "when appropriate" (normally, as part of maybe_quit).
This change was done to avoid non-trivial processing inside signal
handlers.
Quiting always worked by setting a flag, tested by the QUIT macro.
However, as I implemented it, some constructs used to specify to quit
immediately straight from the signal handler.

I see that this feature has been deleted. That is going to cause a
bad results. There are quite a few places in the C code of Emacs
where execution can get stuck; that's why I added immediate quitting.
The problems I fixed that way have all come back.

What was the reason for this change?

I think we need to revert it, and fix whatever problem
in another way.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-04-23 04:21:02 UTC
Permalink
Post by Richard Stallman
There are quite a few places in the C code of Emacs
where execution can get stuck; that's why I added immediate quitting.
The problems I fixed that way have all come back.
I hope they haven't come back. We tried to be methodical about preventing Emacs
from getting stuck.
Post by Richard Stallman
What was the reason for this change?
Allowing immediate quitting from arbitrary points in execution causes problems
when Emacs internals assume that actions are done in a certain order and are not
just partly done when interrupted by a quit. In the old days we could assume
that execution order at the machine level was the same order as in the C program
source code, which meant that as long as the C source code did things in the
right order Emacs could survive immediate quitting. That assumption is no longer
true. To do things safely now, we need to either (1) add critical sections to
Emacs where appropriate, or (2) test the quit flag at appropriate intervals. In
practice (2) is easier to implement and to maintain.
Stefan Monnier
2018-04-23 13:13:58 UTC
Permalink
Post by Richard Stallman
There are quite a few places in the C code of Emacs
where execution can get stuck; that's why I added immediate quitting.
The problems I fixed that way have all come back.
[...]
Post by Richard Stallman
What was the reason for this change?
The main reason was that you can't do it reliably in X11 because to
detect a C-g requires non-trivial processing which can't be performed
reliably from a signal handler.


Stefan
Richard Stallman
2018-04-24 02:54:20 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Post by Paul Eggert
Allowing immediate quitting from arbitrary points in execution causes problems
Immediate quitting was never allowed from arbitrary points in execution.
Only in places where immediate_quit was nonzero.
Those were generally code whose side effects would not matter
unless they were allowed to finish.
Post by Paul Eggert
In the old days we could assume that execution order at the
machine level was the same order as in the C program source code,
which meant that as long as the C source code did things in the
right order Emacs could survive immediate quitting. That
assumption is no longer true.
That problem exists in theory. Did the problem actually occur?
Post by Paul Eggert
To do things safely now, we need to either (1) add critical
sections to Emacs where appropriate, or (2) test the quit flag at
appropriate intervals
... in each and every place where immediate_quit was formerly used.

In practice (2) is easier to implement and
Post by Paul Eggert
to maintain.
If (2) means to check for quitting as needed in each and every place
code where immediate_quit was formerly used, that would thoroughly do
the job, but did we handle all of those places?
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Paul Eggert
2018-04-24 04:34:58 UTC
Permalink
Post by Richard Stallman
Post by Paul Eggert
In the old days we could assume that execution order at the
machine level was the same order as in the C program source code,
which meant that as long as the C source code did things in the
right order Emacs could survive immediate quitting. That
assumption is no longer true.
That problem exists in theory. Did the problem actually occur?
As Eli wrote, we did make the change in response to a bug report. I do not
recall all the details offhand. The theory is pretty clear, though.
Post by Richard Stallman
If (2) means to check for quitting as needed in each and every place
code where immediate_quit was formerly used, that would thoroughly do
the job, but did we handle all of those places?
We attempted to check for quitting at every place where a check is needed to
avoid unbounded or too-long computation, and I recall using the immediate_quits
in the old code as a cue for where to check for quitting in the new code. We
tried to be conservative about this, and as I recall we put in some checks even
when we weren't sure they were needed. Perhaps we missed some places, but if so
we can fix them as problems arise (and in this sense the new method is similar
to the old one).
Richard Stallman
2018-04-25 01:05:54 UTC
Permalink
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

Thanks for explaining.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.
Eli Zaretskii
2018-04-23 15:18:59 UTC
Permalink
Date: Sun, 22 Apr 2018 23:34:58 -0400
Quiting always worked by setting a flag, tested by the QUIT macro.
However, as I implemented it, some constructs used to specify to quit
immediately straight from the signal handler.
This still happens, but much less frequently: only on text-mode
frames, and only when Emacs waits for input.
I see that this feature has been deleted. That is going to cause a
bad results. There are quite a few places in the C code of Emacs
where execution can get stuck; that's why I added immediate quitting.
The problems I fixed that way have all come back.
What was the reason for this change?
The main reasons were described and discussed in bug#12471, in the
context of making signal handling in Emacs more robust. I reproduce
that description below.

In addition, the asynchronous input code caused problems and
limitations, the most (in)famous being that we needed to refrain from
calling malloc in places that could be potentially run from the signal
handler.

Also note this discussion about actually making synchronous input the
default one:

http://lists.gnu.org/archive/html/emacs-devel/2008-03/msg00410.html

And here's your agreement to making that the default:

http://lists.gnu.org/archive/html/emacs-devel/2008-03/msg01268.html

----------------------------------------------------------------------
* Signal handlers can interrupt each other, leading to races.

* Signals can be mishandled if they arrive right in the middle of
code that is keeping track of whether signals have arrived.

* Some variables are modified by signal handlers but are not
declared 'volatile', which means accesses to them could be
incorrectly optimized.

* When debugging, the debugging code can get into an infinite
signal-handling loop if there's a bug in the fatal error handler.

* There are some bugs involving running out of memory in the middle
of a vfork, in which signal handlers aren't restored correctly and
Emacs is likely to misbehave.

* Signals are always redirected to the main thread, resulting in
incorrect backtraces when, for example, a subsidiary thread has
a segmentation violation. Thread-specific signals like SIGSEGV
should have thread-specific backtraces.

* When in batch mode, Emacs doesn't ignore SIGINT if the invoker has
purposely ignored SIGINT. Similarly for SIGTERM. Emacs gets
SIGHUP right, but the other two signals should be consistent
with the usual behavior for batch programs.

* Emacs isn't consistent about what tests it uses to decide whether
it is in batch mode, leading to glitches when the tests disagree.

* Emacs catches SIGPIPE, but it's better to ignore it, as this avoids
races.

* Emacs catches SIGFPE, but on IEEE hosts catching SIGFPE isn't
needed and can mask bugs; it's better to catch SIGFPE only on (the
now quite rare) non-IEEE hosts.
Emanuele Santoro
2018-04-24 18:56:04 UTC
Permalink
Post by Siraphob (Ben) Phipathananunth
What is the consensus on linking the GNU GMP library to Emacs so that
packages such as Emacs Calc (and others) could benefit from using
native types (i.e. "mpz_t") rather than reinventing the wheel?
My two cents: this should be added as optional feature and its presence,
if wanted/needed, should be enabled at compilation time.

Maybe in a few years we could see if it has been widely adopted and if
so make GNU GMP usage the default setting (and/or discard the old
implementation). This should allow a smoother transition.


Hope this helps,
--
Emanuele Santoro
Glenn Morris
2018-04-26 15:52:00 UTC
Permalink
I don't know how useful it is today, but there is existing work on
this in other-branches/gerd_big.
Loading...