I can't find any discussion of this question in this NG.
I'd like to implement some variable precision integer arithmetic in C,
and do it efficiently. A problem arises with the divide/remainder
operations. I'm using gcc on x86, but a lot of what I say applies to
other architectures.
First, I assume the div and ldiv functions are more efficient than
using / and % (obviously it depends on the compiler and level of
optimization, but it's one operation instead of two).
However, div and ldiv require the length of the quotient, divisor,
dividend and remainder to all be the same. Now all the machines I've
ever worked with at the machine language level (quite a few of them)
implement integer division with a dividend that is twice as long as the
divisor, quotient and remainder. Moreover, this is what naturally
arises in multiple precision arithmetic, you get dividends twice as
long as everything else. So suppose I'm using 16bit divisors,
quotients and remainders, but 32bit dividends. If I were writing in
assembly language, I would use the instructions that divide 16 bits
into 32.
But when I write in C, I'm forced to use the ldiv function, since one
of the operands is 32bits. (Let's say int=16bit and long=32bit, as
on my machine.) Then the compiler has to implement this ldiv function
by using the instruction that divides 32bits into 64bits, since the
divisor is now 32bits. In other words, the compiled code is forced to
use a divide instruction with operands twice as large as needed, due to
the design of the div and ldiv functions.
Anybody know a way around this (apart from inserting assembly code in
my C program)?
Thanks in advance... 29 5405 ne*****@wigner.berkeley.edu wrote On 01/13/06 17:47,: [...] First, I assume the div and ldiv functions are more efficient than using / and % (obviously it depends on the compiler and level of optimization, but it's one operation instead of two). [...] But when I write in C, I'm forced to use the ldiv function, since one of the operands is 32bits. (Let's say int=16bit and long=32bit, as on my machine.) Then the compiler has to implement this ldiv function by using the instruction that divides 32bits into 64bits, since the divisor is now 32bits. In other words, the compiled code is forced to use a divide instruction with operands twice as large as needed, due to the design of the div and ldiv functions.
The leap from "I assume" to "I'm forced" seems a
risky one. What have your measurements shown about
the efficiencies of /, %, both / and %, and ldiv?
You *have* made measurements, haven't you?
HAVEN'T YOU????
 Er*********@sun.com
Use inline assembly for the system you are on. Use macros to determine
which routine depending on your system. Burst into flames. ne*****@wigner.berkeley.edu wrote: I can't find any discussion of this question in this NG.
I'd like to implement some variable precision integer arithmetic in C, and do it efficiently. A problem arises with the divide/remainder operations. I'm using gcc on x86, but a lot of what I say applies to other architectures.
First, I assume the div and ldiv functions are more efficient than using / and % (obviously it depends on the compiler and level of optimization, but it's one operation instead of two).
However, div and ldiv require the length of the quotient, divisor, dividend and remainder to all be the same. Now all the machines I've ever worked with at the machine language level (quite a few of them) implement integer division with a dividend that is twice as long as the divisor, quotient and remainder. Moreover, this is what naturally arises in multiple precision arithmetic, you get dividends twice as long as everything else. So suppose I'm using 16bit divisors, quotients and remainders, but 32bit dividends. If I were writing in assembly language, I would use the instructions that divide 16 bits into 32.
But when I write in C, I'm forced to use the ldiv function, since one of the operands is 32bits. (Let's say int=16bit and long=32bit, as on my machine.) Then the compiler has to implement this ldiv function by using the instruction that divides 32bits into 64bits, since the divisor is now 32bits. In other words, the compiled code is forced to use a divide instruction with operands twice as large as needed, due to the design of the div and ldiv functions.
Anybody know a way around this (apart from inserting assembly code in my C program)?
Thanks in advance...
On 20060113, ne*****@wigner.berkeley.edu <ne*****@wigner.berkeley.edu> wrote: I can't find any discussion of this question in this NG.
I'd like to implement some variable precision integer arithmetic in C, and do it efficiently. A problem arises with the divide/remainder operations. I'm using gcc on x86, but a lot of what I say applies to other architectures.
First, I assume the div and ldiv functions are more efficient than using / and % (obviously it depends on the compiler and level of optimization, but it's one operation instead of two).
Not really  it requires a function call and use of a structure, and any
compiler should optimize use of / and % with the same operands anyway.
But when I write in C, I'm forced to use the ldiv function, since one of the operands is 32bits. (Let's say int=16bit and long=32bit, as on my machine.) Then the compiler has to implement this ldiv function by using the instruction that divides 32bits into 64bits, since the divisor is now 32bits. In other words, the compiled code is forced to use a divide instruction with operands twice as large as needed, due to the design of the div and ldiv functions.
i'm pretty sure that there are instructions on x86 take 32 bits for both
operands.
Anybody know a way around this (apart from inserting assembly code in my C program)?
just use / and %.
You are right. Suppose I have declarations
unsigned int quot,rem,dvsr;
unsigned long dvnd;
and then I say
quot=dvnd/dvsr;
rem =dvnd%dvsr;
If the compiler is smart enough to generate a single divide instruction
(that's all that's required on the x86 hardware), then I have what I
want.
Maybe I will try to figure out how to list assembly output from gcc to
see what instructions it actually generates.
Thanks for the reply!
On 20060114, ne*****@wigner.berkeley.edu <ne*****@wigner.berkeley.edu> wrote: Maybe I will try to figure out how to list assembly output from gcc to see what instructions it actually generates.
<OT>gcc S</OT> ne*****@wigner.berkeley.edu writes: You are right.
[...]
About what? Most of us can't easily see the article to which you're
replying. Please read <http://cfaj.freeshell.org/google/>.
[...]
Maybe I will try to figure out how to list assembly output from gcc to see what instructions it actually generates.
<OT>
gcc S
</OT>

Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jordan Abel wrote: On 20060113, ne*****@wigner.berkeley.edu <ne*****@wigner.berkeley.edu> wrote: First, I assume the div and ldiv functions are more efficient than using / and % (obviously it depends on the compiler and level of optimization, but it's one operation instead of two).
Not really  it requires a function call and use of a structure, and any compiler should optimize use of / and % with the same operands anyway.
just use / and %.
Some of the responses make me wonder if your question was understood.
You want both "/" and "%" but don't want to needlessly repeat the
division. Do compilers find this optimization?
Someone from Sun implies that speed comparison tests are appropriate.
He should preach this inhouse. It was trivial to beat memcpy() on
Sun3 (which was optimized for Sun2). The early Sparc's integer
multiply
could not only be bettered by rewriting the assembly, it *could be
beaten
without assembly in a C routine!* (In either way the speed improvement
was huge.)
James
Keith Thompson wrote: ne*****@wigner.berkeley.edu writes:
[...]
Maybe I will try to figure out how to list assembly output from gcc to see what instructions it actually generates.
<OT> gcc S </OT>
I use: gcc gstabs+ Wa,ahldn c
and pipe the result to less.

"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers."  Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
James Dow Allen wrote: Someone from Sun implies that speed comparison tests are appropriate. He should preach this inhouse. It was trivial to beat memcpy() on Sun3 (which was optimized for Sun2).
How many employees from 20 years ago do you think are still there? :)
 Logan
In article <11*********************@f14g2000cwb.googlegroups. com>,
"James Dow Allen" <jd*********@yahoo.com> wrote: Jordan Abel wrote: On 20060113, ne*****@wigner.berkeley.edu <ne*****@wigner.berkeley.edu> wrote: First, I assume the div and ldiv functions are more efficient than using / and % (obviously it depends on the compiler and level of optimization, but it's one operation instead of two).
Not really  it requires a function call and use of a structure, and any compiler should optimize use of / and % with the same operands anyway.
just use / and %.
Some of the responses make me wonder if your question was understood. You want both "/" and "%" but don't want to needlessly repeat the division. Do compilers find this optimization?
Someone from Sun implies that speed comparison tests are appropriate. He should preach this inhouse. It was trivial to beat memcpy() on Sun3 (which was optimized for Sun2). The early Sparc's integer multiply could not only be bettered by rewriting the assembly, it *could be beaten without assembly in a C routine!* (In either way the speed improvement was huge.)
The probably didn't measure it.
That's why he is telling you: Measure it.
Or are you suggesting that measuring the speed of code before trying to
optimise it for speed is a bad idea, because it was suggested by someone
who works for a company who shipped some suboptimal code twenty years
ago?
On 13 Jan 2006 18:03:25 0800, ne*****@wigner.berkeley.edu wrote: You are right. Suppose I have declarations unsigned int quot,rem,dvsr; unsigned long dvnd; and then I say quot=dvnd/dvsr; rem =dvnd%dvsr;
If the compiler is smart enough to generate a single divide instruction (that's all that's required on the x86 hardware), then I have what I want.
Maybe I will try to figure out how to list assembly output from gcc to see what instructions it actually generates.
Thanks for the reply!
You should check as well if the optimiser results are different for
signed and unsigned integer types, since there is a slight
complication with the sign of the modulo and the way it is rounded if
one of the incoming values are negative.
Jim
Christian Bau wrote: In article <11*********************@f14g2000cwb.googlegroups. com>, "James Dow Allen" <jd*********@yahoo.com> wrote:
Someone from Sun implies that speed comparison tests are appropriate. He should preach this inhouse. It was trivial to beat memcpy() on Sun3 (which was optimized for Sun2). The early Sparc's integer multiply could not only be bettered by rewriting the assembly, it *could be beaten without assembly in a C routine!* (In either way the speed improvement was huge.)
I'm surprised no one was curious about the fast Ccode multiply.
It typically takes shorts in, long out  a common case.
If you've not seen the trick before, consider it a puzzle
to reverse engineer it from the fragment:
c = arr[x+y]  arr[xy]; /* c = x * y */
It gives competitive performance even on some machines with
blazingly fast multiplies.
The probably didn't measure it. That's why he is telling you: Measure it.
Mr. Sosman's comment was of course excellent.
I thought this went without saying. What in my post
implied otherwise? (I deliberately removed an attribution
to reduce risk that my comment would somehow be seen
as opposing.)
Or are you suggesting that measuring the speed of code before trying to optimise it for speed is a bad idea, ...
Hunh? I'll admit to a sardonic tone that is sometimes confusing,
but what's with the blatant misconstruction?
The only point I was alluding to is that even big organizations with
which one would *like* to associate technical excellence, often
fail sharply. ... so sharply it almost seems unfair here to chastise
OP.
James
On 20060116, James Dow Allen <jd*********@yahoo.com> wrote: Christian Bau wrote: In article <11*********************@f14g2000cwb.googlegroups. com>, "James Dow Allen" <jd*********@yahoo.com> wrote:
> Someone from Sun implies that speed comparison tests are appropriate. > He should preach this inhouse. It was trivial to beat memcpy() on > Sun3 (which was optimized for Sun2). The early Sparc's integer > multiply > could not only be bettered by rewriting the assembly, it *could be > beaten > without assembly in a C routine!* (In either way the speed improvement > was huge.)
I'm surprised no one was curious about the fast Ccode multiply. It typically takes shorts in, long out  a common case. If you've not seen the trick before, consider it a puzzle to reverse engineer it from the fragment: c = arr[x+y]  arr[xy]; /* c = x * y */ It gives competitive performance even on some machines with blazingly fast multiplies.
OK, i'll bite  what goes in the array?
Jordan Abel wrote: On 20060116, James Dow Allen <jd*********@yahoo.com> wrote:
I'm surprised no one was curious about the fast Ccode multiply. It typically takes shorts in, long out  a common case. If you've not seen the trick before, consider it a puzzle to reverse engineer it from the fragment: c = arr[x+y]  arr[xy]; /* c = x * y */ It gives competitive performance even on some machines with blazingly fast multiplies.
OK, i'll bite  what goes in the array?
Corners.
/* BEGIN new.c */
#include <stdio.h>
int main(void)
{
unsigned x, y, arr[256];
for (x = 0; 255 > x; ++x) {
arr[x] = x * x / 4;
}
for (x = 0; 127 > x; ++x) {
for (y = 0; 127 > y; ++y) {
printf("%u * %u is %u\n", x, y,
x > y
?arr[x+y]  arr[xy]
:arr[x+y]  arr[yx]);
}
}
return 0;
}
/* END new.c */

pete
Jordan Abel wrote: On 20060116, James Dow Allen <jd*********@yahoo.com> wrote:
.... snip ... I'm surprised no one was curious about the fast Ccode multiply. It typically takes shorts in, long out  a common case. If you've not seen the trick before, consider it a puzzle to reverse engineer it from the fragment:
c = arr[x+y]  arr[xy]; /* c = x * y */
It gives competitive performance even on some machines with blazingly fast multiplies.
OK, i'll bite  what goes in the array?
A table of squares. The result needs a division by 2.

"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers."  Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Jordan Abel wrote: On 20060116, James Dow Allen <jd*********@yahoo.com> wrote:
Christian Bau wrote:
In article <11*********************@f14g2000cwb.googlegroups. com>, "James Dow Allen" <jd*********@yahoo.com> wrote:
Someone from Sun implies that speed comparison tests are appropriate. He should preach this inhouse. It was trivial to beat memcpy() on Sun3 (which was optimized for Sun2). The early Sparc's integer multiply could not only be bettered by rewriting the assembly, it *could be beaten without assembly in a C routine!* (In either way the speed improvement was huge.)
I'm surprised no one was curious about the fast Ccode multiply. It typically takes shorts in, long out  a common case. If you've not seen the trick before, consider it a puzzle to reverse engineer it from the fragment: c = arr[x+y]  arr[xy]; /* c = x * y */ It gives competitive performance even on some machines with blazingly fast multiplies.
OK, i'll bite  what goes in the array?
In addition to the other answers, see e.g. http://www.mccw.hetlab.tk/92/Multiplication/en.html
for an "explanation".
Cheers
Michael

EMail: Mine is an /at/ gmx /dot/ de address.
pete wrote: Jordan Abel wrote: On 20060116, James Dow Allen <jd*********@yahoo.com> wrote: c = arr[x+y]  arr[xy]; /* c = x * y */ OK, i'll bite  what goes in the array? ... printf("%u * %u is %u\n", x, y, x > y ?arr[x+y]  arr[xy] :arr[x+y]  arr[yx]);
I just want to point out that a solution can simultaneously
(a) accept signeds, not just unsigneds
(b) avoid the x > y ? timewaster
(c) obey the rules of standard C.
Again this is left as an exercise.
James
James Dow Allen wrote: pete wrote: Jordan Abel wrote: On 20060116, James Dow Allen <jd*********@yahoo.com> wrote: > c = arr[x+y]  arr[xy]; /* c = x * y */ OK, i'll bite  what goes in the array? ... printf("%u * %u is %u\n", x, y, x > y ?arr[x+y]  arr[xy] :arr[x+y]  arr[yx]);
I just want to point out that a solution can simultaneously (a) accept signeds, not just unsigneds (b) avoid the x > y ? timewaster (c) obey the rules of standard C.
Again this is left as an exercise.
/* BEGIN new.c */
#include <stdio.h>
int main(void)
{
int x, y, arr[256], *ptr;
ptr = arr + 128;
for (x = 128; 127 > x; ++x) {
ptr[x] = x * x / 4;
}
for (x = 63; 64 > x; ++x) {
for (y = 63; 64 > y; ++y) {
printf("%d * %d is %d\n",
x, y, ptr[x + y]  ptr[x  y]);
}
}
return 0;
}
/* END new.c */

pete
On 20060118, James Dow Allen <jd*********@yahoo.com> wrote: pete wrote: Jordan Abel wrote: > On 20060116, James Dow Allen <jd*********@yahoo.com> wrote: > > c = arr[x+y]  arr[xy]; /* c = x * y */ > OK, i'll bite  what goes in the array? ... printf("%u * %u is %u\n", x, y, x > y ?arr[x+y]  arr[xy] :arr[x+y]  arr[yx]);
I just want to point out that a solution can simultaneously (a) accept signeds, not just unsigneds
How, by passing a possibly negative offset into an array? Or by testing
both vars to be < 0?
(b) avoid the x > y ? timewaster (c) obey the rules of standard C.
Again this is left as an exercise.
James
pete wrote: Jordan Abel wrote: James Dow Allen <jd*********@yahoo.com> wrote: > I'm surprised no one was curious about the fast Ccode multiply. > It typically takes shorts in, long out  a common case. > If you've not seen the trick before, consider it a puzzle > to reverse engineer it from the fragment: > c = arr[x+y]  arr[xy]; /* c = x * y */ > It gives competitive performance even on some machines with > blazingly fast multiplies.
OK, i'll bite  what goes in the array?
.... snip ... int main(void) { unsigned x, y, arr[256];
for (x = 0; 255 > x; ++x) { arr[x] = x * x / 4; }
And how do you conclude that x*x is divisible by 4? (or even x)?

"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers."  Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Chuck F. wrote: pete wrote: Jordan Abel wrote: James Dow Allen <jd*********@yahoo.com> wrote:
> I'm surprised no one was curious about the fast Ccode multiply. > It typically takes shorts in, long out  a common case. > If you've not seen the trick before, consider it a puzzle > to reverse engineer it from the fragment: > c = arr[x+y]  arr[xy]; /* c = x * y */ > It gives competitive performance even on some machines with > blazingly fast multiplies.
OK, i'll bite  what goes in the array? ... snip ... int main(void) { unsigned x, y, arr[256];
for (x = 0; 255 > x; ++x) { arr[x] = x * x / 4; }
And how do you conclude that x*x is divisible by 4? (or even x)?
I simply choose not to care,
and the problem goes away by itself,
just like everything else in C programming:
8 * 3 ==
8 + 3 == 11
8  3 == 5
arr[11] == 121 / 4 == 30
arr[ 5] == 25 / 4 == 6
30  6 == 24
8 * 3 == 24

pete
pete wrote: Chuck F. wrote: pete wrote: Jordan Abel wrote: James Dow Allen <jd*********@yahoo.com> wrote:
> I'm surprised no one was curious about the fast Ccode > multiply. It typically takes shorts in, long out  a > common case. If you've not seen the trick before, > consider it a puzzle to reverse engineer it from the > fragment: c = arr[x+y]  arr[xy]; /* c = x * y */ It > gives competitive performance even on some machines with > blazingly fast multiplies.
OK, i'll bite  what goes in the array? ... snip ... int main(void) { unsigned x, y, arr[256];
for (x = 0; 255 > x; ++x) { arr[x] = x * x / 4; }
And how do you conclude that x*x is divisible by 4? (or even x)?
I simply choose not to care, and the problem goes away by itself, just like everything else in C programming:
8 * 3 ==
8 + 3 == 11 8  3 == 5
arr[11] == 121 / 4 == 30 arr[ 5] == 25 / 4 == 6
30  6 == 24
8 * 3 == 24
Are you trying to force me to write modular diophantine equations
and prove this? :) I can conceive it may compensate, and may look
later.

"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers."  Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
pete wrote: Chuck F. wrote: And how do you conclude that x*x is divisible by 4? (or even x)?
x*x need not be divisible by 4, but it is always of residue class 0 or
1!
The residues thus aren't big enough to cause trouble!!
I simply choose not to care, and the problem goes away by itself, just like everything else in C programming:
I'm also a fan of C, but didn't realize it was *that* good!
James
On Wed, 18 Jan 2006 23:50:09 0800, James Dow Allen wrote: pete wrote: Chuck F. wrote: > And how do you conclude that x*x is divisible by 4? (or even x)? x*x need not be divisible by 4, but it is always of residue class 0 or 1! The residues thus aren't big enough to cause trouble!!
More precisely: if the only possible remainders are 0 and 0.25, this
implies that the remainder must be identical for the two expressions (it's
no doubt possible to prove this more directly), so that removing the
remainders in the expressions through integer conversion has the same
result as cancelling them in the subtraction.
(those expressions being (a+b)**2 / 4, from which (ab)**2 / 4 is
subtracted, where ** is exponentiation)
In any case, it was an interesting technique that you shared. Doing some
quick calculations, assuming a short int of 16 bits and int of 32 bits,
and allowing for signed multiplication, the complete table would consume
approx 524,292 bytes (halved if you add a conditional to reuse a positive
index when the index is negative). So you can exchange half a meg memory
for faster multiplication of short ints on platforms where the technique
is actually fast enough to warrant it.
I simply choose not to care, and the problem goes away by itself, just like everything else in C programming:
I'm also a fan of C, but didn't realize it was *that* good!
Perhaps pete's symbiosis with C is as poetic as his postings.
 http://members.dodo.com.au/~netocrat
"Chuck F. " <cb********@yahoo.com> wrote: pete wrote: Chuck F. wrote: pete wrote: Jordan Abel wrote: > James Dow Allen <jd*********@yahoo.com> wrote:
>> I'm surprised no one was curious about the fast Ccode >> multiply. It typically takes shorts in, long out  a >> common case. If you've not seen the trick before, >> consider it a puzzle to reverse engineer it from the >> fragment: c = arr[x+y]  arr[xy]; /* c = x * y */ It >> gives competitive performance even on some machines with >> blazingly fast multiplies. > > OK, i'll bite  what goes in the array?
... snip ...
int main(void) { unsigned x, y, arr[256];
for (x = 0; 255 > x; ++x) { arr[x] = x * x / 4; }
And how do you conclude that x*x is divisible by 4? (or even x)?
I simply choose not to care, and the problem goes away by itself, just like everything else in C programming:
8 * 3 ==
8 + 3 == 11 8  3 == 5
arr[11] == 121 / 4 == 30 arr[ 5] == 25 / 4 == 6
30  6 == 24
8 * 3 == 24
Are you trying to force me to write modular diophantine equations and prove this? :) I can conceive it may compensate, and may look later.
If x==2a, y==2b, then
x+y==2a+2b, ((x+y)**2)/4 == (4aa+8ab+4bb)/4 == aa+2ab+bb;
xy==2a2b, ((xy)**2)/4 == (4aa8ab+4bb)/4 == aa2ab+bb;
((x+y)**2)/4  ((xy)**2)/4 == aa+2ab+bb  (aa2abbb) == 4ab == xy.
If x==2a+1, y==2b+1, then
x+y==2a+2b+2, ((x+y)**2)/4 == (4aa+8ab+4bb+8a+8b+4)/4 ==
aa+2ab+bb+2a+2b+2;
xy==2a2b, ((xy)**2)/4 == (4aa8ab+4bb)/4 == aa2ab+bb;
((x+y)**2)/4  ((xy)**2)/4 == aa+2ab+bb+2a+2b+1  (aa2ab+bb) ==
4ab+2a+2b+1 == (2a+1)*(2b+1) == xy.
If x==2a+1, y==2b, then
x+y==2a+2b+1, ((x+y)**2)/4 == (4aa+8ab+4bb+4a+4b+1)/4 ==
aa+2ab+bb+a+b;
xy==2a2b+1, ((xy)**2)/4 == (4aa8ab+4bb+4a4b+1)/4 ==
aa2abbb+ab;
((x+y)**2)/4  ((xy)**2)/4 == aa+2ab+bb+a+b  (aa2ab+bb+ab) ==
4ab+2b == (2a+1)*2b == xy.
If x==2a, y==2b1, then
x+y==2a+2b+1, ((x+y)**2)/4 == (4aa+8ab+4bb+4a+4b+1)/4 ==
aa+2ab+bb+a+b;
xy==2a2b1, ((xy)**2)/4 == (4aa8ab+4bb4a+4b+1)/4 ==
aa2abbba+b;
((x+y)**2)/4  ((xy)**2)/4 == aa+2ab+bb+a+b  (aa2ab+bba+b) ==
4ab+2a == 2a*(2b+1) == xy.
Note: integer divisions throughout, rounding down, as in C.
Note on the note: this only matters if _either_ x or y is odd, not if
both are.
Richard
Chuck F. wrote: diophantine
you me friend good

pete
Netocrat wrote: On Wed, 18 Jan 2006 23:50:09 0800, James Dow Allen wrote: pete wrote: Chuck F. wrote: > And how do you conclude that x*x is divisible by 4? (or even x)?
I simply choose not to care, and the problem goes away by itself, just like everything else in C programming:
I'm also a fan of C, but didn't realize it was *that* good!
Perhaps pete's symbiosis with C is as poetic as his postings.
I was just kidding around.

pete
Netocrat wrote: On Wed, 18 Jan 2006 23:50:09 0800, James Dow Allen wrote: x*x need not be divisible by 4, but it is always of residue class 0 or 1! The residues thus aren't big enough to cause trouble!! More precisely: if the only possible remainders are 0 and 0.25, ...
A residue of 3 would also not cause trouble, as long as you remember
to make the remainder 0.25 instead of 0.75. My blood pressure goes
up a little (when I forget my meds) whenever I divide a negative
number because of the sadness that >>2 and /4 no longer
work the same on 2'scomplement machines. Fortunately there is
no hypertension risk here, since x*x is never negative.
In any case, it was an interesting technique that you shared. Doing some quick calculations, assuming a short int of 16 bits and int of 32 bits, and allowing for signed multiplication, the complete table would consume approx 524,292 bytes...
That's assuming your multipliers use the full 16bit range; often
you'll get away
with much less. The technique may be obsolescent now since hardwired
multipliers have become so frisky.
I *did* use the technique on The World's Fastest Jpeg Compressor (tm),
but only on Sun's early Sparc with its bizarrely slow multiplication.
James
In article <11*********************@o13g2000cwo.googlegroups. com>, "James Dow Allen" <jd*********@yahoo.com> writes: That's assuming your multipliers use the full 16bit range; often you'll get away with much less. The technique may be obsolescent now since hardwired multipliers have become so frisky.
I suspect that's true for generalpurpose CPUs, with their fast integer
multiplication and cache sensitivities (which weigh against tablebased
approaches in general). But it might still be useful for some embedded
applications with simple processors, particularly if as you say the
domain is restricted.
Thanks for posing this puzzle, by the way. I spent 10 minutes or so
working out values for the array on paper (using simultaneous
equations; I didn't bother digging out my number theory textbooks,
so I took the simple route). Interestingly, I came up with a
different way to compute the series: {(i/2)**2, (i/2)**2 + i/2, ...}
for i from 1 to N. That is, the first, third, and so on elements
are the squares, and the second, fourth, etc are the squares plus
their roots.
Anyway, it was an amusing exercise.

Michael Wojcik mi************@microfocus.com
Some seem to live on credit as naturally as they breathe, and I remember
the surprise of one of these: "What! You don't owe anybody anything! Good
Lord! man, lend me half a sovereign."  Arthur Ransome This discussion thread is closed Replies have been disabled for this discussion. Similar topics
2 posts
views
Thread by Michael Cornelius 
last post: by

1 post
views
Thread by Dave 
last post: by

24 posts
views
Thread by Teis Draiby 
last post: by

5 posts
views
Thread by silentlights 
last post: by

17 posts
views
Thread by seb.haase 
last post: by

22 posts
views
Thread by jamestuck21 
last post: by

13 posts
views
Thread by jamesonang 
last post: by
 
16 posts
views
Thread by spl 
last post: by
          