This file is indexed.

/usr/share/doc/libntl-dev/NTL/quad_float.txt is in libntl-dev 9.9.1-3.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
/**************************************************************************\

MODULE: quad_float

SUMMARY:

The class quad_float is used to represent quadruple precision numbers.
Thus, with standard IEEE floating point, you should get the equivalent
of about 106 bits of precision (but actually just a bit less).

The interface allows you to treat quad_floats more or less as if they were
"ordinary" floating point types.

See below for more implementation details.


\**************************************************************************/

#include <NTL/ZZ.h>


class quad_float {
public:

quad_float(); // = 0

quad_float(const quad_float& a);  // copy constructor

explicit quad_float(double a);  // promotion constructor


quad_float& operator=(const quad_float& a);  // assignment operator
quad_float& operator=(double a);

~quad_float();


static void SetOutputPrecision(long p);
// This sets the number of decimal digits to be output.  Default is
// 10.


static long OutputPrecision();
// returns current output precision.


};


/**************************************************************************\

                             Arithmetic Operations

\**************************************************************************/




quad_float operator +(const quad_float& x, const quad_float& y);
quad_float operator -(const quad_float& x, const quad_float& y);
quad_float operator *(const quad_float& x, const quad_float& y);
quad_float operator /(const quad_float& x, const quad_float& y);


// PROMOTIONS: operators +, -, *, / promote double to quad_float
// on (x, y).

quad_float operator -(const quad_float& x);

quad_float& operator += (quad_float& x, const quad_float& y);
quad_float& operator += (quad_float& x, double y);

quad_float& operator -= (quad_float& x, const quad_float& y);
quad_float& operator -= (quad_float& x, double y);

quad_float& operator *= (quad_float& x, const quad_float& y);
quad_float& operator *= (quad_float& x, double y);

quad_float& operator /= (quad_float& x, const quad_float& y);
quad_float& operator /= (quad_float& x, double y);

quad_float& operator++(quad_float& a); // prefix
void operator++(quad_float& a, int); // postfix

quad_float& operator--(quad_float& a); // prefix
void operator--(quad_float& a, int); // postfix



/**************************************************************************\

                                  Comparison

\**************************************************************************/


long operator> (const quad_float& x, const quad_float& y);
long operator>=(const quad_float& x, const quad_float& y);
long operator< (const quad_float& x, const quad_float& y);
long operator<=(const quad_float& x, const quad_float& y);
long operator==(const quad_float& x, const quad_float& y);
long operator!=(const quad_float& x, const quad_float& y);

long sign(const quad_float& x);  // sign of x, -1, 0, +1
long compare(const quad_float& x, const quad_float& y); // sign of x - y

// PROMOTIONS: operators >, ..., != and function compare
// promote double to quad_float on (x, y).


/**************************************************************************\

                               Input/Output
Input Syntax:

<number>: [ "-" ] <unsigned-number>
<unsigned-number>: <dotted-number> [ <e-part> ] | <e-part>
<dotted-number>: <digits> | <digits> "." <digits> | "." <digits> | <digits> "."
<digits>: <digit> <digits> | <digit>
<digit>: "0" | ... | "9"
<e-part>: ( "E" | "e" ) [ "+" | "-" ] <digits>

Examples of valid input:

17 1.5 0.5 .5  5.  -.5 e10 e-10 e+10 1.5e10 .5e10 .5E10

Note that the number of decimal digits of precision that are used
for output can be set to any number p >= 1 by calling
the routine quad_float::SetOutputPrecision(p).  
The default value of p is 10.
The current value of p is returned by a call to quad_float::OutputPrecision().



\**************************************************************************/


istream& operator >> (istream& s, quad_float& x);
ostream& operator << (ostream& s, const quad_float& x);


/**************************************************************************\

                                  Miscellaneous

\**************************************************************************/



quad_float sqrt(const quad_float& x);
quad_float floor(const quad_float& x);
quad_float ceil(const quad_float& x);
quad_float trunc(const quad_float& x);
quad_float fabs(const quad_float& x);
quad_float exp(const quad_float& x);
quad_float log(const quad_float& x);


void power(quad_float& x, const quad_float& a, long e); // x = a^e
quad_float power(const quad_float& a, long e); 

void power2(quad_float& x, long e); // x = 2^e
quad_float power2_quad_float(long e); 

quad_float ldexp(const quad_float& x, long e);  // return x*2^e

long IsFinite(quad_float *x); // checks if x is "finite"   
                              // pointer is used for compatability with
                              // IsFinite(double*)


void random(quad_float& x);
quad_float random_quad_float();
// generate a random quad_float x with 0 <= x <= 1





/***********************************************************************\

IMPLEMENTATION DETAILS

A quad_float x is represented as a pair of doubles, x.hi and x.lo,
such that the number represented by x is x.hi + x.lo, where

   |x.lo| <= 0.5*ulp(x.hi),  (*)

and ulp(y) means "unit in the last place of y".  

For the software to work correctly, IEEE Standard Arithmetic is sufficient.  
That includes just about every modern computer; the only exception I'm
aware of is Intel x86 platforms running Linux (but you can still
use this platform--see below).

Also sufficient is any platform that implements arithmetic with correct 
rounding, i.e., given double floating point numbers a and b, a op b 
is computed exactly and then rounded to the nearest double.  
The tie-breaking rule is not important.

This is a rather wierd representation;  although it gives one
essentially twice the precision of an ordinary double, it is
not really the equivalent of quadratic precision (despite the name).
For example, the number 1 + 2^{-200} can be represented exactly as
a quad_float.  Also, there is no real notion of "machine precision".

Note that overflow/underflow for quad_floats does not follow any particularly
useful rules, even if the underlying floating point arithmetic is IEEE
compliant.  Generally, when an overflow/underflow occurs, the resulting value
is unpredicatble, although typically when overflow occurs in computing a value
x, the result is non-finite (i.e., IsFinite(&x) == 0).  Note, however, that
some care is taken to ensure that the ZZ to quad_float conversion routine
produces a non-finite value upon overflow.

THE INTEL x86 PROBLEM

Although just about every modern processor implements the IEEE
floating point standard, there still can be problems
on processors that support IEEE extended double precision.
The only processor I know of that supports this is the x86/Pentium.

While extended double precision may sound like a nice thing,
it is not.  Normal double precision has 53 bits of precision.
Extended has 64.  On x86s, the FP registers have 53 or 64 bits
of precision---this can be set at run-time by modifying
the cpu "control word" (something that can be done
only in assembly code).
However, doubles stored in memory always have only 53 bits.
Compilers may move values between memory and registers
whenever they want, which can effectively change the value
of a floating point number even though at the C/C++ level,
nothing has happened that should have changed the value.
Is that sick, or what?
Actually, the new C99 standard seems to outlaw such "spontaneous"
value changes; however, this behavior is not necessarily
universally implemented.

This is a real headache, and if one is not just a bit careful,
the quad_float code will break.  This breaking is not at all subtle,
and the program QuadTest will catch the problem if it exists.

You should not need to worry about any of this, because NTL automatically
detects and works around these problems as best it can, as described below.
It shouldn't make a mistake, but if it does, you will
catch it in the QuadTest program.
If things don't work quite right, you might try
setting NTL_FIX_X86 or NTL_NO_FIX_X86 flags in ntl_config.h,
but this should not be necessary.

Here are the details about how NTL fixes the problem.

The first and best way is to have the default setting of the control word
be 53 bits.  However, you are at the mercy of your platform
(compiler, OS, run-time libraries).  Windows does this,
and so the problem simply does not arise here, and NTL neither
detects nor fixes the problem.  Linux, however, does not do this,
which really sucks.  Can we talk these Linux people into changing this?

The second way to fix the problem is by having NTL 
fiddle with control word itself.  If you compile NTL using a GNU compiler
on an x86, this should happen automatically.
On the one hand, this is not a general, portable solution,
since it will only work if you use a GNU compiler, or at least one that
supports GNU 'asm' syntax.  
On the other hand, almost everybody who compiles C++ on x86/Linux
platforms uses GNU compilers (although there are some commercial
compilers out there that I don't know too much about).

The third way to fix the problem is to 'force' all intermediate
floating point results into memory.  This is not an 'ideal' fix,
since it is not fully equivalent to 53-bit precision (because of 
double rounding), but it works (although to be honest, I've never seen
a full proof of correctness in this case).
NTL's quad_float code does this by storing intermediate results
in local variables declared to be 'volatile'.
This is the solution to the problem that NTL uses if it detects
the problem and can't fix it using the GNU 'asm' hack mentioned above.
This solution should work on any platform that faithfully
implements 'volatile' according to the ANSI C standard.



BACKGROUND INFO

The code NTL uses algorithms designed by Knuth, Kahan, Dekker, and
Linnainmaa.  The original transcription to C++ was done by Douglas
Priest.  Enhancements and bug fixes were done by Keith Briggs
(http://epidem13.plantsci.cam.ac.uk/~kbriggs).  The NTL version is a
stripped down version of Briggs' code, with a couple of bug fixes and
portability improvements.  Briggs has continued to develop his
library;  see his web page above for the latest version and more information.

Here is a brief annotated bibliography (compiled by Priest) of papers 
dealing with DP and similar techniques, arranged chronologically.


Kahan, W., Further Remarks on Reducing Truncation Errors,
  {\it Comm.\ ACM\/} {\bf 8} (1965), 40.

M{\o}ller, O., Quasi Double Precision in Floating-Point Addition,
  {\it BIT\/} {\bf 5} (1965), 37--50.

  The two papers that first presented the idea of recovering the
  roundoff of a sum.

Dekker, T., A Floating-Point Technique for Extending the Available
  Precision, {\it Numer.\ Math.} {\bf 18} (1971), 224--242.

  The classic reference for DP algorithms for sum, product, quotient,
  and square root.

Pichat, M., Correction d'une Somme en Arithmetique \`a Virgule
  Flottante, {\it Numer.\ Math.} {\bf 19} (1972), 400--406.

  An iterative algorithm for computing a protracted sum to working
  precision by repeatedly applying the sum-and-roundoff method.

Linnainmaa, S., Analysis of Some Known Methods of Improving the Accuracy
  of Floating-Point Sums, {\it BIT\/} {\bf 14} (1974), 167--202.

  Comparison of Kahan and M{\o}ller algorithms with variations given
  by Knuth.

Bohlender, G., Floating-Point Computation of Functions with Maximum
  Accuracy, {\it IEEE Trans.\ Comput.} {\bf C-26} (1977), 621--632.

  Extended the analysis of Pichat's algorithm to compute a multi-word
  representation of the exact sum of n working precision numbers.
  This is the algorithm Kahan has called "distillation".

Linnainmaa, S., Software for Doubled-Precision Floating-Point Computations,
  {\it ACM Trans.\ Math.\ Soft.} {\bf 7} (1981), 272--283.

  Generalized the hypotheses of Dekker and showed how to take advantage
  of extended precision where available.

Leuprecht, H., and W.~Oberaigner, Parallel Algorithms for the Rounding-Exact
  Summation of Floating-Point Numbers, {\it Computing} {\bf 28} (1982), 89--104.

  Variations of distillation appropriate for parallel and vector
  architectures.

Kahan, W., Paradoxes in Concepts of Accuracy, lecture notes from Joint
  Seminar on Issues and Directions in Scientific Computation, Berkeley, 1989.

  Gives the more accurate DP sum I've shown above, discusses some
  examples.

Priest, D., Algorithms for Arbitrary Precision Floating Point Arithmetic,
  in P.~Kornerup and D.~Matula, Eds., {\it Proc.\ 10th Symposium on Com-
  puter Arithmetic}, IEEE Computer Society Press, Los Alamitos, Calif., 1991.

  Extends from DP to arbitrary precision; gives portable algorithms and
  general proofs.

Sorensen, D., and P.~Tang, On the Orthogonality of Eigenvectors Computed
  by Divide-and-Conquer Techniques, {\it SIAM J.\ Num.\ Anal.} {\bf 28}
  (1991), 1752--1775.

  Uses some DP arithmetic to retain orthogonality of eigenvectors
  computed by a parallel divide-and-conquer scheme.

Priest, D., On Properties of Floating Point Arithmetics: Numerical Stability
  and the Cost of Accurate Computations, Ph.D. dissertation, University
  of California at Berkeley, 1992.

  More examples, organizes proofs in terms of common properties of fp
  addition/subtraction, gives other summation algorithms.

Another relevant paper: 

X. S. Li, et al.
Design, implementation, and testing of extended and mixed 
precision BLAS.  ACM Trans. Math. Soft., 28:152-205, 2002.



\***********************************************************************/