/usr/share/doc/libntl-dev/NTL/tour-impl.html is in libntl-dev 9.9.1-3.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 | <html>
<head>
<title>
A Tour of NTL: NTL Implementation and Portability </title>
</head>
<center>
<a href="tour-tips.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a>
<a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a>
<a href="tour-gmp.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a>
</center>
<h1>
<p align=center>
A Tour of NTL: NTL Implementation and Portability
</p>
</h1>
<p> <hr> <p>
NTL is designed to be portable, fast,
and relatively easy to use and extend.
<p>
To make NTL portable, no assembly code is used (well, almost none, see below).
This is highly desirable, as architectures are constantly
changing and evolving, and maintaining assembly
code is quite costly.
By avoiding assembly code, NTL should remain usable,
with virtually no maintenance, for many years.
<p>
<h3>Minimal platform requirements</h3>
When the configuration flags <tt>NTL_CLEAN_INT</tt>
and <tt>NTL_CLEAN_PTR</tt> are both <i>on</i> (this is not the default,
see below),
NTL makes two requirements
of its platform,
neither of which are guaranteed by the <tt>C++</tt> language
definition, but are essentially universal:
<ol>
<li>
<tt>int</tt> and <tt>long</tt> quantities, respectively,
are represented using a 2's complement
representation whose width is equal to the width of <tt>unsigned int</tt>
and <tt>unsigned long</tt>, respectively.
<li>
Double precision floating point
conforms to the IEEE floating point standard.
</ol>
<p>
NTL makes very conservative requirements of the <tt>C++</tt> compiler:
<ul>
<li>
it is assumed that the <tt>C++</tt> compiler conforms to the 1998 standard,
including the basic of templates;
<li>
it does not assume any features not in the 1998 standard,
unless compiled with the <tt>NTL_THREADS</tt> or <tt>NTL_EXCEPTIONS</tt> flags.
</ul>
<p>
<h3>The <tt>NTL_CLEAN_INT</tt> flag</h3>
<p>
The configuration flag <tt>NTL_CLEAN_INT</tt>
is currently <i>off</i> by default.
<p>
When this flag is off, NTL makes another requirement of its platform;
namely, that conversions from <tt>unsigned long</tt> to <tt>long</tt>
convert the bit pattern without change to the corresponding 2's complement
signed integer.
Note that the <tt>C++</tt> standard defines the behavior of
converting unsigned
to signed values as <i>implementation defined</i> when the value
cannot be represented in the range of nonnegative signed values.
Nevertheless, this behavior is essentially universal, and more importantly,
is is not <i>undefined behavior</i>: implementation-defined behavior must be
documented and respected by the compiler, while undefined behavior can
be exploited by the compiler in some surprising ways.
<p>
Actually, with <tt>NTL_CLEAN_INT</tt> off, it is also assumed
that right shifts of signed integers are consistent,
in the sense that if it is sometimes an arithmetic shift,
then it is always an arithmetic shift (the installation
scripts check if right shift appears to be arithmetic, and if so,
this assumption is made elsewhere).
Arithmetic right shift is also <i>implementation defined</i> behavior
that is essentially universal.
<p>
It seems fairly unlikely that one would ever have to turn the
<tt>NTL_CLEAN_INT</tt> flag <i>on</i>, but it seems a good idea
to make this possible, and at the very least
to identify and isolate the code that
relies on this assumption.
The only code affected by this flag
is the traditional LIP long integer package (which, if you use
GMP as the long integer package, is not involved),
and the single-precision modular multiplication routines.
<p>
Note that prior to NTL 9.0, the default compilation mode required
that in a few critical places, <i>signed</i> integer arithmetic quietly wraps around
on overflow; however, signed integer overflow is <i>undefined behavior</i>,
and it seems that in recent years compilers have been getting
more aggressive in exploiting such undefined behavior in their optimizations.
Moreover, recent versions of GCC now come with a "sanitizer" that checks
for undefined behavior.
So, both to avoid potentially dangerous optimizations and to allow
NTL to pass such sanitzer checks, it seemed safer to move to this
more conservative approach.
There should, in fact, be zero performance penalty in doing so.
Also note that I was never aware of any compiler that generated incorrect
code in the pre-9.0 code: this new approach is just to be on the safe side
in the future.
<h3>The <tt>NTL_CLEAN_PTR</tt> flag</h3>
<p>
The configuration flag <tt>NTL_CLEAN_PTR</tt>
is currently <i>off</i> by default.
<p>
When this flag is off, NTL makes another requirement of its platform;
namely, that the address space is "flat", and in particular,
that one can test if an object pointed to by a pointer <tt>p</tt>
is located in a array of objects <tt>v[0..n-1]</tt> by testing
if <tt>p >= v</tt> and <tt>p < v + n</tt>.
The <tt>C++</tt> standard does not guarantee that such a test will
work; the only way to perform this test in a standard-conforming way
is to iteratively test if <tt>p == v</tt>, <tt>p == v+1</tt>, etc.
<p>
This assumption of a "flat" address space is essentially universally
valid, and making this assumption leads to more efficicient code.
For this reason, the <tt>NTL_CLEAN_PTR</tt> is <i>off</i> by default,
but one can always turn it on, and in fact, the overall performance
penalty should be negligible for most applications.
<h3>Some floating point issues</h3>
<p>
NTL uses floating point arithmetic in a number of places,
including a number of exact computations, where one might
not expect to see floating point.
Relying on floating point may seem prone to errors,
but with the guarantees provided by the IEEE standard,
one can prove the correctness of the NTL code that uses floating point.
<p>
Briefly, the IEEE floating point standard says that basic arithmetic operations
on doubles should work <i>as if</i> the operation were performed with infinite
precision, and then rounded to <tt>p</tt> bits,
where <tt>p</tt> is the precision (typically, <tt>p = 53</tt>).
<p>
Throughout most of NTL, correctness follows from weaker assumptions,
namely
<p>
<ul>
<li>
basic arithmetic operations and conversion from integral types
produce results with a <i>relative error</i> of
<tt>2^{-p + 1}</tt> (assuming no overflow),
<li>
multiplication by powers of 2 produce <i>exact</i> results (assuming no overflow),
<li>
basic arithmetic operations on integers represented as doubles and conversions from integral types
to doubles produce <i>exact</i> results, provided the inputs and outputs
are less than <tt>2^p</tt> in absolute value,
<li>
assuming no overflow, <tt>x - long(x)</tt> produces an exact result for nonnegative <tt>x</tt>.
</ul>
<p>
It is also generally assumed that the compiler does not
do too much "regrouping" of arithmetic expressions involving
floating point.
Most compilers respect the implied grouping of floating point
computations, and NTL goes out of its way to make its
intentions clear: instead of <tt>x = (a + b) + c</tt>,
if the grouping is truly important, this is written
as <tt>t = a + b; x = t + c</tt>.
Current standards do not allow, and most implementations will not
perform, any regrouping of this, e.g., <tt>x = a + (b + c)</tt>,
since in floating point, addition and subtraction are not
associative.
<p>
Unfortunately, some compilers do not do this correctly,
unless you tell them.
With Intel's C compiler <tt>icc</tt>, for example,
you should compile NTL with the flag <tt>-fp-model strict</tt>
to enforce strict adherence to floating point standards.
That said, some effort has been made to ensure that NTL
works correctly even if the compiler does perform such
regrouping, including replacement of <tt>x/y</tt>
by <tt>x*(1/y)</tt>.
<p>
Also, you should be wary of compiling using an optimization
level higher than the default <tt>-O2</tt> --
this may break some floating point assumptions (and maybe
some other assumptions as well).
<p>
In any case, programs that compile against NTL header files
should compile correctly, even under very aggressive optimizations.
<p>
One big problem with the IEEE standard is that it allows intermediate
quantities to be computed in a higher precision than the standard
double precision.
Most platforms today implement the "strict" IEEE standard, with no
excess precision.
Up until recently, the Intel x86 machine with the GCC compiler
was a notable exception to this: on older x86 machines, floating point
was performed using the x87 FPU instructions, which operate on 80-bit,
extended precision numbers; nowadays, most compilers use the SSE instructions,
which operate on the standard, 64-bit numbers.
<p>
Historically,
NTL went out of its way to ensure that its code is correct with
both "strict" and "loose" IEEE floating point.
This is achieved in a portable fashion throughout NTL, except
for the <tt>quad_float</tt> module, where some desperate hacks,
including assembly code, may be used
to try to work around problems created by "loose" IEEE floating point
<a href="quad_float.cpp.html">[more details]</a>.
But note that even if the <tt>quad_float</tt> package does not work correctly
because of these problems, the only other routines that are affected
are the <tt>LLL_QP</tt> routines in the <tt>LLL</tt> module -- the
rest of NTL should work fine.
Hopefully, because of the newer SSE instructions, this whole strict/loose
issue is a thing of the past.
<p>
Another problem is that some hardware (especially newer Intel chips)
support fused multiply-add (FMA) instructions.
Again, this is only a problem for <tt>quad_float</tt>, and some
care is taken to detect the problem and to work around it.
The rest of NTL will work fine regardles.
<p>
Mostly, NTL does not
require that the IEEE floating point
special quantities "infinity"
and "not a number" are implemented correctly.
This is certainly the case for core code where
floating point arithmetic is used for exact (but fast)
computations, as the numbers involved never get too big (or small).
However, the behavior of
certain explicit floating point computations
(e.g., the <tt>xdouble</tt> and <tt>quad_float</tt> classes,
and the floating point versions of LLL) will be
much more predictable and reliable if "infinity"
and "not a number" are implemented correctly.
<p>
<h3>Algorithms</h3>
<p>
NTL makes fairly consistent use of asymptotically fast algorithms.
<p>
Long integer multiplication is implemented using the classical
algorithm, crossing over to Karatsuba for very big numbers.
Long integer division is currently only implemented using
the classical algorithm -- unless you use NTL with GMP (version 3 or later),
which
employs an algorithm that is about twice as slow as multiplication
for very large numbers.
<p>
Polynomial multiplication and division is carried out
using a combination of the classical algorithm, Karatsuba,
the FFT using small primes, and the FFT using the Schoenhagge-Strassen
approach.
The choice of algorithm depends on the coefficient domain.
<p>
Many algorithms employed throughout NTL are inventions
of the author (<a href="http://www.shoup.net">Victor Shoup</a>)
and his colleagues
<a href="http://math-www.uni-paderborn.de/~aggathen/joachim.html">Joachim von zur Gathen</a>
and
<a href="http://www4.ncsu.edu/~kaltofen">Erich Kaltofen</a>,
as well as <a href="mailto:abbott@dima.unige.it">John Abbott</a>
and
<a href="http://www.loria.fr/~zimmerma">Paul Zimmermann</a>.
<p>
<h3>
Thread safety
</h3>
<p>
As of v7.0, NTL is thread safe.
That said, there are several things to be aware of:
<ul>
<li>
To use this feature, you have to enable <tt>NTL_THREADS</tt>
in the configuration script.
Also, you will need a compiler and runtime library that
implements several key <tt>C++11</tt> features,
including <tt>thread_local</tt> storage.
<ul>
<li>
NOTE: as of v9.8, the requirements have been relaxed, so that
for gcc and gcc-compatible compilers
(such as clang and icc) only support of the gcc <tt>__thread</tt>
storage specifier is required.
<li>
With these relaxed requirements, it is possible to build
a thread safe version of NTL on Linux using gcc 4.8 and above,
or on Mac OSX 10.10 and above.
</ul>
<p> <li>
You must build NTL using GMP (i.e., configure with <tt>NTL_GMP_LIP=on</tt>).
The classic LIP integer arithmetic is not thread safe: it could
be made so, but it is not a priority at this time.
<p>
<li>
The current version (v1.1) of the external gf2x
library is not thread safe.
Therefore, <b>you should NOT build NTL using gf2x if you need a thread-safe
build</b>.
</ul>
To obtain thread safety, I used the following strategies:
<ul>
<p>
<li>
In places where NTL's interface demands global variables,
such as the "current modulus" for the <tt>ZZ_p</tt>
class, these global variables have been made thread local.
So, you can pass around various <tt>ZZ_pContext</tt> objects
among threads, and individual threads can install these locally.
Thus, different threads can concurrently use the same or different
moduli, and it all just works, with no changes to NTL's interface.
<p>
<li>
In places where NTL used static variables to hold on to space
for scratch variables, I make these variables <i>thread local</i>,
and I also make sure the storage used by these variables
get released when the thread terminates.
In all NTL builds (thread safe or not),
I try to make sure that fairly large chunks of memory get released immediately.
<p>
<li>
In places where NTL uses a lazy strategy to build various tables
(such as FFT primes), I uses a "double checked locking" strategy
to grow these tables in a way that (a) the tables can be
shared among different threads, and (b) taking a lock
on a <tt>mutex</tt> is very rare.
The new <tt>C++11</tt> concurrent memory model is essential here.
<p>
<li>
Smart pointers (for things like <tt>ZZ_pContext</tt>'s) are
designed to do the necesary reference counting in a thread-safe
manner.
<p>
<li>
For psuedo-random number generation,
the internal state of the PRG
is thread local,
and the default initial seed is guaranteed to be unique
among all threads in a given process (and an attempt is made to
make the seed globally unique among all processes and threads,
but this is hard to do in a completely portable way).
</ul>
The overall structure of the code has been modified so that
the code base is nearly identical for regular and thread-safe builds:
there are just a few <tt>ifdef</tt>'s on the <tt>NTL_THREADS</tt>
flag.
<p>
<h3>
Thread Boosting
</h3>
<p>
As of v9.5.0, NTL provides a <i>thread boosting</i> feature.
With this feature, certain code within NTL will use available
threads to speed up computations on a multicore
machine.
This feature is enabled by setting <tt>NTL_THREAD_BOOST=on</tt>
during configuration.
See <a href="BasicThreadPool.cpp.html">BasicThreadPool.txt</a>
for more information.
<p>
This feature is a work in progress.
Currently, basic <tt>ZZ_pX</tt> arithmetic has been thread boosted.
More code will be boosted later.
<p>
<h3>
Error Handling and Exceptions
</h3>
<p>
As of v8.0, NTL provides error handling through exceptions.
To enable exptions, you have to configure
NTL with <tt>NTL_EXCEPTIONS</tt> flag turned on.
By default, exceptions are not enabled, and NTL
reverts to its old error handling method:
abort with an error message.
<p>
If exceptions are enabled, then instead of aborting your
program, and appropriate exception is thrown.
More details ion the programming interface
of this feature are available <a href="tour-struct.html#except">here</a>.
<p>
If you enable exceptions, you must use a <tt>C++11</tt> compiler.
Specifically, your compiler will need support for lambdas
(which are used to conveniently implement the "scope guard" idiom),
and your compiler should implement the new default exception
specification semantics (namely, that destructors are "noexcept" by
default).
<p>
Implementation of this required a top-to-bottom scrub of NTL's code,
replacing a lot of old-fashioned code with more modern, RAII-oriented
code (RAII = "resource acquisition is initialization").
<p>
<center>
<a href="tour-tips.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a>
<a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a>
<a href="tour-gmp.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a>
</center>
</body>
</html>
|