==================================================================
=== ===
=== GENESIS / PARKBENCH Parallel Benchmarks ===
=== ===
=== RINF1 ===
=== ===
=== R-infinity and N-half ===
=== ===
=== Versions: Std F77 ===
=== ===
=== Author : Roger Hockney ===
=== Department of Electronics and Computer Science ===
=== University of Southampton ===
=== Southampton SO9 5NH, U.K. ===
=== fax.:+44-703-593045 e-mail:rwh@uk.ac.soton.pac ===
=== vsg@uk.ac.soton.ecs ===
=== ===
=== Last update: November 1993 ===
=== ===
==================================================================
1. Description
--------------
The performance of vector operations on a processor can be characterised
by two parameters: the asymptotic performance, R-infinity (RINF), and
the half-performance length, N-half (N1/2). R-infinity is the performance
approached as the vector length tends to infinity. For finite vector
lengths this maximum is not reached, because of the start-up time
associated with each vector operation. One useful way of parameterizing
this start-up time is N-half, the vector length at which exactly half of
the asymptotic performance is obtained. Vectors shorter than N-half
therefore run at less than half of R-infinity, a significant loss in
performance.
The performance, R, for a vector of length N is given by:
      R = R-infinity / [1 + (N-half/N)]                    (1)
The execution time, T, for a vector of length N is:
      T = (N + N-half) / R-infinity                        (2)
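As a quick check of equations (1) and (2), the Python sketch below evaluates both for a hypothetical processor. The values RINF = 100 Mflop/s and N1/2 = 50 are made up for illustration, not measurements. The sketch assumes R counts one operation per vector element, so that R = N/T; at N = N1/2 it returns exactly half of R-infinity.

```python
def performance(n, rinf, nhalf):
    """Equation (1): performance R for a vector of length n."""
    return rinf / (1.0 + nhalf / n)


def exec_time(n, rinf, nhalf):
    """Equation (2): time T for one vector operation of length n."""
    return (n + nhalf) / rinf


# Hypothetical parameters, chosen only for illustration.
RINF = 100.0e6   # operations per second
NHALF = 50.0     # half-performance length

for n in (10, 50, 500, 5000):
    r = performance(n, RINF, NHALF)
    t = exec_time(n, RINF, NHALF)
    # With one operation per element, R * T recovers n.
    print(f"N={n:5d}  R={r / 1e6:8.3f} Mflop/s  T={t:.3e} s  R*T={r * t:.1f}")
```

Vectors shorter than N1/2 (here N = 10) come out below RINF/2, and long vectors approach RINF.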
In this benchmark N-half and R-infinity are derived from a least-squares
fit of time against vector length. The value of N-half varies from one
vector operation to another. Seventeen tests are included, each for a
different expression that a compiler could potentially vectorize. The
examples are selected to be useful in assessing both architectures and
compilers.
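Because equation (2) is linear in N, T = N/R-infinity + N-half/R-infinity, the fit reduces to an ordinary straight line T = a + b*N with R-infinity = 1/b and N-half = a/b. The Python sketch below shows that arithmetic on synthetic, noise-free times. The function name and the data are illustrative assumptions, and the benchmark's own Fortran fitting routine may differ in detail.

```python
def fit_rinf_nhalf(ns, ts):
    """Least-squares fit of T = a + b*N, returned as (RINF, N1/2).

    From equation (2), the slope b = 1/RINF and the intercept
    a = N1/2/RINF, so RINF = 1/b and N1/2 = a/b.
    """
    m = len(ns)
    if m < 2:
        raise ValueError("need at least two (N, T) points")
    mean_n = sum(ns) / m
    mean_t = sum(ts) / m
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(ns, ts))
    b = sxy / sxx              # slope: seconds per element
    a = mean_t - b * mean_n    # intercept: start-up time in seconds
    return 1.0 / b, a / b


# Synthetic, noise-free times generated from RINF = 2.0E7, N1/2 = 40.
ns = [10, 20, 50, 100, 200, 500, 1000]
ts = [(n + 40.0) / 2.0e7 for n in ns]
rinf, nhalf = fit_rinf_nhalf(ns, ts)
print(f"RINF = {rinf:.4g}, N1/2 = {nhalf:.4g}")
```

With real timings the points scatter about the line, which is why NITER must be large enough for the timer.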
The values of R-infinity and N-half depend both on the operations being
performed and on the size of the cache memory. The summary of best values
at the end of the benchmark output gives the parameter pair (RINF,N1/2)
both for vector lengths that fit into the cache memory and for those that
exceed it.
2. Operating Instructions
-------------------------
This benchmark assumes by default that the maximum vector length is
100,000. Change the parameter NNMAX if this is not suitable. It is also
advisable to check the number of iterations, NITER, and to adjust it if
necessary to suit the clock tick:
   NITER = 1000     if tick is 1.0E-5 sec
   NITER = 100000   if tick is 1.0E-3 sec
All parameters are to be found in the include file `rinf1.inc'.
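The two NITER settings above both work out to NITER = 1.0E8 * tick, so for other timers NITER can plausibly be scaled in proportion to the tick. The Python sketch below assumes that linear rule (inferred from the two quoted pairs, not taken from rinf1.inc). It also encodes the check given in recommendation (1) of Section 3: TOTAL TIME should be at least 100 ticks, otherwise NITER is raised tenfold. The function names are hypothetical.

```python
def suggested_niter(tick_seconds):
    """Scale NITER in proportion to the clock tick.

    Assumption: NITER = 1.0E8 * tick, which reproduces both quoted
    settings (1000 for a 1.0E-5 s tick, 100000 for a 1.0E-3 s tick).
    """
    return max(1, round(1.0e8 * tick_seconds))


def adjust_niter(niter, total_time, tick_seconds):
    """Recommendation (1): if TOTAL TIME is under 100 ticks, use 10 * NITER."""
    if total_time < 100.0 * tick_seconds:
        return 10 * niter
    return niter


print(suggested_niter(1.0e-5))              # 1000
print(suggested_niter(1.0e-3))              # 100000
print(adjust_niter(1000, 5.0e-4, 1.0e-5))   # 10000 (only 50 ticks measured)
```

Scaling linearly keeps the measured interval at roughly the same number of ticks whatever the timer resolution.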
To compile and link the benchmark type: `make'. If you set `XDIR=.' in
the Makefile to put the executable in the current directory, you will get
the message "Fatal error: failed to target 'rinf1'". Ignore this; the
executable rinf1 is still created and can be used.
On some systems it may be necessary to allocate the appropriate resources
before running the benchmark; e.g. on the iPSC/860, to reserve a single
processor, type: getcube -t1.
To run the benchmark type: rinf1
Output from the benchmark is written to the file "rinf1.res". Copy this
to another file to save it.
With NITER=10,000, RINF1 takes about 2 minutes to run on a typical
workstation. For accurate results with NITER=100,000, allow 15 to 20
minutes.
3. Interpretation of Results
----------------------------
Low-level benchmarks like RINF1 try to represent, for each kernel, some
50 data sets (the vector lengths) by two performance parameters
(R-infinity and N-half). The times to be measured are also very short,
and if the repeat number NITER is not large enough for the timer being
used, nonsense values for the execution time will give nonsense values
for the parameters. It really is a case of garbage in, garbage out.
Interference from other users can also introduce a large scatter into the
input times and give unsatisfactory results. An application benchmark
only requires the measurement of a long time interval (of the order of a
second or a minute), perhaps for only three input data sets, and makes no
attempt to fit the times to a model. Interpreting the data from low-level
benchmarks is incomparably more difficult. Good results are not to be
expected from such benchmarks unless they are carried out with care and
interpreted with good sense.
The summary table of results at the end of the output is an attempt to
pick automatically, from the mass of measurements, the best value of the
parameter pair (RINF,N1/2) for in-cache values (reported first) and
out-of-cache values (reported second). The summary line states the vector
lengths that have been used to obtain these values. If the summary looks
silly, and perhaps in any case, one should also examine the detailed
output, because the automatic selection cannot always be expected to work
satisfactorily.
These are our recommendations for interpreting the detailed results. For
each of the 17 kernels (DO loops):
(1) Examine the TOTAL TIME column of the output and ensure that this is at
least 100 times the measured tick of your timer. If not, increase
NITER by a factor of 10. The run will then take longer, but the timing
results should show much less scatter.
(2) Examine the values in the time column TI; this is the time per vector
operation as a function of the vector length in the column headed NI.
If TI is not a roughly linear, monotonically increasing function of NI,
then the (RINF,N1/2) parameters are not appropriate and this benchmark
will not make sense. Therefore plot TI against NI and see what it looks
like. If there is a lot of scatter, increase NITER and rerun. If the
curve is reasonably smooth but not at all linear, the straight-line model
does not apply and the data must be analysed in some other way. If it is
approximately linear, the columns headed RINF and N1/2 should have
stable values that do not change much as the vector length increases.
This is what one is looking for, and such stable values are the ones to
be reported.
The column headed PCT ERROR gives the root-mean-square deviation of the
measured points from the straight line fitted to them, expressed as a
percentage of the last value of TI. Values up to a few percent indicate
that the straight-line fit is good and that the (RINF,N1/2) values are
reliable. Values greater than, say, 20 percent indicate that the
approximation is poor and that the parameters should be used with
caution, if at all. Bear in mind also that N1/2 is added to N in
equation (2) and divided by N in equation (1); thus large values of, and
variations in, N1/2 may in fact be insignificant and unimportant when N
itself is large. They do not necessarily indicate an unsatisfactory
result.
(3) It is important to understand the meaning of the values in the columns
RINF and N1/2. The vector lengths are run through in the order printed,
and as the execution time for each new vector length is obtained,
updated values of RINF and N1/2 are computed. The values printed on one
line are therefore the best least-squares straight-line fit to all data
measured up to that point (i.e. all NI, TI pairs appearing on this and
all previous lines, but not, of course, on any later lines).
The first line (SI=1) provides only one point and does not define a
straight line, so RINF=N1/2=0 is printed, meaning there is not enough
information to compute values. By SI=2 there are two points, so a
straight line, and with it the values of RINF and N1/2, is defined. The
fit is exact, and the PCT ERROR column correctly records zero. As each
new point is measured, RINF and N1/2 are updated with the best
least-squares straight line.
For small vector lengths, and perhaps an inaccurate timer, the values of
RINF and N1/2 may fluctuate and even become negative. This does not
matter provided the values stabilise for longer vector lengths; it
probably means that NITER was too small. Apart from the effects of
cache, discussed next, the best values of RINF and N1/2 should be the
last ones recorded, for the longest vector, because this straight line
uses all the previous data values.
(4) The presence of a data cache complicates the picture considerably,
because execution times increase significantly once the vector length
exceeds the cache or paging size and references to off-chip memory are
required. This shows up by driving the value of N1/2 negative, which is
correct and only means that the best straight line intercepts the
positive x-axis (the NI axis). In the sample results in the StdRes
directory, RINF and N1/2 have stabilised before this point, and these
values are the in-cache measurement.
The automatic selection procedure tries to pick these values and prints
them in the summary table. This trip point, where N1/2 goes negative, is
marked in the detailed output by PCT ERROR being set to 222.2. The
selected value is taken three measurements before this point.
The least-squares fit is then reset, and a separate best straight line
is obtained for longer vectors exceeding the cache size. This provides
a second pair of (RINF,N1/2) values for vectors longer than a value
stated in the summary table. This value is four measurement points past
the trip point, in order to avoid using points in the transition
region.
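The incremental fit described in (3) and the cache handling described in (4) can be sketched in Python as follows. This illustrates the stated rules under assumptions and is not the benchmark's Fortran code: the RMS error divides by the number of points, the in-cache pair is read from the row three measurements before the first negative N1/2 (the trip point, which the benchmark marks with PCT ERROR 222.2), and the out-of-cache line is refitted from the fourth point after the trip. The function names and the synthetic data are hypothetical.

```python
def ls_fit(ns, ts):
    """Ordinary least-squares line T = a + b*N; returns (a, b)."""
    m = len(ns)
    mean_n, mean_t = sum(ns) / m, sum(ts) / m
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(ns, ts))
    b = sxy / sxx
    return mean_t - b * mean_n, b


def running_fit(ns, ts):
    """Return one row (N, T, RINF, N1/2, PCT ERROR) per measured point.

    Each row fits all points measured so far, as in (3). With a single
    point there is no line, so RINF = N1/2 = 0 is reported.
    """
    rows = [(ns[0], ts[0], 0.0, 0.0, 0.0)]
    for k in range(2, len(ns) + 1):
        a, b = ls_fit(ns[:k], ts[:k])
        resid = [t - (a + b * n) for n, t in zip(ns[:k], ts[:k])]
        rms = (sum(r * r for r in resid) / k) ** 0.5
        # RINF = 1/slope, N1/2 = intercept/slope; error as % of last T.
        rows.append((ns[k - 1], ts[k - 1], 1.0 / b, a / b,
                     100.0 * rms / ts[k - 1]))
    return rows


def summarize(ns, ts):
    """Pick in-cache and out-of-cache (RINF, N1/2) pairs as in (4)."""
    rows = running_fit(ns, ts)
    trip = next((i for i in range(1, len(rows)) if rows[i][3] < 0), None)
    if trip is None:                     # no cache effect detected
        return rows[-1][2:4], None
    in_cache = rows[max(trip - 3, 1)][2:4]
    start = trip + 4                     # skip the transition region
    if len(ns) - start < 2:
        return in_cache, None
    a, b = ls_fit(ns[start:], ts[start:])
    return in_cache, ((1.0 / b, a / b), ns[start])


# Synthetic data: in cache up to N = 1000, a slower line beyond it.
ns = list(range(100, 3100, 100))
ts = [(n + 50) / 1.0e8 if n <= 1000 else (n - 500) / 2.0e7 for n in ns]
in_cache, out_of_cache = summarize(ns, ts)
print("in-cache (RINF, N1/2):", in_cache)
print("out-of-cache (RINF, N1/2), from N:", out_of_cache)
```

On this data the first out-of-cache point (N = 1100) drives N1/2 negative, the in-cache pair comes out near (1.0E8, 50) and the refitted out-of-cache pair near (2.0E7, -500) from N = 1500 onwards.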
It should be obvious from the above that sensible results will only be
obtained from RINF1 if the benchmark is run sensibly (e.g. with a correct
value for NITER) and the results are interpreted with care and
understanding. It is easy to misuse this benchmark and produce rubbish
results, and therefore easy to "rubbish" the benchmark if one wishes to
do so. When it is used properly, however, it gives a good understanding
of the behaviour of the basic hardware (and of the software through
which it is used).
4. Negative values of RINF and N1/2
-----------------------------------
It is often supposed that negative values for RINF and N1/2 are meaningless
and therefore bring the benchmark into disrepute. This shows a
misunderstanding of the parameters: RINF and N1/2 should be thought of as
two parameters that determine, respectively, the inverse slope of a
straight line and the negative of its intercept on the x-axis. They are
used in equations (1) and (2) to determine the performance, R, or time,
T, as a function of vector length, N. Whereas none of R, T or N can by
its very nature be negative, there is no reason why, in certain
circumstances, RINF and N1/2 cannot be negative. Such negative values can
appear for small values of N with an inaccurate timer, and should
generally be ignored provided later values stabilise. Negative values of
N1/2 are quite usual and correct for out-of-cache measurements. Negative
RINF would imply that larger problems execute in less time; this would
not be expected, but there may be such cases. In fact the benchmark traps
negative RINF combined with negative N1/2 as indicating poor input data,
rejects such data and restarts the least-squares fit. This action is
signalled in the output by a PCT ERROR value of 111.1. The only statement
that can be made with certainty is that R, T and N computed from
equations (1) and (2) cannot be negative.
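To illustrate that a negative N1/2 is not in itself an error, the Python sketch below fits synthetic out-of-cache times whose straight line meets the N axis at N = 500. The fit returns RINF close to 2.0E7 and N1/2 close to -500, yet equations (1) and (2) still give positive R and T over the measured range. The accept_fit helper encodes only the trap described above (negative RINF together with negative N1/2); the names and the data are hypothetical.

```python
def fit_rinf_nhalf(ns, ts):
    """Least-squares line T = a + b*N, returned as (RINF, N1/2) = (1/b, a/b)."""
    m = len(ns)
    mean_n, mean_t = sum(ns) / m, sum(ts) / m
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(ns, ts))
    b = sxy / sxx
    a = mean_t - b * mean_n
    return 1.0 / b, a / b


def accept_fit(rinf, nhalf):
    """Reject only negative RINF together with negative N1/2.

    The benchmark reports such a rejection as PCT ERROR 111.1.
    """
    return not (rinf < 0 and nhalf < 0)


# Synthetic out-of-cache times: the line meets the N axis at N = 500.
ns = list(range(1500, 3100, 100))
ts = [(n - 500) / 2.0e7 for n in ns]
rinf, nhalf = fit_rinf_nhalf(ns, ts)
print(rinf, nhalf, accept_fit(rinf, nhalf))

# Equations (1) and (2) still give positive R and T over the measured range.
for n in ns:
    assert (n + nhalf) / rinf > 0           # equation (2)
    assert rinf / (1.0 + nhalf / n) > 0     # equation (1)
```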
$Id: ReadMe,v 1.4 1994/05/27 15:32:57 igl Exp igl $
Submitted by Mark Papiani,
last updated on 10 Jan 1995.