GBIS Benchmark Header File: rinf1


   ==================================================================
   ===                                                            ===
   ===          GENESIS / PARKBENCH Parallel Benchmarks           ===
   ===                                                            ===
   ===                          RINF1                             ===
   ===                                                            ===
   ===                   R-infinity and N-half                    ===
   ===                                                            ===
   ===               Versions:  Std F77                           ===
   ===                                                            ===
   ===               Author     : Roger Hockney                   ===
   ===     Department of Electronics and Computer Science         ===
   ===               University of Southampton                    ===
   ===               Southampton SO9 5NH, U.K.                    ===
   ===     fax.:+44-703-593045   e-mail:rwh@uk.ac.soton.pac       ===
   ===                                  vsg@uk.ac.soton.ecs       ===
   ===                                                            ===
   ===                Last update: November 1993                  ===
   ===                                                            ===
   ==================================================================


1. Description
--------------

The performance of vector operations on a processor can be characterised  
by two parameters: the asymptotic performance, R-infinity (RINF), and 
the half-performance length, N-half (N1/2). R-infinity is the asymptotic 
performance obtained as the vector length tends to infinity. For finite 
vector lengths this maximum performance will not be realised due to the 
start-up time associated with vector operations. One useful method of 
parameterising this start-up time is by the use of N-half, which corresponds 
to the vector length that gives exactly half of the asymptotic performance. 
The use of vectors whose length is less than N-half will result in a
significant loss of performance.

The performance, R, for a vector of length N is given by:

       R  =  R-infinity / [ 1 + (N-half/N) ]                           (1)

The execution time, T, for a vector of length N is:

       T  =  (N + N-half) / R-infinity                                 (2)
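
As a quick check of equations (1) and (2), the following Python sketch
evaluates both for a few vector lengths. The function names and the
numerical values of RINF and NHALF are hypothetical, chosen only for
illustration; they are not taken from the benchmark or from any measured
machine.

```python
def performance(n, rinf, nhalf):
    """Equation (1): performance R for a vector of length n."""
    return rinf / (1.0 + nhalf / n)


def exec_time(n, rinf, nhalf):
    """Equation (2): execution time T for a vector of length n."""
    return (n + nhalf) / rinf


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    RINF = 100.0e6   # asymptotic rate, operations per second
    NHALF = 50.0     # half-performance vector length

    for n in (10, 50, 500, 5000):
        frac = performance(n, RINF, NHALF) / RINF
        t = exec_time(n, RINF, NHALF)
        print(f"N={n:5d}  R/RINF={frac:.3f}  T={t:.3e} s")
```

At N = NHALF the ratio R/RINF is one half, and R*T = N for every N, which
is how equations (1) and (2) are related.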

In this benchmark N-half and R-infinity are derived from a least-squares
fit of time against vector length. The value of N-half will vary with
different vector operations. Seventeen different tests are incorporated
for different expressions which could potentially be vectorized by a
compiler.  The examples are selected to be useful in the assessment of 
both architectures and compilers.
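
The least-squares fit mentioned above can be sketched as follows. This is
a minimal Python illustration, not the benchmark's Fortran code: the
function name fit_rinf_nhalf and the timing data are hypothetical. It fits
the straight line T = a + b*N and then uses equation (2) to read off
R-infinity = 1/b and N-half = a/b.

```python
def fit_rinf_nhalf(lengths, times):
    """Fit T = a + b*N by least squares and return (rinf, nhalf).

    From equation (2), T = N/rinf + nhalf/rinf, so the slope b is
    1/rinf and the intercept a is nhalf/rinf.
    """
    m = len(lengths)
    mean_n = sum(lengths) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx                    # b = 1 / rinf
    intercept = mean_t - slope * mean_n  # a = nhalf / rinf
    return 1.0 / slope, intercept / slope


if __name__ == "__main__":
    # Hypothetical timings generated from equation (2) itself, so the
    # fit should recover the parameters used to create them.
    true_rinf, true_nhalf = 2.0e8, 30.0
    lengths = [10, 20, 50, 100, 200]
    times = [(n + true_nhalf) / true_rinf for n in lengths]

    rinf, nhalf = fit_rinf_nhalf(lengths, times)
    print(f"RINF = {rinf:.4e}   N1/2 = {nhalf:.2f}")
```

At least two distinct vector lengths are needed; with a single point the
slope is undefined and the sketch would divide by zero.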

The values of R-infinity and N-half will depend on the operations being 
performed and also on the size of the cache memory. The summary of best 
values, which appears in the benchmark output, gives values for the
parameter pair (RINF,N1/2) for vector lengths that fit into the cache
memory and for those that exceed the cache memory.

2. Operating Instructions
-------------------------

This benchmark assumes by default that the maximum vector length is
100,000. Change the parameter NNMAX if this is not suitable. It is also 
advisable to check the number of iterations, NITER, and to adjust it if
necessary in accordance with the clock tick:

	NITER = 1000   if tick is 1.0E-5 sec
	NITER = 100000 if tick is 1.0E-3 sec

All parameters are to be found in the include file `rinf1.inc'.

To compile and link the benchmark type: `make'. If you set 'XDIR=.' in
the Makefile to put the executable in the current directory, you will get
a Fatal error: failed to target 'rinf1'. Ignore this; the executable
rinf1 is created and can be used. 

On some systems it may be necessary to allocate the appropriate resources 
before running the benchmark, e.g. on the iPSC/860, to reserve a single 
processor, type:    getcube -t1

To run the benchmark type:     rinf1
Output from the benchmark is written to the file "rinf1.res". Copy this
to another file to save it.

If NITER=10000, RINF1 will take about 2 minutes to run on a typical
workstation. For accurate results with NITER=100000, allow 15 to 20
minutes.

3. Interpretation of Results
----------------------------

Low-level benchmarks like RINF1 are trying to represent, for each kernel, 
some 50 data sets (the vector lengths) by two performance parameters
(R-infinity and N-half). The times to be measured are also very short, and 
if the repeat number NITER is not large enough for the timer being used, 
nonsense values for the time of execution will give nonsense values for 
the parameters. It really is a case of garbage in, garbage out. 
Interference from other users can also give a large scatter to the input 
times and give unsatisfactory results. Compared with an application 
benchmark that only requires the measurement of a long time interval 
comparable to a second or minute, for perhaps only three input data sets, 
without any effort to fit the time to a model, the interpretation of data 
from low-level benchmarks is incomparably more difficult. Good results are 
not to be expected from such benchmarks unless they are carried out with
care and interpreted with good sense.

The summary table of results at the end of the output is an attempt
to pick automatically from the mass of measurements the best value of the
parameter pair (RINF,N1/2) for in-cache values (reported first) and
out-of-cache values (reported second). The summary line states the vector
lengths that have been used to obtain these values. If the summary looks
silly, and perhaps in any case, one should also examine the detailed
output, because the automatic selection cannot be expected always to work
satisfactorily. 
These are our recommendations for interpreting the detailed results. For
each of the 17 kernels (DO loops):

(1) Examine the TOTAL TIME column of the output and ensure that this is at
    least 100 times the measured tick of your timer. If not, increase
    NITER by a factor of 10. The run will now take longer, but the timing 
    results should show much less scatter.

(2) Examine the values in the time column TI. This is the time per vector 
    operation as a function of the vector length in the column headed NI.
    If TI is not a monotonically increasing function of NI that is roughly
    linear, then the (RINF,N1/2) parameters are not appropriate, and
    this benchmark will not make sense. Therefore plot TI against NI and
    see what it looks like. If there is a lot of scatter, then increase
    NITER and rerun. If it is reasonably smooth but not at all linear, the
    two-parameter model does not describe that kernel, and (RINF,N1/2)
    values should not be reported for it. If it is approximately linear,
    then the columns headed RINF and N1/2 should have stable values that
    do not change much as the vector length increases. This is what one is
    looking for, and such stable values are the ones to be reported. 

    The column headed PCT ERROR gives the root mean square deviation of the 
    line from the measured points with the parameters derived from the data 
    points, expressed as a percentage of the last value of TI. Values up to 
    a few percent indicate that the straight-line fit is good and that the 
    (RINF,N1/2) values are reliable. Values greater than, say, 20 percent 
    indicate that the approximation is poor and the parameters should be 
    used with caution, if at all. Bear in mind also that values of N1/2 are 
    added to N in equation (2), and divided by N in equation (1), thus 
    large values and variations in N1/2 may in fact be insignificant and 
    unimportant when the value of N itself is large. They do not 
    necessarily indicate an unsatisfactory result.

(3) It is important to understand the meaning of the values in the columns
    RINF and N1/2. The vector lengths are run through in the order printed
    and as a new time of execution is obtained for the next vector length,
    updated values of RINF and N1/2 are computed. That means that the values
    printed on one line are the best least squares fit of a straight line 
    to all data computed up to this time (i.e. all NI, TI pairs appearing
    on this and all previous lines, but not of course from any later lines). 
    
    The first line (SI=1) provides only one point and does not define a
    straight line, so RINF=N1/2=0 is printed, meaning that there is not
    enough information to compute values. By SI=2 there are two points and
    a straight line is defined, together with the values of RINF and N1/2.
    The fit is exact and the PCT ERROR column correctly records zero. As
    each new point is computed, RINF and N1/2 are updated with the best
    least-squares straight line. For small vector lengths, and perhaps an
    inaccurate timer, values of RINF and N1/2 may fluctuate and even become
    negative. This does not matter provided the values stabilise for longer
    vector lengths. It probably means that NITER was taken too small. Apart
    from the effects of cache, discussed next, the best values of RINF and
    N1/2 should be the last ones recorded for the longest vector, because
    this straight line uses all the previous data values.

(4) The presence of a data cache complicates the picture considerably by
    increasing the execution times significantly once the vector length
    exceeds the cache or paging size, when references to off-chip memory
    are required. This shows up by driving the value of N1/2 negative,
    which is correct and only means that the best straight line intercepts
    the positive x-axis. In the sample results shown in the StdRes directory,
    RINF and N1/2 have stabilised before this point, and these values
    are the in-cache measurement. 

    The automatic selection procedure tries to pick these values and prints 
    them in the summary table. The trip point where N1/2 goes negative
    is marked in the detailed output by 'PCT ERROR' being set to 222.2.
    The selected value is taken three measurements before this point.
    The least-squares fit is then reset, and a separate best straight line
    is obtained for longer vectors exceeding the cache size. This provides 
    a second pair of (RINF,N1/2) values for vectors longer than some 
    stated value in the summary table.  This value is 4 measurement points
    past the trip point, in order to avoid using points in the transition
    region. 
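
The way the RINF, N1/2 and PCT ERROR columns evolve from line to line, as
described in (2) and (3) above, can be sketched in Python. This is an
illustration under assumptions, not the benchmark's own code: the function
names and the timings are hypothetical, the RMS deviation is taken over
the points used in each fit, and the cache handling of (4) (the 222.2
marker and the restart of the fit) is deliberately left out.

```python
import math


def fit_line(lengths, times):
    """Return (slope, intercept) of the least-squares line T = a + b*N."""
    m = len(lengths)
    mean_n = sum(lengths) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx
    return slope, mean_t - slope * mean_n


def running_fit(lengths, times):
    """Return one row (ni, ti, rinf, nhalf, pct_error) per data point.

    Each row uses only the points on that line and all previous lines.
    """
    rows = []
    for k in range(1, len(lengths) + 1):
        ns, ts = lengths[:k], times[:k]
        if k == 1:
            # One point does not define a line: report zeros.
            rows.append((ns[-1], ts[-1], 0.0, 0.0, 0.0))
            continue
        slope, intercept = fit_line(ns, ts)
        # RMS deviation of the fitted line from the points used so far.
        rms = math.sqrt(
            sum((t - (intercept + slope * n)) ** 2 for n, t in zip(ns, ts)) / k
        )
        # Expressed as a percentage of the last value of TI.
        pct = 100.0 * rms / ts[-1]
        rows.append((ns[-1], ts[-1], 1.0 / slope, intercept / slope, pct))
    return rows


if __name__ == "__main__":
    # Hypothetical in-cache timings: equation (2) with RINF = 1.0E8 and
    # N1/2 = 20, perturbed by a few percent to imitate timer scatter.
    lengths = [1, 2, 5, 10, 20, 50, 100]
    scatter = [0.03, -0.02, 0.01, -0.01, 0.02, -0.01, 0.0]
    times = [(n + 20.0) / 1.0e8 * (1.0 + s) for n, s in zip(lengths, scatter)]

    print("   NI         TI        RINF      N1/2  PCT ERROR")
    for ni, ti, rinf, nhalf, pct in running_fit(lengths, times):
        print(f"{ni:5d} {ti:10.3e} {rinf:11.3e} {nhalf:9.2f} {pct:10.2f}")
```

The sketch refits from scratch on every line for clarity; a running fit
that keeps the sums from the previous line would give the same values.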

It must be obvious from the above that sensible results will only be 
obtained from RINF1 if the benchmark is run sensibly (using e.g. a correct 
value for NITER), and the results are interpreted with care and 
understanding. It is easy to misuse this benchmark and produce rubbish 
results. It is therefore easy to "rubbish" the benchmark if one wishes to
do so. However, it delivers good understanding of the behaviour of the
basic hardware (and the software through which it is used) when it is used 
properly.

4. Negative values of RINF and N1/2
-----------------------------------

It is often supposed that negative values for RINF and N1/2 are meaningless
and therefore bring the benchmark into disrepute. This shows a 
misunderstanding of the parameters: RINF and N1/2 should be thought of as 
two parameters that determine, respectively, the inverse slope and the 
negative intercept on the x-axis, of a straight line. They are used in 
equations (1) and (2) to determine the performance, R, or time, T, as a 
function of vector length, N. Whereas neither R, T nor N can by their 
very nature be negative, there is no reason why in certain circumstances 
RINF and N1/2 cannot be negative. Such negative values can appear for small 
values of N with an inaccurate timer, and should generally be ignored, 
provided later values stabilise. Negative values of N1/2 are quite usual 
and correct for out-of-cache measurements. Negative RINF would imply that 
larger problems execute in less time, and this would not be expected, but 
there may be such cases. In fact the benchmark traps negative RINF with
negative N1/2 as indicating poor input data, rejects such data and restarts 
the least squares fit. This action is signalled in the output by a value 
of PCT ERROR being 111.1. The only statement that we can make with
certainty is that R, T and N computed from equations (1) and (2) cannot be
negative.
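
To see how a negative N1/2 can arise for out-of-cache vectors while every
time remains positive, consider the following Python sketch. The timing
model is purely hypothetical (a start-up time, a cost of 10 ns per element
up to 1000 elements, and 40 ns for each element beyond that); the names
and numbers are assumptions for illustration and are neither measurements
nor the benchmark's code.

```python
def fit_rinf_nhalf(lengths, times):
    """Fit T = a + b*N by least squares; return (rinf, nhalf) = (1/b, a/b)."""
    m = len(lengths)
    mean_n = sum(lengths) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_n
    return 1.0 / slope, intercept / slope


def model_time(n, startup=2.0e-7, t_in=1.0e-8, t_out=4.0e-8, n_cache=1000):
    """Hypothetical time in seconds for one vector operation of length n."""
    if n <= n_cache:
        return startup + t_in * n
    # Elements beyond the assumed cache capacity cost more each.
    return startup + t_in * n_cache + t_out * (n - n_cache)


if __name__ == "__main__":
    in_cache = [10, 20, 50, 100, 200, 500]
    out_cache = [2000, 5000, 10000, 20000, 50000]
    for label, ns in (("in-cache", in_cache), ("out-of-cache", out_cache)):
        rinf, nhalf = fit_rinf_nhalf(ns, [model_time(n) for n in ns])
        print(f"{label:13s} RINF = {rinf:.3e}   N1/2 = {nhalf:.1f}")
```

With these assumed numbers the in-cache fit gives N1/2 of about 20 and the
out-of-cache fit gives N1/2 of about -745. The out-of-cache line crosses
the x-axis at about N = +745, yet T from equation (2) is positive for
every vector length actually used in that fit.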

$Id: ReadMe,v 1.4 1994/05/27 15:32:57 igl Exp igl $
High Performance Computing Centre
Submitted by Mark Papiani,
last updated on 10 Jan 1995.