==================================================================
=== ===
=== GENESIS / PARKBENCH Parallel Benchmarks ===
=== ===
=== POLY1 ===
=== ===
=== R-infhat and F-half ===
=== In-Cache Memory Bottleneck ===
=== ===
=== Versions: Std F77 ===
=== ===
=== Author : Roger Hockney ===
=== Department of Electronics and Computer Science ===
=== University of Southampton ===
=== Southampton SO9 5NH, U.K. ===
=== fax.:+44-703-593045 e-mail:rwh@uk.ac.soton.pac ===
=== vsg@uk.ac.soton.ecs ===
=== ===
=== Last update: November 1993; Release: 1.0 ===
=== ===
==================================================================
1. Description
--------------
This benchmark tests the severity of memory bottlenecks by varying the
amount of arithmetic per memory reference, which is called the
computational intensity of the loop. The performance for long loop
(vector) lengths, RINF, is represented as:
      RINF = RHAT/(1 + FHALF/F)                              (1)

where RHAT  = peak Mflop/s rate of the arithmetic pipeline,
              approached as F goes to infinity
      F     = computational intensity
            = ratio of floating-point operations to memory references
and   FHALF = F required to obtain RINF = RHAT/2
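
As a quick numerical illustration of equation (1), the following Python
sketch tabulates RINF for a few intensities. The values RHAT = 100 Mflop/s
and FHALF = 2 are made-up numbers chosen only for illustration, not
measurements, and the function name rinf is an assumption of this sketch.

```python
def rinf(f, rhat, fhalf):
    """Equation (1): asymptotic rate at computational intensity f."""
    return rhat / (1.0 + fhalf / f)


# Hypothetical parameters, chosen only to illustrate the shape of the curve.
RHAT = 100.0   # Mflop/s, peak rate of the arithmetic pipeline
FHALF = 2.0    # intensity at which half of RHAT is reached

for f in (1, 2, 5, 10):
    print(f"F = {f:2d}   RINF = {rinf(f, RHAT, FHALF):6.2f} Mflop/s")
```

At F = FHALF the computed rate is exactly RHAT/2, and it approaches RHAT
only slowly as F grows, which is why a small FHALF is desirable.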
The loop executed is polynomial evaluation by Horner's rule, where the
computational intensity is equal to the order of the polynomial.
The order, and hence F, is increased from 1 to 10, and the results for RINF
for each value of F are fitted by least squares to equation (1), giving
the best values of the parameters RHAT (R-infinity-hat) and FHALF
(half-performance intensity) for this fit.
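
One way to carry out such a fit is sketched below in Python. It rewrites
equation (1) as the straight line F/RINF = F/RHAT + FHALF/RHAT and applies
ordinary linear least squares. This linearisation is an assumption of the
sketch and is not necessarily the formulation used inside poly1; the
function name fit_rhat_fhalf and the synthetic data are also hypothetical.

```python
def fit_rhat_fhalf(fs, rinfs):
    """Fit equation (1) through the line F/RINF = F/RHAT + FHALF/RHAT."""
    ys = [f / r for f, r in zip(fs, rinfs)]
    n = len(fs)
    mean_f = sum(fs) / n
    mean_y = sum(ys) / n
    sxx = sum((f - mean_f) ** 2 for f in fs)
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(fs, ys))
    slope = sxy / sxx                     # estimates 1/RHAT
    intercept = mean_y - slope * mean_f   # estimates FHALF/RHAT
    return 1.0 / slope, intercept / slope


# Synthetic, noise-free data generated from RHAT = 100 and FHALF = 2.
fs = list(range(1, 11))
rinfs = [100.0 / (1.0 + 2.0 / f) for f in fs]

rhat, fhalf = fit_rhat_fhalf(fs, rinfs)
print(f"RHAT  = {rhat:.3f} Mflop/s")
print(f"FHALF = {fhalf:.3f}")
```

In this form a negative intercept yields a negative FHALF, which is the
kind of nonsense result that indicates the measured rates do not follow
equation (1).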
POLY1 chooses vector lengths that fit into the cache, and FHALF is a
measure of the ratio of arithmetic performance (Mflop/s) to cache-memory
access rate (Mword/s).
For further details of the FHALF characterisation, see Hockney and Jesshope,
Parallel Computers-2, IOP Publishing, Bristol and New York, Chapter-1.
2. Operating Instructions
-------------------------
To compile and link the benchmark, type: `make'. On some systems it
may be necessary to allocate the appropriate resources before running the
benchmark, e.g. on the iPSC/860, to reserve a single processor,
type: getcube -t1.
To run the benchmark type: poly1
Output from the benchmark is written to the file "poly1.res".
If the timing results are too inaccurate, the parameter NITER in file
poly1.inc may be increased. This is the number of repetitions of the
kernel loop, used to extend the length of time measured. NITER=1000
is a sensible starting value. NITER=10 may be used for testing execution
but is probably too small for accurate timing.
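
The role of NITER can be illustrated with the following Python sketch,
which times a kernel repeated niter times and converts the elapsed time
to Mflop/s. It assumes 2 floating-point operations (one multiply and one
add) per polynomial order per vector element; the names mflops,
time_kernel and toy_kernel are hypothetical and do not appear in the
benchmark source. Python timings are of course not representative of the
Fortran kernel.

```python
import time


def mflops(order, nvec, niter, seconds):
    """Rate in Mflop/s, assuming 2 flops per order per element per repeat."""
    return 2.0 * order * nvec * niter / (seconds * 1.0e6)


def time_kernel(kernel, niter):
    """Elapsed wall-clock seconds for niter repetitions of kernel()."""
    start = time.perf_counter()
    for _ in range(niter):
        kernel()
    return time.perf_counter() - start


def toy_kernel(order=3, nvec=100):
    """Stand-in kernel: Horner evaluation over a short vector."""
    total = 0.0
    for i in range(nvec):
        x = i * 0.01
        acc = 1.0
        for _ in range(order):
            acc = acc * x + 1.0
        total += acc
    return total


for niter in (10, 1000):
    t = time_kernel(toy_kernel, niter)
    print(f"NITER = {niter:5d}   elapsed = {t:.6f} s")
```

A larger niter lengthens the measured interval, so the clock resolution
becomes a smaller fraction of the total and the derived rate is more
reliable.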
The order of execution of the kernel loop should be as specified in the
Fortran code (in SUBROUTINE DOALL). Nonsense results (e.g. negative FHALF)
may be produced if the compiler tampers with the loop ordering or does
software pipelining. The polynomial must be completely evaluated for one
value of the loop index I (e.g. DO 310 loop) before the next value of I is
taken.
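
The effect of loop ordering on the computational intensity can be seen in
the following Python sketch. It is not the benchmark's Fortran kernel; it
is a simplified model that assumes the polynomial coefficients stay in
registers and counts one memory reference per vector load or store. The
function names are hypothetical.

```python
def poly_required_order(coef, x):
    """Finish the whole polynomial for one I before taking the next I."""
    order = len(coef) - 1
    y = [0.0] * len(x)
    flops = refs = 0
    for i in range(len(x)):
        xi = x[i]
        refs += 1                      # one load of x(i)
        acc = coef[order]              # coefficients assumed in registers
        for j in range(order - 1, -1, -1):
            acc = acc * xi + coef[j]   # one multiply and one add
            flops += 2
        y[i] = acc
        refs += 1                      # one store of y(i)
    return y, flops, refs


def poly_interchanged(coef, x):
    """Interchanged loops: each Horner step sweeps over all I."""
    order = len(coef) - 1
    y = [coef[order]] * len(x)
    flops = 0
    refs = len(x)                      # initial store of y(i)
    for j in range(order - 1, -1, -1):
        for i in range(len(x)):
            y[i] = y[i] * x[i] + coef[j]
            flops += 2
            refs += 3                  # load x(i), load y(i), store y(i)
    return y, flops, refs


coef = [1.0, 2.0, 3.0, 4.0]            # order-3 polynomial, made-up values
x = [0.5, 1.0, 2.0]
for name, fn in (("required", poly_required_order),
                 ("interchanged", poly_interchanged)):
    y, flops, refs = fn(coef, x)
    print(f"{name:12s} flops = {flops}  refs = {refs}  F = {flops / refs:.2f}")
```

Under this model both orderings produce the same values, but only the
required ordering gives an intensity equal to the polynomial order. With
the loops interchanged the intensity stays below 1 whatever the order, so
the measured rates no longer follow equation (1).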
$Id: ReadMe,v 1.2 1994/05/25 16:54:25 igl Exp igl $
Submitted by Mark Papiani,
last updated on 10 Jan 1995.