ATLASv3.1.3D: Like Mozilla, but less robust and longer overdue
Guys,
Alright, it's been trivially tested on two platforms, which found errors, so
we must be golden ;->
I have finally come through with developer release 3.1.3.  I figure it
has all the stability of Windows98 running a screensaver, and the 
new sections describing how to handle cleanup contribution make your
average Unix manpage look like complete newbie-oriented documentation,
but I figure these deficits are still preferable to developers not having
access to an ATLAS that will actually use their gemm kernels.  It can be found,
as before, at:
   www.cs.utk.edu/~rwhaley/ATLAS/OS
Here's some of what is new:
(1) By user request, GEMM kernel routines now have access to the macros
    MB2, MB3, ... MB8; NB2, NB3, ... NB8; KB2, KB3, ... KB8
    corresponding to multiples of blocking factors, usually for indexing
    purposes.  In other words, KB4 is KB*4.
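A minimal sketch of the kind of indexing these macros are meant for. In a real build ATLAS supplies KB and its multiples to the kernel; here they are defined by hand (with an assumed KB of 40) so the fragment compiles on its own. The routine name rowdot4_sketch and the row-panel storage layout are hypothetical, chosen only for illustration:

```c
/* Hypothetical stand-ins: in a real build ATLAS supplies KB and its
 * multiples; they are defined here only so the sketch compiles alone. */
#ifndef KB
#define KB 40
#endif
#define KB2 (KB*2)
#define KB3 (KB*3)
#define KB4 (KB*4)

/* C[i] = dot(row i of A, b) for M rows, M assumed a multiple of 4.
 * Row i of A occupies A[i*KB] .. A[i*KB + KB - 1], so within a group
 * of four rows, KB, KB2 and KB3 are the offsets of rows 1, 2 and 3,
 * and KB4 is the stride to the next group. */
void rowdot4_sketch(int M, const double *A, const double *b, double *C)
{
    for (int i = 0; i < M; i += 4, A += KB4) {
        double c0 = 0.0, c1 = 0.0, c2 = 0.0, c3 = 0.0;
        for (int k = 0; k < KB; k++) {
            c0 += A[k] * b[k];
            c1 += A[KB + k] * b[k];
            c2 += A[KB2 + k] * b[k];
            c3 += A[KB3 + k] * b[k];
        }
        C[i] = c0; C[i + 1] = c1; C[i + 2] = c2; C[i + 3] = c3;
    }
}
```

With four rows stored back to back, KB2 and KB3 address rows 2 and 3 without a runtime multiply, and KB4 steps to the next group of four rows.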
(2) User-contributed kernels should actually be used when they beat the
    generated code
(3) User contribution of cleanup now supported
(4) Ability to pass user-selected compiler and flags for GEMM kernels
(5) Inclusion of:
    * Camm's SSE-enabled [S,C]GEMM kernel
    * Viet Nguyen and Peter Strazdin's [D,Z]GEMM UltraSparc kernels
Note that a whole lot remains undone, and if you've sent something in
that is not there, do not worry, I have a list a mile long of all
the things still to do before the release.  Contributor-related things I
can rattle off the top of my head include:
 (1) Camm's improved SSE-enabled Level 2, particularly multiple precision
 (2) Doug's SSE-enabled full SGEMM
 (3) Full gemm UltraSparc 
 (4) Peter's 3DNow! kernels
I include below a few gotchas I'm pretty sure are in this release.  There
are even some known errors, but I thought it was worthwhile to get this
thing out regardless.
Cheers,
Clint
=========================== Level 1 & 2 Errors ================================
For some reason I haven't bothered to track down, I seem to have introduced
an error in these C interface routines:
   cblas_[s,d]rot  (Level 1)
   cblas_[c,z]tpmv (Level 2)
======================== Error in user search =================================
The user search routine has hung on me a few times, in a fashion I have
not found easy to repeat.  It appears to happen in the cleanup search,
when you don't take the arch defaults and some user-contributed cases
won't compile.  It is probably an error in my use of data structures in
building the table of cleanup candidates: that code worked on the first
compile, something that never happens in real life.
======================== Untested cleanup stuff ===============================
When ATLAS generates a cleanup wrapper around user-contributed
cleanup, there are two possibilities: 
(1) If the number of necessary ifs is small, it uses an if-based decision
    tree, for instance:
      if (M == 8) call user-contributed code for M == 8 case
      else if (M % 4 == 0) call user-contributed code mod 4
      else call generated code
(2) If the number of necessary ifs is large, ATLAS instead generates
    a static array of function pointers, and uses that to invoke the
    user-supplied cleanup
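The if-based tree from option (1) might look roughly like this in C. The routine names, the int-returning signature and the return codes are hypothetical stand-ins so the dispatch can be observed; they are not the names ATLAS actually generates:

```c
/* Hypothetical cleanup routines; each returns an ID so the choice can be
 * observed. Real cleanup routines would compute a partial GEMM instead. */
static int user_M8(int M)    { (void)M; return 1; } /* user code for M == 8 */
static int user_Mmod4(int M) { (void)M; return 2; } /* user code for M % 4 == 0 */
static int generated(int M)  { (void)M; return 0; } /* generated fallback */

/* Option (1): if-based decision tree, most specific case first. */
int cleanup_if(int M)
{
    if (M == 8)
        return user_M8(M);
    else if (M % 4 == 0)
        return user_Mmod4(M);
    else
        return generated(M);
}
```

The M == 8 test has to come before the M % 4 == 0 test, since 8 also satisfies the more general case; ordering from most to least specific is what lets the chain pick the best user routine.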
Option 2 is in the code, but has never been tested (my present example
cases do not invoke it; I will test ASAP), so if you make it happen, it's
not even certain it will compile.  If this breaks in a spectacular way,
it's easy to disable option 2 . . .
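For comparison, option (2) trades the chain of tests for a single indexed call through a table of function pointers. The sketch below assumes cleanup sizes 0 < M < NB with a hypothetical NB of 16, uses made-up routine names, and relies on C99 designated initializers so that unlisted sizes are left NULL and fall back to generated code; the table ATLAS really generates may well fill every entry explicitly instead:

```c
#include <stddef.h>

#define NB 16 /* hypothetical blocking factor: cleanup covers 0 < M < NB */

typedef int (*cleanup_fn)(int M);

/* Hypothetical stand-ins returning IDs so the dispatch can be observed. */
static int user_M8(int M)    { (void)M; return 1; }
static int user_Mmod4(int M) { (void)M; return 2; }
static int generated(int M)  { (void)M; return 0; }

/* Option (2): one slot per cleanup size; unlisted slots are NULL. */
static const cleanup_fn cleanup_table[NB] = {
    [4] = user_Mmod4, [8] = user_M8, [12] = user_Mmod4,
};

/* Returns -1 for sizes outside the cleanup range. */
int cleanup_dispatch(int M)
{
    if (M <= 0 || M >= NB)
        return -1;
    cleanup_fn f = cleanup_table[M];
    return f != NULL ? f(M) : generated(M);
}
```

However many special cases there are, dispatch costs one table load and one indirect call, which is why it pays off once the if chain gets long.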
========================== Pthreads support ===================================
If you have an SMP box, config will probably ask if you want to build the
SMP version.  Just say no.  This is in there for Antoine's threading work,
but I haven't got that incorporated into the package yet, much less tested
or working.
========================== Treacherous index files ============================
If you do an install and then modify the CASES/<pre>cases.dsc file, make
sure to add your new kernel to the *end* of the file.  The previous install
will have associated case 0 with the first entry of the original file, so
if you put your kernel first, you are now case 0, but the install thinks
all those previously obtained case-0 timings apply to you . . .
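The hazard is easy to model: saved timings are keyed by position in the file, not by kernel name. In this hypothetical sketch (stale_timings and the array-of-names representation of the file are illustrative, not ATLAS code), appending a kernel leaves every old index pointing at the same kernel, while prepending one shifts all of them:

```c
#include <string.h>

/* Count saved timings that would now be attributed to the wrong kernel:
 * the install recorded a timing per case index from old_cases, and after
 * the edit each index i is looked up in new_cases instead. */
int stale_timings(const char *const *old_cases, int nold,
                  const char *const *new_cases, int nnew)
{
    int stale = 0;
    for (int i = 0; i < nold && i < nnew; i++)
        if (strcmp(old_cases[i], new_cases[i]) != 0)
            stale++;
    return stale;
}
```

With old entries {a, b, c}, appending a new kernel gives 0 stale timings, whereas prepending it makes all 3 saved timings refer to the wrong kernel.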