Blis

Latest version: v1.2.0

0.4.1

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 30 15:13:59 2018 -0500

Version file update (0.4.1)

commit 08dd67c4b21244851f8416bd59159bea7a9c5b3d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 30 15:12:13 2018 -0500

ReleaseNotes.md update in advance of next version.

commit 4fa4cb0734e7de6505b5d6f1aeef3a5d5c89dcbb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 29 18:06:41 2018 -0500

Trivial comment header updates.

Details:
- Removed four trailing spaces after "BLIS" that occur in most files'
commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.

commit b051ffb815baf6c3ece2b5118b679fd9219d5780
Merge: 6f33d9de aaa549f4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 29 17:06:48 2018 -0500

Merge branch 'dev'

commit 6f33d9de21fbc2f579846b9104fb9d513753f79c
Author: Mathieu Poumeyrol <kaliusers.noreply.github.com>
Date: Wed Aug 29 23:48:22 2018 +0200

fix compilation of armv7a kernels (242)

commit 8199e339aefdd27019c7f3d8c99818d375d5400b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 27 07:00:12 2018 -0500

Added testsuite threading to input.general.fast.

Details:
- Added lines associated with the testsuite's new threading option to
input.general.fast. This change was intended for the previous commit
(10d0735).

commit 10d07357afbb2d468837aa97369ef9a6d0610817
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 20:34:30 2018 -0500

Better thread safety; added threading to testsuite.

Details:
- Replaced critical sections that were conditional upon multithreading
being enabled (via pthreads or OpenMP) with unconditional use of
pthreads mutexes. (Why pthreads? Because BLIS already requires it
for its initialization mechanism: pthread_once().) This was done in
bli_error.c, bli_gks.c, bli_l3_ind.c. Also, replaced usage of BLIS's
mtx_t object and bli_mutex_*() API with pthread mutexes in
bli_thread.c. The previous status quo could result in a race condition
if the application called BLIS from more than one thread. The new
pthread-based code should be completely agnostic to the application's
threading configuration. Thanks to AMD for bringing to our attention
the need for a thread-safety review.
- Added an option to the testsuite to simulate application-level
multithreading. Specifically, each thread maintains a counter that is
incremented after each experiment. The thread only executes the
experiment if: counter % n_threads == thread_id. In other words, the
threads simply take turns executing each problem experiment. Also,
POSIX guarantees that fprintf() will not intermingle output, so
output was switched to fprintf() instead of libblis_test_fprintf().
- Changed membrk_t objects to use pthread_mutex_t instead of mtx_t and
replaced use of bli_mutex_init()/_finalize() in bli_membrk.c with
wrappers to pthread_mutex_init()/_destroy().
- Changed the implementation of bli_l3_ind_oper_enable_only() to fix
a race condition; specifically, two threads calling the function with
the same parameters could lead to a non-deterministic outcome.
- Added #include <pthread.h> to bli_cpuid.c and moved the same in
bli_arch.c.
- Added 'const' to declaration of OPT_MARKER in bli_getopt.c.
- Added #include <pthread.h> to bli_system.h.
- Added add-copyright.py script to automate adding new copyright lines
to (and updating existing lines of) source files.

commit aaa549f4d1e63929fe2bea023ce849253cfbbb42
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 20:13:51 2018 -0500

Minor update to configure --help (--sharedir option).

Details:
- Fixed/tweaked description for --sharedir=SHAREDIR option.

commit 573b8ac373f821a65cc8afd51cdbe03b8ec01081
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 13:51:32 2018 -0500

Fixed copy-paste typo in previous commit.

Details:
- Fixed a typo in travis/do_testsuite.sh introduced in 62ea1d3.

commit 62ea1d33d3bc1e890420a1e828b9d0e87e87533b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 13:35:53 2018 -0500

Fixed broken out-of-tree builds.

Details:
- Fixed stale filepaths to check-blastest.sh and check-blistest.sh in
travis/do_testsuite.sh and travis/do_sde.sh.
- Create a symbolic link to the 'config' directory so that the top-level
Makefile can find the configs' make_defs.mk files during out-of-tree
builds.
- Added additional case handling to out-of-tree scenario to handle
situations where files 'Makefile', 'common.mk', or 'config' exist but
are not symbolic links. In such cases, configure warns the user and
exits.
- Homogenized various error messages throughout configure.
- Belated thanks to Victor Eijkhout for requesting the feature added
in 0f491e9 whereby lesser Makefiles can compile and link against
an existing installation of BLIS.

commit 0f491e994a7e14d4dfce26e6a51dba2bccad29a3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 25 20:12:36 2018 -0500

Allow lesser Makefiles to reference installed BLIS.

Details:
- Updated the build system so that "lesser" Makefiles, such as those
belonging to example code or the testsuite, may be run even if the
directory is orphaned from the original build tree. This allows a
user to configure, compile, and install BLIS, delete the build tree
(that is, the source distribution, or the build directory for out-
of-tree builds) and then compile example or testsuite code and link
against the installed copy of BLIS (provided the example or testsuite
directory was preserved or obtained from another source). The only
requirement is that make be invoked while setting the
BLIS_INSTALL_PATH variable to the same installation prefix used when
BLIS was configured. The easiest syntax is:

make BLIS_INSTALL_PATH=/install/prefix

though it's also permissible to set BLIS_INSTALL_PATH as an
environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
blastest and testsuite, respectively, so that if those directories are
copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
of building/linking against an installed copy of BLIS.

commit 36ff92ce0d3b428b15b6cddc6f5944afe22e43ec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 18:26:09 2018 -0500

Missing C++ compiler no longer fatal to configure.

Details:
- Changed configure so that the absence of any C++ compiler from the
pre-defined search list does not result in an exit. Instead, in this
situation, the found_cxx variable is assigned 'c++notfound' and the
error message is changed to remind the user that C++ will not be
available in the sandbox. Thanks to Devangi Parikh for reporting this
issue.
- Also tweaked the message when a C++ compiler *is* found to remind any
would-be confused user that BLIS will only use C++ if it is needed by
code in the sandbox.

commit 658f0a129bdc565b072696b6ebddce501132091c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 17:49:37 2018 -0500

Fixed obscure integer size bug in va_arg() usage.

Details:
- Fixed a bug in the way that the variadic bli_cntx_set_l3_nat_ukrs()
function was defined. This function is meant to take a microkernel id,
microkernel datatype, microkernel address, and microkernel preference
as arguments, and is typically called within the bli_cntx_init_*()
function defined within a sub-configuration for initializing an
appropriate context. The problem is with the final argument: the
microkernel preference. These preferences are actually boolean values,
0 or 1 (encoded as FALSE or TRUE). Since the variadic function does
not give the compiler any type information for any variadic arguments,
they are "promoted" in the course of internal (macroized) processing
according to default argument promotion rules. Thus, integer literals
such as 0 and 1 become int and floating-point literals (such as 0.0 or
1.0) become double. Previous to this commit, we indicated to va_arg()
that the ukernel preference was a 'bool_t', which is a typedef of
int64_t on 64-bit systems. On systems where int is defined as 64 bits,
no problems manifest since int is the same size as the type we passed
in to va_arg(), but on systems where int is 32 bits, the ukernel
preference could be misinterpreted as a garbage value. (This was
observed on a modern armv8 system.) The fix was to interpret the
bool_t value as int and then immediately typecast it to and store it
as a bool_t. Special thanks to Devangi Parikh for helping track down
this issue, including deciphering the use of va_arg() and its
byzantine treatment of types.
- Added explicit typecasts for all invocations of va_arg() in
bli_cntx.c.

commit e71dc389120b032e42091e4d1a928515ed6f7275
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 15:56:04 2018 -0500

Fixed a very minor memory leak in gks.

Details:
- Fixed a memory leak in the global kernel structure that lost 56
bytes per configured architecture (of which only 18 are presently
supported by BLIS). The leak would only manifest if BLIS was
initialized and then finalized before the application terminated.
Thanks to Devangi Parikh for helping track down this leak.

commit a7e3a5f9753468c8e665e6c5c3b38d22b7c92500
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 14:51:11 2018 -0500

Fixed uncallable bli_finalize().

Details:
- Previously, bli_finalize_once()--which, like bli_init_once(), was
implemented in terms of pthread_once()--was using the same
pthread_once_t control object being used by bli_init(), thus
guaranteeing that it would never be called as long as BLIS had already
been initialized. This could manifest as a rather large memory leak to
any application that attempted to finalize BLIS midway through its
execution (since BLIS reserves several megabytes of storage for
packing buffers per thread used). The fix entailed giving each
function its own pthread_once_t object. Thanks to Devangi Parikh for
helping track down this very quiet bug.

commit a79c21c7c17fb4854fd24c73b81ec5543f74082d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 23 14:40:46 2018 -0500

Fixed cleanmk target post-1b0f8d6.

Details:
- Changed the cleanmk target to delete makefile fragments from their new
home in obj/$(CONFIG_NAME). The old definition worked only because of
a typo (REFERKN_PATH instead of REFKERN_PATH), and only in the
non-verbose (V != 1) case.

commit ffb57242f3eb1175c991fe1b492595fdaa175c27
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 18:22:41 2018 -0500

Cosmetic output changes to configure.

Details:
- Disable sandbox-related obj directory creation, directory mirroring,
and makefile fragment generation when a sandbox is not enabled.
- Prevent various duplicate actions by configure (such as those
mentioned above for sandboxes).

commit ac17454aae9ad430f05aa7c156919c6c695c300c
Merge: a77bec76 7afd095a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 15:34:53 2018 -0500

Merge branch 'master' into dev

commit a77bec766a01e42f13f8cacbec8c4cbde8ecefef
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 15:31:29 2018 -0500

Whitespace changes, minor renames in build system.

Details:
- Minor whitespace cleanup, mostly in the form of spaces -> tabs.
- Shortened certain variables' _FRAGMENT_ infixes to _FRAG_ in
common.mk.

commit 1b0f8d60d1132b56485cc202ebf1246898d3a2a4
Author: Devin Matthews <damatthewssmu.edu>
Date: Wed Aug 22 13:19:29 2018 -0700

Generate makefile fragments in build tree (240)

* Make src dir read-only in out-of-tree build test.

* Generate makefile fragments in the build tree.

commit 7afd095af33690e0175903852b354c9fe46993f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 14:58:24 2018 -0500

Removed skx from code snippet in previous commit.

Details:
- The docs/ConfigurationHowTo.md document was written with examples that
did not yet contain the skx sub-configuration, but the previous commit
included bli_arch.c code copied and pasted from a recent commit that
does support skx. To keep things consistent, I've removed skx from the
recently-added ConfigurationHowTo.md code snippet.

commit 48211a980d78673133076e8eced1007b1980f5e6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 14:55:02 2018 -0500

Update to docs/ConfigurationHowTo.md.

Details:
- Added missing language directing the reader to modify the config_name
string array in bli_arch.c when adding a new sub-configuration. Thanks
to Devangi Parikh for reporting this missing section.

commit 65c9096c6e21f3dc2947fa12be9ea3034f8662dc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 17 11:44:12 2018 -0500

Fixed broken -p option to configure.

Details:
- Fixed some stale code that was preventing the -p option to configure
from working as expected (though the --prefix option was unaffected).
This bug was most likely introduced in 7e5648c (May 7 2018).
Thanks to Dave Love for reporting this issue.

commit e358d5e497c77b305af462f44266370a596445e2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 16 12:18:45 2018 -0500

README.md update (Funding section).

commit a61dd5e7bcf23f7237d407a5e06dd44e1bec9ad0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 14 17:08:03 2018 -0500

Changed 'test' target to be more like 'check'.

Details:
- Redefined the 'test' make target in the top-level Makefile so that the
final result ("everything passed" or "at least one failure") is echoed
to stdout. Note that 'check' is unchanged, and thus is now effectively
a fast version of 'test'.
- Updated docs/BuildSystem.md to reflect the above change.

commit ce5c3a198a7ae1ca676c27da4541d51ed19d16e1
Merge: 4f6745d6 0bbe69d5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 14 16:52:19 2018 -0500

Merge branch 'master' of github.com:flame/blis

commit 4f6745d68a2c66511695eff0beb00a82ffc6bbbe
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 14 16:50:47 2018 -0500

Fixed link error when building only shared library.

Details:
- Fixed a linker error that occurred when attempting to compile and link
the testsuite and/or BLAS test drivers after having configured BLIS to
only generate a shared library (no static library). The chosen
solution involved
(1) adding the local library path, $(BASE_LIB_PATH), to the search
paths for the shared library via the link option
-Wl,-rpath,$(BASE_LIB_PATH).
(2) adding a local symlink to $(BASE_LIB_PATH) that uses the .so major
version number so that ld would find the shared library at
execution time.
Thanks to Sajid Ali for reporting this issue, to Devin Matthews for
pointing out the need for the -rpath option, and to Devangi Parikh for
helping Sajid isolate the problem.
- Added #include <ctype.h> to bli_system.h to avoid a compiler warning
resulting from using toupper() from bli_string.c without a prototype.
Thanks again to Sajid Ali, whose build log revealed this compiler
warning.
- Added '*.so.*' to .gitignore.
- CREDITS file update.

commit 0bbe69d5ed260849297d8f2d35b7668d167482ed
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Tue Aug 14 14:49:58 2018 -0500

Updated plotting scripts in test/studies.

Details:
- Fixed indexing on plots to correspond to the removal of dtime in
the test drivers.

commit e93e0e149e087e08eca2885f1a748a4e88ffe55d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 15:54:30 2018 -0500

Removed redefinition of axpyv, scal2v func types.

Details:
- Removed a stray/accidental redefinition of axpyv and scal2v function
types in frame/1d/bli_l1d_ft.h (probably a copy/paste leftover during
development).

commit 1deb33bd16349aaa643694d1bd685ff8a9a5f476
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 15:02:50 2018 -0500

Updated penryn kernels to use new _ker_ft type names.

Details:
- Updated older _ft kernel type suffixes used within penryn level-1v
and -1f kernels to use the newer _ker_ft suffix that was introduced
in 0175483. (Thank you Travis CI.)

commit 9cb0b023ca91abdc056d726cdc070062e4954611
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 14:21:07 2018 -0500

INSTALL file update.

commit 017548314f3f78f66fbe3264509ac5302bd8d62b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 14:13:25 2018 -0500

Replaced function chooser macros w/ func ptr arrays.

Details:
- Previously, most object API functions (_oapi.c) used a function
chooser macro that would expand out to an if-elseif-elseif-else
conditional that used a num_t datatype to call the appropriate
type-specific API (_tapi.c). This always felt a little hackish, and
would get in the way somewhat of adding support for new num_t datatypes
in the future. So, I've replaced that functionality with code that
queries a function pointer that is then typecast appropriately. This
model of function calling was already pervasive for kernels queried
from the cntx_t structure. It was also already in use in various other
functions, such as macrokernels, and this commit simply extends that
pattern.
- The above change required many new files, mostly header files, that
define the function types (mostly _ft.h) for the queriable functions
as well as some source files to define the function pointer arrays and
their corresponding query functions (_fpa.c). Various other function
types, mostly for kernel function types, were renamed to reduce the
potential for confusion with the function types for expert and basic
(non-expert) typed API functions.
- Removed definitions for all of the "bli_call_ft_*()" function chooser
macros from bli_misc_macro_defs.h.

commit addce089664561f9f63efa6f107e58fc48d29871
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 6 13:18:20 2018 -0500

Format spec and other updates in test, test/3m4m.

Details:
- Removed the dtime (delta time, or wallclock time) column from the
matlab output of all test drivers in test, test/3m4m, test/studies.
This value was rarely (if ever) really needed and usually only served
to take up screen space.
- Updated format specifier in test/studies/skx to use %7.2f instead of
%6.3f.
- For the test drivers in 'test' directory, added an initial line of
output that sets last entry of matlab matrix to zero in order to
induce a pre-allocation of the entire array of performance results.

commit 94d5ef42c833a4d43e50a80d46dddbd7a56d2db6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 4 15:57:17 2018 -0500

Adjusted gflops format spec in testsuite, test/3m4m.

Details:
- Changed the format specifier for the gflops column in the testsuite
output from %7.3f to %7.2f. This was done mainly to keep the output
aligned properly when the expected performance exceeded 1000 gflops.
Also, two decimal places still conveys plenty of precision for all
practical applications, including just eyeballing performance deltas
between two executions (let alone two implementations).
- Changed the format specifier for gflops in the test/3m4m drivers
from %6.3f to %7.2f (for the same reasons listed above).

commit c7ff06bae92b9b6c6656f2030d13486b95417821
Merge: 6074082c ebe998d0
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Wed Aug 1 14:20:41 2018 -0500

Merge branch 'master' of https://github.com/flame/blis

commit 6074082cd359dd775ef72478f8f3a281c5a6a6f9
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Wed Aug 1 13:30:51 2018 -0500

Fixed bug in bli_cntx_set_packm_ker_dt() implementation.

Details:
- Fixed bug in static functions bli_cntx_set_[packm/unpackm]_ker_dt(), which
were incorrectly calling bli_cntx_get_[packm/unpackm]_ker_dt() to get the
corresponding func_t.

commit ebe998d06cc56a9a9d66990b6ebf683d6fd0efdf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 1 13:24:00 2018 -0500

Fixed typos in BuildSystem.md from previous commit.

commit e72a344e94c5ae253f69b60f41d92ca89a5d1d1c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 1 13:00:38 2018 -0500

Added table of 'make' targets to BuildSystem.md.

Details:
- Added a new section to BuildSystem.md that describes the most useful
make targets defined in the top-level Makefile.

commit 4f60d0288e00586dc921ff57db851f1266ff8e70
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 30 19:22:57 2018 -0500

README.md, comment updates.

Details:
- Added links, and sandbox language to README.md.
- Adjusted some comments in high-level level-3 object functions to make
clear what bli_thread_init_rntm() does.

commit 455d3f49e5c8362395be14c79e6adb5123e29623
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 29 18:31:29 2018 -0500

Edits to object/typed API, multithreading docs.

commit 922a1c05e06f52c97fb369870dce07233e61c4c9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 20:15:55 2018 -0500

More tweaks to README.md.

commit a7a0cf2b5d9f1dea5061c0f20eeaf371dfd4ea12
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:59:31 2018 -0500

More edits to docs/Multithreading.md.

commit be21d0cf68c330fd0d2048465a43ddc59d0b9d6c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:46:51 2018 -0500

Fixed typos in docs/Multithreading.md.

commit eac07c7b4f7a41c68d63f1e67141b2b58009609e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:45:28 2018 -0500

Edits to docs/Multithreading.md.

commit 5438375a032273b46ae626fee909ffc05f48ab72
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:34:21 2018 -0500

Fixed link in README.md.

commit 1f1a237d3f0b24d71ce2d7ee52d8a84f8e6a29ad
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:33:28 2018 -0500

Fixed links in BLISTypedAPI.md.

commit 89c8806e3aa49310f36c0314c5f6956c83a627a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:30:56 2018 -0500

Minor doc fixes to previous commit.

commit b8c7574f84873b9c408f70c29c41ce464df57c2d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:27:09 2018 -0500

README.md, typed/object API updates.

Details:
- Updated the typed and object APIs to include language on the rntm_t
parameters in the expert interfaces.
- Updated README to include link to object API.

commit 29c34c4adb02d91fb34d1ccc0e821d6cfb7ce5c5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:26:19 2018 -0500

CREDITS file update.

commit 55a04edf52ac4f16c51b738bc884684adc1f1777
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:10:46 2018 -0500

CHANGELOG update (0.4.0)

0.4.0

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:10:43 2018 -0500

Version file update (0.4.0)

commit b86cf13793b07f35c027a56c9faec8f4b6279d3e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:08:21 2018 -0500

Release Notes update in advance of next version.

commit a8b4084a0e04e47ac02ceae93a2018f5363e1205
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:07:26 2018 -0500

CREDITS file update.

commit 8e10cac5f388ac961c3d77b0a465214e7c9dc91a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 14:45:35 2018 -0500

Updates to CREDITS, RELEASING, config/README.md.

Details:
- Added individuals' github handles to CREDITS file.
- Updated RELEASING, config/README.md files.

commit 401b69c8f26a86726ac5e1fb4f9fc2d2098ef204
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 17:55:13 2018 -0500

More indentation in docs/ConfigurationHowTo.md.

commit 1c6a1b921ef96999bb449d657cca6d9a556f7245
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 17:14:58 2018 -0500

Trying new indentation in ConfigurationHowTo.md.

Details:
- Modified a few sections to take advantage of a feature of markdown
that allows a bullet or enumeration to have multiple paragraphs. This
is a trial run to make sure the indentation looks good when rendered
in a web browser.

commit 71f978719527fcf17617cb234e48bf349a76c12d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 15:55:36 2018 -0500

Whitespace changes to macrokernels' func ptr defs.

commit 87d57c31c2bfcf4609dfe31ce915e9345150e613
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 14:20:18 2018 -0500

Various minor updates to typed, object API docs.

commit fb6e16268aaafbab2fd78d47cbf821e2152261fd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 14:17:28 2018 -0500

Consolidated prototypes in bli_l1v_tapi.h.

Details:
- Consolidated typed API function prototypes in bli_l1v_tapi.h by
leveraging identical function signatures between operations.
- Removed 'restrict' keyword since it is not actually present in the
function definitions.

commit af60d738f21340ccb0903e6c87dbf6af4fc44fc0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 24 15:35:52 2018 -0500

Finished object creation part of BLISObjectAPI.md.

Details:
- Filled in remaining section on object creation function reference
of BLISObjectAPI.md. All object management functions demonstrated as
part of the example code in examples/oapi are now documented, as well
as some other functions that are not shown in the example code.
- Updated various links (mostly in function index) to correctly point to
the object API reference instead of the typed API reference.
- Added documentation to getijm, setijm.

commit 8217a6a3b68382c62f016c658d337e6086112fef
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 24 13:13:10 2018 -0500

Moved sandbox README.md to docs/Sandboxes.md.

Details:
- Relocated sandbox/ref99/README.md to docs/Sandboxes.md and made minor
edits to the document.

commit b7db29332394324ffd1a73c3847a75e9a5b38c8d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 19 11:14:30 2018 -0500

Explicitly typecast return vals in static funcs.

Details:
- Added explicit typecasting to various functions (mostly static
functions), primarily those in bli_param_macro_defs.h,
bli_obj_macro_defs.h, bli_cntx.h, bli_cntl.h, and a few other header
files.
- This change was prompted by feedback from Jacob Gorm Hansen, who
reported that including "blis.h" from his application caused
gcc to output error messages (relating to types being returned
mismatching the declared return types) when used via the C++ compiler
front-end. This is the first pass of fixes, and we may need to
iterate with additional follow-up commits (233).

commit fa08e5ead95f9d757af6ab5b095a8bf131e3874d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 17 19:02:15 2018 -0500

Fixed minor issues in ecbebe7 with mt disabled.

Details:
- Fixed an unused variable warning in frame/base/bli_rntm.c when
multithreading is disabled.
- Fixed a missing variable declaration in bli_thread_init_rntm_from_env()
when multithreading is disabled.

commit ecbebe7c2e43950dfa369f71c2b83cabe348a046
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 17 18:37:32 2018 -0500

Defined rntm_t to relocate cntx_t.thrloop (235).

Details:
- Defined a new struct datatype, rntm_t (runtime), to house the thrloop
field of the cntx_t (context). The thrloop array holds the number of
ways of parallelism (thread "splits") to extract per level-3
algorithmic loop until those values can be used to create a
corresponding node in the thread control tree (thrinfo_t structure),
which (for any given level-3 invocation) usually happens by the time
the macrokernel is called for the first time.
- Relocating the thrloop from the cntx_t remedies a thread-safety issue
when invoking level-3 operations from two or more application threads.
The race condition existed because the cntx_t, a pointer to which is
usually queried from the global kernel structure (gks), is supposed to
be read-only. However, the previous code would write to the cntx_t's
thrloop field *after* it had been queried, thus violating its read-only
status. In practice, this would not cause a problem when a sequential
application made a multithreaded call to BLIS, nor when two or more
application threads used the same parallelization scheme when calling
BLIS, because in either case all application threads would be using
the same ways of parallelism for each loop. The true effects of the
race condition were limited to situations where two or more application
threads used *different* parallelization schemes for any given level-3
call.
- In remedying the above race condition, the application or calling
library can now specify the parallelization scheme on a per-call basis.
All that is required is that the thread encode its request for
parallelism into the rntm_t struct prior to passing the address of the
rntm_t to one of the expert interfaces of either the typed or object
APIs. This allows, for example, one application thread to extract 4-way
parallelism from a call to gemm while another application thread
requests 2-way parallelism. Or, two threads could each request 4-way
parallelism, but from different loops.
- A rntm_t* parameter has been added to the function signatures of most
of the level-3 implementation stack (with the most notable exception
being packm) as well as all level-1v, -1d, -1f, -1m, and -2 expert
APIs. (A few internal functions gained the rntm_t* parameter even
though they currently have no use for it, such as bli_l3_packm().)
This required some internal calls to some of those functions to
be updated since BLIS was already using those operations internally
via the expert interfaces. For situations where a rntm_t object is
not available, such as within packm/unpackm implementations, NULL is
passed in to the relevant expert interfaces. This is acceptable for
now since parallelism is not obtained for non-level-3 operations.
- Revamped how global parallelism is encoded. First, the conventional
environment variables such as BLIS_NUM_THREADS and BLIS_*_NT are only
read once, at library initialization. (Thanks to Nathaniel Smith for
suggesting this to avoid repeated calls to getenv(), which can be slow.)
Those values are recorded to a global rntm_t object. Public APIs, in
bli_thread.c, are still available to get/set these values from the
global rntm_t, though now the "set" functions have additional logic
to ensure that the values are set in a synchronous manner via a mutex.
If/when NULL is passed into an expert API (meaning the user opted to
not provide a custom rntm_t), the values from the global rntm_t are
copied to a local rntm_t, which is then passed down the function stack.
Calling a basic API is equivalent to calling the expert APIs with NULL
for the cntx and rntm parameters, which means the semantic behavior of
these basic APIs (vis-a-vis multithreading) is unchanged from before.
- Renamed bli_cntx_set_thrloop_from_env() to bli_rntm_set_ways_for_op()
and reimplemented, with the function now being able to treat the
incoming rntm_t in a manner agnostic to its origin--whether it came
from the application or is an internal copy of the global rntm_t.
- Removed various global runtime APIs for setting the number of ways of
parallelism for individual loops (e.g. bli_thread_set_*_nt()) as well
as the corresponding "get" functions. The new model simplifies these
interfaces so that one must either set the total number of threads, OR
set all of the ways of parallelism for each loop simultaneously (in a
single function call).
- Updated sandbox/ref99 according to above changes.
- Rewrote/augmented docs/Multithreading.md to document the three methods
(and two specific ways within each method) of requesting parallelism
in BLIS.
- Removed old, disabled code from bli_l3_thrinfo.c.
- Whitespace changes to code (e.g. bli_obj.c) and docs/BuildSystem.md.

commit 323eaaab99752858b12e81e2eb8e416f009a3028
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Fri Jul 13 11:40:06 2018 -0500

Removed left over code from plotting scripts.

commit 60c197736495b47ce974ffb9b43874d1ebcfe78c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 12 19:22:14 2018 -0500

Documented accessor functions in BLISObjectAPI.md.

Details:
- Added documentation to docs/BLISObjectAPI.md for a handful of
commonly-used obj_t accessor functions.
- Minor updates to docs/BLISTypedAPI.md.

commit 77327ad796e11ef67df0cc91d45ed663598ba4df
Merge: 73b0b2a3 9fef8575
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Thu Jul 12 17:09:33 2018 -0500

Merge branch 'master' of https://github.com/flame/blis

commit 73b0b2a3ac1be6dfbe85c116886b4e29d98ac945
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Thu Jul 12 16:53:10 2018 -0500

Created hardware-specific test driver directory.

Details:
- Created a 'studies' subdirectory within 'test' to be used to house
test drivers, makefiles, run scripts, matlab plot code, and related
files that have been customized for collecting performance data on
specific host machines or product lines. This new setup will help us
catalog, track, and share test driver materials over time, and in a
way that facilitates reproducibility.
- Created an 'skx' subdirectory within 'test/studies' to house various
level-3 test driver files used to measure performance on SkylakeX
nodes (specifically, those nodes used by TACC's stampede2 system).

commit 9fef85756d15ee0f977fff6e57acd01c20cba184
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 11 18:40:30 2018 -0500

Cleaned up loose ends in BLISObjectAPI.md.

Details:
- Deleted some lines from the API function signatures that did not
belong (and were only left over from the copy-paste of the typed API).
- Fixed some paragraph-in-bullet indentation.

commit 80ddeae4629022b69fdf1f1b053a1fcba643c40c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 11 18:31:57 2018 -0500

Added BLISObjectAPI.md to docs.

Details:
- Added first draft of BLISObjectAPI.md. (Object management section is
still missing.)
- Small fixes to BLISTypedAPI.md found while writing BLISObjectAPI.md.
- In various .md files, changed verbatim blocks to language
attributes (e.g. c for C code).

commit 038442add39ce629fee0d960b212ce0c95138d46
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 11 12:24:18 2018 -0500

Added -lpthread to makefile example in BuildSystem.md.

Details:
- Added missing pthreads library linking to example makefile in
docs/BuildSystem.md, as well as similar language to build requirements
at the beginning of the document. Thanks to Stefanos Mavros for
bringing this to our attention.
- Updated CREDITS file.

commit bf10d8624e7b5902c9d9189c7c93f318b8e1b9a5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 9 18:40:13 2018 -0500

Small updates to KernelsHowTo.md, BLISTypedAPI.md.

Details:
- Minor updates to BLISTypedAPI.md, mostly to bring terminology
up-to-date with the new "typed API" classification.
- Added contents section to KernelsHowTo.md.

commit 1fd3bce59e43b422e62f9684bca9d1296a29edc3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 9 18:20:11 2018 -0500

Further updates to KernelsHowTo.md, BLISTypedAPI.md.

Details:
- Added missing level-1v operations to BLISTypedAPI (e.g. axpbyv,
xpbyv).
- Updated broken links in KernelsHowTo.md based on misnamed anchors.
- Other minor changes.

commit c40d30a6c920bd2e5a8353a3cd07a7e2b2265758
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 9 17:55:54 2018 -0500

Updated KernelsHowTo.md, BLISTypedAPI.md.

Details:
- Added missing (basic) information in KernelsHowTo.md for level-1f and
level-1v kernels.
- Updated section regarding contexts.

commit f8913c2bf91c0e0fb4e68aedf64a242a19db92a0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:35:13 2018 -0500

Fixed outdated scalv() calls in penryn l1f kernels.

Details:
- Fixed stale calls to dscalv() from the dotxf and dotxaxpyf penryn
kernels that were not updated during the basic/expert API separation
in e88aeda.

commit e78e71d549ac17ecd52c7b33008df1cd78f1b59e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:18:09 2018 -0500

Added README.md mention/link to examples/tapi.

Details:
- Added language to README.md to bring the reader's attention to the
example code for the typed API (in addition to those for the object
API).

commit 419ffb158573a26bfec47bac73e4394e7926a7b8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:14:23 2018 -0500

Updates to README.md.

Details:
- Updated wiki links according to renamed/relocated files in 'docs'.
- Converted links to relative paths.
- Added link to docs/Multithreading.md.

commit 7d3e8a7e5f1ec299d009fb6c9071f0c1b089b460
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:01:29 2018 -0500

Reverted docs/*.md links to relative paths.

Details:
- Within the documents in docs/*.md, reverted links to other local
documents to relative paths.
- Fixed some links/documents that did not yet have the '.md' suffix.
- Testing whether we can use relative links ('docs/BLISTypedAPI.md')
from within README.md.

commit d97c862c2b9170d774f414e63ae365488fffb4f5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 19:40:41 2018 -0500

Updated links (URLs) in docs/*.md.

Details:
- Updated most markdown links in the documents/wikis to use absolute
paths instead of the relative paths that were in use previously.
A few links were not updated, except for adding a ".md" to reflect
the documents' new names, in order to test whether relative
linking still works.

commit 3a0c12135875e0fb04de9798664e4fae632d994e
Merge: 2c7960c8 bcacddfa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 16:51:38 2018 -0500

Merge branch 'dev'

commit bcacddfad75b20969660606751eea6ead6c42ca9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 16:45:29 2018 -0500

Added 'docs' directory with wiki markdown files.

Details:
- Exported all github wikis to a new 'docs' directory.
- Renamed 'BLISAPIQuickReference' wiki to 'BLISTypedAPI' and removed
all cntx_t* arguments from the (now non-expert) APIs (with the
exception of the kernel APIs).
- Added section to BuildSystem documenting new ARG_MAX hack.

commit 3ee2bc0f7aa3b08da92331d64271bee99eaf8c1d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 16:02:16 2018 -0500

Renamed files that distinguish basic/expert APIs.

Details:
- Renamed various files that were previously named according to a
"with context" or "without context" convention. For example, the
following files in frame/3 were renamed:

frame/3/bli_l3_oapi_woc.c -> frame/3/bli_l3_oapi_ba.c
frame/3/bli_l3_oapi_wc.c -> frame/3/bli_l3_oapi_ex.c
frame/3/bli_l3_tapi_woc.c -> frame/3/bli_l3_tapi_ba.c
frame/3/bli_l3_tapi_wc.c -> frame/3/bli_l3_tapi_ex.c

Here, the "ba" is for "basic" and "ex" is for "expert". This new
naming scheme will make more sense especially if/when additional
expert parameters are added to the expert APIs (typed and object).

commit e88aedae735dfeb6fa5ac28d4527eb3ca58c6510
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 6 19:14:02 2018 -0500

Separated expert, non-expert typed APIs.

Details:
- Split existing typed APIs into two subsets of interfaces: one for use
with expert parameters, such as the cntx_t*, and one without. This
separation was already in place for the object APIs, and after this
commit the typed and object APIs will have similar expert and non-
expert APIs. The expert functions will be suffixed with "_ex" just as
is the case for expert interfaces in the object APIs.
- Updated internal invocations of typed APIs (functions such as
bli_?setm() and bli_?scalv()) throughout BLIS to reflect use of the
new explicitly expert APIs.
- Updated example code in examples/tapi to reflect the existence (and
usage) of non-expert APIs.
- Bumped the major soname version number in 'so_version'. While code
compiled against a previous version/commit will likely still work
(since the old typed function symbol names still exist in the new API,
just with one less function argument) the semantics of the function
have changed if the cntx_t* parameter the application passes in is
non-NULL. For example, calling bli_daxpyv() with a non-NULL context
does not behave the same way now as it did before; before, the
context would be used in the computation, and now the context would
be ignored since the interface for that function no longer expects a
context argument.
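
The basic/expert split can be pictured with a small sketch. The names below (my_cntx_t, my_daxpyv(), my_daxpyv_ex()) and the simplified signatures are hypothetical and are not the BLIS prototypes; the point is only that the "_ex" variant takes the extra expert parameter and the basic variant forwards NULL for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel type and context; a real cntx_t holds far more. */
typedef void (*axpyv_ker_t)(size_t n, double alpha,
                            const double *x, double *y);
typedef struct { axpyv_ker_t axpyv; } my_cntx_t;

/* Reference kernel: y := y + alpha * x */
static void axpyv_ref(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

static const my_cntx_t default_cntx = { axpyv_ref };

/* Expert interface: the caller may supply a context, or NULL for the default. */
void my_daxpyv_ex(size_t n, double alpha, const double *x, double *y,
                  const my_cntx_t *cntx)
{
    if (cntx == NULL)
        cntx = &default_cntx;
    assert(cntx->axpyv != NULL);
    cntx->axpyv(n, alpha, x, y);
}

/* Basic interface: no context argument at all. */
void my_daxpyv(size_t n, double alpha, const double *x, double *y)
{
    my_daxpyv_ex(n, alpha, x, y, NULL);
}
```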

commit 331694e52414c0cd50048daf880a9ace9e29b94a
Author: Isuru Fernando <isurufgmail.com>
Date: Fri Jul 6 09:07:38 2018 -0600

Fix windows build and enable x86_64 on appveyor (230)

* Upload artifacts built on appveyor (228)

* Upload artifacts

* Fix install in appveyor

* Remove windows.h in bli_winsys.c (229)

Looks like it is unneeded.

* Implemented ARG_MAX hack in configure, Makefile.

Details:
- Added support for --enable-arg-max-hack to configure, which will
change the behavior of make when building BLIS so that rather than
invoke the archiver/linker with all of the object files as command
line arguments, those object files are echoed to a temporary file
and then the archiver/linker is fed that temporary file via the
@file notation. An example of this can be found in the GNU make docs at
https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.

* Enable x86_64 and arg-max-hack on appveyor

* Use gas style assembly for clang on windows

commit a64a780d28c99d35f237f59212772e9beff35b3e
Merge: 89e178ce 3cb396d1
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Jul 6 09:38:42 2018 -0500

Merge pull request 231 from flame/travis-pr

Disable SDE for PRs

commit 3cb396d1ae4ee569f862db201c6a976712fd128e
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Jul 6 09:19:44 2018 -0500

Disable SDE for PRs

Pull requests cannot use Travis secret variables, so SDE needs to be disabled. This PR should suffice as a test.

commit 2c7960c8416ee9b67364be5f2b210fd7a0aec4b5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 5 14:38:33 2018 -0500

Implemented ARG_MAX hack in configure, Makefile.

Details:
- Added support for --enable-arg-max-hack to configure, which will
change the behavior of make when building BLIS so that rather than
invoke the archiver/linker with all of the object files as command
line arguments, those object files are echoed to a temporary file
and then the archiver/linker is fed that temporary file via the
@file notation. An example of this can be found in the GNU make docs at
https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.

commit c422a5cd191d47e6aeb9cea6de0e348f46e3e318
Merge: b6470262 89e178ce
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 5 12:33:35 2018 -0500

Merge branch 'dev'

commit b6470262ea66c0f48a5b4d85ca4bf85c1fb2b3af
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Jul 4 19:14:29 2018 -0600

Remove windows.h in bli_winsys.c (229)

Looks like it is unneeded.

commit eac4bdf98691c5ec784af0dc11d1ad2269840661
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Jul 4 18:31:01 2018 -0600

Upload artifacts built on appveyor (228)

* Upload artifacts

* Fix install in appveyor

commit 89e178ce380439dea951925e33703dc4b979e914
Merge: d868eb3e e32b2ef9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 4 17:51:16 2018 -0500

Merge branch 'master' into dev

commit e32b2ef983ea1c3521dd3821116c0078690f125e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 4 17:49:39 2018 -0500

Update to CREDITS file.

commit 14648e137696484e0ff04f89b16c6b4183ea42b8
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Jul 4 16:48:42 2018 -0600

Native windows support using clang (227)

* Add appveyor file

* Build script

* Remove fPIC for now

* copy as

* set CC and CXX

* Change the order of immintrin.h

* Fix testsuite header

* Move testsuite defs to .c

* Fix appveyor file

* Remove fPIC again and fix strerror_r missing bug

* Remove appveyor script

* cd to blis directory

* Fix sleep implementation

* Add f2c_types_win.h

* Fix f2c compilation

* Remove rdp and rename appveyor.yml

* Remove setenv declaration in test header

* set CPICFLAGS to empty

* Fix another immintrin.h issue

* Escape CFLAGS and LDFLAGS

* Fix more ?mmintrin.h issues

* Build x86_64 in appveyor

* override LIBM LIBPTHREAD AR AS

* override pthreads in configure

* Move windows definitions to bli_winsys.h

* Fix LIBPTHREAD default value

* Build intel64 in appveyor for now

commit b45ea92fc6f77f2313b50dbe95922f838cbead07
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 3 18:27:29 2018 -0500

Added typed (BLAS-like) API code examples.

Details:
- Added new example code to examples/tapi demonstrating how to use the
BLIS typed API. These code examples directly mirror the corresponding
example code files in examples/oapi. This setup provides a convenient
opportunity for newcomers to BLIS to compare and contrast the typed
and object APIs when they are used to perform the same tasks.
- Minor cleanups to examples/oapi.

commit d868eb3e200f657a1284c4cc933e7a4d25260dce
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 29 12:36:04 2018 -0500

Implemented bli_obj_scalar_cast_to().

Details:
- Implemented bli_obj_scalar_cast_to(), which will typecast the value in
the internal scalar of an obj_t to a specified datatype.
- Changed bli_obj_scalar_attach() so that the scalar value being attached
is first typecast to the storage datatype of the destination object
rather than the target datatype.
- Reformatted function type signatures in bli_obj_scalar.c as well as
prototypes in its corresponding header file.

commit 52d80b5f09517d80ac8a7c96983a576c1ec2080b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 29 12:30:44 2018 -0500

Fixed static funcs related to target and exec dts.

Details:
- Fixed incorrect bit shifts in the following static functions:
bli_obj_set_target_domain()
bli_obj_set_target_prec()
bli_obj_set_exec_domain()
bli_obj_set_exec_prec()
- Fixed incorrect bitmask in bli_dt_proj_to_single_prec().
- Updated bli_obj_real_part() and bli_obj_imag_part() so that they update
the target and exec datatypes (in addition to the storage datatypes).
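
For readers unfamiliar with this kind of bug, the sketch below shows the general shape of a shift-and-mask setter for packed datatype fields. The bit positions (MY_TARGET_DT_SHIFT, MY_EXEC_DT_SHIFT) and all function names are assumptions chosen for illustration; they are not the actual layout of the obj_t info field.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only: a datatype is two bits
   (domain bit 0x1, precision bit 0x2), stored once per "slot". */
#define MY_DOMAIN_BIT      0x1u
#define MY_PREC_BIT        0x2u
#define MY_TARGET_DT_SHIFT 10u
#define MY_EXEC_DT_SHIFT   13u

/* Clear the field selected by (bit << shift), then store val there.
   Using the wrong shift here would silently corrupt a neighboring field. */
static uint32_t my_set_bits(uint32_t info, uint32_t bit,
                            uint32_t shift, uint32_t val)
{
    uint32_t mask = bit << shift;
    return (info & ~mask) | ((val << shift) & mask);
}

uint32_t my_set_target_domain(uint32_t info, uint32_t dom)
{
    assert((dom & ~MY_DOMAIN_BIT) == 0);
    return my_set_bits(info, MY_DOMAIN_BIT, MY_TARGET_DT_SHIFT, dom);
}

uint32_t my_set_target_prec(uint32_t info, uint32_t prec)
{
    assert((prec & ~MY_PREC_BIT) == 0);
    return my_set_bits(info, MY_PREC_BIT, MY_TARGET_DT_SHIFT, prec);
}

uint32_t my_set_exec_domain(uint32_t info, uint32_t dom)
{
    assert((dom & ~MY_DOMAIN_BIT) == 0);
    return my_set_bits(info, MY_DOMAIN_BIT, MY_EXEC_DT_SHIFT, dom);
}

uint32_t my_set_exec_prec(uint32_t info, uint32_t prec)
{
    assert((prec & ~MY_PREC_BIT) == 0);
    return my_set_bits(info, MY_PREC_BIT, MY_EXEC_DT_SHIFT, prec);
}

uint32_t my_get_target_dt(uint32_t info)
{
    return (info >> MY_TARGET_DT_SHIFT) & (MY_DOMAIN_BIT | MY_PREC_BIT);
}

uint32_t my_get_exec_dt(uint32_t info)
{
    return (info >> MY_EXEC_DT_SHIFT) & (MY_DOMAIN_BIT | MY_PREC_BIT);
}
```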

commit e006f2d0eeb229c1cd05a424496a774c29bdc5d7
Merge: bd8c55fe dafca7a0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 27 15:54:38 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit bd8c55fe268e8e352508341ebd739ef4fc68eb92
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 27 15:52:37 2018 -0500

Added dt_on_output field to auxinfo_t.

Details:
- Added a new field to the auxinfo_t struct that can be used, in theory,
to request type conversion before the microkernel stores/accumulates
its microtile back to memory.
- Added the appropriate get/set static functions to bli_type_defs.h.

commit dafca7a0c2c72aaf15cb588b2bef6f246abb1905
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jun 25 16:20:10 2018 -0500

Fix botched memory addressing in Penryn kernel (no effect for GAS output).

commit de493b0f349efebab98ab17f063d4d3d932c24c3
Merge: 195480be a7166feb
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jun 25 14:26:06 2018 -0500

Merge pull request 226 from devinamatthews/dev

Finish macroization of assembly ukernels.

commit 195480beb589db7d582646f556e855c611d4c3a9
Merge: 07c3d0a9 3f387ca3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 25 13:24:21 2018 -0500

Merge branch 'master' into dev

commit 3f387ca35e42519f0d6a154814e4c8800fa2acb8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 25 12:32:03 2018 -0500

Fixed bugs in configure's select_cc() function.

Details:
- This commit fixes several bugs in configure relating to selecting a C
compiler. By dumb luck, two of the bugs sort of cancelled each
other out in most use cases, which manifested as the expected behavior.
Thanks to Mathieu Poumeyrol for bringing this issue to our attention,
and to Devin Matthews for suggesting the more portable way of
capturing both stdout and stderr and suggesting a return code check
instead of testing stdout/stderr.
- The first bug: As the values of the compiler search list are iterated
over, only stderr is captured when querying a compiler with --version
rather than both stdout and stderr.
- The second bug: After each query, a conditional attempted to test
whether the query resulted in anything being output. That conditional
erroneously was using "-z" instead of "-n" for non-emptiness. Thus,
most of the time, stderr was empty (because the --version info was
being output on stdout), and since it was empty, the -z conditional
(intended to execute only when a compiler was found to be responsive)
executed.
- A third bug was also fixed in the way that the merged stdout/stderr
output was tested for non-emptiness (moving the 'cat' invocation to
another line and testing the contents of a variable instead).
- The three bugs above have been fixed as part of a partial rewrite of
the select_cc() function in terms of a return code check, which
obviated the need to save the output of stdout and stderr.
- The fourth bug involved a misnamed variable in the right-hand side
of a statement intended to prepend CC to search_list when CC was
non-empty. This typically did not manifest as a bug since usually CC
(if it was set) was set to a value that was known to work.

commit a7166feb1053814b7dd27f3879ae38acfc9637fc
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jun 25 12:09:18 2018 -0500

Finish macroization of assembly ukernels.

commit f986396c2af5de06283b9834112782afd0a8907e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 22 18:12:40 2018 -0500

Added 'configure --help' text for CFLAGS, LDFLAGS.

Details:
- Added mention of the new support for preset CFLAGS, LDFLAGS to the
bottom of the text output by './configure --help'.
- Updated usage example to use 'haswell' instead of 'sandybridge'.

commit 884175d9ffb62e49535e6c1f7d58fb3b83e7e78f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 22 18:08:43 2018 -0500

Added configure support for preset CFLAGS, LDFLAGS.

Details:
- Any preexisting values set to the CFLAGS environment variable (or the
CFLAGS variable if given on the command line) are saved by configure
for later inclusion (prepending, to be precise) along with the
compiler flags automatically determined by the BLIS build system.
(LDFLAGS is treated in a similar manner.) Thanks to Dave Love for
requesting this feature in issue 223 and Mathieu Poumeyrol for his
support on this and a previous related issue.
- Comment updates to build/config.mk.in.
- Strip whitespace from return value of various cflags functions in
common.mk.

commit 07c3d0a95190bd23f0cd2ef220deb3384d8378d1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 21 12:35:07 2018 -0500

Update to CREDITS file.

commit a1ebbbf158c7b34c9032ef45431bc610b6f14858
Merge: 17928b1c c81c6f23
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 15:37:53 2018 -0500

Merge pull request 224 from devinamatthews/asm-macros

Asm macros

commit c81c6f23b9547b5d55ae68fd5a3bbd8a78290b6b
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 15:20:44 2018 -0500

Fix problem with inc and dec macros.

commit 5a63971c822fd452f97ba869625c8e87f6cbeebc
Merge: b4d94e54 17928b1c
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 14:07:49 2018 -0500

Merge remote-tracking branch 'upstream/dev' into asm-macros

commit b4d94e54d44cf30e4bb452ca5263be3473c0582d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 14:07:24 2018 -0500

Convert x86 microkernels to assembly macros.

commit 17928b1c9941aa58aef1f122c793e2b14e705267
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 17:59:03 2018 -0500

Added static funcs bli_dt_domain(), bli_dt_prec().

Details:
- Added definitions of static functions bli_dt_domain()/bli_dt_prec(),
which extract a dom_t domain or prec_t precision value, respectively,
from a num_t datatype.
- Changed the return types of bli_obj_domain() and bli_obj_prec() from
objbits_t to dom_t and prec_t. (Not sure why they were ever set to
return objbits_t.)
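
A sketch of what such extraction helpers might look like is given below. It assumes that a datatype value encodes the domain in bit 0 and the precision in bit 1; the enum names and values are hypothetical stand-ins for num_t, dom_t and prec_t, not copies of the BLIS definitions.

```c
#include <assert.h>

/* Assumed encoding: bit 0 = domain (0 real, 1 complex),
   bit 1 = precision (0 single, 2 double). */
typedef enum { MY_FLOAT = 0, MY_SCOMPLEX = 1,
               MY_DOUBLE = 2, MY_DCOMPLEX = 3 } my_num_t;
typedef enum { MY_REAL = 0, MY_COMPLEX = 1 } my_dom_t;
typedef enum { MY_SINGLE_PREC = 0, MY_DOUBLE_PREC = 2 } my_prec_t;

/* Return typed domain/precision values rather than a raw bitfield. */
my_dom_t my_dt_domain(my_num_t dt)
{
    assert(dt >= MY_FLOAT && dt <= MY_DCOMPLEX);
    return (my_dom_t)(dt & 0x1);
}

my_prec_t my_dt_prec(my_num_t dt)
{
    assert(dt >= MY_FLOAT && dt <= MY_DCOMPLEX);
    return (my_prec_t)(dt & 0x2);
}
```

Returning dedicated enum types instead of a generic bitfield type lets the compiler and the reader distinguish a domain from a precision at call sites.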

commit 5f7fbb7115b1bf532c169dfd9adef84c41a95031
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 15:38:55 2018 -0500

Static funcs for projecting dt to single/double.

Details:
- Added static functions for projecting a datatype to single precision
or double precision, both for obj_t's storage datatypes and standalone
datatypes.

commit d4a22702c7a90273dc14f271db465c2e11e5b87e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 14:54:57 2018 -0500

Set up haswell config for optional col-pref ukrs.

Details:
- Added two presently-disabled cpp blocks in bli_cntx_init_haswell.c to
easily allow one to switch to a set of column-preferential gemm
microkernels (in the haswell subconfiguration). The second column-
preferring block sets the register blocksizes to their appropriate
values. However, cache blocksizes are left unchanged, and therefore are
likely suboptimal. This should be addressed later.

commit f317c2e31bfc329cb6bb4e06005e45b9c8a9d6a7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 12:21:23 2018 -0500

Added get/set static funcs for exec dt/dom/prec.

Details:
- Added functions to bli_obj_macro_defs.h to get and set the target
domain and target precision bits in the obj_t, and also added the
appropriate support in bli_type_defs.h.

commit e88a5b8da8c26caebd2b0fb73b30836fb5417c9c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 18 15:56:26 2018 -0500

Implemented castm, castv operations.

Details:
- Implemented castm and castv operations, which behave like copym and
copyv except where the obj_t operands can be of different datatypes.
These new operations, however, unlike copym/copyv, do not build upon
existing level-1v kernels.
- Reorganized projm, projv into a 'proj' subdirectory of frame/base (to
match the newly added frame/base/cast directory).
- Added new macros to bli_gentfunc_macro_defs.h, _gentprot_macro_defs.h
that insert GENTFUNC2/GENTPROT2 macros for all non-homogeneous datatype
combinations. Previously, one had to invoke two additional macros--one
which mixed domains only and another that included all remaining
cases--in order to get full type combination coverage.
- Defined a new static function, bli_set_dims_incs_2m(), to aid in the
setting of various variables in the implementations of bli_??castm().
This static function joins others like it in bli_param_macro_defs.h.
- Comment update to bli_copysc.h.
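
The idea of a copy whose source and destination datatypes differ can be sketched as below. The function names my_sdcastv() and my_dscastv() are hypothetical and merely echo the two-character type prefix suggested by bli_??castm(); the loops are plain C and do not rely on any existing kernels.

```c
#include <assert.h>
#include <stddef.h>

/* float -> double: behaves like a strided copy, plus a typecast. */
void my_sdcastv(size_t n, const float *x, ptrdiff_t incx,
                double *y, ptrdiff_t incy)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[(ptrdiff_t)i * incy] = (double)x[(ptrdiff_t)i * incx];
}

/* double -> float: same loop, narrowing conversion. */
void my_dscastv(size_t n, const double *x, ptrdiff_t incx,
                float *y, ptrdiff_t incy)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[(ptrdiff_t)i * incy] = (float)x[(ptrdiff_t)i * incx];
}
```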

commit 2000cdff59272974438e88e0e82d8e1a32710325
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 18 14:17:28 2018 -0500

Update to CREDITS file.

commit ed2c8aed848ba2dede18df090cf2e0b6e4cc059f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 18 11:49:34 2018 -0500

Temporarily disabled small matrix handling on zen.

Details:
- Disabled small matrix handling in config/zen/bli_family_zen.h due to
what appears to be a bug that manifests as failures in the single and
double precision real level-3 BLAS test drivers (visible via
out.sblat3 and out.dblat3). Thanks to Robin Christ for reporting this
issue.

commit ed20392c500940bfc0947795c1ff7c8c24f8e26f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 15 16:31:22 2018 -0500

Added get/set static funcs for exec dt/dom/prec.

Details:
- Added functions to bli_obj_macro_defs.h to get and set the execution
domain and execution precision bits in the obj_t.
- Added/rearranged a few functions in bli_obj_macro_defs.h.
- Renamed some macros in bli_type_defs.h: EXECUTION -> EXEC.

commit 22594e8e9ab55f5bc0e69d96a23e128502849999
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 14 17:35:23 2018 -0500

Updated sandbox/ref99 according to f97a86f.

Details:
- Applied changes to ref99 sandbox analogous to those applied to
framework code in f97a86f. This involves setting the pack schemas of
A and B objects temporarily to communicate those desired schemas to
the control tree creation function in blx_gemm_cntl.c. This allows us
to (henceforth) query the schemas from the control tree rather than
the context.

commit 1b5d0424d2c7e5eac33e02359c12917ef280949f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 13 18:41:32 2018 -0500

Prototype column-preferential zen gemm ukernels.

Details:
- Added prototypes to bli_kernels_zen.h for each of the four gemm
microkernels that prefer outputting to column storage.

commit f88c2e7a539e383297e846e6d4647058dd3db128
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 13 18:27:46 2018 -0500

Defined static function bli_blksz_scale_def_max().

Details:
- Added a new static function to bli_blksz.h that scales both the default
(regular) blocksize as well as the maximum blocksize in the blksz_t
object. Reminder: maximum blocksizes have different meanings in
different contexts. For register blocksizes, they refer to the packing
register blocksizes (PACKMR or PACKNR) while for cache blocksizes, they
refer to the maximum blocksize to use during the final iteration of a
loop.
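
As a rough illustration, scaling both values by a rational factor might look like the sketch below. The struct my_blksz_t is a simplified, hypothetical stand-in (one default and one maximum value, with no per-datatype storage), and the signature is an assumption rather than the one in bli_blksz.h.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for blksz_t: one default and one maximum value. */
typedef struct { long def; long max; } my_blksz_t;

/* Scale both the default and the maximum blocksize by num/den. */
void my_blksz_scale_def_max(long num, long den, my_blksz_t *b)
{
    assert(b != NULL && den != 0);
    b->def = (b->def * num) / den;
    b->max = (b->max * num) / den;
}
```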

commit 87db5c048e0c7f37351fda486abaf7d19fc5821c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 12 19:38:37 2018 -0500

Changed usage of virtual microkernel slots in cntx.

Details:
- Changed the way virtual microkernels are handled in the context.
Previously, there were query routines such as bli_cntx_get_l3_ukr_dt()
which returned the native ukernel for a datatype if the method was
equal to BLIS_NAT, or the virtual ukernel for that datatype if the
method was some other value. Going forward, the context native and
virtual ukernel slots will both be initialized to native ukernel
function pointers for native execution, and for non-native execution
the virtual ukernel pointer will be something else. This allows us
to always query the virtual ukernel slot (from within, say, the
macrokernel) without needing any logic in the query routine to decide
which function pointer (native or virtual) to return. (Essentially,
the logic has been shifted to init-time instead of compute-time.)
This scheme will also allow generalized virtual ukernels as a way
to insert extra logic in between the macrokernel and the native
microkernel.
- Initialize native contexts (in bli_cntx_ref.c) with native ukernel
function addresses stored to the virtual ukernel slots pursuant to
the above policy change.
- Renamed all static functions that were native/virtual-ambiguous, such
as bli_cntx_get_l3_ukr_dt() or bli_cntx_l3_ukr_prefers_cols_dt()
pursuant to the above policy change. Those routines now use the
substring "get_l3_vir_ukr" in their name instead of "get_l3_ukr". All
of these functions were static functions defined in bli_cntx.h, and
most uses were in level-3 front-ends and macrokernels.
- Deprecated anti_pref bool_t in context, along with related functions
such as bli_cntx_l3_ukr_eff_dislikes_storage_of(), now that 1m's
panel-block execution is disabled.
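
The shift from compute-time to init-time selection can be sketched with plain function pointers. Everything below (my_cntx_t, my_method_t, the toy "kernels" operating on an int) is hypothetical and only illustrates the pattern: the virtual slot is always valid, so the caller never branches on the execution method.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*my_ukr_t)(int);
typedef enum { MY_METHOD_NATIVE, MY_METHOD_INDUCED } my_method_t;

/* Two slots per context: native and virtual. */
typedef struct { my_ukr_t nat_ukr; my_ukr_t vir_ukr; } my_cntx_t;

static int my_native_ukr(int x) { return x + 1; }

/* A virtual kernel wraps the native one with extra logic. */
static int my_induced_vir_ukr(int x) { return my_native_ukr(x) * 2; }

/* Init-time decision: for native execution both slots hold the
   native kernel; otherwise the virtual slot holds the wrapper. */
void my_cntx_init(my_cntx_t *c, my_method_t method)
{
    assert(c != NULL);
    c->nat_ukr = my_native_ukr;
    c->vir_ukr = (method == MY_METHOD_NATIVE) ? my_native_ukr
                                              : my_induced_vir_ukr;
}

/* Compute-time: always query the virtual slot, no branching needed. */
int my_macrokernel(const my_cntx_t *c, int x)
{
    return c->vir_ukr(x);
}
```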

commit dbaf440540837b03643190cd685ed889fa7fd212
Merge: 22aa44eb 2610fff0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 11 12:37:04 2018 -0500

Merge branch 'master' into dev

commit 2610fff0b07bdb345cb2e334ef6bea0c63c8cead
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 11 12:32:54 2018 -0500

Renamed 1m packm kernels from _1e to _1er.

Details:
- Renamed the reference packm kernels used by 1m. Previously, they used
a _1e suffix, which was confusing since they packed to both 1e and 1r
schemas. This was likely an artifact of the time when there were
separate kernels for each schema before I decided to combine them into
a single function (per datatype and panel dimension), and the 1e
functions were the ones to inherit the 1r functionality. The kernels
have now been renamed to use a _1er suffix.

commit 7af5283dcc3dded114852d6013d33134021b81aa
Author: sraut <Biplab.Rautamd.com>
Date: Mon Jun 11 15:00:22 2018 +0530

added check condition on n-dimension for XA'=B intrinsic code to process till 128 size

Change-Id: I95d020a5ca3ea21d446b8c2e379d56e1eea18530

commit 712de9b371a8727682352a2f52cd4880de905f0b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 9 14:36:30 2018 -0500

Added missing semicolon in 03obj_view.c

Details:
- Thanks to Tony Skjellum for pointing out this typo due to a
last-minute change to the source prior to committing.

commit 043d0cd37ef4a27b1901eeb89d40083cfb2a57ba
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 9 13:46:49 2018 -0500

Implemented bli_acquire_mpart(), added example code.

Details:
- Implemented bli_acquire_mpart(), a general-purpose submatrix view
function that will alias an obj_t to be a submatrix "view" of an
existing obj_t.
- Renumbered examples in examples/oapi and inserted a new example file,
03obj_view.c, which shows how to use bli_acquire_mpart() to obtain
submatrix views of existing objects, which can then be used to
indirectly modify the parent object.
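
A submatrix view of this kind amounts to an offset base pointer that shares the parent's strides. The sketch below uses a hypothetical my_obj_t with a double buffer, dimensions, and row/column strides; it is not the real obj_t, and my_acquire_mpart() is an illustrative model rather than the BLIS function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matrix object: buffer, dimensions, row and column strides. */
typedef struct {
    double   *buf;
    size_t    m, n;
    ptrdiff_t rs, cs;
} my_obj_t;

/* Alias 'sub' to the m x n block of 'parent' whose top-left is (i, j).
   No data is copied; writes through 'sub' modify the parent's buffer. */
void my_acquire_mpart(size_t i, size_t j, size_t m, size_t n,
                      const my_obj_t *parent, my_obj_t *sub)
{
    assert(i + m <= parent->m && j + n <= parent->n);
    sub->buf = parent->buf + (ptrdiff_t)i * parent->rs
                           + (ptrdiff_t)j * parent->cs;
    sub->m  = m;
    sub->n  = n;
    sub->rs = parent->rs;
    sub->cs = parent->cs;
}

/* Address of element (i, j) of an object or view. */
double *my_elem(const my_obj_t *a, size_t i, size_t j)
{
    assert(i < a->m && j < a->n);
    return a->buf + (ptrdiff_t)i * a->rs + (ptrdiff_t)j * a->cs;
}
```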

commit f1908d39767baef56077def69126d96f805ee27e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 8 14:22:22 2018 -0500

Fixed broken input.operations.fast.

Details:
- Removed three input lines from input.operations.fast (labeled
"test sequential micro-kernel") that I intended to remove in bd02c4e.
These lines prevented 'make check' (and 'make checkblis-fast') from
completing correctly. Note: This bug was fixed in 3df39b3, but that
commit has not yet been merged into master, hence this redundant
commit. Thanks to Robert van de Geijn for reporting this issue.

commit 262a62e3482c5caa947a89cabb562b5887555bd6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 8 12:10:54 2018 -0500

Fixed undefined ref in steamroller/excavator configs.

Details:
- Fixed erroneous calls to bli_cntx_init_piledriver_ref() in
bli_cntx_init_steamroller() and bli_cntx_init_excavator(), which
should have been to their respectively-named bli_cntx_init_*()
functions instead. Thanks to qnerd for bringing these bugs to our
attention.

commit 22aa44ebec2c7884bdc944775a1aa7534ab53f0d
Merge: 65fae950 b65d0b84
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 17:42:59 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 65fae95074d239354737355bbe6f202d4f8b2871
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 17:41:09 2018 -0500

Implemented bli_setrm, _setim, _setrv, _setiv.

Details:
- Defined new wrappers to setm/setv operations in frame/base/bli_setri.c
that will target only the real or only the imaginary parts of a
matrix/vector object.
- Updated bli_obj_real_part() so that the complex-specific portions of
the function are not executed if the object is real.
- Defined bli_obj_imag_part().
- Caveat: If bli_obj_imag_part() is called on a real object, it does
nothing, leaving the destination object untouched. The caller must
take care to only call the function on complex objects.
- Reordered some of the static functions in bli_obj_macro_defs.h related
to aliasing.
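
One way to picture the real-part and imaginary-part aliases is as real-valued views with a doubled stride into the complex buffer. The sketch below uses C99 double complex and hypothetical names (my_rvec_t, my_real_part(), my_imag_part(), my_setv()); it handles complex input only and does not model the real-object caveat noted above.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical real-valued vector view: base pointer, length, stride. */
typedef struct { double *buf; size_t n; ptrdiff_t inc; } my_rvec_t;

/* C99 guarantees a complex value is laid out as two consecutive reals,
   so the real parts sit at even offsets and the imaginary parts at odd. */
my_rvec_t my_real_part(double complex *x, size_t n, ptrdiff_t incx)
{
    my_rvec_t v = { (double *)x, n, 2 * incx };
    return v;
}

my_rvec_t my_imag_part(double complex *x, size_t n, ptrdiff_t incx)
{
    my_rvec_t v = { (double *)x + 1, n, 2 * incx };
    return v;
}

/* Set every element of a real view to alpha. */
void my_setv(double alpha, my_rvec_t v)
{
    assert(v.buf != NULL);
    for (size_t i = 0; i < v.n; ++i)
        v.buf[(ptrdiff_t)i * v.inc] = alpha;
}
```

Setting only the imaginary parts then reduces to an ordinary real setv on the view returned by my_imag_part().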

commit b65d0b841b7e4357bc2cf743bbb03384a3ab0bfa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 14:38:41 2018 -0500

Fixed bug in bli_dt_proj_to_complex().

Details:
- Fixed a bug identical to the one fixed in 0a4a27e, except this time in
the bli_obj_param_defs.h header file. It looks like the only consumers
of this static function were in bli_l0_oapi.c, and so this may not have
been manifesting (yet).

commit 55b6abdf7458e31df3ad01796d67c2332c776948
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 14:08:12 2018 -0500

Enforce consistent datatypes in most object APIs.

Details:
- Added logic to level-1v, -1d, -1f, -1m, -2, and -3 operations' _check()
functions to ensure that all operands are of the same datatype. There
are some exceptions that were left out, such as the _check() function
for the various norm operations since they have a different idea of
datatype consistency (ie: the norm object must be the real projection
of the primary input vector/matrix object).

commit 513138b1a1ecebd015580423c779810cae5c67f2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 12:24:47 2018 -0500

Defined/implemented bli_projv().

Details:
- Added an implementation for bli_projv() to go along with the
implementation of bli_projm() added in 0a4a27e. The only difference
between the two is that bli_projv() may only be used on vectors,
whereas bli_projm() is general-purpose.
- Added a _check() function corresponding to bli_projv().

commit 5f71c1e719eb482b2a4e40daa280c4f7d05b6963
Merge: b5a641e9 3df39b37
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 19:06:14 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit b5a641e968469805906eb2c971384d12ad1beac5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 19:05:37 2018 -0500

Added char-to-dt and dt-to-char mapping functions.

Details:
- Defined additional functions in bli_param_map.c:
bli_param_map_char_to_blis_dt()
bli_param_map_blis_to_char_dt()
which will map a char to its corresponding num_t, or vice versa.
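
A two-way mapping of this kind might look like the following Python sketch. The enum NumT, its member names, and the helper names are hypothetical stand-ins for num_t and the two C functions; the characters are assumed to follow the usual BLAS letters (s, d, c, z).

```python
from enum import Enum


class NumT(Enum):
    """Stand-in for num_t; each value is the assumed datatype character."""
    FLOAT = "s"
    DOUBLE = "d"
    SCOMPLEX = "c"
    DCOMPLEX = "z"


def char_to_dt(c):
    """Map a datatype character to its enum member (ValueError if unknown)."""
    return NumT(c)


def dt_to_char(dt):
    """Map an enum member back to its datatype character."""
    return dt.value
```

The two helpers are inverses of each other, so a round trip through either direction returns the original value.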

commit 0a4a27e1a4487480410bc0b1bb034bcf97583214
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 19:02:29 2018 -0500

Defined/implemented bli_projm().

Details:
- Defined a new operation in frame/base/bli_proj.c, bli_projm(), which
behaves like bli_copym(), except that operands a and b are allowed to
contain data of differing domains (e.g. a is real while b is complex,
or vice versa). The file is named bli_proj.c, rather than bli_projm.c,
with the intention that a 'v' vector version of the function may be
added to the same file (at some point in the future).
- Added supporting bli_check_*() functions in bli_check.c to confirm
consistent precisions between two datatypes/objects, as well as the
appropriate error message in bli_error.c and a new error code in
bli_type_defs.h.
- Wrote a bli_projm_check() function to go along with bli_projm().
- Defined static function bli_obj_real_part() in bli_obj_macro_defs.h,
which will initialize an obj_t alias to the real part of the source
object.
- Fixed a bug in the static function bli_dt_proj_to_complex(), found
in bli_param_macro_defs.h. Thankfully, there were no calls to the
function to produce buggy behavior.
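
As a rough picture of a copy across domains, consider the Python sketch below. It is not the bli_projm() implementation. It assumes that projecting real data into a complex destination sets the imaginary parts to zero and that projecting complex data into a real destination keeps only the real parts; the commit message does not spell this out. The function name and the to_complex flag are hypothetical.

```python
def proj_matrix(a, to_complex):
    """Copy matrix `a` (a list of rows) into the requested domain."""
    if to_complex:
        # Real -> complex: imaginary part becomes zero.
        # Complex -> complex: plain copy.
        return [[complex(v) for v in row] for row in a]
    # Complex -> real: keep only the real part.
    # Real -> real: plain copy.
    return [[v.real if isinstance(v, complex) else float(v) for v in row]
            for row in a]
```

When the source and destination domains match, the function reduces to an ordinary copy, which is the sense in which the operation "behaves like" a copy.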

commit 3df39b37a0134befa34b6b6259db98467c7bc965
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 15:35:05 2018 -0500

Fixed recently broken input.operations.fast.

Details:
- Removed "test sequential front-end" lines from microkernel test
entries of input.operations.fast. This change was meant for inclusion
in bd02c4e but was missed due to slightly different wording of the
comment (I used "sed //d" to remove the lines). This fixes the broken
'make checkblis-fast' (and 'make check') targets.

commit 695cd520e2f5eab938f66afe9fe36201ab2700c5
Author: sraut <Biplab.Rautamd.com>
Date: Wed Jun 6 11:48:56 2018 +0530

AMD Copyright information changed to 2018

Change-Id: Idfd11afd5d252f8063d0158680d24bf7e2854469

commit df1dd24fd896821de60917b429f303bab7fd0d4b
Author: sraut <Biplab.Rautamd.com>
Date: Wed Jun 6 11:24:33 2018 +0530

small matrix trsm intrinsics optimization code for AX=B and XA'=B

Change-Id: I90123c4d9adbd314c867995cd19dc975150b448c

commit 3f48c38164b4135515b5c752c506fdccc4480be2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 5 16:52:35 2018 -0500

Cosmetic fix to configure output in config.mk.

Details:
- Fixed configure so that MK_ENABLE_MEMKIND is assigned "no" when the
option is disabled due to libmemkind not being present. This wasn't
affecting anything since the one use of the variable (in common.mk)
was formulated as "ifeq ($(MK_ENABLE_MEMKIND),yes)". That is, the
variable being empty was effectively equivalent to it being set to
"no".
- Comment updates to build/config.mk.in, common.mk.

commit 5df201260f64aa98a365931f6d2da70144d69932
Merge: 1b9af85e 96d2774b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 5 16:14:19 2018 -0500

Merge branch 'master' into dev

commit 1b9af85ec98d91bb2b27aadaa3df344d18faff35
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 5 16:07:13 2018 -0500

Updated ref99 call to _cntx_set_thrloop_from_env().

Details:
- Reordered the arguments in the ref99 sandbox's call to
bli_cntx_set_thrloop_from_env() to be consistent with the updated
function signature from f97a86f. Thanks to Devangi Parikh for
reporting this issue.

commit 96d2774b4cb44ff1e8b5798d7cfc83154a607624
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Tue Jun 5 14:17:39 2018 +0200

Make bli_auxinfo_next_b() return b_next, not a_next (216)

commit d4c24ea5f644eb635046e7fe249d3e8e58b4c98a
Author: sraut <biplab.rautamd.com>
Date: Tue Jun 5 15:42:59 2018 +0530

copyright message changed to 2018

Change-Id: I33c1ebda41bc7f1973ff19e3b1947bdad62b4d44

commit 3f1ba4e646776699ebfaa042fe24691d9e2f55d0
Author: sraut <biplab.rautamd.com>
Date: Tue Jun 5 14:21:13 2018 +0530

copyright changed to 2018

Change-Id: Ie916c7cd6f95aedc3cab6eec3a703c9ddb333bc3

commit bd02c4e9f7fe07487276e61507335d48c8e05f35
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 4 13:42:17 2018 -0500

Cleanups to testsuite, input.operations format.

Details:
- Removed the line in each operation entry in input.operations titled
"test sequential front-end" and the corresponding support for the lines
in the testsuite input parsing code. This line was included in some
of the earliest versions of the testsuite, back when I intended to
eventually have separate multithreaded APIs. Specifically, I envisioned
that multithreaded and sequential testing could be enabled or disabled
on an operation level. However, BLIS evolved in a different direction
and still does not have multithreaded-specific APIs (even if it
eventually will). But even if it did have such APIs, I doubt I would
allow the user to enable/disable them on an operation level. Thus, this
was a zombie future parameter that was never used and never made sense
to begin with. The one instance of the front_seq variable, used in the
various libblis_test_<operation>() functions to guard the call to the
operation test driver, that remains was commented out instead of
deleted so that someday it could be easily changed via sed, if desired.
- Various minor cleanups to the testsuite code, including consolidating
use of DISABLE and DISABLE_ALL and reexpressing certain conditional
expressions in the libblis_test_<operation>() functions in terms of
boolean functions.

commit 2c6d99b99e50d70f904da298a0c59be16cc5c180
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jun 3 18:13:36 2018 -0500

Fixed names out of alphabetical order in CREDITS.

commit 7a207e8f2c5046f8b295a78e029ff2de765c7409
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jun 3 18:04:27 2018 -0500

Disabled indirect blacklisting (issue 214).

Details:
- Return early from function, pass_config_kernel_registries(), that
implements indirect blacklisting of subconfigurations (during pass 0).
In short, I realized that indirect blacklisting is not needed in the
situations I envisioned, and can actually cause problems under certain
circumstances. Thanks to Tony Skjellum for reporting the issue (214)
that led to this commit, and to Devin Matthews for prompting me to
realize that indirect blacklisting was unnecessary, at least as
originally envisioned.

commit d7fb32682057c7458c8891c0eedafc374fd9beef
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jun 3 13:20:37 2018 -0500

Fixed syntax artifacts from 4b36e85 in examples.

Details:
- Fixed artifacts of malformed recursive sed expressions used when
preparing 4b36e85, in which most function-like macros were converted
to static functions. The syntactically defective code was contained
entirely in examples/oapi. Thanks to Tony Skjellum for reporting this
issue.
- Update to CREDITS file.

commit ed7dedfd4a07eefeb5a038f9899afb8053b45383
Merge: f97a86f3 469727d4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 2 20:29:53 2018 -0500

Merge branch 'master' into dev

commit f97a86f322a6e3e31f33c89befc66189b0b8c64f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 2 20:28:20 2018 -0500

Updated setting/querying pack schema (cntx->cntl).

- Query pack schemas in level-3 bli_*_front() functions and store those
values in the schema bitfields of the corresponding obj_t's when the
cntx's method is not BLIS_NAT. (When method is BLIS_NAT, the default
native schemas are stored to the obj_t's.)
- In bli_l3_cntl_create_if(), query the schemas stored to the obj_t's in
bli_*_front(), clear the schema bitfields, and pass the queried values
into bli_gemm_cntl_create() and bli_trsm_cntl_create().
- Updated APIs for bli_gemm_cntl_create() and bli_trsm_cntl_create() to
take schemas for A and B, and use these values to initialize the
appropriate control tree nodes. (Also cpp-disabled the panel-block cntl
tree creation variant, bli_gemmpb_cntl_create(), as it has not been
employed by BLIS in quite some time.)
- Simplified querying of schema in bli_packm_init() thanks to above
changes.
- Updated openmp and pthreads definitions of bli_l3_thread_decorator()
so that thread-local aliases of matrix operands are guaranteed, even
if aliasing is disabled within the internal back-end functions (e.g.
bli_gemm_int.c). Also added a comment to bli_thrcomm_single.c
explaining why the extra aliasing is not needed there.
- Change bli_gemm() and level-3 friends so that the operation's ind()
function is called only if all matrix operands have the same datatype,
and only if that datatype is complex. The former condition is needed
in preparation for work related to mixed domain operands, while the
latter helps with readability, especially for those who don't want to
venture into frame/ind.
- Reshuffled arguments in bli_cntx_set_thrloop_from_env() to be
consistent with BLIS calling conventions (modified argument(s) are
last), and updated all invocations in the level-3 _front() functions.
- Comment updates to bli_cntx_set_thrloop_from_env().

commit 965db85d29977d228ea744581edf2b682eb8e8a8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 1 12:32:15 2018 -0500

Updated macro invocations in bli_gemm_ker_var2.c.

Details:
- Updated "get next a/b micropanel" macro invocations in
bli_gemm_ker_var2.c according to changes in 9588625.
- Comment update in bli_cntx.c.

commit 8749fa0b48a7710f4115023e2c46bc80167bc8f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 31 12:34:01 2018 -0500

Cleanups to ref99/README.md, test/3m4m/Makefile.

Details:
- Minor edits to sandbox/ref99/README.md.
- Removed cpp guards in sandbox/ref99/thread/blx_gemm_thread.h to be
consistent with other headers in sandbox/ref99.
- Additional targets and related cleanups in test/3m4m/Makefile.

commit 9588625c43c86ef1bde8140f620a30f52420e6a6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 30 15:19:53 2018 -0500

Renamed "next micropanel" macros in _l3_thrinfo.h.

Details:
- Renamed several macros defined in bli_l3_thrinfo.h designed to compute
the values of a_next and b_next to insert into an auxinfo_t struct in
level-3 macrokernels. (Previously, the macros did not use a bli_
prefix.)
- Updated instances of above macro usage within various macrokernels.

commit e4420591225fca2f63ca74ef6a23b962fcd4bec0
Merge: 34f974d1 850a8a46
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 17:12:22 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 34f974d1a83a7d29ba09f67e392d361231fdf99c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 17:11:52 2018 -0500

More tweaks/updates to sandbox/ref99/README.md.

commit 850a8a46c0a569a2652d8c200e5c53b61bcf988d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue May 29 13:51:21 2018 -0500

Test all x86_64 configurations*... (212)

* Add custom SDE cpuid files.

* Set up testing of all x86_64 architectures (except bulldozer) using SDE.

* Update .travis.yml

[ci skip]

* Update do_testsuite.sh

[ci skip]

* Updated .travis.yml with my secret token.

Details:
- Replaced Devin's temporary secret token with my own, which is used by
Travis when accessing the Intel SDE via Dropbox.

* Work around CPUID dispatch in glibc/libm by patching ld.so.

* Detect path of loader at runtime.

* Attempt to make SDE run on Travis

* Allow unpatched ld.so if we don't know how to patch it.

I *think* this only happens for older glibc without the multi-arch stuff (e.g. Ubuntu 14.04 on Travis), but who knows?

* Upgrade Travis to gcc-6 and binutils-2.26.

* Try to get Travis to use the right assembler.

* Apparently you need ld-2.26 too.

* Try to also patch ld.so from Ubuntu 14.04.

* Take the nuclear option.

* Account for non-absolute dependencies in ldd output.

* String manipulation fail.

* Update patch-ld-so.py

* Add Zen to SDE testing.

* Removed dead variable from travis/do_testsuite.sh.

Details:
- Removed 'BLIS_ENABLE_TEST_OUTPUT=yes' from make invocations in
travis/do_testsuite.sh. This variable is no longer present in the
BLIS build system (if it ever was?), and therefore has no effect.

commit 42ea02a34e5c144893fe239ae55daef895d92677
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 12:48:14 2018 -0500

Renamed c99 sandbox to ref99.

Details:
- Renamed sandbox/c99 to sandbox/ref99. I wanted to name the sandbox so
that it would be thought of as a "reference" sandbox. I kept the "99"
to differentiate it from future reference sandboxes that may be
written in another language (such as C++).
- Updates to sandbox/ref99/README.md.

commit 0e7205ccef50dccd4306cf427a63633396472813
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 12:36:13 2018 -0500

Remove sandbox/.gitkeep now that dir is non-empty.

commit 3a4603858e3819cbd6ed7dd67d0fc0b3f89ed254
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat May 26 15:51:08 2018 -0500

More README.md updates to sandbox/c99.

Details:
- Added a section that walks the reader through how to configure BLIS to
use a gemm sandbox.

commit 2bad97f6bdf4642884d60fc03970549902a54d74
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat May 26 15:31:16 2018 -0500

Updates to CREDITS, sandbox/c99/README.md.

commit 2b4a447526effa3e847a7e5c15c3758573f12318
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 25 18:51:23 2018 -0500

Initial implementation of c99 "reference" sandbox.

Details:
- Added a c99 sandbox (in sandbox/c99) to serve as a starting point for
others looking to experiment with alternative implementations of gemm
in BLIS. Note that this sandbox implementation is a first draft and
will be refined over time.
- Minor updates to Makefile and common.mk to restrict what source files
get recompiled when sandbox files are touched.
- Added an initial draft of a README.md in sandbox/c99.

commit 469727d4f8a976d8713afb4d0b6235c322498db0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 25 16:17:13 2018 -0500

Very minor comment updates.

commit 66dbe69a0f9359bf1e39b5672ee365213de2e3ee
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 25 15:45:53 2018 -0500

Converted macros to static funcs in _packm_cntl.h.

Details:
- Converted various macros in frame/1m/packm/bli_packm_cntl.h (designed
to access fields of a packm_params_t struct) to static functions.

commit 22deef2f5463a47e3b3c37fc313d17550f10ee06
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 24 14:28:55 2018 -0500

Support alternative gemm implementation sandboxes.

Details:
- configure:
- add support for --enable-sandbox=NAME to configure script, where NAME
is a subdirectory of a new 'sandbox' directory that contains an
alternative implementation of gemm. (For now, only implementations of
gemm may be provided via a sandbox.);
- add support for C++ compiler. C++ compilers are handled in a manner
similar to that of C compilers, in that a default search order is
used, and that CXX is searched for first, if the variable is set. In
practice, the C++ compiler that is selected should correspond to the
selected C compiler. (Example: If gcc is selected for C, g++ should
be selected for C++.) The result of the search is output to config.mk
via build/config.mk.in. NOTE: The use of C++ in BLIS is still
hypothetical, but may eventually move to being experimental. This
support was intended only for use of C++ within a gemm sandbox.
- build/config.mk.in:
- define SANDBOX variable containing sandbox subdirectory name.
- build/bli_config.h.in:
- define either of the BLIS_ENABLE_SANDBOX or BLIS_DISABLE_SANDBOX
macros in bli_config.h.
- common.mk:
- include makefile fragments that were propagated into the specified
sandbox subdirectory;
- generate different CFLAGS for sandboxes, as well as a separate
CXXFLAGS variable for sandboxes when C++ source files are compiled;
- isolate into a single location lists of file suffixes for various
purposes.
- reorganize/clean up code related to identifying header files and
paths.
- Makefile:
- generate object filepaths for and compile source code files found in
sandbox sub-directory;
- remove makefile fragments placed in sandbox sub-directory (cleanmk);
- various other cleanups.
- Added .cc, .cpp, and .cxx to list of suffixes of files to recognize in
makefile fragments (via build/gen-make-frags/suffix_list).
- Updated blis.h to conditionally include bli_sandbox.h (via a new file,
bli_sbox.h), which each sandbox is assumed to use for any type
definitions and function prototypes it wishes to export out to blis.h.
- Conditionally disable bli_gemmnat() implementation in frame/3 when
BLIS_ENABLE_SANDBOX is defined.

commit 25e3501ed57a0db7f860c88b7199b36049aec12a
Merge: 216a4cb9 5140ee34
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 24 13:57:16 2018 -0500

Merge branch 'master' into dev

commit 5140ee3424c744981a3fed3b5a748ebbfc111388
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 23 16:56:14 2018 -0500

Updated types of bli_is_[un]aligned_to() functions.

Details:
- Changed the void* arguments of the following static functions:
bli_is_aligned_to()
bli_is_unaligned_to()
bli_offset_past_alignment()
to siz_t, and the return type of bli_offset_past_alignment() from
guint_t to siz_t. This allows for more versatile usage of these
functions (e.g. when aligning both pointers and leading dimension).
- Updated all invocations of these functions, mostly in kernels/penryn
but also in kernels/bgq, to include explicit typecasts to siz_t when
pointer arguments are passed in.
- Thanks to Devin Matthews for pointing out this potential bug (via issue
211).
- Deleted a few trailing spaces in various penryn kernels.
- Removed duplicate instances of the words "derived" and "THEORY" from
various kernel license headers, likely from a malformed recursive sed
performed long ago.
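
The benefit of taking an integer rather than a pointer can be seen in a small Python model. The function bodies below are assumptions inferred from the function names, not copies of the BLIS definitions; Python integers stand in for siz_t.

```python
def is_aligned_to(p, size):
    """True if the integer p is a multiple of size."""
    return p % size == 0


def is_unaligned_to(p, size):
    """True if the integer p is not a multiple of size."""
    return p % size != 0


def offset_past_alignment(p, size):
    """Number of bytes by which p exceeds the previous multiple of size."""
    return p % size
```

Because the argument is just an integer, the same helper serves an address (for example 0x1000) and a leading dimension expressed in bytes (for example 100 doubles, or 800 bytes).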

commit 216a4cb9cb87fa4c93f6ceb6ae90602e5018b305
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 18 18:47:03 2018 -0500

Minor update to flatten-headers.[py|sh] help text.

Details:
- Fixed a typo and removed some outdated language from the help text of
flatten-headers.py and flatten-headers.sh.

commit 962a706a6f56ea070ac4683f0af69c7e59af8ecb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 18 18:19:40 2018 -0500

Updated LICENSE file to mention HP Enterprise.

Details:
- Added HP Enterprise to the LICENSE file. Previously, only the source
files touched by HPE contained the corresponding copyright notices.
(This oversight was unintentional.)
- Updated file-level copyright notices to include a comma, to match
the formatting used for UT and AMD copyrights.

commit efa43e13effe901ad31e734ac90f027e89473bd9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 18 12:20:40 2018 -0500

More updates to CREDITS and RELEASING files.

commit f94ab97af8e86baf9ee9a9cbaef8bb3712df2e11
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 17:45:31 2018 -0500

Update to CREDITS file.

commit 4919b10c005e006a6d818eb8f865f9dbd8aa16df
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 16:38:49 2018 -0500

Minor changes to README.md and CONTRIBUTING.md.

commit b89451187e8321b673a1cf7603c8d48028d9d4c8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 16:23:06 2018 -0500

README.md update.

Details:
- Added "Contributing" section with relevant links.

commit af244194e7d76276a1b90fe59f9307dde0429e1d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 15:38:02 2018 -0500

Removed explicit critical sec. from bli_memsys.c.

Details:
- Removed critical sections protecting the initialization/finalization of
bli_memsys.c. These synchronization mechanisms are no longer needed now
that BLIS initializes all APIs via pthread_once().

commit 10c9e8f95254d8c6436c4d3cb093fa5544b45c90
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 15:22:51 2018 -0500

Cache hardware's arch_t id after querying once.

Details:
- Added logic to bli_arch.c that will call what was previously the body
of bli_arch_query_id() only once and then cache the value in a static
variable local to the file. (Previously, the arch_t associated with
the hardware/configuration was queried every time bli_arch_query_id()
was called, which was at least once per level-3 function call.) Thanks
to Devin Matthews for suggesting this feature via issue 175.
- Added -lpthread to the compile/link command line of the compiler
invocation that compiles build/detect/config/config_detect.c, which
prints the string identifying the detected configuration, since it
is now needed due to new pthread_once() logic in bli_arch.c.
- Implementation note: I chose to implement this arch_t caching feature
via pthread_once(), using a separate pthread_once_t variable local to
the file, rather than calling bli_init_once(). The reason is that I
did not want to require bli_init() as a prerequisite to this function.
bli_init() already calls several sub-components, some of which make use
of bli_arch_query_id(), and therefore it would be easy to fall into a
circular self-init situation (which usually causes pthreads to hang
indefinitely).
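
The "query once, then cache" pattern can be sketched in Python as below. This is an analogue of the pthread_once() approach, not the bli_arch.c code: the class OnceCache, the helper run_concurrently, and the use of a lock with a double check are all choices made for this illustration.

```python
import threading


class OnceCache:
    """Run an expensive query at most once and cache its result."""

    def __init__(self, query):
        self._query = query
        self._lock = threading.Lock()
        self._done = False
        self._value = None

    def get(self):
        if not self._done:                 # fast path once initialized
            with self._lock:
                if not self._done:         # re-check while holding the lock
                    self._value = self._query()
                    self._done = True
        return self._value


def run_concurrently(fn, n):
    """Call fn from n threads and wait for all of them to finish."""
    threads = [threading.Thread(target=fn) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The sketch also shows the hazard mentioned in the implementation note: because the lock is not reentrant, a query function that itself called get() on the same cache would block forever.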

commit f28a15293890ac6fbceac229fd204dbc9fec6e27
Author: Francisco Igual <figualucm.es>
Date: Thu May 17 09:26:14 2018 +0000

Fixed clobber list bug in ARMv8 ukernel

commit 2e31dd7852b4d6a9355899cf9659d4b8130461cb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 16 17:28:33 2018 -0500

Inserted missing integer typecasting into ukernels.

Details:
- Inserted missing safeguards into most microkernels to ensure that the
integers read by the microkernel's assembly instructions are of the
appropriate size. In many cases, this bug was going undetected likely
because the compiler was inserting zero padding before the integers
in the calling function, allowing the assembly code to read 64-bits
in a way that did not corrupt the "lower" 32 integer bits with garbage
in the higher bits. Thanks to Francisco Igual and Devangi Parikh for
finding this issue.
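
The failure mode described above can be reproduced with Python's struct module. This is only a model of the memory layout, assuming a little-endian machine; the helper names are hypothetical and no BLIS code is involved.

```python
import struct


def store_int32(value, neighbor):
    """A 32-bit integer followed by 4 bytes of whatever sits next to it."""
    return struct.pack("<iI", value, neighbor)


def store_widened(value):
    """The same integer explicitly widened to 64 bits before being stored."""
    return struct.pack("<q", value)


def read_as_int64(memory):
    """What a 64-bit load of those 8 bytes would see."""
    return struct.unpack("<q", memory[:8])[0]
```

If the neighboring bytes happen to be zero padding, the 64-bit read returns the intended value and the bug goes unnoticed; with any other neighbor the value is corrupted. Widening the integer before the load removes the dependence on neighboring bytes.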

commit 12dfa9516428b4092554f0ce70b07571d35de222
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 16 12:46:57 2018 -0500

Fixed a bug in determining default integer size.

Details:
- Fixed a bug that would cause configurations to inadvertently define
their integers to be 32 bits when those environments actually call for
64-bit integers. While either BLIS_ARCH_64 or BLIS_ARCH_32 is defined
in bli_system.h (based on whether preprocessor macros such as __x86_64
or __aarch64__ are defined by the environment), bli_system.h was being
included *after* bli_config_macro_defs.h, in which the BLIS_ARCH_64
macro was used to choose an integer type size in the event that
BLIS_INT_TYPE_SIZE was not already defined by configure via
bli_config.h. And due to the structure of the cpp code in that file,
the 32-bit integer case was being chosen. Thanks to Francisco Igual
and Devangi Parikh for their help in isolating this bug.
- Moved the include of hbwmalloc.h and related preprocessor code to
bli_kernel_macro_defs.h to facilitate the reshuffling of the include
for bli_system.h in blis.h.

commit f930cec0f35824c0f9ebbd218614209217d491cb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 15 17:47:08 2018 -0500

More tweaks to CONTRIBUTING.md.

commit 173e30ff7d293ba31f3fab8ab0c0a695eda3d4fd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 15 14:48:34 2018 -0500

Added initial draft of CONTRIBUTING.md file.

Details:
- Thanks to the Ruby on Rails project for providing a good template off
of which to build.

commit 6e25e758b444bf725046674e1e64c6a52421749d
Author: Nico Schlömer <nico.schloemergmail.com>
Date: Tue May 15 14:03:20 2018 +0200

Debian config (206)

* add debian config

* correct wording in the README

commit fcf6c6a3c87da08a7cdb92b102489b991ef7a644
Author: Alex Arslan <ararslancomcast.net>
Date: Mon May 14 18:41:03 2018 -0700

Fix shared library builds on platforms other than Linux and macOS (209)

* Fix detection of systems other than Linux and macOS

The way the logic is currently laid out, any platform that isn't Linux
gets assigned the .dylib shared library extension and the macOS-specific
compiler flags. This reverses the logic to check for macOS first, and
have the fallback use the Linux definitions, which apply to most other
systems as well.

* Use SHLIB_EXT instead of SO_SUF

The former is more standard, as jakirkham pointed out in a comment.
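
The reversal of the platform check can be illustrated with two small Python functions. They model the described logic only; the function names are hypothetical and the OS strings are assumed to be the values reported by 'uname -s'.

```python
def shlib_ext_before(os_name):
    """Old logic: anything that is not Linux is treated like macOS."""
    return "so" if os_name == "Linux" else "dylib"


def shlib_ext_after(os_name):
    """New logic: check for macOS first, fall back to the Linux definitions."""
    return "dylib" if os_name == "Darwin" else "so"
```

Both versions agree on Linux and macOS; they differ only for other systems such as FreeBSD, which previously received the macOS extension.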

commit 6f7f51048c48f31d691c06451d0fd2cbc453ad03
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 18:41:56 2018 -0500

Echo cc_vendor when printing compiler version.

Details:
- Echo the ${cc_vendor} when informing the user of the compiler's version.
Previously, the actual ${cc} (which could be a path to the executable)
was being printed, which has already been printed by that point in the
configure script.

commit ad67dc4e348b0a381efc057573a6b03cc7e26db0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 18:35:28 2018 -0500

Communicate cc, cc_vendor to make via config.mk.

Details:
- Historically, the compiler selection has happened statically in the
various make_defs.mk and would only be overridden by setting CC (either
prior to running configure or as a configure argument). However, in
the last couple months, configure has evolved to contain rather
sophisticated compiler detection logic for the purposes of blacklisting
sub-configurations. It only makes sense that configure now fully take
over the responsibility of selecting a compiler from the GNU make side
of the build system. Thanks to Alex Arslan for his help exposing this
issue.
- Substitute found_cc into CC in config.mk via configure.
- Set a new variable, CC_VENDOR, in config.mk via substitution from
configure, and disable the corresponding CC_VENDOR code in common.mk.
- Disabled default compiler selection (usually gcc) in the sub-configs'
various make_defs.mk files.

commit 20af119fc97ec6120017a7a5ba5f9aaa920c7640
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 17:44:58 2018 -0500

Added README.md to 'config' directory.

Details:
- Added a brief README.md file to the config directory to redirect those
who may be exploring the source tree to the ConfigurationHowTo wiki.
(Included is a very brief explanation of configurations for those who
don't have time to read the wiki.) Thanks to Nico Schlömer for this
suggestion.

commit 9dbce16269c3e1f27c7a0d64372cc76aed30dfc1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 17:04:54 2018 -0500

Search for 'cc clang gcc' on OpenBSD, FreeBSD.

Details:
- Swapped gcc and clang in the compiler search list for OpenBSD.
- Use the same search list for FreeBSD as above.

commit 55ebf24d63128b5fd15b10160485667415a02a55
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 16:19:08 2018 -0500

Change compiler search order on OpenBSD.

Details:
- Set a compiler search list (and order) as a function of the OS detected
via 'uname -s'. By default, this list and order is 'gcc clang cc' for
Linux, Darwin (OS X), and any other OS except OpenBSD. On OpenBSD,
we use 'cc gcc clang' because OpenBSD's default installation of gcc
(4.2.1) is too old for BLIS. Thanks to Alex Arslan for reporting this
issue and suggesting a fix.
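
The OS-dependent search order as of this commit can be sketched in Python. This is a model of the described behavior, not the configure script: the function names are hypothetical, and the availability test is passed in as a callable so the logic can be exercised without real compilers.

```python
def compiler_search_list(os_name):
    """Return the compiler search order for an OS name from 'uname -s'."""
    if os_name == "OpenBSD":
        return ["cc", "gcc", "clang"]   # the default gcc there is too old
    return ["gcc", "clang", "cc"]


def find_compiler(os_name, is_available):
    """Return the first available compiler in the search list, or None."""
    for candidate in compiler_search_list(os_name):
        if is_available(candidate):
            return candidate
    return None
```

On a real system, the availability test could be something like `lambda c: shutil.which(c) is not None` (after `import shutil`).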

commit 4fb353bd90e6642c8aeffd1b1e6329f54eee4bb4
Merge: 4b36e85b 8a2857b5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun May 13 17:50:51 2018 -0500

Merge branch 'master' into dev

commit 8a2857b5e3c633b18c24f2275110437a702a71d0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 11 18:42:05 2018 -0500

Fixed README.md typo; mention 'make check'.

commit 543935c02f9335142d2e485a15f37dbaebe012ed
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 11 18:35:32 2018 -0500

Updated README.md with Ubuntu packages link.

Details:
- Created a separate section of README.md for external packages, with
one bullet each for Dave Love's rpms and Nico Schlömer's Ubuntu apt
packages. Thanks to Dave and Nico for their contributions.

commit af1d8470b56d3b2a1c8513d366d788dddcb84baa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 11 17:49:58 2018 -0500

Better handling of shared libraries on OS X.

Details:
- Use the .dylib shared library suffix on OS X (instead of .so in Linux).
- Link with the -dynamiclib and -install_name options on OS X (instead of
-shared and -soname in Linux).
- Determine operating system (e.g. Linux, Darwin) during configure and
substitute into config.mk.in rather than run 'uname -s' during make.
- Echo operating system during configure.

commit 4b72a462d7467cf815422aafac7b05037d2e3b13
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 10 18:35:38 2018 -0500

Enable building shared library by default.

Details:
- Tweaked configure so that the shared library is generated by default.
- Updated --help text and configure's feedback messages reporting the
status of the static/shared builds.
- Changed the order of build product installation so that headers are
installed last, after libraries and symlinks.

commit b699bb1ff03c6e9baaa054805b4939983ae7145b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 10 15:54:17 2018 -0500

Adopt Linux-like .so versioning at install-time.

Details:
- Changed the naming conventions used for installed libraries and
symlinks to more closely mirror patterns used by typical GNU/Linux
libraries. Whereas previously static and shared libraries were
installed and symlinked as follows:

(library) libblis-0.3.2-15-haswell.a
(library) libblis-0.3.2-15-haswell.so
(symlink) libblis.a -> libblis-0.3.2-15-haswell.a
(symlink) libblis.so -> libblis-0.3.2-15-haswell.so

we now use the following naming conventions:

(library) libblis.a
(symlink) libblis.so -> libblis.so.0.1.2
(symlink) libblis.so.0 -> libblis.so.0.1.2
(library) libblis.so.0.1.2

where 0.1.2 indicates shared library major, minor, and build versions
of 0, 1, and 2, respectively. The conventional version string can
still be queried by linking to the library in question and then calling
bli_info_get_version_str(). (The testsuite binary does this
automatically at startup.)
- Added logic to common.mk to set the soname field in the shared library
via the -soname linker flag.
- Added a 'so_version' file to the top-level directory containing two
lines. The first line specifies the .so major version number, and the
second line specifies the minor and build version numbers joined with
a '.'. This file is read by configure and those values substituted
into build/config.mk.in to define SO_MAJOR, SO_MINORB, and SO_MMB
variables.
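
The relationship between the 'so_version' file and the installed names can be sketched in Python. This is an illustration rather than the configure or Makefile logic: the function names are hypothetical, and SO_MMB is assumed to be the major version joined to the second line with a '.'.

```python
def parse_so_version(text):
    """Parse the two-line so_version contents into the three variables."""
    major, minorb = text.split()[:2]
    return {
        "SO_MAJOR": major,
        "SO_MINORB": minorb,
        "SO_MMB": f"{major}.{minorb}",   # assumed composition
    }


def installed_names(v):
    """Return the installed library file names and symlinks."""
    real = f"libblis.so.{v['SO_MMB']}"
    return {
        "libraries": ["libblis.a", real],
        "symlinks": {
            "libblis.so": real,
            f"libblis.so.{v['SO_MAJOR']}": real,
        },
    }
```

With file contents of "0" and "1.2", this reproduces the names listed in the commit message: libblis.so and libblis.so.0 both point at libblis.so.0.1.2.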

commit fc2d9ec6bf46f6e5b19d196208415ce433e95b10
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 9 15:19:28 2018 -0500

Tweaks to top-level clean and distclean targets.

Details:
- Moved the removal of bli_config.h from cleanh to distclean.
- Removed cleantest as a dependency of clean.

commit bf0350305971e3991861b5117a13fda31ff97b6d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 8 16:49:22 2018 -0500

Renamed (shortened) a few build system variables.

Details:
- Renamed the following variables in config.mk (via build/config.mk.in):
BLIS_ENABLE_VERBOSE_MAKE_OUTPUT -> ENABLE_VERBOSE
BLIS_ENABLE_STATIC_BUILD -> MK_ENABLE_STATIC
BLIS_ENABLE_SHARED_BUILD -> MK_ENABLE_SHARED
BLIS_ENABLE_BLAS2BLIS -> MK_ENABLE_BLAS
BLIS_ENABLE_CBLAS -> MK_ENABLE_CBLAS
BLIS_ENABLE_MEMKIND -> MK_ENABLE_MEMKIND
and also renamed all uses of these variables in makefiles and makefile
fragments. Notice that we use the "MK_" prefix so that those variables
can be easily differentiated (such as via grep) from their "BLIS_" C
preprocessor macro counterparts.
- Other whitespace changes to build/config.mk.in.
- Renamed the following C preprocessor macros in bli_config.h (via
build/bli_config.h.in):
BLIS_ENABLE_BLAS2BLIS -> BLIS_ENABLE_BLAS
BLIS_DISABLE_BLAS2BLIS -> BLIS_DISABLE_BLAS
BLIS_BLAS2BLIS_INT_TYPE_SIZE -> BLIS_BLAS_INT_TYPE_SIZE
and also renamed all relevant uses of these macros in BLIS source
files.
- Renamed "blas2blis" variable occurrences in configure to "blas", as
was done in build/config.mk.in and build/bli_config.h.in.
- Renamed the following functions in frame/base/bli_info.c:
bli_info_get_enable_blas2blis() -> bli_info_get_enable_blas()
bli_info_get_blas2blis_int_type_size()
-> bli_info_get_blas_int_type_size()
- Remove bli_config.h during 'make cleanh' target of top-level Makefile.

commit 4b36e85be9b516b4089b24768f881dd976668997
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 8 14:26:30 2018 -0500

Converted function-like macros to static functions.

Details:
- Converted most C preprocessor macros in bli_param_macro_defs.h and
bli_obj_macro_defs.h to static functions.
- Reshuffled some functions/macros to bli_misc_macro_defs.h and also
between bli_param_macro_defs.h and bli_obj_macro_defs.h.
- Changed obj_t-initializing macros in bli_type_defs.h to static
functions.
- Removed some old references to BLIS_TWO and BLIS_MINUS_TWO from
bli_constants.h.
- Whitespace changes in select files (four spaces to single tab).
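
One common motivation for this kind of conversion is that a static function evaluates each argument exactly once and type-checks it, whereas a function-like macro may evaluate an argument more than once. The sketch below illustrates the difference; MY_MAX_MACRO, my_max(), and next_value() are hypothetical stand-ins and are not BLIS's actual definitions.

```c
/* Hypothetical function-like macro: 'a' or 'b' is evaluated twice. */
#define MY_MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Equivalent static function: each argument is evaluated exactly once,
   and the compiler checks the argument types. */
static inline int my_max(int a, int b)
{
    return a > b ? a : b;
}

/* Helper with a visible side effect: increments the counter and
   returns its new value. */
static int next_value(int *counter)
{
    return ++(*counter);
}
```

Calling MY_MAX_MACRO(next_value(&c), 0) increments c twice, while my_max(next_value(&c), 0) increments it once.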

commit 7e5648ca150757b874f6823da832f3798c40b9f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 7 18:59:19 2018 -0500

Add configure support for --libdir, --includedir.

Details:
- Added support for two new configure options: --libdir and --includedir.
They specify the precise install directories for libraries and header
files, respectively, and override any location implied by the --prefix
option (including the default install prefix, if --prefix was not
given). Thanks to Nico Schlömer for suggesting this via issue 195.
- Removed the INSTALL_PREFIX definition/anchor from build/config.mk.in
and replaced it with corresponding definitions/anchors for libdir and
includedir.
- Updated top-level Makefile to use the new variables, INSTALL_LIBDIR
and INSTALL_INCDIR, instead of INSTALL_PREFIX (which is now no longer
needed by make).
- Set default sane values for INSTALL_LIBDIR and INSTALL_INCDIR in
common.mk when configure has not been run, as is already done for
DIST_PATH. This is to safeguard against statements in the top-level
Makefile that use 'find' to locate old libraries and headers for the
uninstall targets, which run regardless of make target. Without setting
INSTALL_LIBDIR and INSTALL_INCDIR, those variables are empty and the
'find' ends up looking at '/', which is obviously not what we want.
(Also enclosed those definitions in an IS_CONFIGURED guard so that they
won't get evaluated unless configure has been run.)
- Rearranged "ifeq ($(IS_CONFIGURED),yes)" conditionals in Makefile to
reduce occurrences and separated "local" and top-level components of
cleanblastest and cleanblistest targets to improve readability.
- Adjusted out-of-tree builds so that they are no longer oblivious to
the .git directories, if present, and thus now properly augment version
strings with the appropriate patch number.
- Include missing version string in 'configure --help' output.

commit b09e4e8852a6c42895910e3bcb9041124dc8bf9f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 7 14:37:50 2018 -0500

Allow 'make clean' and friends without configuring.

Details:
- Modified top-level Makefile so that a user can run 'make distclean',
'make clean', or any of the other clean-related targets prior to
running configure (or after a previous 'make distclean'). Thanks to
Nico Schlömer for suggesting this via issue 197.
- Made the cleanblastest and cleanblistest more comprehensive in that
they now clean out build products that would have resulted from local
compilation (ie: builds performed within the 'blastest' or 'testsuite'
directories).
- Added "cc" to list of expected compiler "vendors" since the CC variable
seems to automatically be set to "cc" on Ubuntu 16.04 (which is just an
alias to gcc).
- Comment update to build/config.mk.in.

commit 35c5a1449c3efe0b2ec43cdefcfdf00e71828149
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 7 12:04:57 2018 -0500

No longer update version file during configure.

Details:
- Recycled the core functionality of build/update-version-file.sh into a
function in configure, disabling the updating of the 'version' file in
the process. Instead of writing the patched version string back to the
version file and then reading it again from within configure, the
patched version string is now saved directly to a variable in the main()
function in configure. This will prevent developers from accidentally
committing configure-induced changes to the version file in between
releases.

commit 8adb2f919b62da4a2885ae04a10925e0e6a2e304
Author: Mathieu Poumeyrol <kaliusers.noreply.github.com>
Date: Sun May 6 19:58:16 2018 +0200

Some cross compilations fixes (198)

* cross-compilation fixes
* add doc ranlib variable
* icc support -dumpversion, posix compatible test, plus one stupid mistake
* retab
* revert version as requested

commit 89acd9ebe516eeb97006dba344354bfc98826645
Merge: 4cff432d 0557eba7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 2 12:53:35 2018 -0500

Merge branch 'amd'

commit 4cff432d707891ada705b039a7e043558bbf3c51
Author: Nisanth M P <31736542+nisanthmpamdusers.noreply.github.com>
Date: Wed May 2 23:20:42 2018 +0530

AMD specific optimizations for target 'zen' (194)

Re-enabled AMD-specific optimizations for zen.

Details:
- Re-enabled Zen-specific cache blocksizes for 'zen' sub-configuration.
- Re-enabled small matrix gemm optimization for 'zen'.
- These were both temporarily disabled during a previous merge simply due to lack of Zen hardware for testing.

commit 8eda5fe7f678b413cb274bd84716995a7d0b87a9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 2 12:20:37 2018 -0500

Typo fix in README.md.

commit 0557eba78f5fcf28f0f039f28da79498ffde848c
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Mar 19 12:49:26 2018 +0530

Re-enabling the small matrix gemm optimization for target zen

Change-Id: I13872784586984634d728cd99a00f71c3f904395

commit df78ceb3d6f33a27fe69017854405edaea7c40e5
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Mar 19 11:34:32 2018 +0530

Re-enabling Zen optimized cache block sizes for config target zen

Change-Id: I8191421b876755b31590323c66156d4a814575f1

commit 5e515f9a76f4aaf43dc21315a34d797726ca8069
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 1 13:44:10 2018 -0500

Tweaked new language in README.md.

commit 1ddd9e316ad5024af8b606dfcebd1e7d587a130f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 1 13:36:28 2018 -0500

Added link to Dave Love's Fedora Copr page.

Details:
- Added a blurb to README.md advertising Dave Love's Copr homepage,
which contains rpm packages for RHEL/Fedora-like distributions.

commit 078a852f738c66c6468bd5e64b06467edc9057fd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 30 16:15:26 2018 -0500

Minor tweaks to top-level 'make clean' target.

Details:
- Execute 'cleanh' target as part of 'clean'
- Remove cblas.h file from 'include/<configname>/' as part of 'cleanh'
target.
- Updated the echoed (non-verbose) text for uniformity.

commit 75d0d1057dda69c655bd1cd8f791cb39b54d99b8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 30 14:57:33 2018 -0500

Renamed various datatype-related macros/functions.

Details:
- Renamed the following macros in bli_obj_macro_defs.h and
bli_param_macro_defs.h:
- bli_obj_datatype() -> bli_obj_dt()
- bli_obj_target_datatype() -> bli_obj_target_dt()
- bli_obj_execution_datatype() -> bli_obj_exec_dt()
- bli_obj_set_datatype() -> bli_obj_set_dt()
- bli_obj_set_target_datatype() -> bli_obj_set_target_dt()
- bli_obj_set_execution_datatype() -> bli_obj_set_exec_dt()
- bli_obj_datatype_proj_to_real() -> bli_obj_dt_proj_to_real()
- bli_obj_datatype_proj_to_complex() -> bli_obj_dt_proj_to_complex()
- bli_datatype_proj_to_real() -> bli_dt_proj_to_real()
- bli_datatype_proj_to_complex() -> bli_dt_proj_to_complex()
- Renamed the following functions in bli_obj.c:
- bli_datatype_size() -> bli_dt_size()
- bli_datatype_string() -> bli_dt_string()
- bli_datatype_union() -> bli_dt_union()
- Removed a pair of old level-1f penryn intrinsics kernels that were no
longer in use.

commit 01c4173238baf08e7f6700a3f91a2ea58cca50c1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 28 14:07:34 2018 -0500

CHANGELOG update (0.3.2)

0.3.2

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 28 14:07:31 2018 -0500

Version file update (0.3.2)

commit cdf041ddadd8725e578e2f59f37ae341f26655af
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 28 14:05:00 2018 -0500

Use config.mk instead of common.mk in bump-version.sh.

Details:
- Fixed inadvertent targeting of common.mk when testing whether configure
had already been run, rather than config.mk.

commit 6ded8f9f0364b3c07255e2532ada3eeb2ed2a715
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 28 14:01:29 2018 -0500

Account for recent 'make distclean' in bump-version.sh.

Details:
- Added logic to build/bump-version.sh that will run './configure auto'
if 'common.mk' is not present (usually because 'make distclean' was run
recently).

commit 7c16fdce433f5dea0e83d5047553c955d8e46fd2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 28 13:50:55 2018 -0500

Fixed typo in RELEASING file.

commit 5e5ca4984fcf6d72d3036c338bb9cdc64520a325
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 28 13:48:01 2018 -0500

README updates.

Details:
- Updates to the top-level README files in the top-level directory as
well as the 'examples/oapi' directory.

commit 627b045e301defea6770dc5b64e1110cbec25153
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 27 18:11:19 2018 -0500

Added an example of using transposition with gemm.

Details:
- Added an example to examples/oapi/8level3.c to show how to indicate
transposition when performing a gemm operation.

commit 13a0eadc69d72933e322901f5b44944834e3c787
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 27 18:00:07 2018 -0500

Added more transposition/conjugation examples.

Details:
- Added code to examples/oapi/5level1m.c that demonstrates transposing
(and conjugate-transposing) unstructured matrices.
- Comment updates to 6level1m_diag.c to maintain consistency with new
examples in 5level1m.c.

commit 5606cd8881e75264a96af45dc8ea1905bab054f5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 27 17:13:10 2018 -0500

Added utility module to examples/oapi.

Details:
- Added a new code example file to examples/oapi demonstrating how to use
various utility operations.
- Comment updates to other example files.
- README updates.

commit ff26c94c6486374c709f93c6965ea18903bd6a18
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 27 12:31:34 2018 -0500

Added missing gcc version constraint for knl.

Details:
- Previously forgot to add explicit enforcement of a minimum gcc version
in configure script when 'knl' sub-configuration is requested.
- Comment updates to configure.

commit 4d97574e477b3e55ddbb6044b0542a92cd9bab30
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 24 18:48:09 2018 -0500

Added object API example code.

Details:
- Added an 'examples' directory at the top level.
- Added an 'oapi' subdirectory in 'examples' that contains a tutorial-like
sequence of example code demonstrating the core functionality of BLIS's
object-based API, along with a Makefile and README. Thanks to Victor
Eijkhout for being the first to suggest including such code in BLIS.

commit d6ab25a3232aa52b9b855088fb4b0b46ff2c00c8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 24 18:43:03 2018 -0500

Add setijm, getijm operations.

Details:
- Added bli_setgetijm.c, which defines bli_setijm(), bli_getijm(), and
related functions that can be used to read and write individual
elements of an obj_t.
- Defined a new function, bli_obj_create_conf_to(), in bli_obj.c that will
create a new object with dimensions conformal to an existing object.
Transposition and conjugation states on the existing object are ignored,
as are structure and uplo fields.
- Defined a new function, bli_datatype_string(), in bli_obj.c that returns
a char* to a string representation of the name of each num_t datatype.
For example, BLIS_DOUBLE is "double" and BLIS_DCOMPLEX is "dcomplex".
BLIS_INT is included (as "int"), but BLIS_CONSTANT is not, and thus is
not a valid input argument to bli_datatype_string().
- Added calls to bli_init_once() to various functions in bli_obj.c, the
most important of which was bli_obj_create_without_buffer().
- Removed unintended/extra newline from the end of printv output.
- Whitespace changes to
- frame/base/bli_machval.c
- frame/base/bli_machval.h
- frame/0/copysc/bli_copysc.c
- Trivial changes to README.md and common.mk.

commit a731a428f7fc02fd6ab4f953ead828c1d06fb5a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 17 16:44:55 2018 -0500

Another README.md update.

commit c734ee928a824b27d280a9a67b1b4bc8423d5795
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 17 16:40:05 2018 -0500

README.md update.

commit 03ecad372d8eb603ee905a7b944d0544a813460a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 17 14:16:59 2018 -0500

Added RELEASING file.

Details:
- Added a file named 'RELEASING' that contains basic notes on how to
create a new version/release of BLIS. This is mostly just a reminder
to myself, but also may become useful if/when others take over
development and administration of the project.

commit 24b3c3149ce66546b9a1afc2cc794a637a86aa60
Merge: 60366a3f 817b67c0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 16 18:49:38 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 60366a3faba4e60cee85c3b87a3f69625f4b9026
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 16 18:46:21 2018 -0500

Updates to knl kernels and related code.

Details:
- Imported the 24x16 knl sgemm microkernel (and its corresponding spackm
kernel) from TBLIS and enabled its use in the knl sub-config. Also
added sgemm microkernel prototype to bli_kernels_knl.h.
- Updated dgemm and dpackm microkernels from TBLIS, which included an
important change regarding the offsets array (changed from extern
declaration to static declaration/definition).
- Activated use of level-1v and -1f zen kernels in skx and knl
sub-configs.
- Removed some old macros no longer needed in bli_family_skx.h now that
libmemkind support exists in configure.
- Moved bli_avx512_macros.h to frame/include and adjusted includes in
skx and knl kernels accordingly.
- Moved unused kernels in kernels/knl/3 to kernels/knl/3/other
directory.
- Fixed a minor bug in the 'make' output per compile when verboseness
is not turned on. The rule-generating function 'make-kernel-rule' was
previously passing in the name of the config, rather than the name of
the kernel set returned by get-config-for-kset, which could give
misleading information to the user when the kconfig_map mapped a
kernel set to a sub-configuration that did not share the same name.
(This didn't affect the CFLAGS that were actually used.)
- Updated test/3m4m/Makefile, removing acml targets and renaming the
remaining targets.

commit 817b67c01752e0ca8fe230bb8ad23afc7bd0f64e
Merge: 67c9c2f8 2b7108a8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 16 14:06:26 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 67c9c2f86d5ef2accc439b21581d73d82754a2e3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 16 14:03:12 2018 -0500

Retired haswell gemm microkernels.

Details:
- Moved microkernels in kernels/haswell/3 to kernels/haswell/3/old. These
microkernels were no longer being used and only sowed confusion to
anyone inspecting the repository without being fully cognizant of the
build system and how it works (and sometimes even to those who wrote
the build system). Note that the haswell configuration currently
employs the zen microkernels.

commit 2b7108a8ef8ce958b3acad028ff07c85ff97fd63
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 16 12:35:53 2018 -0500

Minor updates to test driver makefiles.

Details:
- Cleaned up and homogenized the various test driver Makefiles in
testsuite and test directories.
- Very minor updates to test driver code.

commit 9f56df95570a24587b910b169f342bd356ccbfb6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 11 14:51:36 2018 -0500

Trivial tweaks to configure blacklisting output.

Details:
- Updated output of information vis-a-vis configuration blacklisting.

commit f56481efebd9a7785c0618f3a12c0bec36f46333
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 10 19:02:21 2018 -0500

Cleaned up assembler version query on OS X.

Details:
- Switched from querying version of 'objdump' to 'as' (i.e. the
assembler).
- Fixed the outputting of the version of 'as' on OS X, which required
this beauty:
...=$(as -v /dev/null -o /dev/null 2>&1)
- Only add sub-configs to blacklist if the sub-config hasn't already
been added.

commit 088c474e629535affbe111f141f895af50d109be
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 10 18:09:56 2018 -0500

Added support for blacklisting via the assembler.

Details:
- Added logic to configure that attempts to assemble various small files
containing select instructions designed to reveal whether binutils
(specifically, the assembler) supports emitting those instruction sets.
This information provides additional opportunities to blacklist sub-
configurations that are unsupported by the environment. Thanks to Devin
Matthews for pointing me towards a similar solution in TBLIS as an
example.
- Various other cleanups in configure.
- Reorganized the detection code in the 'build' directory, bringing the
"auto-detect" configuration detection, libmemkind detection, and new
instruction set detection codes into a single new subdirectory named
'detect'.

commit 78a24e7dada52a3582f8488795bd1a44993989d9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 9 17:02:13 2018 -0500

Updated bli_avx512_macros.h in knl and skx configs.

Details:
- Downloaded updated version of bli_avx512_macros.h from TBLIS [1] in
attempt to address issue 192.
[1] https://github.com/devinamatthews/tblis/

commit 388f64d6ade14caa4a6c286845ad2d565378b2bb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 9 15:33:10 2018 -0500

Fixed failure to honor CC= argument to configure.

Details:
- Fixed a failure to observe the value of CC when selecting the compiler
in configure. Thanks to Devangi Parikh for reporting this bug.
- The semantics now also work for the CC environment variable. That is,
if CC is set prior to running configure, that value is used, but will
be overridden by specifying the CC= argument to configure. If the CC
environment variable is not set, the CC= value is used. If neither the
environment variable nor CC= are specified, then the choice is made
internally to configure: first attempting to find gcc, then clang, and
then cc.
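
The precedence described above (CC= argument, then CC environment variable, then gcc, clang, cc) can be modeled in C as follows. This is a hypothetical sketch: choose_cc() does not exist in BLIS, the real logic is shell code inside configure, and the 'found' array stands in for whatever compilers are actually available on the system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the compiler choice. cc_arg is the value of the
   CC= argument (NULL or "" if absent), cc_env is the CC environment
   variable (NULL or "" if unset), and found[] lists the compilers that
   are available. Returns NULL if nothing suitable is found. */
static const char *choose_cc(const char *cc_arg, const char *cc_env,
                             const char *const *found, size_t n_found)
{
    static const char *const fallbacks[] = { "gcc", "clang", "cc" };

    /* The CC= argument takes precedence over everything else. */
    if (cc_arg != NULL && cc_arg[0] != '\0') return cc_arg;

    /* Otherwise, honor the CC environment variable. */
    if (cc_env != NULL && cc_env[0] != '\0') return cc_env;

    /* Otherwise, try gcc, then clang, then cc, in that order. */
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; ++i)
        for (size_t j = 0; j < n_found; ++j)
            if (strcmp(fallbacks[i], found[j]) == 0)
                return fallbacks[i];

    return NULL;
}
```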

commit 45fbe66b3e2ab92f0b4fdf437d57c5d06603803d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 9 14:01:08 2018 -0500

Fixed libmemkind dependency for x86_64.

Details:
- Removed some old conditional code in config/knl/make_defs.mk that
added -lmemkind to LDFLAGS if DEBUG_TYPE was not 'sde' and inserted
code into common.mk that affirmatively filters out -lmemkind from
LDFLAGS if DEBUG_TYPE is 'sde'. (Thanks to Dave Love for reporting
this issue.) Other minor cleanups to neighboring code in common.mk.
- Updated CRVECFLAGS in knl/make_defs.mk to be based on -march=knl,
and then AVX-512 functionality is manually removed via various
-mno-avx512* flags. Also, make the setting of CRVECFLAGS conditional
on CC_VENDOR. Similar change to skx/make_defs.mk.
- Comment/whitespace updates.

commit ca982148b3b419db063cad2fa74376ec383a5c80
Author: dnp <devangiparikhgmail.com>
Date: Sun Apr 8 21:27:10 2018 -0500

Fixed bug in SKX sgemm microkernel. Modified SKX dgemm microkernel to be consistent with the sgemm microkernel

commit bd0276752ccdd56ff897b1a5ae022f2ffe6e0b38
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 6 18:51:43 2018 -0500

Track separate ref kernel flags for each sub-config.

Details:
- Renamed CVECFLAGS variables in sub-configurations' make_defs.mk files
to CKVECFLAGS.
- Added default definitions of two new make variables to most sub-
configurations' make_defs.mk files--CROPTFLAGS and CRVECFLAGS--
which correspond to reference kernel analogues of the CKOPTFLAGS
and CKVECFLAGS, which track optimization and vectorization flags for
optimized kernels. Currently, two sub-configurations (knl and skx)
explicitly set CRVECFLAGS to non-default values (using AVX2 instead of
AVX-512 for reference kernels). Thanks to Jeff Hammond, whose feedback
prompted me to make this change (issue 187).
- Changed common.mk so that the get-refkern-cflags-for function returns
the flags associated with the given sub-configuration's CROPTFLAGS
and CRVECFLAGS (instead of CKOPTFLAGS and CKVECFLAGS).

commit b9aebce19480448817373e2df2b36bd090eae41a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 6 18:37:33 2018 -0500

De-verbosify makefile fragment generation.

Details:
- Changed from -v1 to -v0 when calling gen-make-frag.sh from configure.
The directory-by-directory recursive output didn't add much value to
the user, so now we just echo a line for each top-level directory into
which we will recurse (e.g. 'config', 'ref_kernels', 'frame', etc.).
This also helps keep more interesting information (from earlier in the
execution of configure) from scrolling out of the terminal window.

commit b549b91f26948991e13364f1f26a878da0f43aa0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 6 16:31:33 2018 -0500

Added 64-bit integer support to BLAS test drivers.

Details:
- Updated the build system and BLAS test drivers to use 64-bit integers
when BLIS is configured for 64-bit integers in the BLAS layer. Also
updated blastest/Makefile accordingly. Thanks to Dave Love for
reporting the need for this feature.
- Added a 'check' target to blastest/Makefile so that the user can see
a summary of the tests.
- Commented out the initial definition of INCLUDE_PATHS in common.mk,
which was used pre-monolithic header, back when BLIS needed paths to
*all* headers, rather than just a select few. This line is no longer
needed since the value of INCLUDE_PATHS is overwritten by a later
definition limited to only the header paths that are needed now.

commit d39fa1c04265869bdf8b6f453076359eec2f3c59
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 5 19:38:35 2018 -0500

Adjusted CFLAGS used to compile bli_cntx_ref.c.

Details:
- Removed CKOPTFLAGS and CVECFLAGS from the set of CFLAGS used to
compile bli_cntx_ref.c for each configuration. This is necessary
because the file defines functions like bli_cntx_init_skx_ref(),
which are called during BLIS's initialization of the global kernel
structure, potentially being executed by an architecture that lacks
the instruction set used to compile the kernels for, in this example,
skx, which would lead to an illegal instruction error. Thanks to
Dave Love for reporting this issue.
- Further adjusted CFLAGS used when compiling code in the 'config'
directory (e.g. bli_cntx_init_skx.c) as well as code in 'frame' so
as to avoid the aforementioned issue.

commit 08b123084d35680beab379012f8f5a5a8b44a443
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 5 14:25:39 2018 -0500

Added color-coding to 'make check' output.

Details:
- Added color coding to output of check-blistest.sh, check-blastest.sh
scripts. Success messages are coded green and failure are coded red.
This helps draw the eye toward those messages as the 'make checkblis',
'make checkblis-fast', and 'make checkblas' targets are executed.
- Changed top-level Makefile so that execution will not halt if
'checkblis', 'checkblis-fast', or 'checkblas' targets fail, which
means that the second of the two tests (BLIS and BLAS) run by
'make check' will run even if the first test fails.

commit c9e4d7db7410b03c1ffe8c9727e9f1b2ba7fecfe
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 4 17:13:15 2018 -0500

CHANGELOG update (0.3.1)

0.3.1

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 4 17:13:15 2018 -0500

Version file update (0.3.1)

commit e6cc9ee26bcf0450f1120d5d12985b04d9fb8516
Merge: 786d15c5 3c91c7ae
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 4 16:08:18 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 786d15c5ef09f1f647b126b63d57e76d5810c58e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 4 16:06:47 2018 -0500

Added skx, knl to x86_64 configuration family.

Details:
- Added 'skx' and 'knl' sub-configurations to the 'x86_64' configuration
family in the config_registry file.
- Added logic to configure that avoids committing certain sub-configs to
the configuration/kernel registries if those sub-configs cannot be
handled properly by the chosen compiler. (This was modeled after
similar logic in TBLIS's configure; thanks to Devin Matthews for
pointing this out.) First, the compiler and its version are inspected
and, based on the results, certain configurations are added to a
"blacklist". Then, as the configuration registries are being created,
configurations and/or kernels that match items in the blacklist are
skipped over and not committed to the registries. Under certain
circumstances, omitting a blacklisted configuration will indirectly
invalidate other configurations due to the loss of availability of
the original blacklisted configuration's kernel set. This additional
indirect blacklist is also accounted for.
- Added output to the beginning of configure that echoes information
about the chosen compiler as well as the configurations that are
blacklisted and must be stripped from the registries.
- Various other cleanups in configure, especially with respect to
explicitly declaring local variables in functions.
- Comment updates to config/zen/make_defs.mk regarding choice of -march
flags based on compiler version.

commit 3c91c7aebafb446a2582267beb3b22c8bb475b3b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 2 12:40:25 2018 -0500

Fixed 64b type mismatch warning in cblas_xerbla.c.

Details:
- Fixed a compiler warning concerning a type mismatch between the
format specifier of the printf() call in cblas_xerbla.c and its
corresponding (info) argument. The warning manifested when the CBLAS
layer was enabled and the BLAS/CBLAS integer type size was set to 64
(the default is 32). The warning was fixed by changing the specifier
from %d to %jd and typecasting the argument to intmax_t. Thanks to
Dave Love for reporting this issue and submitting the patch.
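
The %jd-plus-cast idiom mentioned above is standard C99 and works for any signed integer width. A minimal sketch, assuming a hypothetical blas_int_t typedef and a hypothetical format_info() helper (neither is taken from cblas_xerbla.c, and the message text is illustrative only):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a BLAS integer type configured as 64 bits. */
typedef int64_t blas_int_t;

/* %jd expects an intmax_t argument, so casting 'info' to intmax_t makes
   the argument match the specifier regardless of blas_int_t's width.
   With %d, a 64-bit 'info' would be a format/argument type mismatch. */
static int format_info(char *buf, size_t len, blas_int_t info)
{
    return snprintf(buf, len, "Parameter %jd had an illegal value",
                    (intmax_t)info);
}
```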

commit 71eaf449a812fe2bd640d21513ec83974b2edb45
Merge: 6a628184 ae9a5be5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 27 17:21:43 2018 -0500

Merge branch 'dev'

commit ae9a5be56d6f9b87278d6032154d2dcf3fb7d54f
Author: dnp <devangiparikhgmail.com>
Date: Tue Mar 27 17:01:23 2018 -0500

Fixed bug in skx sgemm microkernel

commit 3f02af0905b1e2e2e065862f8afe5e9a52f282b2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 26 17:40:04 2018 -0500

Row storage optimizations to zen dotxf kernels.

Details:
- Split the main loop bodies of zen's [sd]dotxf kernels into two cases:
one to handle a column-stored matrix A and one to handle a row-stored
matrix A. This allows vector instructions to be employed even if A is
stored by rows (and A^T appears stored as columns). Both storage cases
use a common edge case loop. Thanks to Devin Matthews for this idea
and for prototyping the change needed for sdotxf kernel.
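
The idea of handling the two storage cases separately can be sketched in plain scalar C. This is a hypothetical reference version, not the zen kernel: dotxf_sketch() and its parameter names are assumptions, it uses no vector intrinsics, and it adds a general-stride fallback in place of the kernels' shared edge-case loop. It computes y := beta*y + alpha*A^T*x for an m x b matrix A.

```c
#include <stddef.h>

/* Hypothetical scalar sketch of y := beta*y + alpha*A^T*x, where A is
   m x b with row stride rs_a and column stride cs_a, x has m elements,
   and y has b elements. */
static void dotxf_sketch(size_t m, size_t b, double alpha,
                         const double *a, size_t rs_a, size_t cs_a,
                         const double *x, double beta, double *y)
{
    /* Scale y first; beta == 0 overwrites y rather than multiplying. */
    for (size_t j = 0; j < b; ++j)
        y[j] = (beta == 0.0) ? 0.0 : beta * y[j];

    if (rs_a == 1) {
        /* Column storage: each column of A is contiguous, so every dot
           product streams through unit-stride memory. */
        for (size_t j = 0; j < b; ++j) {
            double rho = 0.0;
            for (size_t i = 0; i < m; ++i)
                rho += a[i + j * cs_a] * x[i];
            y[j] += alpha * rho;
        }
    } else if (cs_a == 1) {
        /* Row storage: each row of A is contiguous, so the inner loop
           updates all b accumulators from one unit-stride row. */
        for (size_t i = 0; i < m; ++i) {
            const double ax = alpha * x[i];
            for (size_t j = 0; j < b; ++j)
                y[j] += a[i * rs_a + j] * ax;
        }
    } else {
        /* General strides: no contiguous direction to exploit. */
        for (size_t j = 0; j < b; ++j) {
            double rho = 0.0;
            for (size_t i = 0; i < m; ++i)
                rho += a[i * rs_a + j * cs_a] * x[i];
            y[j] += alpha * rho;
        }
    }
}
```

In both of the first two branches the innermost loop walks contiguous memory, which is what makes vector loads applicable in either storage case.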

commit 679dcc331dd870ec680e135a3fb65ffa6e3a91c2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 26 15:35:17 2018 -0500

Make k_iter/k_left uint64_t in bulldozer fma ukrs.

Details:
- Changed the declaration of k_iter and k_left for d, c, z microkernels
from dim_t to uint64_t. This is needed to ensure compatibility with
the movq instruction used to load the value into registers. This
change should have been made a long time ago, but for some reason
only recently began showing up via Travis CI.

commit 6a628184f6938673440e4cdd4fed0208c51fd1f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 26 14:48:16 2018 -0500

Fixed a memkind-related compile-time bug on knl.

Details:
- Fixed a compile-time error that occurred due to the fact that
BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined
soon enough to be used in bli_system.h where it is needed to determine
whether hbwmalloc.h should be included. bli_system.h is now included
after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love
for reporting this issue.
- Tweaked the language used by configure to echo the status of the
--with[out]-memkind option.

commit e2192a8fd58ec3657434ddd407033e097edad8f4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 23 12:53:48 2018 -0500

Removed vzeroupper intrinsics from zen kernels.

Details:
- Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a
vzeroupper instruction destroyed part of the intermediate result
stored by the vdpps instructions that came right before. (The
vzeroupper intrinsic was removed.)
- Removed remaining vzeroupper intrinsics from other zen kernels.
Previously, the vzeroupper instructions were included because BLIS is
typically compiled with -mfpmath=sse. But it was brought to my
attention that inserting these vzeroupper instructions is unnecessary
for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar
code rather than literal SSE instructions, and (b) compilers already
(likely) insert vzeroupper instructions where necessary. Thanks to
Devin Matthews for zeroing in on the dotxf bug.
- Removed -malign-double from bulldozer make_defs.mk. This alignment
was already happening by default since bulldozer is an x86_64 system.

commit 22289ad23cd10b81451ce82f60d84b5f97e7fd85
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 22 18:21:30 2018 -0500

Added build system support for libmemkind.

Details:
- Added support for libmemkind to configure. configure attempts to
detect the presence of libmemkind by compiling a small program
containing #include <hbwmalloc.h> and a call to hbw_malloc(). If
successful, it is assumed that libmemkind is present and available.
If present, use of libmemkind is enabled by default, and otherwise
use is disabled by default. If libmemkind is present, the user may
explicitly disable use of the library by running configure with the
--without-memkind option. Furthermore, a configuration may disable
libmemkind, perhaps conditional on some aspect of the build system,
by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS
make variable and setting the BLIS_ENABLE_MEMKIND makefile variable,
set in config.mk, to 'no'. (The knl configuration makes use of this
latter feature; see below.)
- If enabled at configure-time, bli_system.h will include <hbwmalloc.h>
and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and
BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively.
- Deprecated explicit use of BLIS_NO_HBWMALLOC in
config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in
config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides
(undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it
would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile
variable to 'no'.
- common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled.
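
The interplay of the two macros described above can be sketched as follows. This is an illustration only, not the contents of bli_system.h or bli_kernel_macro_defs.h. The macro names come from the commit message; the malloc()/free() fallback and the helper demo_pool_roundtrip() are assumptions made for the example.

```c
#include <stdlib.h>

/* A configuration may pass -DBLIS_DISABLE_MEMKIND to override the
   configure-time default (the knl configuration is said to do this). */
#ifdef BLIS_DISABLE_MEMKIND
  #undef BLIS_ENABLE_MEMKIND
#endif

#ifdef BLIS_ENABLE_MEMKIND
  #include <hbwmalloc.h>
  #define BLIS_MALLOC_POOL hbw_malloc
  #define BLIS_FREE_POOL   hbw_free
#else
  /* Assumed fallback for this sketch: the standard allocator. */
  #define BLIS_MALLOC_POOL malloc
  #define BLIS_FREE_POOL   free
#endif

/* Hypothetical helper: allocate and release one pool block.
   Returns 1 on success and 0 if the allocation failed. */
int demo_pool_roundtrip( size_t n_bytes )
{
    void* p = BLIS_MALLOC_POOL( n_bytes );
    if ( p == NULL ) return 0;
    BLIS_FREE_POOL( p );
    return 1;
}
```

Compiled without -DBLIS_ENABLE_MEMKIND, the sketch never references libmemkind, so it builds on systems where the library is absent.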

commit 7dc40eafdd9af3e8c4519a8d1b04d25830b4ca7a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 21 18:39:16 2018 -0500

Updates to top-level and test driver Makefiles.

Details:
- Added logic to common.mk that will choose a BLIS library against which
to link (LIBBLIS_LINK). The default choice is the static (.a) library;
the shared (.so) library is chosen only if the shared library build was
enabled and the static one was disabled.
- Updated the various test driver Makefiles to reference this common,
pre-chosen library against which to link. (Previously, these drivers
unconditionally linked against the static library and would have
failed if the static library build was disabled at configure-time.)
- Renamed many of the variables in common.mk and the top-level Makefile
so that variables relating to the libblis.[a|so] files, including
paths to those files, begin with "LIBBLIS".
- Shuffled around some of the library definitions from the top-level
Makefile to common.mk.
- Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
the enable_dynamic anchor to enable_shared in build/config.mk.in
and in configure.
- A few other cleanups in the top-level Makefile.

commit 97e1eeade3c51df1bae574a9bc1da34b05bf2bd3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 21 15:47:11 2018 -0500

Added input.operations.fast file for 'make check'.

Details:
- Added an 'input.operations.fast' file to testsuite directory to go
along with the 'input.general.fast' file used by the 'make check'
target in the top-level Makefile. This will allow the "fast" check
to prune operations and/or parameter combinations from the test
space in order to save time.
- Currently, input.operations.fast prunes trmm3 and all transposition
and conjugation parameters from the level-3 test space.
- Reduced problem size tested in input.general.fast to 100 and disabled
testing of 1m method.

commit c441caa95aabe69f54e2160eb67bf4ca76a66c34
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 20 17:56:02 2018 -0500

README update.

Details:
- Minor updates to README.md.
- Minor change to blastest/Makefile.

commit 6fe018eb4ac8c16f2edc916c24f5994848017b7f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 20 15:35:45 2018 -0500

Added .gitkeep file to blastest/obj.

Details:
- Added an empty file named '.gitkeep' to blastest/obj/ so that git will
track the otherwise empty directory. (This is already done for the BLIS
testsuite in testsuite/obj.)

commit 0e6d000db9291342913dc5f8590a28c67bbcbc95
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 20 15:08:43 2018 -0500

Updated .gitignore to ignore BLAS test out.* files.

commit 40c040a31d96fbadff11f761d0cad1ef03ef2cc5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 20 14:33:50 2018 -0500

Fixes to .travis.yml.

Details:
- Invoke the full BLIS testsuite via 'make testblis' instead of the fast
version via 'blistest-fast' (which was wrong anyway, since the correct
fast target is 'testblis-fast').
- Invoke the BLAS tests via 'make testblas' instead of 'blastest'.

commit 664ec4813d8b53121cce7a68bef47da656ece9cb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 20 13:54:58 2018 -0500

Integrated f2c'ed netlib BLAS test suite.

Details:
- Created a new test suite that exercises only the BLAS compatibility
found in BLIS. The test suite is a straightforward port of code
obtained from netlib LAPACK, run through f2c and linked to a stripped-
down version of libf2c that is compiled along with the test drivers
(to prevent any obvious ABI issues). The new BLAS test suite can be
run from within its new local directory, 'blastest' (through its local
'make ; make run' targets) or from the top-level Makefile (via the
'make testblas' target). Output files are created in whatever directory
the test drivers are run, whether it be the 'blastest' directory, the
top-level source distribution directory, or the out-of-tree directory
in which 'configure' was run. Also, the results of the BLAS test suite
can be checked via 'make checkblas', which summarizes the presence or
absence of test failures in a single line printed to stdout.
- Updated the 'test' target to run both 'testblis' and 'testblas'.
- Added a new 'testblis-fast' target that runs the BLIS testsuite with
smaller problem sizes, allowing it to finish more quickly.
- Added a 'make check' target, which runs 'checkblis-fast' and
'checkblas'.
- Changed .travis.yml so that Travis CI runs 'testblis-fast' instead of
'testblis' before calling the check-blistest.sh script to check the
result manually.
- Renamed some targets in the top-level Makefile to be consistent between
BLAS and BLIS.

commit fc53ad6c5b2e39238b1bbbf625cc0c638b9da4e1
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Mar 19 12:49:26 2018 +0530

Re-enabling the small matrix gemm optimization for target zen

Change-Id: I13872784586984634d728cd99a00f71c3f904395

commit d12d34e167d7dc32732c0ed135f8065a55088106
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Mar 19 11:34:32 2018 +0530

Re-enabling Zen optimized cache block sizes for config target zen

Change-Id: I8191421b876755b31590323c66156d4a814575f1

commit 40fa10396c0a3f9601cf49f6b6cd9922185c932e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 19 18:19:43 2018 -0500

Fixed a few obscure bugs in the BLAS API.

Details:
- Fixed a missing parameter in the definition of sdsdot_(). The 'sb'
argument was missing. Strangely, the argument is omitted from dsdot_()
in the BLAS API.
- Fixed the missing 'c' or 'u' in the "?gerc" or "?geru" operation string
passed to xerbla_() by the bla_ger_check() macro.
- For bla_syrk_check() and bla_syr2k_check() macros, only allow
conjugate-transpose (trans='c') as a valid argument for the real
domain functions [sd]syrk_() and [sd]syr2k_(). (Previously, the
argument was allowed even for the complex domain equivalents, which
was inconsistent with the BLAS API.)
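
For reference, the difference between the two signatures mentioned in the first item can be sketched as below. These are plain reference-style routines written for this note, not BLIS source; the names ref_sdsdot() and ref_dsdot() are hypothetical. Both accumulate in double precision, but only sdsdot takes the scalar 'sb' and returns single precision.

```c
/* sdsdot: returns sb + x^T y, accumulated in double, as a float. */
float ref_sdsdot( const int* n, const float* sb,
                  const float* x, const int* incx,
                  const float* y, const int* incy )
{
    double acc = ( double )( *sb );
    /* Negative increments start from the far end, as in reference BLAS. */
    int ix = ( *incx < 0 ) ? ( 1 - *n ) * ( *incx ) : 0;
    int iy = ( *incy < 0 ) ? ( 1 - *n ) * ( *incy ) : 0;

    for ( int i = 0; i < *n; ++i )
    {
        acc += ( double )x[ ix ] * ( double )y[ iy ];
        ix += *incx;
        iy += *incy;
    }
    return ( float )acc;
}

/* dsdot: same accumulation, but no sb argument and a double result. */
double ref_dsdot( const int* n,
                  const float* x, const int* incx,
                  const float* y, const int* incy )
{
    double acc = 0.0;
    int ix = ( *incx < 0 ) ? ( 1 - *n ) * ( *incx ) : 0;
    int iy = ( *incy < 0 ) ? ( 1 - *n ) * ( *incy ) : 0;

    for ( int i = 0; i < *n; ++i )
    {
        acc += ( double )x[ ix ] * ( double )y[ iy ];
        ix += *incx;
        iy += *incy;
    }
    return acc;
}
```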

commit fe7d7f1e43e4c26249eed83d4188beee1ba96202
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Mar 18 19:43:06 2018 -0500

Fixed cpp macro parameter "ch" typo in bla_ger.c.

Details:
- Previously, the BLAS routine-generating macro in bla_ger.c was
incorrectly passing MKSTR(ch) into the _check() macro when it
should have been passing in the char that was available, chxy.
I've instead changed the name of the macro parameter from chxy
to ch. A similar change was made to bla_ger.h for consistency.
Thanks to Dave Love for helping track this down. (NOTE: This is
actually the root cause of the bug that was first patched by
increasing the length of the operation name strings passed into
xerbla_(), as defined by the constant BLIS_MAX_BLAS_FUNC_STR_LENGTH,
in 3d1a5a7. In theory, that change could be backed out now.)
- Applied aforementioned chxy->ch change to bla_dot.[ch], as well as
frame/compat/cblas/f77_sub/f77_dot_sub.[ch] (not because it needed
to happen, but for naming consistency).
- Reformatted function signatures/prototypes of CBLAS functions and
function calls to BLAS in frame/compat/cblas/f77_sub/*.c.

commit cb7ed90752d1ddbac11368c4510641ca4f3a02eb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 16 13:05:56 2018 -0500

Convert op names to uppercase before calling xerbla_().

Details:
- Defined a new function, bli_string_mkupper(), that calls toupper() on
every non-NUL character in a string.
- Call bli_string_mkupper() prior to calling xerbla_() in the level-2/-3
BLAS _check() macros. This prevents the BLAS testsuite from complaining
that the operation name (e.g. "dgemm") does not match the expected
value (e.g. "DGEMM"). Thanks to Dave Love for reporting this issue.
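
A minimal sketch of such a helper is shown below. It is not the BLIS implementation of bli_string_mkupper(); the name demo_string_mkupper() is hypothetical and the body is simply one straightforward way to do what the commit describes.

```c
#include <ctype.h>

/* Convert a NUL-terminated string to uppercase, in place. */
void demo_string_mkupper( char* s )
{
    for ( ; *s != '\0'; ++s )
    {
        /* Cast through unsigned char: passing a negative char value
           to toupper() is undefined behavior. */
        *s = ( char )toupper( ( unsigned char )*s );
    }
}
```

Applied to a writable buffer holding "dgemm", the helper leaves "DGEMM" in the buffer.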

commit 3d1a5a7c08fed3ba29f060fe1db2b0dc42dde223
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 16 12:24:07 2018 -0500

Fixed printf() format overflow.

Details:
- Increased the length of operation name strings passed to xerbla_() in
the level-2 and level-3 operation _check() functions, found in
frame/compat/check. This avoids a format specifier overflow warning by
gcc 7. Thanks to Dave Love for reporting this issue and suggesting the
fix.

commit c73055f028684d998e03b2392093c393782bbfe7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 15 16:08:21 2018 -0500

Return after non-zero info in BLAS checks.

Details:
- Previously, when calling the BLAS compatibility layer, discovering a
parameter check failure would result in the proper setting of the
info parameter (printed by xerbla_()), but would also come with an
immediate abort() rather than a return. This was incorrect behavior
for two overlapping reasons.
(1) BLAS should return gracefully to the caller in the event of a
bad set of parameters, not abort().
(2) When BLIS was being tested via the BLAS testsuite, BLIS's
xerbla_() would correctly get preempted/overridden by the
xerbla_() in the BLAS testsuite, but execution would then
erroneously continue on to the BLIS implementation with bad
parameter values.
- The previous issue was addressed by disabling the abort() in BLIS's
xerbla_(), changing all of the BLAS _check() functions to cpp macros,
and adding a return statement to the end of each _check() macro's
"if ( info != 0 )" conditional.
Thanks to Dave Love for reporting this issue.
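
The reason the checks had to become macros can be sketched as follows: a 'return' inside a macro leaves the routine that expands it, whereas a 'return' inside a separate check function would only leave the check. Everything below is hypothetical (demo_xerbla(), DEMO_CHECK, demo_op(), the parameter positions, and the rule that incx must be positive); it is not the BLIS compatibility layer.

```c
#include <stdio.h>

static int demo_last_info       = 0; /* last info value reported       */
static int demo_calls_into_impl = 0; /* times the implementation ran   */

/* Stand-in for xerbla_(): report the bad argument and return to the
   caller rather than calling abort(). */
void demo_xerbla( const char* name, int info )
{
    demo_last_info = info;
    fprintf( stderr, "** On entry to %s, parameter number %d had an "
                     "illegal value\n", name, info );
}

/* Because this is a macro, the 'return' exits the enclosing routine. */
#define DEMO_CHECK( name, n, incx ) \
{ \
    int info = 0; \
    if      ( (n)    <  0 ) info = 1; \
    else if ( (incx) <= 0 ) info = 4; \
    if ( info != 0 ) \
    { \
        demo_xerbla( name, info ); \
        return; \
    } \
}

/* Hypothetical BLAS-like routine: x := alpha * x. */
void demo_op( int n, double alpha, double* x, int incx )
{
    DEMO_CHECK( "DEMO", n, incx )

    /* Only reached when the parameters passed the check. */
    ++demo_calls_into_impl;
    for ( int i = 0; i < n; ++i ) x[ i * incx ] *= alpha;
}
```

Calling demo_op() with incx == 0 reports parameter 4 and leaves x untouched, which mirrors the behavior the commit describes: report, then return to the caller without entering the implementation.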

commit c4f1d18b97a6a8c3ea0366aa759db597a664062a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 14 19:10:09 2018 -0500

Minor typo fix to printing arch in testsuite.

Details:
- Mistakenly was calling bli_cpuid_query_id() instead of
bli_arch_query_id() in the recent addition to the testsuite output
that prints the active sub-configuration. The former function is
only used for multi-architecture builds, whereas the latter is the
more general option that also works for single configuration
(including 'configure auto') builds.

commit 8f2fabec800a720b3e94b33c0048cc8c4ead436d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Mar 14 17:43:42 2018 -0500

Make arm32 and arm64 families work. (176)

commit fc6a1842518a0820c6708c285611346d5a1419da
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 14 15:31:17 2018 -0500

Print sub-configuration name in testsuite output.

Details:
- Added a line to the testsuite output that prints the name of the
current/active sub-configuration. This is useful when linking the
testsuite against multi-configuration builds because it confirms
the sub-configuration that is actually being employed at runtime.
Thanks to Devin Matthews for suggesting this feature.

commit 9943a899d64bf7ec4a24106f6f4c70629bbe1f6e
Merge: 290dd4a9 b1a15ae6
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Mar 14 13:27:44 2018 -0500

Merge pull request 173 from devinamatthews/dev

Fix Cortex-A9 and Cortex-A15 configs.

commit b1a15ae6ee0f46c9a95cf59f9555925e0e8e21ff
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Mar 14 13:26:44 2018 -0500

Use BLIS_H_FLAT

commit 290dd4a9feee447e69b40ad108954af78e196f7e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 14 13:15:37 2018 -0500

Allow arbitrarily deep configuration families.

Details:
- Updated configure so that configuration families specified in the
config_registry are no longer constrained as being only one level
deep. For example, previously the x86_64 family could not be defined
concisely in terms of, say, intel64 and amd64 families, and instead
had to be defined as containing "haswell, sandybridge, penryn, zen,
etc." In other words, families were constrained to only having
singleton configurations as their members. That constraint is now
lifted.
- Redefined x86_64 family in config_registry in terms of intel64 and
amd64.
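
Conceptually, lifting the one-level constraint amounts to expanding family names recursively until only singleton sub-configurations remain. configure itself is a shell script; the C sketch below only illustrates the idea. The registry contents are abbreviated and hypothetical, as are the names family_t and expand_config().

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    const char* name;
    const char* members[ 8 ]; /* NULL-terminated list */
} family_t;

/* Hypothetical, abbreviated registry for illustration. */
static const family_t registry[] =
{
    { "x86_64",  { "intel64", "amd64", NULL } },
    { "intel64", { "haswell", "sandybridge", "penryn", NULL } },
    { "amd64",   { "zen", "bulldozer", NULL } },
};

static const family_t* find_family( const char* name )
{
    for ( size_t i = 0; i < sizeof( registry ) / sizeof( registry[ 0 ] ); ++i )
        if ( strcmp( registry[ i ].name, name ) == 0 ) return &registry[ i ];
    return NULL;
}

/* Append the singleton members of 'name' to 'out', space-separated.
   'out' must hold a valid (possibly empty) string on entry. */
void expand_config( const char* name, char* out, size_t out_size )
{
    const family_t* f = find_family( name );

    if ( f == NULL )
    {
        /* Not a family, so treat it as a singleton sub-configuration. */
        if ( out[ 0 ] != '\0' )
            strncat( out, " ", out_size - strlen( out ) - 1 );
        strncat( out, name, out_size - strlen( out ) - 1 );
        return;
    }

    /* A family: recurse into each member, which may itself be a family. */
    for ( int i = 0; f->members[ i ] != NULL; ++i )
        expand_config( f->members[ i ], out, out_size );
}
```

With this registry, expanding "x86_64" yields the five singleton names in order, even though x86_64 itself lists only two families.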

commit 9cee78e006d56543ac02fc9c488905c0434e60ae
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Mar 14 13:09:48 2018 -0500

Fix Cortex-A9 and Cortex-A15 configs.

Tested with QEMU.

commit 1a3031740f7fcbbcc2c99d5c4cb50d0413407455
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 13 16:04:40 2018 -0500

Updates to ARM hardware detection support.

Details:
- Updated/clarified the ARM preprocessor macro branch of bli_cpuid.c.
Going forward, cortexa57 (64-bit), cortexa15, and cortexa9 (32-bit)
sub-configurations are supported. However, the functions that detect
features specific to a15 and a9 are identical, and since a15 is tested
first, it will always be chosen for arm32 hardware (even if both
sub-configurations were enabled at configure-time and the library is
linked and run on an a9). Thus, more work needs to be done to
distinguish these two.
- Added cpp guard around x86_64 portions of bli_cpuid.c. Now, either
the x86_64 or ARM code will be compiled (or neither, if neither
environment is detected).
- In bli_arch_query_id(), call bli_cpuid_query_id() when the
BLIS_FAMILY_ARM64 or BLIS_FAMILY_ARM32 macros are defined.
- Added arm64 and arm32 configuration families to config_registry.
- Added a note to the arch_t typedef enum in bli_type_defs.h reminding
the developer to update the string array in bli_arch.c whenever new
enum values are added or existing values are reordered.

commit 1442d06886ebdc34d8f1cb620229ddc6062c2ce8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Mar 11 16:59:50 2018 -0500

Fixed misnamed kernels in _cntx_init_cortexa57.c.

Details:
- Changed incorrect kernel function names in bli_cntx_init_cortexa57.c:
bli_sgemm_cortexa57_asm_8x12 -> bli_sgemm_armv8a_asm_8x12
bli_dgemm_cortexa57_asm_6x8 -> bli_dgemm_armv8a_asm_6x8
Thanks to Jacob Gorm Hansen for reporting this issue.

commit 28bcea37dfcf0eb99a99da6f46de2a2830393d1d
Merge: b1ea3092 8b0475a8
Author: praveeng <praveen.gamd.com>
Date: Fri Mar 9 19:13:08 2018 +0530

Merge master code till 06_mar_2018 to amd-staging

Change-Id: I12267e5999c92417e3715fef4f36ac2131d00f1a

commit 48da9f5805f0a49f6ad181ae2bf57b4fde8e1b0a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 7 12:54:06 2018 -0600

Tweaked common.mk, Makefile, skx/knl make_defs.mk.

Details:
- Reorganized linker-related section of common.mk so that LDFLAGS set
in a sub-configuration's make_defs.mk file will not be immediately
(and erroneously) overridden by the default values.
- Re-enabled redirected (to file) output of the testsuite when run from
the top-level Makefile via 'make test'. (For some reason, it was
commented-out for the non-verbose case.)
- Removed old/unnecessary code from the make_defs.mk files of skx and
knl sub-configurations.

commit 8b0475a87daa177916e2caac0e530c6a57fa07cf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 6 06:39:44 2018 -0600

Fixed typo in attempted fix in 1a8350f7.

Details:
- Mistakenly entered 148 as knl mc blocksize for double real when the
value should have been 144. Thanks to Dave Love for reporting this.
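
The arithmetic behind this correction is easy to check, taking mr = 24 for double real on knl as stated in 1a8350f7. The helper name below is hypothetical and the check is only a sketch of the divisibility requirement, not the safeguard in bli_gks_register_cntx().

```c
/* Returns 1 if the cache blocksize mc is a whole multiple of the
   register blocksize mr, and 0 otherwise. */
int demo_is_multiple( int mc, int mr )
{
    return ( mr > 0 ) && ( mc % mr == 0 );
}
```

With mr = 24, the values 160 and 148 leave remainders of 16 and 4 respectively, while 144 = 6 * 24 divides evenly.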

commit 8912e6886b97eabb4ce0c35a3609a0fd994d347b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 5 18:00:45 2018 -0600

Fixed missing flags during shared object build.

Details:
- Fixed a bug in common.mk that caused warning, position-independent
code, miscellaneous, and general preprocessor flags to be omitted
from the configuration family-specific variables that hold those
values, as registered by the family's make_defs.mk file. This would
most obviously manifest when targeting a configuration family such as
'intel64' while simultaneously configuring for a shared object build,
as the key '-fPIC' flag would be omitted at compile-time and prevent
successful linking. Thanks to Dave Love for reporting this bug.
- Other cleanups to common.mk for readability and clarity.

commit 1a8350f70557fc53ca0c2eadf2076710dd0d9bc9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 5 13:32:00 2018 -0600

Fixed cache blocksize bug in knl configuration.

Details:
- Changed the mc blocksize for double real execution in the knl sub-
configuration from 160 to 148. The old value was not a multiple of
mr (which is 24), and thus the safeguards in bli_gks_register_cntx()
were tripping. Thanks to Dave Love for reporting this issue.
- Switch knl sub-configuration to use default blocksizes for datatypes
not supported by native kernels.
- Fixed typos in bli_error.c that prevented certain error strings
(which report maximum cache blocksizes not being multiples of their
corresponding register blocksize) from properly initializing.

commit c09fffa827fe6241dc20193a1c404496664220de
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Mar 3 13:13:39 2018 -0600

Added missing cntx_t* arg in knl packm kernels.

Details:
- Added the missing cntx_t* argument to the function signature of packm
kernels in kernels/knl/1m/. Thanks to Dave Love for reporting this
issue.

commit b1ea30925dff751eced23dfa94ff578a20ea0b94
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600

CHANGELOG update (0.3.0)

Change-Id: Id038b00a62de51c9818ad249651ec5dc662f4415

commit 1ef9360b1fd0209fbeb5766f7a35402fbd080fcb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 1 14:36:39 2018 -0600

Enable non-unit vector stride tests by default.

Details:
- Change "vector storage schemes to test" parameter in testsuite's
input.general file to "cj". This means that both unit stride column
vectors and non-unit stride column vectors will be tested in
operations with vector operands (e.g. level-1v, level-1f, level-2).
- Very minor comment (typo) changes to input.operations.

commit 8c4e55a1a1ead9a5e970200fee027ffd2c7e8454
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 28 17:01:47 2018 -0600

Added individual operation overrides in testsuite.

Details:
- Updated the testsuite driver so that setting one or more individual
operation test switches to "2" in input.operations will enable ONLY
those operations and disable all others, regardless of the values of
the section overrides and other operation switches. This makes it
very easy to quickly test only one or two operations, and equally
easy to revert back to the previous combination of operation tests.
- Added more comments to input.operations describing the use of
individual "enable only" overrides.
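
The "enable only" rule can be sketched as a two-pass resolution over the switches. This is not the testsuite driver's code: the function name is hypothetical, and the meaning assumed for switch values 0 and 1 and for a single section override flag is a simplification for illustration.

```c
/* sw[i]:      0 = off, 1 = on, 2 = enable ONLY this operation.
   section_on: assumed section-level override (0 disables the section).
   enabled[i]: output, 1 if operation i will be tested. */
void demo_resolve_switches( const int* sw, int n, int section_on,
                            int* enabled )
{
    int any_only = 0;

    /* Pass 1: is any operation marked "2"? */
    for ( int i = 0; i < n; ++i )
        if ( sw[ i ] == 2 ) any_only = 1;

    /* Pass 2: if so, the "2" switches win regardless of everything
       else; otherwise fall back to the ordinary switches. */
    for ( int i = 0; i < n; ++i )
    {
        if ( any_only ) enabled[ i ] = ( sw[ i ] == 2 );
        else            enabled[ i ] = ( section_on && sw[ i ] == 1 );
    }
}
```

Resetting the "2" entries back to 0 or 1 restores the previous combination, since the other switches were never modified.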

commit 34862aed89e5d5a8f35aeecd49f3052ada1f337b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 28 15:30:14 2018 -0600

Use zen kernels in haswell sub-configuration.

Details:
- Register use of level-1v zen intrinsic kernels for amaxv, axpyv, dotv,
dotxv, and scalv, as well as level-1f zen intrinsic kernels for axpyf
and dotxf. This works because these kernels simply target AVX/AVX2,
and therefore work without modification on haswell hardware.
- Switch to use of zen microkernels in bli_cntx_init_haswell.c. The zen
kernels are essentially identical to those used by haswell, except that
now zen kernels are a bit more up-to-date. In the future, I may
continue to maintain duplicates, or I may keep the kernels named after
one architecture (zen or haswell) but used by both sub-configurations.
- In config_registry, enable use of both haswell and zen kernels for the
haswell sub-configuration. This is necessary in order to make zen
kernels visible when registering kernels in bli_cntx_init_haswell.c.
- Enable use of assembly-based complex gemm microkernels for zen,
bli_cgemm_zen_asm_3x8() and bli_zgemm_zen_asm_3x4(), in
bli_cntx_init_zen.c. This was actually intended for 1681333.

0.3.0

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600

Version file update (0.3.0)

commit d9079655c9cbb903c6761d79194a21b7c0a322bc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600

CHANGELOG update (0.3.0)

commit 3defc7265c12cf85e9de2d7a1f243c5e090a6f9d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 23 17:38:19 2018 -0600

Applied 34b72a3 to non-active/unused microkernels.

Details:
- Applied the read-beyond-bounds bugfix in 34b72a3 to other haswell and
zen kernels (ie: other microtile shapes) which are not used by default.
This was done mostly in case someone decided to pick up these kernels
and start using them, not because it affects BLIS's behavior
out-of-the-box.

commit 34b72a351745aa0d47bb0b74ebcd0f0a616d613d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 23 16:33:32 2018 -0600

Fixed obscure read-beyond-bounds bug in sgemm ukrs.

Details:
- Fixed an obscure bug in the bli_sgemm_haswell_asm_6x16 and
bli_sgemm_zen_asm_6x16 microkernels when the input/output matrix C
is stored with general stride (ie: both rs and cs are non-unit). The
bug was rooted in the way those microkernels read from matrix C--
namely, they used vmovlps/vmovhps instead of movss. By loading two
floats at a time, even if one of them was treated as junk, the
assembly code could be written in a more concise manner. However,
under certain conditions--if m % mr == 0 and n % nr == 0 and the
underlying matrix is not an internal "view" into a larger matrix--
this could result in the very last vmovhps of the last (bottom-right)
microkernel invocation reading beyond valid memory. Specifically, the
low 32 bits read would always be valid, but the high 32 bits could
reside beyond the bounds of the array in which the output C matrix is
contained. To remedy this situation, we now selectively use movss to
load any element that could be the last element in the matrix.
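
The bounds problem can be illustrated with plain index arithmetic. vmovlps/vmovhps each load 64 bits (two floats) while movss loads 32 bits (one float), so a two-float load that starts at the very last element of the buffer touches one float past the end. The helper and the example strides below are hypothetical.

```c
#include <stddef.h>

/* Returns 1 if a load of 'width' consecutive floats starting at
   element (i,j) of a matrix with row stride rs and column stride cs
   stays within a buffer of buf_len floats. */
int demo_load_in_bounds( size_t i, size_t j, size_t rs, size_t cs,
                         size_t width, size_t buf_len )
{
    size_t first = i * rs + j * cs;
    return first + width <= buf_len;
}
```

For example, a 6x16 matrix with rs = 2 and cs = 12 places its last element at offset 5*2 + 15*12 = 190, so the smallest buffer holding it has 191 floats. A one-float load at (5,15) is in bounds, but a two-float load there would touch offset 191, which is outside the buffer.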

commit 5112e1859e7f8888f5555eb7bc02bd9fab9b4442 (origin/rt)
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 23 14:31:26 2018 -0600

Added missing 'restrict' to some kernels' cntx_t*.

Details:
- Added missing 'restrict' keyword to cntx_t* argument of function
signatures corresponding to level-1v, level-1f, and level-1m kernels.
This affected bli_l1v_ker_prot.h, bli_l1f_ker_prot.h, and
bli_l1m_ker_prot.h. (The 'restrict' was already being used to
qualify cntx_t* arguments for kernels defined in bli_l3_ker_prot.h.)
- Added comments to bli_l1v_ker.h, bli_l1f_ker.h, bli_l1m_ker.h, and
bli_l3_ukr.h that help explain how those headers function to produce
kernel prototypes using the prototype macros defined in the files
mentioned above.

commit 1fa8af95d807168e0849adb668492601e7009be0
Merge: c084b03b 16813335
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 21 17:54:02 2018 -0600

Merge branch 'rt'

commit c084b03b31d84427a120e391963db5419f1911ee
Merge: 5d03b6e6 fa74af4e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 21 17:52:17 2018 -0600

Merge branch 'rt'

commit 16813335bdb5978bc9a26cd00a32bd5a130130c4
Merge: fa74af4e 5a7005dd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 21 17:43:32 2018 -0600

Merge branch 'amd' into rt

Details:
- Merged contributions made by AMD via 'amd' branch (see summary below).
Special thanks to AMD for their contributions to-date, especially with
regard to intrinsic- and assembly-based kernels.
- Added column storage output cases to microkernels in
bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
the extra cost of transposing the microtile in registers, this is
much faster than using the general storage case when the underlying
matrix is column-stored.
- Added s and d assembly-based zen gemmtrsm_u microkernel (including
column storage optimization mentioned above).
- Updated zen sub-configuration to reflect presence of new native
kernels.
- Temporarily reverted zen sub-configuration's level-3 cache blocksizes
to smaller haswell values.
- Temporarily disabled small matrix handling for zen configuration
family in config/zen/bli_family_zen.h.
- Updated zen CFLAGS according to changes in 1e4365b.
- Updated haswell microkernels such that:
- only one vzeroupper instruction is called prior to returning
- movapd/movupd are used in lieu of movaps/movups for double-real
microkernels. (Note that single-real microkernels still use
movaps/movups.)
- Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
now included via frame/include/bli_arch_config.h.
- Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
in testsuite/src/test_amaxv.c).
- Added early return for alpha == 0 in bli_dotxv_ref.c.
- Integrated changes from f07b176, including a fix for undefined
behavior when executing the 1m method under certain conditions.
- Updated config_registry; no longer need haswell kernels for zen
sub-configuration.
- Tweaked marginal and pass thresholds for dotxf.
- Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
comments.
- Updated LICENSE file to explicitly mention that parts are copyright
UT-Austin and AMD.
- Added AMD copyright to header templates in build/templates.

Summary of previous changes from 'amd' branch.
- Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
- Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
and scalv, with extra-unrolling variants for axpyv and scalv.
- Added a small matrix handler to bli_gemm_front(), with the handler
implemented in kernels/zen/3/bli_gemm_small_matrix.c.
- Added additional logic to sumsqv that first attempts to compute the
sum of the squares via dotv(). If there is a floating-point exception
(FE_OVERFLOW), then the previous (numerically conservative) code is
used; otherwise, the result of dotv() is square-rooted and stored as
the result. This new implementation is only enabled when FE_OVERFLOW
is defined. If the macro is not defined, then the previous
implementation is used.
- Added axpyv and dotv standalone test drivers to test directory.
- Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
- Added thread-local and __attribute__-related macros to bli_macro_defs.h.

commit 5d03b6e6e19d5a07f0cccf1a158f02fbd62dfd99
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Feb 19 11:31:30 2018 -0600

Fix asm macro include line for KNL. Fixes 167.

commit f07b176c84dc9ca38fb0d68805c28b69287c938a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 15 18:36:54 2018 -0600

Fixed an obscure bug in the 1m implementation.

Details:
- Fixed a bug in the way the bli_gemm1m_cntx_ref() function (defined in
ref_kernels/bli_cntx_ref.c) initializes its context for 1m execution.
Previously, the function probed the context that was in the process of
being updated for use with 1m--this context being previously
initialized/copied from a native context--for its storage preference
to determine which "variant" (row- or column-oriented) of 1m would be
needed. However, the _cntx_ref() function was not updating the method
field of the context until AFTER this query, and the conditional which
depended on it, had taken place, meaning the storage preference query
function would mistakenly think the context was for native execution,
since the context's method field would still be set to BLIS_NAT. This
would lead it to incorrectly grab the storage preference of the complex
domain microkernel rather than the corresponding real domain
microkernel, which could cause the storage preference predicate to
evaluate to the wrong value, which would lead to the _cntx_ref()
function choosing the wrong variant. This could lead to undefined
behavior at runtime. The method is now explicitly set within the
context prior to calling the storage preference query function.
- Updated comments in frame/ind/oapi/bli_l3_3m4m1m_oapi.c.
- Fixed a typo in the commented-out CFLAGS in config/zen/make_defs.mk,
which are appropriate for gcc 6.x and newer. (Mistakenly used
-march=bdver4 instead of -march=znver1.)
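
The ordering problem in the first item can be reduced to a few lines. The types and functions below are hypothetical stand-ins, not the BLIS context API: a query whose answer depends on the context's method field returns the wrong value if the field is updated only after the query.

```c
typedef enum { DEMO_NAT, DEMO_1M } demo_method_t;

typedef struct
{
    demo_method_t method;
    int cplx_ukr_prefers_rows; /* preference of complex microkernel */
    int real_ukr_prefers_rows; /* preference of real microkernel    */
} demo_cntx_t;

/* Under 1m the real-domain microkernel's preference is the relevant
   one; under native execution the complex-domain one is. */
int demo_prefers_rows( const demo_cntx_t* c )
{
    return ( c->method == DEMO_1M ) ? c->real_ukr_prefers_rows
                                    : c->cplx_ukr_prefers_rows;
}

/* Buggy order: query first, set the method afterwards. */
int demo_init_1m_buggy( demo_cntx_t* c )
{
    int rows = demo_prefers_rows( c ); /* still sees DEMO_NAT */
    c->method = DEMO_1M;
    return rows;
}

/* Fixed order: set the method, then query. */
int demo_init_1m_fixed( demo_cntx_t* c )
{
    c->method = DEMO_1M;
    return demo_prefers_rows( c );
}
```

When the two microkernels have different preferences, the buggy order picks the variant that matches the complex microkernel, which is the mismatch the commit describes.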

commit 1f94bb7b96eb2b67257e6c4df89e29c73e9ab386
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jan 19 12:46:53 2018 -0600

Document how to enable zen-specific instructions.

Details:
- Added as a comment in config/zen/make_defs.mk the list of compiler flags
that could be added to manually enable the instructions provided by the
Zen microarchitecture that are not already implied by -march=bdver4.
This information, along with the previous commit's flags to selectively
disable Bulldozer instructions no longer present in Zen, was gathered
from [1]. I hesitate to enable use of these instructions since I don't
have any Zen hardware to test on yet.
[1] https://wiki.gentoo.org/wiki/Ryzen

commit 1e4365b21bafa02bd108c5ac4705a25671fb9441
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jan 18 12:03:51 2018 -0600

Augment zen CFLAGS to prevent illegal instruction.

Details:
- Added various compiler flags (-mno-fma4 -mno-tbm -mno-xop -mno-lwp) so
that compiling with -march=bdver4 on zen-based architectures does not
result in an illegal instruction error at runtime. Note: This fix is
only needed for gcc 5.4; gcc 6.3 or later supports the use of
-march=znver1, which can be used in lieu of the augmented set of flags
based on bdver4. Thanks to Nisanth Padinharepatt for reporting this
error.

commit fa74af4e1fa7385ac3f3089fe1ea7bb88c906029
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jan 9 13:43:15 2018 -0600

Minor labeling update for './configure -c' output.

Details:
- Print the name of the configuration in the output of the
kernel-to-config map (and chosen pairs list) as a subtle way to remind
the user that these only apply to the targeted configuration (whereas
the config list and kernel list are printed without regard to which
configuration was actually targeted).

commit 5cdea756c7391e2c6cbfb38436ef9a205f860237
Merge: 9d8858b5 1e7a4896
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jan 7 19:45:20 2018 -0600

Merge branch 'rt'

commit 9d8858b5cff4a4b078b87872847a5710073fff0a
Merge: 0b3ca3cf f7df64da
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Sun Jan 7 10:03:25 2018 -0600

Merge pull request 164 from devinamatthews/master

Don't use memkind for skx configuration.

commit f7df64daf6bbe6431effada6e13d8d1fab5aa221
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Sun Jan 7 09:37:25 2018 -0600

Don't use memkind for skx configuration. Fixes 163.

commit 1e7a4896e0cbe73c4685fa956278e3f28273cdf9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jan 5 12:33:48 2018 -0600

Minor error handling in update-version-file.sh.

Details:
- Added explicit handling of situations when 'git describe --tags'
returns an error. This command is used by update-version-file.sh
when deciding whether or not to update the version file prior to
configuration.
- Removed bli_packm.c and bli_unpackm.c, as they contained no source
code.

commit 0b3ca3cfb682715a3686fd93ebb10d4a695d1162
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jan 4 20:51:35 2018 -0600

Intelligently select compiler for auto-detection.

Details:
- Rewrote code that selects the compiler for the purposes of compiling
the auto-detection executable. CC (if specified) is tried first. Then
gcc. Then clang. The absolute fallback is cc. The previous code was
sort of broken, and seemed to unintentionally always use gcc.
- Moved various configuration-agnostic flags from config/*/make_defs.mk
files to common.mk. The new mechanism appends the configuration-
agnostic flags to the various compiler flag variables initialized in
make_defs.mk. Flags specific to the sub-configuration are still set
in make_defs.mk.
- Added -Wno-tautological-compare to CMISCFLAGS when clang is in use.
Also added the flag to the compiler instantiation during configure-
time hardware detection (when clang is selected).
- Added some missing (but mostly-optional) quotes to configure script.

commit 5a7005dd44ed3174abbe360981e367fd41c99b4b
Merge: 7be88705 3bc99a96
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Wed Jan 3 12:05:12 2018 +0530

Merge changes in AMD beta release 0.95 into amd branch

commit 0b9c5127e91508c115228ca604ee2dac8de8f477
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Dec 23 15:53:44 2017 -0600

Enabled C99, added stdint.h to auto-detect build.

Details:
- Added "-std=c99" to compiler arguments when building auto-detection
driver in configure script.
- Added #include <stdint.h> to all three source files needed by the
auto-detection program.

commit 0ce5e19c318e04909d3e664d69accb3a0fc6b988
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Dec 23 15:32:03 2017 -0600

Reimplemented configure-time hardware detection.

Details:
- Reimplemented the hardware detection functionality invoked when running
"./configure auto". Previously, a standalone script in build/auto-detect
that used CPUID was used. However, the script attempted to enumerate all
models for each microarchitecture supported. The new approach recycles
the same code used for runtime hardware detection introduced in 2c51356.
This has two immediate benefits. First, it reduces and consolidates the
code required to detect microarchitectures via the CPUID instruction.
Second, it provides an indirect way of testing at configure-time the
code that is used to detect hardware at runtime. This code is (a) only
activated when targeting a configuration family (such as intel64 or
amd64) at configure-time and (b) somewhat difficult to test in
practice, since it relies on having access to older microarchitectures.
- The above change required placing conditional cpp macro blocks in
bli_arch.c and bli_cpuid.c which either include "blis.h" or include
a bare-bones set of headers that does not rely on the presence of a
bli_config.h header. This is needed because bli_config.h has not been
created yet when configure-time auto-detection takes place.
- Defined a new function in bli_arch.c, bli_arch_string(), which takes
an arch_t id and returns a pointer to a string that contains the
lowercase name of the corresponding microarchitecture. This function
is used by the auto-detection script to printf() the name of the
sub-configuration corresponding to the detected hardware.
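
The id-to-name lookup described for bli_arch_string() can be sketched roughly as follows. This is only an illustration: the arch_id_t enum, its four entries, and the helper names are hypothetical stand-ins, not the real arch_t or the BLIS implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of architecture ids; the real arch_t has more. */
typedef enum
{
    ARCH_HASWELL = 0,
    ARCH_SANDYBRIDGE,
    ARCH_ZEN,
    ARCH_GENERIC,
    ARCH_NUM_IDS /* count of the ids above; not an architecture */
} arch_id_t;

/* Lowercase names, indexed by arch_id_t. */
static const char* const arch_names[ARCH_NUM_IDS] =
{
    "haswell", "sandybridge", "zen", "generic"
};

/* Return a pointer to the lowercase name for the given id. */
const char* arch_string( arch_id_t id )
{
    assert( (int)id >= 0 && (int)id < ARCH_NUM_IDS );
    return arch_names[id];
}

/* What a detection driver might do with the result: print the name so
   that a calling script can capture it from stdout. */
void print_detected( arch_id_t id )
{
    printf( "%s\n", arch_string( id ) );
}
```

Keeping the names in a table indexed by the enum means a new id only needs one new table entry.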

commit 9804adfd405056ec332bb8e13d68c7b52bd3a6c1 (origin/selfinit)
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 21 19:22:57 2017 -0600

Added option to disable pack buffer memory pools.

Details:
- Added a new configure option, --[en|dis]able-packbuf-pools, which will
enable or disable the use of internal memory pools for managing buffers
used for packing. When disabled, the function specified by the cpp
macro BLIS_MALLOC_POOL is called whenever a packing buffer is needed
(and BLIS_FREE_POOL is called when the buffer is ready to be released,
usually at the end of a loop). When enabled, which was the status quo
prior to this commit, a memory pool data structure is created and
managed to provide threads with packing buffers. The memory pool
minimizes calls to bli_malloc_pool() (i.e., the wrapper that calls
BLIS_MALLOC_POOL), but does so through a somewhat more complex
mechanism that may incur additional overhead in some (but not all)
situations. The new option defaults to --enable-packbuf-pools.
- Removed the reinitialization of the memory pools from the level-3
front-ends and replaced it with automatic reinitialization within the
pool API's implementation. This required an extra argument to
bli_pool_checkout_block() in the form of a requested size, but hides
the complexity entirely from BLIS. And since bli_pool_checkout_block()
is only ever called within a critical section, this change fixes a
potential race condition in which threads using contexts with different
cache blocksizes--most likely a heterogeneous environment--can check
out pool blocks that are too small for the submatrices they wish to
pack. Thanks to Nisanth Padinharepatt for reporting this potential
issue.
- Removed several functions in light of the relocation of pool reinit,
including bli_membrk_reinit_pools(), bli_memsys_reinit(),
bli_pool_reinit_if(), and bli_check_requested_block_size_for_pool().
- Updated the testsuite to print whether the memory pools are enabled or
disabled.
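
As a rough illustration of passing the requested size into the checkout call, the sketch below uses a deliberately tiny "pool" that holds a single block. The pool_t layout and pool_checkout_block() here are hypothetical and far simpler than the real bli_pool_checkout_block(); as in the commit, the caller is assumed to already be inside a critical section.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical one-block pool, for brevity. */
typedef struct
{
    size_t block_size; /* size in bytes of the block currently held */
    void*  block;      /* the block itself, or NULL if none yet */
} pool_t;

/* Return a block of at least req_size bytes. If the pool's block is
   missing or too small, the pool reinitializes itself with the larger
   size first, so callers never need to trigger a reinit themselves.
   Returns NULL if allocation fails. */
void* pool_checkout_block( pool_t* pool, size_t req_size )
{
    assert( pool != NULL );

    if ( pool->block == NULL || pool->block_size < req_size )
    {
        free( pool->block );
        pool->block      = malloc( req_size );
        pool->block_size = ( pool->block != NULL ) ? req_size : 0;
    }

    return pool->block;
}
```

Because the size check and the reinitialization happen in the same call, two callers with different blocksizes cannot observe a block sized for the other one, provided the call itself is serialized.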

commit 107801aaae180c00022f1b990bc59038c14949d2
Merge: d9c05745 0084531d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 18 16:29:28 2017 -0600

Merge branch 'master' into selfinit

commit 0084531d3eea730a319ecd7018428148c81bbba7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Dec 17 18:58:25 2017 -0600

Updated flatten-headers.py for python3.

Details:
- Modified flatten-headers.py to work with python 3.x. This mostly
amounted to removing print statements (which I replaced with calls
to my_print(), a wrapper to sys.stdout.write()). Thanks to Stefan
Husmann for pointing out the script's incompatibility with python 3.
- Other minor changes/cleanups.

commit 90b11b79c302f208791bdfb1ed754873103c7ce5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Dec 17 17:34:32 2017 -0600

Modest performance boost to flatten-headers.py.

Details:
- Updated flatten-headers.py to pre-compile the main regular expression
used to isolate include directives and the header filenames they
reference. The compiled regex object is then used over and over on
each header file in the tree of referenced headers. This appears to
have provided a 1.7-2x performance increase in the best case.
- Other minor tweaks, such as renaming the main recursive function from
replace_pass() to flatten_header().
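
The script itself is Python, but the compile-once, match-many idea carries over to other regex APIs. The sketch below shows the same pattern in C with POSIX regcomp()/regexec(). The pattern and the function names are illustrative assumptions, not taken from flatten-headers.py.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Matches lines such as:  #include "bli_config.h"
   Group 1 captures the header filename. (Illustrative pattern only.) */
static const char* include_pattern =
    "^[[:space:]]*#[[:space:]]*include[[:space:]]*\"([^\"]+)\"";

/* Compile the pattern once, up front. Returns 0 on success. */
int include_regex_init( regex_t* re )
{
    return regcomp( re, include_pattern, REG_EXTENDED );
}

/* Reuse the compiled regex on one line of text. On a match, copy the
   header name into out (truncated to out_len-1 characters and always
   NUL-terminated) and return 1; otherwise return 0. */
int include_regex_match( const regex_t* re, const char* line,
                         char* out, size_t out_len )
{
    regmatch_t m[2];

    assert( out_len > 0 );

    if ( regexec( re, line, 2, m, 0 ) != 0 ) return 0;

    size_t len = (size_t)( m[1].rm_eo - m[1].rm_so );
    if ( len > out_len - 1 ) len = out_len - 1;

    memcpy( out, line + m[1].rm_so, len );
    out[len] = '\0';

    return 1;
}
```

A caller would run include_regex_init() once, call include_regex_match() for every line of every header, and release the compiled pattern with regfree() at the end.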

commit 99dee87f30b4d437fa6b5e4ba862526d07b9f08b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Dec 17 16:47:27 2017 -0600

Reimplemented flatten-headers.sh in python.

Details:
- Added flatten-headers.py, a python implementation of the bash script
flatten-headers.sh. The new script appears to be 25-100x faster,
depending on the operating system, filesystem, etc. The python script
abides by the same command line interface as its predecessor and
targets python 2.7 or later. (Thanks to Devin Matthews for suggesting
that I look into a python replacement for higher performance.)
- Activated use of flatten-headers.py in common.mk via the FLATTEN_H
variable.
- Made minor tweaks to flatten-headers.sh such as spelling corrections
in comments.

commit d9c0574599c3f97c0f9b6c334a077bab9452e1f4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 14 17:13:42 2017 -0600

Allow travis failures of OS X builds that run testsuite.

Details:
- Added an allowance for OS X builds that run the testsuite to fail.
There seems to be an issue with 1m when running in Travis CI under
OS X and clang, but only in double-precision. Haven't been able to
reproduce the error on my own, and thus, I can't debug it. (Hopefully
it is simply a version-specific compiler bug.)

commit 86cd23b7379b00a42b4ecc04fa668f1e3f9b54ee
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 14 15:47:41 2017 -0600

Fixed testsuite Makefile brokenness from 9091a207.

Details:
- Fixed a makefile error encountered when building the testsuite directly
in its directory (as opposed to indirectly via 'make test'). The fix
involves introducing a new variable, BUILD_PATH, alongside the existing
DIST_PATH variable. By default, BUILD_PATH is set to the current
directory, and is overridden by other Makefiles used by, for example,
the testsuite and standalone test drivers in testsuite or test,
respectively.
- Some files/directories in common.mk were redefined in terms of
BUILD_PATH, such as the locations of the config.mk file and the intermediate
include directory.

commit 6a3a8924c04d25507fc4aa593df30c56c7dc12f7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 14 13:20:02 2017 -0600

Temporarily show Makefile's testsuite output.

Details:
- Disabled redirection of testsuite output for 'test' target. This is
part of an attempt to debug a segmentation fault on OS X via Travis.

commit 9a01080dd426915bed18229f70401bfa639dc283
Merge: 83316485 a32e8a47
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 14 11:27:19 2017 -0600

Merge branch 'master' into selfinit

commit a32e8a47c022b6071302b2956af5728976c83ca9 (origin/travis)
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 16:31:36 2017 -0600

Added an exclusion to .travis.yml.

Details:
- Added exclusion for out-of-tree builds on OS X (clang).

commit b9f7d987df548965c86e16e0ba94d5cad0d9b399
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 16:22:09 2017 -0600

Cleaned up after previous travis oot debugging.

Details:
- Removed debugging output from common.mk related to Travis CI
out-of-tree builds.
- Other minor cleanups to common.mk.

commit 9091a207aa8c49e279676ea02be533480b3b0d5a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 16:12:34 2017 -0600

Attempted fix to travis oot build failure.

Details:
- Found the likely cause of the Travis CI out-of-tree build failures:
config.mk was being read from DIST_PATH, rather than the current
directory.

commit c01c71c33e236e6c91f5ddd3ec1e3faec89368c1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 15:58:50 2017 -0600

Added debugging output to Makefile.

Details:
- Added $(info ...) statements in key locations in an attempt to reveal
why Travis CI doesn't like building BLIS out-of-tree.

commit 784289d69dd6b3692444d3b3e290f6a014465b72
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 15:31:27 2017 -0600

Updated SHELL in common.mk from /bin/bash to bash.

commit d9bb1d1d4ebc89ea75d9d927d09882162a914f77
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 15:27:54 2017 -0600

Defined SHELL in common.mk so "echo -n" works.

Details:
- Defined the SHELL variable in common.mk as "/bin/bash" so that the
-n option can be used with echo in the Makefile rule for flattening
blis.h. Thanks to Devin Matthews for suggesting this fix.

commit 9289a08667df2044f3a37af54d893efe2b56d555
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 15:14:27 2017 -0600

Attempt 3 on .travis.yml.

commit 720bfcf0ef54fdc41df0dcaa94503edb0d5c8972
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 14:52:28 2017 -0600

More fixes to .travis.yml.

Details:
- Fixed a mistake (hopefully) in d0c4dd0 that resulted in many more
osx/clang sub-tests than intended.
- Shortened the variable names in an effort to make them more readable
via the Travis CI web interface.

commit 8717c9c97fe9b1ecd3b3192049a73976f8390ca7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 14:36:37 2017 -0600

Added 'pwd' commands to .travis.yml for debugging.

Details:
- Added 'pwd' commands to the script portion of the .travis.yml file in
an attempt to uncover the problem with the recent out-of-tree build
testing changes made in d0c4dd0.

commit 83316485ce10f6fcafe92a1c146282de0dd8068a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 13 14:14:50 2017 -0600

Simplified/fixed self-initialization.

Details:
- Fixed a race condition in self-initialization whereby the bli_is_init
static variable could be erroneously read as TRUE by thread 1 while
thread 0 is still executing bli_init_apis(), thus allowing thread 1 to
use the library before it is actually ready. Thanks to Minh Quan Ho
and Devin Matthews for pointing out this issue.
- Part of the solution to the aforementioned race condition involved
replacing the runtime initialization of the global scalar constants
(e.g., BLIS_ONE, BLIS_ZERO, etc.) in bli_const.c with a static
initialization of those same constants. This eliminates the need for
bli_const_init() altogether. (The static initialization is made concise
via preprocessor macros.)
- Defined bli_gks_query_cntx_noinit(), which behaves just like
bli_gks_query_cntx(), except that it does not call bli_init_once(). This
function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and
bli_memsys_init() so as to not result in any recursion into
bli_init_once().
- Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants.
They have no use in BLIS or its test products, and we have little reason
to believe they are used by others.
- Removed testsuite/out file, which was accidentally committed as part
of 70640a3.

commit 6526d1d4ae6dbfa854ca8d1e5f224cd6ab3fa958
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 12 13:50:43 2017 -0600

Added temp_dir argument to flatten-headers.sh.

Details:
- Added "temp_dir" argument to flatten-headers.sh so that the caller can
specify where intermediate files should be created as the script runs.
- Updated flatten-headers.sh to create intermediate files in temp_dir
instead of alongside the corresponding source files. This should now
(once again) allow out-of-tree builds where the BLIS distribution is
read-only, or where the out-of-tree build is running concurrently with
another out-of-tree build. (Thanks to Devin Matthews for pointing out
the possibility of simultaneous out-of-tree builds.)

commit 94755017c967630daf2e31c1f63ed5e88ab0d6ab
Merge: d0c4dd00 5cf7b0c4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 12 12:50:41 2017 -0600

Merge branch 'master' of github.com:flame/blis

commit d0c4dd000ff38acc249e8acf7e0655a523991695
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 12 12:47:53 2017 -0600

Added out-of-tree build test to .travis.yml file.

Details:
- Modified .travis.yml file to include an out-of-tree build test (using
the "auto" configure target). Thanks to Devin Matthews for this
suggestion.

commit 5cf7b0c4e52922069183a87dc2aa177419644e04
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Dec 12 12:38:48 2017 -0600

Ignore blis.h.interm [ci skip]

commit 8d8ff74d15b4a584929cec36034ba6d3c53f7d27
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 12 12:32:50 2017 -0600

Further attempt to fix out-of-tree builds.

Details:
- Fix applied in 87978f6 was necessary but not sufficient to fix
out-of-tree builds. It turns out that using a source tree that had
already built the target erroneously gave the impression that
out-of-tree builds were working again, when in fact they were still
broken. The additional changes in this commit should complete the
fix that was started in the aforementioned commit. Thanks to Devin
Matthews and Shaden Smith for their help in isolating this issue.

commit 70640a37109290b57c344083c00624e13c496e30
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 11 17:18:43 2017 -0600

Implemented library self-initialization.

Details:
- Defined two new functions in bli_init.c: bli_init_once() and
bli_finalize_once(). Each is implemented with pthread_once(), which
guarantees that, among the threads that pass in the same pthread_once_t
data structure, exactly one thread will execute a user-defined function.
(Thus, there is now a runtime dependency against libpthread even when
multithreading is not enabled at configure-time.)
- Added calls to bli_init_once() to top-level user APIs for all
computational operations as well as many other functions in BLIS to
all but guarantee that BLIS will self-initialize through the normal
use of its functions.
- Rewrote and simplified bli_init() and bli_finalize() and related
functions.
- Added -lpthread to LDFLAGS in common.mk.
- Modified the bli_init_auto()/_finalize_auto() functions used by the
BLAS compatibility layer to take and return no arguments. (The
previous API that tracked whether BLIS was initialized, and then
only finalized if it was initialized in the same function, was too
cute by half and borderline useless because by default BLIS stays
initialized when auto-initialized via the compatibility layer.)
- Removed static variables that track initialization of the sub-APIs in
bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and
bli_ind.c. We don't need to track initialization at the sub-API level,
especially now that BLIS can self-initialize.
- Added a critical section around the changing of the error checking
level in bli_error.c.
- Deprecated bli_ind_oper_has_avail() as well as all functions
bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
name. These functions had no use cases within BLIS and likely none
outside of BLIS.
- Commented out calls to bli_init() and bli_finalize() in testsuite's
main() function, and likewise for standalone test drivers in 'test'
directory, so that self-initialization is exercised by default.
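
A minimal sketch of the pthread_once() pattern described above, assuming a single one-time setup routine. The names lib_init_apis(), lib_init_once(), and lib_compute() are hypothetical stand-ins for the BLIS functions, and the call counter exists only to make the behavior observable.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t init_once_ctl = PTHREAD_ONCE_INIT;
static int            init_calls    = 0; /* runs of lib_init_apis() */

/* Hypothetical stand-in for the real one-time setup work. */
static void lib_init_apis( void )
{
    init_calls++;
}

/* Safe to call from any thread, any number of times: pthread_once()
   runs lib_init_apis() exactly once for this control variable, and
   other callers do not return until that run has completed. */
void lib_init_once( void )
{
    int rc = pthread_once( &init_once_ctl, lib_init_apis );
    assert( rc == 0 );
    (void)rc;
}

/* Example user-facing operation that self-initializes before working. */
int lib_compute( int x )
{
    lib_init_once();
    return 2 * x;
}

/* Accessor used only to observe how often setup ran. */
int lib_init_call_count( void )
{
    return init_calls;
}
```

Placing the lib_init_once() call at the top of every public entry point is what removes the need for applications to call an explicit init function. On some systems the program must be linked with -lpthread, as the commit notes.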

commit 70a64432ee5a7adbee10fb7ff6d7b608c1940a7a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 11 13:14:20 2017 -0600

Fixed off-by-one indexing in bli_cpuid.c.

Details:
- In bli_cpuid.c, fixed an off-by-one indexing statement in vpu_count()
whereby a string-terminating NUL character, '\0', is written beyond
the bounds of the model_num string.
- Minor whitespace and formatting edits to bli_cpuid.c.
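
This class of bug can be illustrated with a small bounded copy. The sketch is not the code from vpu_count(); copy_model_num() and its parameters are hypothetical, and it only shows where the terminator may legally go.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size-1 characters of src into dst and terminate the
   result. Returns the number of characters copied, not counting the
   terminator. */
size_t copy_model_num( char* dst, size_t dst_size, const char* src )
{
    assert( dst != NULL && src != NULL && dst_size > 0 );

    size_t n = strlen( src );
    if ( n > dst_size - 1 ) n = dst_size - 1;

    memcpy( dst, src, n );

    /* Valid indices are 0 .. dst_size-1, and n <= dst_size-1 here.
       Writing dst[dst_size] = '\0' instead would be the off-by-one:
       one byte past the end of the buffer. */
    dst[n] = '\0';

    return n;
}
```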

commit 87978f6261a080d261d01f9acf4e9cc18855c833
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 11 12:49:03 2017 -0600

Fixed broken out-of-tree builds since 52f9e6f.

Details:
- Added missing $(DIST_PATH)/ prefix to relative path to flatten-headers.sh
script in common.mk so that the script could be found during out-of-tree
builds. Thanks to Devin Matthews for reporting this bug.

commit 513ef4d040f89a18dda5154e8c4cf1aaf7463999
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 11 12:35:59 2017 -0600

Various typecasting fixes, mis-typed enums, etc.

Details:
- Fixed implicit typecasting of conj_t to trans_t in bli_[un]packm_cxk.c.
- Properly typecast integer arguments to match format specifier in various
calls to printf() in bli_l3_thrinfo.c, bli_cntx.c, bli_pool.c, and
bli_util_oapi.c.
- Fixed "unsigned less-than-comparison with zero" checks in bli_check.c,
bli_cntx.h.
- Fixed mis-typed enums in bli_cntx.c (e.g., l1mkr_t that should have been
l1fkr_t or l1vkr_t).
- Fixed instances of opid_t value BLIS_GEMM that should have been l3ukr_t
value BLIS_GEMM_UKR in bli_cntx_ref.c.
- NOTE: These issues were identified via compiler warnings when building
BLIS with clang on a rather old installation of OS X:
$ clang --version
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin15.2.0
Thread model: posix

commit 3bc99a96a3648f51b9acdc8a8c7e1cf4eb815459
Merge: 3a441183 78199c53
Author: prangana <pradeep.raoamd.com>
Date: Mon Dec 11 12:53:03 2017 +0530

Fix merge conflicts after rebase with release branch

Change-Id: I581b26c6d515f717ff0dce91c7c0c92553aa2630

commit 3a44118398955d6f872e01f73ae5bb4a4f8500f7
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Wed Nov 15 11:11:17 2017 +0530

Added AMD copyright line to the changed files in last 3 commits

Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66

commit 268a56c06e94d1c388766dbfe81d54efbe432809
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 1 11:51:41 2017 -0500

Revert to default SIMD alignment for bulldozer.

Details:
- Removed the default-overriding define of BLIS_SIMD_ALIGN_SIZE set in
config/bulldozer/bli_kernel.h. Not sure where this value came from, but
it would seem to allow for insufficient starting address alignment for
any matrices created via bli_malloc_user(), such as via
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
in 8f150f2.

commit 510a6863e28277f9446abfb77f1aea9f01d37e7a
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Oct 30 10:04:42 2017 -0500

Fix CVECFLAGS for bulldozer config.

commit c669716790bdda5d2b11ea0a026cbc121b228842
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Tue Oct 24 16:36:36 2017 +0530

Adding __attribute__((constructor/destructor)) for CLANG case.

CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.

Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b

commit 24e64a9d0877d788357fc63d4b947e977f8697f7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 18 13:41:25 2017 -0500

Removed a duplicate bli_avx512_macros.h header.

Details:
- Removed a duplicate header file that was causing problems during
installation for the 'knl' configuration. Thanks to Victor Eijkhout
for reporting this issue.

commit 9c0a3c4c0260cbfefb9f11532f46508b4fd19ec2
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Oct 16 22:06:57 2017 +0530

Thread Safety: Move bli_init() before and bli_finalize() after main()

BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS while other threads
in the application are still using BLIS.

This issue can be solved by removing bli_finalize() from the API.
One way to do this is by getting bli_finalize() to execute by default
after the application exits from main().

GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that needs to be executed after main() exits.

Similarly, bli_init() can be made to run before the application enters
main() so that the application need not call it.

Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
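
A minimal sketch of the constructor/destructor mechanism, assuming GCC or Clang. The function names and the lib_ready flag are hypothetical; the real patch attaches the attributes to BLIS's own init and finalize routines.

```c
#include <assert.h>

static int lib_ready = 0;

/* Runs automatically before main() is entered. */
__attribute__((constructor))
static void lib_auto_init( void )
{
    lib_ready = 1;
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void lib_auto_finalize( void )
{
    lib_ready = 0;
}

/* Example operation that relies on initialization having happened. */
int lib_compute( int x )
{
    assert( lib_ready );
    return x + 1;
}

int lib_is_ready( void )
{
    return lib_ready;
}
```

With this arrangement the application never calls an init or finalize function, so no thread can finalize the library while others are still using it.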

commit 83f31253eb21c5ecd8a5907835e57720daae0b8b
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Oct 16 21:07:50 2017 +0530

Thread safety: Make the global induced method status array local to thread

BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.

This patch solves this issue by making the induced method status array
local to threads.

Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
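
One way to make such a table thread-local is C11's _Thread_local storage class, sketched below. The commit message does not say which mechanism the patch uses, so this is an assumption; the table dimensions and function names are hypothetical as well.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_METHODS 3 /* hypothetical number of induced methods */
#define NUM_OPS     2 /* hypothetical number of operations */

/* Each thread gets its own zero-initialized copy of this table, so a
   change made by one thread is never visible to another. */
static _Thread_local bool method_enabled[NUM_METHODS][NUM_OPS];

/* Enable or disable a method for an operation, for this thread only. */
void method_set( int method, int op, bool status )
{
    assert( 0 <= method && method < NUM_METHODS );
    assert( 0 <= op && op < NUM_OPS );
    method_enabled[method][op] = status;
}

/* Query the calling thread's status for a method/operation pair. */
bool method_get( int method, int op )
{
    assert( 0 <= method && method < NUM_METHODS );
    assert( 0 <= op && op < NUM_OPS );
    return method_enabled[method][op];
}
```

The trade-off is that a setting made in one thread no longer applies to the others; each thread has to configure its own copy.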

commit e923402e68029be379a4297de3ac6fb155ffd928
Author: sthangar <Santanu.Thangarajamd.com>
Date: Thu Sep 28 12:15:36 2017 +0530

The inner loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default

Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5

commit a64c15de19327c7595376d699be676c7003e850e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 26 19:02:53 2017 -0500

Fixed a pthread typo in previous commit.

Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.

commit 42dcd589c37e1a2473ab2e1539207da97aebc07f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 26 17:00:04 2017 -0500

Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.

Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
into a k x k triangular matrix for the purposes of obtaining an mr x k
micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
very large k (depending on the product of mr x kc on that architecture).
The bug arose from the fact that the test module was triggering the
allocation of blocks from the internal memory pools, which are limited in
size. This allocation imposes an implicit assumption that the micro-
panel being tested with will fit inside, and this assumption is violated
for large values of k. Arbitrarily large k may now be tested for both
operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
issues.

commit 206beb68ff73b75f5c382413967aacbb8a0aac3a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Sep 9 14:10:15 2017 -0500

Updated bibtex info for BLIS5 (3m4m) article.

commit 0c8c0363aeb1f4aa88f7ec2d02403dab05a6e014
Author: sthangar <Santanu.Thangarajamd.com>
Date: Mon Aug 28 16:44:42 2017 +0530

Bug fix for the testsuite build failing

Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77

commit 63d1c84465b50f64787808dd3e8494e683c16821
Author: sthangar <Santanu.Thangarajamd.com>
Date: Wed Aug 23 13:01:14 2017 +0530

Adding auto hardware detection for Zen

Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf

commit 537fb2a895b09be94b11947696fd2da629be24dd
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Aug 15 10:02:25 2017 -0500

Add vzeroupper to Intel AVX kernels.

commit 7628de3f76f78a44788807605a4601ddda445854
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 10 16:24:28 2017 -0500

Removed trailing enum commas from bli_type_defs.h.

Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
Erling Andersen for pointing out this inconsistency and suggesting
the change.

commit a666fd4e267ffae3d4b21f38d569c61ff56adc9e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 5 13:04:31 2017 -0500

Added edge handling to _determine_blocksize_b().

Details:
- Added explicit handling of situations where i == dim to
bli_determine_blocksize_b_sub(). This isn't actually needed by any
current use case within BLIS, but handling the situation is nonetheless
prudent. Thanks to Minh Quan for reporting this issue and requesting
the fix.
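
A simplified sketch of a backward blocksize computation with the i == dim case handled explicitly. It ignores the maximum-blocksize logic of the real function and assumes the leftover (edge) block is taken first when moving backward; blocksize_backward() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* i:     how much of the dimension has already been processed
   dim:   total size of the dimension
   b_alg: the nominal (algorithmic) blocksize
   Returns the size of the next block, or 0 if nothing is left. */
size_t blocksize_backward( size_t i, size_t dim, size_t b_alg )
{
    assert( b_alg > 0 && i <= dim );

    size_t dim_left = dim - i;

    /* Edge case: i == dim, so there is nothing left to partition. */
    if ( dim_left == 0 ) return 0;

    /* If the remainder is not a multiple of b_alg, take the partial
       block now so that every later block is full-sized. */
    size_t dim_at_edge = dim_left % b_alg;
    if ( dim_at_edge != 0 ) return dim_at_edge;

    return b_alg;
}
```

For dim = 10 and b_alg = 4, successive calls yield blocks of 2, 4, and 4, and then 0 once i reaches dim.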

commit 0c8afa546d7f33760415519ba328d7c49eb7aa06
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 4 14:17:44 2017 -0500

Fixed a minor bug in level-3 packm management.

Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
entries to be released and then re-acquired unnecessarily. (In essence,
the "<" operands in the conditional that guards the
release-and-reacquire code block simply needed to be swapped.) The bug
should have only affected performance (rather than the computed result).
Thanks to Minh Quan for identifying and reporting the bug.

commit 6cf68a185d83fa46d438fcef65258ace78e24b13
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jul 31 15:19:51 2017 -0500

Change lsame_ signature to match lapacke.

commit 6a9bd97295cc4fb1cbcd28f69824a43c073c9a76
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 29 20:17:05 2017 -0500

Fixed pthreads compile bug with previous commit.

Details:
- Erroneously passed family parameter into l3int_t function despite
that function not taking the parameter. Oops.

commit 95adc43d800431dc0a02ca83a51426dbef641ad6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 29 14:53:39 2017 -0500

Moved 'family' field from cntx_t to cntl_t.

Details:
- Removed the family field inside the cntx_t struct and re-added it to the
cntl_t struct. Updated all accessor functions/macros accordingly, as well
as all consumers and intermediaries of the family parameter (such as
bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
change was motivated by the desire to keep the context limited, as much
as possible, to information about the computing environment. (The family
field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
that operate only on a single struct to contain the "_node" suffix to
differentiate with those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
They weren't being used and probably never will be.

commit a98e4aa547f61ab09dd91d11478c2a2ef9882e11
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Jul 20 14:50:13 2017 -0500

Clang can't make up its mind what to support.

commit 32eb36c3e8c2add2528514272044de16faed0c8f
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Jul 20 12:54:58 2017 -0500

Add default define for __has_extension.

commit 2a9aa134f7c29d3d4fdc160022ff257e61885a95
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Jul 20 10:04:34 2017 -0500

Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.

commit 6f07a034d575e1e9e30bb6417b8fcb77cf301297
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 19 15:40:48 2017 -0500

Updated ar option list used by all configurations.

Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
ar. Previously, "cru" was used, while now we employ only "cr". This
change was prompted by a warning observed on Ubuntu 16.04:

ar: `u' modifier ignored since `D' is the default (see `U')

This caused me to realize that the default mode causes timestamps to be
zero, and thus the 'u' option, which causes only changed object files to
be inserted, is not applicable.

commit 32bc03f9eed8795cfd2f2615d1c9f8673e039c57
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 19 13:51:53 2017 -0500

Added --force-version=STRING option to configure.

Details:
- Added an option to configure that allows the user to force an arbitrary
version string at configure-time. The help text also now describes the
usage information.
- Changed the way the version string is communicated to the Makefile.
Previously, it was read into the VERSION variable from the 'version' file
via $(shell cat ...). Now, the VERSION variable is instead set in
config.mk (via a configure-substituted anchor from config.mk.in).

commit befaee6dd8b2a72de9e0461fe2ec1f36e9f88f3c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 18 17:56:00 2017 -0500

Updated openmp/pthread barriers with GNU atomics.

Details:
- Updated the non-tree openmp and pthreads barriers defined in
bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
implementation goes through the same motions as the previous codes, but
protects its loads and increments with GNU atomic built-ins. These atomic
statements take memory ordering parameters that allow us to specify just
enough constraints for the barrier to work as intended on weakly-ordered
hardware. The prior implementation was only guaranteed to work on systems
with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
this change and for his crash course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
simply "comm".

commit 8f739cc847fcff2ddeeb336f8b2b9d080eb16f6c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 17 19:03:22 2017 -0500

Added API to set mt environment variables.

Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
bli_thread_get_jc_nt()
bli_thread_get_ic_nt()
bli_thread_get_jr_nt()
bli_thread_get_ir_nt()
bli_thread_get_num_threads()
bli_thread_set_jc_nt()
bli_thread_set_ic_nt()
bli_thread_set_jr_nt()
bli_thread_set_ir_nt()
bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue 140.
- Thanks to Chris Goodyer for inspiring these updates.
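
Setting and reading such variables from C could be sketched with the POSIX setenv() and standard getenv() calls, as below. The helpers env_set_int() and env_get_int() are hypothetical and are not the signatures of bli_thread_set_env() or bli_thread_get_env(); only the variable name BLIS_NUM_THREADS comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: store an integer in the named environment
   variable. Returns 0 on success and -1 on failure, as setenv() does. */
int env_set_int( const char* name, long value )
{
    char buf[32];

    assert( name != NULL );

    snprintf( buf, sizeof( buf ), "%ld", value );
    return setenv( name, buf, 1 ); /* 1 = overwrite an existing value */
}

/* Hypothetical helper: read an integer from the named variable, or
   return fallback if it is unset or does not start with a number. */
long env_get_int( const char* name, long fallback )
{
    assert( name != NULL );

    const char* str = getenv( name );
    if ( str == NULL ) return fallback;

    char* end;
    long  value = strtol( str, &end, 10 );
    if ( end == str ) return fallback;

    return value;
}
```

A convenience wrapper for the thread count would then reduce to a call such as env_set_int( "BLIS_NUM_THREADS", 4 ). Note that setenv() is not safe to call while other threads may be reading the environment.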

commit 10163833075fd42be5b5b503acc855f91a484cfd
Author: Marat Dukhan <maratfb.com>
Date: Thu Jul 13 21:39:24 2017 -0700

Fix Emscripten builds

commit c09b30d115eade72f44f37bf90aa848c9c0e79af
Author: Minh Quan HO <mqhokalray.eu>
Date: Fri Jul 7 10:52:05 2017 +0200

set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers

The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init

commit 997628ed9793c72e9ef576dd8d715cfec27c4862
Author: sthangar <Santanu.Thangarajamd.com>
Date: Fri Jun 30 12:23:19 2017 +0530

Reducing the framework overhead of GEMV routines

Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684

commit ee869066168239b710ad9938bb0e1ae454883f3a
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue Jul 4 12:57:32 2017 +0530

Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, above all, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.

Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4

commit 7b933b90b1859c96de49a402d48de82909bc73e5
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Jun 6 20:23:17 2017 -0500

Add new SSI acknowledgment

commit 3485abba4b426fbf42b146a9611a0841f6d236c6
Author: sthangar <Santanu.Thangarajamd.com>
Date: Wed May 24 11:48:16 2017 +0530

Checked in the small matrix code to compute GEMM called with A transpose case

Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462

commit de16beb83b29b4b9748f70db985b0fe04db85f7d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri May 26 14:49:31 2017 -0400

PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.

commit 25d0e618544b6eea7d3f13c7aec513ac0139801d
Author: Devin Matthews <dmatthewsgator3.ufhpc>
Date: Fri May 26 14:47:36 2017 -0400

Revert "Change PACKDIM_MR (double) for haswell to 8."

This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.

commit c5bdd84b35bc2a8ebf55b7763fb56c0c945be0cb
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri May 26 12:28:09 2017 -0500

Change PACKDIM_MR (double) for haswell to 8.

commit 172789d562001293b973bbdd8015bd27d37292e8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 17 13:03:52 2017 -0500

Restored deleted lines from makefile fragments.

commit 3ea9bd2c8e90dbd35655fa6a5b953dfea1f308fe
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed May 17 12:29:44 2017 -0500

Change to /bin/sh.

All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.

commit 49438409eedb98d3f0ebf00b8d1eee0ae45f4f8c
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed May 17 12:27:14 2017 -0500

Remove shebangs from makefiles.

commit 497e2640474c016d576dce3530fa6a66891642a0
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 23:11:22 2017 -0400

Fix if/else structure. Thanks to TravisCI.

commit 835035c56a8de36ad25bb8d1375db170d489ef57
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:23:27 2017 -0400

Mark piledriver compilable w/ clang.

commit 6cdb533472ee61af297c1f948307abbf45828887
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:12:12 2017 -0400

Mark bulldozer compilable w/ clang.

commit a85697d62272da06d28cd1c947f6cf1098df6467
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:06:59 2017 -0400

Correct error message.

commit e0c64cad271058688a2b999caf8c2767dc3aef7e
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:03:23 2017 -0400

Indeed one can compile for carrizo also using clang.

commit 4aafe0505d3f0954d095ded5459a76976e5093b4
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 21:50:49 2017 -0400

A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash

commit abaeaa68ea11e84be1810f564d6f38d506cbeb6a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 5 15:06:56 2017 -0500

Fixed a bug in norm1v, norm1m.

Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
and matrices. This is one of the few operations in BLIS that does not
have its own test module within the testsuite, hence why it went
undetected for so long. The bad 1-norms were being used to normalize
matrices in the testsuite after initialization, which led to some
matrices containing a combination of "large" and "small" values. This
tended to push the residuals computed after each test away from zero.
In some cases, they were off *just* enough for the testsuite to label
it a "failure". Many thanks to Jeff Hammond for reporting this bug.
(Wonky details: the bug was due to improperly-defined level-0 scalar
macros for abval2, an operation that computes the absolute square,
or complex magnitude/modulus. Certain complex domain instances of
abval2 were being incorrectly defined in terms of real-only solutions,
leading to bad results. This level-0 operation forms the basis of
norm1v/norm1m. absq2 was also affected, but almost nothing uses
this operation.)

commit cc3107ae1c2074f72b724aa748d2e5b4cb290ed5
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu May 4 10:35:22 2017 -0500

Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes 123.
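
The precedence rule in this commit can be sketched roughly as follows. This is an illustration only: the type, the function name, and the convention that a value <= 0 means "unset" are all assumptions, and how a global thread count is divided among the loops is deliberately not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; the field names are illustrative only. */
typedef struct {
    long jc, ic, jr, ir; /* ways of parallelism per loop */
    long total;          /* resulting total thread count */
    int  per_loop;       /* 1 if the per-loop values are in effect */
} sketch_threading_t;

/* Each argument is the parsed value of one variable, or a value <= 0
   if that variable is unset (a convention assumed by this sketch). */
void sketch_resolve_threading(long jc, long ic, long jr, long ir,
                              long num_threads, sketch_threading_t *out)
{
    assert(out != NULL);

    if (jc > 0 || ic > 0 || jr > 0 || ir > 0) {
        /* Any per-loop setting wins; loops left unset default to 1. */
        out->jc = jc > 0 ? jc : 1;
        out->ic = ic > 0 ? ic : 1;
        out->jr = jr > 0 ? jr : 1;
        out->ir = ir > 0 ? ir : 1;
        out->total = out->jc * out->ic * out->jr * out->ir;
        out->per_loop = 1;
    } else {
        /* Only the global count applies. Its division among the
           loops is not modeled, so the per-loop fields are left at 1. */
        out->jc = out->ic = out->jr = out->ir = 1;
        out->total = num_threads > 0 ? num_threads : 1;
        out->per_loop = 0;
    }
}
```

With jc = 2 and ir = 4 set and a global count of 16, this sketch yields 2 x 1 x 1 x 4 = 8 threads, and the global count is ignored.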

commit c8ab91f70d399ee14edd30a3a5c46b24c5d2f910
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 3 15:04:51 2017 -0500

Disable complex 3m/4m in testsuite by default.

Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
and 4m. This will improve testing runtime on Travis CI as well as for
anyone manually running the testsuite using default test parameters.
Thanks to Devin Matthews for suggesting this change.

commit 9700f0e5785007ddafb72a5ca83800dee61fd35c
Author: Jeff Hammond <jeff.sciencegmail.com>
Date: Tue May 2 19:25:21 2017 -0700

allow KNL build without hbwmalloc.h (i.e. emulated)

we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install hbwmalloc implementation on such
systems, it is easier not to, since obviously the performance of SDE
execution is not representative so there is no reason to emulate HBW
allocation.

commit 17dcd5a33ff91967f67e7c0ba09b4f18754609a4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 16:48:43 2017 -0500

Fixed stray parentheses in README citations.

commit 2910d44ff9e1d951d3249313f4ab39d18ea1b48d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 16:38:43 2017 -0500

CHANGELOG update (0.2.2)

commit 5ca3863220e07972fcefc6682ddd3f6e54fe4a94
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 15:48:30 2017 -0500

Fixed a trsm1m bug that affected right-side cases.

Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
was nondeterministic behavior (usually segmentation faults) for certain
problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
which explicitly directed the virtual gemm micro-kernel to use temporary
space if the storage preference of the [real domain] gemm ukernel did
not match the storage of the output matrix C. In the context of gemm,
this handling is not needed because agreement between the storage pref
and the matrix is guaranteed by a high-level optimization in BLIS.
However, this optimization is not applied to trsm because the storage
of C is not necessarily the same as the storage of the micro-panels of
B--both of which are updated by the micro-kernel during a trsm
operation. Thus, the guarantee of storage/preference agreement is not
in place for trsm, which means we must handle that case within the
virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
3m1, 4m1a, and 1m.

commit 1af0b09f5c275ee7bac896cc6f36f42af721d9b5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 12:09:39 2017 -0500

README.md update.

Details:
- Updated bibtex entries for 4th BLIS paper, and added entries for 5th
and 6th BLIS papers.

commit db4a0bb8ba7cd697d68be8e5632371ee3e59fd63
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 17 12:07:27 2017 -0500

Whitespace reformatting to armv8a kernels file.

Details:
- Updated formatting of function signature/header in
kernels/armv8a/3/bli_gemm_opt_4x4.c.

commit e3eb01f6b990e205b15edcbaffd3d54b3ddd1ca4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 21 15:33:39 2017 -0600

Disabled experiment-related 1m code.

Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
specifically inserted to facilitate the benchmarking of 1m block-panel
and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
reflect changes used/needed during benchmarking.

commit 4f61528d56eed6a139eeac9db0c44e56f2d2d136
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jan 25 16:25:46 2017 -0600

Added 1m-specific APIs for bp, pb gemm algorithms.

Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
bli_cntl_free() can check if the thread parameter is NULL, and if so,
call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
terms of bli_gemm1mxx_cntx_init(), which behaves the same as
bli_gemm1m_cntx_init() did before, except that an extra bool parameter
(is_pb) is used to support both bp and pb algorithms (including to
support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
when true, will toggle the boolean return value of routines such as
bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
causing BLIS to transpose the operation to achieve disagreement (rather
than agreement) between the storage of C and the micro-kernel output
preference. This disagreement is needed for panel-block implementations,
since they induce a transposition of the suboperation immediately before
the macro-kernel is called, which changes the apparent storage of C. For
now, anti-preference is used only with the pb algorithm for 1m (and not
with any other non-1m implementation).
- Defined new functions,
bli_cntx_l3_ukr_eff_prefers_storage_of()
bli_cntx_l3_ukr_eff_dislikes_storage_of()
bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
which are identical to their non-"eff" (effectively) counterparts except
that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
in terms of the existing block-panel macro-kernel _ker_var2(). This
technique requires inducing transposes on all operands and swapping
the A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
and updated all instantiations. Also updated the field names in the
cntx_t struct.
- Comment updates.
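
The anti-preference toggle described in this commit can be illustrated with a small sketch. The struct and function names below are hypothetical stand-ins, not the real cntx_t API; the point is only that the "eff" query inverts the plain query when the anti_pref field is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_ROW_STORED, SKETCH_COL_STORED } sketch_storage_t;

/* Hypothetical, heavily reduced stand-in for a context object. */
typedef struct {
    sketch_storage_t ukr_pref; /* storage the micro-kernel prefers for its output */
    bool anti_pref;            /* when true, invert the answer of the "eff" query */
} sketch_cntx_t;

/* Plain query: does the micro-kernel prefer the storage of C? */
bool sketch_ukr_prefers_storage_of(const sketch_cntx_t *cntx,
                                   sketch_storage_t c_storage)
{
    assert(cntx != NULL);
    return cntx->ukr_pref == c_storage;
}

/* "Effective" query: same answer, toggled when anti-preference is on. */
bool sketch_ukr_eff_prefers_storage_of(const sketch_cntx_t *cntx,
                                       sketch_storage_t c_storage)
{
    bool prefers = sketch_ukr_prefers_storage_of(cntx, c_storage);
    return cntx->anti_pref ? !prefers : prefers;
}
```

A caller that transposes the operation whenever the effective query returns false would therefore steer toward disagreement, rather than agreement, once anti_pref is set.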

commit 1d728ccb2394e77365e7c42683db6579c5fba014
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 25 18:29:49 2016 -0600

Implemented the 1m method.

Details:
- Implemented the 1m method for inducing complex domain matrix
multiplication. 1m support has been added to all level-3 operations,
including trsm, and is now the default induced method when native
complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
needed for the corresponding function for 1m (because 1m requires us
to choose between column-oriented or row-oriented execution, which
requires us to query the context for the storage preference of the
gemm microkernel, which requires knowing the datatype) but I decided
that it made sense for consistency to add the parameter to all other
cntx initialization functions as well, even though those functions
don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
a second scalar for each blocksize entry. The semantic meaning of the
two scalars now is that the first will scale the default blocksize
while the second will scale the maximum blocksize. This allows scaling
the two independently, and was needed to support 1m, which requires
scaling for a register blocksize but not the register storage
blocksize (i.e., "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
default and maximum blocksizes to some desired blocksize multiple.
These functions are needed in the updated definitions of
bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
certain circumstances (specifically, real domain beta and row- or
column-stored matrix C), the real domain macrokernel and microkernel
to be called directly, rather than using the virtual microkernel
via the complex domain macrokernel, which carries a slight additional
amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
some code in test_gemm.c driver.
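
The idea of scaling the default and maximum blocksizes independently can be sketched as below. The type and function are hypothetical, and treating the scalars as floating-point multipliers is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of blocksizes for one datatype. */
typedef struct {
    long def; /* default blocksize (e.g. a register blocksize) */
    long max; /* maximum blocksize (e.g. its "packdim" analogue) */
} sketch_blksz_pair_t;

/* Scale the two values independently. A scalar of 1.0 leaves the
   corresponding value untouched. */
void sketch_blksz_scale(sketch_blksz_pair_t *b,
                        double scale_def, double scale_max)
{
    assert(b != NULL);
    b->def = (long)(b->def * scale_def);
    b->max = (long)(b->max * scale_max);
}
```

For example, scaling a pair {8, 8} with scalars 0.5 and 1.0 gives {4, 8}: the register blocksize changes while the storage blocksize is preserved.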

commit 0d1b90286e29aa8b768e280b5286d92c02ad87a1
Author: Jeff Hammond <jeff.sciencegmail.com>
Date: Tue Oct 25 21:15:26 2016 -0700

never use libm with Intel compilers

Intel compilers include a highly optimized math library (libimf) that
should be used instead of GNU libm.

yes, this change is for ALL targets, including those that are not
supported by the Intel compiler. there is no harm in doing this, and it
is future-proof in the event that the Intel compilers support other
architectures.

commit b150870397e7aee558e61d1bd72a0c0d1d99bee8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Dec 8 16:08:41 2017 -0600

Removed most "old" directories.

Details:
- Removed the vast majority of directories named "old", which contained
deprecated code that I wasn't quite ready to jettison from the source
tree.

commit 270c65985df849297ba1951aa3b56c03948d7775
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Dec 8 15:21:18 2017 -0600

Modified bli_getopt() for thread-safety.

Details:
- Changed the interface of bli_getopt() to take a new argument, a getopt_t
struct, that stores the values of optarg, optind, opterr, and optopt,
and updated the implementation accordingly. (Previously, these
variables were assumed to be global.)
- Added a function for initializing a getopt_t struct.
- Changed test_libblis.c--currently the only consumer of bli_getopt()--to
utilize the new getopt_t state object.
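
To illustrate why moving the parser state into a caller-owned struct helps with thread safety, here is a deliberately simplified sketch. The names sketch_getopt_t, sketch_getopt_init(), and sketch_getopt() are hypothetical, and the parser handles only "-x" and "-x value" forms; it is not the bli_getopt() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-caller state; fields mirror the classic getopt() globals. */
typedef struct {
    const char *optarg; /* argument of the most recent option, or NULL */
    int optind;         /* index of the next argv element to examine */
    int opterr;         /* error-reporting switch (unused in this sketch) */
    int optopt;         /* option character that caused the last error */
} sketch_getopt_t;

void sketch_getopt_init(sketch_getopt_t *st)
{
    assert(st != NULL);
    st->optarg = NULL;
    st->optind = 1; /* skip argv[0], the program name */
    st->opterr = 0;
    st->optopt = 0;
}

/* Returns the option character, '?' on error, or -1 when no options
   remain. Grouped flags ("-ab") and attached values ("-xval") are
   treated as non-options by this sketch. */
int sketch_getopt(int argc, char *const argv[], const char *optstring,
                  sketch_getopt_t *st)
{
    assert(st != NULL && optstring != NULL);
    st->optarg = NULL;

    if (st->optind >= argc) return -1;
    const char *arg = argv[st->optind];
    if (arg[0] != '-' || arg[1] == '\0' || arg[2] != '\0') return -1;

    int c = (unsigned char)arg[1];
    const char *spec = (c == ':') ? NULL : strchr(optstring, c);
    st->optind++;

    if (spec == NULL) { st->optopt = c; return '?'; }
    if (spec[1] == ':') {
        /* The option requires a value in the next argv element. */
        if (st->optind >= argc) { st->optopt = c; return '?'; }
        st->optarg = argv[st->optind++];
    }
    return c;
}
```

Because every call reads and writes only the struct it is given, two threads (or two interleaved parses in one thread) can each keep their own sketch_getopt_t without interfering with one another.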

commit ce4d8fabc2e39371f89c12192fb707be82ae021a
Merge: 39be59f2 e05a8dfa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 7 17:36:44 2017 -0600

Merge branch 'master' of github.com:flame/blis

commit 39be59f2a8470f40475907d9dd52639b8a911a92
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 7 17:35:20 2017 -0600

Replaced several macros with static function APIs.

Details:
- Reimplemented several sets of get/set-style preprocessor macros with
static functions, including those in the following frame/base headers:
auxinfo, cntl, mbool, mem, membrk, opid, and pool. A few headers in
frame/thread were touched as well: mutex_*, thrcomm, and thrinfo.

commit e05a8dfa7cc7df41e966c1ad04e51c482b308b23
Merge: 79507337 4423e33d
Author: dnp <devangiparikhgmail.com>
Date: Wed Dec 6 16:45:24 2017 -0600

Merge branch 'rt'

commit 4423e33dc593115cda92c5763d756d7ad1298aa9
Author: dnp <devangiparikhgmail.com>
Date: Wed Dec 6 16:35:03 2017 -0600

Adding SKX kernels and configuration.

commit 79507337e140daec7639f6eb3ed9cfe6e123d342
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 6 16:21:35 2017 -0600

Various checks to ensure that arch_t id is in range.

Details:
- Expanded checking of the arch_t id in bli_gks.c--either passed in from
the caller or as returned from bli_arch_query_id()--against the expected
range of id values. Thanks to Devangi Parikh for suggesting these
additional sanity checks.

commit fde7c1126c58373ecde83471890b257399144876
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 4 16:11:01 2017 -0600

Added 'uninstall-old-headers' target to Makefile.

Details:
- Defined a new 'uninstall-old-headers' target that allows users of BLIS to
uninstall no-longer-needed headers left over from previous installations.
- Fixed the 'uninstall-old' target so that it will uninstall both .a and .so
libraries.
- Renamed 'uninstall-old' to 'uninstall-old-libs'.
- Added 'uninstall-old' target (different from previous 'uninstall-old'
target) that combines 'uninstall-old-libs' and 'uninstall-old-headers'.

commit d4ee770bde213a87aa6049245145318324dc6b51
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 4 14:53:43 2017 -0600

Create/install monolithic cblas.h.

Details:
- When CBLAS is enabled at configure-time, BLIS now creates a monolithic
cblas.h using the same flatten-header.sh script that was recently
introduced for creating monolithic blis.h header files. The top-level
Makefile will also install this cblas.h file into the install prefix
alongside blis.h when the 'install' target is invoked. The two header
files are compatible with one another. Regardless whether the user's
source includes cblas.h, both blis.h and cblas.h, or just blis.h,
the user will get the CBLAS function prototypes and enums, as expected.

commit 52f9e6f1b6468785af8947317656445d4729fc8b
Merge: ab57b979 21360dd8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Dec 1 12:28:09 2017 -0600

Merge branch 'rt'

commit 21360dd8e2c7287100645e109acaabcc6ba1140c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 29 14:11:34 2017 -0600

Fixed cntx_t packm query when ker_id > _NUM_PACKM_KERS.

Details:
- Fixed a subtle bug in bli_cntx_get_[un]packm_ker_dt() in which the
function fails to return NULL when passed a kernel id argument that is
equal to or beyond BLIS_NUM_[UN]PACKM_KERS. Instead, the function was
attempting to index into the cntx_t's packm kernel array, which resulted
in undefined behavior. Thanks to Devangi Parikh for finding this bug.
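
The class of bug fixed here can be illustrated with a small bounds-checked lookup. Everything in this sketch (the table size, the types, and the function names) is hypothetical; it only shows the pattern of returning NULL for an id equal to or beyond the table size instead of indexing the array.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sketch_ker_fp)(void);

enum { SKETCH_NUM_PACKM_KERS = 4 }; /* hypothetical table size */

/* Hypothetical, heavily reduced stand-in for a context object. */
typedef struct {
    sketch_ker_fp packm_kers[SKETCH_NUM_PACKM_KERS];
} sketch_packm_cntx_t;

/* Placeholder kernel so the table has something to point at. */
void sketch_noop_ker(void) { }

sketch_ker_fp sketch_get_packm_ker(const sketch_packm_cntx_t *cntx,
                                   unsigned int ker_id)
{
    assert(cntx != NULL);

    /* An id equal to or beyond the table size has no entry, so report
       that with NULL rather than reading past the end of the array. */
    if (ker_id >= SKETCH_NUM_PACKM_KERS) return NULL;

    return cntx->packm_kers[ker_id];
}
```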

commit 244a6f4e66e8ff091e995f8090ce779c1928aa8b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 28 17:48:48 2017 -0600

Fixed POSIX sed non-compliance in flatten-header.sh.

Details:
- Changed GNU usage of 'i' and 'a' sed commands used in flatten-header.sh
to POSIX-compliant usage that will work on OS X's sed.

commit 45078621676833e53a2878af8f89479c4f93b8ab
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 28 15:16:22 2017 -0600

Generate/compile with/install monolithic blis.h.

Details:
- Rewrote monolithify-header.sh (and renamed to flatten-header.sh) so that
headers are inserted recursively. This improves performance by a factor
of 3-4x.
- Modified configure to create an 'include/<configname>' directory in which
make can create a monolithic header.
- Modified the top-level Makefile so that a monolithic header is generated
unconditionally prior to compilation (stored in include/<configname>) and
so that the single header is installed instead of the 450 or so header
files that reside throughout the framework source tree.
- Added "include/*/*.h" to .gitignore file.
- Removed some pnacl/emscripten leftovers that I intended to include in
a1caeba (mostly in testsuite/Makefile).
- Trivial comment changes to frame/include/bli_f2c.h.

commit 1f30b1301bf6d6047ec29e57a5fde8eb1072a0ee
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 25 16:54:26 2017 -0600

Added missing framework support for x86_64 family.

Details:
- Added support for the x86_64 configuration family to bli_arch.c and
bli_arch_config.h. Thanks to Johannes Dieterich for reporting this
issue.
- Bumped the default value for BLIS_SIMD_NUM_REGISTERS from 16 to 32 and
the default value for BLIS_SIMD_SIZE from 32 to 64. This will support
configuration families that include Skylake and newer processors without
any support needed in the bli_family_*.h file. The semantics of these
values have always been "maximum" and not exact values; comments in
bli_kernel_macro_defs.h and the github wiki have been adjusted
accordingly.

commit 9f39806c4ed484c9ed13edf96005838d977722a9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 21 16:03:56 2017 -0600

Fixed a bug in e31f0b3/b131b9a.

Details:
- Erroneously placed the "don't overwrite existing blocksize" logic in
bli_blksz_init*() rather than in bli_cntx_set_blkszs(). It belongs in
the latter because that function copies blocksizes as-is from the
blksz_t function argument to the appropriate field in the cntx_t. If
the blksz_t was previously initialized selectively, based on the sign
of the blocksize value passed into bli_blksz_init*(), that just leaves
some fields possibly uninitialized (with garbage values), which
definitely will not work.
- The aforementioned logic has been moved to bli_cntx_set_blkszs() via
a new function bli_blksz_copy_if_pos(), which selectively copies only
the blocksizes that are greater than zero.
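
A selective copy of this kind might look like the following sketch. The struct layout (one value per datatype) and the function name are assumptions for illustration and do not reproduce blksz_t or bli_blksz_copy_if_pos().

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_NUM_DT = 4 }; /* e.g. one slot each for s, d, c, z */

/* Hypothetical blocksize object: one value per datatype. */
typedef struct {
    long v[SKETCH_NUM_DT];
} sketch_blksz_t;

/* Copy only the positive entries of src into dst. Entries that are
   zero or negative in src leave the existing dst value in place. */
void sketch_blksz_copy_if_pos(const sketch_blksz_t *src, sketch_blksz_t *dst)
{
    assert(src != NULL && dst != NULL);

    for (int i = 0; i < SKETCH_NUM_DT; ++i) {
        if (src->v[i] > 0) dst->v[i] = src->v[i];
    }
}
```

Doing the filtering at copy time means the source object is always fully initialized, and the destination keeps its earlier defaults wherever the caller passed 0.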

commit b131b9a025c15f548d4c2952a9ec85eee3d139b1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 21 14:30:26 2017 -0600

Updated configs to omit setting some blocksizes.

Details:
- Employ the new semantics of bli_blksz_init*() in e31f0b3 in various
sub-configurations' bli_cntx_init_*() functions by passing in 0 for
register and cache blocksizes that correspond to gemm microkernel
datatypes that were not registered, allowing the default values
set by the bli_cntx_init_*_ref() function call to remain.

commit 499a4c002f895744ecaf81ef7f62d2d6d0d7d594
Merge: e31f0b3e 6c3ba502
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 21 14:25:08 2017 -0600

Merge branch 'rt' of github.com:flame/blis into rt

commit e31f0b3e2dba19ca8a2946bc21beb136a42d0f57
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 21 14:21:25 2017 -0600

Subtle update to bli_blksz_init*() API.

Details:
- Updated the semantics of bli_blksz_init() and bli_blksz_init_ed() so
that non-positive blocksize values are ignored entirely. This provides
an easy way to indicate that certain existing values should not be
touched by the update. Thanks to Devangi Parikh for feedback that led
to these changes.

commit 6c3ba502a11f87bc67555d26154cfd39d0af1bac
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 21 13:50:53 2017 -0600

Added 'x86_64' sub-config directory.

Details:
- Added missing x86_64 configuration directory, which was intended to be
part of b7ca580.
- Added -Wfatal-errors compiler warning flag to all configurations so that
compilation stops after the first error.
- Changed the vectorization flags for intel64 configuration to be compatible
with 'penryn', the oldest sub-config included in that family.
- Changed the vectorization flags for penryn to target the 'core2'
microarchitecture and ssse3.

commit 25eee3cc49b0631812485d4d5ceef0c23ed1b6dd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 21 12:34:20 2017 -0600

Added a dummy file to kernels/generic.

Details:
- Added a dummy file to kernels/generic, which was previously empty, so
that git would begin tracking the otherwise-empty directory. This
directory's existence is necessary for proper execution of configure
for any configuration family that contains the 'generic'
sub-configuration. Thanks to Johannes Dieterich for reporting the
issue that led to this fix.

commit ef024ce4cafa217669eaabb31ff8ab6df93cca05
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 20 18:08:29 2017 -0600

More tweaks to monolithify-header.sh

Details:
- Further fixes to the monolithify-header.sh script.
- Removed unnecessary include "blis.h" from frame/3/bli_l3_packm.h.

commit 5028e7dec269b62895511453272585da36e591b5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 20 17:00:37 2017 -0600

Second attempt to implement travis_wait.

Details:
- Corrected accidental misplacement of the travis_wait prefix (on the
wrong line of the .travis.yml file) in commit 13e5d91.

commit 13e5d9107b3763cba46fb1bae87476852601b47c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 20 15:57:06 2017 -0600

Added travis_wait prefix to testsuite via Travis.

Details:
- It appears that Travis CI has implemented a new policy that results in
a test failing if it does not produce any output for more than 10
minutes. (Two test instances are now failing in Travis despite the most
recent commit not affecting the library or testsuite.) This issue can
be worked around by executing the test run via travis_wait, which takes
an optional time parameter. This commit attempts to use 'travis_wait 30'
in the .travis.yml file to prevent the early failure at 10 minutes.

commit a1caeba0ea79c8fecb1abadca1f91c6367ab3afb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 20 13:31:20 2017 -0600

Removed pnacl, emscripten support from Makefile.

commit 78199c539beaa50f37893add220261ce0dcb921a
Merge: b3d8ab2e ab57b979
Author: praveeng <praveen.gamd.com>
Date: Mon Nov 20 15:51:20 2017 +0530

Merge master code till 01-Nov-2017 to amd-staging

Change-Id: I40b53f876db84c8b947b3f2385c9b882245c6603

commit 9df6dda9ec51a0d40166169d2d8a2f84b42266e6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 18 19:03:26 2017 -0600

Improvements, bugfixes to monolithify-header.sh.

commit 21d26201f90b884eb8d5de279ed74bbd244ffcb5
Merge: 43baa3b3 b7ca5806
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 18 14:16:53 2017 -0600

Merge branch 'rt' of github.com:flame/blis into rt

commit 43baa3b327d5ae1e2ba619432687b4dd849b05e3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 18 14:14:44 2017 -0600

Removed unnecessary flags for generic config.

Details:
- Removed -D_POSIX_C_SOURCE=200112L and -m64 flags from make_defs.mk file
of generic sub-configuration. These flags are generally not necessary,
and particularly not desirable for the generic configuration since they
unnecessarily restrict the environments in which the configuration can
be built.

commit b7ca580618f9382b7982168fd035ed058f83e4c2
Author: iotamudelta <dieterichogolem.org>
Date: Sat Nov 18 14:56:05 2017 -0500

[WIP] Add x86 and x86_64 processor families. (154)

* Add x86 and x86_64 processor families.
* Use generic config as fallback for more families.

After discussion with fgvanzee, a) it's "generic" and b) use it for all the families as a fallback. The goal is that if a specific CPU is not yet supported by a family (say a new Intel microarchitecture on x86_64), it'll fall through to still work with the slower "generic" kernels.

commit 870597d1663aaba1b74d7654b1d4946280aa0d3f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 17 17:06:42 2017 -0600

Added bash script for creating monolithic headers.

Details:
- Added a new script, monolithify-header.sh, to the 'build' directory.
This script recursively replaces all include directives in a selected
file with the contents of the header files referenced by each directive.
The idea is to "flatten" a tree of .h files into a single file, with
the script acting as a C preprocessor that only processes include
directives.

commit c76f77f4cc1e71988251c5e63cf6ef137477bf9c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 17 15:10:52 2017 -0600

Removed unnecessary include "blis.h" from header.

Details:
- Removed an errant include "blis.h" directive from bli_cntx_ind_stage.h.
The general policy is that no header file in BLIS should include
blis.h. This will be important in the near future when using a tool to
recursively create a monolithic blis.h file from its constituent
headers.

commit 2bb9bc6e9536fa239fbc19a7efaaf151116e15b4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 17 13:50:14 2017 -0600

Miscellaneous tweaks to gks, rt functionality.

Details:
- Updated bli_cpuid_query_id() so that BLIS_ARCH_GENERIC is always returned
if the hardware fails to test positive for any supported sub-configuration.
- Defined bli_gks_init_ref_cntx(), which will call the context initialization
function bli_cntx_init_configname() for the sub-configuration 'configname'
associated with the arch_t id returned by bli_arch_query_id(). This makes
initializing a reference context easy for experts who wish to construct
those contexts.
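
The "fall back to generic" behavior can be sketched as a probe loop with a default. The enum values, the probe struct, and the function names here are hypothetical; a real implementation would run hardware checks such as cpuid in place of the stand-in probes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SKETCH_ARCH_HASWELL,
    SKETCH_ARCH_ZEN,
    SKETCH_ARCH_GENERIC
} sketch_arch_t;

/* One candidate sub-configuration and the test that detects it. */
typedef struct {
    sketch_arch_t id;
    bool (*is_supported)(void);
} sketch_probe_t;

/* Stand-in probes used for illustration only. */
bool sketch_probe_no(void)  { return false; }
bool sketch_probe_yes(void) { return true; }

/* Return the first sub-configuration whose probe succeeds. If no
   probe tests positive, always return the generic id. */
sketch_arch_t sketch_query_id(const sketch_probe_t *probes, size_t n)
{
    assert(probes != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        if (probes[i].is_supported()) return probes[i].id;
    }
    return SKETCH_ARCH_GENERIC;
}
```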

commit b3d8ab2ea02c127ab241532abc214624f35bfaab
Merge: 189ffbb0 fe71c06e
Author: Santanu Thangaraj <Santanu.Thangarajamd.com>
Date: Wed Nov 15 01:33:12 2017 -0500

Merge "Added AMD copyright line to the changed files in last 3 commits" into amd-staging

commit fe71c06e42b072407c83112779055b0afb67173d
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Wed Nov 15 11:11:17 2017 +0530

Added AMD copyright line to the changed files in last 3 commits

Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66

commit d5bf79e50bf97072bbe7117c86b7c45e6e707ea0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 13 14:24:29 2017 -0600

Miscellaneous tweaks and fixes.

Details:
- Fixed incorrect calling sequence in bli_cntx_init_knl.c--an instance of
bli_blksz_init_easy() that should have been bli_blksz_init().
- Fixed a bug in code that is supposed to output the list of sub-directories
in the 'config' directory when configure script is run with no arguments.
- Expanded the output of "make showconfig" to include more info from config.mk.
- Minor changes to build/auto-detect/cpuid_x86.c, mostly in preparation for
someone to add excavator and zen support.
- Added a link to the ConfigurationHowTo wiki to config_registry.
- Other minor tweaks to configure.

commit 673e5184030532c4ebd9fdeecbaa6442bb3ad54f
Merge: 2c51356a 8f150f28
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 1 17:37:42 2017 -0500

Merge branch 'rt' of github.com:flame/blis into rt

commit 2c51356a8b2699c99f9507c80d69c08a35d45fe3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 1 17:37:02 2017 -0500

Implemented runtime hardware detection via cpuid.

Details:
- Added runtime support for selecting an appropriate arch_t value based
on the results of the cpuid instruction (for x86_64). This allows
deferral of choosing a context (kernels, blocksizes, etc.) until
runtime, which allows BLIS to be built with support for multiple
microarchitectures. Currently, only amd64 and intel64 configurations
are registered in the config_registry; however, one could create
custom configuration families to support arbitrary sets of x86_64
microarchitectures.
- Current Intel microarchitectures supported via cpuid are knl, haswell,
sandybridge, and penryn.
- Current AMD microarchitectures supported via cpuid are: zen, excavator,
steamroller, piledriver, and bulldozer.

commit ab57b979046479bcda7f83165838a80117c2ad95
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 1 11:51:41 2017 -0500

Revert to default SIMD alignment for bulldozer.

Details:
- Removed the default-overriding define of BLIS_SIMD_ALIGN_SIZE set in
config/bulldozer/bli_kernel.h. Not sure where this value came from, but
it would seem to allow for insufficient starting address alignment for
any matrices created via bli_malloc_user(), such as via
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
in 8f150f2.

commit 8f150f28a678c4a0c1591400177ad7cca81fcaec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 1 11:41:45 2017 -0500

Revert to default SIMD alignment for bulldozer.

Details:
- Removed the default-overriding define of BLIS_SIMD_ALIGN_SIZE set in
bli_family_bulldozer.h. Not sure where this value came from, but it
would seem to allow for insufficient starting address alignment for
any matrices created via bli_malloc_user(), such as via
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
led us to this bug.

commit e3f10557caf114441fbfff990e3ce3576c177bdc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 30 13:37:54 2017 -0500

Use perl for some substitution for OS X compatibility.

Details:
- Discovered that sed commands where the replacement string contains '\n'
are problematic with the version of sed present in OS X. For these cases
in the configure script, we instead use 'perl -pe' for
search-and-replace functionality.
- Various other minor comment/whitespace tweaks to configure.
- Removed remaining lines of code related to setting/checking variables to
track "unregistered" configurations.

commit dd45cfdfc3d8f9acf4cf7f69138d9b83dafc8842
Merge: 3e4f42a4 f60c827b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 30 12:23:05 2017 -0500

Merge branch 'master' into rt

commit f60c827ba95f452c8454fb914f5564f4895bf644
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Oct 30 10:04:42 2017 -0500

Fix CVECFLAGS for bulldozer config.

commit 3e4f42a4d2ebb37b95988933d92e561c5b2cc201
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 27 11:41:37 2017 -0500

Typecast l1mkr_t enum value prior to comparison.

Details:
- Typecast l1mkr_t enum value in bli_cntx.h to guint_t before testing for
out-of-range value. This is an attempt to pacify a strange warning from
clang on OS X that is seemingly the result of the following compiler
warning flag:
-Wtautological-constant-out-of-range-compare
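
A minimal sketch of the typecast described above, assuming a hypothetical enum ex_kerid_t in place of l1mkr_t and a plain unsigned int in place of guint_t. Casting the enum value to an unsigned integer before the range test makes the comparison an ordinary integer comparison, which is one way to avoid a compiler concluding that the test is always true for the enum type.

```c
/* Hypothetical stand-in for a kernel-id enum such as l1mkr_t. */
typedef enum
{
    EX_KER_A = 0,
    EX_KER_B,
    EX_KER_C
} ex_kerid_t;

#define EX_NUM_KERS 3

/* Returns 1 if id names a valid kernel slot, 0 otherwise.
   The cast to unsigned int lets a single comparison reject both
   too-large values and values that would be negative as a signed int. */
int ex_kerid_is_valid(ex_kerid_t id)
{
    return (unsigned int)id < EX_NUM_KERS;
}
```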

commit aec6e038d942d35b81bbd723a640cce2c054fb8e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 26 16:12:36 2017 -0500

Removed associative arrays from configure.

Details:
- Implemented a replacement for associative arrays in the configure script
that does not utilize arrays, and therefore works in pre-4.0 versions of
bash. (It appears that Mac OS X will be stuck with version 3.2 indefinitely
due to bash switching to the GPL 3.0 license starting with version 4.0.)

commit 189ffbb0d37262b21acddc0d35b4a22f2cbbca94
Merge: 06e0e635 3eb44f67
Author: Santanu Thangaraj <Santanu.Thangarajamd.com>
Date: Wed Oct 25 02:00:30 2017 -0400

Merge changes Ie115b206,I7ce6cfa2,Iff59b6f4 into amd-staging

* changes:
Adding __attribute__((constructor/destructor)) for CLANG case.
Thread Safety: Move bli_init() before and bli_finalize() after main()
Thread safety: Make the global induced method status array local to thread

commit 3eb44f67618b91ae5f5f0aaaba67e38f16042ee4
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Tue Oct 24 16:36:36 2017 +0530

Adding __attribute__((constructor/destructor)) for CLANG case.

CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.

Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b

commit 07c352188bf5265af242255f8e6fcb97050d973d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 23 16:59:22 2017 -0500

Added "generic" configuration.

Details:
- Added a "generic" configuration that leaves the default blocksizes and
kernels unchanged. This replaces the older "reference" configuration.
Updated auto-detect script and code accordingly.
- Added support for generic configuration to arch_t (bli_type_defs.h),
bli_gks_init() (bli_gks.c), and bli_arch_config.h
- Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
- Whitespace changes to configurations' make_defs.mk files.

commit c1a98d6f70608b02a1e6bcad6ba020a60773dace
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 23 14:24:41 2017 -0500

Minor update to .travis.yml file.

commit 75b9383f01caa8b83f8be0117e15085b0d807ba6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 20 16:41:22 2017 -0500

Minor header renaming ahead of bli_arch.c.

Details:
- Renamed the various configurations' "bli_arch_<configname>.h" header files
(replacing "arch" with "family") to free up the 'bli_arch' namespace for a
different purpose (hardware detection).
- Renamed "bli_arch.h" and "bli_arch_pre_macro_defs.h" in frame/include to
"bli_arch_config.h" and "bli_arch_config_pre.h", respectively.

commit 482af51add26d5ed103c3e3f167657f273b32c7a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 20 15:44:26 2017 -0500

Fixed 'make test' target from top-level Makefile.

Details:
- Updated the top-level Makefile's build rule for testsuite object files to
properly obtain CFLAGS via get-frame-cflags-for() function instead of
simply using the $(CFLAGS) variable (which is empty). This means that
'make test' should now work as expected.

commit 3c269f700d207efe6c04193f09d519c88c1d4045
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 20 13:57:21 2017 -0500

Makefile updates for test drivers, testsuite.

Details:
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
sub-directories.
- Factored out much of the top-level Makefile into common.mk. A Makefile
needs only set DIST_PATH to the relative path to the top level of the
BLIS source distribution before including common.mk in order to acquire
all of the definitions typically needed in a Makefile that tests BLIS.

commit 0557189d463446b4c32077cdcf0467fa71ca68dc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 18 15:05:27 2017 -0500

Minor updates to .travis.yml, configure script.

commit 2553734d1d62043793f4e783a027349ef6d4d563
Merge: 453deb29 37534279
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 18 13:46:50 2017 -0500

Merge branch 'master' into rt

commit 375342799cbae981c28d831793af588d7951f3f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 18 13:41:25 2017 -0500

Removed a duplicate bli_avx512_macros.h header.

Details:
- Removed a duplicate header file that was causing problems during
installation for the 'knl' configuration. Thanks to Victor Eijkhout
for reporting this issue.

commit 453deb29068889698e274f269c9aa90eea99b527
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 18 13:29:32 2017 -0500

Implemented runtime kernel management.

Details:
- Reworked the build system around a configuration registry file, named
'config_registry', that identifies valid configuration targets, their
constituent sub-configurations, and the kernel sets that are needed by
those sub-configurations. The build system now facilitates the building
of a single library that can contain kernels and cache/register
blocksizes for multiple configurations (microarchitectures). Reference
kernels are also built on a per-configuration basis.
- Updated the Makefile to use new variables set by configure via the
config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
in determining which sub-configurations (CONFIG_LIST) and kernel sets
(KERNEL_LIST) are included in the library, and which make_defs.mk files'
CFLAGS (KCONFIG_MAP) are used when compiling kernels.
- Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
functions into a standard format that includes the kernel set name
(e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
kernels sub-directory. These files exist to provide prototypes for the
kernels present in those directories.
- Reorganized reference kernels into a top-level 'ref_kernels' directory.
This directory includes a new source file, bli_cntx_ref.c (compiled on
a per-configuration basis), that defines the code needed to initialize
a reference context and a context for induced methods for the
microarchitecture in question.
- Rewrote make_defs.mk files in each configuration so that the compiler
variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
basis.
- Modified bli_config.h.in template so that bli_config.h is generated with
defines for the config (family) name, the sub-configurations that are
associated with the family, and the kernel sets needed by those
sub-configurations.
- Deprecated all kernel-related information in bli_kernel.h and transferred
what remains to new header files named "bli_arch_<configname>.h", which
are conditionally included from a new header bli_arch.h. These files
are still needed to set library-wide parameters such as custom
malloc()/free() functions or SIMD alignment values.
- Added bli_cntx_init_<configname>.c files to each configuration directory.
The files contain a function, named the same as the file, that initializes
a "native" context for a particular configuration (microarchitecture). The
idea is that optimized kernels, if available, will be initialized into
these contexts. Other fields will retain pointers to reference functions,
which will be compiled on a per-configuration basis. These bli_cntx_init_*()
functions will be called during the initialization of the global kernel
structure. They are thought of as initializing for "native" execution, but
they also form the basis for contexts that use induced methods. These
functions are prototyped, along with their _ref() and _ind() brethren, by
prototype-generating macros in bli_arch.h.
- Added a new typedef enum in bli_type_defs.h to define an arch_t, which
identifies the various sub-configurations.
- Redesigned the global kernel structure (gks) around a 2D array of cntx_t
structures (pointers to cntx_t, actually). The first dimension is indexed
over arch_t and the inner dimension is the ind_t (induced method) for
each microarchitecture. When a microarchitecture (configuration) is
"registered" at init-time, the inner array for that configuration in the
2D array is initialized (and allocated, if it hasn't been already). The
cntx_t slot for BLIS_NAT is initialized immediately and those for other
induced method types are initialized and cached on-demand, as needed. At
cntx_t registration, we also store function pointers to cntx_init functions
that will initialize (a) "reference" contexts and (b) contexts for use with
induced methods. We don't cache the full contexts for reference contexts
since they are rarely needed. The functions that initialize these two kinds
of contexts are generated automatically for each targeted sub-configuration
from cpp-templatized code at compile-time. Induced method contexts that
need "stage" adjustments can still obtain them via functions in
bli_cntx_ind_stage.c.
- Added new functions and functionality to bli_cntx.c, such as for setting
the level-1f, level-1v, and packm kernels, and for converting a native
context into one for executing an induced method.
- Moved the checking of register/cache blocksize consistency from being cpp
macros in bli_kernel_macro_defs.h to being runtime checks defined in
bli_check.c and called from bli_gks_register_cntx() at the time that the
global kernel structure's internal context is initialized for a given
microarchitecture/configuration.
- Deprecated all of the old per-operation bli_*_cntx.c files and removed
the previous operation-level cntx_t_init()/_finalize() invocations.
Instead, we now query the gks for a suitable context, usually via
bli_gks_query_cntx().
- Deprecated support for the 3m2 and 3m3 induced methods. (They required
hackery that I was no longer willing to support.)
- Consolidated the 1e and 1r packm kernels for any given register blocksize
into a single kernel that will branch on the schema and support packing
to both formats.
- Added the cntx_t* argument to all packm kernel signatures.
- Deprecated the local function pointer array in all bli_packm_cxk*.c files
and instead obtain the packm kernel from the cntx_t.
- Added bli_calloc_intl(), which serves as the calloc-equivalent to
bli_malloc_intl(). Useful when we wish to allocate and initialize to
zero/NULL.
- Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
bli_cntx.h into static functions.
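
The redesigned global kernel structure described above (a 2D array of context pointers, with the native slot filled at registration and the induced-method slots filled on demand) can be sketched roughly as follows. This is not the BLIS implementation: every name prefixed with ex_ or EX_ is hypothetical, and the context struct is reduced to two fields for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for arch_t and ind_t. */
typedef enum { EX_ARCH_HASWELL = 0, EX_ARCH_ZEN, EX_ARCH_GENERIC, EX_NUM_ARCHS } ex_arch_t;
typedef enum { EX_IND_3M1 = 0, EX_IND_4M1, EX_NAT, EX_NUM_INDS } ex_ind_t;

/* Greatly simplified context: only records what it was built for. */
typedef struct { ex_arch_t arch; ex_ind_t method; } ex_cntx_t;

/* Outer dimension: architecture. Inner dimension: induced method.
   ex_gks[arch] stays NULL until that architecture is registered. */
ex_cntx_t **ex_gks[EX_NUM_ARCHS];

static ex_cntx_t *ex_cntx_create(ex_arch_t arch, ex_ind_t method)
{
    ex_cntx_t *c = malloc(sizeof(*c));
    if (c != NULL) { c->arch = arch; c->method = method; }
    return c;
}

/* Registers an architecture: allocates the inner array if needed and
   initializes the native slot immediately. Returns 0 on success. */
int ex_gks_register(ex_arch_t arch)
{
    if (ex_gks[arch] == NULL)
    {
        ex_gks[arch] = calloc(EX_NUM_INDS, sizeof(ex_cntx_t *));
        if (ex_gks[arch] == NULL) return -1;
    }
    if (ex_gks[arch][EX_NAT] == NULL)
        ex_gks[arch][EX_NAT] = ex_cntx_create(arch, EX_NAT);
    return ex_gks[arch][EX_NAT] != NULL ? 0 : -1;
}

/* Queries a context. Induced-method contexts are created on first use
   and cached for later queries. Returns NULL for unregistered archs. */
ex_cntx_t *ex_gks_query(ex_arch_t arch, ex_ind_t ind)
{
    if (ex_gks[arch] == NULL) return NULL;
    if (ex_gks[arch][ind] == NULL)
        ex_gks[arch][ind] = ex_cntx_create(arch, ind);
    return ex_gks[arch][ind];
}
```

The sketch omits thread safety and cleanup; it only shows the register-then-lazily-cache shape that the commit message describes.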

commit 4607aac297e55ad540cbe5fffbe02e6b1889c181
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Oct 16 22:06:57 2017 +0530

Thread Safety: Move bli_init() before and bli_finalize() after main()

BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS, while other threads
in the application are still using BLIS.

This issue can be solved by removing bli_finalize() from API.
One way to do this is by getting bli_finalize() to execute by default
after application exits from main().

GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that needs to be executed after main exits.

Similarly bli_init() can be made to run before application enters main()
so that application need not call it.

Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
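
The mechanism this commit relies on can be sketched as below, assuming a GCC- or Clang-compatible compiler. The names ex_lib_init(), ex_lib_finalize(), and ex_lib_is_initialized() are hypothetical and stand in for a library's real init/finalize routines.

```c
/* Tracks whether the (hypothetical) library state is set up. */
static int ex_lib_initialized = 0;

/* Runs automatically before main() is entered. */
__attribute__((constructor))
static void ex_lib_init(void)
{
    ex_lib_initialized = 1;
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void ex_lib_finalize(void)
{
    ex_lib_initialized = 0;
}

/* Lets application code observe the state without calling init itself. */
int ex_lib_is_initialized(void)
{
    return ex_lib_initialized;
}
```

With this arrangement, application code calling ex_lib_is_initialized() from main() would see the library already initialized, and no application thread has a reason to call the finalize routine while others are still working.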

commit 0f5ce26fc597cda6e8ae93a7526f52eb8cba01e9
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Oct 16 21:07:50 2017 +0530

Thread safety: Make the global induced method status array local to thread

BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.

This patch solves this issue by making the induced method status array
local to threads.

Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
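
A minimal sketch of a thread-local status array, assuming C11 _Thread_local storage. The array dimensions and the ex_ind_*() accessor names are hypothetical; the commit message does not say which storage specifier or layout the patch uses.

```c
#include <stdbool.h>

#define EX_NUM_METHODS 3   /* assumed number of induced methods */
#define EX_NUM_OPS     2   /* assumed number of operations */

/* Each thread gets its own zero-initialized copy of this array, so a
   change made by one thread is not visible to any other thread. */
static _Thread_local bool ex_ind_enabled[EX_NUM_METHODS][EX_NUM_OPS];

void ex_ind_enable(int method, int op)
{
    ex_ind_enabled[method][op] = true;
}

void ex_ind_disable(int method, int op)
{
    ex_ind_enabled[method][op] = false;
}

bool ex_ind_is_enabled(int method, int op)
{
    return ex_ind_enabled[method][op];
}
```

The trade-off of this design is that a setting made in one thread no longer propagates to the others; each thread that wants a non-default induced method has to enable it for itself.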

commit b882648af87deb1b365fc6b3e94151e69c5ccfa4
Merge: 8b379069 e02d3cb8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 11 16:32:21 2017 -0500

Merge branch 'master' into rt

commit 06e0e6351acb9481225975ad9a4e0b8925336621
Author: sthangar <Santanu.Thangarajamd.com>
Date: Thu Sep 28 12:15:36 2017 +0530

Inner-loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default.

Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5

commit e02d3cb84190a345ebe9b32f53db03a1838976b1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 26 19:02:53 2017 -0500

Fixed a pthread typo in previous commit.

Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.

commit f5962a1aae0fb3c9be104d0035c0d73210e7f670
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 26 17:00:04 2017 -0500

Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.

Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
into a k x k triangular matrix for the purposes of obtaining an mr x k
micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
very large k (depending on the product of mr x kc on that architecture).
The bug arose from the fact that the test module was triggering the
allocation of blocks from the internal memory pools, which are limited in
size. This allocation imposes an implicit assumption that the micropanel
being tested with will fit inside, and this assumption is violated
for large values of k. Arbitrarily large k may now be tested for both
operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
issues.

commit 8e917b256ca2d4bcdc059fe98d86be8775c69561
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Sep 9 14:10:15 2017 -0500

Updated bibtex info for BLIS5 (3m4m) article.

commit 7be887057358df4978a4833eeae0c17e15acd9d1
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Aug 28 17:38:22 2017 +0530

Merging "Adding auto hardware detection for Zen"

Change-Id: Id450fb0c4f91a5cd5cbdc06970f4f9ed28dd8520

commit e056d810d16621891ead032603de0c2105cfc0f7
Author: sthangar <Santanu.Thangarajamd.com>
Date: Mon Aug 28 16:44:42 2017 +0530

Bug fix for the testsuite build failing

Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77

commit 83796b7caf745fafc263e9e5e1bfcf5eff00c025
Merge: 8176f4e4 d1ee7762
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Mon Aug 28 05:23:28 2017 -0400

Merge "Adding auto hardware detection for Zen" into amd-staging

commit d1ee776202b26874333af7a91b6d2686342c4c81
Author: sthangar <Santanu.Thangarajamd.com>
Date: Wed Aug 23 13:01:14 2017 +0530

Adding auto hardware detection for Zen

Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf

commit 8176f4e43872714b997f1a5f83056daadb0ff1a5
Merge: 12413018 adafe974
Author: praveeng <praveen.gamd.com>
Date: Mon Aug 28 12:21:16 2017 +0530

resolving conflicts bli_gemm_front.c and LICENCE

Change-Id: Id24ce53896d4c1c7ceccc3e004014a0ecceb5474

commit 57e1e5cd51e7ffe8612c96a20b6a041b55426ddb
Merge: f86ce54d d6ef56c6
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Tue Aug 22 17:07:44 2017 +0530

Merge AMD authored changes

commit adafe974b4bc3fc0663bc2f6f4ce2fde71a97988
Merge: f86ce54d 7dc78b49
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Aug 15 15:17:21 2017 -0500

Merge pull request 150 from devinamatthews/vzeroupper

Add vzeroupper to Intel AVX kernels.

commit 7dc78b49f97e6b3cd6d72fcdc588ace534d0e700
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Aug 15 10:02:25 2017 -0500

Add vzeroupper to Intel AVX kernels.

commit f86ce54d6f315006984534fe29e47a2deaacc9f5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 10 16:24:28 2017 -0500

Removed trailing enum commas from bli_type_defs.h.

Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
Erling Andersen for pointing out this inconsistency and suggesting
the change.

commit 60a1eeb2317939d732b9eb6ff1e0d6d668c9a1e5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 5 13:04:31 2017 -0500

Added edge handling to _determine_blocksize_b().

Details:
- Added explicit handling of situations where i == dim to
bli_determine_blocksize_b_sub(). This isn't actually needed by any
current use case within BLIS, but handling the situation is nonetheless
prudent. Thanks to Minh Quan for reporting this issue and requesting
the fix.
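
The kind of edge case described above can be illustrated with a simplified sketch of backward blocksize determination. This is not the code of bli_determine_blocksize_b_sub(); the function name ex_blocksize_b() and its logic are assumptions chosen to show why i == dim may need explicit handling when the short edge block is taken first.

```c
#include <stddef.h>

/* Simplified, hypothetical backward partitioning: the short "edge"
   block (dim % b_alg) is taken first, full blocks of b_alg afterwards.
   Assumes b_alg > 0 and i <= dim. */
size_t ex_blocksize_b(size_t i, size_t dim, size_t b_alg)
{
    size_t left, edge;

    /* Explicit edge handling: nothing is left to partition. Without
       this check, left % b_alg would be 0 and the function would
       report a full block of b_alg where none exists. */
    if (i == dim) return 0;

    left = dim - i;
    edge = left % b_alg;
    return edge != 0 ? edge : b_alg;
}
```

For dim = 10 and b_alg = 4, the sketch yields blocks of 2, 4, and 4, and then 0 once i reaches dim.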

commit b01c80829907d50ec79977fba8e7b53cfe7db80a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 4 14:17:44 2017 -0500

Fixed a minor bug in level-3 packm management.

Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
entries to be released and then re-acquired unnecessarily. (In essence,
the "<" operands in the conditional that guards the
release-and-reacquire code block simply needed to be swapped.) The bug
should have only affected performance (rather than the computed result).
Thanks to Minh Quan for identifying and reporting the bug.

commit 8b379069fcd4811669855b1248ece831f190dff6
Merge: 1f3a5819 05925dd5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 1 15:30:40 2017 -0500

Merge branch 'master' into rt

commit 05925dd5d30e8f403bb671ce33029170d65ce7c0
Merge: 803bbef0 cecdc05d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Aug 1 09:31:02 2017 -0500

Merge pull request 146 from devinamatthews/master

Change lsame_ signature to match lapacke.

commit cecdc05d2834786a84ff85775d3f99a958c0765a
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jul 31 15:19:51 2017 -0500

Change lsame_ signature to match lapacke.

commit 803bbef0a386dd0571ad389f69d55154dbfe3c50
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 29 20:17:05 2017 -0500

Fixed pthreads compile bug with previous commit.

Details:
- Erroneously passed family parameter into l3int_t function despite
that function not taking the parameter. Oops.

commit c63980f4ca750618f359031d0691289b1abf5146
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 29 14:53:39 2017 -0500

Moved 'family' field from cntx_t to cntl_t.

Details:
- Removed the family field inside the cntx_t struct and re-added it to the
cntl_t struct. Updated all accessor functions/macros accordingly, as well
as all consumers and intermediaries of the family parameter (such as
bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
change was motivated by the desire to keep the context limited, as much
as possible, to information about the computing environment. (The family
field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
that operate only on a single struct to contain the "_node" suffix to
differentiate them from those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
They weren't being used and probably never will be.

commit 07837395560d413a1ba828163b41186e21a7bcfe
Merge: ca1d1d85 ad8610b4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 21 16:49:48 2017 -0500

Merge pull request 139 from Maratyszcza/emscripten

Fix Emscripten builds

commit ad8610b4415cc7982804d74f9aba29875e9e2b6c
Merge: 8772a0b3 ca1d1d85
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 21 15:18:33 2017 -0500

Merge branch 'master' into emscripten

commit ca1d1d8560c9ab1a7e3b0ac43ac70d08075bf904
Merge: b537b5bb 733faf84
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Jul 21 09:49:50 2017 -0500

Merge pull request 144 from devinamatthews/fix_atomics_on_bgq

Add fallbacks to __sync_* or __c11_atomic_* builtins...

commit 733faf848dcc54834fcdfbb0185dc644978d8864
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Jul 20 14:50:13 2017 -0500

Clang can't make up its mind what to support.

commit 7425d0744d9e9cd29a887120e57c2b43ba287040
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Jul 20 12:54:58 2017 -0500

Add default define for __has_extension.

commit b537b5bbe8cbee459a85bac11458498ae2bce4de
Merge: 1f1ec0db 7f41bb0a
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Jul 20 10:58:39 2017 -0500

Merge pull request 133 from devinamatthews/haswell-packdim

Fix prefetching in haswell ukernel

commit 8823f91a14638ce6f4e45e67df03212bb61609d6
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Jul 20 10:04:34 2017 -0500

Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes 143.
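
A minimal sketch of one such fallback, assuming a GCC- or Clang-compatible compiler. It covers only the __atomic_* to __sync_* path and omits the __c11_atomic_* branch; the wrapper name ex_atomic_add_fetch() and the use of the predefined __ATOMIC_SEQ_CST macro as the feature test are assumptions, not necessarily what the commit does.

```c
/* Adds v to *p atomically and returns the new value.
   __ATOMIC_SEQ_CST is predefined by compilers that provide the
   __atomic_* builtins, so its presence is used here as the test. */
static inline int ex_atomic_add_fetch(int *p, int v)
{
#if defined(__ATOMIC_SEQ_CST)
    return __atomic_add_fetch(p, v, __ATOMIC_SEQ_CST);
#else
    /* Older builtin family: always a full barrier, no ordering argument. */
    return __sync_add_and_fetch(p, v);
#endif
}
```

Hiding the choice behind one wrapper keeps the rest of the code free of compiler-specific conditionals, at the cost of the fallback path being stricter about memory ordering than the caller may need.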

commit 1f1ec0db9380b87679d5c771c4594daa1cfc5f0d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 19 15:40:48 2017 -0500

Updated ar option list used by all configurations.

Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
ar. Previously, "cru" was used, while now we employ only "cr". This
change was prompted by a warning observed on Ubuntu 16.04:

ar: `u' modifier ignored since `D' is the default (see `U')

This caused me to realize that the default mode causes timestamps to be
zero, and thus the 'u' option, which causes only changed object files to
be inserted, is not applicable.

commit 5caaba2d61cbbc36d63102a0786ece28ff797f72
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 19 13:51:53 2017 -0500

Added --force-version=STRING option to configure.

Details:
- Added an option to configure that allows the user to force an arbitrary
version string at configure-time. The help text also now describes the
usage information.
- Changed the way the version string is communicated to the Makefile.
Previously, it was read into the VERSION variable from the 'version' file
via $(shell cat ...). Now, the VERSION variable is instead set in
config.mk (via a configure-substituted anchor from config.mk.in).

commit 13175c5fb70fb6a378d5fff6ecede62e5ea6a1f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 18 17:56:00 2017 -0500

Updated openmp/pthread barriers with GNU atomics.

Details:
- Updated the non-tree openmp and pthreads barriers defined in
bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
implementation goes through the same motions as the previous codes, but
protects its loads and increments with GNU atomic built-ins. These atomic
statements take memory ordering parameters that allow us to specify just
enough constraints for the barrier to work as intended on weakly-ordered
hardware. The prior implementation was only guaranteed to work on systems
with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
this change and his crash-course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
simply "comm".
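
A sense-reversing spin barrier built on GNU atomic built-ins might look roughly like the sketch below. It is an illustration of the approach described above rather than the code of bli_thrcomm_barrier_atomic(); the struct layout, the ex_ names, and the exact memory orderings chosen are assumptions.

```c
/* Hypothetical barrier state shared by all participating threads. */
typedef struct
{
    int n_threads;  /* number of threads that must arrive (fixed) */
    int arrived;    /* arrivals counted in the current phase */
    int sense;      /* flips each time the barrier opens */
} ex_barrier_t;

void ex_barrier_init(ex_barrier_t *b, int n_threads)
{
    b->n_threads = n_threads;
    b->arrived   = 0;
    b->sense     = 0;
}

void ex_barrier_wait(ex_barrier_t *b)
{
    /* Remember the sense of the phase this thread is entering. */
    int my_sense = __atomic_load_n(&b->sense, __ATOMIC_RELAXED);

    /* Count this arrival; acquire-release orders it with the others. */
    int my_count = __atomic_add_fetch(&b->arrived, 1, __ATOMIC_ACQ_REL);

    if (my_count == b->n_threads)
    {
        /* Last thread: reset the counter, then open the barrier.
           The release flip publishes the reset to the waiters. */
        __atomic_store_n(&b->arrived, 0, __ATOMIC_RELAXED);
        __atomic_fetch_xor(&b->sense, 1, __ATOMIC_RELEASE);
    }
    else
    {
        /* Other threads: spin until the sense changes. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) == my_sense)
            ; /* busy-wait */
    }
}
```

The explicit orderings are what make the counter reset visible before any waiter proceeds on weakly-ordered hardware; a plain 'volatile' field gives no such guarantee.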

commit 0e58ba1b3aa84700ca51a96f1c0eed6067562fba
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 17 19:03:22 2017 -0500

Added API to set mt environment variables.

Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
bli_thread_get_jc_nt()
bli_thread_get_ic_nt()
bli_thread_get_jr_nt()
bli_thread_get_ir_nt()
bli_thread_get_num_threads()
bli_thread_set_jc_nt()
bli_thread_set_ic_nt()
bli_thread_set_jr_nt()
bli_thread_set_ir_nt()
bli_thread_set_num_threads()
- Added include "errno.h" to bli_system.h.
- This commit addresses issue 140.
- Thanks to Chris Goodyer for inspiring these updates.

commit 8772a0b33a90154c80d88b381dcdd66f824e041f
Author: Marat Dukhan <maratfb.com>
Date: Thu Jul 13 21:39:24 2017 -0700

Fix Emscripten builds

commit 72c8b49bb8d3b9370b2cc37718da22f065de9c57
Merge: 70cc825b ba7cada5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 12 14:58:12 2017 -0500

Merge pull request 138 from hominhquan/membrk_set_free_fp

Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers

commit ba7cada51a238d320528e3504ed0f0a17a6b022a
Author: Minh Quan HO <mqhokalray.eu>
Date: Fri Jul 7 10:52:05 2017 +0200

set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers

The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init
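
The class of bug fixed here can be sketched as follows. The struct and function names are hypothetical and much simpler than the real membrk object; the point is only that an init routine must set every function pointer that a later release path will call.

```c
#include <stdlib.h>

typedef void *(*ex_malloc_ft)(size_t);
typedef void  (*ex_free_ft)(void *);

/* Hypothetical memory broker holding its allocator callbacks. */
typedef struct
{
    ex_malloc_ft malloc_fp;
    ex_free_ft   free_fp;
} ex_membrk_t;

/* Sets both callbacks. Leaving free_fp unset here would make
   ex_membrk_release() call through an uninitialized pointer. */
void ex_membrk_init(ex_membrk_t *m)
{
    m->malloc_fp = malloc;
    m->free_fp   = free;
}

void *ex_membrk_acquire(ex_membrk_t *m, size_t size)
{
    return m->malloc_fp(size);
}

void ex_membrk_release(ex_membrk_t *m, void *p)
{
    m->free_fp(p);
}
```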

commit 1241301869957c96f16a2c6567e3ad70afa547de
Merge: 969b67e8 25ead66f
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Wed Jul 5 02:24:00 2017 -0400

Merge "Reducing the framework overhead of GEMV routines" into amd-staging

commit 25ead66fb78557f73af48bac305724d5d8aa3309
Author: sthangar <Santanu.Thangarajamd.com>
Date: Fri Jun 30 12:23:19 2017 +0530

Reducing the framework overhead of GEMV routines

Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684

commit 969b67e8800fbd5d14a086606f3b5afbf66ed093
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue Jul 4 12:57:32 2017 +0530

Improved the efficiency of dGEMM for large matrices by reducing TLB load misses and, chiefly, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.

Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4

commit 70cc825b552dec05165b9d70f9e6eb33d8abb118
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Jun 6 21:58:21 2017 -0500

Update LICENSE

Remove totally unnecessary first 9 lines and hopefully get Github to recognize it as 3BSD [ci skip].

commit cf54c77bc79a0f33a514be72c80a654c4e6e6f63
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Jun 6 20:23:17 2017 -0500

Add new SSI acknowledgment

commit d6ef56c6dbaf6df8ee1af1ca6a0f0792a811396a
Author: prangana <pradeep.raoamd.com>
Date: Thu Jun 1 16:11:09 2017 +0530

Update version number

Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4

commit 897bfa0e92082c30bbb74229562d7d7327cbbac8
Author: prangana <pradeep.raoamd.com>
Date: Thu Jun 1 16:11:09 2017 +0530

Update version number

Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4

commit 99d0ba5606d4b63e6a9c639aa78d4defc2455f79
Merge: be2c7eb8 6d17e012
Author: Santanu Thangaraj <Santanu.Thangarajamd.com>
Date: Thu Jun 1 02:19:02 2017 -0400

Merge "Checked in the small matrix code to compute GEMM called with A transpose case" into amd-staging

commit 6d17e0120fe5c127b941136ad2c0c08e91439535
Author: sthangar <Santanu.Thangarajamd.com>
Date: Wed May 24 11:48:16 2017 +0530

Checked in the small matrix code to compute GEMM called with A transpose case

Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462

commit 9d93f8481a1404695f7b78a3ced8ca47e890b649
Author: prangana <pradeep.raoamd.com>
Date: Tue May 30 09:58:10 2017 +0530

Update Licence File

Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2

commit be2c7eb85168937bd4318f4d05ded37620119310
Author: prangana <pradeep.raoamd.com>
Date: Tue May 30 09:58:10 2017 +0530

Update Licence File

Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2

commit 7f41bb0a0becde6a7de7df0f99668d7b4686c3b0
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri May 26 14:49:31 2017 -0400

PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.

commit d87614af3f3d9187be94d6e77984b282bf890928
Author: Devin Matthews <dmatthewsgator3.ufhpc>
Date: Fri May 26 14:47:36 2017 -0400

Revert "Change PACKDIM_MR (double) for haswell to 8."

This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.

commit 681eec913d7c2ebcff637cec5c1627ced9a92b99
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri May 26 12:28:09 2017 -0500

Change PACKDIM_MR (double) for haswell to 8.

commit 0a3ae0ecaa0ddcb5887005d7051fa234499f1120
Merge: 0f4e6652 6e04f9df
Author: praveeng <praveen.gamd.com>
Date: Sat May 20 16:53:50 2017 +0530

frame/3/gemm/bli_gemm_front.c

Change-Id: I52a0fbc1d33bb948d430942323bbc5fe44e3ca13

commit 6e04f9df01d79c1b0e673943ca0d5d0a6095eb2e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 17 13:03:52 2017 -0500

Restored deleted lines from makefile fragments.

commit ec5c0c0448275280dca0991f6f33afeb73650450
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed May 17 12:29:44 2017 -0500

Change to /bin/sh.

All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.

commit 555ddc30d4c7e44f3f335e436c98606f56e1598b
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed May 17 12:27:14 2017 -0500

Remove shebangs from makefiles.

commit f26bd7f42e0c2a47fe321b2c452644990b689654
Merge: cbf8710a 169fb05f
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed May 17 11:58:41 2017 -0500

Merge pull request 128 from iotamudelta/master

Portability and clang

commit 169fb05f225c2f060265bcaa872f7f80dc638b70
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 23:11:22 2017 -0400

Fix if/else structure. Thanks to TravisCI.

commit 0579dfea0bcfbb90ebc073fcf78b92a5cf7238e1
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:58:07 2017 -0400

Restore version.

commit a75b05c23dc786a1fdc45dc1627a5ce2299f1a7b
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:23:27 2017 -0400

Mark piledriver compilable w/ clang.

commit 7541d46e2ba8659bb2e36b444edef112fefa1345
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:12:12 2017 -0400

Mark bulldozer compilable w/ clang.

commit 91f897073ec0df3330ede449c4d6af8158266ae3
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:06:59 2017 -0400

Correct error message.

commit f5131e1e49167f948bddd714bb1af1761829c212
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 22:03:23 2017 -0400

Indeed one can compile for carrizo also using clang.

commit 5fa4e9439c04f35f89dd7d26ff742cb2dadc3180
Author: J M Dieterich <dieterichogolem.org>
Date: Tue May 16 21:50:49 2017 -0400

A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash

commit 1f3a58197e5d5f9ac862bda91e7527cbfbab5d76
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 8 16:10:03 2017 -0500

Housekeeping, induced method file/function renames.

Details:
- Renamed all level-3 induced method files to use the "_vir.c" suffix
instead of "_ref.c". Also renamed functions within these files
accordingly.
- Renamed cpp macro definitions in frame/ind/include according to the
above changes.
- Removed frame/3/old.

commit cbf8710a1ba63e25aadaa6fc5da51ea81b3d596d
Merge: cf39d3ef fdc66f12
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Mon May 8 11:21:20 2017 -0500

Merge pull request 127 from devinamatthews/fix_blis_nt_xx

Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS

commit cf39d3ef3b29b8058c39fb4638c1a734fe64aaed
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 5 15:06:56 2017 -0500

Fixed a bug in norm1v, norm1m.

Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
and matrices. This is one of the few operations in BLIS that does not
have its own test module within the testsuite, hence why it went
undetected for so long. The bad 1-norms were being used to normalize
matrices in the testsuite after initialization, which led to some
matrices containing a combination of "large" and "small" values. This
tended to push the residuals computed after each test away from zero.
In some cases, they were off *just* enough for the testsuite to label
it a "failure". Many thanks to Jeff Hammond for reporting this bug.
(Wonky details: the bug was due to improperly-defined level-0 scalar
macros for abval2, an operation that computes the absolute square,
or complex magnitude/modulus. Certain complex domain instances of
abval2 were being incorrectly defined in terms of real-only solutions,
leading to bad results. This level-0 operation forms the basis of
norm1v/norm1m. absq2 was also affected, but almost nothing uses
this operation.)
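
As an illustration of the failure mode described above, here is a minimal sketch of a 1-norm built on a scalar magnitude function. The names `abval`, `abval_real_only`, and `norm1v` are Python stand-ins chosen for this sketch, and the "real-only" variant is only a hypothetical guess at what such a faulty definition might look like; the actual macros are not shown in this log.

```python
import math

def abval(z):
    """Magnitude (modulus) of a scalar, valid for real and complex input."""
    z = complex(z)
    return math.hypot(z.real, z.imag)

def abval_real_only(z):
    """Hypothetical faulty variant: ignores the imaginary part."""
    return abs(complex(z).real)

def norm1v(x, absfn=abval):
    """1-norm of a vector: the sum of the magnitudes of its elements."""
    return sum(absfn(chi) for chi in x)

x = [3 + 4j, -1 + 0j, 0 - 2j]
print(norm1v(x))                   # magnitudes 5, 1 and 2 are summed
print(norm1v(x, abval_real_only))  # too small: imaginary parts are dropped
```

A norm that comes out too small in this way would over-scale a matrix normalized by it, which fits the "large" and "small" values mentioned in the commit message.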

commit 799485124f4d823e908d2e5d38b0c3a1e6172ade
Merge: 773a24ef 0df3541f
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu May 4 10:52:09 2017 -0500

Merge pull request 121 from jeffhammond/not-real-knl

allow KNL build without hbwmalloc (i.e. emulated)

commit fdc66f12d40754ff46179804bff592fddafbca02
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu May 4 10:35:22 2017 -0500

Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes 123.
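
The precedence rule in this commit can be sketched as follows. This is only an illustration under stated assumptions: the log gives the per-loop variable names as a pattern rather than spelling them out, so the sketch takes plain arguments instead of reading the environment, and the loop labels `jc`, `ic`, `jr`, `ir` as well as the function name are hypothetical.

```python
LOOPS = ("jc", "ic", "jr", "ir")  # assumed labels for the four parallelized loops

def resolve_threading(num_threads=None, per_loop=None):
    """Apply the rule: any per-loop setting overrides the global thread count."""
    per_loop = per_loop or {}
    if per_loop:
        # At least one per-loop value was given: ignore num_threads entirely
        # and default every loop that was not mentioned to 1.
        ways = {loop: int(per_loop.get(loop, 1)) for loop in LOOPS}
        return {"mode": "manual", "ways": ways}
    # No per-loop values: fall back to the global count (1 if unset).
    return {"mode": "automatic", "num_threads": int(num_threads or 1)}

print(resolve_threading(num_threads=8, per_loop={"jc": 2}))
print(resolve_threading(num_threads=8))
```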

commit 773a24efb2fa1c3a220bf0ce1dd621a3176196da
Merge: dd58c954 b8854259
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 3 15:07:59 2017 -0500

Merge branch 'master' of github.com:flame/blis

commit dd58c9545c877c3f7553eaebca7b5e9720a66f5d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 3 15:04:51 2017 -0500

Disable complex 3m/4m in testsuite by default.

Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
and 4m. This will improve testing runtime on Travis CI as well as for
anyone manually running the testsuite using default test parameters.
Thanks to Devin Matthews for suggesting this change.

commit 0df3541f54b7fe0c604ab2ec47ba814f12391798
Author: Jeff Hammond <jeff.sciencegmail.com>
Date: Tue May 2 19:25:21 2017 -0700

allow KNL build without hbwmalloc.h (i.e. emulated)

we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install hbwmalloc implementation on such
systems, it is easier not to, since obviously the performance of SDE
execution is not representative so there is no reason to emulate HBW
allocation.

commit b88542591d4dd0cde366e5ae35afd3205cb81bdc
Merge: 43007f7b c2c91e09
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 19:22:41 2017 -0500

Merge pull request 107 from jeffhammond/intel-compilers-no-use-libm

never use libm with Intel compilers

commit 43007f7b65ec7926cbbfc39965ff733fa251c15f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 16:48:43 2017 -0500

Fixed stray parentheses in README citations.

commit a4f1d0b8801c114e9ef8be39df01e1b8d27ebcb3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 16:38:43 2017 -0500

CHANGELOG update (0.2.2)

0.2.2

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 16:38:42 2017 -0500

Version file update (0.2.2)

commit d5a5e003ea9b24bb6abf12e88862e8eb61ffb03d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 15:48:30 2017 -0500

Fixed a trsm1m bug that affected right-side cases.

Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
was nondeterministic behavior (usually segmentation faults) for certain
problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
which explicitly directed the virtual gemm micro-kernel to use temporary
space if the storage preference of the [real domain] gemm ukernel did
not match the storage of the output matrix C. In the context of gemm,
this handling is not needed because agreement between the storage pref
and the matrix is guaranteed by a high-level optimization in BLIS.
However, this optimization is not applied to trsm because the storage
of C is not necessarily the same as the storage of the micro-panels of
B--both of which are updated by the micro-kernel during a trsm
operation. Thus, the guarantee of storage/preference agreement is not
in place for trsm, which means we must handle that case within the
virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
3m1, 4m1a, and 1m.

commit e80993e71f4d571e9650a8e90ed386e32059eae5
Merge: a509fbd5 ca3a7924
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 12:30:28 2017 -0500

Merge branch 'master' into 1m

commit ca3a7924770d6cf203cce4ca9f5482e1d0d4e961
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 2 12:09:39 2017 -0500

README.md update.

Details:
- Updated bibtex entries for the 4th BLIS paper, and added entries for the 5th
and 6th BLIS papers.

commit 0f4e6652dfe9b30105d3bab328ac26d9d5c11182
Merge: 42e7f6fb 6e7de6ef
Author: praveeng <praveen.gamd.com>
Date: Wed Apr 19 17:54:10 2017 +0530

Merge master code till 2017_04_19 to amd-staging

Change-Id: Ibebe83c8ea2e7eb15798c2bcf214b7228a1c9518

commit 42e7f6fb2a531429ee600b2fe0293b67371c7ccb
Author: sthangar <Santanu.Thangarajamd.com>
Date: Tue Mar 28 18:10:03 2017 +0530

fixed license attribute issues in AMD added files

Change-Id: I303f870a777c7cd1c1af29ea0b93f3e0a27948e4

commit 5600001e973c6cea048bd3fdb28117f1d7c98b9d
Merge: 0b190293 b3ed4933
Author: prangana <pradeep.raoamd.com>
Date: Mon Mar 20 13:56:33 2017 +0530

Fix merge conflicts after sync with release branch

Change-Id: Icf14a09f728befb69a73fff9fa79c4128e728310

commit 6e7de6ef84babb273dc5528a9b9d01f0febe394b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 17 12:10:24 2017 -0500

Minor updates to test/3m4m.

Details:
- Updated initial problem size and increment in Makefile.
- Updated code in test_gemm.c to correctly query kc from context.

commit f484c6cd4389dc7ae5b972849e12e98ad5bbf9a4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 17 12:07:27 2017 -0500

Whitespace reformatting to armv8a kernels file.

Details:
- Updated formatting of function signature/header in
kernels/armv8a/3/bli_gemm_opt_4x4.c.

commit 0b19029342ffc530fa22ef20398a26221cb8f6ec
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue Mar 14 14:51:31 2017 +0530

Code cleanup, removed warnings from trsm, removed unused routines in axpyv & scalv

Change-Id: I02867f394c5f416194c4b1769a6c75f39243ec81

commit 825363bd2a5a60a923d4a6d9691dc143845a9cab
Merge: 093bdb80 513944e4
Author: praveeng <praveen.gamd.com>
Date: Wed Mar 8 15:42:49 2017 +0530

Merge code from master to amd-staging as on 2017_03_08 by praveeng

Change-Id: I80740081b2cb54c9b77a3e78b9fe540e170be23d

commit 093bdb80c86b06367e595aa17487139ae983822f
Author: sthangar <Santanu.Thangarajamd.com>
Date: Tue Mar 7 13:35:50 2017 +0530

Checked in Unpacked DGEMM code

Change-Id: I39dcc7b238b328f73ee2675d21a5e521d0488723

commit 33923da9a108854590d386e74b6ee66b971e7796
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Mon Mar 6 14:31:31 2017 +0530

Added variant 10 for double precision axpyv microkernel

Change-Id: I7a20cc113a422603250bc450825c965136354974

commit bc828f7f8e3ddb9f58af07edc0b935b21759fb0f
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Fri Mar 3 14:45:35 2017 +0530

Added a new single-precision axpyv microkernel that performs 10 FMAs per loop iteration. This gives better performance than all other implementations of axpyv.

Change-Id: Ic4f0e4c67e367d67d0b24febcf34f81a70a39972

commit c9949f4603419267c10973adf1d63ec38497475d
Author: sthangar <Santanu.Thangarajamd.com>
Date: Fri Feb 17 14:16:33 2017 +0530

Checked in DGEMMTRSM and edge case handling routine in DDOTXF

Change-Id: I65f00661af6c09b2507294fd43e0a10641c0597e

commit a509fbd5ac04fafd4e51b43d2f59ca56432dc212
Merge: 69b4846a 513944e4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 21 17:06:16 2017 -0600

Merge branch 'master' into 1m

commit 69b4846ae9adb157c4171b52e159684db2867853
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 21 15:33:39 2017 -0600

Disabled experiment-related 1m code.

Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
specifically inserted to facilitate the benchmarking of 1m block-panel
and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
reflect changes used/needed during benchmarking.

commit 513944e4a951d8823b4de161b86ad7a965b4d99b
Merge: 8b462a0e 0e18f68c
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Feb 20 10:04:33 2017 -0500

Merge pull request 118 from devinamatthews/master

Handle k=0 correctly in KNL dgemm ukernel.

commit 0e18f68cf12eb9189ba901a20040b1cdae417670
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Feb 20 09:03:21 2017 -0600

Handle k=0 correctly in KNL dgemm ukernel.
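
For context, the expected behavior at k=0 follows from the gemm definition C := beta*C + alpha*A*B: with an empty inner dimension the product term vanishes and only the scaling of C remains. A minimal reference sketch of that semantics (not the KNL assembly kernel; the function name and list-of-lists layout are assumptions of this sketch):

```python
def gemm_ukr_ref(k, alpha, a, b, beta, c):
    """Reference microkernel semantics: c := beta*c + alpha*(a @ b).

    a is m x k, b is k x n, c is m x n, all as nested lists.
    """
    m, n = len(c), len(c[0])
    for i in range(m):
        for j in range(n):
            # With k == 0 this sum is empty, so ab is 0 and a, b are never read.
            ab = sum(a[i][p] * b[p][j] for p in range(k))
            if beta == 0:
                c[i][j] = alpha * ab  # overwrite; do not propagate old values
            else:
                c[i][j] = beta * c[i][j] + alpha * ab
    return c

c = [[1.0, 2.0], [3.0, 4.0]]
print(gemm_ukr_ref(0, 1.0, [[], []], [], 2.0, c))
```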

commit 8b462a0e8c3e9252f0401940849e53cc772256fa
Merge: c362afc5 7d42fc07
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Sun Feb 19 23:03:03 2017 -0500

Merge pull request 117 from devinamatthews/master

Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.

commit 7d42fc0796ef0c010375fd8e59b1240ba41ce4d2
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Sun Feb 19 21:10:55 2017 -0500

Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.

commit 04245c9ff7f8b3c70d61003029c964bb9a4320ee
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Fri Feb 10 14:24:30 2017 +0530

Reoptimized scalv routines - two vector multiplies are done per iteration, and these routines are enabled in bli_kernel.h

Change-Id: Ic5654508573d1f6bde2edef06aefe117e581feb5

commit c362afc525bab4050581d1b0fcea2fe4d582c608
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 9 11:54:59 2017 -0600

Added missing "level-0" BLAS [sd]cabs1_().

Details:
- Fixed issue 115 by adding implementations for scabs1_() and dcabs1_()
to the BLAS compatibility layer. Thanks to heroxbd for pointing out
their absence.
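
For readers unfamiliar with these routines: in the reference BLAS, scabs1 and dcabs1 return the sum of the absolute values of the real and imaginary parts of a complex number, which is cheaper than the true modulus. A short sketch of that definition (the Python name `cabs1` is chosen here for illustration only):

```python
def cabs1(z):
    """|Re(z)| + |Im(z)|, the quantity computed by the BLAS [sd]cabs1 routines."""
    z = complex(z)
    return abs(z.real) + abs(z.imag)

z = 3 - 4j
print(cabs1(z))  # sum of absolute parts
print(abs(z))    # true modulus, for comparison
```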

commit 018180c938c32efbeaaf626ba71ec5b780664db1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 8 11:20:52 2017 -0600

Fixed a minor bug in configure (issue 114).

Details:
- Fixed a bug in the configure script whereby a non-preferred value for
--enable-threading would cause problems in common.mk vis-a-vis detecting
which threading model was chosen. Thanks to heroxbd for reporting this
issue.

commit 58b5b77e5fdb179ea465e398e416e6a00d917e05
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Wed Feb 8 21:43:34 2017 +0530

Fixed a bug in axpyv, the arguments passed to intrinsic fmad instruction are corrected

Change-Id: If12f24c6bc74b22ac9e4acd6b9378e06d79f2f5e

commit 85de4ebf74d0a5587d5a12724eb5489d51674db3
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Wed Feb 8 14:41:04 2017 +0530

variant 4 axpyv single precision modified: explicitly used FMA intrinsics, replaced vector multiply and add operations

Change-Id: I975feef56696d479d2b9e9441b0660021cf4f6ff

commit 3fa53e8af31d634779f40258c51483ae8af494fa
Merge: b5291a44 95be7b04
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Wed Feb 8 11:46:34 2017 +0530

Merged axpyv and gemm small in bli_kernel.h
Merge branch 'amd-staging' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging

modified: config/zen/bli_kernel.h
modified: frame/3/gemm/bli_gemm_front.c
modified: kernels/x86_64/zen/3/bli_gemm_small_matrix.c

Change-Id: If181cf9345178c448b3530beb8bef453917fe295

commit 95be7b04709e688a4cb01fba680081e30f4258ef
Author: sthangar <Santanu.Thangarajamd.com>
Date: Tue Feb 7 14:01:27 2017 +0530

Added logic for packing matrix A and prefetching matrix C in Unpacked SGEMM code

Change-Id: I99efeca9eb5b4449286ec0ec133fd554ef1bb4f0

commit b5291a445b1313e01f1e0e8102c5f3660ab07f69
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue Feb 7 12:39:31 2017 +0530

Added optimization variant 4 for axpyv single precision - this performs 5 FMA per loop, keeping the IPC always full

Change-Id: Ie77ed22584271136a257e673bcd3b1ba71136bc9

commit f4bfc1662af82aa4b98185334c44835e51f1cbec
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Mon Feb 6 15:04:27 2017 +0530

Implemented new axpyv routines to improve performance for small vector sizes. Vectorization is applied to vectors as small as 8 elements (single precision) or 4 elements (double precision). Because this operation has a low compute-to-memory ratio, memory operations dominate at larger problem sizes, so there is not much gain there. This still needs some work. Added saxpyv and daxpyv var 3 routines in the file bli_axpyv_opt_var1.c

Change-Id: Ic1b33bd5516e10113b00e44ab41b97eb19d46072

commit ddf45e71770c55ea4a58ca24ea4913fe5d8beb9b
Merge: a6ab91bc 78e1b16e
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Jan 27 14:25:40 2017 -0600

Merge pull request 113 from devinamatthews/knl_thread_params

Change default threading parameters for KNL.

commit 78e1b16e16d589ed31b2e712115ee282097f114d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Jan 27 14:22:20 2017 -0600

Change default threading parameters for KNL.

commit 574472ba5a89924eca7dbd10055d0e1dcd7f4c71
Author: sthangar <Santanu.Thangarajamd.com>
Date: Tue Jan 10 14:51:46 2017 +0530

checked in unpacked SGEMM optimization

Change-Id: I8e4ea374415c0c402c660b656fb076af15354181

commit 1c732d3ddc4ac0861d3b0e0dd15eb7e071615502
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jan 25 16:25:46 2017 -0600

Added 1m-specific APIs for bp, pb gemm algorithms.

Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
bli_cntl_free() can check if the thread parameter is NULL, and if so,
call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
terms of bli_gemm1mxx_cntx_init(), which behaves the same as
bli_gemm1m_cntx_init() did before, except that an extra bool parameter
(is_pb) is used to support both bp and pb algorithms (including to
support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
when true, will toggle the boolean return value of routines such as
bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
causing BLIS to transpose the operation to achieve disagreement (rather
than agreement) between the storage of C and the micro-kernel output
preference. This disagreement is needed for panel-block implementations,
since they induce a transposition of the suboperation immediately before
the macro-kernel is called, which changes the apparent storage of C. For
now, anti-preference is used only with the pb algorithm for 1m (and not
with any other non-1m implementation).
- Defined new functions,
bli_cntx_l3_ukr_eff_prefers_storage_of()
bli_cntx_l3_ukr_eff_dislikes_storage_of()
bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
which are identical to their non-"eff" (effectively) counterparts except
that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
in terms of the existing block-panel macro-kernel _ker_var2(). This
technique requires inducing transposes on all operands and swapping
the A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
and updated all instantiations. Also updated the field names in the
cntx_t struct.
- Comment updates.
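
The anti-preference logic described in this commit amounts to flipping the answer of the storage-preference query. A minimal sketch, assuming a microkernel preference and the storage of C can each be reduced to a single boolean; the Python function names only echo the C functions named above and are not their real signatures:

```python
def ukr_prefers_storage_of(ukr_prefers_rows, c_is_row_stored):
    """True when the storage of C matches the microkernel's preference."""
    return ukr_prefers_rows == c_is_row_stored

def ukr_eff_prefers_storage_of(ukr_prefers_rows, c_is_row_stored, anti_pref=False):
    """Same query, but the result is toggled when anti_pref is set."""
    prefers = ukr_prefers_storage_of(ukr_prefers_rows, c_is_row_stored)
    return (not prefers) if anti_pref else prefers

def should_induce_transpose(ukr_prefers_rows, c_is_row_stored, anti_pref=False):
    """The operation is transposed whenever the effective query says 'dislikes'."""
    return not ukr_eff_prefers_storage_of(ukr_prefers_rows, c_is_row_stored, anti_pref)

# Row-preferring microkernel, row-stored C:
print(should_induce_transpose(True, True))                  # storage already agrees
print(should_induce_transpose(True, True, anti_pref=True))  # transpose to force disagreement
```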

commit 41595e98eedaf3f1f93802c14dcae490402f933f
Merge: d625c49e a6ab91bc
Author: praveeng <praveen.gamd.com>
Date: Wed Dec 7 15:13:21 2016 +0530

Merge master code as on 2016_12_07 to amd-staging

Change-Id: I5d9ecef9bff960aeb9b51ca4e4b21714e789e44f

commit d625c49e20bd3c50d6d44e330e34076cced114a3
Author: sthangar <Santanu.Thangarajamd.com>
Date: Tue Nov 29 15:05:19 2016 +0530

checked-in SGEMMTRSM microkernel for Zen

Change-Id: Ib61936418dea911b2154aa99f703b66e9669f94f

commit a6ab91bc61432490fadf18d596de4589645f37dd
Merge: 145a551d 7f31a630
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 30 09:26:58 2016 -0600

Merge pull request 111 from figual/master

Fixed missing cntx argument in ARMv8 microkernels.

commit 7f31a6307b7bd35f913c895947552c3a176f789b
Author: Francisco Igual <figualucm.es>
Date: Sun Nov 27 14:40:47 2016 +0100

Fixed missing cntx argument in ARMv8 microkernels.

commit 126482a3b609b9ad7026ba348f6c4bf6a29be8a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 25 18:29:49 2016 -0600

Implemented the 1m method.

Details:
- Implemented the 1m method for inducing complex domain matrix
multiplication. 1m support has been added to all level-3 operations,
including trsm, and is now the default induced method when native
complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
needed for the corresponding function for 1m (because 1m requires us
to choose between column-oriented or row-oriented execution, which
requires us to query the context for the storage preference of the
gemm microkernel, which requires knowing the datatype) but I decided
that it made sense for consistency to add the parameter to all other
cntx initialization functions as well, even though those functions
don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
a second scalar for each blocksize entry. The semantic meaning of the
two scalars now is that the first will scale the default blocksize
while the second will scale the maximum blocksize. This allows scaling
the two independently, and was needed to support 1m, which requires
scaling for a register blocksize but not the register storage
blocksize (ie: "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
default and maximum blocksizes to some desired blocksize multiple.
These functions are needed in the updated definitions of
bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
certain circumstances (specifically, real domain beta and row- or
column-stored matrix C), the real domain macrokernel and microkernel
to be called directly, rather than using the virtual microkernel
via the complex domain macrokernel, which carries a slight additional
amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
some code in test_gemm.c driver.
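
The idea behind inducing complex matrix multiplication from a real one can be sketched in a few lines. This is a mathematical illustration only: one operand has each complex entry expanded into a 2x2 real block, the other has each entry split into a real/imaginary pair, and a single real product then yields the real and imaginary parts of the result. Mapping these two forms onto the "1e" and "1r" schemas is an assumption of this sketch, all function names are hypothetical, and the actual packed memory layouts are not reproduced here.

```python
def expand_1e(a):
    """m x k complex -> 2m x 2k real: each entry becomes [[re, -im], [im, re]]."""
    out = []
    for row in a:
        top, bot = [], []
        for z in row:
            top += [z.real, -z.imag]
            bot += [z.imag, z.real]
        out += [top, bot]
    return out

def split_1r(b):
    """k x n complex -> 2k x n real: each entry becomes the column pair [re; im]."""
    out = []
    for row in b:
        out.append([z.real for z in row])
        out.append([z.imag for z in row])
    return out

def real_matmul(x, y):
    """Plain real matrix product on nested lists."""
    return [[sum(xi * yi for xi, yi in zip(xr, yc)) for yc in zip(*y)] for xr in x]

def gemm_1m(a, b):
    """Complex product a @ b computed with one real matrix product."""
    c_real = real_matmul(expand_1e(a), split_1r(b))
    # Rows 2i and 2i+1 of the real result hold Re and Im of row i of C.
    return [[complex(re, im) for re, im in zip(c_real[2 * i], c_real[2 * i + 1])]
            for i in range(len(a))]

a = [[1 + 2j, 3 - 1j]]
b = [[2 + 0j], [1 + 1j]]
print(gemm_1m(a, b))
```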

commit d8f13beeea90338e0ecb0a3aeaa2d59d8ebd6c36
Merge: c25a9205 145a551d
Author: praveeng <praveen.gamd.com>
Date: Fri Nov 25 17:31:08 2016 +0530

Merge master code till 2016_11_25 to amd-staging

commit c25a9205fd8c8d8de7fd81b1e5621e7ac79f4e87
Merge: 65298762 bdc0a264
Author: praveeng <praveen.gamd.com>
Date: Fri Nov 25 17:06:36 2016 +0530

Merge master code till Switched to simpler trsm_r 2016_11_25 to amd-staging

Change-Id: Ibf71d224d8fb6cf0bc497f84d50c27d276512cc1

commit 145a551d524ae5492667a05fc248923d922df850
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 23 17:59:06 2016 -0600

Switched to simpler trsm_r implementation.

Details:
- Disabled the implementation of trsm_r that allows the right-hand matrix
B to be triangular, and switched to the implementation that simply
transposes the operation (and thus the storage of C) in order to recast
the operation as trsm_l. This avoids the need to use trsm_rl and trsm_ru
macrokernels, which require an awkward swapping of MR and NR. For now,
the support for trsm_r macrokernels, via separate control trees, remains.
- Modified bli_config_macro_defs.h so that BLIS_RELAX_MCNR_NCMR_CONSTRAINTS
is defined by default. This is mostly a safety precaution in case someone
tries to switch back to the previous trsm_r implementation, but also
serves as a convenience on some systems where one does not naturally
choose blocksizes in a way that satisfies MC % NR = 0 and NC % MR = 0.
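
The recasting of a right-side solve as a left-side solve rests on the identity X A = B if and only if A^T X^T = B^T, where transposing a lower-triangular matrix makes it upper-triangular and vice versa. A small sketch with explicit transposes (the commit refers to transposing the operation and the storage of C, whereas this sketch copies data for clarity; all names are hypothetical):

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def trsm_l(a, b, lower):
    """Solve A X = B for X, with A an n x n triangular matrix and B n x p."""
    n, p = len(a), len(b[0])
    x = [[0.0] * p for _ in range(n)]
    order = range(n) if lower else range(n - 1, -1, -1)
    for i in order:
        # Indices of the unknowns in row i that are already solved.
        done = range(i) if lower else range(i + 1, n)
        for j in range(p):
            acc = sum(a[i][l] * x[l][j] for l in done)
            x[i][j] = (b[i][j] - acc) / a[i][i]
    return x

def trsm_r(a, b, lower):
    """Solve X A = B by recasting it as the left-side problem A^T X^T = B^T."""
    xt = trsm_l(transpose(a), transpose(b), lower=not lower)
    return transpose(xt)

a = [[2.0, 0.0], [1.0, 4.0]]    # lower triangular
b = [[4.0, 8.0], [10.0, 16.0]]  # equals X A for X = [[1, 2], [3, 4]]
print(trsm_r(a, b, lower=True))
```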

commit b3e58ee30307cf1e11529f2113acb9abbeda25af
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 23 17:58:26 2016 -0600

Reimplemented 4x12 haswell ukernels (real only).

Details:
- Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
defines 4x24 single real and 4x12 double real gemm microkernels, with
broadcast-based implementations. (The previous microkernel file has been
moved to an 'old' subdirectory.)

commit 65298762ff15c45e8588e0c279a9feaa98c927a0
Author: sthangar <Santanu.Thangarajamd.com>
Date: Tue Nov 22 12:15:33 2016 +0530

removed a redundant copy operation in DNRM2

Change-Id: I673b08efde4480e871779716f7715566740ad9ce

commit d6863e851adeef037e4d1476fe63bb293fb9d987
Author: sthangar <Santanu.Thangarajamd.com>
Date: Mon Nov 21 11:30:30 2016 +0530

checked-in DNRM2 optimizations

Change-Id: I3b31d768bd7f4fbf43042aa5a0762995c73c4522

commit bdc0a264d2fb5940bfd09298b1de823674a39053
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 16 14:13:08 2016 -0600

Adjusted stride selection of ct in macrokernels.

Details:
- Updated the changes introduced in 618f433 so that the strides of the
temporary microtile ct used in the macrokernels are determined based
on the storage preference of the microkernel (via the new functions
below), rather than the strides of c. In almost all cases, presently,
this change results in no net effect, as a high-level optimization
in the _front() functions aligns the storage of c to that of the
microkernel's preference. However, I encountered some cases where
this is not always the case in some development code that has yet
to be committed, and therefore I'm generalizing the framework code
in advance.
- Defined two new functions in bli_cntx.c:
bli_cntx_l3_ukr_prefers_rows_dt()
bli_cntx_l3_ukr_prefers_cols_dt()
which return bool_t's based on the current micro-kernel's storage
preferences. For induced methods, the preference of the underlying
real domain microkernel is returned.
- Updated definition of bli_cntx_l3_ukr_dislikes_storage_of(), and
by proxy bli_cntx_l3_ukr_prefers_storage_of(), to be in terms of
the above functions, rather than querying the preferences of the
native microkernel directly (which did the wrong thing for induced
methods).

commit 031978d2647cf08316858baf29c84ebba9c3133e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 16 14:04:33 2016 -0600

Fixed inactive trsm_r blocksize constraint code.

Details:
- Changed a cpp macro that was meant to prevent using certain trsm_r code
if BLIS_RELAX_MCNR_NCMR_CONSTRAINTS was defined. It was actually coded
incorrectly at first. I've now fixed its location and changed its
consequence to a compile-time error message.

commit 9772218cae57d55c252595b01e3669d8bed84944
Author: sthangar <Santanu.Thangarajamd.com>
Date: Wed Nov 16 15:19:19 2016 +0530

Added optimized DAMAX routines for Zen

Change-Id: I499c0c8f0f4ce6c19235c47b86d5608db6ba50f8

commit 9c448e30174e5eb76a94b43b30819704a5dfcb3f
Merge: 998d8240 e35d3c23
Author: Santanu Thangaraj <Santanu.Thangarajamd.com>
Date: Wed Nov 16 04:18:57 2016 -0500

Merge "Added new optimized micro-kernel for dotxv routine" into amd-staging

commit 998d824044adac0d54c921dcd44fb58f3d54aad2
Merge: 0d13e9a4 6b5a4032
Author: praveeng <praveen.gamd.com>
Date: Wed Nov 16 14:22:42 2016 +0530

Merge master code till devinamatthews/omp_num_thrds 2016_11_16 to amd-staging

Change-Id: I601ff1d3ec8a680e1be039ffc7b299744e8a27c5

commit 6b5a4032d2e3ed29a272c7f738b7e3ed6657e556
Merge: 3b524a08 a8220e3a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Nov 10 15:28:24 2016 -0600

Merge pull request 109 from devinamatthews/omp_num_threads

Add automatic loop thread assignment.

commit a8220e3a86433b5d76789e32ea7ca014a11b6d17
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Thu Nov 10 14:19:34 2016 -0600

- Fix typo in bli_cntx.c
- Bump BLIS_DEFAULT_NR_THREAD_MAX to 4

commit e35d3c23f28784e50ee13d2e77a69d60e0c24c1f
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Thu Nov 10 14:30:53 2016 +0530

Added new optimized micro-kernel for dotxv routine

Change-Id: I2c544e9b25a454d971ad690353502a55cd668391

commit 0d13e9a4f6f2fcda08f205215240cdf86442d6c6
Merge: e044fa62 3b524a08
Author: praveeng <praveen.gamd.com>
Date: Mon Nov 7 14:40:41 2016 +0530

bli_kernel.h

Change-Id: I425d089f79497a0de7d1622e829c3ca9edf7f091

commit c05b3862f6241486442b313eff0c8bee7b5e1274
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Nov 4 15:48:02 2016 -0500

Add automatic loop thread assignment.

- Number of threads is determined by BLIS_NUM_THREADS or OMP_NUM_THREADS, but can be overridden by BLIS_XX_NT as before.
- Threads are assigned to loops (ic, jc, ir, and jr) automatically by weighted partitioning and heuristics, both of which are tunable via bli_kernel.h.
- All level-3 BLAS covered.
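
To give a feel for what "weighted partitioning" of a thread count can mean, here is one simple scheme: factor the thread count into primes and hand each factor to whichever of two loops currently has more work per thread. This is a generic sketch and not the heuristic from this commit; the function names and the notion of a per-loop "work" weight are assumptions.

```python
def prime_factors(n):
    """Prime factorization of n >= 1, in ascending order."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def partition_2x2(nt, work1, work2):
    """Split nt threads into nt1 * nt2 == nt, balancing work1/nt1 and work2/nt2."""
    nt1 = nt2 = 1
    # Place the largest factors first; each goes to the more loaded loop.
    for f in sorted(prime_factors(nt), reverse=True):
        if work1 / nt1 >= work2 / nt2:
            nt1 *= f
        else:
            nt2 *= f
    return nt1, nt2

print(partition_2x2(8, 1000, 1000))  # equal work: factors are spread over both loops
print(partition_2x2(6, 100, 1000))   # skewed work: the larger loop gets the threads
```

Whatever the split, the product of the two factors always equals the requested thread count, so no threads are left idle by the partitioning itself.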

commit 3b524a08e3fb8380e7b8b2ba835312c51a331570
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 2 17:45:18 2016 -0500

Consolidated 3m1/4m1 gemmtrsm, trsm ukernel code.

Details:
- Consolidated the macros that define the lower and upper versions of the
gemmtrsm microkernels into a single macro that is instantiated twice.
Did this for both 3m1 and 4m1 microkernels.
- Consolidated lower and upper versions of the trsm microkernels for 3m1
and 4m1 into single files (each).

commit ead231aca635deb3db270f118454e4222c627f31
Merge: d25e6f8b 62987f60
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 2 13:03:50 2016 -0500

Merge pull request 108 from devinamatthews/patch-2

Update .travis.yml with additional tests

commit 62987f60a6a6ff0a75b31d0404f493593ce35ccc
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Nov 2 11:20:37 2016 -0500

Allow KNL to fail

commit 8f9010542c751ae3cbfe6121cb011d8985c1e00d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Nov 2 11:18:32 2016 -0500

Fix some problems with OSX builds:

- Update CPU detection for Intel archs (esp. Skylake)
- Allow clang for the reference config

commit d25e6f8b63c57f30b8a67dffbf4995977cf9f235
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 1 14:35:15 2016 -0500

Can disable trsm_r-specific blocksize constraints.

Details:
- Added cpp guards around the constraints in bli_kernel_macro_defs.h
that enforce MC % NR = 0 and NC % MR = 0. These constraints are ONLY
needed when handling right-side trsm by allowing the matrix on the
right (matrix B) to be triangular, because it involves swapping
register, but not cache, blocksizes (packing A by NR and B by MR)
and then swapping the operands to gemmtrsm just before that kernel
is called. It may be useful to disable these constraints if, for
example, the developer wishes to test the configuration with
a different set of cache blocksizes where only MC % MR = 0 and
NC % NR = 0 are enforced.
- In summary, defining BLIS_RELAX_MCNR_NCMR_CONSTRAINTS will bypass
the enforcement of MC % NR = 0 and NC % MR = 0.
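
The constraints discussed in this commit can be written out as a small checker. In the library they are enforced by the preprocessor at compile time; the sketch below only mirrors the arithmetic, and the function name and the `relax_mcnr_ncmr` flag are hypothetical stand-ins for the cpp guard.

```python
def check_blocksizes(mc, nc, mr, nr, relax_mcnr_ncmr=False):
    """Return the list of violated blocksize constraints (empty if all hold)."""
    errors = []
    # Constraints that hold in either mode.
    if mc % mr != 0:
        errors.append("MC % MR != 0")
    if nc % nr != 0:
        errors.append("NC % NR != 0")
    # Constraints needed only when right-side trsm swaps MR and NR.
    if not relax_mcnr_ncmr:
        if mc % nr != 0:
            errors.append("MC % NR != 0")
        if nc % mr != 0:
            errors.append("NC % MR != 0")
    return errors

print(check_blocksizes(144, 4096, 6, 8))
print(check_blocksizes(144, 4096, 6, 8, relax_mcnr_ncmr=True))
```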

commit 1a67e3688edb073a9d44c160e7b0798e08796b8a
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Nov 1 13:53:18 2016 -0500

Bogus commit

Need to trigger another Travis build.

commit 2cd82d67b372cad1bed50cfd99e524f1f40b4e24
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Nov 1 13:25:50 2016 -0500

Some fixes for .travis.yml

- Switch to gcc-5 to support knl
- Don't run tests in parallel -- it is super slow.
- Use clang on OSX since gcc is only a zombie husk.

commit a3db4e6bdfe745083acf704ab0f51f74ea869538
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Nov 1 10:33:18 2016 -0500

Update .travis.yml with additional tests

- Test knl configuration (without running of course).
- Test openmp and pthreads threading for auto configuration with 4 threads.
- Test auto configuration with and without pthreads on OSX.
- Also, run make in parallel.

I don't know how the `addons:` section works on OSX; hopefully it is just ignored.

commit 8a11a2174a1a5b9426f13bbc5338dc86ab138cdd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 31 19:07:55 2016 -0500

Updates to non-default haswell microkernels.

Details:
- Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
constraints.
- Added missing c and z microkernels, which are based on the corresponding
kernels in the d6x8 set.
- This completes the d8x6 set (which may be used for situations when it
is desirable to have a microkernel with a column preference).

commit 618f4331eba209803ecab99747872eceb1b5f091
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 31 14:40:51 2016 -0500

Align strides of ct in macrokernels to that of c.

Details:
- Previously, rs_ct and cs_ct, the strides of the temporary microtile used
primarily in the macrokernels' edge case handling, were unconditionally
set to 1 and MR, respectively. However, Devin Matthews noted that this
ought to be changed so that the strides of ct were in agreement with the
strides of C. (That is, if C was row-stored, then ct should be accessed
by rows as well.) The implicit assumption is that the strides of C
have already been adjusted, via induced transposition, if the storage
preference of the microkernel is at odds with the storage of C. So, if
the microkernel prefers row storage, the macrokernel's interior cases
would present row-stored (ideal) microkernel subproblems to the
microkernel, but for edge cases, it would still see column-stored
subproblems (not ideal). This commit fixes this issue. Thanks to Devin
for his suggestion.
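
The stride selection described here can be sketched as follows. The column-stored case (row stride 1, column stride MR) is taken from the commit message; the row-stored case (row stride NR, column stride 1) is the natural counterpart and is an assumption of this sketch, as are the function names.

```python
def ct_strides(mr, nr, c_is_row_stored):
    """Row and column strides (rs_ct, cs_ct) of the MR x NR temporary microtile."""
    if c_is_row_stored:
        return nr, 1  # contiguous rows of length NR (assumed counterpart)
    return 1, mr      # contiguous columns of length MR (previous fixed choice)

def ct_offset(i, j, rs_ct, cs_ct):
    """Linear offset of element (i, j) within the microtile buffer."""
    return i * rs_ct + j * cs_ct

rs, cs = ct_strides(6, 8, c_is_row_stored=True)
print(rs, cs, ct_offset(1, 2, rs, cs))
```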

commit c2c91e09b4893cb81314774557f728a95080f81e
Author: Jeff Hammond <jeff.sciencegmail.com>
Date: Tue Oct 25 21:15:26 2016 -0700

never use libm with Intel compilers

Intel compilers include a highly optimized math library (libimf) that
should be used instead of GNU libm.

yes, this change is for ALL targets, including those that are not
supported by the Intel compiler. there is no harm in doing this, and it
is future-proof in the event that the Intel compilers support other
architectures.

commit 630391002325a589063aec2ab0a7d89ef2e178c0
Merge: 956b3edf 216206c1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 25 19:34:51 2016 -0500

Merge pull request 105 from devinamatthews/knl

Support for Intel Knights Landing.

commit 216206c1d328a865c2192e35a4df6e9aff79a85b
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Oct 25 13:56:18 2016 -0500

Fix up for merge to master.

commit 11eb7957abbcdf02d5e312898e094260eadb1209
Merge: cd5b6681 956b3edf
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Oct 25 13:51:07 2016 -0500

Merge branch 'master' into knl

Conflicts:
frame/thread/bli_thread.h

commit cd5b6681838899283cd94e5427dfda206e7fbabe
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Oct 25 13:49:27 2016 -0500

Don't use %rbp in KNL packing kernels.

commit 956b3edf8eb09480f31f2e861c1b10f9ecbb2e52
Merge: b7e41d71 0662a3c1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 25 13:02:57 2016 -0500

Merge pull request 104 from devinamatthews/misspellings

Add flexible options for thread model (pthread/posix for pthreads etc.).

commit 0662a3c1b1f4644a86bf8e5073d1391808c91b4a
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue Oct 25 12:42:44 2016 -0500

Add flexible options for thread model (pthread/posix for pthreads etc.).

commit e044fa624008c161de32a39d734cddf1dd22dd41
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue Oct 25 13:03:05 2016 +0530

Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault

Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a

commit b3ed4933aa0da72ad771fb0fdf1727e5ba9ad7b4
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue Oct 25 13:03:05 2016 +0530

Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault

Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a

commit b7e41d71b07d2af6d22d632c70e0c5f7ce46852c
Merge: 4bd905bd 5117d444
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 24 16:47:46 2016 -0500

Merge pull request 103 from devinamatthews/patch-1

Change .align to .p2align in Bulldozer ukernels.

commit 5117d444f7f3a2bc327f067926eaf2398212edda
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Oct 24 16:20:47 2016 -0500

Change .align to .p2align in Bulldozer ukernels

Apparently OSX doesn't allow .align directives for >16B, so I've changed these to their .p2align counterparts.
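
For illustration only: in the GNU assembler, `.p2align n` always pads to a multiple of 2^n bytes, whereas the operand of `.align` is interpreted differently depending on the target (a byte count on some, a power-of-two exponent on others). A minimal sketch of the conversion from a byte alignment to a `.p2align` operand follows; the helper name `sketch_p2align_exponent` is hypothetical and does not appear in BLIS.

```c
#include <assert.h>

/* Hypothetical helper: return n such that (1 << n) == bytes, i.e. the
   operand to pass to .p2align for a given byte alignment.
   Returns -1 if bytes is zero or not a power of two. */
static int sketch_p2align_exponent(unsigned long bytes)
{
    int n = 0;

    if (bytes == 0 || (bytes & (bytes - 1)) != 0)
        return -1;

    while (bytes > 1)
    {
        bytes >>= 1;
        n++;
    }
    return n;
}
```

Under this sketch, a 32-byte alignment corresponds to `.p2align 5`.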

commit 4bd905bd4597e0ad7bedf31e25e779d3e2dfda29
Merge: 936d5fdc 7f32dd57
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 21 14:48:44 2016 -0500

Merge pull request 93 from ShadenSmith/config_check

Adds sanity check to configuration choice.

commit 936d5fdc26c6c4dab199a8d11fde948975cfa1d6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 21 14:34:27 2016 -0500

Fixed multithreading compilation bug in 970745a.

Details:
- Moved the definition of the cpp macro BLIS_ENABLE_MULTITHREADING
from bli_thread.h to bli_config_macro_defs.h. Also moved the
sanity check that OpenMP and POSIX threads are not both enabled.
- Thanks to Krzysztof Drewniak for reporting this bug.
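
A minimal sketch of this kind of check, assuming hypothetical `SKETCH_*` macro names in place of the real BLIS configuration macros: a central header rejects a configuration that enables both threading models, then derives a single "multithreading is enabled" macro that every later header can rely on.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for options a configure script would define.
   Defining both of them triggers the #error below at compile time. */
#define SKETCH_ENABLE_OPENMP
/* #define SKETCH_ENABLE_PTHREADS */

/* Sanity check: the two threading models are mutually exclusive. */
#if defined(SKETCH_ENABLE_OPENMP) && defined(SKETCH_ENABLE_PTHREADS)
#error "OpenMP and POSIX threads cannot both be enabled."
#endif

/* Derive one umbrella macro in a single, early-included place so that
   all later code sees the same definition. */
#if defined(SKETCH_ENABLE_OPENMP) || defined(SKETCH_ENABLE_PTHREADS)
#define SKETCH_ENABLE_MULTITHREADING
#endif

/* Report which model was selected at compile time. */
static const char *sketch_thread_model(void)
{
#if defined(SKETCH_ENABLE_OPENMP)
    return "openmp";
#elif defined(SKETCH_ENABLE_PTHREADS)
    return "pthreads";
#else
    return "single";
#endif
}

/* Report whether the umbrella macro ended up defined. */
static int sketch_is_multithreaded(void)
{
#ifdef SKETCH_ENABLE_MULTITHREADING
    return 1;
#else
    return 0;
#endif
}
```

Placing both the check and the derived macro in one configuration header means no other header has to be included first for the macro to be visible.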

commit d250e6a3af3af8beedcda28f508ac03e94efb3c8
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Thu Oct 20 14:34:39 2016 +0530

Merged TRSM and scalv routines into zen folder

Change-Id: Ice897bc83e8fb70b90f23cc3ce892c39883aceb9

commit 8feb0f85a674e84bec2417486e3bcea584b14c04
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 19 16:05:41 2016 -0500

Removed auto-prototyping of malloc()/free() substitutes.

Details:
- Removed the header file, bli_malloc_prototypes.h, which automatically
generated prototypes for the functions specified by the following
cpp macros:
BLIS_MALLOC_INTL
BLIS_FREE_INTL
BLIS_MALLOC_POOL
BLIS_FREE_POOL
BLIS_MALLOC_USER
BLIS_FREE_USER
These prototypes were originally provided primarily as a convenience
to those developers who specified their own malloc()/free() substitutes
for one or more of the macros above. However, we generated these prototypes
regardless, even when the default values (malloc and free) of the
macros above were used. A problem arose under certain circumstances
(e.g., gcc in C++ mode on Linux with glibc) when including blis.h that
stemmed from the "throw" specification which was added to the glibc's
malloc() prototype, resulting in a prototype mismatch. Therefore, going
forward, developers who specify their own custom malloc()/free()
substitutes must also prototype those substitutes via bli_kernel.h.
Thanks to Krzysztof Drewniak for reporting this bug, and Devin Matthews
for researching the nature and potential solutions.
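
A minimal sketch of what a developer-side setup might look like after this change, assuming hypothetical names throughout: `my_malloc()`/`my_free()` are illustrative substitutes, and `SKETCH_MALLOC_USER`/`SKETCH_FREE_USER` stand in for the `BLIS_MALLOC_USER`/`BLIS_FREE_USER` pair. The point is that the developer now writes the prototypes by hand, and no prototype for the standard `malloc()` is ever re-declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hand-written prototypes for the hypothetical substitutes. These are
   no longer generated automatically from the macro values. */
void *my_malloc(size_t size);
void  my_free(void *p);

/* Hypothetical macros naming the substitutes. */
#define SKETCH_MALLOC_USER my_malloc
#define SKETCH_FREE_USER   my_free

/* Count of outstanding allocations, only to make the sketch observable. */
static size_t live_allocations = 0;

void *my_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_allocations++;
    return p;
}

void my_free(void *p)
{
    if (p != NULL)
        live_allocations--;
    free(p);
}

/* Library-side code calls through the macros rather than calling
   malloc()/free() directly. Returns the number of live allocations
   observed while the buffer was held. */
static size_t sketch_roundtrip(size_t size)
{
    void  *buf    = SKETCH_MALLOC_USER(size);
    size_t during = live_allocations;

    SKETCH_FREE_USER(buf);
    return during;
}
```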

commit 970745a5fc7c29de3e202988e5eb104fabca4fdc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 19 15:58:03 2016 -0500

Reorganized typedefs to avoid compiler warnings.

Details:
- Relocated membrk_t definition from bli_membrk.h to bli_type_defs.h.
- Moved include of bli_malloc.h from blis.h to bli_type_defs.h.
- Removed standalone mtx_t and mutex_t typedefs in bli_type_defs.h.
- Moved include of bli_mutex.h from bli_thread.h to bli_type_defs.h.
- The redundant typedefs of membrk_t and mtx_t caused a warning on some C
compilers. Thanks to Tyler Smith for reporting this issue.

commit 1c2f7b57d557c05f5ef6148cccafaf0f70d910da
Author: sthangar <Santanu.Thangarajamd.com>
Date: Tue Oct 18 15:06:35 2016 +0530

Removed symlinks to zen kernels from haswell kernel folder and also modified the bli_kernel.h file accordingly

Change-Id: Ib3736af48e851c8243bbe10d937fb942c49ad048

commit d864ea9f4f039fe2b2dc395d0015bd9e8902bc8e
Merge: 7045fcbf 28b2af8a
Author: praveeng <praveen.gamd.com>
Date: Fri Oct 14 17:00:57 2016 +0530

Merge master code 2016_10_14 till Added disabled code thrinfo_t structures

Change-Id: If7db98d286c1471fcd30f00757abee9b253ef987

commit 28b2af8a71133ce68774e153b6e05afb05affba8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 13 14:50:08 2016 -0500

Added disabled code to print thrinfo_t structures.

Details:
- Added cpp-guarded code to bli_thrcomm_openmp.c that allows a curious
developer to print the contents of the thrinfo_t structures of each
thread, for verification purposes or just to study the way thread
information and communicators are used in BLIS.
- Enabled some previously-disabled code in bli_l3_thrinfo.c for freeing
an array of thrinfo_t* values that is used in the new, cpp-guarded code
mentioned above.
- Removed some old commented lines from bli_gemm_front.c.

commit 11eed3f683d09e65f721567b346b0f733bff9a64
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 13 14:23:23 2016 -0500

Fixed a configure -t omp/openmp bug from fd04869.

Details:
- Forgot to update certain occurrences of "omp" in common.mk during
commit fd04869, which changed the preferred configure option string
for enabling OpenMP from "omp" to "openmp".

commit 7045fcbf0bd349ebe6cb9ac4508c6a387bb05966
Merge: 7e044900 9cda6057
Author: praveeng <praveen.gamd.com>
Date: Thu Oct 13 12:02:28 2016 +0530

Merge master code 2016_10_13 Removed previously renamed/old files

Change-Id: I8106d371afaa0af474a8967388d44481b05de923

commit 7e04490002206d3557fcfb7dd893838a7f36916f
Author: sthangar <Santanu.Thangarajamd.com>
Date: Wed Oct 12 16:43:02 2016 +0530

Checked in the SAMAX optimizations

Change-Id: I7faf8c3adf52ff01432188ad3b9866ee4b9a9dfd

commit 9cda6057eaa16a24ac8785a9fa167df6c9edba44
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 11 13:21:26 2016 -0500

Removed previously renamed/old files.

Details:
- Removed frame/base/bli_mem.c and frame/include/bli_auxinfo_macro_defs.h,
both of which were renamed/removed in 701b9aa. For some reason, these
files survived when the compose branch was merged back into master.
(Clearly, git's merging algorithm is not perfect.)
- Removed frame/base/bli_mem.c.prev (an artifact of the long-ago changed
memory allocator that I was keeping around for no particular reason).

commit 22377abd84b9e560ffe1c4e4d284eb443ddb7133
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 10 13:43:56 2016 -0500

Fixed bli_gemm() segfault on empty C matrices.

Details:
- Fixed a bug that would manifest in the form of a segmentation fault
in bli_cntl_free() when calling any level-3 operation on an empty
output matrix (i.e., m = n = 0). Specifically, the code previously
assumed that the entire control tree was built prior to it being
freed. However, if the level-3 operation performs an early exit, the
control tree will be incomplete, and this scenario is now handled.
Thanks to Elmar Peise for reporting this bug.
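
A minimal sketch of the idea, assuming a hypothetical, much-simplified node type (`sketch_cntl_t` is not the real BLIS `cntl_t`): the free routine releases whatever portion of the tree exists and treats a missing node as the end of the chain instead of assuming the tree is complete.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplified control-tree node. */
typedef struct sketch_cntl_s
{
    void                 *params;   /* optional per-node parameters */
    struct sketch_cntl_s *sub_node; /* child, or NULL if never built */
} sketch_cntl_t;

/* Create a node whose child is sub_node (which may be NULL). */
static sketch_cntl_t *sketch_cntl_create(sketch_cntl_t *sub_node)
{
    sketch_cntl_t *node = malloc(sizeof(*node));

    if (node == NULL)
        return NULL;

    node->params   = NULL;
    node->sub_node = sub_node;
    return node;
}

/* Free the part of the tree that exists, starting at node. A NULL node
   (empty or partially built tree) is handled by simply stopping.
   Returns the number of nodes released. */
static int sketch_cntl_free(sketch_cntl_t *node)
{
    int freed = 0;

    while (node != NULL)
    {
        sketch_cntl_t *next = node->sub_node;

        free(node->params); /* free(NULL) is a no-op */
        free(node);
        node = next;
        freed++;
    }
    return freed;
}
```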

commit 0b571cd94d9b175331c9453258a6b1389a718ae8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 6 14:48:15 2016 -0500

Fixed segfault in bli_free_align() for NULL ptrs.

Details:
- Fixed a bug in bli_free_align() caused by failing to handle NULL pointers
up-front, which led to performing pointer arithmetic on NULL pointers in
order to free the address immediately before the pointer. Thanks to Devin
Matthews for reporting this bug.
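
A minimal sketch of the technique described above, assuming hypothetical function names (`sketch_malloc_align`, `sketch_free_align`) rather than the actual BLIS implementation: the address returned by `malloc()` is saved immediately before the aligned block, and the free routine checks for NULL before doing any pointer arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes aligned to align, which must be a power of two
   and at least sizeof(void *). The pointer returned by malloc() is
   stored in the slot immediately before the aligned address. */
static void *sketch_malloc_align(size_t size, size_t align)
{
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

    /* Extra room for worst-case padding plus the saved pointer. */
    uint8_t *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip past the slot for the saved pointer, then round up. */
    uintptr_t addr = (uintptr_t)(raw + sizeof(void *));
    addr = (addr + align - 1) & ~(uintptr_t)(align - 1);

    ((void **)addr)[-1] = raw;
    return (void *)addr;
}

/* Free a block obtained from sketch_malloc_align(). NULL is handled
   up-front so that no arithmetic is ever performed on a NULL pointer. */
static void sketch_free_align(void *p)
{
    if (p == NULL)
        return;

    free(((void **)p)[-1]);
}
```

With the early return in place, `sketch_free_align(NULL)` behaves like `free(NULL)`: it does nothing.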

commit cd84fb95182514601d72c78ee0e36a394d0284d7
Author: praveeng <praveen.gamd.com>
Date: Thu Oct 6 15:08:21 2016 +0530

syntax errors in configure file

Change-Id: Ibe8a6071aad97df550df64c009fec33a9d8f43a1

commit f2e7ea113aa93b74f1d42408d5db2c5a7b00a653
Merge: 133983c3 86969873
Author: praveeng <praveen.gamd.com>
Date: Thu Oct 6 12:35:30 2016 +0530

conflicts merge for bli_kernel.h

Change-Id: I15d846bd34e11f86ebfd7ed091ff671a1f3366a0

commit 133983c36fa01c7acb6d666b3744f77f216314a5
Author: sthangar <Santanu.Thangarajamd.com>
Date: Thu Oct 6 11:26:22 2016 +0530

code clean up in bli_kernel.h

Change-Id: I11d9cdf2af8e8199209eb084f6c3a7c910b83d5d

commit 4fb9b4ef2e4cf2626a6e000a41628fb823f16da8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 5 14:41:35 2016 -0500

CHANGELOG update (0.2.1)
