Blis

Latest version: v1.2.0

0.0.8

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 12 16:02:12 2013 -0500

Use separate CFLAGS for "kernels" directories.

Details:
- Added a new "special" directory type: any source code within directories
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
compiler flags. This allows the developer to specify a separate set of
flags (e.g. optimization flags) for compiling kernels while maintaining a
standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
according to above changes.

commit 08475e7c7653ba598665071a617d10f0d8f763c2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 11 12:18:39 2013 -0500

Various level-3 optimizations for row storage.

Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
packing from a lower or upper-stored symmetric/Hermitian matrix to column
panels (which are row-stored). Previously one could only pack to row panels
(which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
favorable access through row-stored matrices for gemm, hemm, herk, her2k,
symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
execution datatypes.

commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 7 11:04:10 2013 -0500

Added beta == 0 optimization to x86_64 ukernel.

Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
schemes?" switch is disabled, which would result in redundant tests being
executed for matrix-only (e.g. level-1m, level-3) operations if multiple
vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
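
The beta == 0 case described above can be sketched in portable C. This is not the x86_64 assembly microkernel; update_c_tile() and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function): computes C := beta*C + AB for an
   m x n column-major tile, where ab holds the already-accumulated product. */
static void update_c_tile(size_t m, size_t n, double beta,
                          const double *ab, double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double *cij = &c[i + j * ldc];
            if (beta == 0.0)
                *cij = ab[i + j * m];                 /* C is overwritten, never read */
            else
                *cij = beta * (*cij) + ab[i + j * m]; /* general case reads C */
        }
    }
}
```

In this sketch, whatever C held beforehand has no influence on the result when beta is zero, and the load of C is skipped entirely.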

commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 6 13:36:06 2013 -0500

Whitespace changes to old test drivers.

Details:
- Replaced tabs with four spaces in places where indentation was already
in place.

commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 4 14:57:46 2013 -0500

Fixed unaligned handling in axpyf, dotxaxpyf.

Details:
- Fixed over-cautious handling of unaligned operands in vector intrinsic
implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
intrinsic implementation of dotxaxpyf kernel.

commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 3 16:54:52 2013 -0500

Updated level-1/-1f [vector intrinsic] kernels.

Details:
- Updated level-1/-1f kernels so that non-unit-stride and unaligned cases are
handled by the reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
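
The fallback behavior described above might look roughly like the following. The macro and function names here are hypothetical stand-ins (the actual definition of bli_offset_from_alignment() is not shown in this log), and the "fast path" is plain C where a real kernel would use vector intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an alignment-offset macro: the number of bytes by
   which p exceeds the previous multiple of align. */
#define OFFSET_FROM_ALIGNMENT(p, align) ((size_t)((uintptr_t)(p) % (align)))

/* Portable fallback that accepts any stride and any alignment. */
static void axpy_ref(size_t n, double alpha,
                     const double *x, ptrdiff_t incx,
                     double *y, ptrdiff_t incy)
{
    for (size_t i = 0; i < n; ++i)
        y[(ptrdiff_t)i * incy] += alpha * x[(ptrdiff_t)i * incx];
}

/* Returns 1 if the fast path ran, 0 if the call fell back to the reference. */
static int axpy_dispatch(size_t n, double alpha,
                         const double *x, ptrdiff_t incx,
                         double *y, ptrdiff_t incy)
{
    assert(n == 0 || (x != NULL && y != NULL));
    int fast = incx == 1 && incy == 1 &&
               OFFSET_FROM_ALIGNMENT(x, 16) == 0 &&
               OFFSET_FROM_ALIGNMENT(y, 16) == 0;
    if (!fast) {
        axpy_ref(n, alpha, x, incx, y, incy);  /* handled, not aborted */
        return 0;
    }
    for (size_t i = 0; i < n; ++i)             /* placeholder for vector code */
        y[i] += alpha * x[i];
    return 1;
}
```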

commit 0288c827d3659bb225ac9c10f168b623ed0106a2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 1 08:02:23 2013 -0500

Updated ukernels for x86_64.

Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
recently).

commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 6 11:05:08 2013 -0500

Replaced axpys usage with subs in trsv.

Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
sizeof(dcomplex).

commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 24 16:28:10 2013 -0500

Fixed x86_64 kernel bugs and other minor issues.

Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
unaligned subpartitions. We were already going out of our way a bit to
handle edge cases in the first iteration for blocked variants, and this
was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
into account how the choice of variant needed to be altered for
upper-stored matrices (given that only lower-stored algorithms are
explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
macros to provide inlined versions of bli_determine_blocksize_[fb]() for
use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
consistency with that of the bugfix for trmv/trsv (both of which now
use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
vector of length 1 (i.e., 1 x 1). This fixes a bug whereby under certain
conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
was invalid only because the code was expecting 1 (for purposes of
performing contiguous vector loads) but got a value greater than 1 because
the column stride of the object (e.g. rho) was inflated for alignment
purposes (albeit unnecessarily since there is only one element in the
object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
internal back-ends to correctly handle cases where output operand still
needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
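
The bli_obj_vector_inc() fix above can be illustrated with a simplified view type. The struct and function below are hypothetical and much smaller than BLIS's obj_t; they only show why a 1 x 1 object should report an increment of 1 even when its column stride was inflated for alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object: m x n with row stride rs, column stride cs. */
typedef struct {
    size_t    m, n;
    ptrdiff_t rs, cs;
} mat_view;

/* Returns the element-to-element increment of a vector-shaped view. */
static ptrdiff_t vector_inc(const mat_view *v)
{
    assert(v->m == 1 || v->n == 1);   /* must be vector-shaped */
    if (v->m == 1 && v->n == 1)
        return 1;                     /* single element: stride is irrelevant */
    return (v->m == 1) ? v->cs : v->rs;
}
```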

commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 3 17:35:32 2013 -0500

Renamed _trans_status() macro.

Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
previous commit.

commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 3 17:24:58 2013 -0500

Renamed _set_trans(), _trans_status() macros.

Details:
- Renamed the following macros:
bli_obj_set_trans() -> bli_obj_set_onlytrans()
bli_obj_trans_status() -> bli_obj_onlytrans_status()
to remove ambiguity as to which bits are read/updated.

commit 2f8174509ea9f844db11ebd9389de5168e85b132
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 1 15:06:30 2013 -0500

Unconditionally check memory pool(s) for errors.

Details:
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
memory pool is exhausted before checking out and returning a block, even
if BLIS error checking has been disabled. These errors are useful because
they likely indicate that BLIS was improperly configured for the code
being run.
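
As a rough illustration of an exhaustion check that does not depend on any error-checking switch, consider the toy pool below. All names are hypothetical, and the sketch returns NULL where a real library might instead report an error and abort.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS 2   /* deliberately tiny, hypothetical pool size */

typedef struct {
    void *blocks[POOL_BLOCKS];
    int   n_free;
} mem_pool;

/* Checks out a block. The exhaustion test is ordinary code, so it runs even
   if optional parameter checking elsewhere has been turned off. */
static void *pool_acquire(mem_pool *p)
{
    if (p->n_free == 0)
        return NULL;                  /* pool exhausted: likely misconfigured */
    return p->blocks[--p->n_free];
}

/* Returns a block to the pool. */
static void pool_release(mem_pool *p, void *block)
{
    assert(p->n_free < POOL_BLOCKS);
    p->blocks[p->n_free++] = block;
}
```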

commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 1 15:00:30 2013 -0500

CHANGELOG update.

0.0.7

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 30 19:35:54 2013 -0500

Absorbed blocksize extensions into main objects.

Details:
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
happens if and only if cache blocksizes are created with non-zero
extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
blocksize extension mechanism is now available for use.

commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 25 17:16:59 2013 -0500

Added option to disable err checking in testsuite.

Details:
- Added a new line to input.general that allows one to specify the error-
checking level to use for each BLIS experiment. The only two levels
supported for now are "no error checking" and "full error checking".

commit 096b366ddcfe386f44419ef84d8df8be13825f86
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 25 16:43:43 2013 -0500

Use cntl trees that block in n dimension.

Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
algorithms that first partition in the n dimension with a blocksize
of NC. Typically this is not an issue since only very large problems
exceed that of NC. But developers often run very large problems, and
so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
bli_param_macro_defs.h.

commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 25 12:06:12 2013 -0500

Use PASTEMAC in macro-kernels (over MAC2 or MAC3).

Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.

commit df80acf517dde180ddcc5835c6136b2fa7556d4b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 19:43:23 2013 -0500

Fixed computation of b_next in L3 macro-kernels.

Details:
- Restructured herk_l and herk_u macro-kernels in the image of trmm
and trsm, in that the edge cases are captured by the main loop, rather
than trying to have "cleanup" sections that result in four distinct
parts (interior, bottom edge, right edge, bottom-right edge) of the
code.
- Fixed the way b_next was being computed in the non-gemm level-3
macro-kernels (herk, trmm, trsm). The way they are computed now matches
that of gemm.

commit 3671528cf8efe4b445d196665143a5c50c2c6048
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 19:12:14 2013 -0500

Fixed minor bug in computing b_next in gemm.

commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 17:49:10 2013 -0500

Fixed rare edge case bug in herk_l macro-kernel.

Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
chosen to be much larger than NR, then one could encounter edge cases
in the MC dimension that fall entirely below the diagonal, which
the previous implementation of the herk_l macro-kernel was not allowing
for.

commit 1dab11e37d1cb403cbe75b73a644c00de534f104
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 17:17:11 2013 -0500

Updated x86 gemmtrsm ukernels to use alpha.

commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 16:00:18 2013 -0500

Added a_next, b_next arguments to micro-kernels.

Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
addresses of the next micro-panels of A and B. By passing these
pointers into the micro-kernel, we allow the micro-kernel author to
prefetch micro-panels of A and B as necessary (though this is
completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
a_next and b_next. Note that ONLY the gemm macro-kernel computes
a_next and b_next with the precise semantics we want. I will go back
and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
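
A micro-kernel that accepts the two extra addresses might be shaped like the sketch below. The signature, the 2 x 2 blocksizes, and the PREFETCH wrapper are assumptions for illustration, not the actual BLIS interface; __builtin_prefetch is a GCC/Clang builtin, and the sketch degrades to a no-op elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)(p))
#endif

enum { MR = 2, NR = 2 };   /* hypothetical register blocksizes */

/* C (MR x NR, column-major, leading dimension ldc) += A_panel * B_panel.
   a is packed as k columns of MR elements, b as k rows of NR elements. */
static void gemm_ukr_sketch(size_t k,
                            const double *restrict a,
                            const double *restrict b,
                            double *restrict c, size_t ldc,
                            const double *a_next, const double *b_next)
{
    assert(ldc >= MR);
    /* Optional hints only: a kernel may also ignore both addresses. */
    PREFETCH(a_next);
    PREFETCH(b_next);

    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                c[i + j * ldc] += a[i + p * MR] * b[j + p * NR];
}
```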

commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 11:12:33 2013 -0500

Added code for backward edge-case blocking.

Details:
- Edited bli_determine_blocksize_b() to include experimental (and
currently disabled) code that computes extended blocks.
- Updated comments related to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.

commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 22 19:00:43 2013 -0500

Updated dupl implementation to use PACKNR and NR.

Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
explicitly to navigate b1 so that situations where PACKNR > NR are
supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.

commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Apr 21 15:10:34 2013 -0500

Disabled blocksize checks for memory pools.

Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
by the contiguous memory allocator for all types, given that the values for
double precision real are the ones used to allocate the space. These checks
can easily go awry in certain situations, especially if you are developing for
only one datatype. So for now, they are probably more trouble than they are
worth.

commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Apr 21 15:00:24 2013 -0500

Allow ldim of packed micro-panels != MR, NR.

Details:
- Made substantial changes throughout the framework to decouple the leading
dimension (row or column stride) used within each packed micro-panel from
the corresponding register blocksize. It appears advantageous on some
systems to use, for example, packed micro-panels of A where the column
stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
- Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
to use when packing micro-panels of A and B.
- Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
where appropriate, instead of MR and NR.
- Added pd field (panel dimension) to obj_t.
- New interface to bli_packm_cntl_obj_create().
- Renamed bli_obj_packed_length()/_width() macros to
bli_obj_padded_length()/_width().
- Removed local defines for cache/register blocksizes in level-3 *_cntl.c.
- Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
blocksize for edge cases, which can improve performance at the margins.
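
To make the MR versus PACKMR distinction concrete, here is a minimal packing sketch. The function name and signature are hypothetical, and zero-filling the padding rows is a choice made for this example only; the log does not say what the framework stores there.

```c
#include <assert.h>
#include <stddef.h>

/* Packs an m x k column-major block (m <= mr) into a micro-panel whose
   column stride is packmr >= mr. Rows m..packmr-1 are zero-filled here. */
static void pack_micropanel(size_t m, size_t k,
                            const double *a, size_t lda,
                            size_t mr, size_t packmr,
                            double *p)
{
    assert(m <= mr && mr <= packmr && lda >= m);
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < packmr; ++i)
            p[i + j * packmr] = (i < m) ? a[i + j * lda] : 0.0;
}
```

A micro-kernel consuming this panel would step from one column to the next by packmr elements rather than mr.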

commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 19 15:26:29 2013 -0500

Fixed bug in compatibility layer (her2k/syr2k).

Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
interface caller requests the [conjugate-]transpose case. Thanks to Bryan
Marker for reporting the behavior that led to this bug.

commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 18 19:39:13 2013 -0500

Changed old level3 test drivers to call front-ends.

Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
front-end object API instead of the internal back-end with the locally
defined control tree.

commit 83e45de23e565138b8fde06fb11cfedc973b7246
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 18 18:33:03 2013 -0500

Allow packm_init() to reacquire a too-small mem_t.

Details:
- Changed bli_packm_init() to react differently to a situation where a pack
obj_t has an already-allocated mem_t entry that has a buffer that is smaller
than what will be needed to hold the block/panel that now needs to be
packed. Previously, this situation was treated with an abort() since I
assumed something was horribly wrong. I have changed the code so that it now
reacts by releasing the previous mem_t and re-acquires a new mem_t with the
new information. (This change was done at the request of Bryan Marker to
facilitate code generation via DxT.)
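
The release-and-reacquire behavior can be sketched with malloc() and free() standing in for the memory pool. pack_mem and pack_mem_ensure() are hypothetical names, not BLIS's mem_t API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mem_t entry: a buffer and its size in bytes. */
typedef struct {
    void  *buf;
    size_t size;
} pack_mem;

/* Ensures m can hold at least `needed` bytes. A too-small buffer is released
   and a new one acquired instead of aborting. Returns 0 on success, -1 if
   the allocation fails. */
static int pack_mem_ensure(pack_mem *m, size_t needed)
{
    assert(needed > 0);
    if (m->buf != NULL && m->size >= needed)
        return 0;                     /* existing buffer is large enough */

    free(m->buf);                     /* free(NULL) is a no-op */
    m->buf  = malloc(needed);
    m->size = (m->buf != NULL) ? needed : 0;
    return (m->buf != NULL) ? 0 : -1;
}
```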

commit a6990434173b0cf651f8521194f3aef738deb7d2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 18 13:52:47 2013 -0500

Fixed bug in packing block of A for hemm/symm.

Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
symmetric matrix where the block of A being packed intersects the diagonal,
but some of its micro-panels do not intersect the diagonal and lie completely
in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.

commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500

Activated bli_packm_acquire_mpart_t2b().

Details:
- Removed the overly-paranoid bli_abort() from the end of
bli_packm_acquire_mpart_t2b(), to allow others to experiment with
partitioning through packed blocks of A. Also, and more importantly,
changed an earlier check that was causing an erroneous (but
coincidentally redundant) abort(). Also, updated some of the comments
in bli_packm_part.c.

commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500

Allow creation of "empty" objects.

Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.

commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500

Fixed "root" object bug in bli_her[2]k/syr[2]k.

Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
that manifested as the incorrect triangle being updated. It occurred when
the user would pass in a matrix object that was correctly marked as
symmetric/Hermitian and lower-stored, but whose root object was never marked
as lower (or upper). We now alias and re-assign root status for matrix C
within the front-ends. Note that trmm and trsm were already doing this,
albeit for a slightly different reason (to allow the internal back-end to
choose which algorithm to run--lower or upper--based on the uplo of the root
object for both left and right side cases). Thanks to Bryan Marker for
leading me to this bug.

commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500

Fixed overzealous type-checking in bli_getsc().

Details:
- Relaxed type checking in getsc so that the input object could be a constant
and not just a proper floating-point type. (If it is a constant, default to
extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.

commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500

Fixed bug in bli_obj_is_packed() and renamed.

Details:
- This macro is used to determine whether the partitioning routines should
call a corresponding packm_part routine instead. However, it was
unintentionally catching matrices that were marked as "packed" by virtue
of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
The macro has now been renamed to bli_obj_is_panel_packed(), and now only
checks for row or column panel packing. (Note that I first attempted to
fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
erroneous behavior that led me to this bug.

commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500

Removed local reference ukernel blocksize macros.

Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
reference microkernel definition and header. Meant to include this in
a recent/previous commit (0020ef7c8271).

commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500

Formatting change to mods in previous commit.

commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500

Set structure of objects in level-2 BLIS APIs.

Details:
- Added missing statement to set structure field of local objects in
top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
reporting this bug.

commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500

Tweak to test suite function string construction.

Details:
- Fixed a minor bug in the way that the test suite would construct function
name strings when the user anchored all parameters in input.operations.
In this case, the test driver would mistake this situation for one where
the operation simply had no parameters to begin with, and thus would not
include the parameter string in the function string that is output for
every result.

commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500

Fixed a bug in reference implementation of dupl.

Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
creation interface.
- Added 'restrict' to x86 gemm microkernel interface.

commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500

Modified bli_kernel.h include order in blis.h.

Details:
- Delayed include of bli_kernel.h in blis.h to prevent a situation where
_kernel.h includes an optimized microkernel header, which uses BLIS types
such as dim_t and inc_t, which would precede the definition of those types
in bli_type_defs.h.
- Moved the include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
(immediately after that of bli_kernel.h).

commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500

CHANGELOG update.

0.0.6

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 13 16:41:16 2013 -0500

Updated INSTALL file (now redirects to website).

commit 0020ef7c82711a7ebf08e5174f939bee2563184c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 13 15:26:35 2013 -0500

Removed gemmtrsm-, trsm-specific blocksize macros.

Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
micro-kernel header files.
(Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.

commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 12 15:25:54 2013 -0500

Added/renamed alignment constants to _config.h.

Details:
- Added new memory alignment constants:
BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
and renamed existing ones
BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
to better convey what the alignment factor is used for (and what it is
not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
use bli_align_dim_to_size(), which takes a third argument (the desired
alignment).
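
One plausible reading of an align-dimension-to-size helper is sketched below: round a dimension up so that the byte length of that many elements is a multiple of the requested alignment. The function is a hypothetical illustration, not the BLIS definition, and it assumes both sizes are powers of two. An alignment of 1 leaves the dimension unchanged, which matches the way alignment is described as being disabled above.

```c
#include <assert.h>
#include <stddef.h>

static int is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Rounds dim up so that dim * elem_size is a multiple of align_size.
   Assumes elem_size and align_size are powers of two. */
static size_t align_dim_to_size(size_t dim, size_t elem_size, size_t align_size)
{
    assert(is_pow2(elem_size) && is_pow2(align_size));
    if (align_size <= elem_size)
        return dim;                   /* every element is already aligned */

    size_t per = align_size / elem_size;      /* elements per alignment unit */
    return ((dim + per - 1) / per) * per;
}
```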

commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 12 11:40:55 2013 -0500

Fixed a bug in axpyv/axpym when alpha is unit.

Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
this bug.
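
The distinction matters because axpy accumulates into y. A minimal sketch with a hypothetical function name and unit strides only:

```c
#include <assert.h>
#include <stddef.h>

/* y := y + alpha * x, with shortcuts for special values of alpha. */
static void axpyv_sketch(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    if (alpha == 0.0)
        return;                       /* y is unchanged */

    if (alpha == 1.0) {
        for (size_t i = 0; i < n; ++i)
            y[i] += x[i];             /* an add; a copy would discard y */
        return;
    }

    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```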

commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 11 16:39:25 2013 -0500

Moved _POSIX_C_SOURCE def to compiler cmd line.

Details:
- Removed the define of _POSIX_C_SOURCE in bli_config.h (for both reference
and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
the compiler command line arguments in make_defs.mk (for both configs).
Thanks to Devin Matthews for suggesting this change.

commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 11 16:28:17 2013 -0500

Appended 'f2c_' to abs, min, max macros in f2c.h.

Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
would not conflict with anything defined by the user (or the language).
Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
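
The motivation for the prefix is easy to see in a translation unit that also includes <stdlib.h>, which declares the standard abs() function. The macro bodies below are assumed for illustration and may differ from those in bli_f2c.h; clamp_magnitude() is a hypothetical caller.

```c
#include <assert.h>
#include <stdlib.h>   /* declares the standard abs() function */

/* Prefixed macros: they no longer shadow abs(), or any min/max a user defines. */
#define f2c_abs(x)    ((x) >= 0 ? (x) : -(x))
#define f2c_min(a, b) ((a) <= (b) ? (a) : (b))
#define f2c_max(a, b) ((a) >= (b) ? (a) : (b))

/* Hypothetical caller that uses the macros and the library abs() together. */
static int clamp_magnitude(int v, int limit)
{
    assert(limit >= 0);
    return f2c_min(f2c_abs(v), abs(limit));
}
```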

commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 11 11:11:52 2013 -0500

Added new kernel blocksize macro aliases.

Details:
- Added new macros that alias level-3 cache and register blocksize macros
to names that can be constructed via the PASTEMAC macro. These aliased
macro definitions live inside bli_kernel_macro_defs.h, which is now
included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
(found in macro-kernel header files).

commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 11 10:35:39 2013 -0500

Updated CREDITS file.

commit 79328c15410215737f3f14cd069328cf52aa11fd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 11 10:32:14 2013 -0500

Reverted testsuite object files' home to 'obj'.

Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick to tracking
empty directories in git.

commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 9 17:45:39 2013 -0500

Renamed/moved object scalar constant macros.

Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.

commit 357893f5be5c56ab7b062874005e77e614b23f06
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 9 14:48:15 2013 -0500

Applied fix from prev commit to gemmtrsm_?_ref_4x4

Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
bli_gemmtrsm_u_ref_4x4.c.

commit 54988e8dca44475610bcaee5a7bc1c40e8921402
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 8 19:08:43 2013 -0500

Fixed a performance bug in trsm.

Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
(bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
reference gemm microkernel was hard-coded, and thus always called, even
when GEMM_UKERNEL was defined to point to an optimized microkernel. This
manifested as artificially low trsm performance for all problem sizes, but
especially for small problem sizes as it only affected blocks of A that
intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
find this bug.

commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 8 16:08:22 2013 -0500

Generate testsuite objects in 'src'.

Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
rather than 'obj', since (a) the top-level .gitignore dictates that
obj directories are to be ignored, and (b) since git has problems
tracking empty directories. Now, users do not need to create their own
obj directories within their own local clones of BLIS.

commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 8 15:18:42 2013 -0500

Minor formatting changes.

commit a571af816d72727e16cad37007e7043b9d6fa362
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 8 15:00:13 2013 -0500

Fixed definition of bli_is_packed_object() macro.

Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
value of the pack schema bits in the info field of obj_t, rather than
comparing the obj_t buffer with that of the mem_t entry. This was the cause
of a very low probability bug whereby uninitialized memory caused the macro
to evaluate to TRUE even though the object in question was not packed.
Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
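
Keying the test off dedicated bits in an info field, rather than comparing buffer pointers, could be sketched as follows. The bit positions, enum values, and type name are hypothetical and do not reflect the actual obj_t layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout: three pack-schema bits starting at bit 16. */
#define PACK_SCHEMA_SHIFT 16u
#define PACK_SCHEMA_MASK  (0x7u << PACK_SCHEMA_SHIFT)

enum { NOT_PACKED = 0, PACKED_ROW_PANELS = 2, PACKED_COL_PANELS = 3 };

typedef struct {
    uint32_t info;     /* bit field that includes the pack schema */
    void    *buffer;   /* deliberately not consulted by is_packed() */
} obj_sketch;

static unsigned pack_schema(const obj_sketch *o)
{
    return (o->info & PACK_SCHEMA_MASK) >> PACK_SCHEMA_SHIFT;
}

static void set_pack_schema(obj_sketch *o, unsigned schema)
{
    assert(schema <= 7u);
    o->info = (o->info & ~PACK_SCHEMA_MASK) | (schema << PACK_SCHEMA_SHIFT);
}

/* Packed status depends only on explicitly set bits, so the contents of an
   unrelated, possibly uninitialized buffer field cannot affect the answer. */
static int is_packed(const obj_sketch *o)
{
    return pack_schema(o) != NOT_PACKED;
}
```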

commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 6 12:54:45 2013 -0500

Updated information in testsuite output header.

Details:
- Added to the information that is echoed at the beginning of the test suite's
output, and also re-labeled some existing information.

commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 5 17:19:43 2013 -0500

Fixed edge case handling bug in herk macrokernels.

Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
only manifests when BLIS is configured such that MR != NR. The bug involves
incorrectly detecting edge cases, which resulted in some parts of matrix C
potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
8 and 4, respectively, so that I can better stress the framework on a
day-to-day basis. (The fact that they were both equal to 4 for so long is
why I did not stumble upon this bug much sooner.)

commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 4 15:25:43 2013 -0500

Added reference microkernels for arbitrary MR, NR.

Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
contain explicit loops over MR and NR, thus allowing them to be used
unmodified by developers who want to build a reference library with
custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
single-char macros are also defined.
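
A reference micro-kernel with explicit loops over MR and NR might be shaped like the sketch below. The signature is an assumption for illustration rather than the actual BLIS interface, and the default blocksizes of 8 and 4 are arbitrary choices here; any positive values can be supplied at compile time.

```c
#include <assert.h>
#include <stddef.h>

#ifndef MR
#define MR 8
#endif
#ifndef NR
#define NR 4
#endif

/* C := beta*C + alpha*(A*B) for an MR x NR tile of C with row stride rs_c and
   column stride cs_c. a is packed as k columns of MR elements, b as k rows
   of NR elements. */
static void gemm_ref_mxn(size_t k, double alpha,
                         const double *restrict a,
                         const double *restrict b,
                         double beta,
                         double *restrict c, size_t rs_c, size_t cs_c)
{
    double ab[MR * NR] = {0.0};       /* local accumulator, column-major */
    assert(rs_c > 0 && cs_c > 0);

    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += a[i + p * MR] * b[j + p * NR];

    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = beta * (*cij) + alpha * ab[i + j * MR];
        }
}
```

Because nothing in the loop bodies depends on particular values of MR and NR, the same source can be rebuilt with different register blocksizes, for example by passing -DMR=4 -DNR=2 to the compiler.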

commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 2 13:06:20 2013 -0500

Implemented amax operation and related changes.

Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
used for.
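
For readers unfamiliar with amax, it finds the index of the element with the largest absolute value. The sketch below is a plain reference loop under assumed conventions (a hypothetical function name, a zero-based result, and the first maximum winning ties); the BLIS implementation may differ on each of these points.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative amax: return the zero-based index of the first element of
   x with the largest absolute value. incx is the stride between elements. */
size_t demo_damax(size_t n, const double *x, size_t incx)
{
    assert(n > 0 && x != NULL);

    size_t imax = 0;
    double vmax = (x[0] < 0.0) ? -x[0] : x[0];

    for (size_t i = 1; i < n; ++i) {
        double xi = x[i * incx];
        double v  = (xi < 0.0) ? -xi : xi;

        /* Strict comparison keeps the earliest index on ties. */
        if (v > vmax) {
            vmax = v;
            imax = i;
        }
    }
    return imax;
}
```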

commit fb68087f8727cd5fd656a742a110e54fb1c91db9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 26 15:10:16 2013 -0500

More memory alignment-related tweaks.

Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
contiguous memory alignment (rather than the system/malloc alignment).

commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 26 14:14:53 2013 -0500

Always define memory alignment size cpp constant.

Details:
- Removed guard around define for memory alignment size constant.
Memory alignment should always be enabled, and so this value should
always be defined.

commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 26 13:59:19 2013 -0500

Renamed memory alignment macro constant.

Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
BLIS_MEMORY_ALIGNMENT_SIZE.

commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 26 12:43:14 2013 -0500

Align packed panel strides with system alignment.

Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
subsequent packed panel of A and B begins at an aligned address. (The
first panel is presumably aligned to system alignment because it is
aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
is used to allocate enough space for each block no matter what kind of
register blocking is used (even if register blocksize is unit and every
row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
and KC*NC result in identical footprints for all datatypes.
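
The effect of aligning a panel stride can be sketched as below. The helper name and the 16-byte alignment are assumptions and not the signature of bli_align_dim_to_sys(). The idea is to round a dimension (in elements) up until its size in bytes is a multiple of the alignment, so that each subsequent panel starts at an aligned address whenever the first one does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical system alignment in bytes. */
#define DEMO_SYS_ALIGN 16

/* Round dim up until dim * elem_size is a multiple of DEMO_SYS_ALIGN. */
size_t demo_align_dim_to_sys(size_t dim, size_t elem_size)
{
    assert(elem_size > 0);

    /* At most DEMO_SYS_ALIGN - 1 increments are ever needed. */
    while ((dim * elem_size) % DEMO_SYS_ALIGN != 0)
        ++dim;

    return dim;
}
```

For example, a panel stride of 5 doubles (40 bytes) would be padded to 6 doubles (48 bytes) under these assumptions.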

commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Mar 24 20:18:12 2013 -0500

CHANGELOG update.

0.0.5

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Mar 24 20:01:49 2013 -0500

Migrated 'bl2' prefix to 'bli'.

Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
corresponding include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.

commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Mar 24 18:49:36 2013 -0500

Removed several 'old' directories and files.

Details:
- Removed most of the 'old' directories scattered throughout the framework,
which includes alternate/half-baked/broken implementations.

commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Mar 24 18:00:10 2013 -0500

Removed include "blis2.h" from low-level headers.

Details:
- Removed include of "blis2.h" from various lower-level, operation-specific
header files throughout the framework. Given that these low-level headers
are included within blis2.h in a very specific order, include'ing blis2.h
within them directly is unnecessary.

commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 22 17:18:58 2013 -0500

Added cpp guards to conflicting libflame typedefs.

Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
This is a temporary hack to allow interoperability with libflame. (Similarly
temporary changes are being made to libflame's type definitions file.)

commit f469907503fcdc24dff0174c569170e6e756e045
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 22 15:20:15 2013 -0500

Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.

Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
(e.g. "prefetch" instructions, which are different than the particular
kind of prefetching/preloading referred to by this constant).

commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 22 15:09:59 2013 -0500

Removed build/old directory.

commit 718888849c48d99f83eea6b8f83bc1998cffef7e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 22 15:07:01 2013 -0500

Deprecated 'flame' configuration.

Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.

commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 18:07:40 2013 -0500

Added missing conjbeta argument to scald.

commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 18 15:37:20 2013 -0500

Relocated packed mem_t dimension fields to obj_t.

Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
m_packed and n_packed fields to obj_t. These new fields track the same values
as the old ones. From an abstraction standpoint, it seemed awkward to store
those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
respectively.
- Updated packm variants to access the packed length and width fields from
their new locations.

commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 18 10:37:03 2013 -0500

CHANGELOG update.

0.0.4

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 15 17:12:36 2013 -0500

Re-implemented contiguous memory allocator.

Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
allocator instantiates and initializes three separate memory pool objects,
each one associated with a separate array of contiguous memory blocks, each
block of fixed and uniform size. (The three pools are for allocating mc-by-kc
blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
objects use a stack structure internally to track which blocks in the region
have been "checked out" to a thread and which are still available. Critical
regions are now clearly marked and adaptable to parallel environments (e.g.
OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
of packed buffer is being allocated. The enumerated type for this argument
is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
bl2_align_dim_to_mult(). Turns out that typically we don't need to align
a dimension to the system alignment, since that value has to do with
starting addresses, whereas the values we are dealing with are unitless
dimensions.
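
The stack-based pool described in the first item can be sketched as follows. All names and sizes here are hypothetical; the code shows a single pool over a static region and omits the locking that a real critical region would need.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_POOL_BLOCKS 4     /* hypothetical number of blocks */
#define DEMO_BLOCK_BYTES 1024  /* hypothetical fixed block size in bytes */

/* Illustrative pool: a stack of pointers to the blocks that are
   currently available for checkout. */
typedef struct {
    void  *avail[DEMO_POOL_BLOCKS];
    size_t top;  /* number of available blocks on the stack */
} demo_pool_t;

static unsigned char demo_region[DEMO_POOL_BLOCKS * DEMO_BLOCK_BYTES];

/* Carve the region into fixed-size blocks and push them all. */
void demo_pool_init(demo_pool_t *pool)
{
    assert(pool != NULL);
    for (size_t i = 0; i < DEMO_POOL_BLOCKS; ++i)
        pool->avail[i] = demo_region + i * DEMO_BLOCK_BYTES;
    pool->top = DEMO_POOL_BLOCKS;
}

/* Check out a block, or return NULL if the pool is exhausted. In a
   threaded build this body would be a critical region. */
void *demo_pool_acquire(demo_pool_t *pool)
{
    if (pool->top == 0)
        return NULL;
    return pool->avail[--pool->top];
}

/* Return a previously acquired block to the pool. */
void demo_pool_release(demo_pool_t *pool, void *block)
{
    assert(pool->top < DEMO_POOL_BLOCKS);
    pool->avail[pool->top++] = block;
}
```

A stack makes both checkout and release constant-time, and the most recently released block is the next one handed out.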

commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 15 12:21:42 2013 -0500

Perform her2k var1 loops in sequence.

Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
and accumulated in sequence rather than fused into one loop. This is
necessary if BLIS is to be configured to provide only enough contiguous
memory for one panel of B.

commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 7 14:42:15 2013 -0600

Enhanced tracking of dimensions for mem_t objects.

Details:
- Added new fields to mem_t struct definition to track the allocated (as
opposed to the currently used) dimensions of the memory region. This
allows packm_init() to be more robust in situations where memory is
already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
in packm_init(), to update the "currently used" dimensions of the mem_t
object if the requested dimensions are smaller than the allocated
dimensions.
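
A rough sketch of tracking allocated versus currently used dimensions is given below. The struct layout and function name are assumptions and do not reproduce the mem_t definition or the macro named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative memory descriptor with separate "used" and "allocated"
   dimensions. */
typedef struct {
    void  *buf;
    size_t m, n;              /* dimensions currently in use */
    size_t m_alloc, n_alloc;  /* dimensions of the allocated region */
} demo_mem_t;

/* Try to reuse an existing allocation for an m_req-by-n_req packing job.
   On success the "used" dimensions are updated and true is returned. */
bool demo_mem_fit(demo_mem_t *mem, size_t m_req, size_t n_req)
{
    assert(mem != NULL);

    if (mem->buf == NULL)
        return false;
    if (m_req > mem->m_alloc || n_req > mem->n_alloc)
        return false;

    mem->m = m_req;
    mem->n = n_req;
    return true;
}
```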

commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 7 14:00:10 2013 -0600

Fixed test suite flop formulas for ops with side.

Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
trmm3, and trsm.
- Comment updates in herk macro-kernels.

commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Mar 2 12:47:06 2013 -0600

Added "version" to .gitignore.

Details:
- Added "version" to .gitignore file so that the file does not show up when
running 'git status', or accidentally get pulled into the index when
running 'git add' or 'git add --all'.

commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Mar 2 12:43:54 2013 -0600

Removed version file from version control.

Details:
- Removed version file from version control to prevent git errors that occur
when trying to pull new commits.

commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 1 12:55:42 2013 -0600

Updated behavior of bl2_obj_induce_trans() macro.

Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
updated as part of the macro. All current uses of the macro have been
coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.

commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 22 18:15:41 2013 -0600

Replaced banded/packed BLAS2 stubs with f2c code.

Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
implemented" message. This includes all of the level-2 banded and packed
routines.
- Replaced the aforementioned with the corresponding netlib implementations
having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.

commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 22 12:38:45 2013 -0600

Moved Fortran name-mangling macro to bl2_config.h.

Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
configuration directory (bl2_config.h, specifically) given that it can be
expected to be tweaked by some developers.

0.0.3

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 22 12:11:24 2013 -0600

Implemented blas2blis compatibility layer.

Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
includes virtually all of the BLAS, including banded and packed level-2
operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
initialization, which stores the "exit status" in an err_t, which is then
read by the latter function to determine whether finalization should actually
take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
it automatically initializes itself (via bl2_init_safe()), until/unless the
application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Added some "Fortran" types to bl2_type_defs.h for use with BLAS
wrappers.

commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 21 14:30:50 2013 -0600

Updated version file. (Forgot to in prev commit).

commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 21 12:00:17 2013 -0600

Fixed some scalar types in BLAS-like Herm APIs.

Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
alpha and beta in herk, and beta in her2k, need to be real. These
arguments were typed incorrectly as the complex types. This has been
fixed. Note the issue was only present in the BLAS-like APIs for
these operations (not the native object-based interfaces).

commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 20 15:50:54 2013 -0600

Updated version file. (Forgot to in prev commit).

commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 20 14:11:36 2013 -0600

Changed API of packm_init_pack() to use blksz_t.

Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
are passed in as type blksz_t* instead of dim_t.
- Made a similar change to packv_init_pack().

commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 15 09:59:48 2013 -0600

Minor changes to lower levels of scalm and setm.

Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
in only the strictly lower or upper triangle of the matrix being modified.
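
The unit-diagonal behavior in the last item can be sketched for the lower-triangular case as follows. The function name, the column-major layout, and the boolean flag are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative setm on the lower triangle of an n-by-n column-major
   matrix with leading dimension lda. With unit_diag set, the diagonal is
   treated as implicit and only the strictly lower triangle is written. */
void demo_dsetm_lower(size_t n, double alpha, double *a, size_t lda,
                      bool unit_diag)
{
    assert(a != NULL || n == 0);
    assert(lda >= n);

    for (size_t j = 0; j < n; ++j) {
        size_t first = unit_diag ? j + 1 : j;
        for (size_t i = first; i < n; ++i)
            a[i + j * lda] = alpha;
    }
}
```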

commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 14 10:42:56 2013 -0600

Updated beta == zero semantics of mulsc.

Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
operation that needed updating.
- Added Devin to CREDITS file.

commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 14 10:18:00 2013 -0600

Removed some calls to setv() in test modules.

Details:
- Removed calls to setv() in test modules whose sole purpose was to
initialize vectors to zero to ensure that nan's and inf's would not
taint the computation. Now that beta == zero semantics have been
updated to clear the output operand (when beta is zero), rather than
multiply against it, these setv() calls are no longer needed.

commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 13 18:44:59 2013 -0600

Properly implemented beta == 0 semantics.

Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
respectively.
- Added code to the following operations that sets the output operand to
zero if the corresponding scalar is zero (rather than performing the
floating-point multiply, or in the case of setv, copying the value).
This will prevent nan's and inf's from creeping into results from
uninitialized memory.
- axpy
- dotxv
- scalv
- scal2v
- setv
- gemv
- ger
- hemv
- her
- her2
- gemm reference ukernels
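
The semantics above can be illustrated with a scalv-style loop. This is a sketch with a hypothetical function name, not the BLIS source. When the scalar is zero the output is overwritten rather than multiplied, because 0 * NaN is NaN under IEEE 754 arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative scalv: x := beta * x. When beta is zero, x is set to zero
   directly so that NaN or Inf values already in x cannot propagate. */
void demo_dscalv(size_t n, double beta, double *x)
{
    assert(x != NULL || n == 0);

    if (beta == 0.0) {
        for (size_t i = 0; i < n; ++i)
            x[i] = 0.0;
        return;
    }

    for (size_t i = 0; i < n; ++i)
        x[i] *= beta;
}
```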

commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 13 18:29:53 2013 -0600

Fixed stale interface to packm_unb_var1().

Details:
- Removed the control tree from the interface to packm_unb_var1(), which
I meant to do when it was un-deprecated.

commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 13 13:21:00 2013 -0600

Un-deprecated packm_unb_var1.c (needed by l2 ops).

Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
operations still need this routine for packing matrices. Now, whether
level-2 operations should be packing matrices to begin with is another
matter. But this fixes the segmentation fault one would have gotten when
running bl2_gemv() on a general stride matrix.

commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 12 18:39:35 2013 -0600

Removed cntl tree usage from packm implementation.

Details:
- Added new fields to obj_t info field:
- invert_diag
- pack_order_if_upper
- pack_order_if_lower
These fields allow packm_init() to embed information that begins
in the control tree into the object so that the packm implementation
does not need to use control trees at all. This is being done to aid
Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
These were part of prototype implementations and are no longer needed.

commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 12 12:39:30 2013 -0600

Replaced bl2_abs() with _fabs() where appropriate.

commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 12 12:23:48 2013 -0600

Removed level-0 macros projrs, grabis.

Details:
- Replaced instances of projrs and grabis macros with newer,
more general-purpose getris.

commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 12 11:45:34 2013 -0600

Restored executable permissions to scripts.

Details:
- Restored executable (0755) permissions to scripts that were touched by
the recursive sed script that updated the copyright headers in the
previous commit.

commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Feb 11 14:37:47 2013 -0600

Updated copyright headers from 2012 to 2013.

commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Feb 11 13:38:07 2013 -0600

CHANGELOG update.

