A vector is a sequence of a fixed number of lanes, all of some fixed element type (ETYPE), such as byte, long, or float. Each lane contains an independent value of the element type.
Operations on vectors are typically lane-wise, distributing some scalar operator (such as addition) across the lanes of the participating vectors, usually generating a vector result whose lanes contain the various scalar results. When run on a supporting platform, lane-wise operations can be executed in parallel by the hardware. This style of parallelism is called Single Instruction Multiple Data (SIMD) parallelism.
In the SIMD style of programming, most of the operations within a vector lane are unconditional, but the effect of conditional execution may be achieved using masked operations such as blend(), under the control of an associated VectorMask. Data motion other than strictly lane-wise flow is achieved using cross-lane operations, often under the control of an associated VectorShuffle. Lane data and/or whole vectors can be reformatted using various kinds of lane-wise conversions and byte-wise reformatting reinterpretations, often under the control of a reflective VectorSpecies object which selects an alternative vector format different from that of the input vector.
Vector<E> declares a set of vector operations (methods) that are common to all element types. These common operations include generic access to lane values, data selection and movement, reformatting, and certain arithmetic and logical operations (such as addition or comparison) that are common to all primitive types.
Public subtypes of Vector correspond to specific element types. These declare further operations that are specific to that element type, including unboxed access to lane values, bitwise operations on values of integral element types, or transcendental operations on values of floating point element types.
Some lane-wise operations, such as the add operator, are defined as a full-service named operation, where a corresponding method on Vector comes in masked and unmasked overloadings, and (in subclasses) also comes in covariant overrides (returning the subclass) and additional scalar-broadcast overloadings (both masked and unmasked). Other lane-wise operations, such as the min operator, are defined as a partially serviced (not a full-service) named operation, where a corresponding method on Vector and/or a subclass provides some but not all possible overloadings and overrides (commonly the unmasked variant with scalar-broadcast overloadings). Finally, all lane-wise operations (those named as previously described, or otherwise unnamed method-wise) have a corresponding operator token declared as a static constant on VectorOperators. Each operator token defines a symbolic Java expression for the operation, such as a + b for the ADD operator token. General methods that accept an operator token, such as the one for a unary lane-wise operation, are provided on Vector and come in the same variants as a full-service named operation.
This package contains a public subtype of Vector corresponding to each supported element type: ByteVector, ShortVector, IntVector, LongVector, FloatVector, and DoubleVector.
The element type of a vector, referred to as ETYPE, is one of the primitive types byte, short, int, long, float, or double.
The type E in Vector<E> is the boxed version of ETYPE. For example, in the type Vector<Integer>, the E parameter is Integer and the ETYPE is int. In such a vector, each lane carries a primitive int value. This pattern continues for the other primitive types as well. (See also sections 5.1.7 and 5.1.8 of The Java Language Specification.)
The length of a vector is the lane count, the number of lanes it contains. This number is also called VLENGTH when the context makes clear which vector it belongs to. Each vector has its own fixed VLENGTH, but different instances of vectors may have different lengths. VLENGTH is an important number, because it estimates the SIMD performance gain of a single vector operation as compared to scalar execution of the VLENGTH scalar operators which underlie the vector operation.
Each vector also has a vector shape, referred to as VSHAPE. Each possible VSHAPE is represented by a member of the VectorShape enumeration, and represents an implementation format shared in common by all vectors of that shape. Thus, the size in bits of a vector is determined by appealing to its vector shape.
Some Java platforms give special support to only one shape, while others support several. A typical platform is not likely to support all the shapes described by this API. For this reason, most vector operations work on a single input shape and produce the same shape on output. Operations which change shape are clearly documented as shape-changing, while the majority of operations are shape-invariant, to avoid disadvantaging platforms which support only one shape. There are queries to discover, for the current Java platform, the preferred shape for general SIMD computation, or the largest available shape for any given lane type. To be portable, code using this API should start by querying a supported shape, and then process all data with shape-invariant operations, within the selected shape.
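As a rough sketch of this advice, the hypothetical class below queries FloatVector.SPECIES_PREFERRED once, keeps it in a static final field, and processes an array using only operations of that species, handling the leftover elements with a mask rather than a scalar loop. It assumes a JDK that ships the incubating jdk.incubator.vector module, enabled with --add-modules jdk.incubator.vector at compile and run time.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class PreferredShapeSum {
    // Query the preferred species once and hold it in a static final field.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Sums the squares of a[] using only shape-invariant operations on SPECIES.
    static float sumOfSquares(float[] a) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        // Largest multiple of VLENGTH that does not exceed a.length.
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, a, i);
            acc = v.fma(v, acc); // acc + v * v, lane by lane
        }
        // Remaining elements: lanes at or past a.length are unset, load as zero, add nothing.
        VectorMask<Float> m = SPECIES.indexInRange(i, a.length);
        FloatVector tail = FloatVector.fromArray(SPECIES, a, i, m);
        acc = tail.fma(tail, acc);
        return acc.reduceLanes(VectorOperators.ADD);
    }

    public static void main(String[] args) {
        float[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(SPECIES + " -> " + sumOfSquares(data));
    }
}
```

The same code runs unchanged on platforms with different preferred shapes; only the number of loop iterations changes.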
Each unique combination of element type and vector shape determines a unique vector species. A vector species is represented by a fixed instance of VectorSpecies<E> shared in common by all vectors of the same shape and ETYPE.
Unless otherwise documented, lane-wise vector operations require that all vector inputs have exactly the same VSHAPE and VLENGTH, which is to say that they must have exactly the same species. This allows corresponding lanes to be paired unambiguously. The check() method provides an easy way to perform this check explicitly.
Vector shape, VLENGTH, and ETYPE are all mutually constrained, so that VLENGTH times the bit-size of each lane must always match the bit-size of the vector's shape. Thus, reinterpreting a vector may double its length if and only if it either halves the lane size, or else changes the shape. Likewise, reinterpreting a vector may double the lane size if and only if it either halves the length, or else changes the shape of the vector.
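The constraint can be checked with simple arithmetic. This plain-Java sketch (hypothetical names, no Vector API calls) derives VLENGTH from the shape size and lane size, and shows that halving the lane size within a fixed shape doubles the length.

```java
// Plain-Java sketch of VLENGTH * laneBits == shapeBits; not part of the Vector API.
class LaneArithmetic {
    // Number of lanes of laneBits bits that fit exactly in a vector of shapeBits bits.
    static int vlength(int shapeBits, int laneBits) {
        if (laneBits <= 0 || shapeBits % laneBits != 0) {
            throw new IllegalArgumentException(shapeBits + " is not a multiple of " + laneBits);
        }
        return shapeBits / laneBits;
    }

    public static void main(String[] args) {
        int[] shapes = {64, 128, 256, 512};
        String[] names = {"byte", "short", "int/float", "long/double"};
        int[] laneBits = {8, 16, 32, 64};
        for (int shape : shapes) {
            StringBuilder row = new StringBuilder(shape + "-bit:");
            for (int k = 0; k < laneBits.length; k++) {
                row.append(' ').append(names[k]).append('=').append(vlength(shape, laneBits[k]));
            }
            System.out.println(row);
        }
        // Same shape, half the lane size: 8 int lanes reinterpreted as short lanes give 16.
        System.out.println(vlength(256, 32) + " int lanes -> " + vlength(256, 16) + " short lanes");
    }
}
```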
The concrete vector classes are ByteVector, ShortVector, IntVector, LongVector, FloatVector, and DoubleVector. Along with type-specific operations, these classes support creation of vector values (instances of Vector). They expose static constants corresponding to the supported species, and static methods on these types generally take a species as a parameter. For example, FloatVector.fromArray creates and returns a float vector of the specified species, with elements loaded from the specified float array. It is recommended that species instances be held in static final fields for optimal creation and usage of Vector values by the runtime compiler.
As an example of static constants defined by the typed vector classes, constant FloatVector. is the unique species whose lanes are floats and whose vector size is 256 bits. Similarly, the constant FloatVector.SPECIES_PREFERRED is the species which best supports processing of float vector lanes on the currently running Java platform.
As another example, a broadcast scalar value of (double)0.5 can be obtained by calling DoubleVector., but the argument dsp is required to select the species (and hence the shape and length) of the resulting vector.
For example, a vector with element type float and shape S_256_BIT has eight lanes, since 32*8=256.
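The following sketch, assuming the jdk.incubator.vector module is enabled, inspects the FloatVector.SPECIES_256 constant and broadcasts 0.5 into a vector of DoubleVector.SPECIES_PREFERRED, whose length depends on the running platform. The class name is hypothetical.

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class SpeciesConstants {
    static final VectorSpecies<Float> F256 = FloatVector.SPECIES_256;
    static final VectorSpecies<Double> DSP = DoubleVector.SPECIES_PREFERRED;

    // Broadcasts 0.5 into every lane of a vector of species DSP.
    static double[] halves() {
        return DoubleVector.broadcast(DSP, 0.5).toArray();
    }

    public static void main(String[] args) {
        // 256-bit shape with 32-bit lanes: eight lanes.
        System.out.println(F256.vectorShape() + ": " + F256.length() + " lanes of "
                + F256.elementSize() + " bits");
        System.out.println(DSP + ": " + java.util.Arrays.toString(halves()));
    }
}
```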
Most operations on vectors are lane-wise, which means the operation is composed of an underlying scalar operator, which is repeated for each distinct lane of the input vector. If there are additional vector arguments of the same type, their lanes are aligned with the lanes of the first input vector. (They must all have a common VLENGTH.) For most lane-wise operations, the output resulting from a lane-wise operation will have a VLENGTH which is equal to the VLENGTH of the input(s) to the operation. Thus, such lane-wise operations are length-invariant, in their basic definitions.
The principle of length-invariance is combined with another basic principle, that most length-invariant lane-wise operations are also shape-invariant, meaning that the inputs and the output of a lane-wise operation will have a common VSHAPE. When the principles conflict, because a logical result (with an invariant VLENGTH) does not fit into the invariant VSHAPE, the resulting expansions and contractions are handled explicitly with special conventions.
Vector operations can be grouped into various categories and their behavior can be generally specified in terms of underlying scalar operators. In the examples below, ETYPE is the element type of the operation (such as int.class) and EVector is the corresponding concrete vector type (such as IntVector.class).
A lane-wise unary operation, such as w = v0.neg(), takes one input vector, distributing a unary scalar operator across the lanes, and produces a result vector of the same type and shape. For each lane of the input vector a, the underlying scalar operator is applied to the lane value. The result is placed into the vector result in the same lane. The following pseudocode illustrates the behavior of this operation category:
ETYPE scalar_unary_op(ETYPE s);
EVector a = ...;
VectorSpecies<E> species = a.species();
ETYPE[] ar = new ETYPE[a.length()];
for (int i = 0; i < ar.length; i++) {
    ar[i] = scalar_unary_op(a.lane(i));
}
EVector r = EVector.fromArray(species, ar, 0);
A lane-wise binary operation, such as w = v0.add(v1), takes two input vectors, distributing a binary scalar operator across the lanes, and produces a result vector of the same type and shape. For each lane of the two input vectors a and b, the underlying scalar operator is applied to the lane values. The result is placed into the vector result in the same lane. The following pseudocode illustrates the behavior of this operation category:
ETYPE scalar_binary_op(ETYPE s, ETYPE t);
EVector a = ...;
VectorSpecies<E> species = a.species();
EVector b = ...;
b.check(species);  // must have same species
ETYPE[] ar = new ETYPE[a.length()];
for (int i = 0; i < ar.length; i++) {
    ar[i] = scalar_binary_op(a.lane(i), b.lane(i));
}
EVector r = EVector.fromArray(species, ar, 0);
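To connect the unary and binary pseudocode to real calls, here is a small sketch (hypothetical class name; requires --add-modules jdk.incubator.vector) using IntVector.SPECIES_128, which has four int lanes. Lane-wise int addition applies the primitive + operator, so it wraps on overflow just as scalar addition does.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class LanewiseOps {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128; // 4 int lanes

    // Unary lane-wise operation: r[i] = -x[i].
    static int[] neg(int[] x) {
        return IntVector.fromArray(SPECIES, x, 0).neg().toArray();
    }

    // Binary lane-wise operation: r[i] = x[i] + y[i], wrapping on overflow.
    static int[] add(int[] x, int[] y) {
        IntVector a = IntVector.fromArray(SPECIES, x, 0);
        IntVector b = IntVector.fromArray(SPECIES, y, 0);
        b.check(SPECIES); // both inputs must have the same species
        return a.add(b).toArray();
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, Integer.MAX_VALUE};
        int[] y = {10, 20, 30, 1};
        System.out.println(java.util.Arrays.toString(add(x, y)));
        System.out.println(java.util.Arrays.toString(neg(x)));
    }
}
```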
A lane-wise n-ary operation takes N input vectors v[j], distributing an n-ary scalar operator across the lanes, and produces a result vector of the same type and shape. Except for a few ternary operations, such as w = v0.fma(v1,v2), this API has no support for lane-wise n-ary operations. For each lane of all of the input vectors v[j], the underlying scalar operator is applied to the lane values. The result is placed into the vector result in the same lane. The following pseudocode illustrates the behavior of this operation category:
ETYPE scalar_nary_op(ETYPE... args);
EVector[] v = ...;
int N = v.length;
VectorSpecies<E> species = v[0].species();
for (EVector arg : v) {
    arg.check(species);  // all must have same species
}
ETYPE[] ar = new ETYPE[species.length()];
for (int i = 0; i < ar.length; i++) {
    ETYPE[] args = new ETYPE[N];
    for (int j = 0; j < N; j++) {
        args[j] = v[j].lane(i);
    }
    ar[i] = scalar_nary_op(args);
}
EVector r = EVector.fromArray(species, ar, 0);
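The ternary fma is the n-ary case this API does support. The sketch below, with hypothetical names and assuming the incubator module is enabled, compares FloatVector.fma against a scalar loop over Math.fma, following the n-ary pseudocode with N equal to 3. The inputs are chosen so that every product and sum is exact.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class TernaryFma {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_128; // 4 float lanes

    // Ternary lane-wise operation: r[i] = x[i] * y[i] + z[i] with a single rounding.
    static float[] fmaVector(float[] x, float[] y, float[] z) {
        FloatVector a = FloatVector.fromArray(SPECIES, x, 0);
        FloatVector b = FloatVector.fromArray(SPECIES, y, 0);
        FloatVector c = FloatVector.fromArray(SPECIES, z, 0);
        return a.fma(b, c).toArray();
    }

    // Scalar reference, one lane at a time.
    static float[] fmaScalar(float[] x, float[] y, float[] z) {
        float[] r = new float[SPECIES.length()];
        for (int i = 0; i < r.length; i++) {
            r[i] = Math.fma(x[i], y[i], z[i]);
        }
        return r;
    }

    public static void main(String[] args) {
        float[] x = {1.5f, 2f, -3f, 4f};
        float[] y = {2f, 0.5f, 3f, -1f};
        float[] z = {0.25f, 1f, 1f, 8f};
        System.out.println(java.util.Arrays.toString(fmaVector(x, y, z)));
    }
}
```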
A lane-wise conversion operation, such as w0 = v0.convert(VectorOperators.I2D, 0), takes one input vector, distributing a unary scalar conversion operator across the lanes, and produces a logical result of the converted values. The logical result (or at least a part of it) is presented in a vector of the same shape as the input vector.
Unlike other lane-wise operations, conversions can change lane type, from the input (domain) type to the output (range) type. The lane size may change along with the type. In order to manage the size changes, lane-wise conversion methods can produce partial results, under the control of a part parameter, which is explained elsewhere. (Following the example above, the second group of converted lane values could be obtained as w1 = v0.convert(VectorOperators.I2D, 1).)
The following pseudocode illustrates the behavior of this operation category in the specific example of a conversion from int to double, retaining either lower or upper lanes (depending on part) to maintain shape-invariance:
IntVector a = ...;
int VLENGTH = a.length();
int part = ...;  // 0 or 1
VectorShape VSHAPE = a.shape();
double[] arlogical = new double[VLENGTH];
for (int i = 0; i < VLENGTH; i++) {
    int e = a.lane(i);
    arlogical[i] = (double) e;
}
VectorSpecies<Double> rs = VSHAPE.withLanes(double.class);
int M = Double.SIZE / Integer.SIZE;  // expansion factor
int offset = part * (VLENGTH / M);
DoubleVector r = DoubleVector.fromArray(rs, arlogical, offset);
assert r.length() == VLENGTH / M;
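With the real API, the two parts of an int-to-double expansion can be retrieved as sketched below (hypothetical class name; requires --add-modules jdk.incubator.vector). With IntVector.SPECIES_256 there are eight int lanes, but a 256-bit double vector holds only four, so M is 2 and part selects the first or second half of the logical result.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.Vector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class ConvertParts {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_256; // 8 int lanes

    // Converts int lanes to double; part 0 or 1 picks which half of the logical result is kept.
    static double[] convertPart(int[] src, int part) {
        IntVector v0 = IntVector.fromArray(SPECIES, src, 0);
        // Same 256-bit shape with 64-bit lanes: only 4 of the 8 logical values fit.
        Vector<Double> w = v0.convert(VectorOperators.I2D, part);
        return w.toDoubleArray();
    }

    public static void main(String[] args) {
        int[] src = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(java.util.Arrays.toString(convertPart(src, 0)));
        System.out.println(java.util.Arrays.toString(convertPart(src, 1)));
    }
}
```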
A cross-lane reduction operation, such as e = v0.reduceLanes(VectorOperators.ADD), operates on all the lane elements of an input vector. An accumulation function is applied to all the lane elements to produce a scalar result. If the reduction operation is associative then the result may be accumulated by operating on the lane elements in any order using a specified associative scalar binary operation and identity value. Otherwise, the reduction operation specifies the order of accumulation. The following pseudocode illustrates the behavior of this operation category if it is associative:
ETYPE assoc_scalar_binary_op(ETYPE s, ETYPE t);
EVector a = ...;
ETYPE r = <identity value>;
for (int i = 0; i < a.length(); i++) {
    r = assoc_scalar_binary_op(r, a.lane(i));
}
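Because the operator is associative, an implementation is free to combine lanes in a different order than the sequential loop. This plain-Java model (hypothetical names, no Vector API calls) contrasts the sequential loop with a pairwise tree of the kind a SIMD implementation might use. The tree also reorders operands, so it assumes the operator is commutative as well, as addition, minimum, and maximum are.

```java
import java.util.function.IntBinaryOperator;

// Plain-Java model of an associative reduction; not part of the Vector API.
class ReductionModel {
    // Accumulates lanes first to last, as in the pseudocode above.
    static int reduceSequential(int[] lanes, IntBinaryOperator op, int identity) {
        int r = identity;
        for (int lane : lanes) {
            r = op.applyAsInt(r, lane);
        }
        return r;
    }

    // Combines lane i with lane i + half, halving the active count each round.
    // Valid only for operators that are both associative and commutative.
    static int reduceTree(int[] lanes, IntBinaryOperator op, int identity) {
        if (lanes.length == 0) {
            return identity;
        }
        int[] work = lanes.clone();
        int n = work.length;
        while (n > 1) {
            int half = (n + 1) / 2;
            for (int i = 0; i < n - half; i++) {
                work[i] = op.applyAsInt(work[i], work[i + half]);
            }
            n = half;
        }
        return op.applyAsInt(identity, work[0]);
    }

    public static void main(String[] args) {
        int[] lanes = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(reduceSequential(lanes, Integer::sum, 0) + " " + reduceTree(lanes, Integer::sum, 0));
    }
}
```

Integer addition wraps on overflow but remains associative, so both orders agree even when intermediate sums overflow.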
A cross-lane movement operation, such as w = v0.rearrange(shuffle), operates on all the lane elements of an input vector and moves them in a data-dependent manner into different lanes in an output vector. The movement is steered by an auxiliary datum, such as a VectorShuffle or a scalar index defining the origin of the movement. The following pseudocode illustrates the behavior of this operation category, in the case of a shuffle:
EVector a = ...;
VectorShuffle<E> s = ...;
ETYPE[] ar = new ETYPE[a.length()];
for (int i = 0; i < ar.length; i++) {
    int source = s.laneSource(i);
    ar[i] = a.lane(source);
}
EVector r = EVector.fromArray(a.species(), ar, 0);
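For instance, a shuffle that reverses the lanes can be built and applied as follows. This is a sketch with a hypothetical class name; it uses VectorShuffle.fromOp and rearrange, and requires --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorShuffle;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class ReverseLanes {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128; // 4 int lanes

    static int[] reverse(int[] src) {
        IntVector a = IntVector.fromArray(SPECIES, src, 0);
        int vl = SPECIES.length();
        // Lane i of the result takes its value from lane (vl - 1 - i) of a.
        VectorShuffle<Integer> s = VectorShuffle.fromOp(SPECIES, i -> vl - 1 - i);
        return a.rearrange(s).toArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(reverse(new int[]{1, 2, 3, 4})));
    }
}
```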
A masked operation is a variation on one of the above operations which takes an extra trailing VectorMask argument. In lanes where the mask is set, the operation behaves as if the mask argument were absent, but in lanes where the mask is unset, the underlying scalar operation is suppressed. Masked operations are explained in greater detail elsewhere.
A blend operation, such as blend(), takes two input vectors a and b, selecting lane values from one input or the other depending on a mask m. In lanes where m is set, the corresponding value from b is selected into the result; otherwise the value from a is selected. Thus, a blend acts as a vectorized version of Java's ternary selection expression m?b:a:
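EVector a = ...;
VectorSpecies<E> species = a.species();
EVector b = ...;
b.check(species);  // must have same species
VectorMask<E> m = ...;
ETYPE[] ar = new ETYPE[a.length()];
for (int i = 0; i < ar.length; i++) {
    boolean isSet = m.laneIsSet(i);
    ar[i] = isSet ? b.lane(i) : a.lane(i);
}
EVector r = EVector.fromArray(species, ar, 0);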
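A concrete blend on four int lanes might look like this sketch (hypothetical class name; requires --add-modules jdk.incubator.vector). The mask is built from a boolean array with VectorMask.fromArray.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class BlendExample {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128; // 4 int lanes

    // Lane i of the result is y[i] where bits[i] is true, and x[i] otherwise.
    static int[] blend(int[] x, int[] y, boolean[] bits) {
        IntVector a = IntVector.fromArray(SPECIES, x, 0);
        IntVector b = IntVector.fromArray(SPECIES, y, 0);
        VectorMask<Integer> m = VectorMask.fromArray(SPECIES, bits, 0);
        return a.blend(b, m).toArray();
    }

    public static void main(String[] args) {
        int[] r = blend(new int[]{1, 2, 3, 4}, new int[]{10, 20, 30, 40},
                        new boolean[]{true, false, true, false});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```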
A lane-wise binary test operation, such as m = v0.lt(v1), takes two input vectors, distributing a binary scalar comparison across the lanes, and produces, not a vector of booleans, but rather a vector mask. For each lane of the two input vectors a and b, the underlying scalar comparison operator is applied to the lane values. The resulting boolean is placed into the vector mask result in the same lane. The following pseudocode illustrates the behavior of this operation category:
boolean scalar_binary_test_op(ETYPE s, ETYPE t);
EVector a = ...;
VectorSpecies<E> species = a.species();
EVector b = ...;
b.check(species);  // must have same species
boolean[] mr = new boolean[a.length()];
for (int i = 0; i < mr.length; i++) {
    mr[i] = scalar_binary_test_op(a.lane(i), b.lane(i));
}
VectorMask<E> m = VectorMask.fromArray(species, mr, 0);
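Both comparison styles produce a VectorMask, which can be copied out with toArray() for inspection. The sketch below (hypothetical names; requires --add-modules jdk.incubator.vector) shows a binary lt and a unary test with VectorOperators.IS_FINITE.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class CompareAndTest {
    static final VectorSpecies<Integer> INTS = IntVector.SPECIES_128;   // 4 int lanes
    static final VectorSpecies<Float> FLOATS = FloatVector.SPECIES_128; // 4 float lanes

    // Binary test: mask lane i is set if and only if x[i] < y[i].
    static boolean[] lessThan(int[] x, int[] y) {
        IntVector a = IntVector.fromArray(INTS, x, 0);
        IntVector b = IntVector.fromArray(INTS, y, 0);
        VectorMask<Integer> m = a.lt(b);
        return m.toArray();
    }

    // Unary test: mask lane i is set if and only if x[i] is neither infinite nor NaN.
    static boolean[] finite(float[] x) {
        return FloatVector.fromArray(FLOATS, x, 0).test(VectorOperators.IS_FINITE).toArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(lessThan(new int[]{1, 5, 3, 7}, new int[]{2, 5, 1, 8})));
        System.out.println(java.util.Arrays.toString(finite(new float[]{1f, Float.NaN, Float.POSITIVE_INFINITY, -2f})));
    }
}
```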
A lane-wise unary test operation, such as m = v0.test(IS_FINITE), takes one input vector, distributing a scalar predicate (a test function) across the lanes, and produces a vector mask.
If a vector operation does not belong to one of the above categories then the method documentation explicitly specifies how it processes the lanes of input vectors, and where appropriate illustrates the behavior using pseudocode.
Most lane-wise binary and comparison operations offer convenience overloadings which accept a scalar as the second input, in place of a vector. In this case the scalar value is promoted to a vector by broadcasting it into the same lane structure as the first input. For example, to multiply all lanes of a double vector by a scalar value 1.1, the expression v.mul(1.1) is easier to work with than an equivalent expression with an explicit broadcast operation, such as v.mul(v.broadcast(1.1)) or v.mul(DoubleVector.broadcast(v.species(), 1.1)). Unless otherwise specified, the scalar variant always behaves as if each scalar value is first transformed to a vector of the same species as the first vector input, using the appropriate broadcast operation.
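The equivalence of the scalar and explicit-broadcast forms can be checked directly. The sketch below (hypothetical class name; requires --add-modules jdk.incubator.vector) computes both forms on a four-lane double species.

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class ScalarBroadcast {
    static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_256; // 4 double lanes

    // Scalar overloading: 1.1 is broadcast implicitly to v's species.
    static double[] scaled(double[] src) {
        DoubleVector v = DoubleVector.fromArray(SPECIES, src, 0);
        return v.mul(1.1).toArray();
    }

    // Equivalent form with an explicit broadcast.
    static double[] scaledExplicit(double[] src) {
        DoubleVector v = DoubleVector.fromArray(SPECIES, src, 0);
        return v.mul(DoubleVector.broadcast(v.species(), 1.1)).toArray();
    }

    public static void main(String[] args) {
        double[] src = {1, 2, 3, 4};
        System.out.println(java.util.Arrays.toString(scaled(src)));
    }
}
```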
Many vector operations accept an optional mask argument, selecting which lanes participate in the underlying scalar operator. If present, the mask argument appears at the end of the method argument list.
Each lane of the mask argument is a boolean which is either in the set or unset state. For lanes where the mask argument is unset, the underlying scalar operator is suppressed. In this way, masks allow vector operations to emulate scalar control flow operations, without losing SIMD parallelism, except where the mask lane is unset.
An operation suppressed by a mask will never cause an exception or side effect of any sort, even if the underlying scalar operator can potentially do so. For example, an unset lane that seems to access an out of bounds array element or divide an integral value by zero will simply be ignored. Values in suppressed lanes never participate or appear in the result of the overall operation.
Result lanes corresponding to a suppressed operation will be filled with a default value which depends on the specific operation, as follows:
If the masked operation is a memory load or a slice() from another vector, suppressed lanes are not loaded, and are filled with the default value for the ETYPE, which in every case consists of all zero bits. An unset lane can never cause an exception, even if the hypothetical corresponding memory location does not exist (because it is out of an array's index range).
If the masked operation takes lane indexes from a VectorShuffle or Vector operand, suppressed lanes are not computed, and are filled with the zero default value. Normally, invalid lane indexes elicit an IndexOutOfBoundsException, but if a lane is unset, the zero value is quietly substituted, regardless of the index. This rule is similar to the previous rule, for masked memory loads.
If the masked operation is a memory store or an unslice() into another vector, suppressed lanes are not stored, and the corresponding memory or vector locations (if any) are unchanged.
(Note: Memory effects such as race conditions never occur for suppressed lanes. That is, implementations will not secretly re-write the existing value for unset lanes. In the Java Memory Model, reassigning a memory variable to its current value is not a no-op; it may quietly undo a racing store from another thread.)
If the masked operation is a comparison, suppressed lanes of the resulting mask are unset, as if the suppressed comparison had returned false regardless of the suppressed input values. In effect, it is as if the comparison operation were performed unmasked, and then the result intersected with the controlling mask.
As an example, a masked binary operation on two input vectors a and b suppresses the binary operation for lanes where the mask is unset, and retains the original lane value from a. The following pseudocode illustrates this behavior:
ETYPE scalar_binary_op(ETYPE s, ETYPE t);
EVector a = ...;
VectorSpecies<E> species = a.species();
EVector b = ...;
b.check(species);  // must have same species
VectorMask<E> m = ...;
m.check(species);  // must have same species
ETYPE[] ar = new ETYPE[a.length()];
for (int i = 0; i < ar.length; i++) {
    if (m.laneIsSet(i)) {
        ar[i] = scalar_binary_op(a.lane(i), b.lane(i));
    } else {
        ar[i] = a.lane(i);  // from first input
    }
}
EVector r = EVector.fromArray(species, ar, 0);
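The masking rules can be observed with a few calls. In this sketch (hypothetical names; requires --add-modules jdk.incubator.vector), a masked add keeps the lanes of a where the mask is unset, a masked integer division skips a zero divisor in an unset lane instead of throwing, and a masked load built with indexInRange reads past-the-end lanes as zero without an exception.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class MaskedOps {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_128; // 4 int lanes

    // Masked binary operation: unset lanes keep the lane value from the first input a.
    static int[] maskedAdd(int[] x, int[] y, boolean[] bits) {
        IntVector a = IntVector.fromArray(SPECIES, x, 0);
        IntVector b = IntVector.fromArray(SPECIES, y, 0);
        VectorMask<Integer> m = VectorMask.fromArray(SPECIES, bits, 0);
        return a.add(b, m).toArray();
    }

    // A zero divisor in an unset lane is suppressed, so no ArithmeticException is thrown.
    static int[] maskedDiv(int[] x, int[] y, boolean[] bits) {
        IntVector a = IntVector.fromArray(SPECIES, x, 0);
        IntVector b = IntVector.fromArray(SPECIES, y, 0);
        VectorMask<Integer> m = VectorMask.fromArray(SPECIES, bits, 0);
        return a.div(b, m).toArray();
    }

    // Masked load: lanes at or past src.length are unset and read as zero.
    static int[] loadTail(int[] src, int offset) {
        VectorMask<Integer> m = SPECIES.indexInRange(offset, src.length);
        return IntVector.fromArray(SPECIES, src, offset, m).toArray();
    }

    public static void main(String[] args) {
        boolean[] bits = {true, false, true, false};
        System.out.println(java.util.Arrays.toString(maskedAdd(new int[]{1, 2, 3, 4}, new int[]{10, 20, 30, 40}, bits)));
        System.out.println(java.util.Arrays.toString(maskedDiv(new int[]{10, 20, 30, 40}, new int[]{2, 0, 5, 0}, bits)));
        System.out.println(java.util.Arrays.toString(loadTail(new int[]{1, 2, 3, 4, 5, 6}, 4)));
    }
}
```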
Each vector stores VLENGTH lane values. It is useful to consider vector lanes as ordered sequentially from first to last, with the first lane numbered 0, the next lane numbered 1, and so on to the last lane numbered VLENGTH-1. This is a temporal order, where lower-numbered lanes are considered earlier than higher-numbered (later) lanes. This API uses these terms in preference to spatial terms such as "left", "right", "high", and "low".
Temporal terminology works well for vectors because they (usually) represent small fixed-sized segments in a long sequence of workload elements, where the workload is conceptually traversed in time order from beginning to end. (This is a mental model: it does not exclude multicore divide-and-conquer techniques.) Thus, when a scalar loop is transformed into a vector loop, adjacent scalar items (one earlier, one later) in the workload end up as adjacent lanes in a single vector (again, one earlier, one later). At a vector boundary, the last lane item in the earlier vector is adjacent to (and just before) the first lane item in the immediately following vector.
Vectors are also sometimes thought of in spatial terms, where the first lane is placed at an edge of some virtual paper, and subsequent lanes are presented in order next to it. When using spatial terms, all directions are equally plausible: Some vector notations present lanes from left to right, and others from right to left; still others present from top to bottom or vice versa. Using the language of time (before, after, first, last) instead of space (left, right, high, low) is often more likely to avoid misunderstandings.
A second reason to prefer temporal to spatial language about vector lanes is that the terms "left", "right", "high" and "low" are widely used to describe the relations between bits in scalar values. The leftmost or highest bit in a given type is likely to be a sign bit, while the rightmost or lowest bit is likely to be the arithmetically least significant, and so on. Applying these terms to vector lanes risks confusion, however, because it is relatively rare to find algorithms where, given two adjacent vector lanes, one lane is somehow more arithmetically significant than its neighbor, and even in those cases, there is no general way to know which neighbor is the more significant.
Putting the terms together, we view the information structure of a vector as a temporal sequence of lanes ("first", "next", "earlier", "later", "last", etc.) of bit-strings which are internally ordered spatially (either "low" to "high" or "right" to "left"). The primitive values in the lanes are decoded from these bit-strings, in the usual way. Most vector operations, like most Java scalar operators, treat primitive values as atomic values, but some operations reveal the internal bit-string structure.
When a vector is loaded from or stored into memory, the order of vector lanes is always consistent with the inherent ordering of the memory container. This is true whether or not individual lane elements are subject to "byte swapping" due to details of byte order. Thus, while the scalar lane elements of a vector might be "byte swapped", the lanes themselves are never reordered, except by an explicit method call that performs cross-lane reordering.
When vector lane values are stored to Java variables of the same type, byte swapping is performed if and only if the implementation of the vector hardware requires such swapping. It is therefore unconditional and invisible.
As a useful fiction, this API presents a consistent illusion that vector lane bytes are composed into larger lane scalars in little endian order. This means that storing a vector into a Java byte array will reveal the successive bytes of the vector lane values in little-endian order on all platforms, regardless of native memory order, and also regardless of byte order (if any) within vector unit registers.
This hypothetical little-endian ordering also appears when a reinterpretation cast is applied in such a way that lane boundaries are discarded and redrawn differently, while maintaining vector bits unchanged. In such an operation, two adjacent lanes will contribute bytes to a single new lane (or vice versa), and the sequential order of the two lanes will determine the arithmetic order of the bytes in the single lane. In this case, the little-endian convention provides portable results, so that on all platforms earlier lanes tend to contribute lower (rightward) bits, and later lanes tend to contribute higher (leftward) bits. The reinterpretation casts between ByteVectors and the other non-byte vectors use this convention to clarify their portable semantics.
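The convention can be modeled without the Vector API by using a little-endian ByteBuffer, as in this plain-Java sketch (hypothetical names). Under the convention described above, the byte layout it produces is what storing int lanes into a byte array would reveal, and the short-to-int regrouping mirrors a reinterpretation that redraws lane boundaries, with the earlier lane supplying the lower bits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Plain-Java model of the little-endian lane convention; does not call the Vector API.
class LittleEndianModel {
    // Lays out int lanes first to last, each lane's bytes least significant first.
    static byte[] intLanesToBytes(int[] lanes) {
        ByteBuffer buf = ByteBuffer.allocate(lanes.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int lane : lanes) {
            buf.putInt(lane);
        }
        return buf.array();
    }

    // Redraws lane boundaries: each pair of adjacent short lanes becomes one int lane,
    // with the earlier short supplying the lower 16 bits.
    static int[] shortLanesToIntLanes(short[] lanes) {
        ByteBuffer buf = ByteBuffer.allocate(lanes.length * Short.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (short lane : lanes) {
            buf.putShort(lane);
        }
        buf.flip();
        int[] out = new int[lanes.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(intLanesToBytes(new int[]{0x04030201})));
        System.out.printf("%08x%n", shortLanesToIntLanes(new short[]{0x1111, 0x2222})[0]);
    }
}
```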
The little-endian fiction for relating lane order to per-lane byte order is slightly preferable to an equivalent big-endian fiction, because some related formulas are much simpler, specifically those which renumber bytes after lane structure changes. The earliest byte is invariantly earliest across all lane structure changes, but only if little-endian conventions are used. The root cause of this is that bytes in scalars are numbered from the least significant (rightmost) to the most significant (leftmost), and almost never vice versa. If we habitually numbered sign bits as zero (as on some computers) then this API would reach for big-endian fictions to create unified addressing of vector bytes.
Vectors can be loaded from, and stored into, Java arrays and memory segments of type java.lang.foreign.MemorySegment.
Byte order for lane storage is chosen such that the stored vector values can be read or written as single primitive values, within the array or segment that holds the vector, producing the same values as the lane-wise values within the vector. This fact is independent of the convenient fiction that lane values inside of vectors are stored in little-endian order.
For example, FloatVector. creates and returns a float vector of some particular species fsp, with elements loaded from some float array fa. The first lane is loaded from fa[i] and the last lane is loaded from fa[i+VL-1], where VL is the length of the vector as derived from the species fsp. Then, fv=fv. will produce another float vector of that species fsp, given a vector fv2 of the same species fsp. Next, mnz=fv. tests whether the result is zero, yielding a mask mnz. The non-zero lanes (and only those lanes) can then be stored back into the original array elements using the statement fv.
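Putting the steps together, the following sketch processes one vector-sized block of the arrays. It is only one plausible rendering: the class name, the second array fb, and the use of compare with VectorOperators.NE are assumptions, and it requires --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; requires --add-modules jdk.incubator.vector.
class AddAndStoreNonZero {
    static final VectorSpecies<Float> FSP = FloatVector.SPECIES_128; // VL = 4 float lanes

    // Loads fa[i..i+VL-1] and fb[i..i+VL-1], adds them lane-wise, and stores only the
    // non-zero sums back into fa; lanes whose sum is zero leave fa unchanged.
    static void addStoreNonZero(float[] fa, float[] fb, int i) {
        FloatVector fv = FloatVector.fromArray(FSP, fa, i);
        FloatVector fv2 = FloatVector.fromArray(FSP, fb, i);
        fv = fv.add(fv2);
        VectorMask<Float> mnz = fv.compare(VectorOperators.NE, 0.0f);
        fv.intoArray(fa, i, mnz);
    }

    public static void main(String[] args) {
        float[] fa = {1, 2, 3, 4};
        float[] fb = {-1, 1, -3, 1};
        addStoreNonZero(fa, fb, 0);
        System.out.println(java.util.Arrays.toString(fa));
    }
}
```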
As a basic principle, lane-wise operations are length-invariant, unless clearly marked otherwise. Length-invariance simply means that if VLENGTH lanes go into an operation, the same number of lanes come out, with nothing discarded and no extra padding.
As a second principle, sometimes in tension with the first, lane-wise operations are also shape-invariant, unless clearly marked otherwise. Shape-invariance means that VSHAPE is constant for typical computations. Keeping the same shape throughout a computation helps ensure that scarce vector resources are efficiently used. (On some hardware platforms shape changes could cause unwanted effects like extra data movement instructions, round trips through memory, or pipeline bubbles.)
Tension between these principles arises when an operation produces a logical result that is too large for the required output VSHAPE. In other cases, when a logical result is smaller than the capacity of the output VSHAPE, the positioning of the logical result is open to question, since the physical output vector must contain a mix of logical result and padding.
In the first case, of a too-large logical result being crammed into a too-small output VSHAPE, we say that data has expanded. In other words, an expansion operation has caused the output shape to overflow. Symmetrically, in the second case of a small logical result fitting into a roomy output VSHAPE, the data has contracted, and the contraction operation has required the output shape to pad itself with extra zero lanes.
In both cases we can speak of a parameter M which measures the expansion ratio or contraction ratio between the logical result size (in bits) and the bit-size of the actual output shape. When vector shapes are changed, and lane sizes are not, M is just the integral ratio between the output shape and the logical result, taking the larger size over the smaller. (With the possible exception of the maximum shape, all vector sizes are powers of two, and so the ratio M is always an integer. In the hypothetical case of a non-integral ratio, the value M would be rounded up to the next integer, and then the same general considerations would apply.)
If the logical result is larger than the physical output shape, such a shape change must inevitably drop result lanes (all but 1/M of the logical result). If the logical size is smaller than the output, the shape change must introduce zero-filled lanes of padding (all but 1/M of the physical output). The first case, with dropped lanes, is an expansion, while the second, with padding lanes added, is a contraction.
Similarly, consider a lane-wise conversion operation which leaves the shape invariant but changes the lane size by a ratio of M. If the logical result is larger than the output (or input), this conversion must reduce the VLENGTH of the output by a factor of M, dropping all but 1/M of the logical result lanes. As before, the dropping of lanes is the hallmark of an expansion. A lane-wise operation which contracts lane size by a ratio of M must increase the VLENGTH by the same factor M, filling the extra lanes with a zero padding value; because padding must be added, this is a contraction.
It is also possible (though somewhat confusing) to change both lane size and container size in one operation which performs both lane conversion and reshaping. If this is done, the same rules apply, but the logical result size is the product of the input size times any expansion or contraction ratio from the lane size change.
For completeness, we can also speak of in-place operations for the frequent case when resizing does not occur. With an in-place operation, the data is simply copied from logical output to its physical container with no truncation or padding. The ratio parameter M in this case is unity.
Note that the classification of contraction vs. expansion depends on the relative sizes of the logical result and the physical output container. The size of the input container may be larger or smaller than either of the other two values, without changing the classification. For example, a conversion from a 128-bit shape to a 256-bit shape will be a contraction in many cases, but it would be an expansion if it were combined with a conversion from byte to long, since in that case the logical result would be 1024 bits in size. This example also illustrates that a logical result does not need to correspond to any particular platform-supported vector shape.
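The classification reduces to a comparison of two bit counts. The following plain-Java sketch (hypothetical helper names, no Vector API calls) computes the logical result size as the input VLENGTH times the output lane size, classifies the operation, and computes M as the larger size over the smaller, rounded up.

```java
// Hypothetical helper for classifying a resizing operation; not part of the Vector API.
class ResizeClassifier {
    enum Kind { IN_PLACE, EXPANSION, CONTRACTION }

    // Logical result size in bits: the input VLENGTH times the output lane size.
    static int logicalBits(int inputVLength, int outputLaneBits) {
        return inputVLength * outputLaneBits;
    }

    // Compares the logical result with the physical output container.
    static Kind kind(int logicalBits, int outputShapeBits) {
        if (logicalBits > outputShapeBits) return Kind.EXPANSION;   // lanes must be dropped
        if (logicalBits < outputShapeBits) return Kind.CONTRACTION; // padding must be added
        return Kind.IN_PLACE;
    }

    // Ratio M of the larger size to the smaller, rounded up; 1 for in-place operations.
    static int ratio(int logicalBits, int outputShapeBits) {
        int big = Math.max(logicalBits, outputShapeBits);
        int small = Math.min(logicalBits, outputShapeBits);
        return (big + small - 1) / small;
    }

    public static void main(String[] args) {
        // 128-bit byte input (16 lanes) converted to long lanes in a 256-bit output.
        int bytesToLongs = logicalBits(16, 64);
        System.out.println(kind(bytesToLongs, 256) + " M=" + ratio(bytesToLongs, 256));
        // 128-bit int input (4 lanes) reshaped, with no lane change, into 256 bits.
        int intsReshaped = logicalBits(4, 32);
        System.out.println(kind(intsReshaped, 256) + " M=" + ratio(intsReshaped, 256));
    }
}
```

The first case reproduces the 1024-bit byte-to-long example from the text as an expansion with M of 4; the second is a contraction with M of 2.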
Although lane-wise masked operations can be viewed as producing partial operations, they are not classified (in this API) as expansions or contractions. A masked load from an array surely produces a partial vector, but there is no meaningful "logical output vector" that this partial result was contracted from.
Some care is required with these terms, because it is the data, not the container size, that is expanding or contracting, relative to the size of its output container. Thus, resizing a 128-bit input into a 512-bit vector has the effect of a contraction. Though the 128 bits of payload have not changed in size, we can say they "look smaller" in their new 512-bit home, and this will capture the practical details of the situation.
If a vector method might expand its data, it accepts an extra int parameter called part, or the "part number". The part number must be in the range [0..M-1], where M is the expansion ratio. The part number selects one of M contiguous disjoint equally-sized blocks of lanes from the logical result and fills the physical output vector with this block of lanes. Specifically, the lanes selected from the logical result of an expansion are numbered in the range [R..R+L-1], where L is the VLENGTH of the physical output vector, and the origin of the block, R, is part*L.
A similar convention applies to any vector method that might contract its data. Such a method also accepts an extra part number parameter (again called part) which steers the contracted data lanes into one of M contiguous disjoint equally-sized blocks of lanes in the physical output vector. The remaining lanes are filled with zero, or as specified by the method.
Specifically, the data is steered into the lanes numbered in the range [R..R+L-1], where L is the VLENGTH of the logical result vector, and the origin of the block, R, is again a multiple of L selected by the part number, specifically |part|*L.
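The contraction rule can be modelled in the same hypothetical, array-based way. `ContractionModel` and `steer` are illustrative names, not API. The model assumes that the remaining lanes are zero-filled, which is the default named above; some methods may specify otherwise. The whole logical result (L lanes) is copied into the output at R = |part|*L.

```java
// Hypothetical model of how a contracting method places its data (not Vector API).
final class ContractionModel {
    /** Places the L-lane logical result at lane |part|*L of a zero-filled output. */
    static int[] steer(int[] logical, int outLanes, int part) {
        int l = logical.length;                         // L: VLENGTH of the logical result
        int m = outLanes / l;                           // contraction ratio M
        if (part > 0 || part <= -m) {
            throw new IndexOutOfBoundsException("part " + part + " outside [" + (1 - m) + "..0]");
        }
        int[] out = new int[outLanes];                  // unfilled lanes stay zero
        System.arraycopy(logical, 0, out, Math.abs(part) * l, l);
        return out;
    }
}
```

Steering a 4-lane result into a 16-lane output (M = 4) with part -1 fills lanes 4 through 7 and leaves the others zero.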
In the case of a contraction, the part number must be in the non-positive range [-M+1..0]. This convention is adopted because some methods can perform both expansions and contractions, in a data-dependent manner, and the extra sign on the part number serves as an error check. If a vector method takes a part number and is invoked to perform an in-place operation (neither contracting nor expanding), the part parameter must be exactly zero.
Part numbers outside the allowed ranges will elicit an indexing
exception. Note that in all cases a zero part number is valid, and
corresponds to an operation which preserves as many lanes as
possible from the beginning of the logical result, and places them
into the beginning of the physical output container. This is
often a desirable default, so a part number of zero is safe
in all cases and useful in most cases.
The various resizing operations of this API contract or expand their data as follows:
Vector.convert() will expand (respectively, contract) its operand by ratio M if the element size of its output is larger (respectively, smaller) by a factor of M. If the element sizes of input and output are the same, then convert() is an in-place operation.
Vector.convertShape() will expand (respectively, contract) its operand by ratio M if the bit-size of its logical result is larger (respectively, smaller) than the bit-size of its output shape by a factor of M. The size of the logical result is defined as the element size of the output, times the VLENGTH of its input. Depending on the ratio of the changed lane sizes, the logical size may be (in various cases) either larger or smaller than the input vector, independently of whether the operation is an expansion or contraction.
Vector.castShape() is a convenience method for convertShape(); its classification as an expansion or contraction is the same as for convertShape().
Vector.reinterpretShape() is an expansion (respectively, contraction) by ratio M if the vector bit-size of its input is crammed into a smaller (respectively, dropped into a larger) output container by a factor of M. Otherwise it is an in-place operation. Since this method is a reinterpretation cast that can erase and redraw lane boundaries as well as modify shape, the input vector's lane size and lane count are irrelevant to its classification as expanding or contracting.
The unslice() methods expand by a ratio of M=2, because the single input slice is positioned and inserted somewhere within two consecutive background vectors. The part number selects the first or second background vector, as updated by the inserted slice. Note that the corresponding slice() methods, although inverse to the unslice() methods, do not contract their data and thus require no part number. This is because slice() delivers a slice of exactly VLENGTH lanes extracted from two input vectors.
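The resizing rules in this list can be exercised against the real API with the following sketch. It requires a JDK that ships `jdk.incubator.vector` (with `compress`/`expand` and `MemorySegment` support, as in the method table), compiled and run with `--add-modules jdk.incubator.vector`. `ResizeDemo` and its helper methods are hypothetical names. Each helper's comment states the expected classification and part range, derived from the rules above.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;

// Sketch only: run with --add-modules jdk.incubator.vector.
final class ResizeDemo {
    // convert(): byte -> int keeps the 128-bit shape, so 16 byte lanes
    // expand by M = 4 into 4 int lanes; part p yields logical lanes 4p..4p+3.
    static int[] convertPart(byte[] src, int part) {
        ByteVector bytes = ByteVector.fromArray(ByteVector.SPECIES_128, src, 0);
        return bytes.convert(VectorOperators.B2I, part).toIntArray();
    }

    // convertShape(): logical result is 16 lanes x 32 bits = 512 bits, and a
    // 256-bit int output holds half of it, so this expands by M = 2.
    static int[] convertShapePart(byte[] src, int part) {
        ByteVector bytes = ByteVector.fromArray(ByteVector.SPECIES_128, src, 0);
        return bytes.convertShape(VectorOperators.B2I, IntVector.SPECIES_256, part).toIntArray();
    }

    // castShape() into a 512-bit int vector: the 512-bit logical result fits
    // exactly, so this is in-place and part must be zero.
    static int[] castShapeInPlace(byte[] src) {
        ByteVector bytes = ByteVector.fromArray(ByteVector.SPECIES_128, src, 0);
        return bytes.castShape(IntVector.SPECIES_512, 0).toIntArray();
    }

    // convertShape() int -> byte into 128 bits: logical result is 4 lanes x 8 bits
    // = 32 bits, so this contracts by M = 4 and part is in [-3..0].
    static int[] narrowPart(int[] src, int part) {
        IntVector ints = IntVector.fromArray(IntVector.SPECIES_128, src, 0);
        return ints.convertShape(VectorOperators.I2B, ByteVector.SPECIES_128, part).toIntArray();
    }

    // reinterpretShape(): 128 bits dropped into a 256-bit container contract
    // by M = 2; part -1 places the payload in the upper half.
    static int[] reinterpretPart(int[] src, int part) {
        IntVector ints = IntVector.fromArray(IntVector.SPECIES_128, src, 0);
        return ints.reinterpretShape(IntVector.SPECIES_256, part).toIntArray();
    }

    // unslice(): expands by M = 2; part selects which of the two updated
    // background vectors is returned.
    static int[] unslicePart(int[] src, int[] background, int origin, int part) {
        IntVector v = IntVector.fromArray(IntVector.SPECIES_128, src, 0);
        IntVector w = IntVector.fromArray(IntVector.SPECIES_128, background, 0);
        return v.unslice(origin, w, part).toIntArray();
    }
}
```

The helpers take their parts as arguments, so a caller can iterate over every valid part and reassemble the full logical result.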
The partLimit() method on VectorSpecies can be used, before any expanding or contracting operation is performed, to query the limiting value on a part parameter for a proposed expansion or contraction. The value returned from partLimit() is positive for expansions, negative for contractions, and zero for in-place operations. Its absolute value is the parameter M, and so it serves as an exclusive limit on valid part number arguments for the relevant methods. Thus, for expansions, the partLimit() value M is the exclusive upper limit for part numbers, while for contractions the partLimit() value -M is the exclusive lower limit.
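Here is a sketch of querying `partLimit()` and then using the limit to drive a loop over every valid part. It requires `--add-modules jdk.incubator.vector`. `PartLimitDemo` and its methods are hypothetical names. The second argument of `partLimit` is `true` for a lane-wise conversion, where the logical result keeps the input lane count, and `false` for a reinterpretation, where it keeps the input bit-size.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorOperators;

// Sketch only: run with --add-modules jdk.incubator.vector.
final class PartLimitDemo {
    // byte -> long into 256 bits: 16 x 64 = 1024-bit logical result, expansion by 4.
    static int byteToLong256() {
        return ByteVector.SPECIES_128.partLimit(LongVector.SPECIES_256, true);
    }

    // byte -> byte into 256 bits: 128-bit result in a 256-bit container, contraction by 2.
    static int byteToByte256() {
        return ByteVector.SPECIES_128.partLimit(ByteVector.SPECIES_256, true);
    }

    // int -> float with the same 128-bit shape: in-place.
    static int intToFloat128() {
        return IntVector.SPECIES_128.partLimit(FloatVector.SPECIES_128, true);
    }

    // Reinterpreting 256 bits into a 128-bit container: expansion by 2.
    static int reinterpret256To128() {
        return IntVector.SPECIES_256.partLimit(IntVector.SPECIES_128, false);
    }

    // Visits every valid part of a B2L expansion and reassembles all 16 long lanes.
    static long[] widenAll(byte[] src) {
        ByteVector bytes = ByteVector.fromArray(ByteVector.SPECIES_128, src, 0);
        int limit = ByteVector.SPECIES_128.partLimit(LongVector.SPECIES_256, true);
        int lanes = LongVector.SPECIES_256.length();
        long[] out = new long[bytes.length()];
        for (int part = 0; part < limit; part++) {
            long[] block = bytes
                .convertShape(VectorOperators.B2L, LongVector.SPECIES_256, part)
                .toLongArray();
            System.arraycopy(block, 0, out, part * lanes, lanes);
        }
        return out;
    }
}
```

For a contraction, the equivalent loop would count downward, `for (int part = 0; part > limit; part--)`, since the limit is negative.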
The slice() family of methods, which extract a contiguous slice of VLENGTH fields from a given origin point within a concatenated pair of vectors.
The unslice() family of methods, which insert a contiguous slice of VLENGTH fields into a concatenated pair of vectors at a given origin point.
The rearrange() family of methods, which select an arbitrary set of VLENGTH lanes from one or two input vectors, and assemble them in an arbitrary order. The selection and order of lanes is controlled by a VectorShuffle object, which acts as a routing table mapping source lanes to destination lanes. A VectorShuffle can encode a mathematical permutation as well as many other patterns of data movement.
The compress(VectorMask) and expand(VectorMask) methods, which select up to VLENGTH lanes from an input vector, and assemble them in lane order. The selection of lanes is controlled by a VectorMask, with set lane elements mapping, by compression or expansion in lane order, source lanes to destination lanes.
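The cross-lane families listed above can be tried on a 4-lane int species with the sketch below. It requires `--add-modules jdk.incubator.vector`. `CrossLaneDemo` and its helpers are hypothetical names. The species is held in a `static final` field, which fits the storage advice given later in this section.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorShuffle;
import jdk.incubator.vector.VectorSpecies;

// Sketch only: run with --add-modules jdk.incubator.vector.
final class CrossLaneDemo {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_128; // 4 int lanes

    // slice(): VLENGTH lanes starting at origin within the pair (a, b).
    static int[] slice(int[] a, int[] b, int origin) {
        return IntVector.fromArray(S, a, 0).slice(origin, IntVector.fromArray(S, b, 0)).toArray();
    }

    // rearrange(): output lane i takes source lane rev[i], here a reversal.
    static int[] reverse(int[] a) {
        VectorShuffle<Integer> rev = VectorShuffle.fromValues(S, 3, 2, 1, 0);
        return IntVector.fromArray(S, a, 0).rearrange(rev).toArray();
    }

    // compress(): set lanes are packed from lane 0; remaining lanes become zero.
    static int[] compress(int[] a, boolean... bits) {
        VectorMask<Integer> m = VectorMask.fromValues(S, bits);
        return IntVector.fromArray(S, a, 0).compress(m).toArray();
    }

    // expand(): successive input lanes are placed into the set lanes; others become zero.
    static int[] expand(int[] a, boolean... bits) {
        VectorMask<Integer> m = VectorMask.fromValues(S, bits);
        return IntVector.fromArray(S, a, 0).expand(m).toArray();
    }
}
```

With input lanes `[10, 20, 30, 40]`, compressing under mask `[T, F, T, F]` yields `[10, 30, 0, 0]`. Expanding under `[F, T, F, T]` yields `[0, 10, 0, 20]`.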
Some vector operations are not lane-wise, but rather move data across lane boundaries. Such operations are typically rare in SIMD code, though they are sometimes necessary for specific algorithms that manipulate data formats at a low level, and/or require SIMD data to move in complex local patterns. (Local movement in a small window of a large array of data is relatively unusual, although some highly patterned algorithms call for it.) In this API such methods are always clearly recognizable, so that simpler lane-wise reasoning can be confidently applied to the rest of the code.
In some cases, vector lane boundaries are discarded and "redrawn from scratch", so that data in a given input lane might appear (in several parts) distributed through several output lanes, or (conversely) data from several input lanes might be consolidated into a single output lane. The fundamental method which can redraw lane boundaries is reinterpretShape(). Built on top of this method, certain convenience methods such as reinterpretAsBytes() or reinterpretAsInts() will (potentially) redraw lane boundaries, while retaining the same overall vector shape.
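Here is a sketch of redrawing lane boundaries while keeping the shape. It requires `--add-modules jdk.incubator.vector`, and `ReinterpretDemo` is a hypothetical name. The sketch checks only lane count, shape, and the round trip. It deliberately does not assert which byte lane holds which part of an int, so it does not rely on any particular byte ordering.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Sketch only: run with --add-modules jdk.incubator.vector.
final class ReinterpretDemo {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_128;

    // 4 int lanes are redrawn as 16 byte lanes.
    static int byteLaneCount(int[] src) {
        return IntVector.fromArray(S, src, 0).reinterpretAsBytes().length();
    }

    // The overall 128-bit shape is retained (VectorShape is an enum).
    static boolean shapeRetained(int[] src) {
        IntVector ints = IntVector.fromArray(S, src, 0);
        return ints.reinterpretAsBytes().shape() == ints.shape();
    }

    // Redrawing the boundaries back to int lanes recovers the original values.
    static int[] roundTrip(int[] src) {
        ByteVector bytes = IntVector.fromArray(S, src, 0).reinterpretAsBytes();
        return bytes.reinterpretAsInts().toArray();
    }
}
```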
Operations which produce or consume a scalar result can be viewed as very simple cross-lane operations. Methods in the reduceLanes() family fold together all lanes (or mask-selected lanes) of a vector and return a single result. As an inverse, the broadcast family of methods can be thought of as crossing lanes in the other direction, from a scalar to all lanes of the output vector. Single-lane access methods such as lane(I) or withLane(I,E) might also be regarded as very simple cross-lane operations.
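The scalar-producing and scalar-consuming operations just described can be shown on `IntVector`, whose element-specific overloads return and accept `int` directly. This sketch requires `--add-modules jdk.incubator.vector`, and `ScalarLaneDemo` is a hypothetical name. It also shows that `withLane` returns a new vector and leaves the original unchanged.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Sketch only: run with --add-modules jdk.incubator.vector.
final class ScalarLaneDemo {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_128;

    // Folds all lanes into one int.
    static int sum(int[] a) {
        return IntVector.fromArray(S, a, 0).reduceLanes(VectorOperators.ADD);
    }

    // Folds only the mask-selected lanes.
    static int maskedSum(int[] a, boolean... bits) {
        return IntVector.fromArray(S, a, 0).reduceLanes(VectorOperators.ADD, VectorMask.fromValues(S, bits));
    }

    // Crosses from one scalar to every lane.
    static int[] broadcast(int e) {
        return IntVector.broadcast(S, e).toArray();
    }

    // Reads a single lane.
    static int laneOf(int[] a, int i) {
        return IntVector.fromArray(S, a, 0).lane(i);
    }

    // Returns {original lanes, updated lanes}; the original vector is not mutated.
    static int[][] withLane(int[] a, int i, int e) {
        IntVector v = IntVector.fromArray(S, a, 0);
        IntVector w = v.withLane(i, e);
        return new int[][] { v.toArray(), w.toArray() };
    }
}
```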
Likewise, a method which moves a non-byte vector to or from a byte array could be viewed as a cross-lane operation, because the vector lanes must be distributed into separate bytes, or (in the other direction) consolidated from array bytes.
This API will also work correctly even on Java platforms which do not include specialized hardware support for SIMD computations. The Vector API is not likely to provide any special performance benefit on such platforms.
Currently the implementation is optimized to work best on:
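Masked lane-wise operations are currently supported by composing the unmasked lane-wise operation with blend, as in the expression a.blend(a.lanewise(op, b), m), where a and b are vectors, op is the vector operation, and m is the mask.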
The implementation does not currently support optimal vectorized instructions for floating point transcendental functions (such as operators SIN and LOG).
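The blend composition described above can be checked directly. In the sketch below, a masked `lanewise(ADD, b, m)` and the explicit `a.blend(a.lanewise(ADD, b), m)` form produce the same lanes: masked-off lanes keep the value from `a`. It requires `--add-modules jdk.incubator.vector`, and `MaskedOpDemo` is a hypothetical name. It shows only equal results, not performance.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Sketch only: run with --add-modules jdk.incubator.vector.
final class MaskedOpDemo {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_128;

    // Masked lane-wise add: unset lanes keep the value from a.
    static int[] maskedAdd(int[] a, int[] b, boolean... bits) {
        IntVector va = IntVector.fromArray(S, a, 0);
        IntVector vb = IntVector.fromArray(S, b, 0);
        return va.lanewise(VectorOperators.ADD, vb, VectorMask.fromValues(S, bits)).toArray();
    }

    // The same result spelled as an unmasked add followed by blend.
    static int[] blendAdd(int[] a, int[] b, boolean... bits) {
        IntVector va = IntVector.fromArray(S, a, 0);
        IntVector vb = IntVector.fromArray(S, b, 0);
        return va.blend(va.lanewise(VectorOperators.ADD, vb), VectorMask.fromValues(S, bits)).toArray();
    }
}
```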
Although Vector<Integer> may seem to work with boxed Integer values, the overheads associated with boxing are avoided by having each vector subtype work internally on lane values of the actual ETYPE, such as int.
Vector, along with all of its subtypes and many of its helper types like VectorMask and VectorShuffle, is a value-based class.
Once created, a vector is never mutated, not even if only a single lane is changed. A new vector is always created to hold a new configuration of lane values. The unavailability of mutative methods is a necessary consequence of suppressing the object identity of all vectors, as value-based classes.
With Vector, identity-sensitive operations such as == may yield unpredictable results, or reduced performance. Oddly enough, v.equals(w) is likely to be faster than v==w, since equals is not an identity-sensitive method.
Also, these objects can be stored in locals and parameters and as static final constants, but storing them in other Java fields or in array elements, while semantically valid, may incur performance penalties.
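The sketch below compares vectors by contents rather than identity. `equals` compares species and lane values. `eq(...).allTrue()` reaches the same answer through a lane-wise mask. The species is kept in a `static final` constant. It requires `--add-modules jdk.incubator.vector`, and `EqualityDemo` is a hypothetical name. The sketch intentionally makes no claim about what `==` returns, since that is unspecified for value-based classes.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Sketch only: run with --add-modules jdk.incubator.vector.
final class EqualityDemo {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_128;

    // Value comparison: same species and same lane values, in the same order.
    static boolean sameContents(int[] a, int[] b) {
        return IntVector.fromArray(S, a, 0).equals(IntVector.fromArray(S, b, 0));
    }

    // Lane-wise comparison reduced to a single boolean.
    static boolean allLanesEqual(int[] a, int[] b) {
        return IntVector.fromArray(S, a, 0).eq(IntVector.fromArray(S, b, 0)).allTrue();
    }
}
```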
Implementation Note
Modifier and Type | Method and Description |
---|---|
public abstract Vector<E> | abs(): Returns the absolute value of this vector. |
public abstract Vector<E> | add(Vector<E> v): Adds this vector to a second input vector. |
public abstract Vector<E> | add(Vector<E> v, VectorMask<E> m): Adds this vector to a second input vector, selecting lanes under the control of a mask. |
public abstract Vector<E> | addIndex(int scale): Adds the lanes of this vector to their corresponding lane numbers, scaled by a given constant (typically 1). |
public abstract int | bitSize(): Returns the total size, in bits, of this vector. |
public abstract Vector<E> | blend(Vector<E> v, VectorMask<E> m): Replaces selected lanes of this vector with corresponding lanes from a second input vector under the control of a mask. |
public abstract Vector<E> | blend(long e, VectorMask<E> m): Replaces selected lanes of this vector with a scalar value under the control of a mask. |
public abstract Vector<E> | broadcast(long e): Returns a vector of the same species as this one where all lane elements are set to the primitive value e. |
public abstract int | byteSize(): Returns the total size, in bytes, of this vector. |
public abstract <F> Vector<F> | castShape(VectorSpecies<F> rsp, int part): Convenience method for converting a vector from one lane type to another, reshaping as needed when lane sizes change. |
public abstract <F> Vector<F> | check(Class<F> elementType): Checks that this vector has the given element type, and returns this vector unchanged. |
public abstract <F> Vector<F> | check(VectorSpecies<F> species): Checks that this vector has the given species, and returns this vector unchanged. |
public abstract VectorMask<E> | compare(VectorOperators.Comparison op, Vector<E> v): Tests this vector by comparing it with another input vector, according to the given comparison operation. |
public abstract VectorMask<E> | compare(VectorOperators.Comparison op, Vector<E> v, VectorMask<E> m): Tests this vector by comparing it with another input vector, according to the given comparison operation, in lanes selected by a mask. |
public abstract VectorMask<E> | compare(VectorOperators.Comparison op, long e): Tests this vector by comparing it with an input scalar, according to the given comparison operation. |
public abstract VectorMask<E> | compare(VectorOperators.Comparison op, long e, VectorMask<E> m): Tests this vector by comparing it with an input scalar, according to the given comparison operation, in lanes selected by a mask. |
public abstract Vector<E> | compress(VectorMask<E> m): Compresses the lane elements of this vector, selecting lanes under the control of a specific mask. |
public abstract <F> Vector<F> | convert(VectorOperators.Conversion<E,F> conv, int part): Converts this vector to a vector of the same shape and a new element type, converting lane values from the current ETYPE to a new lane type according to the indicated conversion. |
public abstract <F> Vector<F> | convertShape(VectorOperators.Conversion<E,F> conv, VectorSpecies<F> rsp, int part): Converts this vector to a vector of the given species, shape and element type, converting lane values from the current ETYPE to a new lane type according to the indicated conversion. |
public abstract Vector<E> | div(Vector<E> v): Divides this vector by a second input vector. |
public abstract Vector<E> | div(Vector<E> v, VectorMask<E> m): Divides this vector by a second input vector under the control of a mask. |
public abstract int | elementSize(): Returns the size of each lane, in bits, of this vector. |
public abstract Class<E> | elementType(): Returns the primitive element type (ETYPE) of this vector. |
public abstract VectorMask<E> | eq(Vector<E> v): Tests if this vector is equal to another input vector. |
public abstract boolean | equals(Object obj): Indicates whether this vector is identical to some other object (overrides Object.equals). |
public abstract Vector<E> | expand(VectorMask<E> m): Expands the lane elements of this vector under the control of a specific mask. |
public abstract int | hashCode(): Returns a hash code value for this vector (overrides Object.hashCode). |
public abstract void | intoMemorySegment(MemorySegment ms, long offset, ByteOrder bo): Stores this vector into a memory segment starting at an offset using explicit byte order. |
public abstract void | intoMemorySegment(MemorySegment ms, long offset, ByteOrder bo, VectorMask<E> m): Stores this vector into a memory segment starting at an offset using explicit byte order and a mask. |
public abstract Vector<E> | lanewise(VectorOperators.Unary op): Operates on the lane values of this vector. |
public abstract Vector<E> | lanewise(VectorOperators.Unary op, VectorMask<E> m): Operates on the lane values of this vector, with selection of lane elements controlled by a mask. |
public abstract Vector<E> | lanewise(VectorOperators.Binary op, Vector<E> v): Combines the corresponding lane values of this vector with those of a second input vector. |
public abstract Vector<E> | lanewise(VectorOperators.Binary op, Vector<E> v, VectorMask<E> m): Combines the corresponding lane values of this vector with those of a second input vector, with selection of lane elements controlled by a mask. |
public abstract Vector<E> | lanewise(VectorOperators.Binary op, long e): Combines the lane values of this vector with the value of a broadcast scalar. |
public abstract Vector<E> | lanewise(VectorOperators.Binary op, long e, VectorMask<E> m): Combines the lane values of this vector with the value of a broadcast scalar, with selection of lane elements controlled by a mask. |
public abstract Vector<E> | lanewise(VectorOperators.Ternary op, Vector<E> v1, Vector<E> v2): Combines the corresponding lane values of this vector with the lanes of a second and a third input vector. |
public abstract Vector<E> | lanewise(VectorOperators.Ternary op, Vector<E> v1, Vector<E> v2, VectorMask<E> m): Combines the corresponding lane values of this vector with the lanes of a second and a third input vector, with selection of lane elements controlled by a mask. |
public abstract int | length(): Returns the lane count, or vector length (VLENGTH). |
public abstract VectorMask<E> | lt(Vector<E> v): Tests if this vector is less than another input vector. |
public abstract VectorMask<E> | maskAll(boolean bit): Returns a mask of the same species as this vector, where each lane is set or unset according to a given single boolean, which is broadcast to all lanes. |
public abstract Vector<E> | max(Vector<E> v): Computes the larger of this vector and a second input vector. |
public abstract Vector<E> | min(Vector<E> v): Computes the smaller of this vector and a second input vector. |
public abstract Vector<E> | mul(Vector<E> v): Multiplies this vector by a second input vector. |
public abstract Vector<E> | mul(Vector<E> v, VectorMask<E> m): Multiplies this vector by a second input vector under the control of a mask. |
public abstract Vector<E> | neg(): Negates this vector. |
public abstract Vector<E> | rearrange(VectorShuffle<E> s): Rearranges the lane elements of this vector, selecting lanes under the control of a specific shuffle. |
public abstract Vector<E> | rearrange(VectorShuffle<E> s, VectorMask<E> m): Rearranges the lane elements of this vector, selecting lanes under the control of a specific shuffle and a mask. |
public abstract Vector<E> | rearrange(VectorShuffle<E> s, Vector<E> v): Rearranges the lane elements of two vectors, selecting lanes under the control of a specific shuffle, using both normal and exceptional indexes in the shuffle to steer data. |
public abstract long | reduceLanesToLong(VectorOperators.Associative op): Returns a value accumulated from all the lanes of this vector, cast to long. |
public abstract long | reduceLanesToLong(VectorOperators.Associative op, VectorMask<E> m): Returns a value accumulated from selected lanes of this vector, controlled by a mask. |
public abstract ByteVector | reinterpretAsBytes(): Views this vector as a vector of the same shape and contents but a lane type of byte. |
public abstract DoubleVector | reinterpretAsDoubles(): Reinterprets this vector as a vector of the same shape and contents but a lane type of double. |
public abstract FloatVector | reinterpretAsFloats(): Reinterprets this vector as a vector of the same shape and contents but a lane type of float. |
public abstract IntVector | reinterpretAsInts(): Reinterprets this vector as a vector of the same shape and contents but a lane type of int. |
public abstract LongVector | reinterpretAsLongs(): Reinterprets this vector as a vector of the same shape and contents but a lane type of long. |
public abstract ShortVector | reinterpretAsShorts(): Reinterprets this vector as a vector of the same shape and contents but a lane type of short. |
public abstract <F> Vector<F> | reinterpretShape(VectorSpecies<F> species, int part): Transforms this vector to a vector of the given species of element type F, reinterpreting the bytes of this vector without performing any value conversions. |
public abstract Vector<E> | selectFrom(Vector<E> v): Using index values stored in the lanes of this vector, assemble values stored in second vector v. |
public abstract Vector<E> | selectFrom(Vector<E> v, VectorMask<E> m): Using index values stored in the lanes of this vector, assemble values stored in second vector v, under the control of a mask. |
public abstract VectorShape | shape(): Returns the shape of this vector. |
public abstract Vector<E> | slice(int origin, Vector<E> v1): Slices a segment of adjacent lanes, starting at a given origin lane in the current vector, and continuing (as needed) into an immediately following vector. |
public abstract Vector<E> | slice(int origin, Vector<E> v1, VectorMask<E> m): Slices a segment of adjacent lanes under the control of a mask, starting at a given origin lane in the current vector, and continuing (as needed) into an immediately following vector. |
public abstract Vector<E> | slice(int origin): Slices a segment of adjacent lanes, starting at a given origin lane in the current vector; the result holds the last VLENGTH-origin input lanes, placed starting in the first lane of the output, padded at the end with zeroes. |
public abstract VectorSpecies<E> | species(): Returns the species of this vector. |
public abstract Vector<E> | sub(Vector<E> v): Subtracts a second input vector from this vector. |
public abstract Vector<E> | sub(Vector<E> v, VectorMask<E> m): Subtracts a second input vector from this vector under the control of a mask. |
public abstract VectorMask<E> | test(VectorOperators.Test op): Tests the lanes of this vector according to the given operation. |
public abstract VectorMask<E> | test(VectorOperators.Test op, VectorMask<E> m): Tests selected lanes of this vector, according to the given operation. |
public abstract Object | toArray(): Returns a packed array containing all the lane values. |
public abstract double[] | toDoubleArray(): Returns a double[] array containing all the lane values, possibly rounded to representable double values. |
public abstract int[] | toIntArray(): Returns an int[] array containing all the lane values, converted to the type int. |
public abstract long[] | toLongArray(): Returns a long[] array containing all the lane values, converted to the type long. |
public abstract VectorShuffle<E> | toShuffle(): Converts this vector into a shuffle, converting the lane values to int and regarding them as source indexes. |
public abstract String | toString(): Returns a string of the form "[0,1,2...]", reporting the lane values of this vector, in lane order (overrides Object.toString). |
public abstract Vector<E> | unslice(int origin, Vector<E> w, int part): Reverses a slice(), inserting the current vector as a slice within another "background" input vector, which is regarded as one or the other input to a hypothetical subsequent slice() operation; part (zero or one) selects which updated background vector is returned. |
public abstract Vector<E> | unslice(int origin, Vector<E> w, int part, VectorMask<E> m): Reverses a slice(), inserting (under the control of a mask) the current vector as a slice within another "background" input vector, which is regarded as one or the other input to a hypothetical subsequent slice() operation. |
public abstract Vector<E> | unslice(int origin): Reverses a slice(), inserting the current vector as a slice within a "background" input of zero lane values. |
public abstract Vector<?> | viewAsFloatingLanes(): Views this vector as a vector of the same shape, length, and contents, but a lane type that is a floating-point type. |
public abstract Vector<?> | viewAsIntegralLanes(): Views this vector as a vector of the same shape, length, and contents, but a lane type that is not a floating-point type. |