Appendix B. Hidden-layer catalog

This appendix, which backs chapter 26, is the reference catalog of the hardware-native layer kinds that are reached by authoring the network description directly.

Every row names a native descriptor that the conversion path never emits. The descriptor and its _ANECValidate<Name>Layer checker are present in the compiler on every target, and the layer is reached by handing the compiler a Unit whose Type is the native name and whose Params hold the attributes that the matching ZinParse<Name>Unit parser reads.
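As a rough illustration of what handing the compiler a Unit involves, the sketch below builds one SDPA Unit as a Python dictionary and serializes it with the standard-library plistlib module. Only the Type and Params fields and the SubtractMax key come from the catalog. The Name and Bottom fields, the top-level Units array, and the choice of an XML property list are hypothetical placeholders, not the compiler's documented netplist schema.

```python
import plistlib
from typing import Any, Dict, List


def make_unit(name: str, unit_type: str, bottoms: List[str],
              params: Dict[str, Any]) -> Dict[str, Any]:
    """Build one netplist-style Unit.

    "Type" and "Params" follow the text; "Name" and "Bottom" are
    HYPOTHETICAL field names standing in for the real wiring keys.
    """
    return {"Name": name, "Type": unit_type, "Bottom": bottoms, "Params": params}


# SDPA with its one parsed key set true, as a correct softmax requires.
sdpa = make_unit("attn0", "SDPA", ["q", "k", "v", "scale"], {"SubtractMax": True})

# HYPOTHETICAL top-level layout: a single "Units" array in an XML plist.
netplist_bytes = plistlib.dumps({"Units": [sdpa]}, fmt=plistlib.FMT_XML)

if __name__ == "__main__":
    print(netplist_bytes.decode("utf-8"))
```

Whatever the real container looks like, the point the sketch makes is the same: the native layer is selected by the Type string alone, and its attributes travel as a flat Params dictionary for the parser to read.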

Catalog

Table B.1 lists each native layer kind with what it computes, its netplist Type and compiler symbol, and its family gate.

Layer kind | What it computes | Netplist Type and compiler symbol | Family gate
Fused attention | Scaled dot-product attention from four operands (Q, K, V, scale) plus an optional fifth additive mask, with the channel axis holding the sequence | SDPA; ANECSDPALayerDesc, ZinParseSDPAUnit, anec.sdpa. The one parsed key is SubtractMax, which defaults to false in _ANECSDPALayerDescInitialize and must be set true for a correct softmax | All families from the M1 onward; runs on the matmul, softmax, and transpose path, not the texture engine
Sort | Full sort along a chosen axis, ascending or descending, returning values or argsort indices | Sort; ZinParseSortUnit. Keys: Direction, SortDimension, VectorDimension, SortIndices, Indices | Validator callable on the M1, but the code generator rejects Sort there; runs on later families
Top-k | The k largest or smallest along an axis, as values or indices; index outputs are returned float16-encoded and are exact below 2048 | TopK; ZinParseTopKUnit. Keys: Type (Max or Min), K, SortDimension, VectorDimension, SortIndices, Indices | Runs on the M1 outside a forbidden band of K values; a K inside the band fails the compiler at every width
Argument min and max | Spatial or channel argmin and argmax over a kernel window | ArgMinMax; _ANECValidateArgMinMaxLayer. Keys: Mode (SpatialArgMax, ChannelArgMax, SpatialArgMin, ChannelArgMin), KernelWidth, KernelHeight, Pad* | All families from the M1 onward
Whole-tensor argument min and max | Argmin or argmax over an entire tensor dimension | GlobalArgMinMax. Keys: Type (Max or Min), Dimension | Gated to the A15 generation; rejected on the M1
Spatial rearrange | Depth-to-space and space-to-depth in two channel-ordering conventions, plus space-and-batch reshuffles, parameterized by per-axis integer factors | PixelShuffle, PixelUnshuffle, ChannelToSpace, SpaceToChannel, SpaceToBatch, BatchToSpace; ZinParse<Name>Unit. Three int32 keys: FactorX, FactorY, FactorZ | All families from the M1 onward; BatchToSpace requires batch divisible by FactorX times FactorY
Range normalization | Maps a tensor to its minimum-to-maximum span per row or per column | MinMaxNormalization; _ANECValidateMinMaxNormLayer, _ANECMinMaxNormLayerDescInitialize. Keys: Dimension (Width or Height), Epsilon as a float16 bit pattern | Width and Height run; a Dimension of Channel is arch-gated and rejected on the M1
Local response normalization | Cross-channel response normalization over a channel window | LocalResponseNormalization; _ANECValidateLRNLayer. Alpha is a float16 bit pattern divided internally by KernelChannel; only the first KernelChannel channels are normalized | All families from the M1 onward
Scaled elementwise | A binary elementwise op fused with a scalar scale | ScaledElementWise. Keys: Type (Add, Mult, Sub, and the rest of the elementwise vocabulary), Scale as a float16 bit pattern | All families from the M1 onward
Template cross-correlation | Valid cross-correlation of a single-channel map with an unflipped template | CrossCorrelation. The MIL frontend rejects the op; the netplist Type reaches it directly | All families from the M1 onward
Three-vector cross product | The cross product of two length-3 vectors held in the channel axis | CrossProduct. Inputs shaped D1 C3 H1 W1; the MIL frontend rejects the op | All families from the M1 onward
Furthest-point sampling | Greedy L2 furthest-point sampling of up to 1024 centroids from up to 8192 points, seeded at the first point, with centroids returned channel-major | FurthestPointSampling. Keys: CentroidCount, DistanceMetric (L2 only on this architecture) | All families from the M1 onward
Radius neighborhood search | An L2 ball query returning a points-by-centroids membership matrix, one membership flag per pair | RadiusSearch. Two inputs (centroids, points), both D1 C3 H1; key Radius | All families from the M1 onward
Stereo cost volume | The L1 matching cost per disparity over the disparity planes | CostVolume. Keys: DisparityDirection, DisparityRange; requires reference width | All families from the M1 onward
Re-strided input view | A contiguous offset window along one named axis, with no data movement | InputView. Keys: Dimension, Offset, Size, Step; gates InvalidInputView{Dimension,Offset,Size,Step} | All families from the M1 onward
Runtime-offset dynamic slice | A window whose start is bound as a constant or runtime index | DynamicSlice; reached by the MIL slice_by_index path. Keys: DynamicSliceAxisOrder, DynamicSliceInfo, CoordinateInfo, PaddingInfo, BackgroundValue | Validator callable on the M1, but the code generator rejects DynamicSlice there; runs on later families
Tile, concatenate, and reshape utilities | Flatten as an NCHW identity reshape, inference-time dropout as identity, and broadcast of a length-1 axis | Flatten, Dropout (rate 0), Broadcast (keys Dimension, Size) | All families from the M1 onward

Table B.1. The hidden-layer catalog.
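A compile-and-run check needs a host-side reference to compare against. The sketch below models the FurthestPointSampling row as ordinary greedy L2 furthest-point sampling seeded at index 0, within the 8192-point and 1024-centroid envelope from Table B.1, and lays the result out channel-major. The tie-breaking rule (lowest index wins) and the function names are assumptions for illustration, not documented engine behavior.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

MAX_POINTS = 8192      # envelope from Table B.1
MAX_CENTROIDS = 1024   # envelope from Table B.1


def furthest_point_sampling(points: Sequence[Point], centroid_count: int) -> List[int]:
    """Greedy L2 furthest-point sampling seeded at the first point.

    Returns centroid indices in selection order. Lowest-index tie-breaking
    is an ASSUMPTION, not documented engine behavior.
    """
    if len(points) > MAX_POINTS or centroid_count > MAX_CENTROIDS:
        raise ValueError("outside the documented FurthestPointSampling envelope")
    if not points or centroid_count <= 0:
        return []
    count = min(centroid_count, len(points))
    selected = [0]
    chosen = {0}
    # nearest[i]: L2 distance from point i to its closest selected centroid.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < count:
        candidates = [i for i in range(len(points)) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])  # first maximum wins
        selected.append(nxt)
        chosen.add(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return selected


def to_channel_major(points: Sequence[Point], indices: List[int]) -> List[List[float]]:
    """Lay selected centroids out channel-major: all x, then all y, then all z."""
    return [[points[i][axis] for i in indices] for axis in range(3)]
```

On the points (0,0,0), (1,0,0), (10,0,0), (5,0,0) with three centroids, the sketch picks indices 0, 2, and 3: the seed, then the far end, then the midpoint.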

Arch-gated negatives

Three layer groups are accepted by a later chip and rejected by the M1 under the family gates of chapter 12; the first two are rows of Table B.1. GlobalArgMinMax is gated to the A15 generation and rejected on the M1. MinMaxNormalization with a Channel reduction is arch-gated and rejected on the M1, while its Width and Height reductions run. The texture-engine samplers (resize, crop-and-resize, grid resample, and the affine spatial transform) are accepted from the A14 generation and rejected on the M1, where the compiler reports that the affine transform is not supported on this architecture. They belong to the same gated family but are not authored as netplist Units here.
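The Width and Height reductions that do run on the M1 can be pinned down with a reference model before comparing engine output. The sketch below assumes the common form (x - min) / (max - min + epsilon), assumes that Width means one normalization per row and Height one per column of an H x W map, and encodes Epsilon as a float16 bit pattern with the standard struct "e" format. The exact epsilon placement on the engine is not documented here, and the helper names are hypothetical.

```python
import struct
from typing import List, Sequence

Matrix = List[List[float]]


def float16_bits(value: float) -> int:
    """Encode a float as the 16-bit half-precision pattern used by keys such as Epsilon."""
    return int.from_bytes(struct.pack("<e", value), "little")


def _normalize(values: Sequence[float], epsilon: float) -> List[float]:
    lo, hi = min(values), max(values)
    # ASSUMED formula: (x - min) / (max - min + epsilon).
    return [(v - lo) / (hi - lo + epsilon) for v in values]


def min_max_normalize(x: Matrix, dimension: str, epsilon: float = 1e-3) -> Matrix:
    """Reference model for MinMaxNormalization on an H x W map.

    "Width" normalizes each row (reduction along W); "Height" normalizes each
    column (reduction along H). Both mappings are ASSUMPTIONS.
    """
    if dimension == "Width":
        return [_normalize(row, epsilon) for row in x]
    if dimension == "Height":
        columns = [_normalize(col, epsilon) for col in zip(*x)]
        return [list(row) for row in zip(*columns)]
    # A Channel reduction is arch-gated and rejected on the M1, so it is not modeled.
    raise ValueError(f"unsupported Dimension: {dimension}")
```

For example, float16_bits(1.0) is 0x3C00, the value a Params dictionary would carry for an epsilon or scale of one.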

A second class of rejection is not a family gate but the attested-is-not-reachable rule of chapter 4: the Sort and DynamicSlice validators are callable on the M1, yet the code generator rejects both, and TopK is accepted only outside its forbidden K band. An authored layer is confirmed by a compile-and-run on the target, not by the presence of its descriptor.

Validator gate set

Every authored layer passes through one per-layer validator from the _ANECValidate<Op>Layer family. Of the 55 exported _ANECValidate symbols, 50 are per-layer. The compiler runs the same validators in two roles: the segmenter dry-runs them through _ANECValidateNetworkCreate to decide engine eligibility, and the back-end legalizer re-runs them during a real compile, so the verdict the segmenter predicts cannot drift from the verdict the legalizer returns. The five non-layer exports are _ANECValidate, _ANECValidateNetworkCreate, _ANECValidateMPSModule, _ANECValidateMPSModuleCreate, and _ANECValidateMutableProcedureInfo.
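The two-role arrangement is easiest to see in miniature. In the sketch below, a single registry of validator functions serves both a dry-run segmenter and a legalizer, so the two stages cannot disagree about a layer. The registry, the layer-record shape, and the function names are hypothetical; only the SDPA bottom count and reject string come from Table B.2.

```python
from typing import Any, Callable, Dict, List, Optional

# HYPOTHETICAL model: a validator takes a layer record and returns None to
# accept or a reject string to refuse.
Layer = Dict[str, Any]
Validator = Callable[[Layer], Optional[str]]

VALIDATORS: Dict[str, Validator] = {}


def register(layer_type: str) -> Callable[[Validator], Validator]:
    def wrap(fn: Validator) -> Validator:
        VALIDATORS[layer_type] = fn
        return fn
    return wrap


@register("SDPA")
def validate_sdpa(layer: Layer) -> Optional[str]:
    # Bottom count and reject string as listed for SDPA in Table B.2.
    if len(layer["bottoms"]) not in (4, 5):
        return "SDPA layer must have only 4 or 5(optional mask) inputs"
    return None


def verdict(layer: Layer) -> Optional[str]:
    fn = VALIDATORS.get(layer["type"])
    return "no validator" if fn is None else fn(layer)


def segment(layers: List[Layer]) -> List[bool]:
    """Dry run: mark each layer engine-eligible if its validator accepts."""
    return [verdict(layer) is None for layer in layers]


def legalize(layers: List[Layer]) -> None:
    """Real compile: re-run the same validators and stop on the first reject."""
    for layer in layers:
        reason = verdict(layer)
        if reason is not None:
            raise ValueError(reason)
```

Because segment and legalize both call verdict, any layer the dry run marks eligible also passes legalization; failures that appear later, such as the code-generation rejects in Table B.2, come from stages after the validators.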

Each validator checks a fixed bottom (input-tensor) count, reads a per-chip feature byte from the hardware-abstraction layer, and rejects with a literal string. Table B.2 reproduces those gates, as measured, for the validators that guard the authored and bridge-reachable layers, giving the bottom count, the constraint, and the reject string for each.
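The shape shared by the simpler feature-byte gates can be sketched as a small table plus one check: compare the bottom count, look up the feature byte, and return the literal reject string when the byte is 0. The offsets and reject strings below are taken from Table B.2. Modeling the hardware-abstraction layer as a dictionary from offset to byte value, treating a missing offset as 0, and the bottom-count message are all assumptions, since the table does not list those validators' count strings.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Gate:
    bottoms: int        # fixed input-tensor count
    feature_byte: int   # offset of the per-chip feature byte
    reject: str         # literal reject string from Table B.2


GATES = {
    "Softmax": Gate(1, 0x815, "Softmax is not supported by this ANE architecture"),
    "InstanceNorm": Gate(1, 0x816, "InstanceNorm layer not supported for this ANE architecture"),
    "LRN": Gate(1, 0x81A, "LRN is not supported on this architecture."),
    "Dropout": Gate(1, 0x4A9, "Dropout layer is not supported on this architecture."),
}


def check(layer_type: str, bottoms: int, hal: Mapping[int, int]) -> Optional[str]:
    """Return None to accept, or a reject string. `hal` is a HYPOTHETICAL offset->byte map."""
    gate = GATES[layer_type]
    if bottoms != gate.bottoms:
        # HYPOTHETICAL wording: Table B.2 does not give these count strings.
        return f"{layer_type}: expected {gate.bottoms} bottom(s), got {bottoms}"
    if hal.get(gate.feature_byte, 0) == 0:
        return gate.reject
    return None
```

With byte 0x4a9 set to 0, as on a pre-A15 chip such as the M1, check("Dropout", 1, {0x4A9: 0}) returns the Dropout reject string; with the byte set to 1 it accepts.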

Validator | Bottoms | Gate and constraint | Reject string
SDPA | 4 or 5 | key and value same shape; mask broadcast-compatible; scale constant | SDPA layer must have only 4 or 5(optional mask) inputs
Conv | 1 plus weight | kernel within the per-chip range; large kernel W and H multiple of 8; channels divisible by groups | Invalid conv kernel %s = %zd, It should be in [%zd, %zd]
MatrixMult | 2 | depth 1 on both operands; out-C equals A-C; fits the kernel-memory budget | depth > 1 is not supported for MatMult
Linear | 1 plus weight | input rank below 5 | Linear layer must have only one single input.
Pool | 1 | window below input; pad below kernel; mode gated per chip | Pooling mode "%s" is not available on this ANE architecture.
Neuron | 1 | non-linear mode 1 to 46; type in the per-chip list; ReLU-N positive parameters when the gate byte is 0 | This platform doesn't support Neuron %s
Reduction | 1 | reduce-then-square needs feature byte 0x494 (0 through the M1); each axis at most 4 | square operation after reduction is not supported
Softmax | 1 | feature byte 0x815 (0 on older chips); output Float | Softmax is not supported by this ANE architecture
LayerNorm | 1 | channels divisible by num-groups; grouped form requires depth 1 | ... does not yet support depth > 1
InstanceNorm | 1 | feature byte 0x816; spatial axes only | InstanceNorm layer not supported for this ANE architecture
MinMaxNorm | 1 | feature byte 0x818; spatial only, the Channel axis arch-gated | (encoded assert, byte 0x818)
LRN | 1 | feature byte 0x81a; a channel count of 16 or above fails code generation on the M1 | LRN is not supported on this architecture.
ArgMinMax | 1 | channel-reduce C at most 2048 fp16; pad below kernel; equal left and right, zero front and back | ArgMinMax layer must have one input
GlobalArgMinMax | 1 | feature byte 0x4f2 (1 from the M1 on the bridge route); mode 1 or 2; reduce dimension not 5 | (encoded assert, byte 0x4f2)
Transpose | 1 | permutation valid; extent capped at 16384 through the M3, 65536 on the M5; last four dimensions only | NE Input Transpose is not supported for this arch
Concat | variadic | match input zero on every non-concat axis; constant positive axis; same layout | Concat layer must have at least 2 inputs
Pad | 1 | H or W axes only; symmetric and reflect modes need texture byte 0x81d (0 on the M1) | Channel padding is not supported on ANE
Broadcast | 1 | broadcast only from a length-1 axis; depth-axis broadcast needs byte 0x812 | Broadcast along depth axis is not supported on this architecture
Gather | 2 | M1 software envelope: data batch 1, depth 1, index channel 3; texture path from the A14 | Cannot decompose layer on this architecture
PixelShuffle / PixelUnshuffle | 1 | depth factor 1; W and H factors in 1, 2, 3, 4, 8; channel divisible by the factor product | returned invalid:
SpaceToBatch / BatchToSpace | 1 | factors fully factor into 2, 3, 4, 8; batch divisible by the factor product | Input batch n = %zd is not divisible by factor x = %d * factor y = %d
ChannelToSpace | 1 | the z dimension is not reorganizable | ChannelToSpace in z dimension is not supported, current factor.z = %d.
Resize | 1 | dimension or ratio, not both; sampling axes H and W; texture byte 0x81d (0 on the M1, which takes a software route) | failed to map resize layer on this arch
CropResize | 2 | index format fp16; texture engine from the A14; same coordinate, method, and padding across axes | Codegen Error: Invalid Texture CropCfg
AffineTransform | 2 | matrix fp16; texture byte 0x81d (0 on the M1) | affine transform is not supported on this architecture
Resample | 2 | warp depth 1; warp channel 1 or 2; texture engine from the A14 | Channel size in coordinates should be 1 or 2
Sort | 1 | direction valid; output fp16 or uint16; validator passes, code generation rejects on the M1 | (passes validation, code-generation reject)
TopK | 1 | k in the sort-dimension range; the M1 rejects at code generation, and k in 3 or 4 is forbidden on the M5 | (passes validation, code-generation reject)
Dropout | 1 | feature byte 0x4a9 (1 only from the A15); rate in the half-open unit interval | Dropout layer is not supported on this architecture.
Random | none | feature byte 0x4a9 (1 only from the A15); low below high; output Int8, UInt8, or Float16 | Random layer is not supported on this architecture.
RingBufferWriter | 2 | the writer must connect to a live-state buffer; circular mode arch-gated | Circular buffer is not supported on this architecture
NMS | 2 or more | boxes channel 4; runs only on a CPU or GPU backend, never engine-native | (passes validation, not engine-native)

Table B.2. The per-layer validator gates.
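The factor gates in the PixelShuffle and SpaceToBatch rows are plain integer arithmetic, so a precompile check can mirror them on the host. The sketch below encodes the constraints exactly as Table B.2 states them: W and H shuffle factors drawn from 1, 2, 3, 4, 8 with a depth factor of 1, space-and-batch factors that decompose into 2, 3, 4, and 8, and a batch divisible by the factor product. The function names and the factor-decomposition message are hypothetical; the batch message is a Python rendering of the literal reject string.

```python
from typing import Optional

SHUFFLE_FACTORS = {1, 2, 3, 4, 8}


def pixel_shuffle_ok(channels: int, fx: int, fy: int, fz: int = 1) -> bool:
    """PixelShuffle / PixelUnshuffle gates as stated in Table B.2."""
    return (fz == 1
            and fx in SHUFFLE_FACTORS
            and fy in SHUFFLE_FACTORS
            and channels % (fx * fy) == 0)


def factors_into_2_3_4_8(n: int) -> bool:
    """True if n is a product of 2s, 3s, 4s, and 8s (1 counts as the empty product)."""
    if n < 1:
        return False
    for part in (8, 4, 3, 2):
        while n % part == 0:
            n //= part
    return n == 1


def space_batch_reject(batch: int, fx: int, fy: int) -> Optional[str]:
    """SpaceToBatch / BatchToSpace gates from Table B.2; None means accepted."""
    if not (factors_into_2_3_4_8(fx) and factors_into_2_3_4_8(fy)):
        return "factor does not decompose into 2, 3, 4, 8"  # HYPOTHETICAL wording
    if batch % (fx * fy) != 0:
        # Python rendering of the literal "%zd ... %d ... %d" reject string.
        return "Input batch n = %d is not divisible by factor x = %d * factor y = %d" % (batch, fx, fy)
    return None
```

For instance, a batch of 6 with factors 2 and 2 yields the literal batch reject, while a batch of 8 passes.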

The validators are public exported symbols, so they are callable from user space; this is the basis of the precompile predictor, in which a callable validator marks a schema-gated layer as reachable by direct authoring. A validator that accepts the schema does not guarantee that the layer compiles, which is why the Sort, TopK, DynamicSlice, and M1 CropResize rows pass their validators yet fail at hardware-executable lowering. One opcode-surface gap runs the other way: the RCAS and Reverse operations have an internal semantics validator but no exported per-layer symbol, so they cannot be reached by direct authoring through this route.
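The predictor's evidence ladder (exported validator, validator verdict, lowering, then a run against a reference) can be recorded per layer and per target so that a passing validator is never mistaken for a confirmed layer. The sketch below is a hypothetical bookkeeping helper, not part of the compiler; its stage names and result strings are illustrative. The cases in the usage test follow the text: RCAS and Reverse lack an exported symbol, and Sort on the M1 passes its validator but fails lowering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What was observed for one layer on one target; None means not yet tried."""
    exported_validator: bool
    validator_accepts: Optional[bool] = None
    lowering_accepts: Optional[bool] = None
    run_matches_reference: Optional[bool] = None


def classify(obs: Observation) -> str:
    """Map observed stages to a reachability status (HYPOTHETICAL labels)."""
    if not obs.exported_validator:
        return "not reachable by direct authoring"
    if obs.validator_accepts is None:
        return "untested"
    if not obs.validator_accepts:
        return "rejected by validator"
    if obs.lowering_accepts is None:
        return "candidate: confirm by compile-and-run"
    if not obs.lowering_accepts:
        return "validator passes, lowering rejects"
    if obs.run_matches_reference is None:
        return "compiles; run not yet checked"
    return "confirmed" if obs.run_matches_reference else "compiles, output mismatch"
```

Only the last branch reports a layer as confirmed, which matches the rule that an authored layer counts only after a compile-and-run on the target.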