References
This guide is archived on arXiv as arXiv:2606.22283. To cite it:
@misc{bryngelson2026ane,
author = {Bryngelson, Spencer H.},
title = {Apple Neural Engine: Architecture, Programming, and Performance},
year = {2026},
archivePrefix = {arXiv},
eprint = {2606.22283},
doi = {10.48550/arXiv.2606.22283},
url = {https://arxiv.org/abs/2606.22283},
}
- AmiraniLabs. "libane: a native Apple Neural Engine runtime." Repository, https://github.com/AmiraniLabs/libane.
- Apple. Accelerate and BNNS documentation. https://developer.apple.com/documentation/accelerate.
- Apple. Active installed base of 2.5 billion devices, reported by T. Cook on the first-quarter fiscal 2026 earnings call, January 29, 2026. apple.com.
- Apple. Apple silicon technical specifications. https://www.apple.com/mac/compare/.
- Apple. Core ML framework documentation. https://developer.apple.com/documentation/coreml.
- Apple. Core ML Tools (coremltools) documentation. https://apple.github.io/coremltools.
- Apple. Vision framework documentation. https://developer.apple.com/documentation/vision.
- Apple Machine Learning Research. "Deploying Transformers on the Apple Neural Engine." Article, 2022.
- Benazir, A., and Lin, F. X. "Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs." Preprint, arXiv:2604.18788, 2026.
- Bi, Z., Chen, X., Sun, L., Yao, Y., Shen, Q., Lou, J., and Deng, C. "RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis." Preprint, arXiv:2602.11506, 2026.
- Bryngelson, S. H. "ANEForge: Python for direct computation on the Apple Neural Engine." Preprint, arXiv:2606.17090, 2026.
- Chen, L., Feng, D., Feng, E., Wang, Y., Zhao, R., Xia, Y., Xu, P., and Chen, H. "Characterizing Mobile SoC for Accelerating Heterogeneous LLM Inference." ACM SIGOPS Symposium on Operating Systems Principles (SOSP), 2025. arXiv:2501.14794, DOI 10.1145/3731569.3764808.
- Choi, J. W., Bedard, D., Fowler, R., and Vuduc, R. "A Roofline Model of Energy." IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 661-672, 2013. DOI 10.1109/IPDPS.2013.77.
- Community Apple Neural Engine reverse-engineering repositories. johnmai-dev/ANE-LM, mechramc/Orion, skyfallsin/apple-neural-engine-field-guide, and dmaynor/apple-vuln-research. Repositories.
- Ding, N., and Williams, S. "An Instruction Roofline Model for GPUs." IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 7-18, 2019. DOI 10.1109/PMBS49563.2019.00007.
- Fanariotis, A., Orphanoudakis, T., and Fotopoulos, V. "Evaluating the Energy Efficiency of NPU-Accelerated Machine Learning Inference on Embedded Microcontrollers." Preprint, arXiv:2509.17533, 2025.
- Gerganov, G. "whisper.cpp: Whisper inference in C/C++ with Core ML Neural Engine support." Repository, https://github.com/ggml-org/whisper.cpp.
- Hollemans, M. "The Neural Engine: What Do We Know About It?" Community-maintained repository, https://github.com/hollance/neural-engine.
- Hotz, G., and the tinygrad authors. "tinygrad." Repository, https://github.com/tinygrad/tinygrad.
- Hübner, P., Hu, A., Peng, I., and Markidis, S. "Apple vs. Oranges: Evaluating the Apple Silicon M-Series SoCs for HPC Performance and Efficiency." Preprint, arXiv:2502.05317, 2025.
- Ignatov, A., Timofte, R., Kulik, A., Yang, S., Wang, K., Baum, F., Wu, M., Xu, L., and Van Gool, L. "AI Benchmark: All About Deep Learning on Smartphones in 2019." Preprint, arXiv:1910.06663, 2019.
- Ilic, A., Pratas, F., and Sousa, L. "Cache-Aware Roofline Model: Upgrading the Loft." IEEE Computer Architecture Letters, 13(1), 21-24, 2014. DOI 10.1109/L-CA.2013.6.
- Jayanth, R., Gupta, N., and Prasanna, V. "Benchmarking Edge AI Platforms for High-Performance ML Inference." Preprint, arXiv:2409.14803, 2024.
- Jouppi, N. P., Young, C., Patil, N., Patterson, D. A., et al. "In-Datacenter Performance Analysis of a Tensor Processing Unit." International Symposium on Computer Architecture (ISCA), 1-12, 2017. Also arXiv:1704.04760.
- Kumaresan, R. "Orion: Characterizing and Programming Apple's Neural Engine for LLM Training and Inference." Preprint, arXiv:2603.06728, 2026.
- ML.ENERGY / Zeus. "Programmatic Energy Consumption Measurement on Apple Silicon (macOS)." Project issue report (#159), 2025.
- Moon, S., Cha, J., Park, H., and Kim, J. "Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window." International Symposium on Computer Architecture (ISCA), 808-820, 2025. DOI 10.1145/3695053.3731051.
- Plyenkov, B. "Decoupling Machine Intelligence from Application in IoT Devices." Master's thesis, Aalto University, 2019.
- Prashanthi, S. K., Sahoo, K. K., Saikia, A. R., Gupta, P., Joshi, A. V., Pansari, P., and Simmhan, Y. "Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators." Preprint, arXiv:2509.20189, 2025.
- Singh, M. "Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering." Blog post and repository, 2026, https://github.com/maderix/ANE.
- Tummalapalli, P., Arayakandy, S., Pal, R., and Kundan, K. "LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load." Preprint, arXiv:2603.23640, 2026.
- Verhelst, M., Benini, L., and Verma, N. "How to Keep Pushing ML Accelerator Performance? Know Your Rooflines!" IEEE Journal of Solid-State Circuits, 2025. DOI 10.1109/JSSC.2025.3553765.
- Williams, S., Waterman, A., and Patterson, D. A. "Roofline: An Insightful Visual Performance Model for Multicore Architectures." Communications of the ACM, 52(4), 65-76, 2009. DOI 10.1145/1498765.1498785.
- Xu, D., Zhang, H., Yang, L., Liu, R., Huang, G., Xu, M., and Liu, X. "Fast On-device LLM Inference with NPUs." ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2025. arXiv:2407.05858, DOI 10.1145/3669940.3707239.
- Yang, C., Kurth, T., and Williams, S. "Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System." Concurrency and Computation: Practice and Experience, 32(20), e5547, 2020. DOI 10.1002/cpe.5547.
- Yoon, E. "ane: a reverse-engineered Linux driver for the Apple Neural Engine, with anecc." Repository, 2022, https://github.com/eiln/ane.