References

This guide is archived on arXiv as arXiv:2606.22283. To cite it:

@misc{bryngelson2026ane,
  author        = {Bryngelson, Spencer H.},
  title         = {Apple Neural Engine: Architecture, Programming, and Performance},
  year          = {2026},
  archivePrefix = {arXiv},
  eprint        = {2606.22283},
  doi           = {10.48550/arXiv.2606.22283},
  url           = {https://arxiv.org/abs/2606.22283},
}

  1. AmiraniLabs. "libane: a native Apple Neural Engine runtime." Repository, https://github.com/AmiraniLabs/libane.
  2. Apple. Accelerate and BNNS documentation. https://developer.apple.com/documentation/accelerate.
  3. Apple. Active installed base of 2.5 billion devices, reported by T. Cook on the first-quarter fiscal 2026 earnings call, January 29, 2026. apple.com.
  4. Apple. Apple silicon technical specifications. https://www.apple.com/mac/compare/.
  5. Apple. Core ML framework documentation. https://developer.apple.com/documentation/coreml.
  6. Apple. Core ML Tools (coremltools) documentation. https://apple.github.io/coremltools.
  7. Apple. Vision framework documentation. https://developer.apple.com/documentation/vision.
  8. Apple Machine Learning Research. "Deploying Transformers on the Apple Neural Engine." Apple Machine Learning Research article, 2022.
  9. Benazir, A., and Lin, F. X. "Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs." Preprint, arXiv:2604.18788, 2026.
  10. Bi, Z., Chen, X., Sun, L., Yao, Y., Shen, Q., Lou, J., and Deng, C. "RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis." Preprint, arXiv:2602.11506, 2026.
  11. Bryngelson, S. H. "ANEForge: Python for direct computation on the Apple Neural Engine." Preprint, arXiv:2606.17090, 2026.
  12. Chen, L., Feng, D., Feng, E., Wang, Y., Zhao, R., Xia, Y., Xu, P., and Chen, H. "Characterizing Mobile SoC for Accelerating Heterogeneous LLM Inference." ACM SIGOPS Symposium on Operating Systems Principles (SOSP), 2025. arXiv:2501.14794, DOI 10.1145/3731569.3764808.
  13. Choi, J. W., Bedard, D., Fowler, R., and Vuduc, R. "A Roofline Model of Energy." IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 661-672, 2013. DOI 10.1109/IPDPS.2013.77.
  14. Community Apple Neural Engine reverse-engineering repositories. johnmai-dev/ANE-LM, mechramc/Orion, skyfallsin/apple-neural-engine-field-guide, and dmaynor/apple-vuln-research. Repositories.
  15. Ding, N., and Williams, S. "An Instruction Roofline Model for GPUs." IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 7-18, 2019. DOI 10.1109/PMBS49563.2019.00007.
  16. Fanariotis, A., Orphanoudakis, T., and Fotopoulos, V. "Evaluating the Energy Efficiency of NPU-Accelerated Machine Learning Inference on Embedded Microcontrollers." Preprint, arXiv:2509.17533, 2025.
  17. Gerganov, G. "whisper.cpp: Whisper inference in C/C++ with Core ML Neural Engine support." Repository, https://github.com/ggml-org/whisper.cpp.
  18. Hollemans, M. "The Neural Engine: What Do We Know About It?" Community-maintained repository, https://github.com/hollance/neural-engine.
  19. Hotz, G., and the tinygrad authors. "tinygrad." Repository, https://github.com/tinygrad/tinygrad.
  20. Hübner, P., Hu, A., Peng, I., and Markidis, S. "Apple vs. Oranges: Evaluating the Apple Silicon M-Series SoCs for HPC Performance and Efficiency." Preprint, arXiv:2502.05317, 2025.
  21. Ignatov, A., Timofte, R., Kulik, A., Yang, S., Wang, K., Baum, F., Wu, M., Xu, L., and Van Gool, L. "AI Benchmark: All About Deep Learning on Smartphones in 2019." Preprint, arXiv:1910.06663, 2019.
  22. Ilic, A., Pratas, F., and Sousa, L. "Cache-Aware Roofline Model: Upgrading the Loft." IEEE Computer Architecture Letters, 13(1), 21-24, 2014. DOI 10.1109/L-CA.2013.6.
  23. Jayanth, R., Gupta, N., and Prasanna, V. "Benchmarking Edge AI Platforms for High-Performance ML Inference." Preprint, arXiv:2409.14803, 2024.
  24. Jouppi, N. P., Young, C., Patil, N., Patterson, D. A., et al. "In-Datacenter Performance Analysis of a Tensor Processing Unit." International Symposium on Computer Architecture (ISCA), 1-12, 2017. arXiv:1704.04760.
  25. Kumaresan, R. "Orion: Characterizing and Programming Apple's Neural Engine for LLM Training and Inference." Preprint, arXiv:2603.06728, 2026.
  26. ML.ENERGY / Zeus. "Programmatic Energy Consumption Measurement on Apple Silicon (macOS)." Project issue report (#159), 2025.
  27. Moon, S., Cha, J., Park, H., and Kim, J. "Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window." International Symposium on Computer Architecture (ISCA), 808-820, 2025. DOI 10.1145/3695053.3731051.
  28. Plyenkov, B. "Decoupling Machine Intelligence from Application in IoT Devices." Master's thesis, Aalto University, 2019.
  29. Prashanthi, S. K., Sahoo, K. K., Saikia, A. R., Gupta, P., Joshi, A. V., Pansari, P., and Simmhan, Y. "Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators." Preprint, arXiv:2509.20189, 2025.
  30. Singh, M. "Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering." Blog post and repository, 2026, https://github.com/maderix/ANE.
  31. Tummalapalli, P., Arayakandy, S., Pal, R., and Kundan, K. "LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load." Preprint, arXiv:2603.23640, 2026.
  32. Verhelst, M., Benini, L., and Verma, N. "How to Keep Pushing ML Accelerator Performance? Know Your Rooflines!" IEEE Journal of Solid-State Circuits, 2025. DOI 10.1109/JSSC.2025.3553765.
  33. Williams, S., Waterman, A., and Patterson, D. A. "Roofline: An Insightful Visual Performance Model for Multicore Architectures." Communications of the ACM, 52(4), 65-76, 2009. DOI 10.1145/1498765.1498785.
  34. Xu, D., Zhang, H., Yang, L., Liu, R., Huang, G., Xu, M., and Liu, X. "Fast On-device LLM Inference with NPUs." ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2025. arXiv:2407.05858, DOI 10.1145/3669940.3707239.
  35. Yang, C., Kurth, T., and Williams, S. "Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System." Concurrency and Computation: Practice and Experience, 32(20), e5547, 2020. DOI 10.1002/cpe.5547.
  36. Yoon, E. "ane: a reverse-engineered Linux driver for the Apple Neural Engine, with anecc." Repository, 2022, https://github.com/eiln/ane.