Training Models on Apple’s Neural Engine (6.6 TFLOPS/W)
What It Is
Apple’s Neural Engine (ANE) is a dedicated neural processing unit built into M-series chips, designed primarily for inference tasks like Face ID and image processing. Until recently, developers could only access it indirectly through CoreML or Metal frameworks. A reverse engineering effort has now exposed the underlying APIs, enabling direct model training on the ANE rather than relying on the GPU.
The breakthrough centers on bypassing Apple’s official abstraction layers to communicate with the ANE’s private frameworks. This unlocks the chip’s raw computational capabilities for training workloads, not just inference. Early experiments successfully trained a 110M parameter microGPT model entirely on the neural engine, demonstrating that the hardware can handle backpropagation and gradient updates despite being optimized for forward passes.
The efficiency numbers are striking: the M4’s ANE achieves 6.6 TFLOPS per watt at peak performance. For comparison, training on the Metal GPU delivers roughly 1 TFLOPS/W, while NVIDIA’s H100 datacenter GPU manages around 1.4 TFLOPS/W. This positions the ANE as potentially the most power-efficient training hardware currently available, though with significant caveats around programmability and scale.
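The quoted figures can be put side by side with a quick calculation (the numbers are the peak values cited above; real-world utilization will differ):

```python
# Peak efficiency figures quoted in the article, in TFLOPS per watt.
efficiency = {
    "M4 ANE": 6.6,
    "Metal GPU": 1.0,
    "NVIDIA H100": 1.4,
}

# Express each as a multiple of the H100's efficiency.
baseline = efficiency["NVIDIA H100"]
for name, tflops_per_watt in efficiency.items():
    print(f"{name}: {tflops_per_watt} TFLOPS/W "
          f"({tflops_per_watt / baseline:.1f}x vs. H100)")
```

By this arithmetic the ANE's peak efficiency is roughly 4.7x the H100's and 6.6x the Metal GPU's.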
Why It Matters
Power efficiency has become a critical bottleneck in machine learning. Training runs consume enormous amounts of electricity, and inference at scale requires massive server farms. Hardware that delivers roughly 5x the efficiency of datacenter GPUs could reshape where and how models get trained.
For individual developers and researchers, this opens possibilities for local fine-tuning on consumer hardware. LoRA adapters for 3B or 7B parameter models could run on a Mac Mini drawing minimal power, making experimentation accessible without cloud compute bills. Small teams working on domain-specific models might train entirely on local hardware rather than renting GPU instances.
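To get a sense of why LoRA fine-tuning is plausible on such hardware, a back-of-envelope count of trainable parameters helps. The dimensions below are illustrative (a Llama-7B-like layout: 32 layers, hidden size 4096, rank-16 adapters on the query and value projections), not measurements from the ANE work:

```python
# Back-of-envelope LoRA adapter size for a 7B-class model.
# Illustrative dimensions, not taken from the linked benchmarks.
layers = 32
hidden = 4096
rank = 16
adapted_matrices = 2  # e.g. q_proj and v_proj per layer

# Each adapted weight gets two low-rank factors:
# A (hidden x rank) and B (rank x hidden).
params_per_matrix = 2 * hidden * rank
total_params = layers * adapted_matrices * params_per_matrix
print(f"Trainable LoRA parameters: {total_params / 1e6:.1f}M")
```

Under these assumptions only about 8M parameters are trained, versus roughly 7,000M frozen, which is the kind of workload a low-power accelerator can realistically handle.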
The broader implications extend to edge computing and privacy-focused applications. Models fine-tuned locally never send data to external servers, addressing concerns around sensitive information. Organizations dealing with medical records, legal documents, or proprietary data could adapt foundation models without exposing training data to third parties.
However, this remains experimental territory. Apple hasn’t officially sanctioned direct ANE training, and the private APIs could change without notice in future OS updates. The approach also requires technical sophistication beyond typical ML workflows, limiting adoption to developers comfortable with low-level hardware interfaces.
Getting Started
The reverse engineering work lives at https://github.com/maderix/ANE, providing the foundation for direct ANE access. The repository includes examples for loading models and executing training loops outside the CoreML framework.
A basic training setup might look like this (assuming the repository exposes a PyTorch-style API; the exact names are illustrative):

    import ane  # hypothetical binding from the linked repository

    model = ane.load_model("microgpt_110m.ane")
    optimizer = ane.SGD(model.parameters(), lr=0.001)

    for batch in dataloader:
        optimizer.zero_grad()        # clear gradients from the previous step
        loss = model.forward(batch)  # forward pass runs on the neural engine
        loss.backward()              # backpropagation, also on the ANE
        optimizer.step()             # apply the SGD update
Detailed benchmarks and methodology appear at https://open.substack.com/pub/maderix/p/inside-the-m4-apple-neural-engine-615, showing performance across different model sizes and batch configurations.
The technical writeup at https://open.substack.com/pub/maderix/p/inside-the-m4-apple-neural-engine explains the reverse engineering process, including how private frameworks were identified and interfaced with. Developers interested in the underlying mechanics will find detailed explanations of the ANE’s architecture and memory management.
Context
Traditional approaches to Mac-based training rely on Metal Performance Shaders, which route computations through the GPU. This works but sacrifices the ANE’s efficiency advantages. CoreML supports some on-device training scenarios but imposes significant constraints on model architectures and training procedures.
Compared to cloud-based training, local ANE training trades raw throughput for efficiency and privacy. An H100 delivers far more absolute compute, but the ANE’s power efficiency matters for sustained workloads on battery power or in environments where electricity costs dominate.
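The efficiency-versus-throughput trade-off can be made concrete by asking how much energy a fixed compute budget costs on each device. TFLOPS/W is equivalent to TFLOP per joule, so energy in joules is total FLOPs divided by efficiency. The 1-exaFLOP budget below is a hypothetical, and peak figures are used throughout (real utilization would raise both energy numbers):

```python
# Energy to complete a fixed training compute budget at peak efficiency.
budget_flops = 1e18  # hypothetical 1 exaFLOP budget

def energy_kwh(tflops_per_watt: float) -> float:
    """Energy (kWh) to execute budget_flops at the given efficiency."""
    joules = budget_flops / (tflops_per_watt * 1e12)
    return joules / 3.6e6  # 1 kWh = 3.6 MJ

ane_kwh = energy_kwh(6.6)   # M4 ANE peak, per the article
h100_kwh = energy_kwh(1.4)  # H100 figure, per the article
print(f"ANE:  {ane_kwh:.3f} kWh")
print(f"H100: {h100_kwh:.3f} kWh")
```

The H100 finishes the budget far sooner in wall-clock time, but by this estimate it spends about 4.7x more energy doing so, which is exactly the trade described above.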
Limitations remain substantial. The ANE’s memory bandwidth and capacity constrain batch sizes and model dimensions. Distributed training across multiple devices faces coordination challenges since the ANE wasn’t designed for multi-node communication. Debugging tools and profiling capabilities lag far behind mature GPU ecosystems like CUDA.
The approach also exists in a legal gray area. Accessing private APIs violates Apple’s developer guidelines, and the hooks can break with any OS update. Production systems can’t rely on undocumented interfaces that might disappear or change behavior without warning.
Still, the efficiency numbers suggest Apple’s neural engine architecture has untapped potential beyond its intended inference role. Whether this becomes a practical training platform depends on community development and possibly Apple’s response to these reverse engineering efforts.