
Machine Learning

The DART-MX95 supports running machine learning workloads on multiple processing units. Models can run on the CPU, be offloaded to the GPU for parallel acceleration, or use the dedicated Neural Processing Unit (NPU) for higher performance and lower power consumption during inference.

For general information about getting started with machine learning, refer to NXP's i.MX Machine Learning User's Guide.

This guide demonstrates how to run a TensorFlow Lite model on the i.MX95 using NXP's Neutron delegate, following the workflow described in that guide.


Running a TensorFlow Lite model

NXP provides example TensorFlow Lite models for evaluation and testing. Before these models can run on the i.MX95 with hardware acceleration, they must first be converted using the eIQ Toolkit.

As described in the i.MX Machine Learning User's Guide:

For the offline compilation, the model should be converted through the eIQ toolkit first. In the converted model, the neutronGraph node is already generated. The neutron-delegate only captures the neutronGraph node and offloads the work to Neutron-S.

This conversion process prepares the model for execution on the NPU by inserting the required neutronGraph node, which allows the Neutron delegate to offload supported operations to the Neutron-S accelerator.

The following steps describe how to convert the model using the eIQ Toolkit.


Prepare an Ubuntu 20.04 machine

The eIQ Toolkit currently supports Ubuntu 20.04 as the recommended host environment. You can use any Ubuntu 20.04 system, but the example below demonstrates how to quickly create a clean virtual machine using Multipass.

dev@host$ multipass launch 20.04 --name eiq-focal --cpus 8 --mem 8G --disk 60G

Install the eIQ Toolkit

Download the eIQ Toolkit from NXP: https://www.nxp.com/design/design-center/software/eiq-ai-development-environment/eiq-toolkit-for-end-to-end-model-development-and-deployment:EIQ-TOOLKIT

This guide was tested using the eIQ Toolkit 1.17.0.110 Ubuntu 20.04.03 installer:

dev@host$ sha256sum eiq-toolkit-v1.17.0.110-1_amd64_b251020.deb.bin 
a23bf1810d71604e71383e308ba9de7189ac7d44e4265d635b5bf8417064bf6d  eiq-toolkit-v1.17.0.110-1_amd64_b251020.deb.bin
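
Rather than comparing the digest by eye, the check can be scripted; `sha256sum -c` exits non-zero on any mismatch. A minimal sketch (the scratch file below is a stand-in for the real installer, for which you would pass its path and the digest listed above):

```shell
# Verify a file against an expected SHA-256 digest before executing it.
# sha256sum -c reads "DIGEST  PATH" lines and fails on any mismatch.
verify_sha256() {
  echo "$2  $1" | sha256sum -c --status -
}

# Demo against a scratch file standing in for the downloaded installer.
printf 'demo' > /tmp/demo-installer.bin
digest=$(sha256sum /tmp/demo-installer.bin | awk '{print $1}')
if verify_sha256 /tmp/demo-installer.bin "$digest"; then
  echo "Checksum OK"
else
  echo "Checksum MISMATCH - do not run the installer"
fi
```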

If using Multipass, copy the installer to your VM and open a shell in the VM:

dev@host$ multipass transfer eiq-toolkit-v1.17.0.110-1_amd64_b251020.deb.bin eiq-focal:
dev@host$ multipass shell eiq-focal

Make the installer executable, then run it; it extracts the .deb and installs the package.

ubuntu@eiq-focal$ chmod +x eiq-toolkit-v1.17.0.110-1_amd64_b251020.deb.bin 
ubuntu@eiq-focal$ ./eiq-toolkit-v1.17.0.110-1_amd64_b251020.deb.bin

After installation, the toolkit binaries (including neutron-converter) are available under /opt/nxp.
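
The exact install path includes versioned directory names that change between toolkit and SDK releases, so it can help to search for the converter binary rather than hard-code the path. A sketch:

```shell
# Locate the neutron-converter binary under the install root instead of
# hard-coding a versioned path (directory names change per release):
find /opt/nxp -type f -name neutron-converter 2>/dev/null
```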


Obtain a sample model from the target

Copy a known-good sample model from your running DART-MX95 to the VM. The example below uses the MobileNet v1 quantized model shipped on Variscite's recovery SD card.

ubuntu@eiq-focal$ scp root@192.168.0.104:/usr/bin/tensorflow-lite-2.18.0/examples/mobilenet_v1_1.0_224_quant.tflite .

Replace 192.168.0.104 with the IP of your target.


Convert the model offline with Neutron

Run the Neutron converter from the eIQ Toolkit. This generates a Neutron-enabled .tflite containing a neutronGraph node, which the Neutron delegate offloads to the Neutron-S accelerator on the i.MX95.

ubuntu@eiq-focal$ /opt/nxp/eIQ_Toolkit_v1.16.0/bin/neutron-converter/MCU_SDK_25.06.00+Linux_6.12.49_2.2.0/neutron-converter \
    --input  mobilenet_v1_1.0_224_quant.tflite \
    --output mobilenet_v1_1.0_224_neutron.tflite \
    --target imx95
Converting model with the following options:
  Input  = mobilenet_v1_1.0_224_quant.tflite
  Output = mobilenet_v1_1.0_224_neutron.tflite
  Target = imx95
Starting Tile scheduling. This might take a while.
[===============================================================================>] 99 %
Starting TCM allocation. This might take a while.
[================================================================================] 100 %
Conversion statistics:
  Number of operators after import    = 31
  Number of operators after optimize  = 47
    Number of operators converted     = 44
    Number of operators NOT converted = 3
  Number of operators after extract   = 4
    Number of Neutron graphs          = 1
    Number of operators NOT converted = 3
  Operator conversion ratio           = 44 / 47 = 0.93617
  Operators converted                 = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,
Time for optimization = 0.654027 (seconds)
Time for extraction   = 0.00802751 (seconds)
Time for generation   = 3.13225 (seconds)
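
Before copying the model over, you can sanity-check that conversion actually embedded a Neutron node. One informal way (assuming the custom operator name contains "neutronGraph", per the passage quoted earlier) is to look for it in the file's printable strings:

```shell
# Informal check: a Neutron-converted .tflite embeds a custom operator whose
# name (assumed here to contain "neutronGraph") shows up in the file's strings.
has_neutron_node() {
  strings "$1" | grep -qi 'neutrongraph'
}

# Usage on the converted model from the step above:
# has_neutron_node mobilenet_v1_1.0_224_neutron.tflite && echo "Neutron node present"
```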

Copy the converted model back to the target

Transfer the converted model to your i.MX95 home directory.

ubuntu@eiq-focal$ scp mobilenet_v1_1.0_224_neutron.tflite root@192.168.0.104:

Run inference on the target using the Neutron delegate

Use the label_image example with the external delegate and the converted model.

root@imx95-var-dart:/usr/bin/tensorflow-lite-2.18.0/examples# \
  ./label_image \
    --external_delegate_path=/usr/lib/libneutron_delegate.so \
    -m ~/mobilenet_v1_1.0_224_neutron.tflite

INFO: Loaded model /root/mobilenet_v1_1.0_224_neutron.tflite
INFO: resolved reporter
INFO: EXTERNAL delegate created.
INFO: NeutronDelegate delegate: 1 nodes delegated out of 4 nodes with 1 partitions.
INFO: Neutron delegate version: v1.0.0-a5d640e6, zerocp enabled.
INFO: Applied EXTERNAL delegate.
Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: invoked
INFO: average time: 1.363 ms
INFO: 0.729412: 653 military uniform
INFO: 0.164706: 907 Windsor tie
INFO: 0.0196078: 458 bow tie
INFO: 0.00784314: 835 suit
INFO: 0.00784314: 466 bulletproof vest
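
For scripted regression runs, the top-1 prediction can be pulled out of a saved label_image log. A sketch, assuming the `INFO: <score>: <class id> <label>` line format shown above:

```shell
# Extract the top-1 prediction (first "INFO: <score>: ..." line) from a
# label_image log; the heredoc stands in for a captured run.
top1=$(awk -F': ' '/^INFO: 0\./ {print $3; exit}' <<'EOF'
INFO: invoked
INFO: average time: 1.363 ms
INFO: 0.729412: 653 military uniform
INFO: 0.164706: 907 Windsor tie
EOF
)
echo "$top1"
```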

Conclusion

This flow demonstrates offline compilation with the eIQ Toolkit on a host PC and hardware-accelerated inference on the i.MX95 using the Neutron delegate. Convert the model once on the host, deploy the Neutron-enabled .tflite to the target, and run it with --external_delegate_path=/usr/lib/libneutron_delegate.so to enable NPU acceleration.
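
The host-side portion of that flow can be wrapped in a small helper script. This is only a sketch: the converter path, target IP, and file names are the ones used in this guide and should be adjusted for your setup; DRY_RUN=1 (the default here) prints each step instead of executing it.

```shell
# Host-side recap: convert the model, then deploy it to the target.
# DRY_RUN=1 echoes each command; set DRY_RUN=0 once the paths below are correct.
DRY_RUN=${DRY_RUN:-1}
TARGET_IP=192.168.0.104
MODEL_IN=mobilenet_v1_1.0_224_quant.tflite
MODEL_OUT=mobilenet_v1_1.0_224_neutron.tflite
CONVERTER=/opt/nxp/eIQ_Toolkit_v1.16.0/bin/neutron-converter/MCU_SDK_25.06.00+Linux_6.12.49_2.2.0/neutron-converter

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run "$CONVERTER" --input "$MODEL_IN" --output "$MODEL_OUT" --target imx95
run scp "$MODEL_OUT" "root@${TARGET_IP}:"
```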