How to run this model on a mobile device

#9
by sadaqathunzai - opened

Since there are a lot of ONNX files in this repository, which is the main file or main model to run on a mobile device using ONNX Runtime on Android and iOS?

Thanks in advance

ONNX Community org

The full-precision model is model.onnx for the graph (and model.onnx_data for the weights). However, if you're looking to run on mobile, you'd probably want to choose a quantized version. I'd recommend one of the following:

  • model_quantized.onnx - optimized for CPU
  • model_q4.onnx - optimized for GPU
  • model_q4f16.onnx - optimized for GPU w/ fp16 support
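
If it helps, here is a tiny Dart sketch of that mapping for picking the asset at runtime. The helper name and flags are made up for illustration; only the file names come from this repository:

// Hypothetical helper: pick the ONNX variant for the execution target.
String modelFileFor({required bool useGpu, required bool supportsFp16}) {
  if (!useGpu) return 'model_quantized.onnx';                  // CPU
  return supportsFp16 ? 'model_q4f16.onnx' : 'model_q4.onnx';  // GPU
}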


sadaqathunzai changed discussion status to closed

Is this code valid, the input etc.?

import 'package:flutter/services.dart';
import 'package:onnxruntime/onnxruntime.dart';

Future<void> _runInference() async {
  // Tokenize the prompt (placeholder tokenizer defined below).
  final prompt = "hi";
  final tokenizedInput = tokenizeInput(prompt); // Custom tokenization

  // Load the model from the app bundle and create an inference session.
  final sessionOptions = OrtSessionOptions();
  const assetFileName = 'assets/models/model_quantized.onnx';
  final rawAssetFile = await rootBundle.load(assetFileName);
  final bytes = rawAssetFile.buffer.asUint8List();
  final session = OrtSession.fromBuffer(bytes, sessionOptions);

  // Build the input tensor with shape [batch, sequence_length].
  // NOTE: 'input' must match the graph's actual input name.
  final inputTensor = OrtValueTensor.createTensorWithDataList(
      tokenizedInput, [1, tokenizedInput.length]);
  final inputs = {'input': inputTensor};

  // Run inference.
  final runOptions = OrtRunOptions();
  final outputs = await session.runAsync(runOptions, inputs);

  // Process the output tensor.
  final output = outputs?.first?.value;
  final result = 'Inference result: $output';

  setState(() {
    _result = result;
  });

  // Release native resources.
  inputTensor.release();
  runOptions.release();
  outputs?.forEach((element) {
    element?.release();
  });
  session.release();
}

List<double> tokenizeInput(String input) {
  // Placeholder for actual tokenization logic.
  // Example: convert text to a list of floats (e.g., ASCII values).
  return input.codeUnits.map((unit) => unit.toDouble()).toList();
}

String processOutput(OrtValueTensor output) {
  // Placeholder for processing the output tensor (e.g., convert back to text).
  return 'Output: ${output.value}';
}
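
One thing worth double-checking before calling this valid: causal-LM graphs exported for Transformers.js/Optimum generally take int64 token IDs produced by the model's real tokenizer (tokenizer.json in this repo), not character codes as floats, and the inputs are usually named input_ids and attention_mask (this export also has position_ids and past_key_values.* inputs for incremental decoding). A minimal sketch under those assumptions; buildInputs is a hypothetical helper and the IDs must come from a real tokenizer:

import 'dart:typed_data';
import 'package:onnxruntime/onnxruntime.dart';

// Sketch: int64 input tensors for a causal-LM graph. The input names are
// assumptions based on typical Optimum exports - inspect the graph (e.g.
// with Netron) to confirm what this model actually expects.
Map<String, OrtValueTensor> buildInputs(List<int> ids) {
  final inputIds = OrtValueTensor.createTensorWithDataList(
      Int64List.fromList(ids), [1, ids.length]);
  final attentionMask = OrtValueTensor.createTensorWithDataList(
      Int64List.fromList(List.filled(ids.length, 1)), [1, ids.length]);
  return {'input_ids': inputIds, 'attention_mask': attentionMask};
}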

sadaqathunzai changed discussion status to open

I am getting this error:

Dart Error: NewExternalTypedData expects argument 'length' to be in the range [0..1073741823].
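
For what it's worth, 1073741823 is 2^30 - 1, so this looks like the model file blowing past the ~1 GiB cap on a single Dart typed-data view; a quantized 1.5B-parameter model is bigger than that. A possible workaround is to keep the bytes out of Dart entirely: stream the file to disk and open the session from a path. A sketch, assuming your onnxruntime package version exposes OrtSession.fromFile (verify this) and with a placeholder URL:

import 'dart:io';
import 'package:onnxruntime/onnxruntime.dart';
import 'package:path_provider/path_provider.dart';

// Sketch: stream the (>1 GiB) model to disk so its bytes never sit in one
// Dart buffer, then let the native runtime open it from the file path.
// OrtSession.fromFile is an assumption - check your package version.
Future<OrtSession> loadSessionFromDisk(String modelUrl) async {
  final dir = await getApplicationSupportDirectory();
  final file = File('${dir.path}/model_quantized.onnx');
  if (!await file.exists()) {
    final request = await HttpClient().getUrl(Uri.parse(modelUrl));
    final response = await request.close();
    await response.pipe(file.openWrite()); // streamed, chunk by chunk
  }
  return OrtSession.fromFile(file, OrtSessionOptions());
}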

Working code with Transformers.js:

import { AutoTokenizer, AutoModelForCausalLM, TextStreamer } from "@huggingface/transformers";

const stream = false;
const model_name = "onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX";
const prompt = "check this email xnohat@email.com is valid or not, answer only in json format not extra text {valid: true/false}";

// Create tokenizer and model directly instead of using pipeline
const tokenizer = await AutoTokenizer.from_pretrained(
  model_name
);

const model = await AutoModelForCausalLM.from_pretrained(
  model_name,
  { 
    dtype: "q4f16",
    device: "webgpu"
  }
);

// Define the chat messages
const messages = [
  { role: "user", content: prompt },
];

// Prepare input using chat template
const inputs = tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  return_dict: true,
});

// Create streamer if streaming is enabled
const streamer = stream ? new TextStreamer(tokenizer, {
  skip_prompt: true,
  skip_special_tokens: true,
  callback_function: (output) => {
    console.log('Streaming output:', output);
  }
}) : null;

// Generate with a proper model.generate() call
// (note: with do_sample: false, decoding is greedy, so temperature has no effect)
const { sequences } = await model.generate({
  ...inputs,
  do_sample: false,
  max_new_tokens: 4096,
  temperature: 0.4,
  repetition_penalty: 1.5,
  return_dict_in_generate: true,
  ...(stream && { streamer }) // Only include streamer if streaming is enabled
});

// Decode the output properly using batch_decode
const response = tokenizer.batch_decode(sequences, {
  skip_special_tokens: true,
});

console.log('Final response:', response[0]);
