---
base_model: Qwen/Qwen-VL-Chat
---

# Lumixion-e1-70k-fncall-qlora

Lumixion is the first readily available family of multi-modal function-calling models. This first iteration was finetuned on 70k samples with QLoRA and a number of other optimizations.

If you would like to work on real-world multi-modal AI, join our Discord: [LINK](https://discord.gg/a2FWEDD8HV)

![IMG](img.webp)

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("AgoraX/Lumixion-e1-70k-fncall-qlora", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "AgoraX/Lumixion-e1-70k-fncall-qlora",  # path to the output directory
    device_map="cuda",
    trust_remote_code=True,
).eval()

# Specify hyperparameters for generation (use generation_config if transformers < 4.32.0)
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # Either a local path or a URL
    {'text': "What are the objects in the image? What animals are present? Are there any people in the image?"},
])
print("sending model to chat")
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```

## Output

```
[FUNCTION CALL]
{{
    'type': 'object',
    'properties': {{
        'objects': {{
            'type': 'array',
            'description': 'The objects present in the image.',
            'items': {{'type': 'string', 'enum': ['dog', 'person', 'tree', 'path', 'sun']}}
        }},
        'animals': {{
            'type': 'array',
            'description': 'The animals present in the image.',
            'items': {{'type': 'string', 'enum': ['dog']}}
        }},
        'people': {{
            'type': 'boolean',
            'description': 'Whether there are people in the image.',
            'enum': [true]
        }}
    }}
}}

[EXPECTED OUTPUT]
{{
    'objects': ['dog', 'person', 'tree', 'path', 'sun'],
    'animals': ['dog'],
    'people': true
}}
```
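Because `model.chat` returns the updated `history`, you can keep questioning the same image across turns. Below is a minimal sketch of a second dialogue turn following the Qwen-VL-Chat chat API; the follow-up question itself is a hypothetical example:

```python
# 2nd dialogue turn: pass the history returned by the previous call so the
# model keeps the image and the earlier function call in context.
# The question below is illustrative.
response, history = model.chat(tokenizer, "Is the dog on a leash?", history=history)
print(response)
```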
## Model Details

### Model Description

- **Developed by:** Agora Research
- **Model type:** Vision Language Model
- **Language(s) (NLP):** English/Chinese
- **Finetuned from model:** Qwen-VL-Chat

### Model Sources

- **Repository:** https://github.com/QwenLM/Qwen-VL
- **Paper:** https://arxiv.org/pdf/2308.12966.pdf

## Uses

### Direct Use

Just send an image and ask your questions in the text field, as shown in the Usage section above.

### Recommendations

transformers >= 4.32.0 (recommended)

## How to Get Started with the Model

```python
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # Either a local path or a URL
    {'text': "QUESTIONS/QUERIES GO HERE"},
])
```

## Training Details

### Training Data

Custom function-calling dataset with 70k examples.

### Training Procedure

QLoRA for 3 epochs.
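The exact training configuration is not published; for orientation, here is a minimal sketch of a typical QLoRA setup (4-bit NF4 quantization of the frozen base via `bitsandbytes`, LoRA adapters via `peft`). The rank, alpha, dropout, and target modules below are illustrative assumptions, not the hyperparameters of this release:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Low-rank adapters trained on top of the quantized weights (the "LoRA" part).
# Rank, alpha, dropout, and target modules are placeholder values, not the
# settings used for Lumixion-e1.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "attn.c_proj", "w1", "w2"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```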