Vision AI Meets Multi-Modal Detection: A Unified Framework for Intelligent Perception

Authors

  • Anvita Savalagi, Student, Department of Information Science and Engineering, Basaveshwar Engineering College, Bagalkote, India
  • Anusha Bidarkundi, Student, Department of Information Science and Engineering, Basaveshwar Engineering College, Bagalkote, India
  • Arpita Deshapande, Student, Department of Information Science and Engineering, Basaveshwar Engineering College, Bagalkote, India
  • Bhumika Jambagi, Student, Department of Information Science and Engineering, Basaveshwar Engineering College, Bagalkote, India
  • Sandeep N. Kugali, Assistant Professor, Department of Information Science and Engineering, Basaveshwar Engineering College, Bagalkote, India

Abstract

The Vision AI – Multimodal Detection Suite is an artificial intelligence system that integrates five visual processing capabilities into a single platform: real-time face detection, object recognition, text extraction, language translation, and barcode management. The face detection module uses pre-trained models such as YOLO, together with OpenCV, to identify human faces in images and video streams. The object recognition component uses YOLOv8 to detect and classify multiple objects simultaneously, making it suitable for surveillance and automation. Text extraction is handled by Tesseract OCR, which lets the system read and digitize printed or handwritten text from visual data. The extracted text can then be translated into other languages through the Google Translate API, supporting multilingual communication. The barcode scanning feature uses the pyzbar library to detect and decode standard barcodes and QR codes, which is useful in inventory and retail applications. Built with Python, OpenCV, Flask, and the front-end technologies HTML, CSS, and JavaScript, the suite offers a user-friendly interface and demonstrates a practical implementation of multimodal AI for smart environments, automation, and accessibility.
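
As a rough illustration of how the five capabilities named in the abstract might sit behind one interface, the sketch below registers one handler per task in a small dispatcher. It is not the authors' implementation. The class `DetectionSuite`, the function `build_suite`, the task names, the `yolov8n.pt` weights file, and the use of an OpenCV Haar cascade for face detection are all assumptions made for the example. The translation step is passed in as a plain callable because the abstract does not say which Google Translate client is used.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class DetectionSuite:
    """Hypothetical registry that maps a task name to a handler callable."""
    handlers: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, task: str, handler: Callable[..., Any]) -> None:
        self.handlers[task] = handler

    def run(self, task: str, *args: Any, **kwargs: Any) -> Any:
        if task not in self.handlers:
            raise KeyError(f"unknown task: {task!r}")
        return self.handlers[task](*args, **kwargs)


# Third-party libraries are imported inside each handler, so the suite can be
# constructed even when some of them are not installed.

def detect_faces(image: Any) -> List[Tuple[int, ...]]:
    # Assumption: a Haar cascade bundled with OpenCV stands in for the
    # face model; the abstract only names "YOLO and OpenCV".
    import cv2
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in boxes]  # (x, y, w, h)


def detect_objects(image: Any, weights: str = "yolov8n.pt") -> List[str]:
    # Assumption: the small pretrained YOLOv8 checkpoint from ultralytics.
    # The model is reloaded on every call here; a real service would cache it.
    from ultralytics import YOLO
    result = YOLO(weights)(image)[0]
    return [result.names[int(c)] for c in result.boxes.cls]


def extract_text(image: Any) -> str:
    import pytesseract  # Python wrapper around the Tesseract OCR engine
    return pytesseract.image_to_string(image)


def decode_barcodes(image: Any) -> List[Tuple[str, str]]:
    from pyzbar.pyzbar import decode
    return [(code.type, code.data.decode("utf-8")) for code in decode(image)]


def build_suite(translate: Callable[[str, str], str]) -> DetectionSuite:
    """`translate(text, target_language)` is supplied by the caller."""
    suite = DetectionSuite()
    suite.register("faces", detect_faces)
    suite.register("objects", detect_objects)
    suite.register("ocr", extract_text)
    suite.register("translate", translate)
    suite.register("barcode", decode_barcodes)
    return suite
```

Under these assumptions, a Flask route could load the uploaded image and call something like `suite.run("ocr", image)`, then pass the result to `suite.run("translate", text, "es")`. Keeping each capability behind the same `run` call is one way to let the front end treat the five modules uniformly.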

Published

30-06-2025

Section

Articles

How to Cite

[1] A. Savalagi, A. Bidarkundi, A. Deshapande, B. Jambagi, and S. N. Kugali, “Vision AI Meets Multi-Modal Detection: A Unified Framework for Intelligent Perception”, IJRESM, vol. 8, no. 6, pp. 144–150, Jun. 2025, Accessed: Jul. 04, 2025. [Online]. Available: https://journal.ijresm.com/index.php/ijresm/article/view/3313