Vision AI Meets Multi-Modal Detection: A Unified Framework for Intelligent Perception
Abstract
The Vision AI – Multimodal Detection Suite is an artificial intelligence system that integrates five core visual processing capabilities into a single, unified platform: real-time face detection, object recognition, text extraction, language translation, and barcode management. The face detection module combines OpenCV with pre-trained models such as YOLO to identify human faces in images and video streams. The object recognition component uses YOLOv8 to detect and classify multiple objects simultaneously, making it suitable for surveillance and automation. Text extraction is handled by Tesseract OCR, which reads and digitizes printed or handwritten text from visual data; the extracted text can then be translated into various languages through the Google Translate API, facilitating multilingual communication. The barcode scanning feature employs the pyzbar library to detect and decode standard barcodes and QR codes, which is useful in inventory and retail applications. Built with Python, OpenCV, and Flask on the back end and HTML, CSS, and JavaScript on the front end, the suite offers a user-friendly interface and demonstrates the practical application of multimodal AI in smart environments, automation, and accessibility.
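The unified-platform design described above can be sketched as a single dispatch layer that routes an incoming image to one of the five modules. The registry pattern, handler names, and stub bodies below are illustrative assumptions, not the authors' implementation; in the real suite each handler would wrap OpenCV/YOLO, Tesseract, the Google Translate API, or pyzbar respectively, and `process` would sit behind a Flask route.

```python
# Sketch of a unified entry point for the five perception modules.
# All handler names and return shapes here are assumptions for
# illustration; only two of the five modules are stubbed.
from typing import Callable, Dict

# Registry mapping a task name to its handler function.
HANDLERS: Dict[str, Callable[[bytes], dict]] = {}

def register(task: str):
    """Decorator that adds a handler to the suite's registry."""
    def wrap(fn: Callable[[bytes], dict]):
        HANDLERS[task] = fn
        return fn
    return wrap

@register("face_detection")
def detect_faces(image: bytes) -> dict:
    # Real version: run an OpenCV/YOLO face model on the decoded image.
    return {"task": "face_detection", "faces": []}

@register("barcode")
def decode_barcodes(image: bytes) -> dict:
    # Real version: pyzbar.pyzbar.decode on the decoded image.
    return {"task": "barcode", "codes": []}

def process(task: str, image: bytes) -> dict:
    """Single entry point, e.g. called from a Flask upload route."""
    if task not in HANDLERS:
        raise ValueError(f"unknown task: {task}")
    return HANDLERS[task](image)
```

Keeping every capability behind one `process` call is what lets the front end expose all five features through a single upload form rather than five separate endpoints.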
Downloads
License
Copyright (c) 2025 Anvita Savalagi, Anusha Bidarkundi, Arpita Deshapande, Bhumika Jambagi, Sandeep N. Kugali

This work is licensed under a Creative Commons Attribution 4.0 International License.