Karun Sharma

I am a final-year undergraduate student in Artificial Intelligence and Computer Science, currently working at the Georgia Institute of Technology as a Research Intern under Prof. Vijay K Madisetti on multimodal visual grounding. Prior to this, I worked at Zocket AI as a Computer Vision Intern. My research interests lie in Multimodal Learning (Any-to-Any Models) and Embodied AI.

Email  /  GitHub  /  LinkedIn


Experience


Research Intern - Georgia Institute of Technology


08 - 2024

Working on multimodal visual grounding in images and videos using open-vocabulary computer vision techniques.


Computer Vision Intern - Zocket AI


02 - 2024

Worked on a content moderation engine that checked images generated by our AI models against the policies of various social media platforms. Trained and tuned a background removal model to produce fine-grained, smooth output.




Research

I'm interested in Computer Vision, Multimodal Learning, Machine Learning, and Optimization.


LLaVA-PlantDiag: Integrating Large-scale Vision-Language Abilities for Conversational Plant Pathology Diagnosis


Karun Sharma, Vidushee Vats, Abhinendra Singh, Rahul Sahani, Dr. Deepak Rai, Dr. Ashok Sharma
Preprint, 2024
website

LLaVA-PlantDiag is a conversational AI system for plant pathology diagnosis. We fine-tune the model using visual instruction tuning. Our model outperforms models such as GPT-4 Vision and Gemini. We also release the first multimodal dataset for plant pathology.


An Improved Hybrid Model for Target Detection


Umesh Gupta, Richa Golash, Vidushee Vats, Karun Sharma
International Conference on Emerging Techniques in Computational Intelligence, 2023
IEEE

We developed a refined detection model, building on the YOLO and R-CNN families, that detects multiple objects by fusing thermal and visible images. Fusion techniques including Multiscale Fusion, Channel-Based Fusion, and Blind Source Separation significantly improve target detection in hazardous environments, enhancing safety and security in critical applications such as autonomous driving and surveillance.




Projects





Design and source code from Jon Barron's website