Sreetama Sarkar


I am a fourth-year PhD student in the EESSC lab at the University of Southern California, advised by Prof. Peter Beerel. My research interests involve energy efficiency and trustworthiness in multi-modal models. My recent work includes efficient fine-tuning and inference of Vision Transformers (ViTs) and Vision Language Models (VLMs), as well as hallucination mitigation in VLMs.

Before this, I completed a Master of Science in Communication Engineering at the Technical University of Munich. I conducted my Master’s thesis on robustness-aware pruning methods for Convolutional Neural Networks in the Autonomous Driving Group at BMW. I completed my BTech with a gold medal in Electronics and Communications Engineering at the National Institute of Technology, Durgapur, India.

In my free time, I enjoy cooking as it helps me unwind, and I love to try out different cuisines. I also enjoy swimming, biking and playing badminton. My creative pursuits include painting and dancing.

news

Aug 20, 2025 Our paper Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression has been accepted at EMNLP Main 2025!
May 19, 2025 Joined Samsung Research America as a Research Scientist Intern in the Visual Display Intelligence Lab! My research focuses on improving efficiency in Vision Language Models.
Jan 21, 2025 Our paper Region Masking to Accelerate Video Processing on Neuromorphic Hardware, in collaboration with Intel Labs, has been accepted for ORAL presentation at ISQED 2025!
Oct 29, 2024 MaskVD accepted at WACV 2025!
Aug 15, 2024 Awarded the Annenberg Endowed Graduate Fellowship 2024-2025 at USC!
Aug 05, 2024 FixPix accepted at ICPR 2024! See you in Kolkata ☺️
Jul 16, 2024 Our recent work MaskVD, where we explore region masking for efficient video inference, is now on arXiv!
Jun 27, 2024 My video won the 2-minute Video Contest at DAC Young Fellows Program 2024! ✨
Apr 10, 2024 Nominated as Outstanding Mentor in the Viterbi Graduate Mentorship Program!
Apr 08, 2024 Two papers, RLNet and BSR, accepted for ORAL presentation at CVPR Workshops TCV and ECV (overall acceptance rate: 32.6%)!

selected publications

  1. EMNLP
    Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
    Sreetama Sarkar, Yue Che, Alex Gavin, and 2 more authors
    2025
  2. WACV
    MaskVD: Region Masking for Efficient Video Object Detection
    Sreetama Sarkar, Gourav Datta, Souvik Kundu, and 3 more authors
    2025
  3. CVPRW
    Block Selective Reprogramming for On-device Training of Vision Transformers
    Sreetama Sarkar, Souvik Kundu, Kai Zheng, and 1 more author
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024
  4. CVPRW
    RLNet: Robust Linearized Networks for Efficient Private Inference
    Sreetama Sarkar, Souvik Kundu, and Peter A. Beerel
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024
  5. DAC
    Accelerating and Pruning CNNs for Semantic Segmentation on FPGA
    Pierpaolo Morì, Manoj-Rohit Vemparala, Nael Fasfous, and 8 more authors
    In Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022
  6. CVPRW
    Adversarial Robust Model Compression Using In-Train Pruning
    Manoj-Rohit Vemparala, Nael Fasfous, Alexander Frickenstein, and 8 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021