Sreetama Sarkar

I am a 4th-year PhD student in the EESSC lab at the University of Southern California, advised by Prof. Peter Beerel. My research interests are energy efficiency and trustworthiness in multi-modal models. My recent work covers efficient fine-tuning and inference of Vision Transformers (ViTs) and Vision-Language Models (VLMs), as well as hallucination mitigation in VLMs.

Before this, I completed a Master of Science in Communication Engineering at the Technical University of Munich. I wrote my Master’s thesis on robustness-aware pruning methods for Convolutional Neural Networks in the Autonomous Driving Group at BMW. I completed my BTech, with a gold medal, in Electronics and Communications Engineering at the National Institute of Technology, Durgapur, India.

In my free time, I enjoy cooking as it helps me unwind, and I love to try out different cuisines. I also enjoy swimming, biking and playing badminton. My creative pursuits include painting and dancing.

news

May 18, 2026	Joined Dolby Laboratories as a PhD Research Intern in the Multimodal Perception Lab.
May 04, 2026	🏆 Honored to be awarded the USC WiSE Merit Award 2026-2027! 🌟
Apr 15, 2026	Successfully passed my PhD Qualifying Exam! ✨
Apr 15, 2026	🏆 Honored to be selected to participate in the 13th Heidelberg Laureate Forum 2026! 🌟
Aug 20, 2025	Our paper Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression has been accepted at EMNLP Main 2025!
May 19, 2025	Joined Samsung Research America as a Research Scientist Intern in the Visual Display Intelligence Lab! My research focuses on improving the efficiency of Vision-Language Models.
Jan 21, 2025	Our paper Region Masking to Accelerate Video Processing on Neuromorphic Hardware, in collaboration with Intel Labs, was accepted for oral presentation at ISQED 2025!
Oct 29, 2024	MaskVD accepted at WACV 2025!
Aug 15, 2024	🏆 Awarded the Annenberg Endowed Graduate Fellowship 2024-2025 at USC!
Aug 05, 2024	FixPix accepted at ICPR 2024! See you in Kolkata

selected publications

CVPR

RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning

Jingqi Xu, Jingxi Lu, Chenghao Li, and 3 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, Jun 2026

@inproceedings{Xu_2026_CVPR,
  author = {Xu, Jingqi and Lu, Jingxi and Li, Chenghao and Sarkar, Sreetama and Kundu, Souvik and Beerel, Peter A.},
  title = {RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  month = jun,
  year = {2026},
}

EMNLP

Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression

Sreetama Sarkar, Yue Che, Alex Gavin, and 2 more authors

Jun 2025

@inproceedings{sarkar2025spin,
  title = {Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression},
  author = {Sarkar, Sreetama and Che, Yue and Gavin, Alex and Beerel, Peter A. and Kundu, Souvik},
  year = {2025},
  booktitle = {EMNLP Main 2025},
}

WACV

MaskVD: Region Masking for Efficient Video Object Detection

Sreetama Sarkar, Gourav Datta, Souvik Kundu, and 3 more authors

Jun 2025

@inproceedings{sarkar2024maskvd,
  title = {MaskVD: Region Masking for Efficient Video Object Detection},
  author = {Sarkar, Sreetama and Datta, Gourav and Kundu, Souvik and Zheng, Kai and Bhattacharyya, Chirayata and Beerel, Peter A.},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year = {2025},
  eprint = {2407.12067},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV},
}

CVPRW

Block Selective Reprogramming for On-device Training of Vision Transformers

Sreetama Sarkar, Souvik Kundu, Kai Zheng, and 1 more author

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2024

@inproceedings{sarkarECV24,
  author = {Sarkar, Sreetama and Kundu, Souvik and Zheng, Kai and Beerel, Peter A.},
  title = {Block Selective Reprogramming for On-device Training of Vision Transformers},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year = {2024},
}

CVPRW

RLNet: Robust Linearized Networks for Efficient Private Inference

Sreetama Sarkar, Souvik Kundu, and Peter A. Beerel

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2024

@inproceedings{sarkar2024rlnet,
  author = {Sarkar, Sreetama and Kundu, Souvik and Beerel, Peter A.},
  title = {RLNet: Robust Linearized Networks for Efficient Private Inference},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year = {2024},
  pages = {244--253},
}

DAC

Accelerating and Pruning CNNs for Semantic Segmentation on FPGA

Pierpaolo Morì, Manoj-Rohit Vemparala, Nael Fasfous, and 8 more authors

In Proceedings of the 59th ACM/IEEE Design Automation Conference, Jun 2022

@inproceedings{mori2022accelerating,
  title = {Accelerating and Pruning {CNNs} for Semantic Segmentation on {FPGA}},
  author = {Mor{\`\i}, Pierpaolo and Vemparala, Manoj-Rohit and Fasfous, Nael and Mitra, Saptarshi and Sarkar, Sreetama and Frickenstein, Alexander and Frickenstein, Lukas and Helms, Domenik and Nagaraja, Naveen Shankar and Stechele, Walter and others},
  booktitle = {Proceedings of the 59th ACM/IEEE Design Automation Conference},
  pages = {145--150},
  year = {2022},
}

CVPRW

Adversarial Robust Model Compression using In-Train Pruning

Manoj-Rohit Vemparala, Nael Fasfous, Alexander Frickenstein, and 8 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2021

@inproceedings{vemparala2021adversarial,
  title = {Adversarial Robust Model Compression using In-Train Pruning},
  author = {Vemparala, Manoj-Rohit and Fasfous, Nael and Frickenstein, Alexander and Sarkar, Sreetama and Zhao, Qi and Kuhn, Sabine and Frickenstein, Lukas and Singh, Anmol and Unger, Christian and Nagaraja, Naveen-Shankar and others},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  pages = {66--75},
  year = {2021},
}