CV
General Information
Full Name | Sangeet Sagar |
Languages | Hindi (Native), English (C1), German (A2) |
Education
- April 2023
M.Sc.
Universität des Saarlandes, Saarbrücken, Germany
- GPA- 1.8 (ECTS, lower is better)
- Thesis- Noise Robust Speech Recognition for Search and Rescue Domain
- Major- Language Science and Technology
- June 2019
B.Tech.
The LNM Institute of Information Technology, Jaipur, India
- GPA- 7.13/10.0
- Thesis- Analysis of Emotion Recognition using Speech Features
- Major- Electronics and Communication Engineering
Experience
- Sept 2023 - July 2025
Research Engineer for Automatic Speech Recognition
EML Speech Technology GmbH, Munich, Germany
- Led the development of end-to-end models for real-time Automatic Speech Recognition (ASR) during live conferences in a commercial setting.
- Developed a C++ runtime for a streaming faster Conformer-Transducer (NeMo) and integrated its CPU-based decoder with our in-house end-to-end ASR decoder.
- Initiated and guided the integration of a target speaker extraction system into the core ASR pipeline, improving WER by 18% on overlapping speech and enabling deployment in challenging multi-speaker scenarios.
- May 2023 - Sept 2023
Speech-to-Text Intern
Airbus Defence and Space GmbH, Munich, Germany
- Developed speech-to-text systems for aerospace-domain data using state-of-the-art ASR models such as Wav2Vec2 and Whisper, with the aim of improving communication between pilots and air traffic control (ATC).
- June 2021 - Feb 2023
Research Assistant | HiWi
German Research Center for Artificial Intelligence (DFKI) GmbH, Saarbrücken, Germany
- Designed and developed a noise-robust German speech-to-text (STT) system as part of my M.Sc. thesis, enabling operation under adverse noise conditions such as search and rescue operations.
- Trained an open-source attention-based BiRNN punctuation restoration and truecasing system for German. The system outperformed the baseline Vosk model by over 14% in recall.
- Oct. 2019 - Dec 2020
University research assistant
Institute of Formal and Applied Linguistics, Charles University, Prague, Czechia
- Served as the principal tester and evaluator for a live spoken language translation (SLT) system (ELITR project), identifying key failure points and providing critical feedback.
- Trained and tested a Czech punctuation system for live ASR on in-domain and out-of-domain data, improving the usability of live ASR transcripts.
- Feb. 2019 - Sept. 2019
University research assistant
Faculty of Information Technology, Brno University of Technology, Brno, Czechia
- Developed and implemented a novel system for cross-lingual topic identification in low-resource languages (Kinyarwanda, Zulu, Hindi), building on a linear transformation into the English embedding space and achieving a weighted average precision of 0.52 on Kinyarwanda.
- Managed core tasks including text feature extraction, classifier training, and embedding generation for cross-lingual analysis.
Open Source Projects
- 2022
English and Chinese poetry generation
- Generated English poetry with LSTM, encoder-decoder, and transformer (GPT) models and compared them using perplexity (PPL) as the evaluation metric. Fine-tuning GPT-2 produced the highest-quality poems, i.e. those with the lowest PPL. Also trained a topic-prediction model to study how well a machine-generated poem is interpreted by a system trained on human-written poems. Report | Code
- 2022
Span extraction based slot-filling using attention and RNNs
- Performed slot-filling (and intent recognition) using an RNN with a multi-head attention mechanism. The main idea was to model slot-filling as span extraction and to use the available information about the slot type for which a value is to be provided. An F1 score of 0.83 was achieved, compared to 0.96 for the baseline model. Report | Code
- 2022
Evaluating and defending against stealthy backdoor attacks
- Presented defense strategies that counter backdoor attacks. These defenses significantly decrease the attack success rate (by 77%) on specific samples designed by the attacker. Each input is transformed so that the trigger words are replaced and the attack is not triggered. The defenses can be used without any significant runtime cost. Report
- 2021
Out-of-vocabulary (OOV) word estimation using subword representation
- Achieved a better OOV rate and perplexity score than the baseline for three levels of granularity (character level, small vocabulary, large vocabulary) with appropriate hyperparameter tuning. This was done by training an RNN-based language model to generate artificial corpora and computing the OOV rate on generated corpora of varying sizes. Report | Code
- 2019
Low Resource Languages for Emergent Incidents (LORELEI)
- 2018
(Bachelor's Thesis) Analysis of Emotion Recognition using Speech Features
- Achieved a 7% improvement over the baseline in classifying speech signals by human emotion (angry, disgust, fear, happy, etc.). Evaluated on the SAVEE and Emo-DB datasets with GMM, CNN, and MLPNN classifiers; proposed feature extraction based on the S-Transform and spectrogram images of the speech signal for emotion recognition. Report | Code
Programming Skills
- Programming Languages: Python, C++, Bash, MATLAB
- Libraries/Frameworks: PyTorch, K2/Icefall, SpeechBrain, Hugging Face
- Tools & Platforms: Docker, Git, AWS, HPC (SLURM), Linux (advanced)