CV

General Information

Full Name: Sangeet Sagar
Languages: Hindi (native), English (C1), German (A2)

Education

Experience

  • Sept. 2023 - July 2025
    Research Engineer for Automatic Speech Recognition
    EML Speech Technology GmbH, Munich, Germany
    • Led the development of end-to-end models for real-time Automatic Speech Recognition (ASR) of live conferences in a commercial setting.
    • Developed a C++ runtime for a streaming FastConformer-Transducer model (NVIDIA NeMo) and integrated its CPU-based decoder with our in-house end-to-end ASR decoder.
    • Initiated and guided the integration of a target speaker extraction system into the core ASR pipeline, reducing WER by 18% on overlapping speech and enabling deployment in challenging multi-speaker scenarios.
  • May 2023 - Sept. 2023
    Speech-to-Text Intern
    Airbus Defence and Space GmbH, Munich, Germany
    • Developed speech-to-text systems for aerospace-domain data using state-of-the-art ASR models such as Wav2Vec2 and Whisper, to improve communication between pilots and air traffic control (ATC).
  • June 2021 - Feb. 2023
    Research Assistant | HiWi
    German Research Center for Artificial Intelligence (DFKI) GmbH, Saarbrücken, Germany
    • Designed and developed a noise-robust German automatic speech recognition (STT) system as part of my MS thesis, enabling it to operate under hostile noisy conditions such as search-and-rescue operations.
    • Trained an open-source attention-based BiRNN punctuation restoration and truecasing system for German. The system outperformed the baseline Vosk model by over 14% in recall.
  • Oct. 2019 - Dec. 2020
    University Research Assistant
    Institute of Formal and Applied Linguistics, Charles University, Prague, Czechia
    • Served as the principal tester and evaluator for a live spoken language translation (SLT) system (ELITR project), identifying key failure points and providing critical feedback.
    • Trained and tested a Czech punctuation system for live ASR on in-domain and out-of-domain data, improving the usability of live ASR transcripts.
  • Feb. 2019 - Sept. 2019
    University Research Assistant
    Faculty of Information Technology, Brno University of Technology, Brno, Czechia
    • Developed and implemented a novel system for cross-lingual topic identification in low-resource languages (Kinyarwanda, Zulu, Hindi), building on a linear transformation into the English embedding space and achieving a weighted average precision of 0.52 on Kinyarwanda.
    • Managed core tasks including text feature extraction, classifier training, and embedding generation for cross-lingual analysis.

Open Source Projects

  • 2022
    English and Chinese poetry generation
    • Generated and compared English poetry using LSTM, encoder-decoder, and Transformer (GPT) models, with perplexity (PPL) as the evaluation metric. A fine-tuned GPT-2 model produced the highest-quality poems, i.e. those with the lowest PPL. Also trained a topic-prediction model to study how well a system trained on human-written poems interprets machine-generated ones. Report | Code
  • 2022
    Span-extraction-based slot filling using attention and RNNs
    • Performed slot filling (and intent recognition) using an RNN with a multi-head attention mechanism. The main idea was to model slot filling as span extraction and to exploit the available information about the slot type whose value is to be extracted. Achieved an F1 score of 0.83, compared to 0.96 for the baseline model. Report | Code
  • 2022
    Evaluating and defending against stealthy backdoor attacks
    • Presented defense strategies against backdoor attacks that reduce the attack success rate by 77% on samples specifically crafted by the attacker. Each input is transformed so that the trigger words are replaced and the backdoor is not activated. The defenses add no significant runtime cost. Report
  • 2021
    Out-of-vocabulary (OOV) word estimation using subword representation
    • Trained an RNN-based language model to generate artificial corpora and computed the OOV rate on generated corpora of varying sizes. With appropriate hyperparameter tuning, achieved a better OOV rate and perplexity than the baseline at three levels of granularity (character level, small vocabulary, large vocabulary). Report | Code
  • 2021
    LSTM-Based Part-of-Speech Tagger
    • Performed POS tagging with LSTM and Bi-LSTM models using word embeddings that incorporate subword information, and compared them with a bigram HMM-based part-of-speech tagger decoded with the Viterbi algorithm. Report | Code
  • 2019
    Low Resource Languages for Emergent Incidents (LORELEI)
    • Performed topic identification on low-resource languages using multinomial subspace models, as part of the DARPA-funded LORELEI program. Achieved a weighted average precision of 0.5212 over 10 topics on Kinyarwanda. PPT | Code
  • 2018
    (Bachelor's Thesis) Analysis of Emotion Recognition Using Speech Features
    • Proposed feature extraction based on the S-Transform and on spectrogram images of the speech signal for emotion recognition, classifying speech into emotions such as anger, disgust, fear, and happiness. Evaluated on the SAVEE and Emo-DB datasets with GMM, CNN, and MLPNN classifiers, achieving a 7% improvement over the baseline. Report | Code

Programming Skills

  • Programming Languages: Python, C++, Bash, MATLAB
  • Libraries/Frameworks: PyTorch, k2/Icefall, SpeechBrain, Hugging Face
  • Tools & Platforms: Docker, Git, AWS, HPC (Slurm), advanced Linux