Portfolio

Agnus The Troll
An interactive audio-video agent designed to mock and provoke attendees.
- Built a real-time audio-video loop connecting a dynamic microphone and webcam for live speech and visual input.
- Integrated the Gemini 2.5 Live API with WebSockets to stream and play spoken responses in continuous conversation.
- Configured session memory and a simple Gradio interface to let users start, stop and resume interaction seamlessly.
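
The capture, stream, and playback loop described above can be sketched with `asyncio` queues. This is a minimal illustration, not the project's actual code: `fake_model` is a hypothetical stand-in for the streaming API connection, the string chunks stand in for microphone and webcam frames, and all function names are assumptions made for the example.

```python
import asyncio


async def capture(source, outbound):
    """Push captured chunks (stand-ins for mic/webcam frames) onto the outbound queue."""
    for chunk in source:
        await outbound.put(chunk)
        await asyncio.sleep(0)  # yield control so the other tasks can run
    await outbound.put(None)  # sentinel: no more input


async def fake_model(outbound, inbound):
    """Hypothetical stand-in for the streaming API: emits one reply per input chunk."""
    while (chunk := await outbound.get()) is not None:
        await inbound.put(f"reply to {chunk}")
    await inbound.put(None)  # sentinel: no more replies


async def playback(inbound, played):
    """Consume replies as they arrive; a real agent would play audio here."""
    while (reply := await inbound.get()) is not None:
        played.append(reply)


async def run_loop(source):
    # Bounded queues apply backpressure so capture cannot run far ahead of playback.
    outbound = asyncio.Queue(maxsize=8)
    inbound = asyncio.Queue(maxsize=8)
    played = []
    await asyncio.gather(
        capture(source, outbound),
        fake_model(outbound, inbound),
        playback(inbound, played),
    )
    return played


def run_demo():
    return asyncio.run(run_loop(["chunk-0", "chunk-1", "chunk-2"]))


if __name__ == "__main__":
    print(run_demo())
```

Running the three stages as concurrent tasks keeps input, model responses, and playback flowing at the same time, which is what makes the conversation feel continuous.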
Generative AI, Audio, Video
Multimodal Alzheimer's Detection
Used spontaneous speech from clinician-patient conversations as a diagnostic signal for automated Alzheimer's detection.
- Designed a scalable ETL pipeline for recordings, handling audio processing, transcription, and feature extraction across multiple models to produce audio and text embeddings.
- Developed a training pipeline to evaluate multiple classifiers on audio, text, and combined embeddings.
- Found that text embeddings performed best (82% / 0.83 F1), while combining audio and text improved results over the audio-only baseline (70% / 0.74 F1).
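
The comparison across audio, text, and combined embeddings can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's pipeline: the combined features are assumed to be a simple concatenation of the two embeddings, a nearest-centroid classifier stands in for the classifiers actually evaluated, and every function name here is hypothetical.

```python
def fuse(audio_vec, text_vec):
    """Early fusion (assumed): concatenate the audio and text embeddings."""
    return list(audio_vec) + list(text_vec)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_centroids(X, y):
    """Stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, X):
    """Assign each vector to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda c: dist2(centroids[c], vec)) for vec in X]


def evaluate_feature_sets(feature_sets, y_train, y_test):
    """feature_sets maps a name to (X_train, X_test); returns name -> (accuracy, F1)."""
    results = {}
    for name, (X_train, X_test) in feature_sets.items():
        y_pred = predict(fit_centroids(X_train, y_train), X_test)
        results[name] = (accuracy(y_test, y_pred), f1_score(y_test, y_pred))
    return results
```

Passing a dictionary with `"audio"`, `"text"`, and `"combined"` entries (the last built with `fuse`) yields one accuracy and F1 pair per feature set, so the three representations are scored on the same train and test split.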
Machine Learning, Audio, Text