Research Scientist — Multimodal AI

Build AI systems that understand the world through vision, language, audio, and beyond — the foundation of genuine general intelligence.

About This Role

AGI must perceive the world the way humans do — across multiple modalities simultaneously. This role pushes the state of the art in multimodal understanding and generation.

Responsibilities

▸ Research and develop architectures that unify vision, language, and audio understanding
▸ Design multimodal pretraining objectives and evaluation benchmarks
▸ Run experiments on large-scale image-text and video-text datasets
▸ Collaborate with the alignment team to ensure multimodal models remain safe
▸ Publish research and contribute to the open-source multimodal community

Requirements

▸ PhD in Computer Vision, NLP, or Machine Learning
▸ Strong publication record in multimodal AI, vision-language models, or generative models
▸ Expert knowledge of contrastive learning, diffusion models, or autoregressive vision
▸ Proficiency with PyTorch and large-scale training infrastructure

Nice to Have

▸ Experience with video understanding or audio-visual models
▸ Background in human perception or psychophysics
▸ Prior work on CLIP, DALL-E, Flamingo, or similar systems

TAL Corp is an equal opportunity employer. We believe the best team reflects the full diversity of humanity — because we are building for all of it.

Apply Through Training

At TAL Corp you don't just send a résumé; you prove yourself. Apply by joining our training program. Top performers who complete it are hired into this role.

1. Register & start your 7-day program
2. Train, build real skills, earn a credential
3. Top performers → straight into hiring
