RehabGen

Generative Expert-in-the-Loop Vision-Language Model for Everyday Contextualized Rehabilitation Exercises (IMWUT'26)

(1)KAIST     (2)ASU     (3)Columbia University

🔎 RehabGen is a vision-language model that analyzes everyday egocentric images to generate personalized, clinically grounded rehabilitation exercise sequences.

Abstract

Self-directed rehabilitation in everyday environments remains difficult due to limited feedback, low adherence, and the absence of contextually grounded guidance. We present RehabGen, a generative vision-language model (VLM) for contextualized rehabilitation planning that integrates egocentric images, symptom descriptions, patient goals, quantitative assessments, and retrieval-augmented clinical context to generate personalized exercise sequences. Through preliminary interviews and a focus group interview (n=8), we identified domain requirements for feasible and object-aware home rehabilitation. We then constructed a 1.66k multimodal expert-preference dataset and refined the model using supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). In the Expert Validation (n=23), the DPO model achieved a 57.6% item-level preference win rate over the Baseline and outperformed the Baseline and SFT models on Personalized Recommendation (85.7%), Identify Purpose (63.6%), and Recommendation Satisfaction (66.7%). In the Patient Experience Evaluation (n=8), patients used first-person images to validate exercise feasibility. This work contributes an expert-in-the-loop (EITL) framework for clinically grounded, context-aware rehabilitation planning.
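
The preference-alignment step above uses Direct Preference Optimization over expert-labeled chosen/rejected exercise plans. As a minimal sketch (not the authors' code), the standard DPO objective for a single preference pair can be computed from summed response log-probabilities; the function name, scalar inputs, and `beta` value here are illustrative assumptions:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the expert-chosen and
    expert-rejected responses under the trainable policy (pi_*) and
    the frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no margin the loss is log(2); it shrinks as the policy
# favors the expert-chosen plan more strongly.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))   # ≈ 0.6931
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

Minimizing this margin-based loss pushes the fine-tuned model toward expert-preferred plans without training an explicit reward model, which is why DPO pairs naturally with an expert-in-the-loop preference dataset like the one described above.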