Task Planner
This agent leverages an LLM (GPT-4o) to perform its three designated tasks: Recaptioning, Temporal Composition and Task Decomposition.
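The Task Planner can be pictured as a small pipeline that sends one prompt per task to the LLM. The sketch below is only an illustration under stated assumptions: it assumes the three tasks run sequentially, each on the output of the previous one. The `ask_llm` callable, the prompt wording, the line- and semicolon-based reply formats, and the `Plan` fields are all hypothetical placeholders, not the system's actual prompts or interface.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical container for the planner's outputs; field names are illustrative.
@dataclass
class Plan:
    recaption: str
    segments: List[str]
    subtasks: List[List[str]]


def run_task_planner(user_prompt: str, ask_llm: Callable[[str], str]) -> Plan:
    """Run the three planner tasks in sequence (an assumed ordering).

    `ask_llm` is a hypothetical stand-in for a GPT-4o call that returns plain text.
    """
    # 1. Recaptioning: rewrite the user prompt into a clearer motion description.
    recaption = ask_llm(f"Rewrite as a precise motion description: {user_prompt}")

    # 2. Temporal Composition: order the actions in time,
    #    assumed here to come back one action per line.
    timeline = ask_llm(f"List the actions in temporal order, one per line: {recaption}")
    segments = [line.strip() for line in timeline.splitlines() if line.strip()]

    # 3. Task Decomposition: split each action into finer-grained steps,
    #    assumed here to come back separated by semicolons.
    subtasks = []
    for segment in segments:
        reply = ask_llm(f"Decompose into simple steps separated by ';': {segment}")
        subtasks.append([step.strip() for step in reply.split(";") if step.strip()])

    return Plan(recaption, segments, subtasks)


# Usage with a canned, hypothetical LLM so the pipeline can run offline.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Rewrite"):
        return "A person walks forward, then waves."
    if prompt.startswith("List"):
        return "walk forward\nwave"
    if "wave" in prompt:
        return "raise arm; move hand side to side"
    return "step left foot; step right foot"


plan = run_task_planner("walk then wave", fake_llm)
```

Injecting the LLM as a plain callable keeps the planner logic testable without network access; a real deployment would swap `fake_llm` for an actual model client.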
Trajectory Editor
If the Task Planner identifies trajectory information, the Trajectory Editor prompts GPT-4o to generate a mathematical function that computes trajectory coordinates for the pelvis joint. This function guides the entire generated sequence along the path specified by the user.
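To make the idea concrete, the sketch below shows the kind of function such a prompt might yield and how it could be sampled per frame. It assumes, hypothetically, that the generated function maps normalized time t in [0, 1] to a pelvis position, uses a y-up convention with a fixed pelvis height, and describes a circular path; `circle_path`, `radius`, `height` and the 120-frame length are illustrative choices, not values from the system.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


# Example of a function the LLM might emit for "walk in a circle" (hypothetical).
def circle_path(t: float, radius: float = 2.0, height: float = 0.9) -> Point:
    """Pelvis position at normalized time t in [0, 1]; y is up, height is fixed."""
    angle = 2.0 * math.pi * t
    return (radius * math.cos(angle), height, radius * math.sin(angle))


def sample_trajectory(path: Callable[[float], Point], num_frames: int) -> List[Point]:
    """Evaluate the path once per frame so every frame gets a pelvis target."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    # Frame i maps to t = i / (num_frames - 1), so the first and last frames
    # hit the start (t = 0) and end (t = 1) of the path exactly.
    return [path(i / (num_frames - 1)) for i in range(num_frames)]


targets = sample_trajectory(circle_path, num_frames=120)
```

Expressing the path as a closed-form function of time, rather than a list of waypoints, lets the same output be resampled at any sequence length.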
Motion Generator
Once the text-processing agents have completed their tasks, the Motion Generator agent uses SPAM to generate and edit a motion sequence according to the fine-grained instructions.
Motion Reviewer
Powered by an instruction-fine-tuned VLM (VideoChat2), the Motion Reviewer captions the sequence generated by SPAM, using colored human models rendered in Blender as its visual input. It then applies cosine similarity between this caption and the original user prompt to gauge how relevant the motion is. If the score is not above an arbitrarily set threshold, the agent initiates the self-correction pipeline.
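The review step reduces to comparing two vectors and retrying when they disagree. The sketch below assumes the caption and the prompt are turned into embedding vectors by some text encoder (the source does not name one) and uses a hypothetical threshold of 0.8 and retry limit of 3. The `generate`, `caption` and `embed` callables are placeholders for SPAM, VideoChat2 on the Blender renders, and the encoder, respectively; passing the rejected caption back as feedback is an assumed form of self-correction, not the system's documented mechanism.

```python
import math
from typing import Any, Callable, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def review_loop(
    prompt: str,
    generate: Callable[[str, Optional[str]], Any],  # hypothetical generator call
    caption: Callable[[Any], str],                  # hypothetical VLM captioner
    embed: Callable[[str], Sequence[float]],        # hypothetical text encoder
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[Any, float]:
    """Generate, caption and score until the caption matches the prompt."""
    prompt_vec = embed(prompt)
    feedback: Optional[str] = None
    motion: Any = None
    score = 0.0
    for _ in range(max_rounds):
        motion = generate(prompt, feedback)
        text = caption(motion)
        score = cosine_similarity(embed(text), prompt_vec)
        if score > threshold:
            return motion, score
        # Not above the threshold: feed the mismatching caption back to the
        # next generation round (an assumed self-correction signal).
        feedback = text
    # Retry budget exhausted: return the last attempt and its score.
    return motion, score
```

Capping the number of rounds keeps a poorly matching prompt from looping indefinitely; the caller can inspect the returned score to decide whether the final attempt is acceptable.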