RVT-2: Learning Precise Manipulation from Few Demonstrations

Ankit Goyal Valts Blukis Jie Xu Yijie Guo Yu-Wei Chao Dieter Fox

Accepted to RSS 2024

A single RVT-2 model can perform multiple 3D manipulation tasks, including ones requiring millimeter-level precision.

RVT-2 trains significantly faster and achieves higher performance than the prior state-of-the-art methods, RVT and PerAct. With the same compute, RVT-2 trains 6X faster than RVT while improving performance by 19 percentage points. Compared with PerAct, RVT-2 achieves a relative performance improvement of 67%.

Summary

In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given task instructions. To be useful in various domains like manufacturing and the home, such a system must be capable of solving tasks that require high precision and of learning new tasks from few demonstrations.

Prior works like RVT and PerAct have studied this problem; however, they often struggle with tasks requiring high precision. We build upon these works to make them more performant, precise, and fast. We propose RVT-2, which is 6X faster in training and 2X faster in inference than its predecessor RVT. RVT-2 achieves a new state of the art on the multi-task RLBench benchmark, improving the success rate from 65% to 82%. RVT-2 is also effective in the real world, where it can learn tasks requiring high precision, like inserting a peg, from just 10 demonstrations.
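The page quotes gains in two different units: percentage points (an absolute difference between success rates) and relative improvement (a ratio against the baseline). The short sketch below shows how the two relate, using only the rounded figures quoted on this page. The function names are illustrative, and the baseline values it prints are merely implied by those rounded figures under the assumption that 82% is the RVT-2 success rate in every comparison; they are not reported results.

```python
def relative_gain_pct(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def baseline_from_absolute_gain(new: float, gain_pp: float) -> float:
    """Baseline success rate implied by an absolute gain in percentage points."""
    return new - gain_pp


def baseline_from_relative_gain(new: float, gain_pct: float) -> float:
    """Baseline success rate implied by a relative gain in percent."""
    return new / (1.0 + gain_pct / 100.0)


rvt2 = 82.0  # RVT-2 success rate on RLBench, as quoted (rounded)

# Absolute gain over the 65% figure quoted for the previous best result.
print(rvt2 - 65.0)  # 17.0 percentage points

# Baselines implied by the quoted gains (assumption: both are measured against 82%).
print(baseline_from_absolute_gain(rvt2, 19.0))            # 63.0, implied for RVT
print(round(baseline_from_relative_gain(rvt2, 67.0), 1))  # 49.1, implied for PerAct
```

Because the inputs are rounded, the implied baselines are only approximate. The point is that a gain in percentage points and a relative gain are not interchangeable.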

Experiment

A single RVT-2 model can perform the following tasks in the real world.

Pick and insert plug

Success: Pick and insert plug

Failure: Pick and insert plug

Pick and insert 8mm peg

Success: Pick and insert 8mm peg

Failure: Pick and insert 8mm peg

Pick and insert 16mm peg

Success: Pick and insert 16mm peg

Failure: Pick and insert 16mm peg

Put object in drawer

Success: Put green marker in drawer

Failure: Put blue marker in drawer

Put object in shelf

Success: Put green block in bottom shelf

Success: Put green block in top shelf

Put marker in bowl/mug

Success: Put green marker in bowl

Success: Put blue marker in bowl

Success: Put green marker in mug

Failure: Put blue marker in bowl

Press sanitizer

Success: Press sanitizer

Failure: Press sanitizer

Stack blocks

Success: Put blue block on red block

Success: Put green block on blue block

Success: Put red block on green block

Failure: Put red block on green block

Generalization Case Study

RVT-2 is tested on various unseen scenarios for the stack blocks task.

Failure Recovery Case Study

RVT-2 demonstrates the ability to recover from failures.

BibTeX

@article{goyal2024rvt,
  author    = {Goyal, Ankit and Blukis, Valts and Xu, Jie and Guo, Yijie and Chao, Yu-Wei and Fox, Dieter},
  title     = {RVT-2: Learning Precise Manipulation from Few Demonstrations},
  journal   = {RSS},
  year      = {2024},
}