
Learning-Based Manipulation Baselines

Installation & Environment Setup

We recommend creating two separate Conda environments:

  • gr1 for training and evaluating the GR1 model.
  • genmanip for running the GenManip evaluation utilities.

This separation helps avoid dependency conflicts, especially for packages such as torch and robotics toolkits.

  1. Create a Conda environment:
    conda create -n gr1 python=3.10
    conda activate gr1
  2. Install dependencies:
    pip install -r requirements.txt
    pip install git+https://github.com/openai/CLIP.git
    pip3 install roboticstoolbox-python
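
After installation, a quick import check can confirm that the key packages resolved correctly; this is a minimal sketch targeting the gr1 environment (package names follow the pip installs above):

    # Run inside the activated gr1 environment
    python -c "import torch; print('torch', torch.__version__, 'CUDA available:', torch.cuda.is_available())"
    python -c "import clip; print('CLIP models:', clip.available_models())"
    python -c "import roboticstoolbox; print('roboticstoolbox OK')"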

Preparation for Training

Dataset Downloads

When using simulated data for training and evaluation, you can link the datasets into the saved/demonstrations directory under the GenManip project (a symlink sketch is given after the directory layout). The recommended folder structure under saved/ is as follows.

Data Preparation

Data Format & Structure

  • saved/
    • GR1/                # Checkpoints and training outputs of the GR1 model
    • Training_log/
    • demonstrations/     # Training data, organized by task
      • task_1/
        • episode_1/
        • episode_2/
      • task_2/
        • episode_1/
        • episode_2/
    • eval_results/
    • public/
      • ViT-B-32.pt
    • tasks/              # Evaluation datasets
      • task_1/
      • task_2/
    • vit_mae/
      • mae_pretrain_vit_base.pth
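
To populate saved/demonstrations/ with downloaded task data, one option is to symlink each task folder rather than copying it. A minimal sketch, assuming the datasets were extracted to /data/genmanip_datasets (a hypothetical path):

    cd /<your_path>/GenManip
    mkdir -p saved/demonstrations
    # Symlink each downloaded task folder into saved/demonstrations/ (source paths are hypothetical)
    ln -s /data/genmanip_datasets/task_1 saved/demonstrations/task_1
    ln -s /data/genmanip_datasets/task_2 saved/demonstrations/task_2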

Checkpoint Downloads

GR1 requires pretrained checkpoints from MAE and CLIP models. Please download the following files and place them under the recommended folders in your project directory, according to the structure described above:

  1. MAE Checkpoint Download link: MAE Pretrained ViT-Base

    Save to GenManip/saved/vit_mae/mae_pretrain_vit_base.pth

  2. CLIP ViT-B/32 Checkpoint Download link: CLIP ViT-B/32

    Save to GenManip/saved/public/ViT-B-32.pt

Make sure the checkpoints are correctly placed before running training or evaluation, as GR1 depends on these pretrained vision models.
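
After downloading, a quick check that both files are in place (a small sketch, run from the GenManip project root):

    test -f saved/vit_mae/mae_pretrain_vit_base.pth && echo "MAE checkpoint found" || echo "MAE checkpoint missing"
    test -f saved/public/ViT-B-32.pt && echo "CLIP checkpoint found" || echo "CLIP checkpoint missing"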

Training

  1. Begin by executing utils/stat_utils_args.py to generate the metadata that defines the training dataset used by the GR1 model. Before running the script, make sure to set the root_path variable to the absolute path of your local GenManip repository:
    if __name__ == "__main__":
        args = parse_args()
        root_path = "YOUR_GENMANIP_PATH"
        make_dir(f"{root_path}/baselines/learning_based_framework/GR1/data_info/")
    After execution, check that baselines/learning_based_framework/GR1/data_info/ contains the expected metadata files in both .json and .pkl formats.
  2. A complete example for running distributed training of the GR1 model is provided below. Some parameters may need to be customized before running:
    • banana_dataset_name: The directory under saved/demonstrations/ that contains the training episodes for a specific task, such as "banana".
    • banana_dataset_info_name: The metadata file name under baselines/learning_based_framework/GR1/data_info/, typically aligned with the dataset name. For example, "banana_200" indicates the dataset contains 200 training episodes.
    #!/bin/bash
    # -------- Networking & NCCL Config --------
    export NCCL_DEBUG=INFO
    export NCCL_IB_DISABLE=0
    export NCCL_IB_HCA=mlx5_bond_0
    export NCCL_SOCKET_IFNAME=eth0
    export NCCL_IB_GID_INDEX=3
    # Using async gradient all reduce
    export CUDA_DEVICE_MAX_CONNECTIONS=1
    # -------- Environment -------
    source /<your_path>/miniconda3/etc/profile.d/conda.sh
    conda activate gr1
    # -------- Training Configuration --------
    NODES=1
    NPROC_PER_NODE=8
    NODE_RANK=0
    MASTER_ADDR=127.0.0.1
    MASTER_PORT="29501"
    BATCH_JOB_ID=$(date +"%Y%m%d_%H%M%S")
    ROOT_DIR="/<your_path>/GenManip-Sim/"
    save_checkpoint_path="${ROOT_DIR}/saved/GR1"
    vit_checkpoint_path="${ROOT_DIR}/saved/vit_mae/mae_pretrain_vit_base.pth" # downloaded from https://drive.google.com/file/d/1bSsvRI4mDM3Gg51C6xO0l9CbojYw3OEt/view?usp=sharing
    clip_checkpoint_path="${ROOT_DIR}/saved/public/ViT-B-32.pt"
    data_dir="${ROOT_DIR}/saved/demonstrations/"
    model_name="<model_name>"
    wandb_project="<wandb_project_name>"
    banana_dataset_name="<banana_dataset_name>"
    banana_dataset_info_name="<banana_dataset_info_name>"
    OUTPUT_LOG="${ROOT_DIR}/saved/Training_log/${model_name}/${BATCH_JOB_ID}_${NODE_RANK}.log"
    mkdir -p "$(dirname "$OUTPUT_LOG")"
    OBS_CAMERA_TYPE="obs_camera"
    # -------- Launch Training --------
    torchrun --nnodes="${NODES}" \
    --node_rank="${NODE_RANK}" \
    --nproc_per_node="${NPROC_PER_NODE}" \
    --master_addr="${MASTER_ADDR}" \
    --master_port="${MASTER_PORT}" \
    train.py \
    --traj_cons \
    --rgb_pad 10 \
    --gripper_pad 4 \
    --gradient_accumulation_steps 1 \
    --bf16_module "vision_encoder" \
    --vit_checkpoint_path ${vit_checkpoint_path} \
    --calvin_dataset "" \
    --workers 8 \
    --clip_checkpoint_path ${clip_checkpoint_path} \
    --lr_scheduler cosine \
    --save_every_iter 100000 \
    --num_epochs 40 \
    --seed 42 \
    --batch_size 32 \
    --precision fp32 \
    --learning_rate 1e-3 \
    --save_checkpoint \
    --finetune_type banana \
    --root_dir ${data_dir} \
    --wandb_project ${wandb_project} \
    --report_to_wandb \
    --weight_decay 1e-4 \
    --num_resampler_query 6 \
    --run_name ${model_name} \
    --save_checkpoint_path ${save_checkpoint_path} \
    --except_lang \
    --transformer_layers 24 \
    --phase "finetune" \
    --action_pred_steps 1 \
    --sequence_length 10 \
    --future_steps 3 \
    --window_size 13 \
    --obs_pred \
    --loss_action \
    --loss_image \
    --save_checkpoint_seq 1 \
    --start_save_checkpoint -1 \
    --warmup_epochs 5 \
    --gripper_width \
    --banana_dataset_names ${banana_dataset_name} \
    --dataset_info_names ${banana_dataset_info_name} \
    --action_type "delta_qpos" \
    --use_aug_data \
    --obs_type ${OBS_CAMERA_TYPE}
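
To launch training, you might save the script above to a file and run it directly; the file name below is only an example. Given the save_checkpoint_path setting, checkpoints should accumulate under saved/GR1/, where the evaluation script later looks for saved/GR1/<model_name>/<epoch>.pth:

    # Hypothetical file name for the training script above
    bash train_gr1.sh
    # Checkpoints for the run are expected under saved/GR1/<model_name>/
    ls saved/GR1/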

Evaluation

To evaluate the model on specific tasks, you should place the corresponding scene data under saved/tasks/, using a directory name that matches the evaluation config name.

Additionally, ensure that the “model_name” and “banana_dataset_name” specified in the evaluation scripts are consistent with those used during training.
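
Concretely, for an evaluation config named <config_name>, the script below reads configs/tasks/<config_name>.yml and expects the scene data under saved/tasks/<config_name>/. A quick pre-flight check (a sketch, run from the project root):

    config_name="<config_name>"
    test -f "configs/tasks/${config_name}.yml" && echo "config found" || echo "config missing"
    test -d "saved/tasks/${config_name}" && echo "task scenes found" || echo "task scenes missing"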

  1. The following example provides a ready-to-use script for evaluating the GR1 model. Before executing the evaluation script, navigate to the directory baselines/learning_based_framework/GR1/.
    #!/bin/bash
    # ===== Environment Setup =====
    source /path/to/miniconda3/etc/profile.d/conda.sh
    conda activate gr1
    # ===== Configuration =====
    config_name="<config_name>" # Config name (should match a .yml file)
    epoch=39 # Checkpoint epoch
    ROOT_DIR="/path/to/GenManip" # Root path of the project
    MODEL_NAME="GR1"
    model_name="<model_name>"
    banana_dataset_name="<banana_dataset_name>"
    # Checkpoint paths
    resume_from_checkpoint="${ROOT_DIR}/saved/${MODEL_NAME}/${model_name}/${epoch}.pth"
    clip_checkpoint_path="${ROOT_DIR}/saved/public/ViT-B-32.pt"
    vit_checkpoint_path="${ROOT_DIR}/saved/vit_mae/mae_pretrain_vit_base.pth"
    # Evaluation config path
    CONFIG_PATH="${ROOT_DIR}/configs/tasks/${config_name}.yml"
    # Random ports to avoid conflicts
    receive_port=$((RANDOM % 50000 + 10000))
    send_port=$((RANDOM % 50000 + 10000))
    master_port=$((RANDOM % 50000 + 10000))
    OBS_CAMERA_TYPE="obs_camera"
    # ===== Launch Controller =====
    python -m torch.distributed.run \
    --nnodes=1 \
    --nproc_per_node=1 \
    --master_port=${master_port} \
    controller/controller.py \
    --traj_cons \
    --rgb_pad 10 \
    --gripper_pad 4 \
    --bf16_module vision_encoder \
    --vit_checkpoint_path ${vit_checkpoint_path} \
    --clip_checkpoint_path ${clip_checkpoint_path} \
    --workers 16 \
    --seed 42 \
    --batch_size 64 \
    --precision fp32 \
    --num_resampler_query 6 \
    --run_name test \
    --transformer_layers 24 \
    --phase evaluate \
    --finetune_type real \
    --action_pred_steps 1 \
    --sequence_length 10 \
    --future_steps 3 \
    --window_size 13 \
    --obs_pred \
    --resume_from_checkpoint ${resume_from_checkpoint} \
    --real_eval_max_steps 600 \
    --banana_dataset_names ${banana_dataset_name} \
    --gripper_width \
    --eval_libero_ensembling \
    --action_type delta_qpos \
    --obs_type ${OBS_CAMERA_TYPE} \
    --receive_port ${receive_port} \
    --send_port ${send_port} &
    # Wait for controller to start
    sleep 10
    # ===== Launch Evaluation =====
    conda activate genmanip
    python ${ROOT_DIR}/eval_V3.py \
    --receive_port ${send_port} \
    --send_port ${receive_port} \
    --config ${CONFIG_PATH}
    # Wait for background processes
    wait $!
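
Note that evaluation runs as two cooperating processes: controller/controller.py serves the trained GR1 policy inside the gr1 environment (launched in the background above), while eval_V3.py drives the GenManip simulation inside the genmanip environment. The two communicate over the randomly chosen ports, which is why the --receive_port and --send_port arguments are swapped between the two commands, and the final wait keeps the shell alive until the background controller exits.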