# Learning-Based Manipulation Baselines
## Installation & Environment Setup
We recommend creating two separate Conda environments:

- `gr1` for training and evaluating the GR1 model.
- `genmanip` for running the GenManip evaluation utilities.

This separation helps avoid dependency conflicts, especially for packages like `torch` or robotics toolkits.
- Create a Conda environment:

  ```bash
  conda create -n gr1 python=3.10
  conda activate gr1
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  pip install git+https://github.com/openai/CLIP.git
  pip3 install roboticstoolbox-python
  ```
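As a quick, optional sanity check (assuming the `gr1` environment is active), you can confirm that the core dependencies import cleanly before moving on:

```bash
# Optional sanity check; run inside the gr1 environment.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import clip, roboticstoolbox; print('CLIP and roboticstoolbox OK')"
```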
## Preparation for Training
### Dataset Downloads
When using simulated data for training and evaluation, you can link the datasets into the `saved/demonstrations/` directory under the GenManip project (see the linking example after the directory tree below). The recommended folder structure under `saved/` is as follows.
### Data Format & Structure
```
saved/
├── GR1/                    # Checkpoints and training outputs of the GR1 model
│   └── …
├── Training_log/
│   └── …
├── demonstrations/         # Training data, organized by task
│   ├── task_1/
│   │   ├── episode_1/
│   │   │   └── …
│   │   └── episode_2/
│   │       └── …
│   └── task_2/
│       ├── episode_1/
│       │   └── …
│       └── episode_2/
│           └── …
├── eval_results/
│   └── …
├── public/
│   └── ViT-B-32.pt
├── tasks/                  # Evaluation datasets
│   ├── task_1/
│   │   └── …
│   └── task_2/
│       └── …
└── vit_mae/
    └── mae_pretrain_vit_base.pth
```
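For instance, if your generated demonstrations already live elsewhere on disk, you could symlink them into `saved/demonstrations/` so they match the layout above. All paths in this sketch are placeholders, not part of the repository:

```bash
# Hypothetical paths: link an existing demonstration folder into the expected location.
mkdir -p /path/to/GenManip/saved/demonstrations
ln -s /data/genmanip_demos/banana /path/to/GenManip/saved/demonstrations/banana
```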
### Checkpoint Downloads
GR1 requires pretrained checkpoints from MAE and CLIP models. Please download the following files and place them under the recommended folders in your project directory, according to the structure described above:
- **MAE checkpoint** (download link: MAE Pretrained ViT-Base): save to `GenManip/saved/vit_mae/mae_pretrain_vit_base.pth`
- **CLIP ViT-B/32 checkpoint** (download link: CLIP ViT-B/32): save to `GenManip/saved/public/ViT-B-32.pt`
Make sure the checkpoints are correctly placed before running training or evaluation, as GR1 depends on these pretrained vision models.
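From the GenManip project root, a quick way to confirm both files are in place:

```bash
# Both files must exist before launching training or evaluation.
ls -lh saved/vit_mae/mae_pretrain_vit_base.pth saved/public/ViT-B-32.pt
```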
## Training
- Begin by executing `utils/stat_utils_args.py` to define the training dataset to be used by the GR1 model. Before running the script, make sure to set the `root_path` variable to the absolute path of your local GenManip repository:

  ```python
  if __name__ == "__main__":
      args = parse_args()
      root_path = "YOUR_GENMANIP_PATH"
      make_dir(f"{root_path}/baselines/learning_based_framework/GR1/data_info/")
  ```

  After execution, check that `baselines/learning_based_framework/GR1/data_info/` contains the expected metadata files in both `.json` and `.pkl` formats.

- A complete example for running distributed training of the GR1 model is provided below. Some parameters may need to be customized before running (a sketch of how they map onto the directory layout follows the script):
  - `banana_dataset_name`: the directory under `saved/demonstrations/` that contains the training episodes for a specific task, such as `"banana"`.
  - `banana_dataset_info_name`: the metadata file name under `baselines/learning_based_framework/GR1/data_info/`, typically aligned with the dataset name. For example, `"banana_200"` indicates the dataset contains 200 training episodes.
```bash
#!/bin/bash

# -------- Networking & NCCL Config --------
export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5_bond_0
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_GID_INDEX=3
# Using async gradient all reduce
export CUDA_DEVICE_MAX_CONNECTIONS=1

# -------- Environment --------
source /<your_path>/miniconda3/etc/profile.d/conda.sh
conda activate gr1

# -------- Training Configuration --------
NODES=1
NPROC_PER_NODE=8
NODE_RANK=0
MASTER_ADDR=127.0.0.1
MASTER_PORT="29501"
BATCH_JOB_ID=$(date +"%Y%m%d_%H%M%S")

ROOT_DIR="/<your_path>/GenManip-Sim/"
save_checkpoint_path="${ROOT_DIR}/saved/GR1"
vit_checkpoint_path="${ROOT_DIR}/saved/vit_mae/mae_pretrain_vit_base.pth"  # downloaded from https://drive.google.com/file/d/1bSsvRI4mDM3Gg51C6xO0l9CbojYw3OEt/view?usp=sharing
clip_checkpoint_path="${ROOT_DIR}/saved/public/ViT-B-32.pt"
data_dir="${ROOT_DIR}/saved/demonstrations/"

model_name="<model_name>"
wandb_project="<wandb_project_name>"
banana_dataset_name="<banana_dataset_name>"
banana_dataset_info_name="<banana_dataset_info_name>"

OUTPUT_LOG="${ROOT_DIR}/saved/Training_log/${model_name}/${BATCH_JOB_ID}_${NODE_RANK}.log"
mkdir -p "$(dirname "$OUTPUT_LOG")"

OBS_CAMERA_TYPE="obs_camera"

# -------- Launch Training --------
torchrun --nnodes="${NODES}" \
    --node_rank="${NODE_RANK}" \
    --nproc_per_node="${NPROC_PER_NODE}" \
    --master_addr="${MASTER_ADDR}" \
    --master_port="${MASTER_PORT}" \
    train.py \
    --traj_cons \
    --rgb_pad 10 \
    --gripper_pad 4 \
    --gradient_accumulation_steps 1 \
    --bf16_module "vision_encoder" \
    --vit_checkpoint_path ${vit_checkpoint_path} \
    --calvin_dataset "" \
    --workers 8 \
    --clip_checkpoint_path ${clip_checkpoint_path} \
    --lr_scheduler cosine \
    --save_every_iter 100000 \
    --num_epochs 40 \
    --seed 42 \
    --batch_size 32 \
    --precision fp32 \
    --learning_rate 1e-3 \
    --save_checkpoint \
    --finetune_type banana \
    --root_dir ${data_dir} \
    --wandb_project ${wandb_project} \
    --report_to_wandb \
    --weight_decay 1e-4 \
    --num_resampler_query 6 \
    --run_name ${model_name} \
    --save_checkpoint_path ${save_checkpoint_path} \
    --except_lang \
    --transformer_layers 24 \
    --phase "finetune" \
    --action_pred_steps 1 \
    --sequence_length 10 \
    --future_steps 3 \
    --window_size 13 \
    --obs_pred \
    --loss_action \
    --loss_image \
    --save_checkpoint_seq 1 \
    --start_save_checkpoint -1 \
    --warmup_epochs 5 \
    --gripper_width \
    --banana_dataset_names ${banana_dataset_name} \
    --dataset_info_names ${banana_dataset_info_name} \
    --action_type "delta_qpos" \
    --use_aug_data \
    --obs_type ${OBS_CAMERA_TYPE}
```
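For reference, here is a sketch of how the two dataset parameters map onto the directory layout, plus a hypothetical way to launch the script. The file names below are illustrative, not part of the repository:

```bash
# Illustrative correspondence (names are examples):
#   saved/demonstrations/banana/                                      -> banana_dataset_name="banana"
#   baselines/learning_based_framework/GR1/data_info/banana_200.json
#   (and the matching .pkl)                                           -> banana_dataset_info_name="banana_200"

# Hypothetical launch (script name and log path are placeholders); per-epoch
# checkpoints are written to saved/GR1/<model_name>/, which the evaluation
# script below resumes from as <epoch>.pth.
bash scripts/train_gr1.sh 2>&1 | tee saved/Training_log/manual_run.log
```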
## Evaluation
To evaluate the model on specific tasks, place the corresponding scene data under `saved/tasks/`, using a directory name that matches the evaluation config name. Additionally, ensure that the `model_name` and `banana_dataset_name` specified in the evaluation script are consistent with those used during training.
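For example, with a hypothetical config named `pick_banana` (the name is illustrative; use whichever config you actually evaluate), the layout pairs up as follows:

```bash
# The task directory name under saved/tasks/ matches the evaluation config name.
ls configs/tasks/pick_banana.yml   # evaluation config (illustrative name)
ls saved/tasks/pick_banana/        # corresponding scene data
```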
- The following example provides a ready-to-use script for evaluating the GR1 model. Before executing it, navigate to the `baselines/learning_based_framework/GR1/` directory.
```bash
#!/bin/bash

# ===== Environment Setup =====
source /path/to/miniconda3/etc/profile.d/conda.sh
conda activate gr1

# ===== Configuration =====
config_name="<config_name>"   # Config name (should match a .yml file)
epoch=39                      # Checkpoint epoch
ROOT_DIR="/path/to/GenManip"  # Root path of the project
MODEL_NAME="GR1"
model_name="<model_name>"
banana_dataset_name="<banana_dataset_name>"

# Checkpoint paths
resume_from_checkpoint="${ROOT_DIR}/saved/${MODEL_NAME}/${model_name}/${epoch}.pth"
clip_checkpoint_path="${ROOT_DIR}/saved/public/ViT-B-32.pt"
vit_checkpoint_path="${ROOT_DIR}/saved/vit_mae/mae_pretrain_vit_base.pth"

# Evaluation config path
CONFIG_PATH="${ROOT_DIR}/configs/tasks/${config_name}.yml"

# Random ports to avoid conflicts
receive_port=$((RANDOM % 50000 + 10000))
send_port=$((RANDOM % 50000 + 10000))
master_port=$((RANDOM % 50000 + 10000))

OBS_CAMERA_TYPE="obs_camera"

# ===== Launch Controller =====
python -m torch.distributed.run \
    --nnodes=1 \
    --nproc_per_node=1 \
    --master_port=${master_port} \
    controller/controller.py \
    --traj_cons \
    --rgb_pad 10 \
    --gripper_pad 4 \
    --bf16_module vision_encoder \
    --vit_checkpoint_path ${vit_checkpoint_path} \
    --clip_checkpoint_path ${clip_checkpoint_path} \
    --workers 16 \
    --seed 42 \
    --batch_size 64 \
    --precision fp32 \
    --num_resampler_query 6 \
    --run_name test \
    --transformer_layers 24 \
    --phase evaluate \
    --finetune_type real \
    --action_pred_steps 1 \
    --sequence_length 10 \
    --future_steps 3 \
    --window_size 13 \
    --obs_pred \
    --resume_from_checkpoint ${resume_from_checkpoint} \
    --real_eval_max_steps 600 \
    --banana_dataset_names ${banana_dataset_name} \
    --gripper_width \
    --eval_libero_ensembling \
    --action_type delta_qpos \
    --obs_type ${OBS_CAMERA_TYPE} \
    --receive_port ${receive_port} \
    --send_port ${send_port} &

# Wait for controller to start
sleep 10

# ===== Launch Evaluation =====
conda activate genmanip
python ${ROOT_DIR}/eval_V3.py \
    --receive_port ${send_port} \
    --send_port ${receive_port} \
    --config ${CONFIG_PATH}

# Wait for background processes
wait $!
```
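As a hypothetical usage example, you might save the script above as `scripts/eval_gr1.sh` (the file name is an assumption) and run it from `baselines/learning_based_framework/GR1/`; judging from the `saved/` tree above, evaluation outputs would be expected under `saved/eval_results/`:

```bash
# Hypothetical file name; the script launches the GR1 controller in the background
# and then starts the GenManip evaluator, so both the gr1 and genmanip envs must exist.
cd baselines/learning_based_framework/GR1/
bash scripts/eval_gr1.sh
ls /path/to/GenManip/saved/eval_results/
```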