Communication Protocol
As introduced in the Quick Start section through the use of `fake_port`, the platform employs a socket-based communication mechanism to exchange structured data between the simulation controller and the inference model. You can extend the `fake_port` example to integrate your own model controller for benchmark evaluation.
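The snippets later in this section use helpers named `wait_message` and `send_message` over a receive socket and a send socket. Their actual implementation lives in the `fake_port` example; the sketch below is only an illustration of the idea, assuming length-prefixed pickled payloads over plain TCP sockets, which may differ from the real transport.

```python
import pickle
import socket
import struct

def send_message(sock: socket.socket, payload) -> None:
    # Serialize the payload and prefix it with a 4-byte length header.
    blob = pickle.dumps(payload)
    sock.sendall(struct.pack("!I", len(blob)) + blob)

def wait_message(sock: socket.socket):
    # Read the 4-byte length header, then the full pickled payload.
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed while reading message")
        buf += chunk
    return buf
```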
Outgoing Data
At each simulation timestep, the system sends the following structured data through a socket. In practice you may receive additional fields; these are deprecated and will be removed in future releases.
data = { "camera_data": camera_data, // Dict, camera observations "instruction": instruction, // String, task instruction "joint_position_state": joint_position_state, // np.ndarray, current joint positions "ee_pose_state": ee_pose_state, // list[np.ndarray] or list[list[np.ndarray]], end-effector pose(s) "timestep": step, // Int, current simulation step "reset": reset, // Bool, whether to reset the environment or model}
Field Definitions
| Field Name | Type | Description |
|---|---|---|
| `camera_data` | `dict` | Data captured from the robot-mounted cameras. |
| `instruction` | `str` | The natural language command or task instruction given to the agent. |
| `joint_position_state` | `np.ndarray`, shape `(9,)` | Current joint angles of the robot arm (e.g., Franka). |
| `ee_pose_state` | `list[np.ndarray]` or `list[list[np.ndarray]]` | End-effector pose(s) of the robot in the robot frame. For single-arm robots, this is a list of two arrays: translation (3D) and orientation (4D, scalar-first quaternion). For dual-arm robots, it is a nested list for the left and right arms, each with its own translation and orientation. |
| `timestep` | `int` | The current simulation step index in the rollout. |
| `reset` | `bool` | Whether the environment or model should reset. |
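As a brief sketch of how a controller might unpack these fields, see below. The `unpack_observation` helper and the dual-arm check are illustrative assumptions based on the nesting described above, not part of the platform API.

```python
import numpy as np

def unpack_observation(data):
    """Split an incoming message into the pieces a policy typically needs."""
    joints = np.asarray(data["joint_position_state"])  # shape (9,) for Franka
    ee_pose = data["ee_pose_state"]

    # Single-arm: [translation (3,), quaternion (4,), scalar-first].
    # Dual-arm:   [[t_left, q_left], [t_right, q_right]].
    if isinstance(ee_pose[0], (list, tuple)):
        (t_left, q_left), (t_right, q_right) = ee_pose
        poses = {"left": (t_left, q_left), "right": (t_right, q_right)}
    else:
        translation, quaternion = ee_pose
        poses = {"single": (translation, quaternion)}

    return {
        "instruction": data["instruction"],
        "joints": joints,
        "ee_poses": poses,
        "cameras": data["camera_data"],
        "reset": data["reset"],
    }
```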
The model or controller must then respond with a structured action message.
Camera Data Structure
Each entry in the `camera_data` dictionary contains the following fields:
| Field Name | Type | Description |
|---|---|---|
| `p` | `np.ndarray`, shape `(3,)` | Camera position in world coordinates. |
| `q` | `np.ndarray`, shape `(4,)` | Camera orientation in world coordinates (quaternion, scalar-first). |
| `rgb` | `np.ndarray` | RGB image array. The resolution depends on the configuration in `configs/cameras/`. |
| `depth` | `np.ndarray` | Depth image array. |
| `intrinsics_matrix` | `np.ndarray`, shape `(3, 3)` | Camera intrinsic matrix. |
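As an example of how these fields fit together, the sketch below back-projects a depth pixel into world coordinates using `intrinsics_matrix`, `p`, and `q`. It assumes an OpenCV-style pinhole convention (x right, y down, z forward) and metric depth values; verify both against your camera configuration before relying on it.

```python
import numpy as np

def quat_to_matrix(q):
    # Scalar-first unit quaternion (w, x, y, z) -> 3x3 rotation matrix.
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def pixel_to_world(u, v, camera):
    """Back-project pixel (u, v) at its measured depth into world coordinates."""
    K = camera["intrinsics_matrix"]
    d = float(camera["depth"][v, u])                         # depth at that pixel
    p_cam = d * np.linalg.inv(K) @ np.array([u, v, 1.0])     # point in the camera frame
    R = quat_to_matrix(camera["q"])                          # camera-to-world rotation
    return R @ p_cam + camera["p"]                           # point in the world frame
```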
Returned Action
The model must return an action dictionary, which can represent either joint positions or end-effector (ee) poses.
Joint Position Mode
Currently, only delta joint positions are supported. If your model outputs absolute joint positions, convert them to deltas as shown below:
```python
import numpy as np

delta_joint_position = np.zeros(9)  # Franka with Panda Hand: 7 arm joints + 2 gripper joints

while True:
    data = wait_message(receive_socket)
    if data["reset"]:
        # On reset, re-anchor the deltas to the reported joint state.
        last_joint_position = data["joint_position_state"]
    processed_data = process_data(data)
    action = model.inference(processed_data)
    # Convert the model's absolute joint targets into deltas for the 7 arm joints.
    # Elements 7:9 carry the gripper command (see Supported Formats below).
    delta_joint_position[:7] = action[:7] - last_joint_position[:7]
    last_joint_position = action
    action = {"action": delta_joint_position}
    send_message(send_socket, action)
```
Supported Formats
- Franka with Panda Hand
  - Array of length 9
  - First 7 elements: delta arm joint positions
  - Last 2 elements: gripper control (`[0.04, 0.04]` = fully open, `[0.0, 0.0]` = fully closed)
- Franka with RoboTiq Hand
  - Array of length 13
  - First 7 elements: delta arm joint positions
  - Last 6 elements: gripper control (`[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]` = fully open, `[0.7853, 0.7853, -0.7853, -0.7853, -0.7853, -0.7853]` = fully closed)
- Aloha
  - Array of length 16
  - Elements 0:6 and 8:14: delta joint positions of the left and right arms
  - Elements 6:8 and 14:16: gripper control (`[0.05, 0.05]` = fully open, `[0.0, 0.0]` = fully closed)
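To make the Panda layout concrete, here is a small helper that assembles the 9-element action from a 7-dimensional arm delta and a binary gripper command. The function name and the mapping of open/closed to the two extreme gripper values are illustrative choices, not part of the protocol.

```python
import numpy as np

PANDA_GRIPPER_OPEN = [0.04, 0.04]   # fully open, per the format above
PANDA_GRIPPER_CLOSED = [0.0, 0.0]   # fully closed

def make_panda_action(delta_arm_joints, gripper_open: bool) -> dict:
    """Assemble a 9-element joint-position action: 7 arm deltas + 2 gripper values."""
    delta_arm_joints = np.asarray(delta_arm_joints, dtype=np.float64)
    assert delta_arm_joints.shape == (7,)
    gripper = PANDA_GRIPPER_OPEN if gripper_open else PANDA_GRIPPER_CLOSED
    return {"action": np.concatenate([delta_arm_joints, gripper])}

# Example: send a small delta on the first joint while holding the gripper open.
# send_message(send_socket, make_panda_action([0.01, 0, 0, 0, 0, 0, 0], gripper_open=True))
```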
End-Effector Pose Mode
In this mode, only delta end-effector poses are supported. If your model outputs absolute poses, convert them into deltas accordingly; a conversion sketch follows the format list below.
Supported Formats
- Franka with Panda Hand
  - Tuple of length 3:
    - 3D translation
    - 4D quaternion orientation (scalar-first)
    - 2D gripper control (`[0.04, 0.04]` = open, `[0.0, 0.0]` = closed)
- Franka with RoboTiq Hand
  - Tuple of length 3:
    - 3D translation
    - 4D quaternion orientation (scalar-first)
    - 6D gripper control (`[0.0, ...]` = open, `[0.7853, ...]` = closed)
- Aloha
  - Tuple of length 2, one entry per arm (left and right)
  - Each arm's tuple includes:
    - 3D translation
    - 4D quaternion orientation (scalar-first)
    - 2D gripper control (`[0.05, 0.05]` = open, `[0.0, 0.0]` = closed)
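As referenced above, the sketch below converts an absolute target pose into a delta for a Franka with Panda Hand. Two points are assumptions to verify against the `fake_port` example rather than guarantees: the delta rotation is computed as `q_target * q_current^-1` in the world frame, and the tuple is wrapped into an `{"action": ...}` dictionary by analogy with the joint-position mode.

```python
import numpy as np

def quat_multiply(q1, q2):
    # Hamilton product of scalar-first quaternions (w, x, y, z).
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_conjugate(q):
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def absolute_to_delta_pose(target_t, target_q, ee_pose_state, gripper_open: bool) -> dict:
    """Build a delta ee-pose action (Franka with Panda Hand) from an absolute target pose."""
    current_t, current_q = ee_pose_state             # single-arm: [translation, quaternion]
    delta_t = np.asarray(target_t) - np.asarray(current_t)
    delta_q = quat_multiply(target_q, quat_conjugate(current_q))  # rotation from current to target
    gripper = [0.04, 0.04] if gripper_open else [0.0, 0.0]
    return {"action": (delta_t, delta_q, gripper)}   # wrapping assumed, mirroring joint mode
```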