Communication Protocol
As introduced in the Quick Start using `fake_port`, the platform uses socket-based communication to exchange structured data between the simulation controller and the inference module. You can extend `fake_port` by integrating your own model controller to conduct benchmarking.
Outgoing Data
At each simulation timestep, the following structured data is sent through the socket:
```python
data = {
    "camera_data": camera_data,
    "instruction": instruction,
    "joint_position_state": joint_position_state,
    "franka_hand_pose": franka_hand_pose,
    "franka_pose": franka_pose,
    "timestep": step,
    "reset": reset,
}
```
Field Definitions
| Field Name | Type | Description |
|---|---|---|
| `camera_data` | `dict` | Data collected from robot-mounted cameras. |
| `instruction` | `string` | Natural language instruction or task prompt for the agent. |
| `joint_position_state` | `np.ndarray` (shape: `(9,)`) | Current joint positions of the robot arm (e.g., Franka). |
| `franka_hand_pose` | `tuple(np.ndarray, np.ndarray)` | World-frame pose of the Franka hand: position `(3,)`, quaternion `(4,)` (scalar-first). |
| `franka_pose` | `tuple(np.ndarray, np.ndarray)` | World-frame pose of the Franka base. |
| `timestep` | `int` | Current timestep index during rollout. |
| `reset` | `bool` | Whether to reset the environment or the model. |
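For instance, `franka_hand_pose` unpacks into a position vector and a scalar-first quaternion. A minimal sketch of converting that quaternion into a rotation matrix with plain NumPy (the helper name and the identity-quaternion example values are illustrative, not part of the protocol):

```python
import numpy as np

def quat_to_rotation_matrix(q):
    """Convert a scalar-first quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize defensively
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

# Unpack the pose tuple: position (3,), scalar-first quaternion (4,).
# Here we use an identity rotation as a placeholder value.
position, quaternion = np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0])
R = quat_to_rotation_matrix(quaternion)
```

Keeping the scalar-first convention in mind matters: some libraries (e.g., SciPy's `Rotation.from_quat`) default to scalar-last ordering.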
This data is received by the model or controller, which is expected to return a structured action message in response.
Camera Data Structure
Each `camera_data` dictionary entry includes the following fields per camera:
| Field Name | Type | Description |
|---|---|---|
| `p` | `np.ndarray` (shape: `(3,)`) | Camera position in world coordinates. |
| `q` | `np.ndarray` (shape: `(4,)`) | Camera orientation as a quaternion (scalar-first) in world coordinates. |
| `rgb` | `np.ndarray` | RGB image array. Resolution depends on configuration in `configs/cameras/`. |
| `depth` | `np.ndarray` | Depth image array. |
| `intrinsics_matrix` | `np.ndarray` (shape: `(3, 3)`) | Camera intrinsics matrix. |
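As an illustration, `depth` and `intrinsics_matrix` can be combined to back-project a pixel into camera-frame coordinates. This is a sketch assuming the standard pinhole model and metric depth; the helper name and the example intrinsics values are hypothetical:

```python
import numpy as np

def deproject_pixel(u, v, depth_value, K):
    """Back-project pixel (u, v) with depth z into 3D camera-frame coordinates
    using the pinhole model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth_value / fx
    y = (v - cy) * depth_value / fy
    return np.array([x, y, depth_value])

# Hypothetical intrinsics for a 640x480 camera
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
point = deproject_pixel(320.0, 240.0, 1.5, K)  # principal point maps to (0, 0, z)
```

The per-camera `p` and `q` fields can then be used to transform such camera-frame points into world coordinates.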
Incoming Data (Returned Action)
The returned data should be an action dictionary. Currently, only Franka robots are supported. The expected return is a 9-dimensional array representing joint control:
- First 7 elements: Joint angle deltas (in radians). If your model outputs absolute joint positions, you must convert them to deltas. Example:

  ```python
  import numpy as np

  last_joint_position = None
  while True:
      data = wait_message(receive_socket)
      if data["reset"]:
          last_joint_position = data["joint_position_state"]
      processed_data = process_data(data)
      action = model.inference(processed_data)
      delta_joint_position = np.zeros(9)  # last 2 entries carry the gripper command
      delta_joint_position[:7] = action[:7] - last_joint_position[:7]
      last_joint_position = action
      action = {"action": delta_joint_position}
      send_message(send_socket, action)
  ```

- Last 2 elements: Gripper width control. `[0.04, 0.04]` represents fully open, and `[0.0, 0.0]` represents fully closed.
Return in format:

```python
action = {"action": nine_dimensional_array}
```
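Putting the two parts together, a hypothetical helper that packs 7 joint deltas and a gripper-open flag into the expected 9-dimensional message might look like this (the function name and constants are illustrative; only the array layout and the 0.04/0.0 gripper widths come from the protocol above):

```python
import numpy as np

GRIPPER_OPEN = 0.04   # per-finger width for fully open (from the protocol above)
GRIPPER_CLOSED = 0.0  # fully closed

def build_action(joint_deltas, gripper_open):
    """Pack 7 joint-angle deltas (radians) and a gripper command into the
    9-dimensional action dictionary expected by the controller."""
    joint_deltas = np.asarray(joint_deltas, dtype=float)
    assert joint_deltas.shape == (7,), "expected 7 joint deltas"
    width = GRIPPER_OPEN if gripper_open else GRIPPER_CLOSED
    return {"action": np.concatenate([joint_deltas, [width, width]])}

msg = build_action(np.zeros(7), gripper_open=True)  # hold pose, open gripper
```

This keeps the joint and gripper halves of the array explicit, which helps avoid off-by-one mistakes when slicing the 9-dimensional vector.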