Introduction

GenManip adopts a pipeline-based approach for data generation. In this chapter, we provide a brief overview of the overall workflow and the core ideas behind the data generation process.

Components of Data Generation

According to GenManip’s design, data generation primarily consists of the following key components:

  • Manipulator: The robotic arm used to perform specific tasks.
  • Object Set: The collection of objects used for manipulation within the scene or as background elements.
  • Initial Layout: The relative positions and orientations of objects in the initial state. This can be described using a scene graph or explicit numeric values.
  • Target Layout: The desired arrangement of objects after task execution. In this version, we do not consider intermediate layouts that might change over time (e.g., sequentially placing objects in multiple locations).
  • Action Generation: The process of generating concrete actions to accomplish the task based on the above components. Ideally, this uses a general action generation method (i.e., an “oracle script” that produces near-optimal actions); otherwise, it falls back to hardcoded or RL-based approaches. (Note: RL-based methods are not supported in this version.)

Based on these components, GenManip manages the complexity behind the scenes. Users do not need to handle the implementation details manually. Instead, they define the configuration in a config file, and GenManip carries out the data generation process automatically. Specifically:

  • Manipulator: Automatically configured manipulators ensure physical correctness. Users can specify the initial joint positions and apply various position/orientation randomizations.
  • Object Set: Automatically managed object collections. You can operate directly on objects defined in an imported scene file, or specify three folders containing all objects to be used as manipulation targets, containers, and background objects. At least five background objects should be included to support scaling at different levels of granularity.
  • Initial Layout: Configurable at various levels of granularity, supporting either a scene graph as input or randomized placement within a defined range. For example, Object A can be fixed at a certain position, Object B can be placed to the right of A, and Object C can appear randomly anywhere on the table.
  • Target Layout: Supports simple scene graph descriptions as the target layout.
  • Action Generation: Supports automatic parsing of pick-and-place tasks, enabling scalable data generation and the creation of long-horizon tasks with virtually unlimited extensibility. Future versions will support general 1-DoF articulation data generation. Hardcoded approaches are also supported to produce more uniform datasets that are easier for single models to converge on.
  • Randomization: Provides randomization of textures, lighting, and camera positions. Various camera parameters can be adjusted to align simulated data more closely with real-world setups.
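The mixed-granularity layout described above (Object A fixed, Object B to the right of A, Object C anywhere on the table) can be sketched in a few lines of Python. This is only an illustration of the idea: the spec keys (`type`, `anchor`, `offset`), the table bounds, and the choice of +x as "right" are all assumptions, not GenManip's actual config schema.

```python
import random

# Hypothetical layout spec; the keys are illustrative, not GenManip's schema.
LAYOUT = {
    "A": {"type": "fixed", "position": (0.0, 0.0)},
    "B": {"type": "right_of", "anchor": "A", "offset": 0.15},
    "C": {"type": "random"},
}
# Assumed table extent: (x_min, x_max), (y_min, y_max) in metres.
TABLE_BOUNDS = ((-0.4, 0.4), (-0.3, 0.3))


def resolve_layout(layout, bounds, rng):
    """Turn the mixed spec into concrete (x, y) positions."""
    positions = {}
    # First pass: objects that do not depend on any other object.
    for name, spec in layout.items():
        if spec["type"] == "fixed":
            positions[name] = spec["position"]
        elif spec["type"] == "random":
            (x_lo, x_hi), (y_lo, y_hi) = bounds
            positions[name] = (rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))
    # Second pass: relational objects, placed relative to an anchor resolved
    # in the first pass (chained relations are not handled in this sketch).
    for name, spec in layout.items():
        if spec["type"] == "right_of":
            ax, ay = positions[spec["anchor"]]
            positions[name] = (ax + spec["offset"], ay)  # "right" assumed to be +x
    return positions


positions = resolve_layout(LAYOUT, TABLE_BOUNDS, random.Random(0))
```

Passing a seeded `random.Random` keeps the random placement reproducible, which matters when a layout has to be rebuilt later.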

Data Generation and Asset Management

Simulation-based data generation is closely tied to asset management. In GenManip, all assets are stored under the saved/assets directory, and all generated data is stored under saved/demonstrations. All paths are relative, which makes it easy to integrate GenManip with other components or projects. If you wish to store these files elsewhere, you can use ln -s to create symbolic links.
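If you prefer to script the symbolic link rather than run ln -s by hand, a minimal sketch using Python's standard pathlib might look like the following. The helper name `link_storage` and its arguments are hypothetical; only the saved/ directory layout comes from the text above.

```python
from pathlib import Path


def link_storage(repo_root: Path, external_dir: Path, name: str) -> Path:
    """Create repo_root/saved/<name> as a symlink to external_dir (like `ln -s`)."""
    external_dir.mkdir(parents=True, exist_ok=True)
    link = repo_root / "saved" / name
    link.parent.mkdir(parents=True, exist_ok=True)
    # Refuse to overwrite an existing directory or link.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists; move it first")
    link.symlink_to(external_dir.resolve(), target_is_directory=True)
    return link
```

For example, `link_storage(Path("."), Path("/data/genmanip_assets"), "assets")` would make saved/assets point at a larger disk; the /data path is a placeholder.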

Within the simulation process, data generation is divided into two stages: Planning and Rendering.

  • In the Planning stage, the program generates actions after randomization but does not render any images.
  • In the Rendering stage, the program reconstructs each state from the saved information generated during planning and then performs rendering.

This design significantly reduces wasted time when tasks fail during planning: because rendering is deferred until successful plans are available, no rendering time is lost on failed attempts.
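The two-stage split can be illustrated with a toy sketch. Everything here is a stand-in: the function names, the saved record format, and the 30% failure rate are assumptions for illustration, not GenManip's implementation. The point is only that failed plans never reach the renderer, and that rendering works from the saved record alone.

```python
import json
import random


def plan_episode(seed):
    """Planning stage stand-in: randomize, generate actions, render nothing.

    Returns a JSON string describing the episode, or None if planning fails.
    """
    rng = random.Random(seed)
    object_xy = [round(rng.uniform(-0.3, 0.3), 3) for _ in range(2)]
    if rng.random() < 0.3:  # assumed failure rate, for illustration only
        return None
    record = {"seed": seed, "object_xy": object_xy, "actions": ["pick", "place"]}
    return json.dumps(record)


def render_episode(saved):
    """Rendering stage stand-in: rebuild the state from the saved record."""
    record = json.loads(saved)
    # A real renderer would restore the scene and produce images here.
    return {"seed": record["seed"], "num_frames": len(record["actions"])}


# Only successful plans are kept, so the renderer never sees a failed attempt.
saved_plans = [p for p in (plan_episode(s) for s in range(10)) if p is not None]
rendered = [render_episode(p) for p in saved_plans]
```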

Generate Data by Config

To generate data from a config, run the following two commands with the same config file:

python demogen_V4.py -cfg configs/tasks/xxx.yml
python render_V3.py -cfg configs/tasks/xxx.yml
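When running many configs, it can be convenient to wrap the two commands in a small script. The sketch below uses only the script names and the -cfg flag shown above; the helper names are hypothetical, and it assumes the scripts report failure through a non-zero exit code, which is not confirmed here.

```python
import subprocess
import sys


def build_commands(config_path):
    """Return the two commands in the order they should run."""
    return [
        [sys.executable, "demogen_V4.py", "-cfg", config_path],
        [sys.executable, "render_V3.py", "-cfg", config_path],
    ]


def run_pipeline(config_path):
    """Run both commands in order, stopping at the first failure."""
    for cmd in build_commands(config_path):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the second command is skipped if the first one fails
        # (assuming the scripts use exit codes this way).
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("configs/tasks/xxx.yml")` from the repository root would then be equivalent to running the two commands by hand.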

To set up your own data generation, see Generate Data Simply by GenManip for more details.