Modular Framework
Installation
You can install the Modular Framework using conda and Python.
- Create a conda environment and activate it.

```shell
conda create -n prompt python=3.10
conda activate prompt
```

- Install the dependencies.

```shell
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
pip install huggingface_hub flask opencv-python python-dotenv
```
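After installing, you can sanity-check that the core dependencies are importable. This is only a quick verification sketch, not part of the framework; note that `opencv-python` imports as `cv2` and `python-dotenv` imports as `dotenv`:

```python
import importlib.util

# Top-level module names provided by the pip packages installed above.
required = ["huggingface_hub", "flask", "cv2", "dotenv"]

# find_spec returns None when a top-level module is not installed.
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All core dependencies found.")
```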
Configuration
The Modular Framework requires several configuration steps, including API keys for calling MLLMs and local port settings.
Configuring API Key
Modify the file `baselines/modular_framework/modular_framework/modular_server/.env` to add your API key.
Because you may access the APIs through a proxy, both Base URL and API Key settings are included for each provider.
An example file is shown below:
```
BASE_URL=https://api.openai.com/v1
API_KEY=sk-proj-1234567890
X_URL=http://api.XXX.com/v1
X_API_KEY=sk-proj-1234567890
```
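For reference, the `.env` file uses plain `KEY=VALUE` lines. The framework itself loads them with python-dotenv's `load_dotenv()`; the small parser below is only an illustrative sketch of the expected format:

```python
def parse_env(text: str) -> dict:
    """Illustrative parser for simple KEY=VALUE .env content."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

example = """\
BASE_URL=https://api.openai.com/v1
API_KEY=sk-proj-1234567890
"""
print(parse_env(example))
```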
Additionally, you may use different APIs for different models, and hence need to configure forwarding rules. Edit the `baselines/modular_framework/modular_framework/modular_server/utils/gpt_utils.py` file and add your forwarding rules.
```python
model_route = {
    "gpt-4.5-preview": ["BASE_URL", "API_KEY"],
    "gpt-4.5-preview-2025-02-27": ["BASE_URL", "API_KEY"],
    "gpt-4o": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-05-13": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-08-06": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-11-20": ["BASE_URL", "API_KEY"],
    "gpt-4o-mini": ["BASE_URL", "API_KEY"],
    "gpt-4o-mini-2024-07-18": ["BASE_URL", "API_KEY"],
    "Qwen/Qwen2.5-VL-72B-Instruct": ["SILICON_URL", "SILICON_KEY"],
    "claude-3-5-sonnet-20240620": ["X_URL", "X_API_KEY"],
    "claude-3-5-sonnet-20241022": ["X_URL", "X_API_KEY"],
    "claude-3-7-sonnet-20250219": ["X_URL", "X_API_KEY"],
    "claude-3-7-sonnet-20250219-thinking": ["X_URL", "X_API_KEY"],
    "gemini-2.0-flash": ["Y_URL", "Y_API_KEY"],
    "gemini-2.0-pro-exp-02-05": ["Y_URL", "Y_API_KEY"],
    "gemini-2.0-flash-thinking-exp": ["Y_URL", "Y_API_KEY"],
}
```
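Each entry maps a model name to the names of the environment variables holding its base URL and API key. A hypothetical helper like the one below (not part of the framework) shows how such a route can be resolved against the loaded environment:

```python
import os

# Trimmed copy of the routing table for illustration.
model_route = {
    "gpt-4o": ["BASE_URL", "API_KEY"],
    "claude-3-5-sonnet-20241022": ["X_URL", "X_API_KEY"],
}

def resolve_route(model_name: str, env=os.environ):
    """Look up the base URL and API key for a model from the environment."""
    url_var, key_var = model_route[model_name]
    return env[url_var], env[key_var]

# Resolve against a fake environment instead of real credentials.
fake_env = {"BASE_URL": "https://api.openai.com/v1", "API_KEY": "sk-test"}
print(resolve_route("gpt-4o", fake_env))
```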
Afterwards, you can run a total of four Flask services, which are as follows:
| Filename | Port | Description |
| --- | --- | --- |
| modular_server/prompt_app.py | 5004 | Used for calling the Modular Framework |
| modular_server/SAM_app.py | 5002 | Used for calling SAM |
| modular_server/task_completion_checker_app.py | 5009 | Used for checking if the task is completed |
| modular_server/task_planning_app.py | 5008 | Used for task planning |
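Once the services are started, a quick way to confirm they are listening is a plain TCP check against the ports in the table. This stdlib-only sketch is not part of the framework, and the port assignments are the defaults shown above:

```python
import socket

def is_service_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default ports from the table above; adjust if you changed them.
services = {
    "prompt_app": 5004,
    "SAM_app": 5002,
    "task_completion_checker_app": 5009,
    "task_planning_app": 5008,
}
for name, port in services.items():
    print(name, "up" if is_service_up("127.0.0.1", port) else "down")
```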
If you intend to deploy these four services and the full Modular Framework in a distributed manner, you need to modify the `baselines/modular_framework/modular_framework/modular_server/config.py` file.
```python
class Config:
    PROMPT_PIPELINE_ADDRESS = "127.0.0.1"
    PROMPT_PIPELINE_PORT = "5004"
    CHECK_FINISHED_PIPELINE_ADDRESS = "127.0.0.1"
    CHECK_FINISHED_PIPELINE_PORT = "5009"
    TASK_SPLIT_PIPELINE_ADDRESS = "127.0.0.1"
    TASK_SPLIT_PIPELINE_PORT = "5008"
    debug_text = True
```
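When deploying in a distributed manner, each address/port pair identifies where a client should reach the corresponding service. The sketch below shows how such values can be combined into a service URL; the `http` scheme and the helper function are assumptions for illustration, not the framework's own code:

```python
# Trimmed copy of config.py values for illustration.
class Config:
    PROMPT_PIPELINE_ADDRESS = "127.0.0.1"
    PROMPT_PIPELINE_PORT = "5004"
    CHECK_FINISHED_PIPELINE_ADDRESS = "127.0.0.1"
    CHECK_FINISHED_PIPELINE_PORT = "5009"

def service_url(address: str, port: str) -> str:
    """Build a plain-HTTP base URL for a Flask service (scheme assumed)."""
    return f"http://{address}:{port}"

prompt_url = service_url(Config.PROMPT_PIPELINE_ADDRESS, Config.PROMPT_PIPELINE_PORT)
print(prompt_url)  # http://127.0.0.1:5004
```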
Usage
Once the four services are up and running (it is recommended to run them inside tmux), you can execute the complete Modular Framework.
```shell
cd baselines/modular_framework/deploy_sim
python run.py -r 10000 -s 10001 --model_name gpt-4o --P2P True --CtoF False
```
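For reference, the flags in the command above can be mirrored with a hypothetical `argparse` sketch. The real `run.py` defines its own parser, and the meanings of `-r` and `-s` (here assumed to be the paired ports also passed to the evaluator) are not confirmed by this document:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of run.py's command-line flags."""
    p = argparse.ArgumentParser(description="Run the Modular Framework (sketch)")
    p.add_argument("-r", type=int)           # port, semantics defined by run.py
    p.add_argument("-s", type=int)           # port, semantics defined by run.py
    p.add_argument("--model_name", default="gpt-4o")
    # The CLI passes booleans as the literal strings "True"/"False".
    p.add_argument("--P2P", type=lambda v: v == "True", default=True)
    p.add_argument("--CtoF", type=lambda v: v == "True", default=False)
    return p

args = build_parser().parse_args(
    ["-r", "10000", "-s", "10001", "--model_name", "gpt-4o",
     "--P2P", "True", "--CtoF", "False"]
)
print(args.r, args.s, args.model_name, args.P2P, args.CtoF)
```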
In a separate terminal, run the `eval_V3.py` file to start the evaluation process.
```shell
python eval_V3.py -cfg configs/tasks/genmanipbench.yml -r 10001 -s 10000
```