Modular Framework

Installation

You can install the Modular Framework using conda and pip.

  1. Create a conda environment and activate it.
    conda create -n prompt python=3.10
    conda activate prompt
  2. Install the dependencies.
    git clone https://github.com/facebookresearch/sam2.git && cd sam2
    pip install -e .
    pip install huggingface_hub flask opencv-python python-dotenv
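After installing, you can sanity-check that the key dependencies resolve in the new environment. The snippet below is a minimal check (the names are the import names, which differ from the pip package names for opencv-python and python-dotenv):

```python
import importlib.util

# Import names of the key dependencies installed above.
DEPS = ["sam2", "flask", "cv2", "huggingface_hub", "dotenv"]

def check_deps(names):
    """Map each import name to whether it is importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, ok in check_deps(DEPS).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If any entry prints MISSING, re-run the corresponding install step inside the activated conda environment.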

Configuration

The Modular Framework requires several configuration steps, including API keys for calling MLLMs and local port settings.

Configuring API Key

Modify the file baselines/modular_framework/modular_framework/modular_server/.env to add your API key.

Because you may be accessing the models through a proxy, both base URL and API key settings are supported for each provider.

An example file is shown below:

baselines/modular_framework/modular_framework/modular_server/.env
BASE_URL=https://api.openai.com/v1
API_KEY=sk-proj-1234567890
X_URL=http://api.XXX.com/v1
X_API_KEY=sk-proj-1234567890
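The server loads these values with python-dotenv. Conceptually, each non-empty line is a KEY=VALUE pair; the stdlib sketch below illustrates that parsing (the real loader also handles quoting and variable expansion):

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env

example = "BASE_URL=https://api.openai.com/v1\nAPI_KEY=sk-proj-1234567890\n"
print(parse_env(example)["BASE_URL"])  # -> https://api.openai.com/v1
```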

Additionally, you may use different APIs for different models, in which case you need to configure forwarding rules. Edit the baselines/modular_framework/modular_framework/modular_server/utils/gpt_utils.py file and add your forwarding rules.

baselines/modular_framework/modular_framework/modular_server/utils/gpt_utils.py
model_route = {
    "gpt-4.5-preview": ["BASE_URL", "API_KEY"],
    "gpt-4.5-preview-2025-02-27": ["BASE_URL", "API_KEY"],
    "gpt-4o": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-05-13": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-08-06": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-11-20": ["BASE_URL", "API_KEY"],
    "gpt-4o-mini": ["BASE_URL", "API_KEY"],
    "gpt-4o-mini-2024-07-18": ["BASE_URL", "API_KEY"],
    "Qwen/Qwen2.5-VL-72B-Instruct": ["SILICON_URL", "SILICON_KEY"],
    "claude-3-5-sonnet-20240620": ["X_URL", "X_API_KEY"],
    "claude-3-5-sonnet-20241022": ["X_URL", "X_API_KEY"],
    "claude-3-7-sonnet-20250219": ["X_URL", "X_API_KEY"],
    "claude-3-7-sonnet-20250219-thinking": ["X_URL", "X_API_KEY"],
    "gemini-2.0-flash": ["Y_URL", "Y_API_KEY"],
    "gemini-2.0-pro-exp-02-05": ["Y_URL", "Y_API_KEY"],
    "gemini-2.0-flash-thinking-exp": ["Y_URL", "Y_API_KEY"],
}
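Each entry maps a model name to the *names* of the environment variables holding its base URL and API key, not to the values themselves. A hedged sketch of how a request might resolve a route (resolve_route and its fallback behavior are illustrative, not the actual gpt_utils implementation):

```python
import os

# Same shape as the model_route table above, abridged to two entries.
model_route = {
    "gpt-4o": ["BASE_URL", "API_KEY"],
    "claude-3-5-sonnet-20241022": ["X_URL", "X_API_KEY"],
}

def resolve_route(model):
    """Return (base_url, api_key) for a model by reading the environment
    variables named in model_route; unknown models fall back to the
    default BASE_URL/API_KEY pair (an assumption for this sketch)."""
    url_var, key_var = model_route.get(model, ["BASE_URL", "API_KEY"])
    return os.environ[url_var], os.environ[key_var]
```

This indirection is what lets the .env file stay the single place where secrets live while the route table stays checked in.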

Afterwards, you can run the four Flask services listed below:

Filename                                        Port   Description
modular_server/prompt_app.py                    5004   Used for calling the Modular Framework
modular_server/SAM_app.py                       5002   Used for calling SAM
modular_server/task_completion_checker_app.py   5009   Used for checking if the task is completed
modular_server/task_planning_app.py             5008   Used for task planning

If you intend to deploy these four services and the full Modular Framework in a distributed manner, you need to modify the baselines/modular_framework/modular_framework/modular_server/config.py file.

baselines/modular_framework/modular_framework/modular_server/config.py
class Config:
    PROMPT_PIPELINE_ADDRESS = "127.0.0.1"
    PROMPT_PIPELINE_PORT = "5004"
    CHECK_FINISHED_PIPELINE_ADDRESS = "127.0.0.1"
    CHECK_FINISHED_PIPELINE_PORT = "5009"
    TASK_SPLIT_PIPELINE_ADDRESS = "127.0.0.1"
    TASK_SPLIT_PIPELINE_PORT = "5008"
    debug_text = True
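Client code presumably composes service URLs from these address/port pairs; to deploy distributively, you replace 127.0.0.1 with each host's reachable address. A minimal sketch (service_url and the path argument are illustrative, not part of the framework):

```python
class Config:
    # Abridged copy of the Config class above.
    PROMPT_PIPELINE_ADDRESS = "127.0.0.1"
    PROMPT_PIPELINE_PORT = "5004"

def service_url(address, port, path=""):
    """Compose the base URL for one of the Flask services; endpoint paths
    are hypothetical here -- check each *_app.py for the real routes."""
    return f"http://{address}:{port}/{path.lstrip('/')}"

print(service_url(Config.PROMPT_PIPELINE_ADDRESS, Config.PROMPT_PIPELINE_PORT))
# -> http://127.0.0.1:5004/
```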

Usage

Once the four services are up and running (it is recommended to use tmux for running them), you can execute the complete Modular Framework.

cd baselines/modular_framework/deploy_sim
python run.py -r 10000 -s 10001 --model_name gpt-4o --P2P True --CtoF False

In a separate terminal, run the eval_V3.py file to start the evaluation process. Note that the -r and -s port values are swapped relative to the run.py command above, so the two processes can communicate.

python eval_V3.py -cfg configs/tasks/genmanipbench.yml -r 10001 -s 10000