Modular Framework
Installation
You can install the Modular Framework using conda and Python.
- Create a conda environment and activate it.

```shell
conda create -n prompt python=3.10
conda activate prompt
```

- Install the dependencies.

```shell
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
pip install huggingface_hub flask opencv-python python-dotenv
```
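After installing, you can sanity-check that the core dependencies are importable. This is only a quick verification sketch, not part of the framework; note that `opencv-python` imports as `cv2` and `python-dotenv` imports as `dotenv`:

```python
import importlib.util

# Top-level module names provided by the pip packages installed above.
required = ["huggingface_hub", "flask", "cv2", "dotenv"]

# find_spec returns None when a top-level module is not installed.
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All core dependencies found.")
```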
Configuration
The Modular Framework requires several configuration steps, including API keys for calling MLLMs and local port settings.
Configuring API Key
Modify the file `baselines/modular_framework/modular_framework/modular_server/.env` to add your API key.
Because you may access the APIs through a proxy, both Base URL and API Key settings are included for each provider.
An example file is shown below:
```
BASE_URL=https://api.openai.com/v1
API_KEY=sk-proj-1234567890
X_URL=http://api.XXX.com/v1
X_API_KEY=sk-proj-1234567890
```
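For reference, the `.env` file uses plain `KEY=VALUE` lines. The framework itself loads them with python-dotenv's `load_dotenv()`; the small parser below is only an illustrative sketch of the expected format:

```python
def parse_env(text: str) -> dict:
    """Illustrative parser for simple KEY=VALUE .env content."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

example = """\
BASE_URL=https://api.openai.com/v1
API_KEY=sk-proj-1234567890
"""
print(parse_env(example))
```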
Additionally, you may use different APIs for different models, and hence need to configure forwarding rules. Edit the `baselines/modular_framework/modular_framework/modular_server/utils/gpt_utils.py` file and add your forwarding rules.
```python
model_route = {
    "gpt-4.5-preview": ["BASE_URL", "API_KEY"],
    "gpt-4.5-preview-2025-02-27": ["BASE_URL", "API_KEY"],
    "gpt-4o": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-05-13": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-08-06": ["BASE_URL", "API_KEY"],
    "gpt-4o-2024-11-20": ["BASE_URL", "API_KEY"],
    "gpt-4o-mini": ["BASE_URL", "API_KEY"],
    "gpt-4o-mini-2024-07-18": ["BASE_URL", "API_KEY"],
    "Qwen/Qwen2.5-VL-72B-Instruct": ["SILICON_URL", "SILICON_KEY"],
    "claude-3-5-sonnet-20240620": ["X_URL", "X_API_KEY"],
    "claude-3-5-sonnet-20241022": ["X_URL", "X_API_KEY"],
    "claude-3-7-sonnet-20250219": ["X_URL", "X_API_KEY"],
    "claude-3-7-sonnet-20250219-thinking": ["X_URL", "X_API_KEY"],
    "gemini-2.0-flash": ["Y_URL", "Y_API_KEY"],
    "gemini-2.0-pro-exp-02-05": ["Y_URL", "Y_API_KEY"],
    "gemini-2.0-flash-thinking-exp": ["Y_URL", "Y_API_KEY"],
}
```
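Each entry maps a model name to the names of the environment variables holding its base URL and API key. A hypothetical helper like the one below (not part of the framework) shows how such a route can be resolved against the loaded environment:

```python
import os

# Trimmed copy of the routing table for illustration.
model_route = {
    "gpt-4o": ["BASE_URL", "API_KEY"],
    "claude-3-5-sonnet-20241022": ["X_URL", "X_API_KEY"],
}

def resolve_route(model_name: str, env=os.environ):
    """Look up the base URL and API key for a model from the environment."""
    url_var, key_var = model_route[model_name]
    return env[url_var], env[key_var]

# Resolve against a fake environment instead of real credentials.
fake_env = {"BASE_URL": "https://api.openai.com/v1", "API_KEY": "sk-test"}
print(resolve_route("gpt-4o", fake_env))
```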
Afterwards, you can run a total of four Flask services, which are as follows:
| Filename | Port | Description |
| --- | --- | --- |
| modular_server/prompt_app.py | 5004 | Used for calling the Modular Framework |
| modular_server/SAM_app.py | 5002 | Used for calling SAM |
| modular_server/task_completion_checker_app.py | 5009 | Used for checking if the task is completed |
| modular_server/task_planning_app.py | 5008 | Used for task planning |
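Once the services are started, a quick way to confirm they are listening is a plain TCP check against the ports in the table. This stdlib-only sketch is not part of the framework, and the port assignments are the defaults shown above:

```python
import socket

def is_service_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default ports from the table above; adjust if you changed them.
services = {
    "prompt_app": 5004,
    "SAM_app": 5002,
    "task_completion_checker_app": 5009,
    "task_planning_app": 5008,
}
for name, port in services.items():
    print(name, "up" if is_service_up("127.0.0.1", port) else "down")
```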
If you intend to deploy these four services and the full Modular Framework in a distributed manner, you need to modify the `baselines/modular_framework/modular_framework/modular_server/config.py` file.
```python
class Config:
    PROMPT_PIPELINE_ADDRESS = "127.0.0.1"
    PROMPT_PIPELINE_PORT = "5004"
    CHECK_FINISHED_PIPELINE_ADDRESS = "127.0.0.1"
    CHECK_FINISHED_PIPELINE_PORT = "5009"
    TASK_SPLIT_PIPELINE_ADDRESS = "127.0.0.1"
    TASK_SPLIT_PIPELINE_PORT = "5008"
    debug_text = True
```
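When deploying in a distributed manner, each address/port pair identifies where a client should reach the corresponding service. The sketch below shows how such values can be combined into a service URL; the `http` scheme and the helper function are assumptions for illustration, not the framework's own code:

```python
# Trimmed copy of config.py values for illustration.
class Config:
    PROMPT_PIPELINE_ADDRESS = "127.0.0.1"
    PROMPT_PIPELINE_PORT = "5004"
    CHECK_FINISHED_PIPELINE_ADDRESS = "127.0.0.1"
    CHECK_FINISHED_PIPELINE_PORT = "5009"

def service_url(address: str, port: str) -> str:
    """Build a plain-HTTP base URL for a Flask service (scheme assumed)."""
    return f"http://{address}:{port}"

prompt_url = service_url(Config.PROMPT_PIPELINE_ADDRESS, Config.PROMPT_PIPELINE_PORT)
print(prompt_url)  # http://127.0.0.1:5004
```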
Usage
Once the four services are up and running (it is recommended to run them inside tmux), you can execute the complete Modular Framework.
```shell
cd baselines/modular_framework/deploy_sim
python run.py -r 10000 -s 10001 --model_name gpt-4o --P2P True --CtoF False
```
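For reference, the flags in the command above can be mirrored with a hypothetical `argparse` sketch. The real `run.py` defines its own parser, and the meanings of `-r` and `-s` (here assumed to be the paired ports also passed to the evaluator) are not confirmed by this document:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of run.py's command-line flags."""
    p = argparse.ArgumentParser(description="Run the Modular Framework (sketch)")
    p.add_argument("-r", type=int)           # port, semantics defined by run.py
    p.add_argument("-s", type=int)           # port, semantics defined by run.py
    p.add_argument("--model_name", default="gpt-4o")
    # The CLI passes booleans as the literal strings "True"/"False".
    p.add_argument("--P2P", type=lambda v: v == "True", default=True)
    p.add_argument("--CtoF", type=lambda v: v == "True", default=False)
    return p

args = build_parser().parse_args(
    ["-r", "10000", "-s", "10001", "--model_name", "gpt-4o",
     "--P2P", "True", "--CtoF", "False"]
)
print(args.r, args.s, args.model_name, args.P2P, args.CtoF)
```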
In a separate terminal, run the `eval_V3.py` file to start the evaluation process.
```shell
python eval_V3.py -cfg configs/tasks/genmanipbench.yml -r 10001 -s 10000
```