
EAI Prompt Extractor

This repository provides a lightweight library for stripping the static instruction text from the Embodied AI (EAI) competition prompts. The extractor turns every populated LLM prompt into a compact data structure that contains only the task-specific information (object lists, goal states, PDDL snippets, …). It is designed to support downstream prompt modifications.

Installation

The project ships as a small Python package. You can install it in editable mode while working inside this repo:

pip install -e .

The code requires Python 3.9 or newer and has no third-party dependencies beyond the Python standard library.

Install from a private Git repository

For collaborators with access (🫵): you can install directly via pip. Run one of the following inside a virtual environment:

SSH (recommended):

pip install "git+ssh://git@github.com/CinnamonRolls1/eai-extractor.git@main"

You can also pin to a tag or commit:

pip install "git+ssh://git@github.com/CinnamonRolls1/eai-extractor.git@<tag-or-commit>"

Notes:

  • This installs only into the Python environment where you run pip (venv/Conda recommended). Using the system interpreter makes it available system-wide on that machine.
  • The import name is extractor.

Quick sanity check after install:

python -c "from extractor import extract_file; print('ok')"

Usage

The main entry points live in extractor.core:

from extractor import extract_file, extract_from_prompt

# Extract the dynamic sections from a prompts JSON file.
results = extract_file("llm_prompts/behavior_action_sequencing_prompts.json")
first = results[0]
print(first.identifier)
print(first.dynamic_content)

# Extract from a raw prompt string when you already know the prompt type.
from pathlib import Path
import json

payload = json.loads(Path("llm_prompts/behavior_transition_modeling_prompts.json").read_text())
prompt = payload[0]["llm_prompt"]
dynamic_only = extract_from_prompt(prompt, "behavior_transition_modeling")

Each extractor returns a dictionary that only contains the task-specific fields for the corresponding module. See below for the full list of supported modules and their returned fields.
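For illustration, a result for the behavior_action_sequencing module has the shape documented in the module list below. The values here are made up; only the key structure follows the documentation:

```python
# Illustrative only: the shape of an extracted behavior_action_sequencing
# result, per the field list below. The object names are invented.
example = {
    "initial_states": [["inside", "apple.n.01_1", "fridge.n.01_1"]],
    "target_states": [["ontop", "apple.n.01_1", "countertop.n.01_1"]],
    "interactable_objects": [{"name": "apple.n.01_1", "category": "apple.n.01"}],
}

# Downstream code can rely on exactly these task-specific keys being present.
assert set(example) == {"initial_states", "target_states", "interactable_objects"}
```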

See the unit tests in tests/test_extractors.py for more details on the returned shapes.

If the prompt file name follows the official convention (e.g. behavior_goal_interpretation_prompts.json) the extract_file helper will infer the prompt type automatically. Otherwise you can pass the type explicitly: extract_file(path, prompt_type="virtualhome_subgoal_decomposition").
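The inference described above can be approximated with a small helper. This is a sketch of the naming convention, not the library's actual implementation:

```python
from pathlib import Path
from typing import Optional

def infer_prompt_type(path: str) -> Optional[str]:
    """Guess the prompt type from a file named `<prompt_type>_prompts.json`.

    Sketch of the official naming convention; not the package's own code.
    Returns None when the name does not follow the convention, in which
    case the caller must pass prompt_type explicitly.
    """
    name = Path(path).name
    suffix = "_prompts.json"
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return None

print(infer_prompt_type("llm_prompts/behavior_goal_interpretation_prompts.json"))
# → behavior_goal_interpretation
```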

Inline rendering and JSON export

Once you have an ExtractionResult you can fill one of the editable templates directly in Python. The renderer reuses the same formatting rules as the rewrite CLI, so you can point it either to a single template file or to the templates/ directory that the CLI generates:

from extractor import extract_file, render_and_save

results = extract_file("llm_prompts/virtualhome_goal_interpretation_prompts.json")
first = results[0]

# Render a prompt from a template file (or pass template_string="..." to inline the text).
prompt_text = first.render(
    template_path="outputs/modified/templates/virtualhome_goal_interpretation.md",
)

# Turn it into the JSON entry shape that prompt dumps expect.
entry = first.render_to_entry(
    template_path="outputs/modified/templates/virtualhome_goal_interpretation.md",
)

# Optionally write a batch of prompts back to disk. When template_path is a directory the
# helper loads `<prompt_type>.md` automatically for each result.
render_and_save(
    results[:5],
    "outputs/virtualhome_goal_interpretation_prompts.json",
    template_path="outputs/modified/templates",
)

All rendering helpers also accept a template_string argument, which lets you supply inline template text. In that mode every result in the batch must share the same prompt_type, and the helper takes care of normalising whitespace to match the formatting that the CLI would produce.

Prompt rewrite pipeline

The package also ships with a small pipeline that turns the verbose prompt dumps into editable templates. Install the project (pip install -e .) and run the CLI:

eai-extractor rewrite-prompts

By default this command expects the original JSON dumps in ./llm_prompts/ and writes the artifacts to ./outputs/modified/:

outputs/modified/
├── templates/                  # Eight markdown files, one per prompt type
├── generated/                  # Rendered prompts (defaults to all entries)
└── render_prompts.py           # Helper script to rebuild the prompts later

Each markdown template contains the static prompt text with placeholder markers such as {{INITIAL_STATES}}. The remainder of the file shows the dynamic text extracted from a sample entry so you can quickly inspect the values. Edit the text around the placeholders to tweak the instructions and then regenerate the prompts with:
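The placeholder convention can be illustrated with a minimal stand-in for the real renderer. This sketch only substitutes `{{NAME}}` markers; the package's renderer additionally normalises whitespace and applies per-field formatting:

```python
import re

def fill_template(template: str, values: dict) -> str:
    """Replace {{PLACEHOLDER}} markers with the given values.

    Minimal sketch of the marker convention; not the package's renderer.
    Unknown markers are left intact so missing fields are easy to spot.
    """
    def sub(match: "re.Match") -> str:
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return re.sub(r"\{\{([A-Z_]+)\}\}", sub, template)

template = "Initial states:\n{{INITIAL_STATES}}\n"
print(fill_template(template, {"INITIAL_STATES": "['inside', 'apple', 'fridge']"}))
```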

python outputs/modified/render_prompts.py

The CLI accepts a few useful flags:

  • --sample-index N controls which entry is used to populate the example sections of the templates.
  • --limit N renders only the first N prompts per file (handy while testing).
  • --skip-render generates the templates and helper script without producing any prompt files.
  • --overwrite-templates / --overwrite-script allow replacing existing artifacts.

The parity between the generated prompts and the originals is covered by tests/test_prompt_rewriter.py.

Modules and returned fields

  • behavior_action_sequencing:

    • initial_states: list of symbolic facts, each as a Python literal list.
    • target_states: list of target symbolic facts, as lists.
    • interactable_objects: list of dictionaries with name, category.
  • virtualhome_action_sequencing:

    • objects: list of dictionaries with name, id, properties.
    • nodes: list of dictionaries with name, states, properties.
    • edges: list of dictionaries with from, from_id, relation, to, to_id.
    • node_goals: list of node goal strings.
    • edge_goals: list of edge goal strings.
    • action_goals: list of action goal strings (may be empty).
  • behavior_goal_interpretation:

    • relevant_objects: mapping from object name to list of possible states.
    • initial_states: list of symbolic facts, as lists.
    • task_instructions: mapping with goal metadata (e.g., Task Name, Goal Instructions).
  • virtualhome_goal_interpretation:

    • relevant_objects: list of dictionaries with name, initial_states, possible_states.
    • relationships: mapping from relationship name to description string.
    • goal_name: the goal name string.
    • goal_description: the natural language goal description.
  • behavior_subgoal_decomposition:

    • task_name: the task name string.
    • relevant_objects: list of dictionaries with name, category.
    • initial_states: list of symbolic fact strings.
    • goal_states: list of symbolic goal strings.
  • virtualhome_subgoal_decomposition:

    • task_name: the task name string.
    • relevant_objects: list of dictionaries with name, category, properties.
    • initial_states: list of symbolic fact strings.
    • goal_states: list of symbolic goal strings.
    • required_actions: list of action names required by the goal (may be empty).
    • actions_are_necessary: string indicating if actions are necessary (e.g., "Yes"/"No").
  • behavior_transition_modeling:

    • problem_file: the raw PDDL problem text (string).
    • actions: list of PDDL action definitions (each as a string).
  • virtualhome_transition_modeling:

    • problem_file: the raw PDDL problem text (string).
    • actions: list of PDDL action definitions (each as a string).

Tests

The test suite uses pytest (which is not a dependency of the package itself). To run it against the bundled prompt files:

pytest

Project layout

  • extractor/: Python package with all parsing logic.
  • llm_prompts/: Original prompt dumps used to validate the extractor.
  • outputs/: Validated versions of the JSON dumps with all dynamic fields added.

About

Winning Solution, Embodied Agent Interface Challenge @ NeurIPS 2025 | Task info extractor to support downstream modifications for prompts
