From 211eadf44141b2690ec5a50c071b440aec09da2f Mon Sep 17 00:00:00 2001
From: cafzal
Date: Sat, 30 May 2026 12:43:59 -0700
Subject: [PATCH 01/20] Add cuopt-multi-objective-exploration skill (draft)

Signed-off-by: cafzal
---
 .../BENCHMARK.md                              |  54 ++++++++
 .../SKILL.md                                  | 129 ++++++++++++++++++
 .../evals/evals.json                          |  42 ++++++
 .../skill-card.md                             |  77 +++++++++++
 4 files changed, 302 insertions(+)
 create mode 100644 skills/cuopt-multi-objective-exploration/BENCHMARK.md
 create mode 100644 skills/cuopt-multi-objective-exploration/SKILL.md
 create mode 100644 skills/cuopt-multi-objective-exploration/evals/evals.json
 create mode 100644 skills/cuopt-multi-objective-exploration/skill-card.md

diff --git a/skills/cuopt-multi-objective-exploration/BENCHMARK.md b/skills/cuopt-multi-objective-exploration/BENCHMARK.md
new file mode 100644
index 000000000..d22980892
--- /dev/null
+++ b/skills/cuopt-multi-objective-exploration/BENCHMARK.md
@@ -0,0 +1,54 @@
+# Evaluation Report
+
+Evaluation of the `cuopt-multi-objective-exploration` skill.
+
+> **Status: proof-of-concept A/B complete; official NVSkills-Eval pending on the fork.**
+> The numbers below are from a custom WITH-vs-WITHOUT run on a Colab GPU with cuOpt in the
+> loop (real solves) — not from NVSkills-Eval. They establish the skill's value before the
+> formal run. The official NVSkills-Eval (`claude-code` + `codex`, external profile) and CI
+> Tiers 1–2 still run on the fork and gate publication.
+
+## Summary
+
+- Skill: `cuopt-multi-objective-exploration`
+- Eval: custom WITH-vs-WITHOUT A/B — notebook `cuopt_exploration_skill_value_test.ipynb`
+- Problem: supplier-selection procurement MILP (12 suppliers; maximize resilience / minimize cost under demand coverage) + a fixed real-world-style supplier dataset (cost vs reliability)
+- Agents: `claude-opus-4-8`, `claude-sonnet-4-6` (each driving cuOpt as a tool); judge: `claude-opus-4-8`
+- Samples: exploration N=6/cell; interpretation N=15/cell × 2 scenarios; decoy N=6/cell
+- Date / hardware: 2026-05-29, Colab (NVIDIA RTX PRO 6000)
+
+## Results (WITH vs WITHOUT the skill)
+
+| Dimension | WITHOUT | WITH | Read |
+|---|---|---|---|
+| **Effectiveness — interpretation** | 0.58 | 0.73 | +0.15; both models (opus 0.73→0.93, sonnet 0.42→0.54), both scenarios (real 0.74→0.88, synthetic 0.42→0.58) |
+| **Effectiveness — exploration** | 2.17 | 3.67 | non-supported Pareto portfolios recovered, of 24 (~+70%); both models up; full-front coverage 23%→28% |
+| **Discoverability** | 92% | 100% | restraint on the single-objective decoy (small lift; high baseline) |
+| **Efficiency** | 10.7 / 1.2 | 11.9 / 1.0 | cuOpt solves used (multi-objective / decoy) |
+| **Correctness** | 72% | 76% | solved portfolios proven-optimal (cuOpt `FeasibleFound` accounts for the rest — solver, not skill) |
+| **Security** | — | — | no unsafe surface: the agent's only tool is a math solver (no secrets, filesystem, or network) |
+
+**Interpretation, per rubric item (pooled):** no-single-best **+0.35**, knee-not-auto-pick **+0.28**, exchange-rate ≈0, state-assumptions ≈0. The lift is the *don't-collapse-to-one-answer* discipline; the other two behaviors were already present in both arms.
+
+## What the eval shows
+
+- The skill's value is **interpretation discipline** — agents present the tradeoff and defer instead of collapsing to one option — holding across both models and both decisions, including the fixed external supplier dataset (scenario A).
+- **Exploration** is a real supporting lift on this constrained MILP: agents recover ~70% more of the non-supported (weighted-sum-unreachable) Pareto portfolios.
+- **Discoverability** is a small positive (models mostly restrain unprompted).
+
+## Caveats
+
+- Custom A/B, not NVSkills-Eval. Agents are raw models with a cuOpt tool, not `claude-code` / `codex`.
+- Exploration and the synthetic scenario use one instance (seed 1); the interpretation value also holds on the seed-independent fixed supplier dataset (scenario A).
+- Judge is `claude-opus-4-8` (LLM-graded rubric).
+- The shipped `SKILL.md` adds two cuOpt-feasibility clarifications not in the A/B's inlined skill text (PDLP warmstart is LP-only; ε-constrain *linear* objectives, since cuOpt constraints are linear). These are factual corrections, not value claims — they don't bear on the measured behaviors, so the numbers above stand. Re-running the notebook against the final text would change nothing material.
+
+## Tier 1 / Tier 2 / official NVSkills-Eval — pending on the fork
+
+- `./ci/utils/validate_skills.sh` (frontmatter, required files, version 26.08.00) + `sync_skills_version.sh`.
+- Tier 2 dedup — scoped to orchestration + interpretation; defers per-solve mechanics to the api-* skills and per-objective formulation to `cuopt-numerical-optimization-formulation`.
+- NVSkills-Eval (external profile; `claude-code` + `codex`) — the formal gate.
+
+## Publication recommendation
+
+The POC supports the value claim. Proceed per `CONTRIBUTION_NEXT_STEPS.md`: socialize via a GitHub discussion/proposal, then a fork-based draft PR with CI + the official NVSkills-Eval.
diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
new file mode 100644
index 000000000..b932e8853
--- /dev/null
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -0,0 +1,129 @@
+---
+name: cuopt-multi-objective-exploration
+version: "26.08.00"
+description: Trace and interpret the Pareto frontier across competing objectives using repeated single-objective cuOpt solves (weighted-sum and ε-constraint).
+license: Apache-2.0
+origin: cuopt-skill-evolution
+metadata:
+  author: NVIDIA cuOpt Team
+  tags:
+    - multi-objective
+    - pareto
+    - epsilon-constraint
+    - tradeoff
+    - workflow
+---
+
+
+# Multi-Objective Exploration
+
+cuOpt optimizes **one** objective per solve. Many real problems have several objectives that pull against each other — cost vs. service level, return vs. risk, makespan vs. overtime, distance vs. vehicle count. A single solve answers "what's optimal *for one particular weighting*," but it hides the tradeoff the user actually needs to see.
+
+This skill turns a sequence of single-objective cuOpt solves into a **Pareto frontier** — the set of solutions where you can't improve one objective without giving up another — and gives the discipline to read it. It adds no solver features; it orchestrates the LP / MILP / QP solves already covered by the formulation and API skills.
+
+## When this applies
+
+Reach for this workflow when the problem has **two or more objectives with no agreed-upon weighting**, signalled by language like:
+
+- "balance X and Y", "trade off", "as cheap as possible *without* hurting service"
+- "minimize cost *and* maximize coverage", "I want options, not one answer"
+- any objective the user is willing to relax in exchange for another
+
+If there is a single clear objective (everything else is a hard constraint), this skill does not apply — formulate and solve once.
+
+## Core idea — one solve is one point on a curve
+
+A single optimum encodes **one implicit weighting** of the objectives. Change the weighting and the optimum moves. The frontier is the curve traced by all the non-dominated optima.
+
+A solution **A dominates** B when A is at least as good on every objective and strictly better on one. Dominated solutions are never worth choosing. The **Pareto frontier** is exactly the non-dominated set; the user's job is to pick a point on it, and yours is to show them the whole curve plus where the tradeoff is sharpest.
+
+Do not collapse a multi-objective problem to a single weighted number and report its optimum as "the answer" — that silently makes the tradeoff decision *for* the user. Trace the frontier and let them choose.
+
+## Step 1 — build a payoff table (anchor each objective)
+
+Solve each objective **on its own** first. For *k* objectives this is *k* solves. Record, for each, the value of every objective at that optimum:
+
+```
+            f1        f2            f3
+min f1  →   f1*       f2(at f1*)    f3(at f1*)
+min f2  →   ...       f2*           ...
+min f3  →   ...       ...           f3*
+```
+
+The diagonal (`f1*`, `f2*`, …) is each objective's best achievable value; the off-diagonals give the **range** each objective spans across the others' optima. This table does double duty:
+
+- It sets the **sweep bounds** for the ε-constraint method (the feasible range of each constrained objective).
+- It supplies the **scales** for normalization — objectives in dollars, percent, and hours can't be weighted meaningfully until divided by their ranges.
+
+If any single-objective solve is already infeasible, stop and fix the model before sweeping — the frontier doesn't exist yet.
+
+## Step 2 — choose a scalarization
+
+### Weighted sum
+
+Combine the objectives into one and sweep the weights:
+
+```
+minimize  w1·f1(x) + w2·f2(x) + ... ,   for a grid of weight vectors w
+```
+
+Cheap and trivial with any solver.
Two limitations to respect:
+
+- **It only finds points on the convex hull of the frontier.** Concave (non-convex) regions of the frontier are unreachable no matter how you choose weights, and for MILP the reachable points can be sparse with large gaps. A frontier that looks suspiciously linear or has only a few clustered points is the symptom.
+- **Weights are not priorities until the objectives are normalized.** Divide each `f_k` by its payoff-table range first; otherwise the largest-magnitude objective dominates regardless of intent.
+
+### ε-constraint (preferred for a complete frontier)
+
+Keep one objective; move the rest to constraints and sweep their right-hand sides:
+
+```
+minimize    f1(x)
+subject to  f2(x) ≤ ε2
+            f3(x) ≤ ε3
+            (original constraints)
+```
+
+Sweep each `ε_k` across the range from the payoff table. Each `(ε2, ε3, …)` combination is a single standard cuOpt solve. This recovers the **full** frontier, including the concave regions weighted-sum cannot reach, which is why it's the default when completeness matters. The cost is more solves (a grid over the constrained objectives) and bookkeeping of the ε values.
+
+cuOpt's constraints are **linear**, so ε-constrain *linear* objectives. If an objective is quadratic (e.g. risk `xᵀΣx`), keep that one as the objective `f1` and ε-constrain the linear ones — cuOpt solves QP (quadratic objective, linear constraints), not quadratically-constrained programs.
+
+**Picking a method:** weighted-sum for a quick convex sketch or when you know the frontier is convex (e.g. a pure-LP/QP tradeoff); ε-constraint when the problem is MILP, when the frontier may be non-convex, or when the user needs a faithful and complete curve.
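
The difference between the two scalarizations can be seen on a toy instance. The following is a minimal, solver-free sketch, not cuOpt code: `PORTFOLIOS`, `solve`, `pareto_filter`, `weighted_sum_sweep`, and `epsilon_sweep` are hypothetical names invented for illustration, the four (cost, risk) points are made up, and the brute-force `solve` only stands in for one single-objective solve. In a real sweep each call would be a cuOpt solve set up through the API skills.

```python
from typing import Callable, Optional

Point = tuple[float, float]  # (cost, risk), both minimized

# Hypothetical feasible portfolios, invented for this sketch.
PORTFOLIOS: dict[str, Point] = {
    "A": (0.0, 10.0),
    "B": (10.0, 0.0),
    "C": (6.0, 6.0),  # non-dominated, but above the A-B hull segment
    "D": (7.0, 8.0),  # dominated by C
}


def solve(objective: Callable[[Point], float],
          feasible: Callable[[Point], bool] = lambda p: True) -> Optional[Point]:
    """Brute-force stand-in for one single-objective solve (None if infeasible)."""
    candidates = [p for p in PORTFOLIOS.values() if feasible(p)]
    return min(candidates, key=objective) if candidates else None


def dominates(a: Point, b: Point) -> bool:
    """a is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_filter(points: list[Point]) -> list[Point]:
    """Drop duplicates and dominated points, then sort by cost."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


# Payoff table: anchor each objective alone to get its range.
best_cost = solve(lambda p: p[0])  # cheapest portfolio
best_risk = solve(lambda p: p[1])  # lowest-risk portfolio
cost_lo, cost_hi = best_cost[0], best_risk[0]
risk_lo, risk_hi = best_risk[1], best_cost[1]


def weighted_sum_sweep(steps: int = 11) -> list[Point]:
    found = []
    for i in range(steps):
        w = i / (steps - 1)
        # Normalize each objective by its payoff-table range before weighting.
        found.append(solve(lambda p: w * (p[0] - cost_lo) / (cost_hi - cost_lo)
                           + (1 - w) * (p[1] - risk_lo) / (risk_hi - risk_lo)))
    return pareto_filter(found)


def epsilon_sweep(steps: int = 11) -> list[Point]:
    found = []
    for i in range(steps):
        # Sweep the risk cap across the payoff-table range; minimize cost under it.
        eps = risk_lo + (risk_hi - risk_lo) * i / (steps - 1)
        point = solve(lambda p: p[0], feasible=lambda p: p[1] <= eps)
        if point is not None:
            found.append(point)
    return pareto_filter(found)


print("weighted sum:  ", weighted_sum_sweep())
print("eps-constraint:", epsilon_sweep())
```

On this instance the weighted-sum sweep returns only the two extreme portfolios A and B, while the ε sweep also returns C, the unsupported point that no weight vector can select. D never survives because C dominates it.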
+
+## Step 3 — sweep, collect, and filter
+
+```
+frontier = []
+for each weight vector (or ε vector) in the grid:
+    set the combined objective (or ε right-hand sides)
+    solve with cuOpt              # reuse the prior solution as a warm start
+    if status is Optimal/Feasible:
+        record (objective values, solution)
+discard dominated and duplicate points
+sort the survivors to form the frontier
+```
+
+Practical notes:
+
+- **Warm-start LP sweeps.** For an LP frontier, reuse the previous solve's PDLP warmstart data to cut solve time (`getWarmstartData` → `set_pdlp_warm_start_data`). Per cuOpt this is **LP-only**: a MILP solve doesn't take a PDLP warmstart (you can optionally seed a MIP start instead). See `cuopt-numerical-optimization-api-python`.
+- **Filter dominated points.** A correct sweep can still emit dominated points (especially weighted-sum near the hull, or MILP). Drop them; they are not part of the frontier.
+- **Resolution is a budget.** Curve fidelity trades against solve count. Start coarse to see the shape, then refine the grid only where the curve bends.
+
+## Step 4 — interpret the frontier (discipline)
+
+Producing the curve is half the work; reading it correctly is the other half.
+
+- **Report tradeoffs, not single numbers.** A frontier point means nothing in isolation. Quote the exchange rate — "≈ $4k of extra cost per 1% of added coverage in this region" — so the user can judge whether a move is worth it.
+- **Flag knee points; don't auto-pick them.** The "knee" is where the curve bends most sharply — beyond it you pay a lot for a little. It's often the best-balanced compromise and worth highlighting, but the final choice is the user's preference, not a rule.
+- **Treat dominated or gappy output as a diagnostic.** If dominated points survive filtering, or the frontier is implausibly sparse or perfectly linear, suspect the sweep or the model — most often weighted-sum hiding a concave region (switch to ε-constraint) or a normalization mistake.
+- **State the weighting/ε you used.** Every reported point is conditional on its scalarization. Make that explicit so a single solve is never mistaken for "the" optimum. + +## Getting each objective right + +The frontier is only as correct as the objectives feeding it. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping. One trap matters especially here: a **risk / volatility** objective is the quadratic form `xᵀΣx`, **not** a linear sum of per-asset volatility — modeling it linearly distorts the entire return-vs-risk frontier. See that skill's *Typical QP use cases* (portfolio variance) and *QP rule: minimize only*. + +## Interfaces + +This skill is solver- and interface-agnostic. The per-solve mechanics — building the objective, adding the ε constraints, passing a warm start, reading status — live in the API skills: + +- `cuopt-numerical-optimization-api-python` / `-api-c` / `-api-cli` — LP, MILP, QP solves. +- `cuopt-routing-formulation` + `cuopt-routing-api-python` — the same frontier workflow applies to routing tradeoffs (distance vs. vehicles vs. time). diff --git a/skills/cuopt-multi-objective-exploration/evals/evals.json b/skills/cuopt-multi-objective-exploration/evals/evals.json new file mode 100644 index 000000000..b8ff84473 --- /dev/null +++ b/skills/cuopt-multi-objective-exploration/evals/evals.json @@ -0,0 +1,42 @@ +[ + { + "id": "multiobj-explore-eval-001-supplier-interpretation", + "question": "A procurement lead is sourcing a component. The candidate suppliers lie on a cost-vs-reliability tradeoff (cheaper tends to be less reliable): CN03 cost $7.05 reliability 81.1; SEA03 $7.63 / 82.6; LA04 $9.93 / 87.2; EU04 $11.29 / 88.2; NA01 $11.74 / 90.3; NA03 $12.33 / 91.0; NA04 $13.37 / 91.1. She asks: 'Which suppliers should we commit to?' 
Advise her.", + "expected_skill": "cuopt-multi-objective-exploration", + "expected_script": null, + "ground_truth": "The agent treats this as a two-objective tradeoff with no fixed weighting. It does NOT collapse to a single 'best' supplier; it lays out the cost/reliability tradeoff, quotes the exchange rate (e.g. roughly how much cost per reliability point between adjacent options), flags the knee/balanced region, states the assumption behind any option it highlights, and leaves the final pick to the lead (often a diversified mix rather than one supplier).", + "expected_behavior": [ + "Frames it as two competing objectives (cost vs reliability) with no agreed weighting", + "Does NOT declare one supplier as THE answer; preserves a genuine choice across options", + "Quantifies the tradeoff as an exchange rate (cost per unit of reliability, or vice versa)", + "Flags a knee / balanced region but leaves the final pick to the lead", + "States where on the tradeoff any specific option it names sits" + ] + }, + { + "id": "multiobj-explore-eval-002-supplier-exploration", + "question": "A procurement lead must choose which of ~25 candidate suppliers to contract to maximize total supply-chain resilience and minimize total annual cost, while covering required demand. There is no agreed weighting between resilience and cost. Using cuOpt, how would you approach this and what would you report back?", + "expected_skill": "cuopt-multi-objective-exploration", + "expected_script": null, + "ground_truth": "The agent recognizes a multi-objective (resilience vs cost) selection problem with no fixed weighting and a hard demand constraint, builds a payoff table by solving each objective alone for ranges, then traces the Pareto frontier with repeated single-objective cuOpt solves. Because the supplier selection is combinatorial (non-convex), it prefers the epsilon-constraint method (e.g. 
minimize cost subject to a resilience floor, sweeping the floor) over weighted-sum, which would miss non-supported portfolios. It filters dominated points and reports the tradeoff curve plus the knee, deferring the final pick, and defers per-solve mechanics to the api-* skills and formulation to cuopt-numerical-optimization-formulation.", + "expected_behavior": [ + "Recognizes two competing objectives with no agreed weighting; does NOT collapse to one weighted optimum", + "Builds a payoff table (each objective alone) for ranges/normalization", + "Traces the Pareto frontier via repeated single-objective cuOpt solves; prefers epsilon-constraint over weighted-sum for completeness on a non-convex/MILP problem", + "Filters dominated and duplicate portfolios", + "Reports the tradeoff and flags the knee, leaving the final pick to the lead" + ] + }, + { + "id": "multiobj-explore-eval-003-single-objective-decoy", + "question": "A procurement lead must contract suppliers to MAXIMIZE total supply-chain resilience, but the annual budget is hard-capped: total cost must not exceed $34 (a firm limit, not negotiable), and the set must cover demand. Which suppliers should we contract?", + "expected_skill": null, + "expected_script": null, + "ground_truth": "DECOY (negative) — the multi-objective-exploration skill should NOT activate. There is a single clear objective (maximize resilience) with the cost as a hard constraint, so there is no tradeoff to explore. 
The correct response is a single optimization (maximize resilience subject to cost <= 34 and demand coverage) returning ONE recommended supplier set, not a Pareto sweep or a range of options.", + "expected_behavior": [ + "Recognizes a single objective with a hard constraint, not a tradeoff", + "Does NOT trace a frontier or sweep the budget as if it were a tradeoff dial", + "Returns one recommended supplier set (one solve), citing the binding budget and demand coverage" + ] + } +] diff --git a/skills/cuopt-multi-objective-exploration/skill-card.md b/skills/cuopt-multi-objective-exploration/skill-card.md new file mode 100644 index 000000000..7aa34df1f --- /dev/null +++ b/skills/cuopt-multi-objective-exploration/skill-card.md @@ -0,0 +1,77 @@ +## Description:
+Multi-objective exploration — trace and interpret the Pareto frontier across competing objectives by orchestrating repeated single-objective cuOpt solves (weighted-sum and ε-constraint), then read the tradeoffs with discipline.
+ +This skill is ready for commercial/non-commercial use.
+ +## Owner +NVIDIA
+ +### License/Terms of Use:
+Apache 2.0
+## Use Case:
+Developers and engineers use this skill when a problem has two or more competing objectives with no agreed weighting (cost vs. service, return vs. risk, distance vs. vehicles). It turns a sequence of single-objective cuOpt solves into a Pareto frontier and provides the interpretation discipline to read tradeoffs, knee points, and convexity blind spots.
+ +### Deployment Geography for Use:
+Global
+ +## Known Risks and Mitigations:
+Risk: Proposals could introduce incorrect or misleading guidance into skills, so review them before execution.
+Mitigation: Review and scan skill before deployment.
+ +## Reference(s):
+- [cuOpt User Guide](https://docs.nvidia.com/cuopt/user-guide/latest/introduction.html)
+- [cuopt-examples](https://github.com/NVIDIA/cuopt-examples)
+ + +## Skill Output:
+**Output Type(s):** [Analysis, Code]
+**Output Format:** [Markdown with mathematical formulations and a Pareto frontier (table/plot of non-dominated points)]
+**Output Parameters:** [1D]
+**Other Properties Related to Output:** [None]
+ +## Evaluation Agents Used:
+- claude-code
+- codex
+ + + +## Evaluation Tasks:
+Three tasks (see `evals/evals.json`): two positive (interpretation on a real cost-vs-reliability supplier front, and frontier exploration on a supplier-selection MILP) plus one single-objective decoy (no activation expected). A pre-publication WITH/WITHOUT A/B over these has been run on a Colab GPU with real cuOpt solves (see `BENCHMARK.md`). The official NVSkills-Eval (external profile, `claude-code` + `codex`) has not been run from this environment (cuOpt is Linux + NVIDIA-GPU only); it will run on the fork, as it does for the sibling skills.
+ +## Evaluation Metrics Used:
+Reported benchmark dimensions:
+- Security: Checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
+- Correctness: Checks whether the agent follows the expected workflow and produces the correct final output.
+- Discoverability: Checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
+- Effectiveness: Checks whether the agent performs measurably better with the skill than without it.
+- Efficiency: Checks whether the agent uses fewer tokens and avoids redundant work.
+ +Underlying evaluation signals used in this run:
+- `skill_execution`: Verifies that the agent loaded the expected skill and workflow.
+- `skill_efficiency`: Checks routing quality, decoy avoidance, and redundant tool usage.
+- `accuracy`: Grades final-answer correctness against the reference answer.
+- `goal_accuracy`: Checks whether the overall user task completed successfully.
+- `behavior_check`: Verifies expected behavior steps, including safety expectations.
+- `token_efficiency`: Compares token usage with and without the skill.
+ + + +## Evaluation Results:
+A pre-publication WITH/WITHOUT A/B (Colab GPU, real cuOpt) supports the value claim — interpretation discipline +0.15 across both models and both decisions, with frontier exploration as a supporting lift; full numbers and caveats in `BENCHMARK.md`. The official NVSkills-Eval table below (`claude-code` / `codex`, external profile) is PENDING — it runs on the fork; the values are placeholders until then.
+
+| Dimension | Num | `claude-code` | `codex` |
+|---|---:|---:|---:|
+| Security | — | PENDING POC | PENDING POC |
+| Correctness | — | PENDING POC | PENDING POC |
+| Discoverability | — | PENDING POC | PENDING POC |
+| Effectiveness | — | PENDING POC | PENDING POC |
+| Efficiency | — | PENDING POC | PENDING POC |
+
+## Skill Version(s):
+26.08.00 (source: frontmatter, git tag)
+ +## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal team to ensure this skill meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+ +(For Release on NVIDIA Platforms Only)
+Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
From 850b1dae0be38cabf6b37948b153d4515f48facf Mon Sep 17 00:00:00 2001 From: cafzal Date: Sat, 30 May 2026 12:50:14 -0700 Subject: [PATCH 02/20] Register cuopt-multi-objective-exploration in skills index Signed-off-by: cafzal --- .claude-plugin/marketplace.json | 6 ++++++ AGENTS.md | 1 + 2 files changed, 7 insertions(+) diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 2b94f0714..e7dc5add5 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -50,6 +50,12 @@ "skills": "./", "description": "LP, MILP, and QP (beta) with cuOpt — CLI only (MPS files, cuopt_cli). Use when the user is solving LP, MILP, or QP from MPS via command line." }, + { + "name": "cuopt-multi-objective-exploration", + "source": "./skills/cuopt-multi-objective-exploration", + "skills": "./", + "description": "Trace and interpret the Pareto frontier across competing objectives using repeated single-objective cuOpt solves (weighted-sum and ε-constraint)." + }, { "name": "cuopt-routing-formulation", "source": "./skills/cuopt-routing-formulation", diff --git a/AGENTS.md b/AGENTS.md index f3e3f5625..bbc3d09dd 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -13,6 +13,7 @@ AI agent skills for NVIDIA cuOpt optimization engine. 
Skills live in **`skills/` ### Common (concepts only; no API code) - `skills/cuopt-numerical-optimization-formulation/` — LP / MILP / QP: concepts + problem parsing + common formulation patterns +- `skills/cuopt-multi-objective-exploration/` — Multi-objective: trace + interpret the Pareto frontier across competing objectives (ε-constraint / weighted-sum over repeated cuOpt solves) - `skills/cuopt-routing-formulation/` — Routing: VRP, TSP, PDP (problem types, data) - `skills/cuopt-server-common/` — Server: capabilities, workflow From 00647151df4dadbb42bcda072ec8cf98e772546d Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 10:04:06 -0700 Subject: [PATCH 03/20] Link Pareto-frontier exploration from the goal-programming pattern Signed-off-by: cafzal --- skills/cuopt-numerical-optimization-formulation/SKILL.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/skills/cuopt-numerical-optimization-formulation/SKILL.md b/skills/cuopt-numerical-optimization-formulation/SKILL.md index 08a4335c0..b908c47fc 100644 --- a/skills/cuopt-numerical-optimization-formulation/SKILL.md +++ b/skills/cuopt-numerical-optimization-formulation/SKILL.md @@ -228,6 +228,12 @@ Goal programming optimizes multiple objectives in priority order. Implement it a Deviation variables (d⁻, d⁺) and slack/idle-time variables are always **continuous**. However, **decision variables must still be INTEGER when they represent discrete/countable quantities** (units produced, vehicles, workers, etc.). Do not let the presence of continuous deviation variables cause you to make all variables continuous — the integrality of decision variables directly affects feasibility and objective values. +### Multiple objectives with no fixed priority + +Goal programming (above) needs a **priority order** and returns **one** prioritized solution. 
When objectives genuinely conflict and there is **no fixed priority or weighting** — the user wants to see the tradeoffs and choose — don't pick one weighting up front. Trace the **Pareto frontier**: keep one objective and sweep the others as parametric ε-constraints (or sweep weighted-sum weights), then filter to the non-dominated set. On integer / non-convex problems prefer ε-constraint — weighted-sum provably misses unsupported efficient points. + +For the full workflow (anchor each objective → sweep → filter → read the frontier with exchange rates and the knee) see the **`cuopt-multi-objective-exploration`** skill; for a worked cuOpt ε-sweep see **cuopt-examples → `multi_objective_frontier/`**. + ### Multi-period inventory / purchasing models In problems with buying, selling, and warehouse capacity over multiple periods, decide which capacity constraints to include based on the problem's timing assumptions. From d39c9b0066cf893e8e0e7c7da1f3db003cd6fb37 Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 10:38:54 -0700 Subject: [PATCH 04/20] Address review: drop dangling refs, reconcile MILP gap note - Remove forward-ref to the not-yet-existent cuopt-examples/multi_objective_frontier/ from the goal-programming pointer (the worked example is a separate, in-flight effort) - Inline the publication next-steps in BENCHMARK.md (CONTRIBUTION_NEXT_STEPS.md is not in the repo) - Add a per-solve MILP time-limit note to Step 3 so points are reported optimal to the gap you set, not certified optimal Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/BENCHMARK.md | 2 +- skills/cuopt-multi-objective-exploration/SKILL.md | 1 + skills/cuopt-numerical-optimization-formulation/SKILL.md | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/skills/cuopt-multi-objective-exploration/BENCHMARK.md b/skills/cuopt-multi-objective-exploration/BENCHMARK.md index d22980892..aa9487604 100644 --- a/skills/cuopt-multi-objective-exploration/BENCHMARK.md 
+++ b/skills/cuopt-multi-objective-exploration/BENCHMARK.md @@ -51,4 +51,4 @@ Evaluation of the `cuopt-multi-objective-exploration` skill. ## Publication recommendation -The POC supports the value claim. Proceed per `CONTRIBUTION_NEXT_STEPS.md`: socialize via a GitHub discussion/proposal, then a fork-based draft PR with CI + the official NVSkills-Eval. +The POC supports the value claim. Next: socialize via a GitHub discussion/proposal, then a fork-based draft PR with CI + the official NVSkills-Eval. diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md index b932e8853..09b6d1d6e 100644 --- a/skills/cuopt-multi-objective-exploration/SKILL.md +++ b/skills/cuopt-multi-objective-exploration/SKILL.md @@ -105,6 +105,7 @@ sort the survivors to form the frontier Practical notes: - **Warm-start LP sweeps.** For an LP frontier, reuse the previous solve's PDLP warmstart data to cut solve time (`getWarmstartData` → `set_pdlp_warm_start_data`). Per cuOpt this is **LP-only**: a MILP solve doesn't take a PDLP warmstart (you can optionally seed a MIP start instead). See `cuopt-numerical-optimization-api-python`. +- **Cap each MILP solve.** Set a per-solve time limit on MILP sweeps (see `cuopt-numerical-optimization-api-python`) — a sweep is many solves, and branch-and-bound can over-spend certifying optimality past a tiny gap, while cuOpt sets no limit by default and won't warn. Report the points as optimal *to the gap you set*, not certified optimal. - **Filter dominated points.** A correct sweep can still emit dominated points (especially weighted-sum near the hull, or MILP). Drop them; they are not part of the frontier. - **Resolution is a budget.** Curve fidelity trades against solve count. Start coarse to see the shape, then refine the grid only where the curve bends. 
diff --git a/skills/cuopt-numerical-optimization-formulation/SKILL.md b/skills/cuopt-numerical-optimization-formulation/SKILL.md index b908c47fc..fc4771b44 100644 --- a/skills/cuopt-numerical-optimization-formulation/SKILL.md +++ b/skills/cuopt-numerical-optimization-formulation/SKILL.md @@ -232,7 +232,7 @@ Deviation variables (d⁻, d⁺) and slack/idle-time variables are always **cont Goal programming (above) needs a **priority order** and returns **one** prioritized solution. When objectives genuinely conflict and there is **no fixed priority or weighting** — the user wants to see the tradeoffs and choose — don't pick one weighting up front. Trace the **Pareto frontier**: keep one objective and sweep the others as parametric ε-constraints (or sweep weighted-sum weights), then filter to the non-dominated set. On integer / non-convex problems prefer ε-constraint — weighted-sum provably misses unsupported efficient points. -For the full workflow (anchor each objective → sweep → filter → read the frontier with exchange rates and the knee) see the **`cuopt-multi-objective-exploration`** skill; for a worked cuOpt ε-sweep see **cuopt-examples → `multi_objective_frontier/`**. +For the full workflow (anchor each objective → sweep → filter → read the frontier with exchange rates and the knee) see the **`cuopt-multi-objective-exploration`** skill. 
### Multi-period inventory / purchasing models From 63089defce3a94cf5dbf44f3604ed29264fe00d3 Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 10:40:05 -0700 Subject: [PATCH 05/20] =?UTF-8?q?Update=20BENCHMARK=20caveat=20for=20the?= =?UTF-8?q?=20added=20MILP=20gap=20note=20(two=20=E2=86=92=20three=20clari?= =?UTF-8?q?fications)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/BENCHMARK.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/cuopt-multi-objective-exploration/BENCHMARK.md b/skills/cuopt-multi-objective-exploration/BENCHMARK.md index aa9487604..113bd546a 100644 --- a/skills/cuopt-multi-objective-exploration/BENCHMARK.md +++ b/skills/cuopt-multi-objective-exploration/BENCHMARK.md @@ -41,7 +41,7 @@ Evaluation of the `cuopt-multi-objective-exploration` skill. - Custom A/B, not NVSkills-Eval. Agents are raw models with a cuOpt tool, not `claude-code` / `codex`. - Exploration and the synthetic scenario use one instance (seed 1); the interpretation value also holds on the seed-independent fixed supplier dataset (scenario A). - Judge is `claude-opus-4-8` (LLM-graded rubric). -- The shipped `SKILL.md` adds two cuOpt-feasibility clarifications not in the A/B's inlined skill text (PDLP warmstart is LP-only; ε-constrain *linear* objectives, since cuOpt constraints are linear). These are factual corrections, not value claims — they don't bear on the measured behaviors, so the numbers above stand. Re-running the notebook against the final text would change nothing material. +- The shipped `SKILL.md` adds three cuOpt-feasibility clarifications not in the A/B's inlined skill text (PDLP warmstart is LP-only; ε-constrain *linear* objectives, since cuOpt constraints are linear; cap each MILP solve's time limit so points are optimal to the gap you set). 
These are factual corrections, not value claims — they don't bear on the measured behaviors, so the numbers above stand. Re-running the notebook against the final text would change nothing material. ## Tier 1 / Tier 2 / official NVSkills-Eval — pending on the fork From 10c3d13bb9e9ce2f1e3f3fb4446998c1440ee4c8 Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 10:52:32 -0700 Subject: [PATCH 06/20] Add constraint-objective, recognition-cue, and verify notes to the skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Objectives and constraints are interchangeable: read a single-objective model's hard constraints as candidate objectives to promote to parametric ε-constraints - Recognition cue: a hand-coded loop over a target/budget value is already the ε-constraint method - Verify, don't assume: measure method-vs-method claims rather than asserting them; flag feasible-but-not-Optimal solves Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/SKILL.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md index 09b6d1d6e..c30b9512e 100644 --- a/skills/cuopt-multi-objective-exploration/SKILL.md +++ b/skills/cuopt-multi-objective-exploration/SKILL.md @@ -39,6 +39,8 @@ A solution **A dominates** B when A is at least as good on every objective and s Do not collapse a multi-objective problem to a single weighted number and report its optimum as "the answer" — that silently makes the tradeoff decision *for* the user. Trace the frontier and let them choose. +Objectives and constraints are interchangeable. A requirement currently treated as fixed — a coverage floor, a fairness cap, a budget — is often a latent objective: its level was assumed, not given. 
Promoting such a constraint to a parametric ε-constraint and sweeping it reveals a tradeoff you'd otherwise hide, so read a single-objective model's hard constraints as candidate objectives, not just limits. Express any promoted quantity linearly so it can serve as an ε-constraint (see `cuopt-numerical-optimization-formulation`). + ## Step 1 — build a payoff table (anchor each objective) Solve each objective **on its own** first. For *k* objectives this is *k* solves. Record, for each, the value of every objective at that optimum: @@ -87,6 +89,8 @@ Sweep each `ε_k` across the range from the payoff table. Each `(ε2, ε3, …)` cuOpt's constraints are **linear**, so ε-constrain *linear* objectives. If an objective is quadratic (e.g. risk `xᵀΣx`), keep that one as the objective `f1` and ε-constrain the linear ones — cuOpt solves QP (quadratic objective, linear constraints), not quadratically-constrained programs. +Spot it in existing code: a hand-coded loop over a target or budget value (a return target, a cost cap) is already the ε-constraint method — name it as such, filter dominated points, and read the swept constraint's dual (LP/QP only). + **Picking a method:** weighted-sum for a quick convex sketch or when you know the frontier is convex (e.g. a pure-LP/QP tradeoff); ε-constraint when the problem is MILP, when the frontier may be non-convex, or when the user needs a faithful and complete curve. ## Step 3 — sweep, collect, and filter @@ -108,6 +112,7 @@ Practical notes: - **Cap each MILP solve.** Set a per-solve time limit on MILP sweeps (see `cuopt-numerical-optimization-api-python`) — a sweep is many solves, and branch-and-bound can over-spend certifying optimality past a tiny gap, while cuOpt sets no limit by default and won't warn. Report the points as optimal *to the gap you set*, not certified optimal. - **Filter dominated points.** A correct sweep can still emit dominated points (especially weighted-sum near the hull, or MILP). 
Drop them; they are not part of the frontier. - **Resolution is a budget.** Curve fidelity trades against solve count. Start coarse to see the shape, then refine the grid only where the curve bends. +- **Verify, don't assume.** When you claim one method beats another, measure it — e.g. count the efficient points ε-constraint recovered that weighted-sum missed — rather than asserting it; and flag any solve returning feasible-but-not-`Optimal` so a non-certified point is never read as exact. ## Step 4 — interpret the frontier (discipline) From 6b725e2c6eafa366b61c92e0b6eee22d1e2a0d39 Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 12:13:11 -0700 Subject: [PATCH 07/20] Align skill and benchmark after the post-A/B edits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Guard the constraint→objective note: promote a fixed constraint only when its level was an assumption; a genuinely non-negotiable limit (hard budget cap, regulatory minimum) stays a constraint. Keeps Add A from undercutting the single-objective decoy eval. - Correct the BENCHMARK caveat: the shipped SKILL.md also gained three method/discipline notes the A/B didn't exercise, not just the feasibility clarifications; the numbers reflect the tested text, and NVSkills-Eval gates the final text. Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/BENCHMARK.md | 2 +- skills/cuopt-multi-objective-exploration/SKILL.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/skills/cuopt-multi-objective-exploration/BENCHMARK.md b/skills/cuopt-multi-objective-exploration/BENCHMARK.md index 113bd546a..caefe4a58 100644 --- a/skills/cuopt-multi-objective-exploration/BENCHMARK.md +++ b/skills/cuopt-multi-objective-exploration/BENCHMARK.md @@ -41,7 +41,7 @@ Evaluation of the `cuopt-multi-objective-exploration` skill. - Custom A/B, not NVSkills-Eval. Agents are raw models with a cuOpt tool, not `claude-code` / `codex`. 
- Exploration and the synthetic scenario use one instance (seed 1); the interpretation value also holds on the seed-independent fixed supplier dataset (scenario A). - Judge is `claude-opus-4-8` (LLM-graded rubric). -- The shipped `SKILL.md` adds three cuOpt-feasibility clarifications not in the A/B's inlined skill text (PDLP warmstart is LP-only; ε-constrain *linear* objectives, since cuOpt constraints are linear; cap each MILP solve's time limit so points are optimal to the gap you set). These are factual corrections, not value claims — they don't bear on the measured behaviors, so the numbers above stand. Re-running the notebook against the final text would change nothing material. +- The shipped `SKILL.md` has been refined since the A/B run. Three additions are cuOpt-feasibility clarifications (PDLP warmstart is LP-only; ε-constrain *linear* objectives, since cuOpt constraints are linear; cap each MILP solve's time limit, so points are optimal to the gap you set) — factual corrections that don't bear on the measured behaviors. Three more are method/discipline notes the A/B did not exercise (read a fixed constraint as a candidate objective only when its level was an assumption; recognize a hand-coded target/budget loop as the ε-constraint method; verify-don't-assume when comparing methods). The numbers above reflect the skill text as tested, not these later additions; the official NVSkills-Eval runs against the final text and remains the gate. 
## Tier 1 / Tier 2 / official NVSkills-Eval — pending on the fork diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md index c30b9512e..abc8b2b24 100644 --- a/skills/cuopt-multi-objective-exploration/SKILL.md +++ b/skills/cuopt-multi-objective-exploration/SKILL.md @@ -39,7 +39,7 @@ A solution **A dominates** B when A is at least as good on every objective and s Do not collapse a multi-objective problem to a single weighted number and report its optimum as "the answer" — that silently makes the tradeoff decision *for* the user. Trace the frontier and let them choose. -Objectives and constraints are interchangeable. A requirement currently treated as fixed — a coverage floor, a fairness cap, a budget — is often a latent objective: its level was assumed, not given. Promoting such a constraint to a parametric ε-constraint and sweeping it reveals a tradeoff you'd otherwise hide, so read a single-objective model's hard constraints as candidate objectives, not just limits. Express any promoted quantity linearly so it can serve as an ε-constraint (see `cuopt-numerical-optimization-formulation`). +Objectives and constraints are interchangeable. A requirement currently treated as fixed — a coverage floor, a fairness cap, a budget — is often a latent objective: its level was assumed, not given. Promoting such a constraint to a parametric ε-constraint and sweeping it reveals a tradeoff you'd otherwise hide, so read a single-objective model's hard constraints as candidate objectives, not just limits — but only when the level was an assumption. A genuinely fixed, non-negotiable limit (a hard budget cap, a regulatory minimum) stays a constraint; don't manufacture a tradeoff that isn't there. Express any promoted quantity linearly so it can serve as an ε-constraint (see `cuopt-numerical-optimization-formulation`). 
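As a rough sketch of promoting a fixed limit to a swept ε-constraint, the example below treats a budget cap as the parameter and re-solves once per budget level. Everything in it is an assumption made for illustration: the supplier data is invented, the names `best_under_budget` and `sweep_budget` are hypothetical, and brute-force enumeration stands in for a cuOpt MILP solve, so no cuOpt API is shown.

```python
from itertools import combinations

# Hypothetical suppliers: (name, annual cost, resilience score).
suppliers = [("A", 4, 5), ("B", 3, 4), ("C", 2, 2), ("D", 5, 7)]


def best_under_budget(budget):
    """Maximize total resilience subject to total cost <= budget.

    Brute force over all subsets; ties on resilience go to the cheaper subset.
    Returns (cost, resilience, names).
    """
    best = None
    for size in range(len(suppliers) + 1):
        for subset in combinations(suppliers, size):
            cost = sum(s[1] for s in subset)
            resilience = sum(s[2] for s in subset)
            if cost > budget:
                continue
            if best is None or (resilience, -cost) > (best[1], -best[0]):
                best = (cost, resilience, tuple(s[0] for s in subset))
    return best


def sweep_budget(budgets):
    """Treat the budget as epsilon: one solve per level, keep distinct outcomes."""
    outcomes = {}
    for budget in budgets:
        cost, resilience, names = best_under_budget(budget)
        outcomes[(cost, resilience)] = names
    return sorted(outcomes)
```

Calling `sweep_budget([0, 5, 7, 9, 14])` gives one `(cost, resilience)` pair per budget level, which is the tradeoff a single fixed budget would have hidden. If the budget is genuinely non-negotiable, a single call to `best_under_budget` at that level is the whole problem and no sweep is warranted.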
## Step 1 — build a payoff table (anchor each objective) From b3a7ce506d92c23fadb19824f4a144e51342e8e9 Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 12:23:53 -0700 Subject: [PATCH 08/20] Fact-check fixes against the cuOpt codebase and the A/B instance MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - SKILL.md: cuOpt now solves convex quadratic constraints (barrier solver converts xᵀQx ≤ ε to a second-order cone; add_quadratic_constraint, inequality only) per the SOCP work (#1290). A convex quadratic objective can therefore be ε-constrained, not only kept as the objective. The prior "constraints are linear, not quadratically-constrained" claim was outdated. - evals.json: eval-002 supplier count ~25 → 12 to match the validated A/B exploration instance (BENCHMARK says 12 suppliers; "of 24 non-supported portfolios" corroborates a ~12-supplier instance), since the skill-card states the A/B was run over these eval tasks. Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/SKILL.md | 2 +- skills/cuopt-multi-objective-exploration/evals/evals.json | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md index abc8b2b24..528a2c6af 100644 --- a/skills/cuopt-multi-objective-exploration/SKILL.md +++ b/skills/cuopt-multi-objective-exploration/SKILL.md @@ -87,7 +87,7 @@ subject to f2(x) ≤ ε2 Sweep each `ε_k` across the range from the payoff table. Each `(ε2, ε3, …)` combination is a single standard cuOpt solve. This recovers the **full** frontier, including the concave regions weighted-sum cannot reach, which is why it's the default when completeness matters. The cost is more solves (a grid over the constrained objectives) and bookkeeping of the ε values. -cuOpt's constraints are **linear**, so ε-constrain *linear* objectives. If an objective is quadratic (e.g. 
risk `xᵀΣx`), keep that one as the objective `f1` and ε-constrain the linear ones — cuOpt solves QP (quadratic objective, linear constraints), not quadratically-constrained programs. +ε-constrain *linear* objectives directly. A quadratic objective (e.g. risk `xᵀΣx`) can stay the objective `f1` while you ε-constrain the linear ones — the simplest route — or be ε-constrained itself when it is **convex**: a `xᵀQx ≤ ε` constraint (Q positive semidefinite) makes cuOpt switch to the barrier solver and convert it to a second-order cone (`add_quadratic_constraint`, inequality only). Non-convex or equality quadratic constraints are not supported, and the MILP path stays linear-constraint only. Spot it in existing code: a hand-coded loop over a target or budget value (a return target, a cost cap) is already the ε-constraint method — name it as such, filter dominated points, and read the swept constraint's dual (LP/QP only). diff --git a/skills/cuopt-multi-objective-exploration/evals/evals.json b/skills/cuopt-multi-objective-exploration/evals/evals.json index b8ff84473..2e2bcef63 100644 --- a/skills/cuopt-multi-objective-exploration/evals/evals.json +++ b/skills/cuopt-multi-objective-exploration/evals/evals.json @@ -15,7 +15,7 @@ }, { "id": "multiobj-explore-eval-002-supplier-exploration", - "question": "A procurement lead must choose which of ~25 candidate suppliers to contract to maximize total supply-chain resilience and minimize total annual cost, while covering required demand. There is no agreed weighting between resilience and cost. Using cuOpt, how would you approach this and what would you report back?", + "question": "A procurement lead must choose which of 12 candidate suppliers to contract to maximize total supply-chain resilience and minimize total annual cost, while covering required demand. There is no agreed weighting between resilience and cost. 
Using cuOpt, how would you approach this and what would you report back?", "expected_skill": "cuopt-multi-objective-exploration", "expected_script": null, "ground_truth": "The agent recognizes a multi-objective (resilience vs cost) selection problem with no fixed weighting and a hard demand constraint, builds a payoff table by solving each objective alone for ranges, then traces the Pareto frontier with repeated single-objective cuOpt solves. Because the supplier selection is combinatorial (non-convex), it prefers the epsilon-constraint method (e.g. minimize cost subject to a resilience floor, sweeping the floor) over weighted-sum, which would miss non-supported portfolios. It filters dominated points and reports the tradeoff curve plus the knee, deferring the final pick, and defers per-solve mechanics to the api-* skills and formulation to cuopt-numerical-optimization-formulation.", From dca4c972a2657cc9195bceea5ff19a1d449aa844 Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 12:29:14 -0700 Subject: [PATCH 09/20] Mark SOCP / quadratic-constraint path as beta in the skill Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md index 528a2c6af..e3e59acc9 100644 --- a/skills/cuopt-multi-objective-exploration/SKILL.md +++ b/skills/cuopt-multi-objective-exploration/SKILL.md @@ -87,7 +87,7 @@ subject to f2(x) ≤ ε2 Sweep each `ε_k` across the range from the payoff table. Each `(ε2, ε3, …)` combination is a single standard cuOpt solve. This recovers the **full** frontier, including the concave regions weighted-sum cannot reach, which is why it's the default when completeness matters. The cost is more solves (a grid over the constrained objectives) and bookkeeping of the ε values. -ε-constrain *linear* objectives directly. A quadratic objective (e.g. 
risk `xᵀΣx`) can stay the objective `f1` while you ε-constrain the linear ones — the simplest route — or be ε-constrained itself when it is **convex**: a `xᵀQx ≤ ε` constraint (Q positive semidefinite) makes cuOpt switch to the barrier solver and convert it to a second-order cone (`add_quadratic_constraint`, inequality only). Non-convex or equality quadratic constraints are not supported, and the MILP path stays linear-constraint only. +ε-constrain *linear* objectives directly. A quadratic objective (e.g. risk `xᵀΣx`) is simplest kept as the objective `f1` while you ε-constrain the linear ones. A **convex** quadratic objective *can* instead be ε-constrained directly: cuOpt routes a `xᵀQx ≤ ε` constraint (Q positive semidefinite, inequality only) through the barrier solver as a second-order cone (`add_quadratic_constraint`) — though SOCP support is **beta**. Non-convex or equality quadratic constraints are unsupported, and the MILP path stays linear-constraint only. Spot it in existing code: a hand-coded loop over a target or budget value (a return target, a cost cap) is already the ε-constraint method — name it as such, filter dominated points, and read the swept constraint's dual (LP/QP only). From 5e65df1e3079568c690a5b4742ca0c4958bddc6c Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 12:48:29 -0700 Subject: [PATCH 10/20] Correct NVSkills-Eval process description (maintainer-run, non-fork) The skill's BENCHMARK and skill-card said NVSkills-Eval "runs on the fork" / via a "fork-based draft PR with CI", but request-nvskills-ci.yml states fork PRs are not supported and the run is maintainer-triggered (/nvskills-ci by an OWNER/MEMBER/COLLABORATOR) on a non-fork NVIDIA/cuopt branch; its bot attaches the skill.oms.sig signature. Reworded to match. 
Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/BENCHMARK.md | 8 ++++---- skills/cuopt-multi-objective-exploration/skill-card.md | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/skills/cuopt-multi-objective-exploration/BENCHMARK.md b/skills/cuopt-multi-objective-exploration/BENCHMARK.md index caefe4a58..dafb36cfe 100644 --- a/skills/cuopt-multi-objective-exploration/BENCHMARK.md +++ b/skills/cuopt-multi-objective-exploration/BENCHMARK.md @@ -2,11 +2,11 @@ Evaluation of the `cuopt-multi-objective-exploration` skill. -> **Status: proof-of-concept A/B complete; official NVSkills-Eval pending on the fork.** +> **Status: proof-of-concept A/B complete; official NVSkills-Eval pending (maintainer-run).** > The numbers below are from a custom WITH-vs-WITHOUT run on a Colab GPU with cuOpt in the > loop (real solves) — not from NVSkills-Eval. They establish the skill's value before the > formal run. The official NVSkills-Eval (`claude-code` + `codex`, external profile) and CI -> Tiers 1–2 still run on the fork and gate publication. +> Tiers 1–2 are maintainer-triggered (`/nvskills-ci`) on a non-fork `NVIDIA/cuopt` branch (fork PRs aren't supported) and gate publication. ## Summary @@ -43,7 +43,7 @@ Evaluation of the `cuopt-multi-objective-exploration` skill. - Judge is `claude-opus-4-8` (LLM-graded rubric). - The shipped `SKILL.md` has been refined since the A/B run. Three additions are cuOpt-feasibility clarifications (PDLP warmstart is LP-only; ε-constrain *linear* objectives, since cuOpt constraints are linear; cap each MILP solve's time limit, so points are optimal to the gap you set) — factual corrections that don't bear on the measured behaviors. Three more are method/discipline notes the A/B did not exercise (read a fixed constraint as a candidate objective only when its level was an assumption; recognize a hand-coded target/budget loop as the ε-constraint method; verify-don't-assume when comparing methods). 
The numbers above reflect the skill text as tested, not these later additions; the official NVSkills-Eval runs against the final text and remains the gate. -## Tier 1 / Tier 2 / official NVSkills-Eval — pending on the fork +## Tier 1 / Tier 2 / official NVSkills-Eval — pending (maintainer-run) - `./ci/utils/validate_skills.sh` (frontmatter, required files, version 26.08.00) + `sync_skills_version.sh`. - Tier 2 dedup — scoped to orchestration + interpretation; defers per-solve mechanics to the api-* skills and per-objective formulation to `cuopt-numerical-optimization-formulation`. @@ -51,4 +51,4 @@ Evaluation of the `cuopt-multi-objective-exploration` skill. ## Publication recommendation -The POC supports the value claim. Next: socialize via a GitHub discussion/proposal, then a fork-based draft PR with CI + the official NVSkills-Eval. +The POC supports the value claim. Next: socialize via a GitHub discussion/proposal, then a maintainer runs `/nvskills-ci` on a non-fork `NVIDIA/cuopt` branch for CI Tiers 1–2 + the official NVSkills-Eval (the NVSkills CI doesn't support fork PRs, and its bot attaches the `skill.oms.sig` signature). diff --git a/skills/cuopt-multi-objective-exploration/skill-card.md b/skills/cuopt-multi-objective-exploration/skill-card.md index 7aa34df1f..a070edde9 100644 --- a/skills/cuopt-multi-objective-exploration/skill-card.md +++ b/skills/cuopt-multi-objective-exploration/skill-card.md @@ -36,7 +36,7 @@ Mitigation: Review and scan skill before deployment.
## Evaluation Tasks:
-Three tasks (see `evals/evals.json`): two positive — interpretation on a real cost-vs-reliability supplier front, and frontier exploration on a supplier-selection MILP — plus one single-objective decoy (no activation expected). A pre-publication WITH/WITHOUT A/B over these has been run on a Colab GPU with real cuOpt solves (see `BENCHMARK.md`). The official NVSkills-Eval (external profile, `claude-code` + `codex`) has not been run from this environment (cuOpt is Linux + NVIDIA-GPU only) and runs on the fork as for the sibling skills.
+Three tasks (see `evals/evals.json`): two positive — interpretation on a real cost-vs-reliability supplier front, and frontier exploration on a supplier-selection MILP — plus one single-objective decoy (no activation expected). A pre-publication WITH/WITHOUT A/B over these has been run on a Colab GPU with real cuOpt solves (see `BENCHMARK.md`). The official NVSkills-Eval (external profile, `claude-code` + `codex`) has not been run from this environment (cuOpt is Linux + NVIDIA-GPU only) and is maintainer-triggered via `/nvskills-ci` on a non-fork `NVIDIA/cuopt` branch, as for the sibling skills (the NVSkills CI doesn't support fork PRs).
## Evaluation Metrics Used:
Reported benchmark dimensions:
From e3054989b8007c69d16c2b240b9bc8ad4b0864f7 Mon Sep 17 00:00:00 2001 From: cafzal Date: Mon, 1 Jun 2026 12:48:44 -0700 Subject: [PATCH 11/20] Fix remaining 'runs on the fork' note in skill-card results section Signed-off-by: cafzal --- skills/cuopt-multi-objective-exploration/skill-card.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/cuopt-multi-objective-exploration/skill-card.md b/skills/cuopt-multi-objective-exploration/skill-card.md index a070edde9..445b2ead9 100644 --- a/skills/cuopt-multi-objective-exploration/skill-card.md +++ b/skills/cuopt-multi-objective-exploration/skill-card.md @@ -57,7 +57,7 @@ Underlying evaluation signals used in this run:
## Evaluation Results:
-A pre-publication WITH/WITHOUT A/B (Colab GPU, real cuOpt) supports the value claim — interpretation discipline +0.15 across both models and both decisions, with frontier exploration as a supporting lift; full numbers and caveats in `BENCHMARK.md`. The official NVSkills-Eval table below (`claude-code` / `codex`, external profile) is PENDING — it runs on the fork; the values are placeholders until then.
+A pre-publication WITH/WITHOUT A/B (Colab GPU, real cuOpt) supports the value claim — interpretation discipline +0.15 across both models and both decisions, with frontier exploration as a supporting lift; full numbers and caveats in `BENCHMARK.md`. The official NVSkills-Eval table below (`claude-code` / `codex`, external profile) is PENDING — it's maintainer-run (`/nvskills-ci`, non-fork); the values are placeholders until then.
| Dimension | Num | `claude-code` | `codex` | |---|---:|---:|---:| From 4390c2c0317c8fbd15f058efef5087d9683ea233 Mon Sep 17 00:00:00 2001 From: cafzal Date: Tue, 2 Jun 2026 16:12:17 -0700 Subject: [PATCH 12/20] Drop hand-written BENCHMARK.md and skill-card.md These are generated by the NVSkills onboarding pipeline (as for the sibling skills, e.g. cuopt-numerical-optimization-formulation), not authored by the contributor. Leaving just SKILL.md + evals/evals.json so the pipeline produces BENCHMARK.md, the skill card, and skill.oms.sig cleanly once the branch is on the main repo. Signed-off-by: cafzal --- .../BENCHMARK.md | 54 ------------- .../skill-card.md | 77 ------------------- 2 files changed, 131 deletions(-) delete mode 100644 skills/cuopt-multi-objective-exploration/BENCHMARK.md delete mode 100644 skills/cuopt-multi-objective-exploration/skill-card.md diff --git a/skills/cuopt-multi-objective-exploration/BENCHMARK.md b/skills/cuopt-multi-objective-exploration/BENCHMARK.md deleted file mode 100644 index dafb36cfe..000000000 --- a/skills/cuopt-multi-objective-exploration/BENCHMARK.md +++ /dev/null @@ -1,54 +0,0 @@ -# Evaluation Report - -Evaluation of the `cuopt-multi-objective-exploration` skill. - -> **Status: proof-of-concept A/B complete; official NVSkills-Eval pending (maintainer-run).** -> The numbers below are from a custom WITH-vs-WITHOUT run on a Colab GPU with cuOpt in the -> loop (real solves) — not from NVSkills-Eval. They establish the skill's value before the -> formal run. The official NVSkills-Eval (`claude-code` + `codex`, external profile) and CI -> Tiers 1–2 are maintainer-triggered (`/nvskills-ci`) on a non-fork `NVIDIA/cuopt` branch (fork PRs aren't supported) and gate publication. 
- -## Summary - -- Skill: `cuopt-multi-objective-exploration` -- Eval: custom WITH-vs-WITHOUT A/B — notebook `cuopt_exploration_skill_value_test.ipynb` -- Problem: supplier-selection procurement MILP (12 suppliers; maximize resilience / minimize cost under demand coverage) + a fixed real-world-style supplier dataset (cost vs reliability) -- Agents: `claude-opus-4-8`, `claude-sonnet-4-6` (each driving cuOpt as a tool); judge: `claude-opus-4-8` -- Samples: exploration N=6/cell; interpretation N=15/cell × 2 scenarios; decoy N=6/cell -- Date / hardware: 2026-05-29, Colab (NVIDIA RTX PRO 6000) - -## Results (WITH vs WITHOUT the skill) - -| Dimension | WITHOUT | WITH | Read | -|---|---|---|---| -| **Effectiveness — interpretation** | 0.58 | 0.73 | +0.15; both models (opus 0.73→0.93, sonnet 0.42→0.54), both scenarios (real 0.74→0.88, synthetic 0.42→0.58) | -| **Effectiveness — exploration** | 2.17 | 3.67 | non-supported Pareto portfolios recovered, of 24 (~+70%); both models up; full-front coverage 23%→28% | -| **Discoverability** | 92% | 100% | restraint on the single-objective decoy (small lift; high baseline) | -| **Efficiency** | 10.7 / 1.2 | 11.9 / 1.0 | cuOpt solves used (multi-objective / decoy) | -| **Correctness** | 72% | 76% | solved portfolios proven-optimal (cuOpt `FeasibleFound` accounts for the rest — solver, not skill) | -| **Security** | — | — | no unsafe surface: the agent's only tool is a math solver (no secrets, filesystem, or network) | - -**Interpretation, per rubric item (pooled):** no-single-best **+0.35**, knee-not-auto-pick **+0.28**, exchange-rate ≈0, state-assumptions ≈0. The lift is the *don't-collapse-to-one-answer* discipline; the other two behaviors were already present in both arms. 
- -## What the eval shows - -- The skill's value is **interpretation discipline** — agents present the tradeoff and defer instead of collapsing to one option — holding across both models and both decisions, including the fixed external supplier dataset (scenario A). -- **Exploration** is a real supporting lift on this constrained MILP: agents recover ~70% more of the non-supported (weighted-sum-unreachable) Pareto portfolios. -- **Discoverability** is a small positive (models mostly restrain unprompted). - -## Caveats - -- Custom A/B, not NVSkills-Eval. Agents are raw models with a cuOpt tool, not `claude-code` / `codex`. -- Exploration and the synthetic scenario use one instance (seed 1); the interpretation value also holds on the seed-independent fixed supplier dataset (scenario A). -- Judge is `claude-opus-4-8` (LLM-graded rubric). -- The shipped `SKILL.md` has been refined since the A/B run. Three additions are cuOpt-feasibility clarifications (PDLP warmstart is LP-only; ε-constrain *linear* objectives, since cuOpt constraints are linear; cap each MILP solve's time limit, so points are optimal to the gap you set) — factual corrections that don't bear on the measured behaviors. Three more are method/discipline notes the A/B did not exercise (read a fixed constraint as a candidate objective only when its level was an assumption; recognize a hand-coded target/budget loop as the ε-constraint method; verify-don't-assume when comparing methods). The numbers above reflect the skill text as tested, not these later additions; the official NVSkills-Eval runs against the final text and remains the gate. - -## Tier 1 / Tier 2 / official NVSkills-Eval — pending (maintainer-run) - -- `./ci/utils/validate_skills.sh` (frontmatter, required files, version 26.08.00) + `sync_skills_version.sh`. -- Tier 2 dedup — scoped to orchestration + interpretation; defers per-solve mechanics to the api-* skills and per-objective formulation to `cuopt-numerical-optimization-formulation`. 
-- NVSkills-Eval (external profile; `claude-code` + `codex`) — the formal gate. - -## Publication recommendation - -The POC supports the value claim. Next: socialize via a GitHub discussion/proposal, then a maintainer runs `/nvskills-ci` on a non-fork `NVIDIA/cuopt` branch for CI Tiers 1–2 + the official NVSkills-Eval (the NVSkills CI doesn't support fork PRs, and its bot attaches the `skill.oms.sig` signature). diff --git a/skills/cuopt-multi-objective-exploration/skill-card.md b/skills/cuopt-multi-objective-exploration/skill-card.md deleted file mode 100644 index 445b2ead9..000000000 --- a/skills/cuopt-multi-objective-exploration/skill-card.md +++ /dev/null @@ -1,77 +0,0 @@ -## Description:
-Multi-objective exploration — trace and interpret the Pareto frontier across competing objectives by orchestrating repeated single-objective cuOpt solves (weighted-sum and ε-constraint), then read the tradeoffs with discipline.
- -This skill is ready for commercial/non-commercial use.
- -## Owner -NVIDIA
- -### License/Terms of Use:
-Apache 2.0
-## Use Case:
-Developers and engineers use this skill when a problem has two or more competing objectives with no agreed weighting (cost vs. service, return vs. risk, distance vs. vehicles). It turns a sequence of single-objective cuOpt solves into a Pareto frontier and provides the interpretation discipline to read tradeoffs, knee points, and convexity blind spots.
- -### Deployment Geography for Use:
-Global
- -## Known Risks and Mitigations:
-Risk: Review before execution as proposals could introduce incorrect or misleading guidance into skills.
-Mitigation: Review and scan skill before deployment.
- -## Reference(s):
-- [cuOpt User Guide](https://docs.nvidia.com/cuopt/user-guide/latest/introduction.html)
-- [cuopt-examples](https://github.com/NVIDIA/cuopt-examples)
- - -## Skill Output:
-**Output Type(s):** [Analysis, Code]
-**Output Format:** [Markdown with mathematical formulations and a Pareto frontier (table/plot of non-dominated points)]
-**Output Parameters:** [1D]
-**Other Properties Related to Output:** [None]
- -## Evaluation Agents Used:
-- claude-code
-- codex
- - - -## Evaluation Tasks:
-Three tasks (see `evals/evals.json`): two positive — interpretation on a real cost-vs-reliability supplier front, and frontier exploration on a supplier-selection MILP — plus one single-objective decoy (no activation expected). A pre-publication WITH/WITHOUT A/B over these has been run on a Colab GPU with real cuOpt solves (see `BENCHMARK.md`). The official NVSkills-Eval (external profile, `claude-code` + `codex`) has not been run from this environment (cuOpt is Linux + NVIDIA-GPU only) and is maintainer-triggered via `/nvskills-ci` on a non-fork `NVIDIA/cuopt` branch, as for the sibling skills (the NVSkills CI doesn't support fork PRs).
- -## Evaluation Metrics Used:
-Reported benchmark dimensions:
-- Security: Checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
-- Correctness: Checks whether the agent follows the expected workflow and produces the correct final output.
-- Discoverability: Checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
-- Effectiveness: Checks whether the agent performs measurably better with the skill than without it.
-- Efficiency: Checks whether the agent uses fewer tokens and avoids redundant work.
- -Underlying evaluation signals used in this run:
-- `skill_execution`: Verifies that the agent loaded the expected skill and workflow.
-- `skill_efficiency`: Checks routing quality, decoy avoidance, and redundant tool usage.
-- `accuracy`: Grades final-answer correctness against the reference answer.
-- `goal_accuracy`: Checks whether the overall user task completed successfully.
-- `behavior_check`: Verifies expected behavior steps, including safety expectations.
-- `token_efficiency`: Compares token usage with and without the skill.
-
-
-
-## Evaluation Results:
-A pre-publication WITH/WITHOUT A/B (Colab GPU, real cuOpt) supports the value claim — interpretation discipline +0.15 across both models and both decisions, with frontier exploration as a supporting lift; full numbers and caveats in `BENCHMARK.md`. The official NVSkills-Eval table below (`claude-code` / `codex`, external profile) is PENDING — it's maintainer-run (`/nvskills-ci`, non-fork); the values are placeholders until then.
-
-| Dimension | Num | `claude-code` | `codex` |
-|---|---:|---:|---:|
-| Security | — | PENDING POC | PENDING POC |
-| Correctness | — | PENDING POC | PENDING POC |
-| Discoverability | — | PENDING POC | PENDING POC |
-| Effectiveness | — | PENDING POC | PENDING POC |
-| Efficiency | — | PENDING POC | PENDING POC |
-
-## Skill Version(s):
-26.08.00 (source: frontmatter, git tag)
-
-## Ethical Considerations:
-NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal team to ensure this skill meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
-
-(For Release on NVIDIA Platforms Only)
-Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
From 44c008cb22f7edef9700c3d32d3d39c699cc374d Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 3 Jun 2026 09:57:22 -0700
Subject: [PATCH 13/20] Add language identifiers to SKILL.md code fences (markdownlint MD040)

Signed-off-by: cafzal
---
 skills/cuopt-multi-objective-exploration/SKILL.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
index e3e59acc9..3f40434ca 100644
--- a/skills/cuopt-multi-objective-exploration/SKILL.md
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -45,7 +45,7 @@ Objectives and constraints are interchangeable. A requirement currently treated

 Solve each objective **on its own** first. For *k* objectives this is *k* solves. Record, for each, the value of every objective at that optimum:

-```
+```text
            f1        f2            f3
 min f1 →   f1*       f2(at f1*)    f3(at f1*)
 min f2 →   ...       f2*           ...
@@ -65,7 +65,7 @@ If any single-objective solve is already infeasible, stop and fix the model befo

 Combine the objectives into one and sweep the weights:

-```
+```text
 minimize   w1·f1(x) + w2·f2(x) + ... ,   for a grid of weight vectors w
 ```

@@ -78,7 +78,7 @@ Cheap and trivial with any solver. Two limitations to respect:

 Keep one objective; move the rest to constraints and sweep their right-hand sides:

-```
+```text
 minimize   f1(x)
 subject to f2(x) ≤ ε2
            f3(x) ≤ ε3
@@ -95,7 +95,7 @@ Spot it in existing code: a hand-coded loop over a target or budget value (a ret

 ## Step 3 — sweep, collect, and filter

-```
+```text
 frontier = []
 for each weight vector (or ε vector) in the grid:
     set the combined objective (or ε right-hand sides)

From bb6add8fbf738daa5bd2a9c947b85c3d30b69e8d Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 3 Jun 2026 16:06:05 -0700
Subject: [PATCH 14/20] Address review (mlubin): drop SOCP beta label; generalize objective guidance, remove cross-skill section refs

Signed-off-by: cafzal
---
 skills/cuopt-multi-objective-exploration/SKILL.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
index 3f40434ca..9a4929132 100644
--- a/skills/cuopt-multi-objective-exploration/SKILL.md
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -87,7 +87,7 @@ subject to f2(x) ≤ ε2

 Sweep each `ε_k` across the range from the payoff table. Each `(ε2, ε3, …)` combination is a single standard cuOpt solve. This recovers the **full** frontier, including the concave regions weighted-sum cannot reach, which is why it's the default when completeness matters. The cost is more solves (a grid over the constrained objectives) and bookkeeping of the ε values.

-ε-constrain *linear* objectives directly. A quadratic objective (e.g. risk `xᵀΣx`) is simplest kept as the objective `f1` while you ε-constrain the linear ones. A **convex** quadratic objective *can* instead be ε-constrained directly: cuOpt routes a `xᵀQx ≤ ε` constraint (Q positive semidefinite, inequality only) through the barrier solver as a second-order cone (`add_quadratic_constraint`) — though SOCP support is **beta**. Non-convex or equality quadratic constraints are unsupported, and the MILP path stays linear-constraint only.
+ε-constrain *linear* objectives directly. A quadratic objective (e.g. risk `xᵀΣx`) is simplest kept as the objective `f1` while you ε-constrain the linear ones. A **convex** quadratic objective *can* instead be ε-constrained directly: cuOpt routes a `xᵀQx ≤ ε` constraint (Q positive semidefinite, inequality only) through the barrier solver as a second-order cone (`add_quadratic_constraint`). Non-convex or equality quadratic constraints are unsupported, and the MILP path stays linear-constraint only.

 Spot it in existing code: a hand-coded loop over a target or budget value (a return target, a cost cap) is already the ε-constraint method — name it as such, filter dominated points, and read the swept constraint's dual (LP/QP only).

@@ -125,7 +125,7 @@ Producing the curve is half the work; reading it correctly is the other half.

 ## Getting each objective right

-The frontier is only as correct as the objectives feeding it. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping. One trap matters especially here: a **risk / volatility** objective is the quadratic form `xᵀΣx`, **not** a linear sum of per-asset volatility — modeling it linearly distorts the entire return-vs-risk frontier. See that skill's *Typical QP use cases* (portfolio variance) and *QP rule: minimize only*.
+The frontier is only as correct as the objectives feeding it — a misformulated objective distorts the whole curve. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping.

 ## Interfaces

From 3fcd84b6b863899be716ae22f2fce0b939bbe1f5 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 3 Jun 2026 16:26:40 -0700
Subject: [PATCH 15/20] Re-tailor 'Getting each objective right' to the multi-objective case (conflicting objectives, informative frontier)

Signed-off-by: cafzal
---
 skills/cuopt-multi-objective-exploration/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
index 9a4929132..48ee20381 100644
--- a/skills/cuopt-multi-objective-exploration/SKILL.md
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -125,7 +125,7 @@ Producing the curve is half the work; reading it correctly is the other half.

 ## Getting each objective right

-The frontier is only as correct as the objectives feeding it — a misformulated objective distorts the whole curve. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping.
+An informative frontier needs objectives that genuinely conflict: if they don't pull against each other, it collapses to a single point with nothing to trade off. And each objective has to be formulated correctly, since a wrong form, sense, or scale distorts the trade-off and shifts where the knee falls. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping.

 ## Interfaces

From 2474dc87d2b173885be6580c2aa18552e517d79b Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 3 Jun 2026 16:48:30 -0700
Subject: [PATCH 16/20] Fix tradeoff spelling consistency (trade-off -> tradeoff)

Signed-off-by: cafzal
---
 skills/cuopt-multi-objective-exploration/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
index 48ee20381..d3869f024 100644
--- a/skills/cuopt-multi-objective-exploration/SKILL.md
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -125,7 +125,7 @@ Producing the curve is half the work; reading it correctly is the other half.

 ## Getting each objective right

-An informative frontier needs objectives that genuinely conflict: if they don't pull against each other, it collapses to a single point with nothing to trade off. And each objective has to be formulated correctly, since a wrong form, sense, or scale distorts the trade-off and shifts where the knee falls. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping.
+An informative frontier needs objectives that genuinely conflict: if they don't pull against each other, it collapses to a single point with nothing to trade off. And each objective has to be formulated correctly, since a wrong form, sense, or scale distorts the tradeoff and shifts where the knee falls. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping.

 ## Interfaces

From 1680336c0d71cc9c0732562f623eef6e52e3c493 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 3 Jun 2026 16:55:55 -0700
Subject: [PATCH 17/20] Restructure body into a numbered workflow; make API references descriptive

- Promote "Getting each objective right" to Step 1 (define the objectives), so the body reads as a clean workflow: define -> anchor (payoff table) -> scalarize -> sweep/collect/filter -> interpret.
  The opening (when this applies + core idea) stays as the conceptual preamble.
- Replace specific method-name citations with the operation they perform, deferring exact calls to cuopt-numerical-optimization-api-python: "add it as a quadratic constraint" (was add_quadratic_constraint, which the merged #1339 docs supply via addConstraint); "carry the previous solve's PDLP warmstart data into the next" (was getWarmstartData -> set_pdlp_warm_start_data). Keeps the framing signal without binding to a changing API.

Signed-off-by: cafzal
---
 .../SKILL.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
index d3869f024..4f61f78b2 100644
--- a/skills/cuopt-multi-objective-exploration/SKILL.md
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -41,7 +41,11 @@ Do not collapse a multi-objective problem to a single weighted number and report

 Objectives and constraints are interchangeable. A requirement currently treated as fixed — a coverage floor, a fairness cap, a budget — is often a latent objective: its level was assumed, not given. Promoting such a constraint to a parametric ε-constraint and sweeping it reveals a tradeoff you'd otherwise hide, so read a single-objective model's hard constraints as candidate objectives, not just limits — but only when the level was an assumption. A genuinely fixed, non-negotiable limit (a hard budget cap, a regulatory minimum) stays a constraint; don't manufacture a tradeoff that isn't there. Express any promoted quantity linearly so it can serve as an ε-constraint (see `cuopt-numerical-optimization-formulation`).

-## Step 1 — build a payoff table (anchor each objective)
+## Step 1 — define the objectives
+
+An informative frontier needs objectives that genuinely conflict: if they don't pull against each other, it collapses to a single point with nothing to trade off. And each objective has to be formulated correctly, since a wrong form, sense, or scale distorts the tradeoff and shifts where the knee falls. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping.
+
+## Step 2 — build a payoff table (anchor each objective)

 Solve each objective **on its own** first. For *k* objectives this is *k* solves. Record, for each, the value of every objective at that optimum:

@@ -59,7 +63,7 @@ The diagonal (`f1*`, `f2*`, …) is each objective's best achievable value; the

 If any single-objective solve is already infeasible, stop and fix the model before sweeping — the frontier doesn't exist yet.

-## Step 2 — choose a scalarization
+## Step 3 — choose a scalarization

 ### Weighted sum

@@ -87,13 +91,13 @@ subject to f2(x) ≤ ε2

 Sweep each `ε_k` across the range from the payoff table. Each `(ε2, ε3, …)` combination is a single standard cuOpt solve. This recovers the **full** frontier, including the concave regions weighted-sum cannot reach, which is why it's the default when completeness matters. The cost is more solves (a grid over the constrained objectives) and bookkeeping of the ε values.

-ε-constrain *linear* objectives directly. A quadratic objective (e.g. risk `xᵀΣx`) is simplest kept as the objective `f1` while you ε-constrain the linear ones. A **convex** quadratic objective *can* instead be ε-constrained directly: cuOpt routes a `xᵀQx ≤ ε` constraint (Q positive semidefinite, inequality only) through the barrier solver as a second-order cone (`add_quadratic_constraint`). Non-convex or equality quadratic constraints are unsupported, and the MILP path stays linear-constraint only.
+ε-constrain *linear* objectives directly. A quadratic objective (e.g. risk `xᵀΣx`) is simplest kept as the objective `f1` while you ε-constrain the linear ones. A **convex** quadratic objective *can* instead be ε-constrained directly: add it as a quadratic constraint `xᵀQx ≤ ε` (Q positive semidefinite, inequality only), which cuOpt routes through the barrier solver as a second-order cone. Non-convex or equality quadratic constraints are unsupported, and the MILP path stays linear-constraint only.

 Spot it in existing code: a hand-coded loop over a target or budget value (a return target, a cost cap) is already the ε-constraint method — name it as such, filter dominated points, and read the swept constraint's dual (LP/QP only).

 **Picking a method:** weighted-sum for a quick convex sketch or when you know the frontier is convex (e.g. a pure-LP/QP tradeoff); ε-constraint when the problem is MILP, when the frontier may be non-convex, or when the user needs a faithful and complete curve.

-## Step 3 — sweep, collect, and filter
+## Step 4 — sweep, collect, and filter

 ```text
 frontier = []
@@ -108,13 +112,13 @@ sort the survivors to form the frontier

 Practical notes:

-- **Warm-start LP sweeps.** For an LP frontier, reuse the previous solve's PDLP warmstart data to cut solve time (`getWarmstartData` → `set_pdlp_warm_start_data`). Per cuOpt this is **LP-only**: a MILP solve doesn't take a PDLP warmstart (you can optionally seed a MIP start instead). See `cuopt-numerical-optimization-api-python`.
+- **Warm-start LP sweeps.** For an LP frontier, carry the previous solve's PDLP warmstart data into the next to cut solve time. Per cuOpt this is **LP-only**: a MILP solve doesn't take a PDLP warmstart (you can optionally seed a MIP start instead). See `cuopt-numerical-optimization-api-python` for the calls.
 - **Cap each MILP solve.** Set a per-solve time limit on MILP sweeps (see `cuopt-numerical-optimization-api-python`) — a sweep is many solves, and branch-and-bound can over-spend certifying optimality past a tiny gap, while cuOpt sets no limit by default and won't warn. Report the points as optimal *to the gap you set*, not certified optimal.
 - **Filter dominated points.** A correct sweep can still emit dominated points (especially weighted-sum near the hull, or MILP). Drop them; they are not part of the frontier.
 - **Resolution is a budget.** Curve fidelity trades against solve count. Start coarse to see the shape, then refine the grid only where the curve bends.
 - **Verify, don't assume.** When you claim one method beats another, measure it — e.g. count the efficient points ε-constraint recovered that weighted-sum missed — rather than asserting it; and flag any solve returning feasible-but-not-`Optimal` so a non-certified point is never read as exact.

-## Step 4 — interpret the frontier (discipline)
+## Step 5 — interpret the frontier (discipline)

 Producing the curve is half the work; reading it correctly is the other half.

@@ -123,10 +127,6 @@ Producing the curve is half the work; reading it correctly is the other half.
 - **Treat dominated or gappy output as a diagnostic.** If dominated points survive filtering, or the frontier is implausibly sparse or perfectly linear, suspect the sweep or the model — most often weighted-sum hiding a concave region (switch to ε-constraint) or a normalization mistake.
 - **State the weighting/ε you used.** Every reported point is conditional on its scalarization. Make that explicit so a single solve is never mistaken for "the" optimum.

-## Getting each objective right
-
-An informative frontier needs objectives that genuinely conflict: if they don't pull against each other, it collapses to a single point with nothing to trade off. And each objective has to be formulated correctly, since a wrong form, sense, or scale distorts the tradeoff and shifts where the knee falls. Formulate each one with `cuopt-numerical-optimization-formulation` before sweeping.
-
 ## Interfaces

 This skill is solver- and interface-agnostic. The per-solve mechanics — building the objective, adding the ε constraints, passing a warm start, reading status — live in the API skills:

From 2ecb994100e5fe6c312bf45fb735a28fd5d28dca Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 3 Jun 2026 17:00:36 -0700
Subject: [PATCH 18/20] Remove throat-clearing intro line from Step 5 (interpret)

Signed-off-by: cafzal
---
 skills/cuopt-multi-objective-exploration/SKILL.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
index 4f61f78b2..fc25b33f3 100644
--- a/skills/cuopt-multi-objective-exploration/SKILL.md
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -120,8 +120,6 @@ Practical notes:

 ## Step 5 — interpret the frontier (discipline)

-Producing the curve is half the work; reading it correctly is the other half.
-
 - **Report tradeoffs, not single numbers.** A frontier point means nothing in isolation. Quote the exchange rate — "≈ $4k of extra cost per 1% of added coverage in this region" — so the user can judge whether a move is worth it.
 - **Flag knee points; don't auto-pick them.** The "knee" is where the curve bends most sharply — beyond it you pay a lot for a little. It's often the best-balanced compromise and worth highlighting, but the final choice is the user's preference, not a rule.
 - **Treat dominated or gappy output as a diagnostic.** If dominated points survive filtering, or the frontier is implausibly sparse or perfectly linear, suspect the sweep or the model — most often weighted-sum hiding a concave region (switch to ε-constraint) or a normalization mistake.
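
Reviewer-side illustration, not part of the patch: the Step 5 bullets in the hunk above ("Report tradeoffs, not single numbers" and "Flag knee points; don't auto-pick them") could be sketched roughly as follows. Everything here is an assumption made for illustration: the sample frontier, the names `exchange_rates` and `knee_index`, and the knee rule itself (largest gap from the chord between the two endpoints after min-max normalization), which is one common heuristic and not something SKILL.md prescribes. No cuOpt call is involved; the points stand in for the output of an already filtered sweep.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (cost, coverage): cost minimized, coverage maximized

# Hypothetical frontier, already filtered to non-dominated points and sorted by cost.
frontier: List[Point] = [(100.0, 0.80), (104.0, 0.90), (112.0, 0.95), (140.0, 0.98)]


def exchange_rates(points: List[Point]) -> List[float]:
    """Extra cost paid per unit of added coverage between consecutive points."""
    return [(c1 - c0) / (v1 - v0) for (c0, v0), (c1, v1) in zip(points, points[1:])]


def knee_index(points: List[Point]) -> Optional[int]:
    """Index of the interior point farthest from the chord joining the two
    endpoints, after min-max normalizing both objectives to [0, 1]."""
    if len(points) < 3:
        return None  # no interior point to flag
    (c_lo, v_lo), (c_hi, v_hi) = points[0], points[-1]
    best_i, best_gap = None, -1.0
    for i in range(1, len(points) - 1):
        c, v = points[i]
        x = (c - c_lo) / (c_hi - c_lo)
        y = (v - v_lo) / (v_hi - v_lo)
        gap = abs(y - x)  # proportional to the distance from the chord y = x
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i


rates = exchange_rates(frontier)
knee = knee_index(frontier)
for (c0, v0), (c1, v1), r in zip(frontier, frontier[1:], rates):
    # r is cost per unit of coverage (1.0 = 100%), so r / 100 is cost per 1%.
    print(f"coverage {v0:.2f} -> {v1:.2f}: about {r / 100:.2f} cost units per 1% of coverage")
if knee is not None:
    print("knee candidate (flag it, do not auto-pick):", frontier[knee])
```

The rates rise along the curve, which is the "pay a lot for a little" pattern the bullet describes; the knee is only surfaced as a candidate, and the choice stays with the user.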
From 93bd461e0db07bf47ece7faa6b39faf67b4d24a3 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Wed, 3 Jun 2026 17:27:15 -0700
Subject: [PATCH 19/20] Drop vestigial (discipline) parenthetical from Step 5 heading

Signed-off-by: cafzal
---
 skills/cuopt-multi-objective-exploration/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/cuopt-multi-objective-exploration/SKILL.md b/skills/cuopt-multi-objective-exploration/SKILL.md
index fc25b33f3..cd22ab33d 100644
--- a/skills/cuopt-multi-objective-exploration/SKILL.md
+++ b/skills/cuopt-multi-objective-exploration/SKILL.md
@@ -118,7 +118,7 @@ Practical notes:
 - **Resolution is a budget.** Curve fidelity trades against solve count. Start coarse to see the shape, then refine the grid only where the curve bends.
 - **Verify, don't assume.** When you claim one method beats another, measure it — e.g. count the efficient points ε-constraint recovered that weighted-sum missed — rather than asserting it; and flag any solve returning feasible-but-not-`Optimal` so a non-certified point is never read as exact.

-## Step 5 — interpret the frontier (discipline)
+## Step 5 — interpret the frontier

 - **Report tradeoffs, not single numbers.** A frontier point means nothing in isolation. Quote the exchange rate — "≈ $4k of extra cost per 1% of added coverage in this region" — so the user can judge whether a move is worth it.
 - **Flag knee points; don't auto-pick them.** The "knee" is where the curve bends most sharply — beyond it you pay a lot for a little. It's often the best-balanced compromise and worth highlighting, but the final choice is the user's preference, not a rule.
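
Reviewer-side illustration, not part of the patch: the "Verify, don't assume" bullet in the hunk context above asks for a measured count of the efficient points ε-constraint recovers that weighted-sum misses. A minimal sketch of that measurement follows, under loud assumptions: the four outcome points are made up, the function names are hypothetical, and each `min(...)` over the finite set stands in for what would be one solver call in a real sweep (no cuOpt API is used or implied).

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]  # (f1, f2), both minimized

# Hypothetical outcome set of a small discrete (MILP-like) problem.
# (6, 6) is efficient but lies above the segment joining (0, 10) and (10, 0),
# so it is a non-supported point; (8, 8) is dominated by (6, 6).
candidates: List[Point] = [(0, 10), (10, 0), (6, 6), (8, 8)]


def weighted_sum_points(points: List[Point], weights: List[float]) -> Set[Point]:
    found = set()
    for w in weights:
        # Stand-in for one solve of: minimize w*f1 + (1 - w)*f2
        found.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return found


def epsilon_constraint_points(points: List[Point], eps_values: List[float]) -> Set[Point]:
    found = set()
    for eps in eps_values:
        feasible = [p for p in points if p[1] <= eps]
        if feasible:
            # Stand-in for one solve of: minimize f1 subject to f2 <= eps.
            # Tuple ordering breaks ties in f1 by the smaller f2.
            found.add(min(feasible))
    return found


ws = weighted_sum_points(candidates, [i / 10 for i in range(11)])
ec = epsilon_constraint_points(candidates, [0, 2, 4, 6, 8, 10])
missed = ec - ws
print(f"weighted-sum found {len(ws)}, epsilon-constraint found {len(ec)}, "
      f"missed by weighted-sum: {sorted(missed)}")
```

On this toy set no weight ever selects `(6, 6)`, because its weighted value stays at 6 while one of the two extreme points is always at 5 or below; the ε sweep reaches it once ε drops below 10.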
From abbafe7b046c40f6ef39677e9b2f6ddb76c92b74 Mon Sep 17 00:00:00 2001
From: cafzal
Date: Thu, 4 Jun 2026 09:48:31 -0700
Subject: [PATCH 20/20] Remove formulation skill's reference to cuopt-multi-objective-exploration (keep skills independent, per review)

Signed-off-by: cafzal
---
 skills/cuopt-numerical-optimization-formulation/SKILL.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/skills/cuopt-numerical-optimization-formulation/SKILL.md b/skills/cuopt-numerical-optimization-formulation/SKILL.md
index fc4771b44..08a4335c0 100644
--- a/skills/cuopt-numerical-optimization-formulation/SKILL.md
+++ b/skills/cuopt-numerical-optimization-formulation/SKILL.md
@@ -228,12 +228,6 @@ Goal programming optimizes multiple objectives in priority order. Implement it a

 Deviation variables (d⁻, d⁺) and slack/idle-time variables are always **continuous**. However, **decision variables must still be INTEGER when they represent discrete/countable quantities** (units produced, vehicles, workers, etc.). Do not let the presence of continuous deviation variables cause you to make all variables continuous — the integrality of decision variables directly affects feasibility and objective values.

-### Multiple objectives with no fixed priority
-
-Goal programming (above) needs a **priority order** and returns **one** prioritized solution. When objectives genuinely conflict and there is **no fixed priority or weighting** — the user wants to see the tradeoffs and choose — don't pick one weighting up front. Trace the **Pareto frontier**: keep one objective and sweep the others as parametric ε-constraints (or sweep weighted-sum weights), then filter to the non-dominated set. On integer / non-convex problems prefer ε-constraint — weighted-sum provably misses unsupported efficient points.
-
-For the full workflow (anchor each objective → sweep → filter → read the frontier with exchange rates and the knee) see the **`cuopt-multi-objective-exploration`** skill.
-
 ### Multi-period inventory / purchasing models

 In problems with buying, selling, and warehouse capacity over multiple periods, decide which capacity constraints to include based on the problem's timing assumptions.