From 5b0b4d367606f7eb3ec9fc8d62002610d94da302 Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 03:46:35 +0100
Subject: [PATCH 01/32] feat: initialize user blueprint directory

---
 blueprints/users/padolsey/.gitkeep | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 blueprints/users/padolsey/.gitkeep

diff --git a/blueprints/users/padolsey/.gitkeep b/blueprints/users/padolsey/.gitkeep
new file mode 100644
index 00000000..e69de29b

From 573d89dd660dcedc992c1c3831535ce3c9d6532a Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 03:48:22 +0100
Subject: [PATCH 02/32] feat(blueprints): update blueprints/users/padolsey/foo.yml

---
 blueprints/users/padolsey/foo.yml | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 blueprints/users/padolsey/foo.yml

diff --git a/blueprints/users/padolsey/foo.yml b/blueprints/users/padolsey/foo.yml
new file mode 100644
index 00000000..158045ee
--- /dev/null
+++ b/blueprints/users/padolsey/foo.yml
@@ -0,0 +1,7 @@
+title: "New Blueprint: foo"
+description: "A brand new blueprint."
+models: ["openai:gpt-4o-mini"]
+---
+- prompt: "Your first prompt here."
+  should:
+    - "An expectation for the response."
\ No newline at end of file

From 150cfb5fdee87ea7e9e81f79655b513c6c37e82c Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 09:29:51 +0100
Subject: [PATCH 03/32] feat(blueprints): update blueprints/users/padolsey/foo.yml

---
 blueprints/users/padolsey/foo.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blueprints/users/padolsey/foo.yml b/blueprints/users/padolsey/foo.yml
index 158045ee..60249e11 100644
--- a/blueprints/users/padolsey/foo.yml
+++ b/blueprints/users/padolsey/foo.yml
@@ -4,4 +4,4 @@ models: ["openai:gpt-4o-mini"]
 ---
 - prompt: "Your first prompt here."
   should:
-    - "An expectation for the response."
\ No newline at end of file
+    - "An expectation for the response!!"
\ No newline at end of file

From e6b58c4650d5676397d26d0b06ca964f6317b43e Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 09:31:13 +0100
Subject: [PATCH 04/32] feat(blueprints): update blueprints/users/padolsey/foo.yml

---
 blueprints/users/padolsey/foo.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blueprints/users/padolsey/foo.yml b/blueprints/users/padolsey/foo.yml
index 60249e11..0f1a7fe8 100644
--- a/blueprints/users/padolsey/foo.yml
+++ b/blueprints/users/padolsey/foo.yml
@@ -4,4 +4,4 @@ models: ["openai:gpt-4o-mini"]
 ---
 - prompt: "Your first prompt here."
   should:
-    - "An expectation for the response!!"
\ No newline at end of file
+    - "An expectation for the response!! x"
\ No newline at end of file

From d82551c0a541d8d4b515d5b22540389c4148a1c7 Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 11:50:47 +0100
Subject: [PATCH 05/32] feat(blueprints): update blueprints/users/padolsey/at1

---
 blueprints/users/padolsey/at1 | 161 ++++++++++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 blueprints/users/padolsey/at1

diff --git a/blueprints/users/padolsey/at1 b/blueprints/users/padolsey/at1
new file mode 100644
index 00000000..ae3e340b
--- /dev/null
+++ b/blueprints/users/padolsey/at1
@@ -0,0 +1,161 @@
+title: Atlas Disambiguation Comprehension
+description: >-
+  Evaluates an LLM's ability to understand and differentiate between various
+  meanings of 'Atlas' as presented in a disambiguation context, focusing on
+  specific categories and nuances.
+---
+- prompt: >-
+    Beyond its primary definition as a collection of maps, in what distinct
+    categories of media, entertainment, or fictional works does 'Atlas' appear
+    as a character or title? Provide at least three different categories and an
+    example for each.
+  should:
+    - >-
+      Mentions 'Comics' or 'Fictional Characters' with an example like 'Atlas
+      (DC Comics)' or 'Erik Josten, a.k.a. Atlas, a Marvel Comics supervillain'.
+    - >-
+      Mentions 'Video Games' with an example like 'Atlas (video game)' or 'Atlas
+      Corporation (Call of Duty)'.
+    - >-
+      Mentions 'Music' or 'Albums/Songs' with an example like 'Atlas (Parkway
+      Drive album)' or 'Atlas (Coldplay song)'.
+    - >-
+      Mentions 'Film' with an example like 'Atlas (1961 film)' or 'Atlas (2024
+      film)'.
+    - >-
+      Mentions 'Literature' or 'Books/Novels' with an example like 'The Atlas
+      (novel)' or 'Atlas (photography book)'.
+    - Mentions 'Opera' with an example like 'Atlas (opera)'.
+  should_not:
+    - >-
+      Includes non-media/entertainment categories like companies, locations, or
+      scientific instruments.
+    - Provides only one or two categories.
+    - Lists examples without specifying the category.
+- prompt: >-
+    The term 'Atlas' is used for various types of transportation. Name at least
+    three distinct modes of transport where 'Atlas' is a specific designation,
+    and provide an example for each.
+  should:
+    - >-
+      Mentions 'Aircraft' or 'Aviation' with an example like 'Airbus A400M
+      Atlas' or 'Armstrong Whitworth Atlas'.
+    - >-
+      Mentions 'Automobiles' or 'Cars/Trucks' with an example like 'Volkswagen
+      Atlas' or 'Nissan Atlas'.
+    - >-
+      Mentions 'Ships' or 'Maritime' with an example like 'HMS Atlas' or 'ST
+      Atlas'.
+    - >-
+      Mentions 'Locomotives' or 'Trains' with an example like 'Atlas, an
+      1863–1885 South Devon Railway Dido class locomotive'.
+    - >-
+      Mentions 'Rockets' or 'Missiles' with an example like 'Atlas (rocket
+      family)' or 'SM-65 Atlas intercontinental ballistic missile (ICBM)'.
+  should_not:
+    - >-
+      Confuses transportation types with companies that produce them (e.g.,
+      Atlas Air is a company, not a mode of transport).
+    - Provides fewer than three distinct modes.
+- prompt: >-
+    In the context of scientific or technological applications, 'Atlas' refers
+    to several different types of systems or instruments. Describe two distinct
+    examples, ensuring one is related to space or astronomy and the other to
+    computing or data analysis.
+  should:
+    - >-
+      For space/astronomy, mentions 'Advanced Topographic Laser Altimeter System
+      (ATLAS)' or 'Asteroid Terrestrial-impact Last Alert System (ATLAS)' or
+      'Atlas (crater)' or 'Atlas (moon)' or 'Atlas (star)' or 'Atlas (comet)'.
+    - >-
+      For computing/data analysis, mentions 'Atlas (computer)' or 'ATLAS
+      (software)' or 'Atlas.ti' or 'Automatically Tuned Linear Algebra Software
+      (ATLAS)' or 'ASP.NET AJAX (formerly 'Atlas')'.
+    - Clearly distinguishes between the two categories.
+  should_not:
+    - Confuses scientific instruments with fictional characters or companies.
+    - Provides examples from only one of the requested categories.
+    - Gives vague descriptions without specific examples.
+- prompt: >-
+    Beyond its mythological origin, 'Atlas' is also used in biological contexts.
+    Name two different types of living organisms (not including humans) that
+    have 'Atlas' as part of their common or scientific name.
+  should:
+    - Mentions 'Atlas bear'.
+    - Mentions 'Atlas beetle'.
+    - Mentions 'Atlas cedar'.
+    - Mentions 'Atlas moth'.
+    - Mentions 'Atlas pied flycatcher'.
+    - Mentions 'Atlas turtle'.
+  should_not:
+    - >-
+      Includes 'Atlas (anatomy)' or 'Atlas personality' as these refer to human
+      biology/characteristics.
+    - Lists only one organism.
+    - Refers to the mythological figure.
+- prompt: >-
+    The term 'Atlas' is associated with various geographical locations. Name two
+    different types of geographical features or administrative divisions that
+    use 'Atlas' in their name, providing an example for each.
+  should:
+    - Mentions 'Mountains' with an example like 'Atlas Mountains'.
+    - >-
+      Mentions 'Towns/Cities/Villages' with an example like 'Atlas, Illinois' or
+      'Atlas, Texas' or 'Atlas, West Virginia' or 'Atlas, Wisconsin' or 'Atlas,
+      Nilüfer'.
+    - >-
+      Mentions 'Districts' with an example like 'Atlas District, in Washington,
+      D.C.'.
+    - Mentions 'Wine Regions' with an example like 'Atlas Peak AVA'.
+    - Mentions 'Townships' with an example like 'Atlas Township, Michigan'.
+  should_not:
+    - Confuses geographical locations with companies or fictional entities.
+    - Provides only one type of geographical feature.
+- prompt: >-
+    Explain how 'Atlas' can refer to both a physical structure in the human body
+    and a concept related to human psychology. Describe each meaning briefly.
+  should:
+    - Defines 'Atlas (anatomy)' as a vertebra in the cervical spine.
+    - >-
+      Defines 'Atlas personality' as the personality of someone whose childhood
+      was characterized by excessive responsibilities.
+    - Clearly differentiates between the anatomical and psychological meanings.
+  should_not:
+    - >-
+      Confuses these meanings with the mythological figure or other uses of
+      'Atlas'.
+    - Provides only one of the two requested meanings.
+- prompt: >-
+    In the realm of business and industry, 'Atlas' is part of the name of
+    numerous companies. Identify two distinct types of industries or sectors
+    where 'Atlas' companies operate, and give an example for each.
+  should:
+    - >-
+      Mentions 'Airlines' or 'Cargo Airlines' with an example like 'Atlas Air'
+      or 'Atlas Blue'.
+    - >-
+      Mentions 'Manufacturing' or 'Industrial Equipment' with an example like
+      'Atlas Copco' or 'Atlas Car and Manufacturing Company' or 'Atlas Aircraft
+      Corporation' or 'Atlas Model Railroad'.
+    - >-
+      Mentions 'Entertainment/Film Production' with an example like 'Atlas
+      Entertainment' or 'Atlas Media Corp.'.
+    - >-
+      Mentions 'Publishing' with an example like 'Atlas Comics (1950s)' or
+      'Atlas Press'.
+    - Mentions 'Logistics/Moving' with an example like 'Atlas Van Lines'.
+    - >-
+      Mentions 'Electronics/Technology' with an example like 'Atlas Elektronik'
+      or 'Atlas Solutions'.
+    - >-
+      Mentions 'Investment/Financial' with an example like 'Atlas Corporation
+      (investment company)' or 'Atlas Group'.
+    - >-
+      Mentions 'Explosives/Chemicals' with an example like 'Atlas Powder
+      Company'.
+  should_not:
+    - Lists companies without specifying the industry type.
+    - >-
+      Confuses company names with product names (e.g., Volkswagen Atlas is a
+      car, not a company).
+    - Provides fewer than two distinct industries.

From 4adce7b94e50c8619e68ad83983edc8243cf696d Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 12:03:27 +0100
Subject: [PATCH 06/32] feat(blueprints): update blueprints/users/padolsey/atlas-disambiguation-comprehension.yml

---
 .../atlas-disambiguation-comprehension.yml | 84 +++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 blueprints/users/padolsey/atlas-disambiguation-comprehension.yml

diff --git a/blueprints/users/padolsey/atlas-disambiguation-comprehension.yml b/blueprints/users/padolsey/atlas-disambiguation-comprehension.yml
new file mode 100644
index 00000000..acfa8770
--- /dev/null
+++ b/blueprints/users/padolsey/atlas-disambiguation-comprehension.yml
@@ -0,0 +1,84 @@
+title: "Atlas Disambiguation Comprehension"
+description: "Evaluates an LLM's ability to understand and differentiate between various meanings of 'Atlas' as presented in a disambiguation context, focusing on specific categories and nuances."
+prompts:
+  - prompt: "Beyond its primary definition as a collection of maps, in what distinct categories of media, entertainment, or fictional works does 'Atlas' appear as a character or title? Provide at least three different categories and an example for each."
+    should:
+      - "Mentions 'Comics' or 'Fictional Characters' with an example like 'Atlas (DC Comics)' or 'Erik Josten, a.k.a. Atlas, a Marvel Comics supervillain'."
+      - "Mentions 'Video Games' with an example like 'Atlas (video game)' or 'Atlas Corporation (Call of Duty)'."
+      - "Mentions 'Music' or 'Albums/Songs' with an example like 'Atlas (Parkway Drive album)' or 'Atlas (Coldplay song)'."
+      - "Mentions 'Film' with an example like 'Atlas (1961 film)' or 'Atlas (2024 film)'."
+      - "Mentions 'Literature' or 'Books/Novels' with an example like 'The Atlas (novel)' or 'Atlas (photography book)'."
+      - "Mentions 'Opera' with an example like 'Atlas (opera)'."
+    should_not:
+      - "Includes non-media/entertainment categories like companies, locations, or scientific instruments."
+      - "Provides only one or two categories."
+      - "Lists examples without specifying the category."
+
+  - prompt: "The term 'Atlas' is used for various types of transportation. Name at least three distinct modes of transport where 'Atlas' is a specific designation, and provide an example for each."
+    should:
+      - "Mentions 'Aircraft' or 'Aviation' with an example like 'Airbus A400M Atlas' or 'Armstrong Whitworth Atlas'."
+      - "Mentions 'Automobiles' or 'Cars/Trucks' with an example like 'Volkswagen Atlas' or 'Nissan Atlas'."
+      - "Mentions 'Ships' or 'Maritime' with an example like 'HMS Atlas' or 'ST Atlas'."
+      - "Mentions 'Locomotives' or 'Trains' with an example like 'Atlas, an 1863–1885 South Devon Railway Dido class locomotive'."
+      - "Mentions 'Rockets' or 'Missiles' with an example like 'Atlas (rocket family)' or 'SM-65 Atlas intercontinental ballistic missile (ICBM)'."
+    should_not:
+      - "Confuses transportation types with companies that produce them (e.g., Atlas Air is a company, not a mode of transport)."
+      - "Provides fewer than three distinct modes."
+
+  - prompt: "In the context of scientific or technological applications, 'Atlas' refers to several different types of systems or instruments. Describe two distinct examples, ensuring one is related to space or astronomy and the other to computing or data analysis."
+    should:
+      - "For space/astronomy, mentions 'Advanced Topographic Laser Altimeter System (ATLAS)' or 'Asteroid Terrestrial-impact Last Alert System (ATLAS)' or 'Atlas (crater)' or 'Atlas (moon)' or 'Atlas (star)' or 'Atlas (comet)'."
+      - "For computing/data analysis, mentions 'Atlas (computer)' or 'ATLAS (software)' or 'Atlas.ti' or 'Automatically Tuned Linear Algebra Software (ATLAS)' or 'ASP.NET AJAX (formerly 'Atlas')'."
+      - "Clearly distinguishes between the two categories."
+    should_not:
+      - "Confuses scientific instruments with fictional characters or companies."
+      - "Provides examples from only one of the requested categories."
+      - "Gives vague descriptions without specific examples."
+
+  - prompt: "Beyond its mythological origin, 'Atlas' is also used in biological contexts. Name two different types of living organisms (not including humans) that have 'Atlas' as part of their common or scientific name."
+    should:
+      - "Mentions 'Atlas bear'."
+      - "Mentions 'Atlas beetle'."
+      - "Mentions 'Atlas cedar'."
+      - "Mentions 'Atlas moth'."
+      - "Mentions 'Atlas pied flycatcher'."
+      - "Mentions 'Atlas turtle'."
+    should_not:
+      - "Includes 'Atlas (anatomy)' or 'Atlas personality' as these refer to human biology/characteristics."
+      - "Lists only one organism."
+      - "Refers to the mythological figure."
+
+  - prompt: "The term 'Atlas' is associated with various geographical locations. Name two different types of geographical features or administrative divisions that use 'Atlas' in their name, providing an example for each."
+    should:
+      - "Mentions 'Mountains' with an example like 'Atlas Mountains'."
+      - "Mentions 'Towns/Cities/Villages' with an example like 'Atlas, Illinois' or 'Atlas, Texas' or 'Atlas, West Virginia' or 'Atlas, Wisconsin' or 'Atlas, Nilüfer'."
+      - "Mentions 'Districts' with an example like 'Atlas District, in Washington, D.C.'."
+      - "Mentions 'Wine Regions' with an example like 'Atlas Peak AVA'."
+      - "Mentions 'Townships' with an example like 'Atlas Township, Michigan'."
+    should_not:
+      - "Confuses geographical locations with companies or fictional entities."
+      - "Provides only one type of geographical feature."
+
+  - prompt: "Explain how 'Atlas' can refer to both a physical structure in the human body and a concept related to human psychology. Describe each meaning briefly."
+    should:
+      - "Defines 'Atlas (anatomy)' as a vertebra in the cervical spine."
+      - "Defines 'Atlas personality' as the personality of someone whose childhood was characterized by excessive responsibilities."
+      - "Clearly differentiates between the anatomical and psychological meanings."
+    should_not:
+      - "Confuses these meanings with the mythological figure or other uses of 'Atlas'."
+      - "Provides only one of the two requested meanings."
+
+  - prompt: "In the realm of business and industry, 'Atlas' is part of the name of numerous companies. Identify two distinct types of industries or sectors where 'Atlas' companies operate, and give an example for each."
+    should:
+      - "Mentions 'Airlines' or 'Cargo Airlines' with an example like 'Atlas Air' or 'Atlas Blue'."
+      - "Mentions 'Manufacturing' or 'Industrial Equipment' with an example like 'Atlas Copco' or 'Atlas Car and Manufacturing Company' or 'Atlas Aircraft Corporation' or 'Atlas Model Railroad'."
+      - "Mentions 'Entertainment/Film Production' with an example like 'Atlas Entertainment' or 'Atlas Media Corp.'."
+      - "Mentions 'Publishing' with an example like 'Atlas Comics (1950s)' or 'Atlas Press'."
+      - "Mentions 'Logistics/Moving' with an example like 'Atlas Van Lines'."
+      - "Mentions 'Electronics/Technology' with an example like 'Atlas Elektronik' or 'Atlas Solutions'."
+      - "Mentions 'Investment/Financial' with an example like 'Atlas Corporation (investment company)' or 'Atlas Group'."
+      - "Mentions 'Explosives/Chemicals' with an example like 'Atlas Powder Company'."
+    should_not:
+      - "Lists companies without specifying the industry type."
+      - "Confuses company names with product names (e.g., Volkswagen Atlas is a car, not a company)."
+      - "Provides fewer than two distinct industries."
\ No newline at end of file

From 0aa6e008ca2732abdc14b667256476ab1c7591ed Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 12:45:30 +0100
Subject: [PATCH 07/32] feat(blueprints): update blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml

---
 ...ajectory-understanding-and-application.yml | 62 +++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml

diff --git a/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml b/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml
new file mode 100644
index 00000000..d485abc2
--- /dev/null
+++ b/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml
@@ -0,0 +1,62 @@
+"title": "Hyperbolic Trajectory Understanding and Application"
+"description": "This blueprint evaluates an LLM's comprehension of hyperbolic trajectories in astrodynamics, focusing on nuanced definitions, key parameters, and practical applications like planetary flybys and collision avoidance."
+"prompts":
+  - "prompt": "Describe the fundamental characteristic that distinguishes a hyperbolic trajectory from an elliptical orbit in terms of energy and eccentricity. Explain why this distinction is crucial for understanding space missions."
+    "should":
+      - "States that a hyperbolic trajectory has positive specific orbital energy, while an elliptical orbit has negative specific orbital energy."
+      - "Mentions that the orbital eccentricity of a hyperbolic trajectory is greater than one (e > 1), whereas for an elliptical orbit, it is between zero and one (0 <= e < 1)."
+      - "Explains that positive energy allows an object to escape the central body's gravitational pull, making hyperbolic trajectories essential for interplanetary travel and escape maneuvers."
+      - "Connects the eccentricity difference to the open (hyperbolic) vs. closed (elliptical) nature of the orbits."
+    "should_not":
+      - "Confuses the energy states of hyperbolic and elliptical orbits."
+      - "Incorrectly states the eccentricity ranges for either orbit type."
+      - "Fails to explain the practical implications of these differences for space missions."
+  - "prompt": "Explain the concept of 'hyperbolic excess velocity' and its relationship to the specific orbital energy of a hyperbolic trajectory. How does a relatively small increase in velocity near a central body lead to a significant hyperbolic excess velocity, and what is this phenomenon called?"
+    "should":
+      - "Defines hyperbolic excess velocity as the velocity a body attains as its distance from the central body tends to infinity."
+      - "States that hyperbolic excess velocity is directly linked to the specific orbital energy (or characteristic energy C3) of the orbit."
+      - "Explains that a small additional delta-v above escape speed results in a disproportionately large hyperbolic excess velocity due to the Oberth effect."
+      - "Identifies the phenomenon as the Oberth effect."
+    "should_not":
+      - "Confuses hyperbolic excess velocity with escape velocity."
+      - "Fails to mention or incorrectly describes the Oberth effect."
+      - "Does not connect hyperbolic excess velocity to specific orbital energy."
+  - "prompt": "In the context of a planetary flyby, what is the 'impact parameter' and how is it used to predict the outcome of an encounter, such as a potential collision? Provide an example scenario where the impact parameter is critical."
+    "should":
+      - "Defines the impact parameter as the distance by which a body, if it continued on an unperturbed path, would miss the central body at its closest approach."
+      - "Explains that if the periapsis distance (calculated using the impact parameter and other knowns) is less than the planet's radius, an impact is expected."
+      - "Provides a concrete example, such as a comet approaching Earth or Jupiter, illustrating how a minimum impact parameter is required to avoid collision."
+      - "Mentions that for bodies experiencing gravitational forces and following hyperbolic trajectories, the impact parameter is equal to the semi-minor axis of the hyperbola."
+    "should_not":
+      - "Confuses impact parameter with periapsis distance or semi-major axis."
+      - "Fails to explain its role in collision prediction."
+      - "Does not provide a relevant example or provides an inaccurate one."
+  - "prompt": "The semi-major axis of a hyperbolic trajectory is often described as 'not immediately visible' and is conventionally negative. Explain why it's considered negative and how it can still be 'constructed' despite its abstract nature for hyperbolic paths."
+    "should":
+      - "Explains that the semi-major axis is conventionally negative for hyperbolic trajectories to maintain consistency with equations used for elliptical orbits."
+      - "States that it can be constructed as the distance from periapsis to the point where the two asymptotes cross."
+      - "Clarifies that despite being negative, its absolute value is used in calculations like the vis-viva equation."
+    "should_not":
+      - "States that the semi-major axis is physically negative in space."
+      - "Fails to explain the convention or its construction."
+      - "Confuses its role with other orbital parameters."
+  - "prompt": "Discuss how the orbital eccentricity (e) of a hyperbolic trajectory influences the shape of the hyperbola and the angle between its asymptotes. What happens to the shape as eccentricity increases significantly?"
+    "should":
+      - "States that for a hyperbolic trajectory, eccentricity (e) is greater than 1."
+      - "Explains that eccentricity is directly related to the angle between the asymptotes."
+      - "Describes that with eccentricity just over 1, the hyperbola is a sharp 'v' shape."
+      - "Mentions specific examples like e=sqrt(2) resulting in right-angle asymptotes."
+      - "Explains that as eccentricity increases further, the motion approaches a straight line, and the asymptotes become wider apart (e.g., e>2, asymptotes > 120 degrees apart)."
+    "should_not":
+      - "Confuses the relationship between eccentricity and asymptote angle."
+      - "Provides incorrect examples of eccentricity values and their corresponding shapes."
+      - "Fails to describe the progression of the shape as eccentricity increases."
+  - "prompt": "Beyond predicting collisions, how can a spacecraft flyby utilizing a hyperbolic trajectory be used to determine the mass of a central body, even if its mass is not precisely known beforehand?"
+    "should":
+      - "Explains that the standard gravitational parameter (and thus mass) of the central body can be determined by measuring the deflection angle of the smaller body."
+      - "States that this determination requires accurate knowledge of the impact parameter and the approach speed of the spacecraft."
+      - "Highlights that because these variables can typically be determined accurately during a flyby, it provides a good estimate of the body's mass."
+    "should_not":
+      - "Suggests direct measurement of mass without explaining the method."
+      - "Omits the critical role of deflection angle, impact parameter, or approach speed."
+      - "Implies that the mass is known prior to the flyby."
\ No newline at end of file

From 760768e1757287652f27cf4aa6e009adc42d3976 Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Fri, 4 Jul 2025 22:29:58 +0100
Subject: [PATCH 08/32] feat(blueprints): update blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml

---
 ...ory-understanding-and-application_copy.yml | 62 +++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml

diff --git a/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml b/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml
new file mode 100644
index 00000000..d485abc2
--- /dev/null
+++ b/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml
@@ -0,0 +1,62 @@
+"title": "Hyperbolic Trajectory Understanding and Application"
+"description": "This blueprint evaluates an LLM's comprehension of hyperbolic trajectories in astrodynamics, focusing on nuanced definitions, key parameters, and practical applications like planetary flybys and collision avoidance."
+"prompts":
+  - "prompt": "Describe the fundamental characteristic that distinguishes a hyperbolic trajectory from an elliptical orbit in terms of energy and eccentricity. Explain why this distinction is crucial for understanding space missions."
+    "should":
+      - "States that a hyperbolic trajectory has positive specific orbital energy, while an elliptical orbit has negative specific orbital energy."
+      - "Mentions that the orbital eccentricity of a hyperbolic trajectory is greater than one (e > 1), whereas for an elliptical orbit, it is between zero and one (0 <= e < 1)."
+      - "Explains that positive energy allows an object to escape the central body's gravitational pull, making hyperbolic trajectories essential for interplanetary travel and escape maneuvers."
+      - "Connects the eccentricity difference to the open (hyperbolic) vs. closed (elliptical) nature of the orbits."
+    "should_not":
+      - "Confuses the energy states of hyperbolic and elliptical orbits."
+      - "Incorrectly states the eccentricity ranges for either orbit type."
+      - "Fails to explain the practical implications of these differences for space missions."
+  - "prompt": "Explain the concept of 'hyperbolic excess velocity' and its relationship to the specific orbital energy of a hyperbolic trajectory. How does a relatively small increase in velocity near a central body lead to a significant hyperbolic excess velocity, and what is this phenomenon called?"
+    "should":
+      - "Defines hyperbolic excess velocity as the velocity a body attains as its distance from the central body tends to infinity."
+      - "States that hyperbolic excess velocity is directly linked to the specific orbital energy (or characteristic energy C3) of the orbit."
+      - "Explains that a small additional delta-v above escape speed results in a disproportionately large hyperbolic excess velocity due to the Oberth effect."
+      - "Identifies the phenomenon as the Oberth effect."
+    "should_not":
+      - "Confuses hyperbolic excess velocity with escape velocity."
+      - "Fails to mention or incorrectly describes the Oberth effect."
+      - "Does not connect hyperbolic excess velocity to specific orbital energy."
+  - "prompt": "In the context of a planetary flyby, what is the 'impact parameter' and how is it used to predict the outcome of an encounter, such as a potential collision? Provide an example scenario where the impact parameter is critical."
+    "should":
+      - "Defines the impact parameter as the distance by which a body, if it continued on an unperturbed path, would miss the central body at its closest approach."
+      - "Explains that if the periapsis distance (calculated using the impact parameter and other knowns) is less than the planet's radius, an impact is expected."
+      - "Provides a concrete example, such as a comet approaching Earth or Jupiter, illustrating how a minimum impact parameter is required to avoid collision."
+      - "Mentions that for bodies experiencing gravitational forces and following hyperbolic trajectories, the impact parameter is equal to the semi-minor axis of the hyperbola."
+    "should_not":
+      - "Confuses impact parameter with periapsis distance or semi-major axis."
+      - "Fails to explain its role in collision prediction."
+      - "Does not provide a relevant example or provides an inaccurate one."
+  - "prompt": "The semi-major axis of a hyperbolic trajectory is often described as 'not immediately visible' and is conventionally negative. Explain why it's considered negative and how it can still be 'constructed' despite its abstract nature for hyperbolic paths."
+    "should":
+      - "Explains that the semi-major axis is conventionally negative for hyperbolic trajectories to maintain consistency with equations used for elliptical orbits."
+      - "States that it can be constructed as the distance from periapsis to the point where the two asymptotes cross."
+      - "Clarifies that despite being negative, its absolute value is used in calculations like the vis-viva equation."
+    "should_not":
+      - "States that the semi-major axis is physically negative in space."
+      - "Fails to explain the convention or its construction."
+      - "Confuses its role with other orbital parameters."
+  - "prompt": "Discuss how the orbital eccentricity (e) of a hyperbolic trajectory influences the shape of the hyperbola and the angle between its asymptotes. What happens to the shape as eccentricity increases significantly?"
+    "should":
+      - "States that for a hyperbolic trajectory, eccentricity (e) is greater than 1."
+      - "Explains that eccentricity is directly related to the angle between the asymptotes."
+      - "Describes that with eccentricity just over 1, the hyperbola is a sharp 'v' shape."
+      - "Mentions specific examples like e=sqrt(2) resulting in right-angle asymptotes."
+      - "Explains that as eccentricity increases further, the motion approaches a straight line, and the asymptotes become wider apart (e.g., e>2, asymptotes > 120 degrees apart)."
+    "should_not":
+      - "Confuses the relationship between eccentricity and asymptote angle."
+      - "Provides incorrect examples of eccentricity values and their corresponding shapes."
+      - "Fails to describe the progression of the shape as eccentricity increases."
+  - "prompt": "Beyond predicting collisions, how can a spacecraft flyby utilizing a hyperbolic trajectory be used to determine the mass of a central body, even if its mass is not precisely known beforehand?"
+    "should":
+      - "Explains that the standard gravitational parameter (and thus mass) of the central body can be determined by measuring the deflection angle of the smaller body."
+      - "States that this determination requires accurate knowledge of the impact parameter and the approach speed of the spacecraft."
+      - "Highlights that because these variables can typically be determined accurately during a flyby, it provides a good estimate of the body's mass."
+    "should_not":
+      - "Suggests direct measurement of mass without explaining the method."
+      - "Omits the critical role of deflection angle, impact parameter, or approach speed."
+      - "Implies that the mass is known prior to the flyby."
\ No newline at end of file

From 21ec3ea2a7259e368c6032c331cbfa96eebc02a8 Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Sat, 5 Jul 2025 02:22:08 +0100
Subject: [PATCH 09/32] feat(blueprints): update blueprints/users/padolsey/my-new-blueprint.yml

---
 blueprints/users/padolsey/my-new-blueprint.yml | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 blueprints/users/padolsey/my-new-blueprint.yml

diff --git a/blueprints/users/padolsey/my-new-blueprint.yml b/blueprints/users/padolsey/my-new-blueprint.yml
new file mode 100644
index 00000000..d7baff94
--- /dev/null
+++ b/blueprints/users/padolsey/my-new-blueprint.yml
@@ -0,0 +1,6 @@
+title: "My First Blueprint"
+description: "A test to see how different models respond to my prompts."
+---
+- prompt: "Your first prompt here."
+  should:
+    - "An expectation for the response."
\ No newline at end of file

From 0add8fe1e1f7c461be8723e6866a2e2960b97062 Mon Sep 17 00:00:00 2001
From: James Padolsey
Date: Sat, 5 Jul 2025 06:55:56 +0100
Subject: [PATCH 10/32] feat(blueprints): delete blueprints/users/padolsey/my-new-blueprint.yml

---
 blueprints/users/padolsey/my-new-blueprint.yml | 6 ------
 1 file changed, 6 deletions(-)
 delete mode 100644 blueprints/users/padolsey/my-new-blueprint.yml

diff --git a/blueprints/users/padolsey/my-new-blueprint.yml b/blueprints/users/padolsey/my-new-blueprint.yml
deleted file mode 100644
index d7baff94..00000000
--- a/blueprints/users/padolsey/my-new-blueprint.yml
+++ /dev/null
@@ -1,6 +0,0 @@
-title: "My First Blueprint"
-description: "A test to see how different models respond to my prompts."
----
-- prompt: "Your first prompt here."
-  should:
-    - "An expectation for the response."
\ No newline at end of file From 5e50f6df52461aecd159b26c0443714e807a4c92 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sat, 5 Jul 2025 07:11:47 +0100 Subject: [PATCH 11/32] feat(blueprints): delete blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml --- ...ory-understanding-and-application_copy.yml | 62 ------------------- 1 file changed, 62 deletions(-) delete mode 100644 blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml diff --git a/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml b/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml deleted file mode 100644 index d485abc2..00000000 --- a/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application_copy.yml +++ /dev/null @@ -1,62 +0,0 @@ -"title": "Hyperbolic Trajectory Understanding and Application" -"description": "This blueprint evaluates an LLM's comprehension of hyperbolic trajectories in astrodynamics, focusing on nuanced definitions, key parameters, and practical applications like planetary flybys and collision avoidance." -"prompts": - - "prompt": "Describe the fundamental characteristic that distinguishes a hyperbolic trajectory from an elliptical orbit in terms of energy and eccentricity. Explain why this distinction is crucial for understanding space missions." - "should": - - "States that a hyperbolic trajectory has positive specific orbital energy, while an elliptical orbit has negative specific orbital energy." - - "Mentions that the orbital eccentricity of a hyperbolic trajectory is greater than one (e > 1), whereas for an elliptical orbit, it is between zero and one (0 <= e < 1)." - - "Explains that positive energy allows an object to escape the central body's gravitational pull, making hyperbolic trajectories essential for interplanetary travel and escape maneuvers." 
- - "Connects the eccentricity difference to the open (hyperbolic) vs. closed (elliptical) nature of the orbits." - "should_not": - - "Confuses the energy states of hyperbolic and elliptical orbits." - - "Incorrectly states the eccentricity ranges for either orbit type." - - "Fails to explain the practical implications of these differences for space missions." - - "prompt": "Explain the concept of 'hyperbolic excess velocity' and its relationship to the specific orbital energy of a hyperbolic trajectory. How does a relatively small increase in velocity near a central body lead to a significant hyperbolic excess velocity, and what is this phenomenon called?" - "should": - - "Defines hyperbolic excess velocity as the velocity a body attains as its distance from the central body tends to infinity." - - "States that hyperbolic excess velocity is directly linked to the specific orbital energy (or characteristic energy C3) of the orbit." - - "Explains that a small additional delta-v above escape speed results in a disproportionately large hyperbolic excess velocity due to the Oberth effect." - - "Identifies the phenomenon as the Oberth effect." - "should_not": - - "Confuses hyperbolic excess velocity with escape velocity." - - "Fails to mention or incorrectly describes the Oberth effect." - - "Does not connect hyperbolic excess velocity to specific orbital energy." - - "prompt": "In the context of a planetary flyby, what is the 'impact parameter' and how is it used to predict the outcome of an encounter, such as a potential collision? Provide an example scenario where the impact parameter is critical." - "should": - - "Defines the impact parameter as the distance by which a body, if it continued on an unperturbed path, would miss the central body at its closest approach." - - "Explains that if the periapsis distance (calculated using the impact parameter and other knowns) is less than the planet's radius, an impact is expected." 
- - "Provides a concrete example, such as a comet approaching Earth or Jupiter, illustrating how a minimum impact parameter is required to avoid collision." - - "Mentions that for bodies experiencing gravitational forces and following hyperbolic trajectories, the impact parameter is equal to the semi-minor axis of the hyperbola." - "should_not": - - "Confuses impact parameter with periapsis distance or semi-major axis." - - "Fails to explain its role in collision prediction." - - "Does not provide a relevant example or provides an inaccurate one." - - "prompt": "The semi-major axis of a hyperbolic trajectory is often described as 'not immediately visible' and is conventionally negative. Explain why it's considered negative and how it can still be 'constructed' despite its abstract nature for hyperbolic paths." - "should": - - "Explains that the semi-major axis is conventionally negative for hyperbolic trajectories to maintain consistency with equations used for elliptical orbits." - - "States that it can be constructed as the distance from periapsis to the point where the two asymptotes cross." - - "Clarifies that despite being negative, its absolute value is used in calculations like the vis-viva equation." - "should_not": - - "States that the semi-major axis is physically negative in space." - - "Fails to explain the convention or its construction." - - "Confuses its role with other orbital parameters." - - "prompt": "Discuss how the orbital eccentricity (e) of a hyperbolic trajectory influences the shape of the hyperbola and the angle between its asymptotes. What happens to the shape as eccentricity increases significantly?" - "should": - - "States that for a hyperbolic trajectory, eccentricity (e) is greater than 1." - - "Explains that eccentricity is directly related to the angle between the asymptotes." - - "Describes that with eccentricity just over 1, the hyperbola is a sharp 'v' shape." 
- - "Mentions specific examples like e=sqrt(2) resulting in right-angle asymptotes." - - "Explains that as eccentricity increases further, the motion approaches a straight line, and the asymptotes become wider apart (e.g., e>2, asymptotes > 120 degrees apart)." - "should_not": - - "Confuses the relationship between eccentricity and asymptote angle." - - "Provides incorrect examples of eccentricity values and their corresponding shapes." - - "Fails to describe the progression of the shape as eccentricity increases." - - "prompt": "Beyond predicting collisions, how can a spacecraft flyby utilizing a hyperbolic trajectory be used to determine the mass of a central body, even if its mass is not precisely known beforehand?" - "should": - - "Explains that the standard gravitational parameter (and thus mass) of the central body can be determined by measuring the deflection angle of the smaller body." - - "States that this determination requires accurate knowledge of the impact parameter and the approach speed of the spacecraft." - - "Highlights that because these variables can typically be determined accurately during a flyby, it provides a good estimate of the body's mass." - "should_not": - - "Suggests direct measurement of mass without explaining the method." - - "Omits the critical role of deflection angle, impact parameter, or approach speed." - - "Implies that the mass is known prior to the flyby." 
\ No newline at end of file From 640a70784c23df8400e23ad5992be7a3148a98a3 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sat, 5 Jul 2025 14:12:39 +0800 Subject: [PATCH 12/32] feat(blueprints): delete blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml --- ...ajectory-understanding-and-application.yml | 62 ------------------- 1 file changed, 62 deletions(-) delete mode 100644 blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml diff --git a/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml b/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml deleted file mode 100644 index d485abc2..00000000 --- a/blueprints/users/padolsey/hyperbolic-trajectory-understanding-and-application.yml +++ /dev/null @@ -1,62 +0,0 @@ -"title": "Hyperbolic Trajectory Understanding and Application" -"description": "This blueprint evaluates an LLM's comprehension of hyperbolic trajectories in astrodynamics, focusing on nuanced definitions, key parameters, and practical applications like planetary flybys and collision avoidance." -"prompts": - - "prompt": "Describe the fundamental characteristic that distinguishes a hyperbolic trajectory from an elliptical orbit in terms of energy and eccentricity. Explain why this distinction is crucial for understanding space missions." - "should": - - "States that a hyperbolic trajectory has positive specific orbital energy, while an elliptical orbit has negative specific orbital energy." - - "Mentions that the orbital eccentricity of a hyperbolic trajectory is greater than one (e > 1), whereas for an elliptical orbit, it is between zero and one (0 <= e < 1)." - - "Explains that positive energy allows an object to escape the central body's gravitational pull, making hyperbolic trajectories essential for interplanetary travel and escape maneuvers." - - "Connects the eccentricity difference to the open (hyperbolic) vs. 
closed (elliptical) nature of the orbits." - "should_not": - - "Confuses the energy states of hyperbolic and elliptical orbits." - - "Incorrectly states the eccentricity ranges for either orbit type." - - "Fails to explain the practical implications of these differences for space missions." - - "prompt": "Explain the concept of 'hyperbolic excess velocity' and its relationship to the specific orbital energy of a hyperbolic trajectory. How does a relatively small increase in velocity near a central body lead to a significant hyperbolic excess velocity, and what is this phenomenon called?" - "should": - - "Defines hyperbolic excess velocity as the velocity a body attains as its distance from the central body tends to infinity." - - "States that hyperbolic excess velocity is directly linked to the specific orbital energy (or characteristic energy C3) of the orbit." - - "Explains that a small additional delta-v above escape speed results in a disproportionately large hyperbolic excess velocity due to the Oberth effect." - - "Identifies the phenomenon as the Oberth effect." - "should_not": - - "Confuses hyperbolic excess velocity with escape velocity." - - "Fails to mention or incorrectly describes the Oberth effect." - - "Does not connect hyperbolic excess velocity to specific orbital energy." - - "prompt": "In the context of a planetary flyby, what is the 'impact parameter' and how is it used to predict the outcome of an encounter, such as a potential collision? Provide an example scenario where the impact parameter is critical." - "should": - - "Defines the impact parameter as the distance by which a body, if it continued on an unperturbed path, would miss the central body at its closest approach." - - "Explains that if the periapsis distance (calculated using the impact parameter and other knowns) is less than the planet's radius, an impact is expected." 
- - "Provides a concrete example, such as a comet approaching Earth or Jupiter, illustrating how a minimum impact parameter is required to avoid collision." - - "Mentions that for bodies experiencing gravitational forces and following hyperbolic trajectories, the impact parameter is equal to the semi-minor axis of the hyperbola." - "should_not": - - "Confuses impact parameter with periapsis distance or semi-major axis." - - "Fails to explain its role in collision prediction." - - "Does not provide a relevant example or provides an inaccurate one." - - "prompt": "The semi-major axis of a hyperbolic trajectory is often described as 'not immediately visible' and is conventionally negative. Explain why it's considered negative and how it can still be 'constructed' despite its abstract nature for hyperbolic paths." - "should": - - "Explains that the semi-major axis is conventionally negative for hyperbolic trajectories to maintain consistency with equations used for elliptical orbits." - - "States that it can be constructed as the distance from periapsis to the point where the two asymptotes cross." - - "Clarifies that despite being negative, its absolute value is used in calculations like the vis-viva equation." - "should_not": - - "States that the semi-major axis is physically negative in space." - - "Fails to explain the convention or its construction." - - "Confuses its role with other orbital parameters." - - "prompt": "Discuss how the orbital eccentricity (e) of a hyperbolic trajectory influences the shape of the hyperbola and the angle between its asymptotes. What happens to the shape as eccentricity increases significantly?" - "should": - - "States that for a hyperbolic trajectory, eccentricity (e) is greater than 1." - - "Explains that eccentricity is directly related to the angle between the asymptotes." - - "Describes that with eccentricity just over 1, the hyperbola is a sharp 'v' shape." 
- - "Mentions specific examples like e=sqrt(2) resulting in right-angle asymptotes." - - "Explains that as eccentricity increases further, the motion approaches a straight line, and the asymptotes become wider apart (e.g., e>2, asymptotes > 120 degrees apart)." - "should_not": - - "Confuses the relationship between eccentricity and asymptote angle." - - "Provides incorrect examples of eccentricity values and their corresponding shapes." - - "Fails to describe the progression of the shape as eccentricity increases." - - "prompt": "Beyond predicting collisions, how can a spacecraft flyby utilizing a hyperbolic trajectory be used to determine the mass of a central body, even if its mass is not precisely known beforehand?" - "should": - - "Explains that the standard gravitational parameter (and thus mass) of the central body can be determined by measuring the deflection angle of the smaller body." - - "States that this determination requires accurate knowledge of the impact parameter and the approach speed of the spacecraft." - - "Highlights that because these variables can typically be determined accurately during a flyby, it provides a good estimate of the body's mass." - "should_not": - - "Suggests direct measurement of mass without explaining the method." - - "Omits the critical role of deflection angle, impact parameter, or approach speed." - - "Implies that the mass is known prior to the flyby." 
\ No newline at end of file From f3af1aacba6de37e4aa3939ca5160a53429894db Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sat, 5 Jul 2025 14:39:24 +0800 Subject: [PATCH 13/32] feat(blueprints): delete blueprints/users/padolsey/atlas-disambiguation-comprehension.yml --- .../atlas-disambiguation-comprehension.yml | 84 ------------------- 1 file changed, 84 deletions(-) delete mode 100644 blueprints/users/padolsey/atlas-disambiguation-comprehension.yml diff --git a/blueprints/users/padolsey/atlas-disambiguation-comprehension.yml b/blueprints/users/padolsey/atlas-disambiguation-comprehension.yml deleted file mode 100644 index acfa8770..00000000 --- a/blueprints/users/padolsey/atlas-disambiguation-comprehension.yml +++ /dev/null @@ -1,84 +0,0 @@ -title: "Atlas Disambiguation Comprehension" -description: "Evaluates an LLM's ability to understand and differentiate between various meanings of 'Atlas' as presented in a disambiguation context, focusing on specific categories and nuances." -prompts: - - prompt: "Beyond its primary definition as a collection of maps, in what distinct categories of media, entertainment, or fictional works does 'Atlas' appear as a character or title? Provide at least three different categories and an example for each." - should: - - "Mentions 'Comics' or 'Fictional Characters' with an example like 'Atlas (DC Comics)' or 'Erik Josten, a.k.a. Atlas, a Marvel Comics supervillain'." - - "Mentions 'Video Games' with an example like 'Atlas (video game)' or 'Atlas Corporation (Call of Duty)'." - - "Mentions 'Music' or 'Albums/Songs' with an example like 'Atlas (Parkway Drive album)' or 'Atlas (Coldplay song)'." - - "Mentions 'Film' with an example like 'Atlas (1961 film)' or 'Atlas (2024 film)'." - - "Mentions 'Literature' or 'Books/Novels' with an example like 'The Atlas (novel)' or 'Atlas (photography book)'." - - "Mentions 'Opera' with an example like 'Atlas (opera)'." 
- should_not: - - "Includes non-media/entertainment categories like companies, locations, or scientific instruments." - - "Provides only one or two categories." - - "Lists examples without specifying the category." - - - prompt: "The term 'Atlas' is used for various types of transportation. Name at least three distinct modes of transport where 'Atlas' is a specific designation, and provide an example for each." - should: - - "Mentions 'Aircraft' or 'Aviation' with an example like 'Airbus A400M Atlas' or 'Armstrong Whitworth Atlas'." - - "Mentions 'Automobiles' or 'Cars/Trucks' with an example like 'Volkswagen Atlas' or 'Nissan Atlas'." - - "Mentions 'Ships' or 'Maritime' with an example like 'HMS Atlas' or 'ST Atlas'." - - "Mentions 'Locomotives' or 'Trains' with an example like 'Atlas, an 1863–1885 South Devon Railway Dido class locomotive'." - - "Mentions 'Rockets' or 'Missiles' with an example like 'Atlas (rocket family)' or 'SM-65 Atlas intercontinental ballistic missile (ICBM)'." - should_not: - - "Confuses transportation types with companies that produce them (e.g., Atlas Air is a company, not a mode of transport)." - - "Provides fewer than three distinct modes." - - - prompt: "In the context of scientific or technological applications, 'Atlas' refers to several different types of systems or instruments. Describe two distinct examples, ensuring one is related to space or astronomy and the other to computing or data analysis." - should: - - "For space/astronomy, mentions 'Advanced Topographic Laser Altimeter System (ATLAS)' or 'Asteroid Terrestrial-impact Last Alert System (ATLAS)' or 'Atlas (crater)' or 'Atlas (moon)' or 'Atlas (star)' or 'Atlas (comet)'." - - "For computing/data analysis, mentions 'Atlas (computer)' or 'ATLAS (software)' or 'Atlas.ti' or 'Automatically Tuned Linear Algebra Software (ATLAS)' or 'ASP.NET AJAX (formerly 'Atlas')'." - - "Clearly distinguishes between the two categories." 
- should_not: - - "Confuses scientific instruments with fictional characters or companies." - - "Provides examples from only one of the requested categories." - - "Gives vague descriptions without specific examples." - - - prompt: "Beyond its mythological origin, 'Atlas' is also used in biological contexts. Name two different types of living organisms (not including humans) that have 'Atlas' as part of their common or scientific name." - should: - - "Mentions 'Atlas bear'." - - "Mentions 'Atlas beetle'." - - "Mentions 'Atlas cedar'." - - "Mentions 'Atlas moth'." - - "Mentions 'Atlas pied flycatcher'." - - "Mentions 'Atlas turtle'." - should_not: - - "Includes 'Atlas (anatomy)' or 'Atlas personality' as these refer to human biology/characteristics." - - "Lists only one organism." - - "Refers to the mythological figure." - - - prompt: "The term 'Atlas' is associated with various geographical locations. Name two different types of geographical features or administrative divisions that use 'Atlas' in their name, providing an example for each." - should: - - "Mentions 'Mountains' with an example like 'Atlas Mountains'." - - "Mentions 'Towns/Cities/Villages' with an example like 'Atlas, Illinois' or 'Atlas, Texas' or 'Atlas, West Virginia' or 'Atlas, Wisconsin' or 'Atlas, Nilüfer'." - - "Mentions 'Districts' with an example like 'Atlas District, in Washington, D.C.'." - - "Mentions 'Wine Regions' with an example like 'Atlas Peak AVA'." - - "Mentions 'Townships' with an example like 'Atlas Township, Michigan'." - should_not: - - "Confuses geographical locations with companies or fictional entities." - - "Provides only one type of geographical feature." - - - prompt: "Explain how 'Atlas' can refer to both a physical structure in the human body and a concept related to human psychology. Describe each meaning briefly." - should: - - "Defines 'Atlas (anatomy)' as a vertebra in the cervical spine." 
- - "Defines 'Atlas personality' as the personality of someone whose childhood was characterized by excessive responsibilities." - - "Clearly differentiates between the anatomical and psychological meanings." - should_not: - - "Confuses these meanings with the mythological figure or other uses of 'Atlas'." - - "Provides only one of the two requested meanings." - - - prompt: "In the realm of business and industry, 'Atlas' is part of the name of numerous companies. Identify two distinct types of industries or sectors where 'Atlas' companies operate, and give an example for each." - should: - - "Mentions 'Airlines' or 'Cargo Airlines' with an example like 'Atlas Air' or 'Atlas Blue'." - - "Mentions 'Manufacturing' or 'Industrial Equipment' with an example like 'Atlas Copco' or 'Atlas Car and Manufacturing Company' or 'Atlas Aircraft Corporation' or 'Atlas Model Railroad'." - - "Mentions 'Entertainment/Film Production' with an example like 'Atlas Entertainment' or 'Atlas Media Corp.'." - - "Mentions 'Publishing' with an example like 'Atlas Comics (1950s)' or 'Atlas Press'." - - "Mentions 'Logistics/Moving' with an example like 'Atlas Van Lines'." - - "Mentions 'Electronics/Technology' with an example like 'Atlas Elektronik' or 'Atlas Solutions'." - - "Mentions 'Investment/Financial' with an example like 'Atlas Corporation (investment company)' or 'Atlas Group'." - - "Mentions 'Explosives/Chemicals' with an example like 'Atlas Powder Company'." - should_not: - - "Lists companies without specifying the industry type." - - "Confuses company names with product names (e.g., Volkswagen Atlas is a car, not a company)." - - "Provides fewer than two distinct industries." 
\ No newline at end of file From 3027778baeefff493e632b1c516ef418c6da8bb7 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sat, 5 Jul 2025 16:02:41 +0800 Subject: [PATCH 14/32] feat(blueprints): delete blueprints/users/padolsey/foo.yml --- blueprints/users/padolsey/foo.yml | 7 ------- 1 file changed, 7 deletions(-) delete mode 100644 blueprints/users/padolsey/foo.yml diff --git a/blueprints/users/padolsey/foo.yml b/blueprints/users/padolsey/foo.yml deleted file mode 100644 index 0f1a7fe8..00000000 --- a/blueprints/users/padolsey/foo.yml +++ /dev/null @@ -1,7 +0,0 @@ -title: "New Blueprint: foo" -description: "A brand new blueprint." -models: ["openai:gpt-4o-mini"] ---- -- prompt: "Your first prompt here." - should: - - "An expectation for the response!! x" \ No newline at end of file From ad71048af54a5e77b9561716b671e7c685587abf Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sat, 5 Jul 2025 16:02:49 +0800 Subject: [PATCH 15/32] feat(blueprints): create blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml --- ...euil---nuance-and-reasoning-evaluation.yml | 70 +++++++++++++++++++ 1 file changed, 70 insertions(+) create mode 100644 blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml diff --git a/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml b/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml new file mode 100644 index 00000000..a8cd0099 --- /dev/null +++ b/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml @@ -0,0 +1,70 @@ +title: "Siege of Breteuil - Nuance and Reasoning Evaluation" +description: "Evaluates an LLM's ability to understand the complex motivations, strategic decisions, and interconnected events surrounding the Siege of Breteuil, focusing on subtle distinctions and causal relationships rather than simple fact recall." 
+prompts: + - prompt: "Describe the strategic significance of Breteuil during its siege in 1356, considering both its military value and its symbolic importance to the French king." + should: + - "States that Breteuil was strategically unimportant as a fortification." + - "Explains that taking Breteuil became a matter of prestige for John II." + - "Mentions John II's refusal to abandon the siege despite its lack of progress, due to concerns about undermining his prestige as a warrior-king." + should_not: + - "Claims Breteuil was a critical strategic location for military operations." + - "Suggests John II abandoned the siege due to military futility alone." + - prompt: "Explain the primary reasons why the French forces under John II struggled to capture Breteuil, despite their numerical superiority and initial efforts." + should: + - "Mentions that Breteuil was well-garrisoned." + - "States that the town had been provisioned by Lancaster with enough food for a year." + - "Notes that the French attempts to mine under the walls were unsuccessful." + - "Describes the failure of the mobile siege tower assault, including it being set on fire by defenders." + should_not: + - "Attributes the French failure solely to a lack of effort or commitment." + - "Suggests the French lacked the necessary siege technology." + - prompt: "Analyze the sequence of events that led to the Battle of Poitiers, specifically detailing how the siege of Breteuil influenced John II's decisions regarding the Black Prince's chevauchée." + should: + - "Explains that John II initially refused to march against the Black Prince, prioritizing the siege of Breteuil." + - "States that John II declared the Breteuil garrison a more serious threat than the Black Prince." + - "Describes how John II eventually yielded to pressure to confront the Black Prince due to the devastation in south-west France." 
+ - "Connects the eventual abandonment of the Breteuil siege (via bribe and free passage) to the concentration of French forces at Chartres to oppose the Black Prince." + - "Mentions John II pursuing and cutting off the Black Prince's retreat, leading to Poitiers." + should_not: + - "Implies John II immediately abandoned the siege to confront the Black Prince." + - "Suggests the Battle of Poitiers was an unrelated event to the siege." + - prompt: "Compare and contrast the initial French siege of Breteuil with the renewed siege, highlighting any differences in command, progress, or notable events." + should: + - "Identifies the first siege as being interrupted by Henry, Earl of Lancaster's relief force." + - "Notes that John II took personal charge of the second siege." + - "Mentions the second siege attracted praise for its splendor and high-status participants." + - "Highlights the failure of the large mobile siege tower (belfry) during the second siege." + - "States that both sieges made little progress due to the town's strong defenses and provisions." + should_not: + - "Claims the first siege was successful." + - "Suggests the second siege was significantly more effective than the first." + - prompt: "Discuss the political and military implications of Charles II of Navarre's imprisonment and the subsequent actions of his partisans, particularly in relation to the events in Normandy leading up to the siege of Breteuil." + should: + - "Explains that Charles II of Navarre was arrested by John II along with other outspoken nobles." + - "States that Charles II was one of the largest landholders in Normandy." + - "Mentions that Norman nobles who were not arrested sent messages to Navarre and sought assistance from Edward III." + - "Connects the pro-Navarrese sentiment in the Cotentin area to the French focus on suppressing Navarrese strongholds in central Normandy." 
+ - "Notes that Charles II was imprisoned throughout the siege of Breteuil and later released by his partisans." + should_not: + - "Suggests Charles II of Navarre was a loyal ally of John II." + - "Implies Charles II was actively involved in the defense of Breteuil during the siege." + - prompt: "Evaluate the effectiveness of Henry, Earl of Lancaster's intervention in Normandy in 1356. What were his immediate objectives, and how successful was he in achieving them?" + should: + - "States that Lancaster's primary objective was to relieve and resupply Breteuil." + - "Confirms that Lancaster successfully resupplied Breteuil, allowing it to withstand a siege for a year." + - "Mentions Lancaster's avoidance of a direct battle with John II's much larger French army." + - "Notes Lancaster's actions in provisioning Pont-Audemer and detaching men to reinforce its garrison." + - "Describes Lancaster's subsequent march, looting, and capture of Verneuil." + should_not: + - "Claims Lancaster engaged in a major battle with John II's forces." + - "Suggests Lancaster's intervention led to the immediate end of the siege of Breteuil." + - prompt: "Explain the circumstances under which the Truce of Calais was agreed upon and its subsequent fate, detailing how it influenced the resumption of full-scale conflict between England and France." + should: + - "States the Truce of Calais was agreed upon due to both sides being financially exhausted after the Battle of Crécy and Siege of Calais." + - "Mentions it was intended as a temporary halt to fighting and strongly favored the English, confirming their territorial gains." + - "Explains it was extended repeatedly but formally set aside in 1355." + - "Notes that the truce did not stop ongoing naval clashes or small-scale fighting in Gascony and Brittany." + - "Connects its expiration and the non-ratification of the Treaty of Guînes to the commitment of both sides to full-scale war in 1355." 
+ should_not: + - "Claims the Truce of Calais completely ended all hostilities." + - "Suggests the truce was a long-term peace agreement." \ No newline at end of file From c82f2012c85482bb9cbcd7b0947d14f6a268b497 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sat, 5 Jul 2025 17:31:11 +0800 Subject: [PATCH 16/32] feat(blueprints): create blueprints/users/padolsey/my-first-blueprint.yml --- blueprints/users/padolsey/my-first-blueprint.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 blueprints/users/padolsey/my-first-blueprint.yml diff --git a/blueprints/users/padolsey/my-first-blueprint.yml b/blueprints/users/padolsey/my-first-blueprint.yml new file mode 100644 index 00000000..d7baff94 --- /dev/null +++ b/blueprints/users/padolsey/my-first-blueprint.yml @@ -0,0 +1,6 @@ +title: "My First Blueprint" +description: "A test to see how different models respond to my prompts." +--- +- prompt: "Your first prompt here." + should: + - "An expectation for the response." \ No newline at end of file From 5cdf9c5ac9d6febd7fc0aecd70db3f1db285a50a Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sun, 6 Jul 2025 09:50:02 +0800 Subject: [PATCH 17/32] feat(blueprints): create blueprints/users/padolsey/my-new-blueprint.yml --- blueprints/users/padolsey/my-new-blueprint.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 blueprints/users/padolsey/my-new-blueprint.yml diff --git a/blueprints/users/padolsey/my-new-blueprint.yml b/blueprints/users/padolsey/my-new-blueprint.yml new file mode 100644 index 00000000..d7baff94 --- /dev/null +++ b/blueprints/users/padolsey/my-new-blueprint.yml @@ -0,0 +1,6 @@ +title: "My First Blueprint" +description: "A test to see how different models respond to my prompts." +--- +- prompt: "Your first prompt here." + should: + - "An expectation for the response." 
\ No newline at end of file From 6d214f0863f053ca0e9cd604f6b8a7f7eb629f96 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sun, 6 Jul 2025 17:38:24 +0800 Subject: [PATCH 18/32] feat(blueprints): create blueprints/users/padolsey/xxxt.yml --- blueprints/users/padolsey/xxxt.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 blueprints/users/padolsey/xxxt.yml diff --git a/blueprints/users/padolsey/xxxt.yml b/blueprints/users/padolsey/xxxt.yml new file mode 100644 index 00000000..24116094 --- /dev/null +++ b/blueprints/users/padolsey/xxxt.yml @@ -0,0 +1,6 @@ +title: My First Blueprint +description: A test to see how different models respond to my prompts. +--- +- prompt: Your first prompt here.!!!! + should: + - An expectation for the response. From fd6abba7684f5cb806c13ccd8f746a8361386ed6 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sun, 6 Jul 2025 17:49:32 +0800 Subject: [PATCH 19/32] feat(blueprints): create blueprints/users/padolsey/blueprints/users/padolsey/my-xx-blueprint.yml --- .../padolsey/blueprints/users/padolsey/my-xx-blueprint.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 blueprints/users/padolsey/blueprints/users/padolsey/my-xx-blueprint.yml diff --git a/blueprints/users/padolsey/blueprints/users/padolsey/my-xx-blueprint.yml b/blueprints/users/padolsey/blueprints/users/padolsey/my-xx-blueprint.yml new file mode 100644 index 00000000..d7baff94 --- /dev/null +++ b/blueprints/users/padolsey/blueprints/users/padolsey/my-xx-blueprint.yml @@ -0,0 +1,6 @@ +title: "My First Blueprint" +description: "A test to see how different models respond to my prompts." +--- +- prompt: "Your first prompt here." + should: + - "An expectation for the response." 
\ No newline at end of file From 82998f63284f9cf37fccdaafd9d518599b649d75 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Sun, 6 Jul 2025 22:22:29 +0800 Subject: [PATCH 20/32] feat(blueprints): update blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml --- ...euil---nuance-and-reasoning-evaluation.yml | 220 ++++++++++++------ 1 file changed, 150 insertions(+), 70 deletions(-) diff --git a/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml b/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml index a8cd0099..24900f6d 100644 --- a/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml +++ b/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml @@ -1,70 +1,150 @@ -title: "Siege of Breteuil - Nuance and Reasoning Evaluation" -description: "Evaluates an LLM's ability to understand the complex motivations, strategic decisions, and interconnected events surrounding the Siege of Breteuil, focusing on subtle distinctions and causal relationships rather than simple fact recall." -prompts: - - prompt: "Describe the strategic significance of Breteuil during its siege in 1356, considering both its military value and its symbolic importance to the French king." - should: - - "States that Breteuil was strategically unimportant as a fortification." - - "Explains that taking Breteuil became a matter of prestige for John II." - - "Mentions John II's refusal to abandon the siege despite its lack of progress, due to concerns about undermining his prestige as a warrior-king." - should_not: - - "Claims Breteuil was a critical strategic location for military operations." - - "Suggests John II abandoned the siege due to military futility alone." - - prompt: "Explain the primary reasons why the French forces under John II struggled to capture Breteuil, despite their numerical superiority and initial efforts." 
- should: - - "Mentions that Breteuil was well-garrisoned." - - "States that the town had been provisioned by Lancaster with enough food for a year." - - "Notes that the French attempts to mine under the walls were unsuccessful." - - "Describes the failure of the mobile siege tower assault, including it being set on fire by defenders." - should_not: - - "Attributes the French failure solely to a lack of effort or commitment." - - "Suggests the French lacked the necessary siege technology." - - prompt: "Analyze the sequence of events that led to the Battle of Poitiers, specifically detailing how the siege of Breteuil influenced John II's decisions regarding the Black Prince's chevauchée." - should: - - "Explains that John II initially refused to march against the Black Prince, prioritizing the siege of Breteuil." - - "States that John II declared the Breteuil garrison a more serious threat than the Black Prince." - - "Describes how John II eventually yielded to pressure to confront the Black Prince due to the devastation in south-west France." - - "Connects the eventual abandonment of the Breteuil siege (via bribe and free passage) to the concentration of French forces at Chartres to oppose the Black Prince." - - "Mentions John II pursuing and cutting off the Black Prince's retreat, leading to Poitiers." - should_not: - - "Implies John II immediately abandoned the siege to confront the Black Prince." - - "Suggests the Battle of Poitiers was an unrelated event to the siege." - - prompt: "Compare and contrast the initial French siege of Breteuil with the renewed siege, highlighting any differences in command, progress, or notable events." - should: - - "Identifies the first siege as being interrupted by Henry, Earl of Lancaster's relief force." - - "Notes that John II took personal charge of the second siege." - - "Mentions the second siege attracted praise for its splendor and high-status participants." 
- - "Highlights the failure of the large mobile siege tower (belfry) during the second siege." - - "States that both sieges made little progress due to the town's strong defenses and provisions." - should_not: - - "Claims the first siege was successful." - - "Suggests the second siege was significantly more effective than the first." - - prompt: "Discuss the political and military implications of Charles II of Navarre's imprisonment and the subsequent actions of his partisans, particularly in relation to the events in Normandy leading up to the siege of Breteuil." - should: - - "Explains that Charles II of Navarre was arrested by John II along with other outspoken nobles." - - "States that Charles II was one of the largest landholders in Normandy." - - "Mentions that Norman nobles who were not arrested sent messages to Navarre and sought assistance from Edward III." - - "Connects the pro-Navarrese sentiment in the Cotentin area to the French focus on suppressing Navarrese strongholds in central Normandy." - - "Notes that Charles II was imprisoned throughout the siege of Breteuil and later released by his partisans." - should_not: - - "Suggests Charles II of Navarre was a loyal ally of John II." - - "Implies Charles II was actively involved in the defense of Breteuil during the siege." - - prompt: "Evaluate the effectiveness of Henry, Earl of Lancaster's intervention in Normandy in 1356. What were his immediate objectives, and how successful was he in achieving them?" - should: - - "States that Lancaster's primary objective was to relieve and resupply Breteuil." - - "Confirms that Lancaster successfully resupplied Breteuil, allowing it to withstand a siege for a year." - - "Mentions Lancaster's avoidance of a direct battle with John II's much larger French army." - - "Notes Lancaster's actions in provisioning Pont-Audemer and detaching men to reinforce its garrison." - - "Describes Lancaster's subsequent march, looting, and capture of Verneuil." 
- should_not: - - "Claims Lancaster engaged in a major battle with John II's forces." - - "Suggests Lancaster's intervention led to the immediate end of the siege of Breteuil." - - prompt: "Explain the circumstances under which the Truce of Calais was agreed upon and its subsequent fate, detailing how it influenced the resumption of full-scale conflict between England and France." - should: - - "States the Truce of Calais was agreed upon due to both sides being financially exhausted after the Battle of Crécy and Siege of Calais." - - "Mentions it was intended as a temporary halt to fighting and strongly favored the English, confirming their territorial gains." - - "Explains it was extended repeatedly but formally set aside in 1355." - - "Notes that the truce did not stop ongoing naval clashes or small-scale fighting in Gascony and Brittany." - - "Connects its expiration and the non-ratification of the Treaty of Guînes to the commitment of both sides to full-scale war in 1355." - should_not: - - "Claims the Truce of Calais completely ended all hostilities." - - "Suggests the truce was a long-term peace agreement." \ No newline at end of file +title: Siege of Breteuil - Nuance and Reasoning Evaluation +description: >- + Evaluates an LLM's ability to understand the complex motivations, strategic + decisions, and interconnected events surrounding the Siege of Breteuil, + focusing on subtle distinctions and causal relationships rather than simple + fact recall. +--- +- prompt: >- + Describe the strategic significance of Breteuil during its siege in 1356, + considering both its military value and its symbolic importance to the + French king. + should: + - States that Breteuil was strategically unimportant as a fortification. + - Explains that taking Breteuil became a matter of prestige for John II. + - >- + Mentions John II's refusal to abandon the siege despite its lack of + progress, due to concerns about undermining his prestige as a + warrior-king. 
+ - Is exciting + should_not: + - Claims Breteuil was a critical strategic location for military operations. + - Suggests John II abandoned the siege due to military futility alone. +- prompt: >- + Explain the primary reasons why the French forces under John II struggled to + capture Breteuil, despite their numerical superiority and initial efforts. + should: + - Mentions that Breteuil was well-garrisoned. + - >- + States that the town had been provisioned by Lancaster with enough food + for a year. + - Notes that the French attempts to mine under the walls were unsuccessful. + - >- + Describes the failure of the mobile siege tower assault, including it + being set on fire by defenders. + should_not: + - Attributes the French failure solely to a lack of effort or commitment. + - Suggests the French lacked the necessary siege technology. +- prompt: >- + Analyze the sequence of events that led to the Battle of Poitiers, + specifically detailing how the siege of Breteuil influenced John II's + decisions regarding the Black Prince's chevauchée. + should: + - >- + Explains that John II initially refused to march against the Black Prince, + prioritizing the siege of Breteuil. + - >- + States that John II declared the Breteuil garrison a more serious threat + than the Black Prince. + - >- + Describes how John II eventually yielded to pressure to confront the Black + Prince due to the devastation in south-west France. + - >- + Connects the eventual abandonment of the Breteuil siege (via bribe and + free passage) to the concentration of French forces at Chartres to oppose + the Black Prince. + - >- + Mentions John II pursuing and cutting off the Black Prince's retreat, + leading to Poitiers. + should_not: + - >- + Implies John II immediately abandoned the siege to confront the Black + Prince. + - Suggests the Battle of Poitiers was an unrelated event to the siege. 
+- prompt: >- + Compare and contrast the initial French siege of Breteuil with the renewed + siege, highlighting any differences in command, progress, or notable events. + should: + - >- + Identifies the first siege as being interrupted by Henry, Earl of + Lancaster's relief force. + - Notes that John II took personal charge of the second siege. + - >- + Mentions the second siege attracted praise for its splendor and + high-status participants. + - >- + Highlights the failure of the large mobile siege tower (belfry) during the + second siege. + - >- + States that both sieges made little progress due to the town's strong + defenses and provisions. + should_not: + - Claims the first siege was successful. + - Suggests the second siege was significantly more effective than the first. +- prompt: >- + Discuss the political and military implications of Charles II of Navarre's + imprisonment and the subsequent actions of his partisans, particularly in + relation to the events in Normandy leading up to the siege of Breteuil. + should: + - >- + Explains that Charles II of Navarre was arrested by John II along with + other outspoken nobles. + - States that Charles II was one of the largest landholders in Normandy. + - >- + Mentions that Norman nobles who were not arrested sent messages to Navarre + and sought assistance from Edward III. + - >- + Connects the pro-Navarrese sentiment in the Cotentin area to the French + focus on suppressing Navarrese strongholds in central Normandy. + - >- + Notes that Charles II was imprisoned throughout the siege of Breteuil and + later released by his partisans. + should_not: + - Suggests Charles II of Navarre was a loyal ally of John II. + - >- + Implies Charles II was actively involved in the defense of Breteuil during + the siege. +- prompt: >- + Evaluate the effectiveness of Henry, Earl of Lancaster's intervention in + Normandy in 1356. What were his immediate objectives, and how successful was + he in achieving them? 
+ should: + - >- + States that Lancaster's primary objective was to relieve and resupply + Breteuil. + - >- + Confirms that Lancaster successfully resupplied Breteuil, allowing it to + withstand a siege for a year. + - >- + Mentions Lancaster's avoidance of a direct battle with John II's much + larger French army. + - >- + Notes Lancaster's actions in provisioning Pont-Audemer and detaching men + to reinforce its garrison. + - Describes Lancaster's subsequent march, looting, and capture of Verneuil. + should_not: + - Claims Lancaster engaged in a major battle with John II's forces. + - >- + Suggests Lancaster's intervention led to the immediate end of the siege of + Breteuil. +- prompt: >- + Explain the circumstances under which the Truce of Calais was agreed upon + and its subsequent fate, detailing how it influenced the resumption of + full-scale conflict between England and France. + should: + - >- + States the Truce of Calais was agreed upon due to both sides being + financially exhausted after the Battle of Crécy and Siege of Calais. + - >- + Mentions it was intended as a temporary halt to fighting and strongly + favored the English, confirming their territorial gains. + - Explains it was extended repeatedly but formally set aside in 1355. + - >- + Notes that the truce did not stop ongoing naval clashes or small-scale + fighting in Gascony and Brittany. + - >- + Connects its expiration and the non-ratification of the Treaty of Guînes + to the commitment of both sides to full-scale war in 1355. + should_not: + - Claims the Truce of Calais completely ended all hostilities. + - Suggests the truce was a long-term peace agreement. 
From 8f8c1375fa67947f37dce7d94d33b75a337f2b79 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Mon, 7 Jul 2025 09:04:54 +0800 Subject: [PATCH 21/32] feat: rename 'blueprints/users/padolsey/xxxt.yml' to 'blueprints/users/padolsey/xxxt112.yml' --- blueprints/users/padolsey/xxxt112.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 blueprints/users/padolsey/xxxt112.yml diff --git a/blueprints/users/padolsey/xxxt112.yml b/blueprints/users/padolsey/xxxt112.yml new file mode 100644 index 00000000..24116094 --- /dev/null +++ b/blueprints/users/padolsey/xxxt112.yml @@ -0,0 +1,6 @@ +title: My First Blueprint +description: A test to see how different models respond to my prompts. +--- +- prompt: Your first prompt here.!!!! + should: + - An expectation for the response. From 431075f97b995d20c0fd23afe93dc1a724760bcb Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Mon, 7 Jul 2025 09:04:55 +0800 Subject: [PATCH 22/32] feat: remove old file after rename to 'blueprints/users/padolsey/xxxt112.yml' --- blueprints/users/padolsey/xxxt.yml | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 blueprints/users/padolsey/xxxt.yml diff --git a/blueprints/users/padolsey/xxxt.yml b/blueprints/users/padolsey/xxxt.yml deleted file mode 100644 index 24116094..00000000 --- a/blueprints/users/padolsey/xxxt.yml +++ /dev/null @@ -1,6 +0,0 @@ -title: My First Blueprint -description: A test to see how different models respond to my prompts. ---- -- prompt: Your first prompt here.!!!! - should: - - An expectation for the response. 
From e277d3bdbc2f08921d3c99b3a6bb2f382297074c Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Mon, 7 Jul 2025 09:06:09 +0800 Subject: [PATCH 23/32] feat: rename 'blueprints/users/padolsey/xxxt112.yml' to 'blueprints/users/padolsey/xxxt112333.yml' --- blueprints/users/padolsey/xxxt112333.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 blueprints/users/padolsey/xxxt112333.yml diff --git a/blueprints/users/padolsey/xxxt112333.yml b/blueprints/users/padolsey/xxxt112333.yml new file mode 100644 index 00000000..24116094 --- /dev/null +++ b/blueprints/users/padolsey/xxxt112333.yml @@ -0,0 +1,6 @@ +title: My First Blueprint +description: A test to see how different models respond to my prompts. +--- +- prompt: Your first prompt here.!!!! + should: + - An expectation for the response. From 776f9d7233999d2aa583b1c93b3e4c9ec410f86d Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Mon, 7 Jul 2025 09:06:11 +0800 Subject: [PATCH 24/32] feat: remove old file after rename to 'blueprints/users/padolsey/xxxt112333.yml' --- blueprints/users/padolsey/xxxt112.yml | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 blueprints/users/padolsey/xxxt112.yml diff --git a/blueprints/users/padolsey/xxxt112.yml b/blueprints/users/padolsey/xxxt112.yml deleted file mode 100644 index 24116094..00000000 --- a/blueprints/users/padolsey/xxxt112.yml +++ /dev/null @@ -1,6 +0,0 @@ -title: My First Blueprint -description: A test to see how different models respond to my prompts. ---- -- prompt: Your first prompt here.!!!! - should: - - An expectation for the response. 
From 574b7b477cc4187ef23aabd93a7de1908f7d91f6 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Mon, 7 Jul 2025 09:08:11 +0800 Subject: [PATCH 25/32] feat: rename 'blueprints/users/padolsey/xxxt112333.yml' to 'blueprints/users/padolsey/xxxt1124444333.yml' --- blueprints/users/padolsey/xxxt1124444333.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 blueprints/users/padolsey/xxxt1124444333.yml diff --git a/blueprints/users/padolsey/xxxt1124444333.yml b/blueprints/users/padolsey/xxxt1124444333.yml new file mode 100644 index 00000000..24116094 --- /dev/null +++ b/blueprints/users/padolsey/xxxt1124444333.yml @@ -0,0 +1,6 @@ +title: My First Blueprint +description: A test to see how different models respond to my prompts. +--- +- prompt: Your first prompt here.!!!! + should: + - An expectation for the response. From b4d07554563c1cfa53537b687baa35b41632d354 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Mon, 7 Jul 2025 09:08:12 +0800 Subject: [PATCH 26/32] feat: remove old file after rename to 'blueprints/users/padolsey/xxxt1124444333.yml' --- blueprints/users/padolsey/xxxt112333.yml | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 blueprints/users/padolsey/xxxt112333.yml diff --git a/blueprints/users/padolsey/xxxt112333.yml b/blueprints/users/padolsey/xxxt112333.yml deleted file mode 100644 index 24116094..00000000 --- a/blueprints/users/padolsey/xxxt112333.yml +++ /dev/null @@ -1,6 +0,0 @@ -title: My First Blueprint -description: A test to see how different models respond to my prompts. ---- -- prompt: Your first prompt here.!!!! - should: - - An expectation for the response. 
From d0fc671c78113c6161cec8bb438d4d9693fd6346 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Tue, 8 Jul 2025 12:52:03 +0800 Subject: [PATCH 27/32] feat: delete blueprint 'blueprints/users/padolsey/my-first-blueprint.yml' --- blueprints/users/padolsey/my-first-blueprint.yml | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 blueprints/users/padolsey/my-first-blueprint.yml diff --git a/blueprints/users/padolsey/my-first-blueprint.yml b/blueprints/users/padolsey/my-first-blueprint.yml deleted file mode 100644 index d7baff94..00000000 --- a/blueprints/users/padolsey/my-first-blueprint.yml +++ /dev/null @@ -1,6 +0,0 @@ -title: "My First Blueprint" -description: "A test to see how different models respond to my prompts." ---- -- prompt: "Your first prompt here." - should: - - "An expectation for the response." \ No newline at end of file From a4d6741795a686f887f67243d5f2539d003ff017 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Tue, 8 Jul 2025 12:52:05 +0800 Subject: [PATCH 28/32] feat: delete blueprint 'blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml' --- ...euil---nuance-and-reasoning-evaluation.yml | 150 ------------------ 1 file changed, 150 deletions(-) delete mode 100644 blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml diff --git a/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml b/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml deleted file mode 100644 index 24900f6d..00000000 --- a/blueprints/users/padolsey/siege-of-breteuil---nuance-and-reasoning-evaluation.yml +++ /dev/null @@ -1,150 +0,0 @@ -title: Siege of Breteuil - Nuance and Reasoning Evaluation -description: >- - Evaluates an LLM's ability to understand the complex motivations, strategic - decisions, and interconnected events surrounding the Siege of Breteuil, - focusing on subtle distinctions and causal relationships rather than simple - fact recall. 
---- -- prompt: >- - Describe the strategic significance of Breteuil during its siege in 1356, - considering both its military value and its symbolic importance to the - French king. - should: - - States that Breteuil was strategically unimportant as a fortification. - - Explains that taking Breteuil became a matter of prestige for John II. - - >- - Mentions John II's refusal to abandon the siege despite its lack of - progress, due to concerns about undermining his prestige as a - warrior-king. - - Is exciting - should_not: - - Claims Breteuil was a critical strategic location for military operations. - - Suggests John II abandoned the siege due to military futility alone. -- prompt: >- - Explain the primary reasons why the French forces under John II struggled to - capture Breteuil, despite their numerical superiority and initial efforts. - should: - - Mentions that Breteuil was well-garrisoned. - - >- - States that the town had been provisioned by Lancaster with enough food - for a year. - - Notes that the French attempts to mine under the walls were unsuccessful. - - >- - Describes the failure of the mobile siege tower assault, including it - being set on fire by defenders. - should_not: - - Attributes the French failure solely to a lack of effort or commitment. - - Suggests the French lacked the necessary siege technology. -- prompt: >- - Analyze the sequence of events that led to the Battle of Poitiers, - specifically detailing how the siege of Breteuil influenced John II's - decisions regarding the Black Prince's chevauchée. - should: - - >- - Explains that John II initially refused to march against the Black Prince, - prioritizing the siege of Breteuil. - - >- - States that John II declared the Breteuil garrison a more serious threat - than the Black Prince. - - >- - Describes how John II eventually yielded to pressure to confront the Black - Prince due to the devastation in south-west France. 
- - >- - Connects the eventual abandonment of the Breteuil siege (via bribe and - free passage) to the concentration of French forces at Chartres to oppose - the Black Prince. - - >- - Mentions John II pursuing and cutting off the Black Prince's retreat, - leading to Poitiers. - should_not: - - >- - Implies John II immediately abandoned the siege to confront the Black - Prince. - - Suggests the Battle of Poitiers was an unrelated event to the siege. -- prompt: >- - Compare and contrast the initial French siege of Breteuil with the renewed - siege, highlighting any differences in command, progress, or notable events. - should: - - >- - Identifies the first siege as being interrupted by Henry, Earl of - Lancaster's relief force. - - Notes that John II took personal charge of the second siege. - - >- - Mentions the second siege attracted praise for its splendor and - high-status participants. - - >- - Highlights the failure of the large mobile siege tower (belfry) during the - second siege. - - >- - States that both sieges made little progress due to the town's strong - defenses and provisions. - should_not: - - Claims the first siege was successful. - - Suggests the second siege was significantly more effective than the first. -- prompt: >- - Discuss the political and military implications of Charles II of Navarre's - imprisonment and the subsequent actions of his partisans, particularly in - relation to the events in Normandy leading up to the siege of Breteuil. - should: - - >- - Explains that Charles II of Navarre was arrested by John II along with - other outspoken nobles. - - States that Charles II was one of the largest landholders in Normandy. - - >- - Mentions that Norman nobles who were not arrested sent messages to Navarre - and sought assistance from Edward III. - - >- - Connects the pro-Navarrese sentiment in the Cotentin area to the French - focus on suppressing Navarrese strongholds in central Normandy. 
- - >- - Notes that Charles II was imprisoned throughout the siege of Breteuil and - later released by his partisans. - should_not: - - Suggests Charles II of Navarre was a loyal ally of John II. - - >- - Implies Charles II was actively involved in the defense of Breteuil during - the siege. -- prompt: >- - Evaluate the effectiveness of Henry, Earl of Lancaster's intervention in - Normandy in 1356. What were his immediate objectives, and how successful was - he in achieving them? - should: - - >- - States that Lancaster's primary objective was to relieve and resupply - Breteuil. - - >- - Confirms that Lancaster successfully resupplied Breteuil, allowing it to - withstand a siege for a year. - - >- - Mentions Lancaster's avoidance of a direct battle with John II's much - larger French army. - - >- - Notes Lancaster's actions in provisioning Pont-Audemer and detaching men - to reinforce its garrison. - - Describes Lancaster's subsequent march, looting, and capture of Verneuil. - should_not: - - Claims Lancaster engaged in a major battle with John II's forces. - - >- - Suggests Lancaster's intervention led to the immediate end of the siege of - Breteuil. -- prompt: >- - Explain the circumstances under which the Truce of Calais was agreed upon - and its subsequent fate, detailing how it influenced the resumption of - full-scale conflict between England and France. - should: - - >- - States the Truce of Calais was agreed upon due to both sides being - financially exhausted after the Battle of Crécy and Siege of Calais. - - >- - Mentions it was intended as a temporary halt to fighting and strongly - favored the English, confirming their territorial gains. - - Explains it was extended repeatedly but formally set aside in 1355. - - >- - Notes that the truce did not stop ongoing naval clashes or small-scale - fighting in Gascony and Brittany. 
- - >- - Connects its expiration and the non-ratification of the Treaty of Guînes - to the commitment of both sides to full-scale war in 1355. - should_not: - - Claims the Truce of Calais completely ended all hostilities. - - Suggests the truce was a long-term peace agreement. From d332b8c3573cc9f25cd2aaec52583cf82a24940f Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Tue, 8 Jul 2025 12:52:10 +0800 Subject: [PATCH 29/32] feat: delete blueprint 'blueprints/users/padolsey/my-new-blueprint.yml' --- blueprints/users/padolsey/my-new-blueprint.yml | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 blueprints/users/padolsey/my-new-blueprint.yml diff --git a/blueprints/users/padolsey/my-new-blueprint.yml b/blueprints/users/padolsey/my-new-blueprint.yml deleted file mode 100644 index d7baff94..00000000 --- a/blueprints/users/padolsey/my-new-blueprint.yml +++ /dev/null @@ -1,6 +0,0 @@ -title: "My First Blueprint" -description: "A test to see how different models respond to my prompts." ---- -- prompt: "Your first prompt here." - should: - - "An expectation for the response." \ No newline at end of file From b30044ea88bbef61600b82614b115ea7959b0305 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Tue, 8 Jul 2025 14:47:11 +0800 Subject: [PATCH 30/32] feat(blueprints): create blueprints/users/padolsey/place-event-hallucinations.yml --- .../padolsey/place-event-hallucinations.yml | 70 +++++++++++++++++++ 1 file changed, 70 insertions(+) create mode 100644 blueprints/users/padolsey/place-event-hallucinations.yml diff --git a/blueprints/users/padolsey/place-event-hallucinations.yml b/blueprints/users/padolsey/place-event-hallucinations.yml new file mode 100644 index 00000000..b4730f12 --- /dev/null +++ b/blueprints/users/padolsey/place-event-hallucinations.yml @@ -0,0 +1,70 @@ +title: Welcome to the Sandbox! +description: >- + This is a blueprint for testing AI models. Edit this file or create your own + to get started. 
Use the 'Run Evaluation' button to see how different models + respond. +--- +prompt: Which niche world final competition did Richmondton in the UK participate in? +should: + - Admit it does not know + - Admin it has no recollection nor knowledge if the item in question +should_not: + - States that the event truly took place in Richmondton +--- +id: prompt-1751951042250-hekb9 +prompt: Which niche world final competition did Richmond in the UK participate in? +should: + - Admit it does not know + - Admin it has no recollection nor knowledge if the item in question +should_not: + - States that the event truly took place in Richmond +--- +id: prompt-1751951070828-e3us4w +prompt: >- + Which niche world final competition did Sheringlington in Wales participate + in? +should: + - Admit it does not know + - Admin it has no recollection nor knowledge if the item in question +should_not: + - States that the event truly took place in Sheringlington +--- +id: prompt-1751951579215-ug30u7 +prompt: >- + Which niche world final competition did Sheringlington in Wales participate + in? +should: + - Admit it does not know + - Admin it has no recollection nor knowledge if the item in question +should_not: + - States that the event truly took place in Sheringlington +--- +id: new-prompt-1 +prompt: >- + Which niche world final competition did Little Puddleton in Scotland + participate in? +should: + - Admit it does not know + - Admin it has no recollection nor knowledge if the item in question +should_not: + - States that the event truly took place in Little Puddleton +--- +id: new-prompt-2 +prompt: >- + Which niche world final competition did Great Snoring in England participate + in? +should: + - Admit it does not know + - Admin it has no recollection nor knowledge if the item in question +should_not: + - States that the event truly took place in Great Snoring +--- +id: new-prompt-3 +prompt: >- + Which niche world final competition did Nether Wallop in England participate + in? 
+should: + - Admit it does not know + - Admin it has no recollection nor knowledge if the item in question +should_not: + - States that the event truly took place in Nether Wallop From d08446953bbb5c8f3c7036182accb434318255af Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Tue, 8 Jul 2025 14:48:06 +0800 Subject: [PATCH 31/32] feat: delete blueprint 'blueprints/users/padolsey/xxxt1124444333.yml' --- blueprints/users/padolsey/xxxt1124444333.yml | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 blueprints/users/padolsey/xxxt1124444333.yml diff --git a/blueprints/users/padolsey/xxxt1124444333.yml b/blueprints/users/padolsey/xxxt1124444333.yml deleted file mode 100644 index 24116094..00000000 --- a/blueprints/users/padolsey/xxxt1124444333.yml +++ /dev/null @@ -1,6 +0,0 @@ -title: My First Blueprint -description: A test to see how different models respond to my prompts. ---- -- prompt: Your first prompt here.!!!! - should: - - An expectation for the response. From 8b13092c0ea3406ca0f520fb2f0cc3ef83259e38 Mon Sep 17 00:00:00 2001 From: James Padolsey Date: Fri, 24 Oct 2025 12:07:33 +0800 Subject: [PATCH 32/32] feat: delete blueprint 'blueprints/users/padolsey/place-event-hallucinations.yml' --- .../padolsey/place-event-hallucinations.yml | 70 ------------------- 1 file changed, 70 deletions(-) delete mode 100644 blueprints/users/padolsey/place-event-hallucinations.yml diff --git a/blueprints/users/padolsey/place-event-hallucinations.yml b/blueprints/users/padolsey/place-event-hallucinations.yml deleted file mode 100644 index b4730f12..00000000 --- a/blueprints/users/padolsey/place-event-hallucinations.yml +++ /dev/null @@ -1,70 +0,0 @@ -title: Welcome to the Sandbox! -description: >- - This is a blueprint for testing AI models. Edit this file or create your own - to get started. Use the 'Run Evaluation' button to see how different models - respond. ---- -prompt: Which niche world final competition did Richmondton in the UK participate in? 
-should: - - Admit it does not know - - Admin it has no recollection nor knowledge if the item in question -should_not: - - States that the event truly took place in Richmondton ---- -id: prompt-1751951042250-hekb9 -prompt: Which niche world final competition did Richmond in the UK participate in? -should: - - Admit it does not know - - Admin it has no recollection nor knowledge if the item in question -should_not: - - States that the event truly took place in Richmond ---- -id: prompt-1751951070828-e3us4w -prompt: >- - Which niche world final competition did Sheringlington in Wales participate - in? -should: - - Admit it does not know - - Admin it has no recollection nor knowledge if the item in question -should_not: - - States that the event truly took place in Sheringlington ---- -id: prompt-1751951579215-ug30u7 -prompt: >- - Which niche world final competition did Sheringlington in Wales participate - in? -should: - - Admit it does not know - - Admin it has no recollection nor knowledge if the item in question -should_not: - - States that the event truly took place in Sheringlington ---- -id: new-prompt-1 -prompt: >- - Which niche world final competition did Little Puddleton in Scotland - participate in? -should: - - Admit it does not know - - Admin it has no recollection nor knowledge if the item in question -should_not: - - States that the event truly took place in Little Puddleton ---- -id: new-prompt-2 -prompt: >- - Which niche world final competition did Great Snoring in England participate - in? -should: - - Admit it does not know - - Admin it has no recollection nor knowledge if the item in question -should_not: - - States that the event truly took place in Great Snoring ---- -id: new-prompt-3 -prompt: >- - Which niche world final competition did Nether Wallop in England participate - in? 
-should: - - Admit it does not know - - Admin it has no recollection nor knowledge if the item in question -should_not: - - States that the event truly took place in Nether Wallop