Add 9 new Green AI patterns to the catalogue #407
Signed-off-by: Russell Trow <russell@greensoftware.foundation>
Updated base URL format for Docusaurus configuration. Signed-off-by: Russell Trow <russell@greensoftware.foundation>
Updated GH deploy workflow to allow manual runs Signed-off-by: Russell Trow <russell@greensoftware.foundation>
Remove several legacy AI/architecture pages and replace them with reorganized, up-to-date guidance. Adds new system-topology patterns (efficient-hardware, on-demand-execution for agent workloads, run-ai-models-edge), new development docs (right-sized models, optimize data storage), and an operations doc for carbon-aware scheduling. Also updates pre-trained-transfer-learning metadata/content (author and expanded guidance). Consolidates and modernizes AI sustainability guidance and authorship (Naveen Balani).
Rename file from 'Use right-sized and energy-efficient AI models .md' to 'right-sized-energy-efficient-ai-models.md' to remove trailing space and normalize the filename to kebab-case. No content changes were made; this improves consistency and prevents issues with linking and tooling that don't handle spaces well.
…vements
- Add ## Cost Impact section to all 7 AI patterns (between SCI Impact and Assumptions)
- Fix stray trailing quote in pattern-02 h1 title
- Strengthen edge deployment assumption (memory/compute/power specifics)
- Strengthen transfer learning fine-tuning cost caveat
- Strengthen on-demand execution stateful workflow assumption
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace 'Enterprise Architect' with 'Solution Architect' in three AI patterns to match the personas defined at patterns.greensoftware.foundation/personas/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Split the originally scoped Pattern 4 into two focused patterns per Naveen Balani's approval:
- 4A: Select efficient ML frameworks and inference runtimes (Development). Covers framework/runtime selection criteria, inference-optimised runtimes, hardware-specific optimisations, and benchmarking guidance.
- 4B: Optimize agent orchestration to reduce unnecessary model calls (Development). Covers caching, conditional logic, batching, early termination, and workflow profiling for agentic AI systems.
Both patterns follow the full GSF template including Cost Impact and description front matter fields. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All other patterns in the repo use empty tags fields. Comma-separated string values are not valid YAML arrays and caused a ValidationError on deploy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five redirects pointed to old pattern paths that no longer exist:
- compress-ml-models-for-inference → right-sized-energy-efficient-ai-models
- energy-efficent-ai-edge → run-ai-models-edge
- efficent-format-for-model-training → optimize-data-storage-ai-training
- right-hardware-type → efficient-hardware-ai-workloads
- leverage-sustainable-regions → carbon-aware-ai-scheduling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@LiyaMath @franziska-warncke @navveenb here are the new AI patterns on the staging website: https://russelltrow.github.io/gsf-patterns/personas/ai-ml-engineer/
| AI workloads such as training, fine-tuning, and inference require significant compute resources. The type of hardware used, including CPUs, GPUs, TPUs, and specialized accelerators, has a direct impact on energy efficiency and performance.
| Different hardware options vary in their ability to execute AI workloads efficiently. Selecting appropriate hardware and compute resources improves utilization, reduces execution time, and lowers overall energy consumption.
I would also add that the orchestration is critical here. The suggested selection should not happen manually but should depend on a characterisation of the system heterogeneity.
I believe we can refine this here by looking at middleware systems that do the workload balancing, dispatching, and monitoring in a closed loop.
Excellent suggestion. We'll explicitly mention that orchestration layers can act as closed-loop resource controllers, continuously adjusting allocations based on workload requirements, utilization, and efficiency objectives to avoid over-provisioning.
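A minimal sketch of the closed-loop idea discussed in this thread, assuming a single utilization signal and a simple scale-in/scale-out rule. The names `Allocation`, `adjust_allocation`, and `run_loop`, and the 0.3/0.8 thresholds, are hypothetical illustrations and not part of the patterns or of any specific orchestrator.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical allocation state for one workload (illustrative only)."""
    replicas: int
    min_replicas: int = 1
    max_replicas: int = 8


def adjust_allocation(alloc: Allocation, utilization: float,
                      low: float = 0.3, high: float = 0.8) -> Allocation:
    """One control-loop step: scale in when under-used, scale out when saturated."""
    if utilization > high and alloc.replicas < alloc.max_replicas:
        return Allocation(alloc.replicas + 1, alloc.min_replicas, alloc.max_replicas)
    if utilization < low and alloc.replicas > alloc.min_replicas:
        return Allocation(alloc.replicas - 1, alloc.min_replicas, alloc.max_replicas)
    return alloc  # inside the target band: leave the allocation unchanged


def run_loop(alloc: Allocation, utilization_samples) -> Allocation:
    """Feed monitored utilization samples back into the allocation decision."""
    for utilization in utilization_samples:
        alloc = adjust_allocation(alloc, utilization)
    return alloc
```

The band between `low` and `high` is what avoids both over-provisioning and constant churn; a real orchestrator would likely combine several signals (utilization, latency, energy) rather than one.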
| ## Solution
| - Choose hardware that is optimized for the specific workload, such as GPUs or TPUs for parallel processing tasks
Here we can become more fine-grained: we can define time scales for each task to be executed on CPU / GPU / TPU or accelerators by offering a "catalogue" of possible solutions for the needed time scales.
| ## Solution
| - Choose hardware that is optimized for the specific workload, such as GPUs or TPUs for parallel processing tasks
| - Use specialized accelerators where available to improve efficiency
Let us build a catalogue for accelerator integration. We need to discuss this at the orchestration level (middleware) to enable monitoring and workload dispatching with judiciously allocated resources.
Great suggestion, updated the principle.
| ## Solution
| - Deploy models on edge devices or local infrastructure to reduce data transfer to centralized systems
| - Perform data preprocessing tasks such as filtering, cleansing, and feature generation locally
I would complete this with "problem/workload-specific preprocessing".
| ## SCI Impact
| **SCI = (E × I) + M per R**
I guess the formula changes here as we have edge and cloud measurements. Shall we split the contributions? We might, of course, run into the problem of having the cloud "absorb" the edge values.
The SCI formula should work. However, in hybrid edge-cloud architectures, energy, carbon intensity, and embodied emissions should be measured across all participating components, including edge devices, networking, and centralized infrastructure, and aggregated when calculating SCI.
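The aggregation described in this reply can be sketched as a small worked example. This is an illustration under stated assumptions, not the SCI specification's reference method: the `Component` fields, the three components, and every number below are made up, and the per-component breakdown is one possible way to keep edge contributions visible rather than absorbed into the cloud total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One participant in a hybrid deployment (all values are illustrative)."""
    name: str
    energy_kwh: float            # E: energy consumed in the measurement window
    intensity_g_per_kwh: float   # I: grid carbon intensity where it runs
    embodied_g: float            # M: embodied emissions amortised to the window


def emissions_g(c: Component) -> float:
    """Per-component (E x I) + M, in grams CO2e."""
    return c.energy_kwh * c.intensity_g_per_kwh + c.embodied_g


def sci(components, functional_units: float) -> float:
    """Aggregate SCI: total emissions of all components per functional unit R."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    return sum(emissions_g(c) for c in components) / functional_units


def sci_breakdown(components, functional_units: float) -> dict:
    """Per-component share of the SCI score, so edge values stay visible."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    return {c.name: emissions_g(c) / functional_units for c in components}


# Hypothetical measurement window covering 1000 inferences (R = 1000).
components = [
    Component("edge", 0.5, 400.0, 50.0),
    Component("network", 0.1, 400.0, 0.0),
    Component("cloud", 2.0, 200.0, 100.0),
]
print(sci(components, functional_units=1000))
print(sci_breakdown(components, functional_units=1000))
```

Reporting the breakdown alongside the aggregate keeps one SCI score for the system while still showing how much each tier contributes.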
| ## Cost Impact
| - **Cloud compute costs:** Reduced by moving inference to edge devices
I would also add the availability of the edge devices. Depending on the network connection type, edge device availability and responsiveness would also play a role.
| - **Cloud compute costs:** Reduced by moving inference to edge devices
| - **Network costs:** Lower data transfer to centralized systems
| - **Edge device costs:** Increased due to deploying hardware at the edge
I would add the edge device to the cost impact. We can use von Neumann machines with a low price and high energy demand, or in-memory-compute non-von Neumann machines with a (relatively) higher price and ultra-low power consumption.
| ## Assumptions
| - Edge or local devices have sufficient memory, compute capacity, and power to run the target model without requiring additional optimization
In my experience, I have also made tradeoffs between the pre-processing costs on the near-data edge device and the actual ML model, which comes anyway quantised / compressed for inference.
Excellent point. Preprocessing itself consumes compute resources and should be evaluated alongside model execution when designing edge architectures.
| ## Assumptions
| - Edge or local devices have sufficient memory, compute capacity, and power to run the target model without requiring additional optimization
| - Workloads can be partitioned effectively between edge and cloud
This is an open question: how to quantify/predict workloads and dispatch the workload smartly between cloud and edge. We need to define metrics also for the partitioning.
Good point, I will extend the considerations section to highlight the need for metrics and evaluation criteria when deciding how workloads should be distributed.
| Using efficient data storage and access patterns improves data retrieval performance and reduces the overall resource footprint of both training and runtime systems.
| ## Solution
Would smart methods for serialization/deserialization also play a role here? Especially when talking about cloud-edge distribution?
Great suggestion. Will expand the solution section to include serialization efficiency as an important consideration.
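To illustrate why serialization efficiency matters for cloud-edge transfer, here is a small sketch comparing a text encoding with a fixed-width binary encoding of the same feature vector. It uses only the Python standard library; the function names and the length-prefixed float32 layout are assumptions chosen for illustration, not a format the pattern prescribes.

```python
import json
import struct


def encode_json(values):
    """Text encoding: human-readable, but several bytes per number."""
    return json.dumps(values).encode("utf-8")


def encode_binary(values):
    """Binary encoding: a 4-byte little-endian count followed by float32 values."""
    return struct.pack(f"<I{len(values)}f", len(values), *values)


def decode_binary(blob):
    """Inverse of encode_binary."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}f", blob, 4))


# Illustrative payload: 100 values that are exactly representable in float32.
features = [i / 8 for i in range(100)]
json_blob = encode_json(features)
binary_blob = encode_binary(features)
print(len(json_blob), len(binary_blob))
```

The binary form trades readability and some precision (float32) for a smaller payload and cheaper parsing; whether that trade is worthwhile depends on the data and on the conversion cost raised elsewhere in this review.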
| ## Considerations
| - Compatibility with existing tools and pipelines must be evaluated
Here the conversion tools might also introduce costs
| - Use caching mechanisms to avoid re-processing identical inputs or identical tool results
| - Implement conditional logic to skip unnecessary model calls when prior results can be reused
| - Prefer direct tool calls or API integrations over calling models to transform simple data
| - Use streaming and progressive results where possible instead of processing entire responses at once
Yes, this eventually includes harmonising the event-processing concepts. There are lessons learned from stream processing from which AI model serving in particular could benefit.
Good point, will broaden the guidance to acknowledge that streaming and event-driven processing patterns can reduce unnecessary computation and improve responsiveness.
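The caching and conditional-reuse items in the list above can be sketched as a thin wrapper around a model call. This is a minimal illustration, assuming deterministic outputs for identical inputs; `ModelCallCache` and the `model_fn` callable are hypothetical names, and the sketch deliberately leaves out expiry, so it does not address data freshness.

```python
import hashlib
import json


class ModelCallCache:
    """Reuse results for identical (prompt, params) pairs instead of re-calling."""

    def __init__(self, model_fn):
        self._model_fn = model_fn   # hypothetical callable: (prompt, **params) -> result
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt, params):
        # Sort keys so that parameter order does not change the cache key.
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, prompt, **params):
        key = self._key(prompt, params)
        if key in self._store:
            self.hits += 1          # prior result can be reused: skip the model call
            return self._store[key]
        self.misses += 1
        result = self._model_fn(prompt, **params)
        self._store[key] = result
        return result
```

The `hits` and `misses` counters give a cheap signal for whether the cache is actually avoiding model calls in a given workflow.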
| ## Assumptions
| - Workflows can be analyzed and profiled to identify inefficiencies
In my experience the state profiling is not that efficient. We need good calibration with deployed workflows, especially when considering the cloud-edge continuum
Agreed, will modify the wording to reflect this
| - Caching strategies must account for data freshness and accuracy requirements
| - Some tasks genuinely require multiple model calls; avoid false economy measures
| - Agent design patterns vary (ReAct, Tree of Thought, etc.); optimization strategies differ by pattern
| - Monitoring and profiling agent execution requires observable logging and metrics
In this case, we need mechanisms as in feedback control systems. Here, we would have a controller fed with max resources, values of continuously monitored resource consumption, and the capacity to orchestrate the execution of agents in an event-based manner. Since continuous monitoring would be costly, I would suggest event-based reactive control: the system reacts only to relevant changes in the metrics. There is so much potential here!
Great suggestion. Closed-loop monitoring and event-driven adaptation can help optimize agent execution while avoiding the overhead of continuous monitoring. Will incorporate this as a consideration for advanced orchestration scenarios.
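One way to picture the event-based reactive control discussed in this thread is a deadband trigger: the controller is notified only when a metric moves by more than a chosen threshold since the last notification. The sketch below is an assumption-laden illustration; `ReactiveMonitor`, the `on_change` callback, and the threshold value are hypothetical.

```python
class ReactiveMonitor:
    """Forward a metric to the controller only when it changes by a relevant amount."""

    def __init__(self, on_change, threshold):
        self._on_change = on_change      # hypothetical controller callback
        self._threshold = threshold      # minimum change that counts as relevant
        self._last_reported = None

    def observe(self, value):
        """Return True if the value was forwarded, False if it was suppressed."""
        is_first = self._last_reported is None
        if is_first or abs(value - self._last_reported) >= self._threshold:
            self._last_reported = value
            self._on_change(value)
            return True
        return False


# Usage: only relevant changes in (illustrative) utilization reach the controller.
events = []
monitor = ReactiveMonitor(events.append, threshold=10.0)
for sample in [50, 52, 55, 61, 63, 40]:
    monitor.observe(sample)
print(events)
```

Small fluctuations are absorbed locally, so the orchestrator does work only when something relevant has changed.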
| - Pre-trained models may introduce biases or limitations from their original training data
| - Fine-tuning large foundation models can still require substantial compute resources comparable to training from scratch; evaluate the true cost-benefit of fine-tuning vs. full training for your use case
| - Licensing and usage restrictions of pre-trained models must be evaluated
| - Model suitability should be validated for the specific domain
Here it would be hard to find a diverse spectrum of domain-specific models, and to train them we need more data and resources. This domain-specific validation would also have a price.
Great point. Domain-specific pre-trained models may not always be available, and adapting or validating them for specialized domains can require additional data, compute, and evaluation effort. We'll expand the considerations section to reflect these trade-offs.
| AI and ML models vary significantly in size, architecture, complexity, and resource requirements. Larger models typically require more compute, memory, and storage, leading to higher energy consumption during both training and inference.
| Using models that are appropriately sized and architecturally efficient for the task avoids unnecessary resource usage. This includes selecting smaller or task-specific models, choosing energy-efficient architectures at equivalent capability levels, and applying optimization techniques to reduce model footprint without sacrificing required performance.
This might come as a measure competing with the re-use of larger models and retraining. Model choice is also a decision not always available to all teams bringing AI into production. Offering a model zoo is typically good, but choosing is still a problem. A problem/model catalogue would be great.
Good point, will clarify this
| - Prefer optimized or distilled versions of larger models for fine-tuning and inference
| - Apply model compression techniques such as quantization, pruning, and knowledge distillation
| - Remove redundant or inactive parameters where possible
| - Evaluate model options based on both performance and energy efficiency before selection
Especially hard in edge deployment. The estimated values have some variance with respect to the deployed values on the hardware.
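As a concrete illustration of the quantization item in the list above, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a simplified teaching example, not how any particular framework implements it; the function names are hypothetical, and real toolchains typically add calibration and per-channel scales.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


# Illustrative weights: each value now needs 1 byte instead of 4 (float32).
weights = [0.5, -1.0, 0.25, 0.0, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

The rounding error is bounded by half the scale, which is the kind of estimate that, as the comment above notes, can still differ from what is measured on the deployed hardware.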
| - Remove redundant or inactive parameters where possible
| - Evaluate model options based on both performance and energy efficiency before selection
| - Continuously evaluate newer model variants that offer improved efficiency
| - Avoid defaulting to the largest available model when simpler alternatives can achieve similar outcomes
This is problem-dependent and taps into more parameters of the problem: deployment hardware type and resources (i.e., DSP, GPU, NPU) available, type of data, and pre-processing. For instance, for temporal data one can use both feedforward and recurrent models, with the latter being more resource-efficient (i.e., fewer parameters) but harder to build.
Excellent point. Model suitability depends not only on task complexity but also on deployment hardware, data characteristics, preprocessing requirements, and operational constraints. Will incorporate these factors into the considerations section.
| - **Compute costs:** Reduced due to smaller model sizes and faster inference
| - **Infrastructure costs:** Lower due to reduced memory and storage requirements
| - **Benchmarking overhead:** May add cost for performance testing across model variants
This should be a must! More work to be done here, but I would argue the overhead is needed to enable all the other cost savings.
Agreed, added the trade-off
| ## Assumptions
| - Smaller or optimized models can meet the functional requirements of the application
| - Model performance can be validated against acceptable thresholds
Acceptable thresholds are problem-specific.
Good point. Performance thresholds vary by application and should be defined according to business and functional requirements.
| - Smaller or optimized models can meet the functional requirements of the application
| - Model performance can be validated against acceptable thresholds
| - Efficiency improvements do not significantly degrade output quality
| - Some complex tasks may require larger models
| - Over-optimization can degrade performance
| - Fine-tuning larger models may be necessary for complex domain-specific tasks
| - Periodic re-evaluation is needed as workloads and models evolve
This monitoring comes with overhead but can be integrated into the previously defined event-based platform to handle resource allocation, as profiling is the core of the lifecycle.
Yes, will add that monitoring approaches should balance observability benefits with resource consumption.
| Different frameworks and runtimes vary significantly in their ability to leverage hardware capabilities, execute operations efficiently, and minimize computational overhead. Inefficient framework choices can lead to unnecessary compute consumption, poor hardware utilization, and increased energy expenditure for the same workload.
| Selecting efficient ML frameworks and inference runtimes improves model execution performance and reduces the carbon footprint of AI training and inference.
A very important point here is that the community is trying to develop harmonised and interoperable framework APIs, so that one can deploy the same model on various backends. This approach of streamlining model deployment to specialised hardware is already happening (see, from the neuromorphic accelerator community, snnTorch and the Neuromorphic Intermediate Representation). This would open the stage for truly heterogeneous systems with large benefits in performance and sustainability, once a workload orchestrator is in place.
See: https://neuroir.org/ and https://snntorch.readthedocs.io/en/latest/
Good point, will extend the pattern to highlight the importance of interoperable runtimes and portable model representations.
| ## Solution
| - Choose frameworks that efficiently utilize available hardware (GPUs, TPUs, specialized accelerators)
Here, the community starts opening the way for new systems. For example, a heterogeneous orchestrator is the one from https://klepsydra.com/.
Efficient workload dispatching, monitoring, and balancing on heterogeneous edge systems.
We will acknowledge that framework choices may influence workload portability and orchestration across diverse hardware platforms.
| - Use optimized inference layers that reduce latency and compute overhead compared to training frameworks
| - Select frameworks with strong compiler optimization and memory management capabilities
| - Benchmark framework options under your actual workload conditions before committing to production
| - Keep frameworks and runtime dependencies updated to benefit from performance and efficiency improvements
Here, backwards compatibility and the option to also "retrofit" existing hardware in large heterogeneous systems are still open problems. Imagine we have existing systems which could deliver more performance in a more efficient way, but backwards compatibility and runtime dependencies create friction.
Great point, will add this consideration to ensure organizations evaluate migration friction and retrofit opportunities.
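The benchmarking item in the list above could be approached with a small harness like the following. This is a sketch under stated assumptions: each candidate runtime is wrapped in a plain Python callable (the `candidates` mapping is hypothetical), and wall-clock time is used only as a rough proxy, since measuring energy would need hardware or platform counters that are not shown here.

```python
import statistics
import time


def benchmark(fn, payload, warmup=2, runs=5):
    """Median wall-clock seconds for fn(payload), after warm-up iterations."""
    for _ in range(warmup):
        fn(payload)                      # warm caches / lazy initialisation
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def pick_fastest(candidates, payload, warmup=2, runs=5):
    """Benchmark every candidate on the same payload; return (best_name, results)."""
    results = {
        name: benchmark(fn, payload, warmup=warmup, runs=runs)
        for name, fn in candidates.items()
    }
    best_name = min(results, key=results.get)
    return best_name, results
```

Running every candidate on the same representative payload is the important part; synthetic inputs can favour a runtime that behaves differently under the real workload.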
Enhanced the document on efficient hardware for AI workloads by adding details on workload profiling, hardware optimization, and orchestration systems. Updated trade-off considerations for specialized accelerators and added new references for benchmarking. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Enhanced recommendations for event-driven execution and resource management. Expanded sections on cost impact, assumptions, and considerations for on-demand workloads. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Enhanced the documentation on deploying AI models at the edge by adding details on workload classification, preprocessing tasks, and cost implications of edge devices. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Expanded on data handling strategies for AI training, emphasizing efficient serialization, compatibility considerations, and the balance between compression and decompression costs. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Enhanced recommendations for optimizing agent workflows by incorporating telemetry and event-driven processing. Updated considerations to emphasize the need for adaptive orchestration and careful evaluation of trade-offs. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Updated the note on model suitability for specific domains to emphasize the need for additional resources and validation efforts. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Updated evaluation criteria for model selection to include energy efficiency benchmarks and clarified assumptions regarding performance and quality thresholds. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Updated considerations for selecting ML frameworks and inference runtimes, emphasizing compatibility and portability. Signed-off-by: Navveen Balani <88837066+navveenb@users.noreply.github.com>
Summary
This PR contributes 9 new green software patterns focused on AI and ML workloads, authored by Naveen Balani. The patterns cover the full AI lifecycle — from development decisions through to runtime operations — and are structured according to the GSF pattern template.
New patterns
Development
- right-sized-energy-efficient-ai-models.md: Select and optimize AI models appropriately sized for the task to reduce compute, memory, and energy consumption during training and inference.
- data-handling/optimize-data-storage-ai-training.md: Use efficient storage formats, compression, and indexing strategies for AI datasets and embeddings to reduce storage footprint, data transfer, and retrieval compute.
- pre-trained-transfer-learning.md: Fine-tune existing pre-trained models instead of training from scratch to dramatically reduce the compute, energy, and time required for model development.
- select-efficient-ml-frameworks-inference-runtimes.md: Choose ML frameworks and inference runtimes that best match your hardware and workload to reduce compute overhead and improve energy efficiency across training and production inference.
- optimize-agent-orchestration-reduce-model-calls.md: Design agentic AI workflows to minimise redundant model invocations and unnecessary compute through caching, conditional logic, and efficient orchestration patterns.
Architecture
- system-topology/run-ai-models-edge.md: Deploy AI inference on edge devices or local infrastructure to reduce data transfer, network energy use, and reliance on centralised cloud compute.
- system-topology/efficient-hardware-ai-workloads.md: Match AI workloads to the most energy-efficient hardware accelerator or instance type to improve utilisation and reduce energy consumption per inference or training run.
- system-topology/on-demand-execution-ai-agent-workloads.md: Trigger AI and agent workloads only when needed using serverless or event-driven platforms to eliminate idle compute and reduce unnecessary energy consumption.
Operations
- operations/carbon-aware-ai-scheduling.md: Reduce the carbon impact of AI workloads by running them in cloud regions with lower grid carbon intensity and scheduling deferrable jobs during periods of high renewable energy availability.
Pattern structure
Each pattern follows the standard GSF template and includes:
- ## Description: problem context and motivation
- ## Solution: actionable guidance
- ## SCI Impact: mapping to the E, I, M, and R factors of the SCI equation
- ## Cost Impact: compute, infrastructure, and trade-off considerations
- ## Assumptions: preconditions for the pattern to apply
- ## Considerations: trade-offs and caveats
- ## References: citations and further reading
All patterns include a description field in YAML front matter for catalogue indexing, and personas are aligned to the official GSF persona list.
Note on patterns 4A and 4B
The ML frameworks and agent orchestration patterns were originally scoped as a single pattern. Following a review with Naveen Balani, they were split into two focused patterns — one covering execution engine selection (4A) and one covering workflow design for agentic systems (4B) — as the two decisions are made by different personas at different points in development.
Test plan
🤖 Generated with Claude Code