Commit 70bb883

wjlgatech and claude committed
feat: complete Day 1–5 plugin + all essential docs updated
Skills (6 total, 30 commands):
- Add skills/day5-strategy/: GenAI/RAG, data products, modernization, operating model
- Add skills/skill-orchestrator/: meta-skill with discover-client, assess-maturity, orchestrate-engagement, translate-for-stakeholder, estimate-effort

Knowledge Base (6 files):
- Add knowledge-base/genai-data-patterns.md (RAG, vector store, LLM safety, PHI de-ID)
- Add knowledge-base/financial-services-patterns.md (Basel, AML, BCBS 239, payments)
- Add knowledge-base/manufacturing-patterns.md (IIoT, digital twin, Industry 4.0)
- Update knowledge-base/README.md to reflect all 6 actual KB files

Examples:
- Add newlife-hospital-day5-solution.html (9-tab: GenAI, data products, modernization, operating model, $127M NPV business case, final blueprint)
- Update examples/newlife-hospital/README.md: all Days 1–5 covered

Templates + Test Scenarios:
- Add templates/ (SoW, board deck, project charter, engagement brief, data architecture brief)
- Add test-scenarios/plug-and-play-test-scenarios.md

Docs:
- README.md: badge 5→6, roadmap all Active, repo structure corrected, portfolio entries for Days 3–5, case studies table expanded to all 5 days
- ARCHITECTURE.md: fix folder names (day3-cloud-data, day4-analytics, day5-strategy), add skill-orchestrator row, update command counts
- Add RESTRUCTURE-PLAN.md (public/private repo split strategy)

skills/index.json: 6 skills / 30 commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4cbd681 commit 70bb883

43 files changed, +8164 -54 lines changed

README.md

Lines changed: 98 additions & 35 deletions
@@ -9,7 +9,7 @@
99
Community-built skills that turn Claude into a senior data architect — for modeling, platforms, cloud, AI, and modernization.
1010

1111
[![CI](https://github.com/wjlgatech/data-architecture/actions/workflows/ci.yml/badge.svg)](https://github.com/wjlgatech/data-architecture/actions/workflows/ci.yml)
12-
[![Skills](https://img.shields.io/badge/skills-5%2F25-7b2fff?logo=anthropic)](https://github.com/wjlgatech/data-architecture/tree/main/skills)
12+
[![Skills](https://img.shields.io/badge/skills-6%2F25-7b2fff?logo=anthropic)](https://github.com/wjlgatech/data-architecture/tree/main/skills)
1313
[![Contributors](https://img.shields.io/github/contributors/wjlgatech/data-architecture?color=orange)](https://github.com/wjlgatech/data-architecture/graphs/contributors)
1414
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen?logo=github)](CONTRIBUTING.md)
1515
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
@@ -70,41 +70,51 @@ Once installed, Claude responds to built-in commands:
7070

7171
## 🗺️ Roadmap
7272

73-
| Day | Module | Skills | Status |
74-
|-----|--------|--------|--------|
73+
| Day | Module | Commands | Status |
74+
|-----|--------|----------|--------|
75+
| 0️⃣ | **Skill Orchestrator** | `discover-client`, `assess-maturity`, `orchestrate-engagement`, `translate-for-stakeholder`, `estimate-effort` |**Active** |
7576
| 1️⃣ | **Intro to Data Architecture & Modeling** | `design-model`, `choose-architecture`, `kpi-catalog`, `audit-vault`, `dimension-map` |**Active** |
76-
| 2️⃣ | **New Data Management** | `data-quality`, `master-data`, `governance`, `metadata-catalog` | 🔨 Building |
77-
| 3️⃣ | **Cloud Data & Technology** | `platform-selector`, `snowflake-patterns`, `databricks-patterns`, `lakehouse-design` | 📋 Planned |
78-
| 4️⃣ | **Data Intelligence & AI** | `ml-feature-store`, `rag-architecture`, `ai-governance`, `model-ops` | 📋 Planned |
79-
| 5️⃣ | **Data Modernization** | `migration-planner`, `legacy-assessment`, `modernization-roadmap` | 📋 Planned |
77+
| 2️⃣ | **Data Management** | `design-mdm`, `check-data-quality`, `governance-check`, `lifecycle-plan`, `security-review` |**Active** |
78+
| 3️⃣ | **Cloud Data & Technology** | `design-cloud-platform`, `design-data-platform`, `design-ingestion-pipeline`, `design-api-layer`, `multi-region-plan` | **Active** |
79+
| 4️⃣ | **Data Intelligence, Analytics & AI** | `analyze-big-data`, `design-nlp-pipeline`, `build-mlops-pipeline`, `design-realtime-intelligence`, `responsible-ai-review` |**Active** |
80+
| 5️⃣ | **Data Strategy & GenAI** | `design-genai-architecture`, `data-strategy-alignment`, `build-data-product`, `modernization-roadmap`, `operating-model-design` |**Active** |
8081

81-
**We build one module per day. PRs are merged daily. Join and ship something.**
82+
**6 skills · 30 commands · full 5-day curriculum complete. PRs welcome to extend any module.**
8283

8384
---
8485

8586
## 📁 Repository Structure
8687

8788
```
8889
data-architecture/
89-
├── skills/ # 🧠 Claude skills (one folder = one skill module)
90-
│ ├── day1-modeling/ # Data modeling: Vault, Star, 3NF, AUDM
91-
│ │ ├── SKILL.md # Main Claude instructions (paste into system prompt)
92-
│ │ ├── metadata.json # Skill metadata, version, tags
93-
│ │ ├── commands/ # Slash command definitions
94-
│ │ └── references/ # Deep reference material
95-
│ ├── day2-data-management/ # Placeholder — contribute here!
96-
│ ├── day3-cloud/ # Placeholder — contribute here!
97-
│ ├── day4-ai-analytics/ # Placeholder — contribute here!
98-
│ └── day5-modernization/ # Placeholder — contribute here!
90+
├── skills/ # 🧠 Claude skills (one folder = one skill module)
91+
│ ├── skill-orchestrator/ # Meta-skill: client intake, maturity, engagement orchestration
92+
│ ├── day1-modeling/ # Data modeling: Vault, Star, 3NF, AUDM
93+
│ │ ├── SKILL.md # Main Claude instructions (paste into system prompt)
94+
│ │ ├── metadata.json # Skill metadata, version, tags
95+
│ │ ├── commands/ # Slash command definitions
96+
│ │ └── references/ # Deep reference material
97+
│ ├── day2-data-management/ # MDM, Data Quality, Governance, Lifecycle, Security
98+
│ ├── day3-cloud-data/ # Cloud platforms, Lakehouse, FHIR, multi-region
99+
│ ├── day4-analytics/ # Big data, clinical NLP, MLOps, real-time, responsible AI
100+
│ ├── day5-strategy/ # GenAI/RAG, data products, modernization, operating model
101+
│ └── index.json # Machine-readable skill registry
99102
100-
├── schemas/ # 🔒 JSON schemas for CI validation
101-
├── templates/ # 🧩 Copy-paste starters for new skills
102-
├── examples/ # 📖 Real case studies
103-
│ └── newlife-pharmacy/ # Pharma supply chain (Day 1 case study)
104-
├── docs/ # 📚 Architecture decisions, specs
105-
├── tests/ # ✅ Validation scripts (run by CI)
106-
├── scripts/ # 🛠️ CLI tooling
107-
└── .github/ # ⚙️ Workflows, issue/PR templates
103+
├── knowledge-base/ # 📚 Cross-skill shared domain knowledge
104+
│ ├── healthcare-standards.md # HL7 FHIR, ICD-10, LOINC, SNOMED
105+
│ ├── cloud-platform-patterns.md
106+
│ ├── analytics-patterns.md
107+
│ └── genai-data-patterns.md
108+
109+
├── schemas/ # 🔒 JSON schemas for CI validation
110+
├── templates/ # 🧩 Copy-paste starters for new skills
111+
├── examples/ # 📖 Real case studies (interactive HTML)
112+
│ ├── newlife-pharmacy/ # Pharma supply chain — Day 1
113+
│ └── newlife-hospital/ # Healthcare HIS — Days 2–5
114+
├── docs/ # 📄 Architecture decisions, specs
115+
├── tests/ # ✅ Validation scripts (run by CI)
116+
├── scripts/ # 🛠️ CLI tooling
117+
└── .github/ # ⚙️ Workflows, issue/PR templates
108118
```
109119

110120
---
@@ -197,22 +207,75 @@ npm run validate
197207

198208
---
199209

200-
### Coming Soon
210+
---
211+
212+
### Day 3 · Cloud Data Platform — Azure Medallion Lakehouse
213+
214+
> *NewLife Hospital — Multi-region healthcare data platform, FHIR R4 API, Medallion Lakehouse, 90+ countries*
215+
216+
**One-line verdict:** Azure Medallion Lakehouse (Bronze/Silver/Gold) on Delta Lake — the only pattern that handles FHIR R4 streaming ingestion, multi-jurisdictional data residency, and clinical AI feature serving from a single coherent architecture.
217+
218+
| Dimension | Decision |
219+
|---|---|
220+
| Platform | **Azure** — ADF, Event Hub, Databricks, Delta Lake, Synapse, ADLS Gen2 |
221+
| Architecture | Medallion Lakehouse — Bronze (raw FHIR) → Silver (cleaned) → Gold (marts) |
222+
| APIs | FHIR R4 with SMART on FHIR OAuth 2.0, geo-load balancing, 99.9% SLA |
223+
| Multi-Region | Hub-and-spoke — 5 regional nodes, data residency enforcement per GDPR/PIPL/PDPA |
224+
| Security | Zero Trust, Private Endpoints, Azure Purview RBAC, field-level encryption |
225+
| Clinical AI | Predictive sepsis, NLP discharge summaries, imaging triage — all within Medallion Gold |
226+
227+
**[▶ Open Interactive Solution →](https://htmlpreview.github.io/?https://github.com/wjlgatech/data-architecture/blob/main/examples/newlife-hospital/newlife-hospital-day3-solution.html)**
228+
229+
---
230+
231+
### Day 4 · Data Intelligence, Analytics & AI
232+
233+
> *NewLife Hospital — Clinical NLP, Medical Imaging AI, Real-time Sepsis Alerting, MLOps, $2M→$6M Year-1 ROI*
234+
235+
**One-line verdict:** Lambda architecture for batch + streaming analytics, with a unified MLOps platform (MLflow + Databricks) that governs clinical models from FDA SaMD Class II compliance to bedside alerting in under 60 seconds.
236+
237+
| Dimension | Decision |
238+
|---|---|
239+
| Big Data | Lambda architecture — Spark batch (Databricks) + Kafka/Event Hub streaming |
240+
| Clinical NLP | spaCy + Med7 + BERT-clinical pipeline: 92%+ F1 on entity extraction |
241+
| Imaging AI | CNN + ViT ensemble, 3-stage review workflow, FDA SaMD Class II governance |
242+
| Real-time | NEWS2 sepsis score — Kafka → Feature Store → model inference → alert in <60s |
243+
| MLOps | MLflow + AzureML: Experiment → Train → Validate → Deploy → Monitor → Retrain |
244+
| Responsible AI | Bias audit, GDPR Art. 22 human-in-loop, FDA SaMD classification, explainability |
245+
| ROI | Year 1: $2M invest → $6M return · Year 2: $4M → $16M · Year 3: $8M → $40M |
246+
247+
**[▶ Open Interactive Solution →](https://htmlpreview.github.io/?https://github.com/wjlgatech/data-architecture/blob/main/examples/newlife-hospital/newlife-hospital-day4-solution.html)**
248+
249+
---
250+
251+
### Day 5 · Data Strategy, GenAI & Final Blueprint
252+
253+
> *NewLife Hospital — RAG pipeline, Data Products, $127M NPV business case, 5-year operating model*
254+
255+
**One-line verdict:** A GenAI Clinical Intelligence Platform built on Retrieval-Augmented Generation, with a PHI de-identification gate, a vector store serving 200M+ patient records, and a federated data product marketplace, all governed by a CDO-led operating model with a measurable $127M NPV over 5 years.
256+
257+
| Dimension | Decision |
258+
|---|---|
259+
| GenAI Architecture | RAG pipeline — PHI De-ID → Chunking → Embedding → Vector Store → LLM → Audit |
260+
| Vector Store | Azure AI Search (hybrid dense + sparse) — HIPAA-compliant, 200M+ patient records |
261+
| Data Products | Federated marketplace — 12 certified products across Clinical, Ops, Finance, Research |
262+
| Modernization | Legacy EHR → Cloud: Assess (3I) → Lift-and-Shift → Re-platform → Re-architect |
263+
| Operating Model | CDO → Data Domains → Product Owners → Engineers · Hub-and-Spoke federated |
264+
| Business Case | $127M NPV, 287% ROI, 18-month payback — board-ready financial model |
201265

202-
| Day | Module | Status |
203-
|---|---|---|
204-
| Day 3 | Cloud Data & Technology (Snowflake · Databricks · Azure Synapse · Lakehouse) | 🔜 Building |
205-
| Day 4 | Data Intelligence, Analytics & AI (ML Feature Store · RAG · AI Governance) | 📋 Planned |
206-
| Day 5 | Data Modernization (Legacy Assessment · Migration Playbooks · Modernization Roadmap) | 📋 Planned |
266+
**[▶ Open Interactive Solution →](https://htmlpreview.github.io/?https://github.com/wjlgatech/data-architecture/blob/main/examples/newlife-hospital/newlife-hospital-day5-solution.html)**
207267

208268
---
209269

210270
## 📖 Case Studies
211271

212-
| Case Study | Domain | Skills Used | Link |
272+
| Case Study | Domain | Days | Link |
213273
|---|---|---|---|
214-
| NewLife Pharmacy Supply Chain | Pharmaceutical D2P | Data Vault 2.0, KPI Catalog, 30+ KPIs | [View →](examples/newlife-pharmacy/) |
215-
| NewLife Hospital Unified HIS | Healthcare MDM + Governance | Federated MDM, GDPR/HIPAA, Zero Trust | [View →](examples/newlife-hospital/) |
274+
| NewLife Pharmacy Supply Chain | Pharmaceutical D2P | Day 1 | [View →](examples/newlife-pharmacy/) |
275+
| NewLife Hospital — Data Management | Healthcare MDM + Governance | Day 2 | [View →](examples/newlife-hospital/) |
276+
| NewLife Hospital — Cloud Platform | Healthcare Lakehouse + FHIR | Day 3 | [View →](examples/newlife-hospital/) |
277+
| NewLife Hospital — Analytics & AI | Clinical NLP, MLOps, Sepsis AI | Day 4 | [View →](examples/newlife-hospital/) |
278+
| NewLife Hospital — Strategy & GenAI | RAG, Data Products, $127M NPV | Day 5 | [View →](examples/newlife-hospital/) |
216279

217280
---
218281

RESTRUCTURE-PLAN.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
1+
# Repository Restructure Plan
2+
*Public ↔ Private Split — Data Architecture Consulting OS*
3+
4+
---
5+
6+
## The Strategic Logic
7+
8+
Two repos. Two different jobs. Never merge them.
9+
10+
```
11+
PUBLIC REPO (wjlgatech/data-architecture-school)
12+
Job: Marketing engine + authority signal + lead generation
13+
Audience: CDOs, data leaders, engineers, students
14+
Content: Free curriculum, case studies, examples
15+
Goal: Drive inbound — CDOs find this → follow Wu → book a call
16+
17+
PRIVATE REPO (wjlgatech/consulting-os) ← NEW, PRIVATE
18+
Job: Competitive moat + operating system of the consulting practice
19+
Audience: Wu + approved collaborators only
20+
Content: skill-orchestrator, industry KBs, templates, test scenarios
21+
Goal: Enable 90-95% autonomous engagement delivery
22+
```
23+
24+
**Why not make everything private?**
25+
Because the public repo is your $0 marketing budget. Every CDO who searches "FHIR data architecture" or "healthcare data lakehouse" and finds your GitHub is a warm lead. Take that away and you need to spend $50K/month on LinkedIn ads to replace it.
26+
27+
**Why make the plugin private?**
28+
Because the skill-orchestrator and the templates are what give you the 80% margin advantage. If a competitor forks them, they close that gap in 3 months. Your moat is not just the technology; it is the synthesis of domain knowledge and AI automation. That's proprietary.
29+
30+
---
31+
32+
## Step-by-Step Restructure
33+
34+
### Step 1: Create the Private Repo (do this first, today)
35+
36+
```bash
37+
# On GitHub.com:
38+
# Create new repo: wjlgatech/consulting-os
39+
# Visibility: PRIVATE
40+
# Initialize with README
41+
42+
# Clone locally, then copy the private content:
43+
git clone git@github.com:wjlgatech/consulting-os.git
44+
cd consulting-os
45+
46+
# Copy private content from data-architecture-school.
# No trailing slash on the source, so GNU and BSD (macOS) cp both copy the directory itself:
47+
cp -r ../data-architecture-school/skills ./
48+
cp -r ../data-architecture-school/knowledge-base ./
49+
cp -r ../data-architecture-school/templates ./
50+
cp -r ../data-architecture-school/test-scenarios ./
51+
cp -r ../data-architecture-school/schemas ./
52+
53+
git add -A
54+
git commit -m "chore: initial consulting OS — skill-orchestrator + KB + templates"
55+
git push origin main
56+
```
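
The copy step above can also be sketched as a small helper that copies each directory by name and reports any that are missing. This is a minimal sketch, not part of the original plan: the function name `copy_private_dirs` and the throwaway directories created with `mktemp` are assumptions for illustration. In practice the source would be the local `data-architecture-school` checkout and the destination the `consulting-os` clone.

```bash
# Hypothetical helper (not from the plan): copy the named directories from a
# source checkout into a destination checkout, counting any that do not exist.
copy_private_dirs() {
  src=$1
  dest=$2
  shift 2
  missing=0
  for dir in "$@"; do
    if [ -d "$src/$dir" ]; then
      # No trailing slash on the source, so GNU and BSD cp both copy the
      # directory itself rather than only its contents.
      cp -R "$src/$dir" "$dest/"
    else
      echo "missing: $dir" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Demo against throwaway directories standing in for the two checkouts.
demo=$(mktemp -d)
mkdir -p "$demo/public/skills/day1-modeling" "$demo/public/templates" "$demo/private"
echo "{}" > "$demo/public/skills/index.json"

copy_status=0
copy_private_dirs "$demo/public" "$demo/private" skills templates test-scenarios || copy_status=$?
# copy_status is 1 here because test-scenarios is absent from the demo source.
```

A non-zero status before `git add -A` is a cue to check the source paths rather than commit a partial copy.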
57+
58+
### Step 2: Clean the Public Repo
59+
60+
Remove the private IP from the public repo. Keep everything that serves as marketing.
61+
62+
**REMOVE from public repo (data-architecture-school):**
63+
```
64+
skills/skill-orchestrator/ ← Core IP — move to private
65+
knowledge-base/financial-services-patterns.md ← Industry IP
66+
knowledge-base/manufacturing-patterns.md ← Industry IP
67+
knowledge-base/genai-data-patterns.md ← move (healthcare-standards stays)
68+
templates/statement-of-work.md ← move (engagement templates are IP)
69+
templates/board-deck-outline.md ← move
70+
templates/project-charter.md ← move
71+
templates/data-architecture-brief.md ← move
72+
templates/engagement-brief.md ← move
73+
test-scenarios/ ← move (reveals methodology)
74+
skills/index.json ← move (reveals plugin architecture)
75+
```
76+
77+
**KEEP in public repo (data-architecture-school) — these are your marketing:**
78+
```
79+
README.md ← rewrite as marketing-facing (below)
80+
examples/newlife-hospital/ ← case study proof (anonymise if needed)
81+
examples/newlife-pharmacy/ ← case study proof
82+
skills/day1-modeling/ ← curriculum content (draws inbound)
83+
skills/day2-data-management/ ← curriculum content
84+
skills/day3-cloud-data/ ← curriculum content
85+
skills/day4-analytics/ ← curriculum content
86+
skills/day5-strategy/ ← curriculum content
87+
knowledge-base/healthcare-standards.md ← authority content (public)
88+
knowledge-base/regulatory-compliance.md ← authority content (public)
89+
docs/ ← architecture documentation
90+
```
91+
92+
### Step 3: Rewrite the Public README as a Marketing Document
93+
94+
The current README says "Data Architecture School." Rewrite it as:
95+
96+
```markdown
97+
# Data Architecture School
98+
**Free 5-day curriculum for data architects and CDOs**
99+
100+
Built by [Wu] — Principal Architect, love12xfuture
101+
102+
---
103+
104+
## What's here
105+
106+
5 days of open-source data architecture curriculum covering:
107+
- Data modeling (Data Vault 2.0, Star Schema, 3NF)
108+
- Data management (MDM, DQ, Governance, HIPAA)
109+
- Cloud data platforms (Azure Medallion Lakehouse, FHIR R4)
110+
- Analytics & AI (MLOps, Clinical NLP, Responsible AI)
111+
- Data Strategy & GenAI (RAG pipelines, Data Products, Operating Model)
112+
113+
Each day includes: frameworks, patterns, decision tools, and a real-world case study
114+
(NewLife Hospital — 300 hospitals, 90 countries, $127M NPV business case).
115+
116+
---
117+
118+
## Working with your own data architecture challenges?
119+
120+
If you're a CDO or health system CIO dealing with:
121+
- Fragmented EHR data across multiple systems
122+
- Regulatory pressure (HIPAA, state mandates) with no data governance
123+
- Board pressure to "do something with AI" but no clean data foundation
124+
- A failed data platform project that needs rescuing
125+
126+
[Book a 30-minute discovery call](https://calendly.com/[YOUR_LINK])
127+
128+
---
129+
130+
## Follow the work
131+
132+
YouTube: love12xfuture
133+
LinkedIn: [Wu's profile]
134+
```
135+
136+
### Step 4: Add .gitignore to Private Repo
137+
138+
```gitignore
139+
# consulting-os private repo .gitignore
140+
.env
141+
# Never commit client data
client-engagements/
142+
# Client-specific files
*.client.md
143+
# Active engagement working files
engagements/
144+
```
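
One way to confirm that ignore rules actually take effect is `git check-ignore`, which exits 0 for an ignored path and 1 otherwise. The sketch below tries it in a scratch repository. It assumes git is installed; the scratch directory, the sample file names, and the `check_ignored` helper are illustrative. The rules are written with comments on their own lines, because git treats `#` as a comment marker only at the start of a line.

```bash
# Sketch: confirm the ignore rules behave as intended in a scratch repository.
scratch=$(mktemp -d)
git init -q "$scratch"

cat > "$scratch/.gitignore" <<'EOF'
.env
# Never commit client data
client-engagements/
# Client-specific files
*.client.md
# Active engagement working files
engagements/
EOF

# Illustrative files: two that should be ignored, one that should not.
mkdir -p "$scratch/engagements/client-a" "$scratch/skills"
touch "$scratch/engagements/client-a/notes.md" \
      "$scratch/acme.client.md" \
      "$scratch/skills/index.json"

# Hypothetical helper wrapping git check-ignore's exit status.
check_ignored() {
  if git -C "$scratch" check-ignore -q "$1"; then
    echo "ignored"
  else
    echo "not-ignored"
  fi
}

notes_state=$(check_ignored "engagements/client-a/notes.md")
client_state=$(check_ignored "acme.client.md")
index_state=$(check_ignored "skills/index.json")
```

Running the same check against a real client file before the first commit is a cheap safeguard, since a pattern that silently fails to match would let client data into history.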
145+
146+
### Step 5: Access Control on Private Repo
147+
148+
GitHub settings for `consulting-os`:
149+
- Default branch: `main`
150+
- Branch protection: require PR review before merging (even as a solo founder; discipline matters)
151+
- Collaborator access: invite only via Settings → Collaborators
152+
- Secret scanning: enabled
153+
- Dependency review: enabled
154+
155+
---
156+
157+
## Future Structure (Private Repo)
158+
159+
As the practice grows, the private repo evolves:
160+
161+
```
162+
consulting-os/
163+
├── skills/ ← plugin skills (orchestrator + Day 1-5)
164+
├── knowledge-base/ ← industry KBs
165+
├── templates/ ← engagement deliverable templates
166+
├── test-scenarios/ ← validation suite
167+
├── schemas/ ← skill schema validation
168+
├── engagements/ ← NEVER COMMIT (gitignored)
169+
│ ├── client-a/ ← working files per client
170+
│ └── client-b/
171+
├── playbooks/ ← (future) per-industry playbooks
172+
│ ├── healthcare/
173+
│ ├── financial-services/
174+
│ └── manufacturing/
175+
└── CLAUDE.md ← Wu memory file (ops context)
176+
```
177+
178+
---
179+
180+
## Intellectual Property Protection (Beyond GitHub)
181+
182+
1. **Copyright notice:** Add to all private repo files:
183+
`© 2026 [Wu / Firm Name]. All rights reserved. Confidential and proprietary.`
184+
185+
2. **Terms of engagement:** The SoW template already includes an IP ownership clause.
186+
Add: "All methodology, frameworks, templates, and tools used in delivery remain
187+
the intellectual property of [Firm]. Client receives license to use deliverables only."
188+
189+
3. **Collaborator agreements:** Before giving any collaborator access to the private repo,
190+
have them sign an NDA and an IP assignment agreement. Template available on request.
191+
192+
4. **Consider a trademark:** "skill-orchestrator" as a brand name for the consulting OS.
193+
Low cost ($350 USPTO filing); protects the brand as you grow.
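
The copyright notice in item 1 could be applied with a small script rather than by hand. This is a minimal sketch, not the plan's own tooling: the `add_notice` helper is hypothetical, wrapping the notice in an HTML comment (so it does not render in Markdown) is an assumption, and the bracketed firm name is left as the plan's placeholder.

```bash
# Hypothetical helper: prepend the notice to a Markdown file unless the file
# already contains it. The notice text keeps the plan's placeholder name.
NOTICE='<!-- © 2026 [Wu / Firm Name]. All rights reserved. Confidential and proprietary. -->'

add_notice() {
  file=$1
  if grep -qF "$NOTICE" "$file"; then
    return 0   # already present, leave the file untouched
  fi
  tmp=$(mktemp)
  { printf '%s\n\n' "$NOTICE"; cat "$file"; } > "$tmp"
  cat "$tmp" > "$file"   # overwrite in place so the file keeps its permissions
  rm -f "$tmp"
}

# Demo on a throwaway file; calling the helper twice shows it is idempotent.
notice_demo=$(mktemp -d)
printf '# Engagement Brief\n' > "$notice_demo/engagement-brief.md"
add_notice "$notice_demo/engagement-brief.md"
add_notice "$notice_demo/engagement-brief.md"

notice_count=$(grep -cF "$NOTICE" "$notice_demo/engagement-brief.md")
first_line=$(head -n 1 "$notice_demo/engagement-brief.md")
```

Because the helper skips files that already carry the notice, it can be re-run over the whole private repo whenever new files are added.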
194+
195+
---
196+
197+
*Restructure plan — Data Architecture Consulting OS*
198+
*Version 1.0 — 2026-03-18*
