graph TD
deploytotest[Deploy DAGs to test GCC]
deploydags[Deploy DAGs into this Airflow<br/>starting with CKAN data load]
deploygcc[Deploy Airflow<br/>i.e. Google Cloud Composer]
nhsdag[NHS DAG for loading to bigquery]
nhs[NHS Done: instance updated<br/>with extension and working in production]
logging[Logging]
reporting[Reporting]
othersite["Other Site Done"]
start[Start] --> deploygcc
start --> logging
multinodedag --> deploytotest
subgraph General Dev of AirCan
errors[Error Handling]
aircanlib[AirCan lib refactoring]
multinodedag[Multi Node DAG]
logging --> reporting
end
subgraph Deploy into Datopian Cluster
deploytotest[Deploy DAGs to test GCC] --> deploydags
deploygcc --> deploydags
end
subgraph CKAN Integration
setschema[Set Schema from Resource]
endckan[End CKAN work]
setschema --> endckan
end
deploydags --> nhsdag
deploydags --> othersite
endckan --> nhs
subgraph NHS
nhsdag --> nhs
end
classDef done fill:#21bf73,stroke:#333,stroke-width:1px;
classDef nearlydone fill:lightgreen,stroke:#333,stroke-width:1px;
classDef inprogress fill:orange,stroke:#333,stroke-width:1px;
classDef next fill:lightblue,stroke:#333,stroke-width:1px;
class multinodedag done;
class versioning nearlydone;
class setschema,errors,deploydags,nhsdag,deploygcc inprogress;
This is the uber-epic for the complete evolution of the CKAN DataStore load to AirCan.
## Acceptance

## Tasks
- `npm test`. Right now the tests point to a temporary CKAN instance (not to DX) and they run the entire flow. There is no CI for this on GitHub at the moment.
- A `run_id` which you can pass in to the DAG and which it uses in logging etc. when running, so we can reliably track logs. Also move Airflow status info into the logs (so we don't depend on the Airflow API).
- An `aircan_status(run_id)` function that can be turned into an API in CKAN (or elsewhere).

## Plan of work (from 4 nov)
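As a rough illustration of the `run_id` idea above, here is a minimal sketch of status tracking driven entirely by logs rather than the Airflow API. Everything here is an assumption for illustration: the JSON-lines log format, the log directory, and the `log_event` helper are all hypothetical, not part of AirCan; only the `aircan_status(run_id)` name comes from the task list.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical log location; a real deployment would use a shared volume or bucket.
LOG_DIR = Path(tempfile.gettempdir()) / "aircan-logs"


def log_event(run_id: str, state: str, message: str = "") -> None:
    """Append a status event for a run to its log file (hypothetical JSON-lines format).

    DAG tasks would call this at each step, tagging every entry with run_id
    so logs can be tracked reliably per run.
    """
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    entry = {"run_id": run_id, "state": state, "message": message}
    with open(LOG_DIR / f"{run_id}.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")


def aircan_status(run_id: str) -> dict:
    """Return the latest recorded state for run_id, read from the logs.

    Because this only reads files, it can sit behind a CKAN API endpoint
    (or any other service) without talking to the Airflow API.
    """
    path = LOG_DIR / f"{run_id}.jsonl"
    if not path.exists():
        return {"run_id": run_id, "state": "unknown", "message": ""}
    last = None
    with open(path) as f:
        for line in f:
            last = json.loads(line)
    return last
```

The design choice being sketched: status lives in append-only, `run_id`-keyed logs, so the last entry for a run *is* its current status, and the status function is a pure read that any service can expose.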
## FUTURE after this

## Detailed