This repository was archived by the owner on Nov 1, 2024. It is now read-only.
feat: add inference and evaluation script with dataset transformations #733
Open
mattmazzola wants to merge 12 commits into
Conversation
tupini07
reviewed
Jun 9, 2023
Comment on lines
+44
to
+45
    tokenizer_vocab_file_path="/mnt/input_data_dir/pretrained_models/OPT/dependencies/gpt2-vocab.json",
    tokenizer_merges_file_path="/mnt/input_data_dir/pretrained_models/OPT/dependencies/gpt2-merges.txt",
If Metaseq has a standardized path for the vocab and merges files then we'll need to replace them here :) If not we might need to remove the default value.
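One way to act on the second option in this comment (dropping the hard-coded defaults) is sketched below. It is only an illustration, not code from this PR: the `TokenizerPaths` class and its `resolve` method are hypothetical, and only the two field names are taken from the diff. The idea is that the paths default to `None` and the script fails early with a clear message when the caller did not supply them.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass
class TokenizerPaths:
    # Hypothetical container; only the field names come from the diff above.
    # No environment-specific default is baked in, so callers must pass paths.
    tokenizer_vocab_file_path: Optional[str] = None
    tokenizer_merges_file_path: Optional[str] = None

    def resolve(self) -> Tuple[Path, Path]:
        """Return (vocab, merges) paths, raising if either one was not provided."""
        missing = [name for name, value in vars(self).items() if value is None]
        if missing:
            raise ValueError(f"Missing required tokenizer paths: {', '.join(missing)}")
        return Path(self.tokenizer_vocab_file_path), Path(self.tokenizer_merges_file_path)
```

If Metaseq does turn out to have a standardized location, the same structure would still work with that location as the default value.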
mattmazzola
commented
Jun 12, 2023
Comment on lines
+45
to
+54
RUN pip install \
    aim==3.16.2 \
    py-rouge==1.1 \
    rouge_score==0.1.2 \
    parlai==1.7.1 \
    evaluate==0.4.0

ENV NLTK_DATA="/usr/share/nltk_data"
RUN python -c "import nltk; nltk.download('punkt', download_dir='${NLTK_DATA}')"
Author
This likely isn't the correct place to make this change.
It is only a snippet from our whole Dockerfile: the part that adds the evaluation libraries.
from metaseq.data.datasets.types import CommonDatasetConfiguration, DatasetConfiguration, DatasetConfigurationTeacherGenerated, DatasetModelConfig, DatasetModelHooks, DatasetTeacherGeneratedDataHooks, IdentityDict

# Visual diagram of where hooks/functions are called during inference or data generation
# https://excalidraw.com/#json=zoAk_TdynBHQnP9vZufGm,ekcVg_HqiF79cAp58_HKRQ
Author
This visualization may be important for understanding where the hooks and functions are called.
Issue
Solutions
Add script for model inference and evaluation
Add mappings between dataset and pipeline configuration of eval libraries, metrics, and transformation functions
Add necessary evaluation libraries and re-implement some metrics
Add PromptGenerator to create few-shot prompts based on configuration using Jinja templates
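To make the second item in the list above more concrete, here is a minimal sketch of a mapping from dataset name to the metrics used to score it. It is not the implementation in this PR: the dataset names, the `DATASET_METRICS` table, and the `evaluate_dataset` helper are all hypothetical, and the two metrics are simple stand-ins written in plain Python rather than the `rouge_score` or `evaluate` libraries the PR installs.

```python
from collections import Counter
from typing import Callable, Dict, List

MetricFn = Callable[[str, str], float]


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def token_f1(prediction: str, reference: str) -> float:
    """Whitespace-token F1 with multiset overlap (a stand-in, not ROUGE)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Registry of available metric functions, keyed by name.
METRICS: Dict[str, MetricFn] = {"exact_match": exact_match, "token_f1": token_f1}

# Hypothetical dataset names; each maps to the metrics to report for it.
DATASET_METRICS: Dict[str, List[str]] = {
    "example_qa": ["exact_match", "token_f1"],
    "example_summarization": ["token_f1"],
}


def evaluate_dataset(dataset: str, predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Average every metric configured for `dataset` over the prediction/reference pairs."""
    scores: Dict[str, float] = {}
    for metric_name in DATASET_METRICS[dataset]:
        metric = METRICS[metric_name]
        values = [metric(p, r) for p, r in zip(predictions, references)]
        scores[metric_name] = sum(values) / len(values) if values else 0.0
    return scores
```

Keeping the mapping as data means that supporting a new dataset is a one-line configuration change rather than a new branch in the evaluation script.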
This PR is quite large, so it may be hard to make sense of.
Originally it was only going to be inference.py and a few other modifications, but then I kept bringing in missing dependencies to avoid gaps and it grew a lot 🤔
Testing
Did not test 😔
Related to: #726
Much of this work was done by @sahajgg, @tupini07, and @anselmwang 🙏