Cm3 integration by urielsinger · Pull Request #727 · facebookresearch/metaseq

urielsinger · 2023-06-05T21:41:49Z

No description provided.

- make cm3 compliant with main branch

- revert inference mode

ArmenAg · 2023-06-06T06:51:25Z

This has some logic specific to the CM3leon project (i.e., image token conversions). Are we sure we want to land this in main? Or do we want to pick a branch, merge everything into it, and periodically update to main?

ArmenAg · 2023-06-06T07:02:17Z

-                process_group=distributed_utils.get_data_parallel_group(),
-            )
+            model = task.build_model(cfg.model)
+            if not isinstance(model, FullyShardedDataParallel):


Just to confirm, this is for loading up consolidated model for training?

Yes.
I added support for changing the MP size at job launch, and for that I need to wrap the model in FullyShardedDataParallel inside build_model.
Since I don't want to double-wrap it, I needed to add this if.
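For readers following the thread, the double-wrap guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ShardedWrapper`, `ToyModel`, the toy `build_model`, and `wrap_once` are hypothetical stand-ins so the snippet runs without torch or fairscale; in the PR the check is against `FullyShardedDataParallel`.

```python
class ShardedWrapper:
    """Hypothetical stand-in for FullyShardedDataParallel (assumption)."""

    def __init__(self, module):
        self.module = module


class ToyModel:
    """Hypothetical placeholder for the real model."""


def build_model(wrap_inside_build: bool):
    """Stand-in for task.build_model: may return an already-wrapped model."""
    model = ToyModel()
    return ShardedWrapper(model) if wrap_inside_build else model


def wrap_once(model):
    """Wrap the model only if build_model has not wrapped it already."""
    if not isinstance(model, ShardedWrapper):
        model = ShardedWrapper(model)
    return model


# Either way, the result is wrapped exactly once.
already_wrapped = wrap_once(build_model(wrap_inside_build=True))
freshly_wrapped = wrap_once(build_model(wrap_inside_build=False))
```

The isinstance check keeps the caller agnostic about where wrapping happens, which seems to be the point of the added `if`.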

ArmenAg · 2023-06-06T07:02:42Z

@@ -4,6 +4,7 @@
 # LICENSE file in the root directory of this source tree.


These are the changes to the cm3 objectives that I landed in scaling_racm3, correct?

Yes, exactly.

ArmenAg · 2023-06-06T07:04:13Z

Code LGTM. @suchenzang to give guidance on whether or not to land here.

lilisierrayu · 2023-06-07T16:24:37Z

    def _create_cm3_special_tokens(self):
        self.cm3_sentinel_end = "<eoss>"
+        self.cm3_break = "<racm3:break>"
+        self.dictionary.add_symbol(self.cm3_break)


It's all looking great.
We want to make a change here to recycle the unused embedding indices of cm3_break and sentinel for the next version.
Should I just add a commit on top of this PR, or file a different PR?
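One possible reading of "recycle" here is that a new symbol takes over the index of an unused one, so the vocabulary size and embedding table shape stay the same. The sketch below illustrates that reading only; `ToyDictionary`, `recycle_symbol`, and `<new_token>` are hypothetical and are not part of metaseq or this PR.

```python
class ToyDictionary:
    """Hypothetical, minimal symbol-to-index mapping (assumption)."""

    def __init__(self):
        self.symbols = []
        self.indices = {}

    def add_symbol(self, sym: str) -> int:
        """Return the existing index, or append the symbol at a new index."""
        if sym in self.indices:
            return self.indices[sym]
        idx = len(self.symbols)
        self.symbols.append(sym)
        self.indices[sym] = idx
        return idx

    def recycle_symbol(self, old: str, new: str) -> int:
        """Give `new` the index held by the unused symbol `old`."""
        idx = self.indices.pop(old)
        self.symbols[idx] = new
        self.indices[new] = idx
        return idx


d = ToyDictionary()
d.add_symbol("<eoss>")
break_idx = d.add_symbol("<racm3:break>")
# The vocabulary size is unchanged; the embedding row at break_idx is reused.
reused_idx = d.recycle_symbol("<racm3:break>", "<new_token>")
```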

lilisierrayu · 2023-06-08T01:55:41Z

@@ -200,24 +209,31 @@ def get_document_boundaries(self, item: torch.Tensor):
            boundaries = boundaries + [item.size(0)]


Is get_document_boundaries() robust to the case where there are no break tokens?
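To make the concern concrete, here is a rough sketch of a boundary function that still behaves when the item contains no break tokens. It is an assumption-laden illustration, not the PR's implementation: it works on a plain list of token ids rather than a `torch.Tensor`, and `break_id` and the exact boundary convention are hypothetical.

```python
from typing import List


def get_document_boundaries(item: List[int], break_id: int) -> List[int]:
    """Return boundary offsets for `item`, always covering [0, len(item)]."""
    # Positions at which a break token occurs (may be empty).
    boundaries = [i for i, tok in enumerate(item) if tok == break_id]
    # Make sure the first document starts at offset 0.
    if not boundaries or boundaries[0] != 0:
        boundaries = [0] + boundaries
    # Make sure the last document ends at the end of the item.
    if boundaries[-1] != len(item):
        boundaries = boundaries + [len(item)]
    return boundaries


# With no break tokens, the whole item is treated as a single document.
no_breaks = get_document_boundaries([5, 6, 7], break_id=9)
one_break = get_document_boundaries([5, 9, 6, 7], break_id=9)
```

With this convention, an item without break tokens yields a single span covering the full item instead of an empty boundary list.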

urielsinger added 8 commits May 15, 2023 15:56

- support seq_len > 2048 (4096 and 8192)

dc55c3f

- make cm3 compliant with main branch

old/new tokens conversion

1a63cb5

fsdp double wrap disable

6fba607

- local symlink

3c40198

- revert inference mode

- local symlink

d464d96

- revert inference mode

Merge remote-tracking branch 'origin/main'

f37dcc2

pr fix

56f8b54

revert "force_distributed=True"

c5a58b6

urielsinger requested review from ArmenAg, Xirider, adampolyak, andrewPoulton, bashnick, davides, igormolybogFB, klshuster, lilisierrayu, moyapchen, ngoyal2707, punitkoura, suchenzang, tangbinh and zycalice as code owners June 5, 2023 21:41

facebook-github-bot added the cla signed label Jun 5, 2023

ArmenAg reviewed Jun 6, 2023

fixed

f0b5275

lilisierrayu reviewed Jun 7, 2023

lilisierrayu reviewed Jun 8, 2023

urielsinger and others added 16 commits June 11, 2023 09:32

Merge remote-tracking branch 'origin/main'

d54eca8

improve free port finding for single node dist init

d30921d

Merge remote-tracking branch 'origin/cm3_seq_len'

865c4b3

- pytorch FSDP support

75b74e9

fix bug

61f8792

fix bug

4a677a4

back to fairscale

af57884

back to fairscale

59556ee

fix delete_old_checkpoint_files

82d4c77

stop training when loss_scale reached minimum

a72de97

stop training when loss_scale reached minimum

00df75a

add validate_on_first_step support

bc68a84

fix for single files

d5f50e8

add no_c10d support

a42d648

Merge remote-tracking branch 'origin/cm3_seq_len' into cm3_seq_len

2c484fa

criterion fsdp

7e7b5e3

Cm3 integration #727: urielsinger wants to merge 25 commits into main from cm3_seq_len

5 participants
