Repository
<aside>
|██████████|██████████|██████████|██████████|██████████|███░░░░░░░|░░░░░░░|░░░░░░░
</aside>
Paper
understanding research from code >> (mostly) understanding research from the paper
<aside>
9/67: |█████████░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░|
</aside>
Overview & Main techniques
Major Contributions
- Gram anchoring (training phase): addresses dense feature maps degrading during long training schedules
- post-hoc strategies
- fixes the gradual decrease in feature performance over long training
(visualized in a patch similarity map)
- main model: 7B ViT variant with axial RoPE
- lastly: high-res post-processing phase & distillation: single-teacher, multiple-students procedure
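To make the Gram anchoring bullet concrete for myself: a minimal sketch, assuming the loss compares the Gram matrix (pairwise cosine similarities between patch features) of the student with that of an earlier "anchor" checkpoint. The function names, the plain-list representation, and the unnormalized sum are my own assumptions, not taken from the repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def gram_matrix(patches):
    """Pairwise cosine similarities between patch features (P x P)."""
    unit = [l2_normalize(p) for p in patches]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def gram_anchoring_loss(student_patches, anchor_patches):
    """Squared Frobenius distance between the two Gram matrices.

    Assumption: a plain sum over entries; the real loss may normalize
    or weight this differently.
    """
    gs = gram_matrix(student_patches)
    ga = gram_matrix(anchor_patches)
    return sum((s - a) ** 2 for rs, ra in zip(gs, ga) for s, a in zip(rs, ra))
```

Row `i` of `gram_matrix(...)` is the patch similarity map for reference patch `i`. Because only pairwise similarities enter the loss, the student features can still rotate freely as long as the similarity structure between patches stays close to the anchor's.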
Related work
previous SSL approaches for vision models:
- extracting supervisory signals from parts of an image & predicting other parts
- patch re-ordering
- inpainting
- re-colorization
- …
Dataset
Learning objective
Cool techniques
Notes:
- contrastive loss (e.g. Siamese, InfoNCE, …)
- implementation to-do’s
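Quick reminder of what the InfoNCE variant of a contrastive loss computes: a minimal sketch for a single query, assuming cosine similarity and a temperature of 0.1 (both are illustrative choices of mine, and `info_nce` is a hypothetical helper name).

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(query, positive, negatives, temperature=0.1):
    """Cross-entropy of picking the positive among positive + negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]
```

The loss gets small when the query is much closer to its positive than to every negative, and grows when a negative is the closer match.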
Translation differences
-
to go back to
-
project notes
-
files overview
!!! Sinkhorn-Knopp in the iBOT patch loss: all-reduce on the batch size with no guard
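To pin down what I mean by "guard": a minimal single-process sketch of Sinkhorn-Knopp where every cross-worker sum, including the one on the batch size, goes through the same guarded helper. This is my own reading of the issue, not the repository's code; `guarded_all_reduce_sum` and `reduce_fn` are hypothetical names, and `reduce_fn=None` models the case where no process group is initialized.

```python
import math


def guarded_all_reduce_sum(value, reduce_fn=None):
    """Sum `value` across workers, or return it unchanged in a single process."""
    return value if reduce_fn is None else reduce_fn(value)


def sinkhorn_knopp(scores, temperature=0.1, n_iterations=3, reduce_fn=None):
    """Turn a non-empty B x K score matrix into balanced soft assignments."""
    # Global batch size: guarded like every other reduction below.
    n_samples = guarded_all_reduce_sum(len(scores), reduce_fn)
    n_protos = len(scores[0])
    q = [[math.exp(s / temperature) for s in row] for row in scores]
    total = guarded_all_reduce_sum(sum(map(sum, q)), reduce_fn)
    q = [[x / total for x in row] for row in q]
    for _ in range(n_iterations):
        # Give each prototype (column) total mass 1/K over the global batch.
        col_sums = [
            guarded_all_reduce_sum(sum(row[k] for row in q), reduce_fn)
            for k in range(n_protos)
        ]
        q = [[row[k] / col_sums[k] / n_protos for k in range(n_protos)] for row in q]
        # Give each sample (row) total mass 1/B.
        q = [[x / sum(row) / n_samples for x in row] for row in q]
    # Rescale so every row is a probability distribution over prototypes.
    return [[x * n_samples for x in row] for row in q]
```

In PyTorch the equivalent guard would presumably be a check on `torch.distributed.is_available()` and `torch.distributed.is_initialized()` before calling `torch.distributed.all_reduce`, so the loss also runs without a process group (e.g. single-GPU debugging).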