Repository
<aside>
|██████████|██████████|██░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░|░░░░░░░
</aside>
Paper
<aside>
9/67: |█████████░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░░░░|░░░░░░░|
</aside>
Overview & Main techniques
Major Contributions
- Gram anchoring (training phase): addresses dense feature maps degrading during long training schedules
- post-hoc strategies
- fixes the gradual decrease in feature performance over long training (visualized in a patch similarity map)
- main model: 7B-parameter ViT variant with axial RoPE
- lastly: high-resolution post-processing phase & distillation (single-teacher, multiple-students procedure)
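To make the Gram anchoring bullet concrete, here is a minimal sketch. These notes do not spell out the loss, so the form used here is an assumption: compare the Gram matrix (pairwise cosine similarities between patch features) of the current model against that of an earlier "anchor" checkpoint and penalize the squared difference. The names `gram_matrix` and `gram_anchoring_loss` are made up for illustration and are not taken from the repository.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def gram_matrix(patch_features):
    """Pairwise cosine similarities between patch features (P x P)."""
    normed = [l2_normalize(v) for v in patch_features]
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def gram_anchoring_loss(student_patches, anchor_patches):
    """Mean squared difference between the two Gram matrices.

    Assumption: the anchor features come from an earlier checkpoint whose
    dense features were still good; only patch-to-patch similarity structure
    is constrained, not the features themselves.
    """
    gs = gram_matrix(student_patches)
    ga = gram_matrix(anchor_patches)
    n = len(gs)
    total = sum((gs[i][j] - ga[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)
```

Under this formulation the features are free to rotate or move as long as the patch similarity map stays the same, which matches the patch similarity visualization mentioned above.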
Related work
previous SSL approaches for vision models:
- extracting supervisory signals from parts of an image & predicting other parts
- patch re-ordering
- inpainting
- re-colorization
- …
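As a rough illustration of how such pretext tasks get their supervisory signal from the image itself, the sketch below builds one patch re-ordering example: the image is cut into tiles, the tiles are shuffled, and the permutation serves as the label. The function names and the plain nested-list image are assumptions for illustration, not code from any of the cited approaches.

```python
import random


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into a row-major list of patch x patch tiles."""
    h, w = len(image), len(image[0])
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def make_reordering_example(image, patch, seed=None):
    """Return (shuffled_patches, permutation); the permutation is the label.

    shuffled_patches[k] is the original tile at index permutation[k], so a
    model trained on this task has to predict where each tile came from.
    """
    patches = split_into_patches(image, patch)
    perm = list(range(len(patches)))
    random.Random(seed).shuffle(perm)
    return [patches[i] for i in perm], perm
```

No human annotation is involved: the label is generated by the same procedure that corrupts the input, which is the common pattern behind inpainting and re-colorization as well.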
Dataset
Learning objective
Cool techniques
Notes:
- contrastive loss (e.g., Siamese, InfoNCE, …)
- implementation to-dos
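For reference on the contrastive loss note, here is a minimal InfoNCE sketch in plain Python. It assumes `queries[i]` and `keys[i]` form the positive pair and every other key in the batch acts as a negative; the `temperature` default of 0.1 is an arbitrary illustrative value, not one taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss over a batch of (query, key) positive pairs."""
    losses = []
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the positive key at index i as the target.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each query is most similar to its own key and large when it is more similar to another key in the batch.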
Translation differences
- to go back to
- project notes