Knowledge Distillation! This is a ~15.8M parameter student model distilled from a CLIP-based teacher model (157.8M params), running entirely in your browser.
| Model | Parameters | Size | Encoder |
|---|---|---|---|
| Teacher (CLIP ViT-B/32) | 157.8M | ~600 MB | Vision Transformer |
| Student (This demo) | 15.8M | ~60 MB | ResNet-style CNN |
The student uses a ResNet-style CNN encoder with residual blocks, trained by distilling the CLIP teacher's outputs into the much smaller network.
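One common way to distill an embedding model like CLIP is to train the student to reproduce the teacher's image embeddings, e.g. by minimizing one minus the cosine similarity between the two. The sketch below is a hypothetical illustration of that objective, not the exact loss used for this demo (which the page does not specify); the function name and plain-list inputs are made up for clarity.

```python
import math

def cosine_distill_loss(student_embs, teacher_embs):
    """Mean (1 - cosine similarity) between student and teacher embeddings.

    A hypothetical distillation objective: 0 when the student exactly
    matches the teacher's (direction of) embeddings, larger otherwise.
    Inputs are lists of equal-length embedding vectors (lists of floats).
    """
    total = 0.0
    for s, t in zip(student_embs, teacher_embs):
        ns = math.sqrt(sum(x * x for x in s))      # ||student||
        nt = math.sqrt(sum(x * x for x in t))      # ||teacher||
        dot = sum(a * b for a, b in zip(s, t))     # student . teacher
        total += 1.0 - dot / (ns * nt)             # 1 - cosine similarity
    return total / len(student_embs)

# Identical embeddings -> zero loss; orthogonal embeddings -> loss of 1.
assert abs(cosine_distill_loss([[1.0, 0.0]], [[2.0, 0.0]])) < 1e-9
assert abs(cosine_distill_loss([[1.0, 0.0]], [[0.0, 3.0]]) - 1.0) < 1e-9
```

Because the loss depends only on embedding direction, the student is free to choose its own scale, which is usually fine since CLIP-style retrieval normalizes embeddings anyway.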
Compare with the original CLIP-based model to see the quality difference.
Drag & drop an image here, or click to select
Or try a sample image: