Auto Seed Vl2 [TESTED]
: (1) Performance on highly structured tasks (e.g., VQA with relational reasoning) drops by 6% compared to exemplar replay. (2) The generator’s meta-update requires 5% of training data as a validation set – not always available. (3) Seed interpretability: unlike real images, seeds are opaque vectors. 8. Conclusion We presented Auto-Seed VL2, a framework for autonomous seed generation in vision-language continual learning. By synthesizing compact, cross-modal aligned seeds conditioned on task gradients, Auto-Seed VL2 eliminates the need for storing real data while achieving superior performance over replay-based methods. Our results demonstrate that synthetic embedding replay is a viable and often superior alternative to exemplar storage. Future work includes extending to online (single-pass) continual learning and exploring seed decomposition for compositional tasks. Acknowledgments [Redacted for blind review] References [1] Radford, A., et al. (2021). Learning transferable visual models from natural language supervision. ICML.
| Configuration | Avg Acc | Drop | |----------------------------------------|---------|------| | Full Auto-Seed VL2 | 82.2 | — | | w/o consistency loss (( \mathcalL \textconsist )) | 75.4 | -6.8 | | w/o gradient-conditioned generation (random seeds) | 68.9 | -13.3 | | w/o meta-update of ( G \phi ) | 74.1 | -8.1 | | w/o seed pruning (full memory) | 82.0 | -0.2 (ns) | auto seed vl2
: Auto-Seed VL2 outperforms all baselines, including ER-VLM with 10× more memory, and beats generative replay by over 13 points on average. The BLEU-4 score on C→F is particularly striking, indicating that generated seeds capture caption semantics well. 6.2 Ablation Study Removing components from Auto-Seed VL2 on C→R: : (1) Performance on highly structured tasks (e
[5] Zhang, Y., et al. (2024). VLM-CL: A benchmark for continual learning in vision-language models. NeurIPS Datasets Track. Our results demonstrate that synthetic embedding replay is