https://arxiv.org/abs/2103.00020 Learning Transferable Visual Models From Natural Language SupervisionState-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual coarxiv.org CLIP Githubhttps://github.com/..