으죨


Posts tagged "ViT" (2)

[Multimodal Foundational Papers] ViT - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

https://arxiv.org/abs/2010.11929 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — "While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to rep…" (arxiv.org) 1. Abstract: Conventional Transformers handle natural lang..

Multimodal Project · 2025.03.13
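The ViT paper's central idea is that an image can be treated as a sequence of flattened 16x16 patches, which then play the role of word tokens for a standard Transformer. A minimal NumPy sketch of this patchifying step (illustrative only, not code from the post):

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an image of shape (H, W, C) into flattened, non-overlapping
    patches of shape (patch_size * patch_size * C,), row-major order.

    This mirrors ViT's tokenization: each patch becomes one input token
    (before the learned linear projection and position embeddings).
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "dims must divide evenly"
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)          # group the two patch-grid axes
             .reshape(-1, patch_size * patch_size * c)
    )
    return patches

# A 224x224 RGB image yields 14 * 14 = 196 tokens of dimension 16 * 16 * 3 = 768.
img = np.zeros((224, 224, 3))
print(patchify(img).shape)  # (196, 768)
```

The 196-token, 768-dimensional result matches the figures quoted in the paper's title setup for the base configuration.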

[Deep Learning, Paper Review] CLIP - Learning Transferable Visual Models From Natural Language Supervision - Theory, Part 1

https://arxiv.org/abs/2103.00020 Learning Transferable Visual Models From Natural Language Supervision — "State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual co…" (arxiv.org) CLIP GitHub: https://github.com/..

Deep Learning, Paper Review · 2025.02.10
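CLIP trains an image encoder and a text encoder so that matched image-text pairs score highest under cosine similarity within a batch. A minimal NumPy sketch of the similarity-logit computation (illustrative; real CLIP learns the temperature and trains on very large batches):

```python
import numpy as np

def clip_logits(image_emb, text_emb, temperature=0.07):
    """Temperature-scaled cosine-similarity logits between a batch of
    image embeddings and a batch of text embeddings, both (N, D).

    Returns an (N, N) matrix; in CLIP's contrastive objective the diagonal
    entries correspond to the matched pairs, and a symmetric cross-entropy
    loss pushes them above the off-diagonal (mismatched) entries.
    """
    i = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return i @ t.T / temperature

rng = np.random.default_rng(0)
logits = clip_logits(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(logits.shape)  # (4, 4)
```

Because the embeddings are unit-normalized before the dot product, every entry of `logits * temperature` lies in [-1, 1].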

A blog organizing study notes on sales, data analysis, and ML/DL.



Copyright © AXZ Corp All rights reserved.
