으죨


Posts tagged "ViT" (2)

[Multimodal Foundational Papers] ViT - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

https://arxiv.org/abs/2010.11929 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — "While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to rep…" (arxiv.org) 1. Abstract: Conventional Transformers handle natural lang..

Multimodal Project · 2025.03.13
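The ViT paper's central idea is that an image can be treated as a sequence of flattened 16x16 patches, which then play the role of word tokens for a standard Transformer. A minimal NumPy sketch of this patchifying step (illustrative only, not code from the post):

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an image of shape (H, W, C) into flattened, non-overlapping
    patches of shape (patch_size * patch_size * C,), row-major order.

    This mirrors ViT's tokenization: each patch becomes one input token
    (before the learned linear projection and position embeddings).
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "dims must divide evenly"
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)          # group the two patch-grid axes
             .reshape(-1, patch_size * patch_size * c)
    )
    return patches

# A 224x224 RGB image yields 14 * 14 = 196 tokens of dimension 16 * 16 * 3 = 768.
img = np.zeros((224, 224, 3))
print(patchify(img).shape)  # (196, 768)
```

The 196-token, 768-dimensional result matches the figures quoted in the paper's title setup for the base configuration.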

[Deep Learning, Paper Review] CLIP - Learning Transferable Visual Models From Natural Language Supervision - Theory, Part 1

https://arxiv.org/abs/2103.00020 Learning Transferable Visual Models From Natural Language Supervision — "State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual co…" (arxiv.org) CLIP GitHub: https://github.com/..

Deep Learning, Paper Review · 2025.02.10
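CLIP trains an image encoder and a text encoder so that matched image-text pairs score highest under cosine similarity within a batch. A minimal NumPy sketch of the similarity-logit computation (illustrative; real CLIP learns the temperature and trains on very large batches):

```python
import numpy as np

def clip_logits(image_emb, text_emb, temperature=0.07):
    """Temperature-scaled cosine-similarity logits between a batch of
    image embeddings and a batch of text embeddings, both (N, D).

    Returns an (N, N) matrix; in CLIP's contrastive objective the diagonal
    entries correspond to the matched pairs, and a symmetric cross-entropy
    loss pushes them above the off-diagonal (mismatched) entries.
    """
    i = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return i @ t.T / temperature

rng = np.random.default_rng(0)
logits = clip_logits(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(logits.shape)  # (4, 4)
```

Because the embeddings are unit-normalized before the dot product, every entry of `logits * temperature` lies in [-1, 1].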

A blog organizing study notes on sales, data analysis, and ML/DL.



Copyright © AXZ Corp All rights reserved.
