R&Dix Official Blog

R&Dix: R&D for Innovative eXploration

Where knowledge becomes action and passion becomes innovation: R&Dix

Deep Learning Accelerations 2

[arXiv '24] Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache: Paper Review

Paper: Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache (arXiv, 2024). In large language model (LLM) inference services, the maximum supported input context length varies widely from request to request. For example, OpenAI's ChatGPT supports up to 128K tokens, Google Gemini up to 1000K tokens, and the LongRoPE work up to 2000K tokens. However, in an AR (autoregressive) fashion, LLMs..

Deep Learning Accelerations 2025.04.11

[arXiv '24] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference: Paper Review

(1) Understanding the computations performed in the Transformer architecture. (2) Understanding collective communication. Original paper link: https://arxiv.org/abs/2405.17245 Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference. Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference ..

Deep Learning Accelerations 2025.03.29

This is the official tech blog of R&Dix (RNDix). The official homepage is http://rndix.co.kr/.
