<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sevity Blog</title>
    <link>https://sevity.tistory.com/</link>
    <description>sevity@ymail.com
github: https://github.com/sevity
youtube: https://www.youtube.com/user/linowmik</description>
    <language>ko</language>
    <pubDate>Wed, 15 Apr 2026 00:09:53 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>sevity@ymail.com (sevity)</managingEditor>
    <image>
      <title>Sevity Blog</title>
      <url>https://tistory1.daumcdn.net/tistory/2835757/attach/0a44fcf8f43849958f46dd61e32551d5</url>
      <link>https://sevity.tistory.com</link>
    </image>
    <item>
      <title>snowflake</title>
      <link>https://sevity.tistory.com/308</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;글로벌하게 유니키한 키를 만들어야 할때, 보통은 uuid를 떠올리지만 snowflake방식을 선택하면 아래와 같은 장점이 있다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-end=&quot;508&quot; data-start=&quot;45&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt; 비교 항목 &lt;/b&gt;&lt;/td&gt;
&lt;td&gt;UUIDv4&lt;/td&gt;
&lt;td&gt;Snowflake&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;136&quot; data-start=&quot;87&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;103&quot; data-start=&quot;87&quot;&gt;&lt;b&gt;크기(Storage)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;120&quot; data-start=&quot;103&quot; data-col-size=&quot;sm&quot;&gt;128 bit (16 바이트)&lt;/td&gt;
&lt;td data-end=&quot;136&quot; data-start=&quot;120&quot; data-col-size=&quot;sm&quot;&gt;64 bit (8 바이트)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;197&quot; data-start=&quot;137&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;149&quot; data-start=&quot;137&quot;&gt;&lt;b&gt;인덱스 단편화&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;176&quot; data-start=&quot;149&quot; data-col-size=&quot;sm&quot;&gt;완전 랜덤 삽입 &amp;rarr; B-tree 인덱스 단편화&amp;uarr;&lt;/td&gt;
&lt;td data-end=&quot;197&quot; data-start=&quot;176&quot; data-col-size=&quot;sm&quot;&gt;시간순으로 순차적 증가 &amp;rarr; 단편화&amp;darr;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;254&quot; data-start=&quot;198&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;221&quot; data-start=&quot;198&quot;&gt;&lt;b&gt;순차성(Time-sortable)&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;225&quot; data-start=&quot;221&quot; data-col-size=&quot;sm&quot;&gt;불가능&lt;/td&gt;
&lt;td data-end=&quot;254&quot; data-start=&quot;225&quot; data-col-size=&quot;sm&quot;&gt;상위 비트에 밀리초 타임스탬프 내장 &amp;rarr; 정렬 가능&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;321&quot; data-start=&quot;255&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;265&quot; data-start=&quot;255&quot;&gt;&lt;b&gt;생성 비용&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;297&quot; data-start=&quot;265&quot; data-col-size=&quot;sm&quot;&gt;랜덤 엔트로피 생성 &amp;rarr; CPU &amp;amp; 메모리 부하 다소 높음&lt;/td&gt;
&lt;td data-end=&quot;321&quot; data-start=&quot;297&quot; data-col-size=&quot;sm&quot;&gt;단순 비트 연산 + 시퀀스 &amp;rarr; 매우 경량&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;368&quot; data-start=&quot;322&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;335&quot; data-start=&quot;322&quot;&gt;&lt;b&gt;메타데이터 내장&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;338&quot; data-start=&quot;335&quot; data-col-size=&quot;sm&quot;&gt;없음&lt;/td&gt;
&lt;td data-end=&quot;368&quot; data-start=&quot;338&quot; data-col-size=&quot;sm&quot;&gt;타임스탬프&amp;middot;데이터센터ID&amp;middot;노드ID&amp;middot;시퀀스 정보 포함&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;440&quot; data-start=&quot;369&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;379&quot; data-start=&quot;369&quot;&gt;&lt;b&gt;충돌 위험&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;402&quot; data-start=&quot;379&quot; data-col-size=&quot;sm&quot;&gt;충돌 확률 극히 낮으나 완전히 배제 불가&lt;/td&gt;
&lt;td data-end=&quot;440&quot; data-start=&quot;402&quot; data-col-size=&quot;sm&quot;&gt;노드ID+시퀀스 조합으로 충돌 사실상 0 (전제: 노드ID 관리)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-end=&quot;508&quot; data-start=&quot;441&quot;&gt;
&lt;td data-col-size=&quot;sm&quot; data-end=&quot;456&quot; data-start=&quot;441&quot;&gt;&lt;b&gt;디버깅&amp;middot;추적 편의성&lt;/b&gt;&lt;/td&gt;
&lt;td data-end=&quot;479&quot; data-start=&quot;456&quot; data-col-size=&quot;sm&quot;&gt;ID만으로 &amp;ldquo;언제 생성됐는지&amp;rdquo; 파악 불가&lt;/td&gt;
&lt;td data-end=&quot;508&quot; data-start=&quot;479&quot; data-col-size=&quot;sm&quot;&gt;ID만으로 생성 시각&amp;middot;생성 주체(노드) 추적 가능&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
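&lt;p data-ke-size=&quot;size16&quot;&gt;위 표의 구조를 파이썬으로 간단히 스케치하면 아래와 같다. 비트 배치(41비트 타임스탬프, 5비트 데이터센터ID, 5비트 노드ID, 12비트 시퀀스)와 에포크 값은 트위터 원본 구현을 따른 가정이며, 시프트 연산 대신 2의 거듭제곱 곱셈으로 같은 배치를 표현했다.&lt;/p&gt;

```python
import time
import threading

# 가정: 트위터 Snowflake의 비트 배치(41/5/5/12)와 에포크(ms)
EPOCH = 1288834974657

class Snowflake:
    def __init__(self, datacenter_id, node_id):
        # 데이터센터ID와 노드ID는 각각 5비트(0~31)
        assert datacenter_id in range(32) and node_id in range(32)
        self.dc = datacenter_id
        self.node = node_id
        self.seq = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ms = int(time.time() * 1000)
            if ms == self.last_ms:
                # 같은 밀리초 안에서는 12비트 시퀀스를 증가
                self.seq = (self.seq + 1) % 4096
                if self.seq == 0:
                    # 시퀀스 소진 시 다음 밀리초까지 대기
                    while ms == self.last_ms:
                        ms = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = ms
            # 상위 비트부터 타임스탬프, 데이터센터ID, 노드ID, 시퀀스 순으로 배치
            ts = ms - EPOCH
            return ts * 2**22 + self.dc * 2**17 + self.node * 2**12 + self.seq
```

&lt;p data-ke-size=&quot;size16&quot;&gt;같은 프로세스에서 생성한 ID는 시간순으로 단조 증가하므로, 표의 &amp;ldquo;순차성&amp;rdquo;&amp;middot;&amp;ldquo;충돌 위험&amp;rdquo; 항목을 그대로 확인할 수 있다.&lt;/p&gt;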
&lt;h3 data-end=&quot;1304&quot; data-start=&quot;1270&quot; data-ke-size=&quot;size23&quot;&gt;언제 UUID를 쓰고, 언제 Snowflake를 쓰나?&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1522&quot; data-start=&quot;1306&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1409&quot; data-start=&quot;1306&quot;&gt;&lt;b&gt;UUIDv4&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1409&quot; data-start=&quot;1323&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1379&quot; data-start=&quot;1323&quot;&gt;단순 분산 식별자(ID)만 필요하고, 시간순 정렬&amp;middot;인덱스 단편화 이슈가 크지 않은 애플리케이션&lt;/li&gt;
&lt;li data-end=&quot;1409&quot; data-start=&quot;1382&quot;&gt;예: 각종 리소스(사용자&amp;middot;세션&amp;middot;토큰) 식별 등&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1522&quot; data-start=&quot;1411&quot;&gt;&lt;b&gt;Snowflake&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1522&quot; data-start=&quot;1431&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1487&quot; data-start=&quot;1431&quot;&gt;고TPS 분산 시스템에서 &lt;b&gt;시간순 처리&lt;/b&gt;&amp;middot;&lt;b&gt;인덱스 효율&lt;/b&gt;&amp;middot;&lt;b&gt;멱등 ID&lt;/b&gt;가 중요할 때&lt;/li&gt;
&lt;li data-end=&quot;1522&quot; data-start=&quot;1490&quot;&gt;예: 이벤트 로깅, 메시징 시스템, 주문&amp;middot;트랜잭션 ID&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;</description>
      <category>System Architect</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/308</guid>
      <comments>https://sevity.tistory.com/308#entry308comment</comments>
      <pubDate>Mon, 23 Jun 2025 20:09:53 +0900</pubDate>
    </item>
    <item>
      <title>OIDC(Open ID Connect)</title>
      <link>https://sevity.tistory.com/306</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://sevity.tistory.com/287&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;OAuth2.0&lt;/a&gt; 위에서 추가로 설계된 스펙으로, 기존 액세스 토큰&amp;middot;리프레시 토큰과 더불어 ID Token을 발행하여 &amp;ldquo;인증(Authentication)&amp;rdquo; 기능을 추가한 &lt;b&gt;표준 프로토콜&lt;/b&gt;입니다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;179&quot; data-start=&quot;131&quot;&gt;OAuth 2.0이 &amp;ldquo;리소스 접근 권한(Authorization)&amp;rdquo;을 다룬다면,&lt;/li&gt;
&lt;li data-end=&quot;223&quot; data-start=&quot;182&quot;&gt;OIDC는 &amp;ldquo;사용자 인증(Authentication)&amp;rdquo;을 다룹니다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ID Token에는 필수 클레임(iss, sub, aud, exp 등)과 요청한 소수의 추가 클레임(예: email, name)이 담깁니다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;ID Token에 담기는 필드 예시:&lt;/p&gt;
&lt;pre id=&quot;code_1750178701978&quot; class=&quot;json&quot; data-ke-language=&quot;json&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;{
  &quot;iss&quot;: &quot;https://auth.gangnamunni.com/&quot;,     // 토큰 발급자(Identity Provider)  
  &quot;sub&quot;: &quot;248289761001&quot;,                       // 사용자 고유 식별자  
  &quot;aud&quot;: &quot;points-service-prod&quot;,                // 토큰 대상(클라이언트 ID)  
  &quot;exp&quot;: 1718865600,                           // 토큰 만료 시각 (Unix timestamp)  
  &quot;iat&quot;: 1718862000,                           // 토큰 발급 시각  
  &quot;auth_time&quot;: 1718861990,                     // 사용자가 실제 인증한 시각  
  &quot;nonce&quot;: &quot;n-0S6_WzA2Mj&quot;,                      // 리플레이 공격 방지용 값  
  &quot;email&quot;: &quot;janedoe@example.com&quot;,              // 요청한 추가 클레임: 이메일  
  &quot;email_verified&quot;: true,                      // 이메일 검증 여부  
  &quot;name&quot;: &quot;Jane Doe&quot;,                          // 요청한 추가 클레임: 이름  
  &quot;preferred_username&quot;: &quot;j.doe&quot;                // 요청한 추가 클레임: 사용자명
}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;하지만 아래 이유로 ID Token에 모든 정보를 담을 수는 없습니다.&lt;/p&gt;
&lt;blockquote data-ke-style=&quot;style2&quot;&gt;정적 정보: 발급 시점의 정보만 담겨, 사용자가 프로필을 변경해도 토큰에는 반영되지 않음 &lt;br /&gt;사이즈 한계: 너무 많은 필드를 넣으면 JWT가 커져 네트워크/성능 부담 &amp;uarr; &lt;br /&gt;보안 고려: 민감 정보(주소, 휴대폰 등)를 토큰에 영구 저장하면 위험&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;그래서 OIDC는 UserInfo Endpoint를 추가로 제공합니다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;상세 프로필은 access_token으로 이 API를 다시 호출해 확인합니다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-end=&quot;2115&quot; data-start=&quot;2099&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;서비스 특성&lt;/b&gt;에 따라&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2187&quot; data-start=&quot;2119&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2130&quot; data-start=&quot;2119&quot;&gt;ID 토큰만,&lt;/li&gt;
&lt;li data-end=&quot;2148&quot; data-start=&quot;2134&quot;&gt;UserInfo만,&lt;/li&gt;
&lt;li data-end=&quot;2187&quot; data-start=&quot;2152&quot;&gt;혹은 둘 다 쓰는 패턴을 &lt;b&gt;자유롭게 선택&lt;/b&gt;하시면 됩니다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;559&quot; data-start=&quot;525&quot; data-ke-size=&quot;size26&quot;&gt;UserInfo Endpoint를 왜 추가했나?&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;884&quot; data-start=&quot;560&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;610&quot; data-start=&quot;560&quot;&gt;&lt;b&gt;목적&lt;/b&gt;: 로그인한 사용자에 대한 &lt;b&gt;동적&amp;middot;확장 프로필&lt;/b&gt;을 안전하게 조회&lt;/li&gt;
&lt;li data-end=&quot;726&quot; data-start=&quot;611&quot;&gt;&lt;b&gt;동작 방식&lt;/b&gt;:
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;726&quot; data-start=&quot;628&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;674&quot; data-start=&quot;628&quot;&gt;클라이언트가 access_token을 이용해 /userinfo 호출&lt;/li&gt;
&lt;li data-end=&quot;726&quot; data-start=&quot;677&quot;&gt;서버가 현재 스코프(scope)&amp;middot;권한에 맞는 &lt;b&gt;최신 프로필&lt;/b&gt;(JSON) 반환&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li data-end=&quot;884&quot; data-start=&quot;727&quot;&gt;&lt;b&gt;장점&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;884&quot; data-start=&quot;741&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;774&quot; data-start=&quot;741&quot;&gt;&lt;b&gt;최신성&lt;/b&gt;: 호출 시점의 유저 정보를 그대로 반영&lt;/li&gt;
&lt;li data-end=&quot;838&quot; data-start=&quot;777&quot;&gt;&lt;b&gt;선택적 노출&lt;/b&gt;: scope(email profile phone 등)에 따라 필요한 정보만 제공&lt;/li&gt;
&lt;li data-end=&quot;884&quot; data-start=&quot;841&quot;&gt;&lt;b&gt;토큰 경량화&lt;/b&gt;: 민감&amp;middot;대용량 데이터는 토큰이 아니라 API로 분리&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
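&lt;p data-ke-size=&quot;size16&quot;&gt;위 동작 방식을 파이썬 표준 라이브러리로 스케치하면 아래와 같습니다. 엔드포인트 URL과 토큰 값은 설명용 가정입니다.&lt;/p&gt;

```python
import json
import urllib.request

def build_userinfo_request(userinfo_endpoint, access_token):
    # 1단계: access_token을 Bearer 헤더에 실어 보내는 /userinfo 요청 객체 생성
    return urllib.request.Request(
        userinfo_endpoint,
        headers={"Authorization": "Bearer " + access_token},
    )

def fetch_userinfo(userinfo_endpoint, access_token):
    # 2단계: 호출 시점의 최신 프로필(JSON)을 반환. scope에 따라 필드가 달라짐
    req = build_userinfo_request(userinfo_endpoint, access_token)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;실제 엔드포인트 URL은 하드코딩하지 않고 .well-known/openid-configuration의 userinfo_endpoint 필드에서 조회하는 것이 표준입니다.&lt;/p&gt;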
&lt;p data-ke-size=&quot;size16&quot;&gt;/userinfo라는 엔드포인트 이름 자체도 &lt;b&gt;OIDC Core 1.0&lt;/b&gt; 사양(섹션 5.3)에 명시된 &lt;b&gt;표준 필드&lt;/b&gt;(userinfo_endpoint)입니다. 클라이언트는 .well-known/openid-configuration에서 이 URL을 동적으로 조회(discovery)하도록 되어 있고, 실제 경로는 /userinfo가 아니어도 메타데이터에 정의된 대로 호출하면 됩니다. 즉, /userinfo는 OIDC 스펙에 포함된 공식 기능이며, &amp;ldquo;스펙 확장&amp;rdquo;이 아니라 인증과 프로필 조회를 분리하기 위해 설계된 표준입니다&lt;/p&gt;</description>
      <category>System Architect</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/306</guid>
      <comments>https://sevity.tistory.com/306#entry306comment</comments>
      <pubDate>Wed, 18 Jun 2025 01:50:19 +0900</pubDate>
    </item>
    <item>
      <title>outbox 패턴</title>
      <link>https://sevity.tistory.com/305</link>
      <description>&lt;p data-end=&quot;228&quot; data-start=&quot;125&quot; data-ke-size=&quot;size16&quot;&gt;&amp;ldquo;아웃박스 패턴은 분산 트랜잭션을 쓰지 않고도 데이터베이스 업데이트와 외부 시스템(Kafka나 REST API 등) 호출을 &lt;b&gt;사실상 하나의 트랜잭션&lt;/b&gt;처럼 보이게 하는 기법입니다.&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;503&quot; data-start=&quot;231&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;306&quot; data-start=&quot;231&quot;&gt;비즈니스 로직 DB 쓰기와 동일한 DB 트랜잭션에서 &amp;lsquo;Outbox&amp;rsquo; 테이블에 이벤트 메시지를 &lt;b&gt;append&lt;/b&gt;(INSERT)&lt;/li&gt;
&lt;li data-end=&quot;357&quot; data-start=&quot;309&quot;&gt;트랜잭션 커밋 시점에 비즈니스 데이터와 Outbox INSERT가 함께 커밋됨&lt;/li&gt;
&lt;li data-end=&quot;431&quot; data-start=&quot;360&quot;&gt;별도 프로세스(Outbox Poller)가 이 테이블을 폴링해서 메시지를 읽고 Kafka 프로듀서나 외부 API를 호출&lt;/li&gt;
&lt;li data-end=&quot;503&quot; data-start=&quot;434&quot;&gt;호출 성공 시 Outbox 레코드의 status=PROCESSED(또는 sent_at) 같은 플래그를 업데이트&amp;rdquo;&lt;/li&gt;
&lt;/ol&gt;
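&lt;p data-ke-size=&quot;size16&quot;&gt;위 1~4단계를 sqlite3로 스케치하면 다음과 같습니다. 테이블 스키마와 이벤트 이름은 설명용 가정이고, publish 자리에 실제 Kafka 프로듀서나 외부 API 호출이 들어갑니다.&lt;/p&gt;

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'PENDING'
);
""")

def place_order(order_id, amount):
    # 1~2단계: 비즈니스 쓰기와 Outbox INSERT를 같은 트랜잭션에서 함께 커밋
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "OrderPlaced", "order_id": order_id}),),
        )

def poll_outbox(publish):
    # 3~4단계: PENDING 레코드를 읽어 발행하고, 성공 시 PROCESSED로 마킹
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE status = 'PENDING' ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # Kafka 프로듀서/외부 API 호출 자리(가정)
        with conn:
            conn.execute(
                "UPDATE outbox SET status = 'PROCESSED' WHERE id = ?", (row_id,)
            )
```

&lt;p data-ke-size=&quot;size16&quot;&gt;발행 성공 후 마킹 전에 장애가 나면 같은 메시지가 다시 발행될 수 있으므로, 소비 측 멱등 처리가 함께 필요합니다.&lt;/p&gt;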
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;ldquo;Outbox 패턴으로 비즈니스 데이터와 이벤트 기록을 &lt;b&gt;하나의 DB 트랜잭션&lt;/b&gt;에 묶어 at-least-once(최소 1회) 전송을 확보합니다.&lt;br /&gt;이후 Outbox Poller가 Kafka에 이벤트를 보낼 때는 &lt;b&gt;Idempotent Producer&lt;/b&gt;(enable.idempotence=true)와 &lt;b&gt;Kafka Transaction API&lt;/b&gt;를 활용해, &amp;lsquo;메시지 전송 + 소비 오프셋 커밋&amp;rsquo;을 하나의 카프카 트랜잭션으로 처리합니다.&lt;br /&gt;마지막으로, 소비 측에서 claimId를 키로 사용하는 idempotent 처리 로직을 적용하면, 네트워크 오류나 재시도 상황에서도 &lt;b&gt;정확히 한 번만&lt;/b&gt; downstream에 반영되는 구조를 완성할 수 있습니다.&amp;rdquo;&lt;/p&gt;</description>
      <category>System Architect</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/305</guid>
      <comments>https://sevity.tistory.com/305#entry305comment</comments>
      <pubDate>Mon, 16 Jun 2025 23:17:44 +0900</pubDate>
    </item>
    <item>
      <title>vector db</title>
      <link>https://sevity.tistory.com/304</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;div&gt;&lt;br /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 247px;&quot; border=&quot;1&quot; data-is-only-node=&quot;&quot; data-is-last-node=&quot;&quot; data-end=&quot;1304&quot; data-start=&quot;76&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 19px;&quot;&gt;
&lt;td style=&quot;height: 19px;&quot;&gt;특성&lt;/td&gt;
&lt;td style=&quot;height: 19px;&quot;&gt;pgvector&lt;/td&gt;
&lt;td style=&quot;height: 19px;&quot;&gt;FAISS&lt;/td&gt;
&lt;td style=&quot;height: 19px;&quot;&gt;Pinecone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 38px;&quot; data-end=&quot;581&quot; data-start=&quot;419&quot;&gt;
&lt;td style=&quot;height: 38px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;438&quot; data-start=&quot;419&quot;&gt;설정 난이도&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;496&quot; data-start=&quot;438&quot; data-col-size=&quot;md&quot;&gt;● 중간&lt;br /&gt;&amp;ndash; PostgreSQL 설치 및 CREATE EXTENSION vector 필요&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;544&quot; data-start=&quot;496&quot; data-col-size=&quot;sm&quot;&gt;● 낮음&lt;br /&gt;&amp;ndash; pip install faiss-cpu 로 즉시 사용 가능&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;581&quot; data-start=&quot;544&quot; data-col-size=&quot;sm&quot;&gt;● 낮음&lt;br /&gt;&amp;ndash; 라이브러리 설치 후 API 키 설정 필요&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 38px;&quot; data-end=&quot;726&quot; data-start=&quot;582&quot;&gt;
&lt;td style=&quot;height: 38px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;599&quot; data-start=&quot;582&quot;&gt;메타데이터 필터링&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;650&quot; data-start=&quot;599&quot; data-col-size=&quot;md&quot;&gt;● 강력&lt;br /&gt;&amp;ndash; SQL WHERE&amp;middot;JOIN으로 사전 필터링 가능&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;688&quot; data-start=&quot;650&quot; data-col-size=&quot;sm&quot;&gt;● 제한적&lt;br /&gt;&amp;ndash; 검색 후 애플리케이션 레벨 후처리 필요&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;726&quot; data-start=&quot;688&quot; data-col-size=&quot;sm&quot;&gt;● 지원&lt;br /&gt;&amp;ndash; 쿼리 시 메타데이터 필터 인수로 사용 가능&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 38px;&quot; data-end=&quot;872&quot; data-start=&quot;727&quot;&gt;
&lt;td style=&quot;height: 38px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;749&quot; data-start=&quot;727&quot;&gt;비용&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;795&quot; data-start=&quot;749&quot; data-col-size=&quot;md&quot;&gt;● 무료 오픈소스&lt;br /&gt;&amp;ndash; 인프라(서버) 비용 별도&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;833&quot; data-start=&quot;795&quot; data-col-size=&quot;sm&quot;&gt;● 무료 오픈소스&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;872&quot; data-start=&quot;833&quot; data-col-size=&quot;sm&quot;&gt;● 유료 매니지드 서비스&lt;br /&gt;&amp;ndash; 사용량 기반 과금&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 38px;&quot; data-end=&quot;1014&quot; data-start=&quot;873&quot;&gt;
&lt;td style=&quot;height: 38px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;891&quot; data-start=&quot;873&quot;&gt;네트워크 의존도&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;939&quot; data-start=&quot;891&quot; data-col-size=&quot;md&quot;&gt;● 로컬/사내 네트워크&lt;br /&gt;&amp;ndash; DB 서버 필요(실습 시 로컬도 가능)&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;976&quot; data-start=&quot;939&quot; data-col-size=&quot;sm&quot;&gt;● 오프라인 지원&lt;br /&gt;&amp;ndash; 완전 로컬 실행 가능&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;1014&quot; data-start=&quot;976&quot; data-col-size=&quot;sm&quot;&gt;● 원격 호출 필수&lt;br /&gt;&amp;ndash; 네트워크 레이턴시 존재&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 38px;&quot; data-end=&quot;1161&quot; data-start=&quot;1015&quot;&gt;
&lt;td style=&quot;height: 38px;&quot; data-col-size=&quot;sm&quot; data-end=&quot;1034&quot; data-start=&quot;1015&quot;&gt;운영&amp;middot;모니터링&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;1088&quot; data-start=&quot;1034&quot; data-col-size=&quot;md&quot;&gt;● 기존 Postgres 툴&lt;br /&gt;(PgAdmin, Datadog 등) 활용&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;1124&quot; data-start=&quot;1088&quot; data-col-size=&quot;sm&quot;&gt;● 별도 구축 필요&lt;br /&gt;&amp;ndash; 모니터링&amp;middot;백업 전략 직접 수립&lt;/td&gt;
&lt;td style=&quot;height: 38px;&quot; data-end=&quot;1161&quot; data-start=&quot;1124&quot; data-col-size=&quot;sm&quot;&gt;● 관리형 대시보드&amp;middot;모니터링 제공&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;저장공간&lt;/td&gt;
&lt;td&gt;● Postgres 테이블 내에 벡터∙메타데이터 저장&lt;br /&gt;&amp;ndash; DB 크기에 비례&lt;br /&gt;&amp;ndash; 백업/압축 툴 활용 가능&lt;/td&gt;
&lt;td&gt;● 기본은 메모리 인덱스&lt;br /&gt;&amp;ndash; save_local() 시 로컬 파일(index.faiss) 생성&lt;br /&gt;&amp;ndash; 디스크 사용량 &amp;asymp; 4바이트&amp;times;dim&amp;times;N + 인덱스 오버헤드&lt;br /&gt;&amp;ndash; PQ 등 압축 옵션 가능&lt;/td&gt;
&lt;td&gt;● 매니지드 스토리지&lt;br /&gt;&amp;ndash; 사용량 기반 과금에 스토리지 포함&lt;br /&gt;&amp;ndash; 자동 복제&amp;middot;압축 옵션 제공&lt;br /&gt;&amp;ndash; 백업&amp;middot;고가용성 내장&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
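&lt;p data-ke-size=&quot;size16&quot;&gt;표의 세 시스템이 공통으로 제공하는 핵심 연산(코사인 유사도 top-k 검색)을 넘파이 브루트포스로 스케치하면 아래와 같다. 정규화된 벡터에 대한 내적이므로, FAISS의 IndexFlatIP가 수행하는 전수 탐색과 같은 계산에 해당한다.&lt;/p&gt;

```python
import numpy as np

def top_k_cosine(query, vectors, k=3):
    # 코사인 유사도 = 정규화된 벡터끼리의 내적
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # 점수 내림차순 상위 k개의 (인덱스, 점수) 반환
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]
```

&lt;p data-ke-size=&quot;size16&quot;&gt;pgvector&amp;middot;FAISS&amp;middot;Pinecone은 결국 이 계산을 대규모 데이터에서 빠르게 하기 위한 인덱싱&amp;middot;운영 계층이라는 점에서 비교할 수 있다.&lt;/p&gt;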
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Programming/LLM RAG</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/304</guid>
      <comments>https://sevity.tistory.com/304#entry304comment</comments>
      <pubDate>Sat, 14 Jun 2025 22:40:09 +0900</pubDate>
    </item>
    <item>
      <title>Pinecone</title>
      <link>https://sevity.tistory.com/303</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;순서&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;먼저 &lt;a href=&quot;https://app.pinecone.io/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://app.pinecone.io/&lt;/a&gt; 방문해서 api-key를 생성한다.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1079&quot; data-origin-height=&quot;522&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/elZ5IA/btsOALgaLwi/ZkdiszWi6is9K2eItQ7pj1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/elZ5IA/btsOALgaLwi/ZkdiszWi6is9K2eItQ7pj1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/elZ5IA/btsOALgaLwi/ZkdiszWi6is9K2eItQ7pj1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FelZ5IA%2FbtsOALgaLwi%2FZkdiszWi6is9K2eItQ7pj1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1079&quot; height=&quot;522&quot; data-origin-width=&quot;1079&quot; data-origin-height=&quot;522&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;vs pgvector&lt;/b&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;2027&quot; data-end=&quot;2145&quot;&gt;&lt;b&gt;pgvector&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc; color: #333333; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-start=&quot;2046&quot; data-end=&quot;2089&quot;&gt;이미 PostgreSQL 기반 인프라가 있고, 자체 호스팅을 선호할 때&lt;/li&gt;
&lt;li data-start=&quot;2092&quot; data-end=&quot;2145&quot;&gt;SQL과 벡터를&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;한 곳에서&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;관리하며, 커스터마이징&amp;middot;확장성을 직접 책임지고 싶을 때&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2026&quot; data-start=&quot;1926&quot;&gt;&lt;b&gt;Pinecone&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2026&quot; data-start=&quot;1945&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1981&quot; data-start=&quot;1945&quot;&gt;&amp;ldquo;운영 부담 없이&amp;rdquo; 곧바로 대규모 서비스 전환이 필요할 때&lt;/li&gt;
&lt;li data-end=&quot;2026&quot; data-start=&quot;1984&quot;&gt;메타데이터&amp;middot;하이브리드 검색&amp;middot;자동 스케일링 같은 고급 기능을 즉시 활용&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;실습&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1749980747782&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;#!/usr/bin/env python3
&quot;&quot;&quot;
demo_pinecone.py ─ Pinecone 4.x + OpenAI 임베딩 실습
&quot;&quot;&quot;

import os, sys, time, logging
from dotenv import load_dotenv
import openai
from pinecone import Pinecone, ServerlessSpec      # ⬅️ 새 방식

# ──────── 로그 설정 ────────
logging.basicConfig(
    level=logging.INFO,
    format=&quot;%(asctime)s [%(levelname)s] %(message)s&quot;,
    datefmt=&quot;%H:%M:%S&quot;,
)
log = logging.getLogger(__name__)

# ──────── 환경 변수 읽기 ────────
load_dotenv()
try:
    OPENAI_API_KEY  = os.environ[&quot;OPENAI_API_KEY&quot;]
    PINECONE_API_KEY = os.environ[&quot;PINECONE_API_KEY&quot;]
    PINECONE_ENV     = os.environ[&quot;PINECONE_ENV&quot;]    # ex) us-east-1-aws
except KeyError as e:
    log.error(&quot;환경변수 %s 가 없습니다 (.env 확인)&quot;, e.args[0]); sys.exit(1)

openai.api_key = OPENAI_API_KEY

# Pinecone 인스턴스 생성
pc = Pinecone(api_key=PINECONE_API_KEY)

# env 문자열을 region / cloud 로 분해 (us-east-1-aws &amp;rarr; us-east-1 + aws)
*region, cloud = PINECONE_ENV.split(&quot;-&quot;)
REGION = &quot;-&quot;.join(region)   # us-east-1
CLOUD  = cloud              # aws

INDEX  = &quot;demo-index&quot;
DIM    = 1536

# ──────── 인덱스 생성 / 재사용 ────────
def wait_ready(name):
    while True:
        state = pc.describe_index(name).status.state
        if state == &quot;Ready&quot;:
            return
        log.info(&quot;   ↳ index status = %s &amp;hellip; 대기 중&quot;, state)
        time.sleep(2)

if INDEX not in pc.list_indexes().names():
    log.info(&quot;새 인덱스 생성: %s&quot;, INDEX)
    pc.create_index(
        name=INDEX,
        dimension=DIM,
        metric=&quot;cosine&quot;,
        spec=ServerlessSpec(cloud=CLOUD, region=REGION),
    )
    wait_ready(INDEX)
else:
    log.info(&quot;기존 인덱스 재사용: %s&quot;, INDEX)

index = pc.Index(INDEX)

# ──────── 데이터 업서트 ────────
DOCS = [
    &quot;쿠팡은 한국 최대의 전자상거래 기업이다.&quot;,
    &quot;파인콘은 벡터 데이터베이스 서비스다.&quot;,
    &quot;오픈AI는 GPT-4o 모델을 발표했다.&quot;,
    &quot;서울의 여름은 덥고 습하다.&quot;,
    &quot;벡터 검색은 의미 기반 유사도를 계산한다.&quot;,
]

def embed(texts):
    resp = openai.embeddings.create(
        model=&quot;text-embedding-3-small&quot;,
        input=texts,
    )
    return [d.embedding for d in resp.data]

log.info(&quot;문서 %d개 임베딩 &amp;rarr; Pinecone 업서트&quot;, len(DOCS))
vecs = embed(DOCS)
index.upsert(
    vectors=[(f&quot;id-{i}&quot;, v, {&quot;text&quot;: DOCS[i]}) for i, v in enumerate(vecs)]
)

# ──────── 질의 ────────
QUESTION = &quot;유사도 검색을 위한 데이터베이스&quot;
log.info(&quot;쿼리: &amp;ldquo;%s&amp;rdquo;&quot;, QUESTION)
q_vec = embed([QUESTION])[0]

res = index.query(vector=q_vec, top_k=3, include_metadata=True)
log.info(&quot;결과 (Top-3):&quot;)
for rnk, m in enumerate(res.matches, 1):
    log.info(&quot; %d. %s (score=%.4f)&quot;, rnk, m.metadata[&quot;text&quot;], m.score)

log.info(&quot;완료 ✅&quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Programming/LLM RAG</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/303</guid>
      <comments>https://sevity.tistory.com/303#entry303comment</comments>
      <pubDate>Fri, 13 Jun 2025 21:57:36 +0900</pubDate>
    </item>
    <item>
      <title>LangChain/LangGraph</title>
      <link>https://sevity.tistory.com/302</link>
      <description>&lt;h3 data-ke-size=&quot;size23&quot;&gt;LangChain 개요&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;LangChain은 단순한 &amp;ldquo;추론(inference) 엔진&amp;rdquo;을 넘어 LLM 애플리케이션을 짜는 데 필요한 거의 모든 구성 요소를 모아놓은 프레임워크입니다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;865&quot; data-start=&quot;810&quot;&gt;&lt;b&gt;추론 파이프라인&lt;/b&gt;을 단계별로 나눠(입력&amp;rarr;검색&amp;rarr;결합&amp;rarr;생성&amp;rarr;후처리) 관리하기 쉽게 해 줍니다.&lt;/li&gt;
&lt;li data-end=&quot;955&quot; data-start=&quot;866&quot;&gt;벡터 DB 통합도 &lt;b&gt;플러그인처럼 끼워 쓰는 수준&lt;/b&gt;으로 제공해, FAISS&amp;middot;Pinecone&amp;middot;Weaviate&amp;middot;pgvector 등과 바로 연결할 수 있습니다.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 data-end=&quot;1485&quot; data-start=&quot;1458&quot; data-ke-size=&quot;size26&quot;&gt;사용자 규모 (GitHub Stars 기준)&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1766&quot; data-start=&quot;1487&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1558&quot; data-start=&quot;1487&quot;&gt;&lt;b&gt;LangChain&lt;/b&gt;: 109,000+ stars &lt;a href=&quot;https://github.com/langchain-ai/langchain&quot;&gt;github.com&lt;/a&gt;&lt;/li&gt;
&lt;li data-end=&quot;1624&quot; data-start=&quot;1559&quot;&gt;&lt;b&gt;vLLM&lt;/b&gt;: 49,500+ stars &lt;a href=&quot;https://github.com/vllm-project/vllm&quot;&gt;github.com&lt;/a&gt;&lt;/li&gt;
&lt;li data-end=&quot;1692&quot; data-start=&quot;1625&quot;&gt;&lt;b&gt;SGLang&lt;/b&gt;: 15,100+ stars &lt;a href=&quot;https://github.com/sgl-project/sglang&quot;&gt;github.com&lt;/a&gt;&lt;/li&gt;
&lt;li data-end=&quot;1766&quot; data-start=&quot;1693&quot;&gt;&lt;b&gt;TensorRT-LLM&lt;/b&gt;: 10,700+ stars &lt;a href=&quot;https://github.com/NVIDIA/TensorRT-LLM&quot;&gt;github.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;vLLM과의 차이점&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;vLLM이 &amp;ldquo;고성능 텍스트 생성&amp;rdquo;에 집중한 반면, LangChain은 그 &lt;b&gt;앞단&amp;middot;옆단&lt;/b&gt;을 모두 지원해 줍니다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;벡터 DB(RAG)&amp;middot;도구 호출&amp;middot;에이전트 지원&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;vLLM은 &quot;생성&quot;만 잘함&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;LangChain은 오히려 자체 추론 엔진 없음(백엔드에 vLLM&amp;middot;OpenAI&amp;middot;Llama&amp;middot;TensorRT 등 연결)&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Transformer + FastAPI와의 차이점&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1907&quot; data-start=&quot;1762&quot;&gt;&lt;b&gt;Transformer+FastAPI&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1907&quot; data-start=&quot;1793&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1853&quot; data-start=&quot;1793&quot;&gt;모델 불러오고, POST /generate 엔드포인트 만들어서, 분절된 코드를 직접 연결해야 함.&lt;/li&gt;
&lt;li data-end=&quot;1907&quot; data-start=&quot;1856&quot;&gt;검색(RAG)&amp;middot;메모리&amp;middot;툴 호출 같은 부가 기능은 모두 &amp;ldquo;맨땅에 헤딩&amp;rdquo;으로 처음부터 구현.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2034&quot; data-start=&quot;1909&quot;&gt;&lt;b&gt;LangChain&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2034&quot; data-start=&quot;1930&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2004&quot; data-start=&quot;1930&quot;&gt;PromptTemplate, Retriever, Chain, Agent 클래스로 &amp;ldquo;블록 쌓듯&amp;rdquo; 조합만 하면 끝.&lt;/li&gt;
&lt;li data-end=&quot;2034&quot; data-start=&quot;2007&quot;&gt;코드량 절감과 유지보수성이 월등히 높아집니다.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
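&lt;p data-ke-size=&quot;size16&quot;&gt;체인 개념만 떼어 순수 파이썬으로 스케치하면 아래와 같습니다. LangChain의 실제 API가 아니라, 검색→결합→생성 단계가 모듈로 나뉘어 조합되는 구조를 보여 주는 개념 예시입니다.&lt;/p&gt;

```python
def retrieve(question, docs):
    # 검색 단계: 질문과 단어가 겹치는 문서를 고르는 단순 검색기(실제로는 벡터 DB)
    q_words = set(question.split())
    scored = [(len(q_words.intersection(d.split())), d) for d in docs]
    scored.sort(reverse=True)
    return [d for s, d in scored if s > 0]

def combine(question, contexts):
    # 결합 단계: 검색 결과를 프롬프트에 끼워 넣기(PromptTemplate 역할)
    return "context:\n" + "\n".join(contexts) + "\n\nquestion: " + question

def generate(prompt):
    # 생성 단계: LLM 호출 자리(여기서는 더미. 백엔드로 vLLM, OpenAI 등을 연결)
    return "[LLM output for %d-char prompt]" % len(prompt)

def chain(question, docs):
    # 체인: 입력 → 검색 → 결합 → 생성 → 후처리를 하나의 파이프라인으로
    return generate(combine(question, retrieve(question, docs))).strip()
```

&lt;p data-ke-size=&quot;size16&quot;&gt;실제 LangChain에서는 retrieve가 Retriever, combine이 PromptTemplate, generate가 LLM 래퍼에 해당하고, 이들을 체인으로 묶어 조합합니다.&lt;/p&gt;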
&lt;h3 data-ke-size=&quot;size23&quot;&gt;주요장점&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2123&quot; data-start=&quot;2066&quot;&gt;&lt;b&gt;Prompt template management&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2123&quot; data-start=&quot;2092&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2123&quot; data-start=&quot;2092&quot;&gt;Variable binding, multilingual support, and even conditional logic built in&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2217&quot; data-start=&quot;2124&quot;&gt;&lt;b&gt;Chain-based workflows&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2217&quot; data-start=&quot;2155&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2217&quot; data-start=&quot;2155&quot;&gt;Modularizes the Retrieval &amp;rarr; Summarization &amp;rarr; Generation pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2317&quot; data-start=&quot;2218&quot;&gt;&lt;b&gt;Vector DB integration&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2317&quot; data-start=&quot;2239&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2285&quot; data-start=&quot;2239&quot;&gt;Connectors for FAISS&amp;middot;Pinecone&amp;middot;Weaviate&amp;middot;pgvector and more&lt;/li&gt;
&lt;li data-end=&quot;2317&quot; data-start=&quot;2289&quot;&gt;Automatically merges similarity-search results into the prompt&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2427&quot; data-start=&quot;2318&quot;&gt;&lt;b&gt;Agent framework&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2427&quot; data-start=&quot;2348&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2392&quot; data-start=&quot;2348&quot;&gt;Wraps external API calls, calculators, even web scraping as &amp;ldquo;tools&amp;rdquo;&lt;/li&gt;
&lt;li data-end=&quot;2427&quot; data-start=&quot;2396&quot;&gt;Lets the model call the appropriate tool depending on the situation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2488&quot; data-start=&quot;2428&quot;&gt;&lt;b&gt;Memory&amp;middot;conversation management&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2488&quot; data-start=&quot;2450&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2488&quot; data-start=&quot;2450&quot;&gt;Session management, summarization, long-term memory, and more for conversational chatbots&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2546&quot; data-start=&quot;2489&quot;&gt;&lt;b&gt;Logging&amp;middot;debugging&amp;middot;monitoring&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2546&quot; data-start=&quot;2513&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2546&quot; data-start=&quot;2513&quot;&gt;Execution traces, token usage, per-step chain output logs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;2604&quot; data-start=&quot;2547&quot;&gt;&lt;b&gt;Plugin ecosystem&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2604&quot; data-start=&quot;2570&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2604&quot; data-start=&quot;2570&quot;&gt;Rich custom components, UI integrations, cloud deployment options&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
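The &ldquo;merge similarity-search results into the prompt&rdquo; step that LangChain automates boils down to something like the following hand-rolled sketch. It uses a toy cosine similarity over a linear scan rather than any particular vector DB's API; the helper names `top_k` and `build_prompt` are my own, and the embeddings are made-up toy vectors.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 for a zero vector.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, indexed_docs, k=2):
    # indexed_docs: list of (embedding, text). A real setup would use
    # FAISS/Pinecone/pgvector here instead of a linear scan.
    scored = sorted(indexed_docs, key=lambda d: cosine(query_vec, d[0]), reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(query_vec, question, indexed_docs):
    # Paste the nearest documents into the prompt as context.
    context = "\n".join(top_k(query_vec, indexed_docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

docs = [([1.0, 0.0], "Doc about GPUs"), ([0.0, 1.0], "Doc about databases"),
        ([0.9, 0.1], "Doc about CUDA")]
print(build_prompt([1.0, 0.0], "What is CUDA?", docs))
```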
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Pain points without LangChain&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;2872&quot; data-start=&quot;2638&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;2685&quot; data-start=&quot;2638&quot;&gt;&lt;b&gt;Repetitive code&lt;/b&gt;: every time you write or change a prompt, the whole end-to-end code needs editing&lt;/li&gt;
&lt;li data-end=&quot;2730&quot; data-start=&quot;2686&quot;&gt;&lt;b&gt;Similarity search&lt;/b&gt;: vector DB connection and search&amp;rarr;prompt merging logic implemented by hand&lt;/li&gt;
&lt;li data-end=&quot;2788&quot; data-start=&quot;2731&quot;&gt;&lt;b&gt;Tool-call management&lt;/b&gt;: OpenAI function calling and systematic error handling for external APIs, all manual&lt;/li&gt;
&lt;li data-end=&quot;2830&quot; data-start=&quot;2789&quot;&gt;&lt;b&gt;Memory management&lt;/b&gt;: accumulating&amp;middot;summarizing&amp;middot;recalling conversation context by hand&lt;/li&gt;
&lt;li data-end=&quot;2872&quot; data-start=&quot;2831&quot;&gt;&lt;b&gt;Hard to debug&lt;/b&gt;: difficult to trace which step went wrong&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-end=&quot;2944&quot; data-start=&quot;2874&quot; data-ke-size=&quot;size16&quot;&gt;&amp;rarr; The net cost is &lt;b&gt;slower development&lt;/b&gt;, &lt;b&gt;heavier maintenance&lt;/b&gt;, and &lt;b&gt;harder feature expansion&lt;/b&gt;.&lt;/p&gt;
&lt;p data-end=&quot;2944&quot; data-start=&quot;2874&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-end=&quot;2944&quot; data-start=&quot;2874&quot; data-ke-size=&quot;size23&quot;&gt;LangGraph&lt;/h3&gt;
&lt;p data-end=&quot;2944&quot; data-start=&quot;2874&quot; data-ke-size=&quot;size16&quot;&gt;If LangChain is essentially a linear chain, LangGraph lets you branch and manage the workflow as a DAG, much like Airflow&lt;/p&gt;
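The DAG idea can be sketched in plain Python. This is a conceptual illustration, not the real LangGraph API: nodes are just functions over a shared state dict, and a per-node router decides the next node, which is how branching differs from a straight chain. All the node names here are made up for the sketch.

```python
# Conceptual sketch of DAG-style branching (NOT the real LangGraph API):
# each node is a function, and edges are conditional, so the flow can
# branch the way an Airflow DAG does instead of running one fixed chain.

def classify(state):
    # Route questions to "search", everything else to "chat".
    state["route"] = "search" if "?" in state["input"] else "chat"
    return state

def search(state):
    state["output"] = f"searched: {state['input']}"
    return state

def chat(state):
    state["output"] = f"chatted: {state['input']}"
    return state

# node name -> (node function, router returning the next node or None to stop)
graph = {
    "classify": (classify, lambda s: s["route"]),
    "search": (search, lambda s: None),
    "chat": (chat, lambda s: None),
}

def run(graph, entry, state):
    node = entry
    while node is not None:
        fn, router = graph[node]
        state = fn(state)
        node = router(state)
    return state

print(run(graph, "classify", {"input": "what is RAG?"})["output"])
```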
&lt;p data-end=&quot;2944&quot; data-start=&quot;2874&quot; data-ke-size=&quot;size16&quot;&gt;Released by the LangChain company around late 2023&amp;ndash;early 2024&lt;/p&gt;</description>
      <category>Programming/LLM RAG</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/302</guid>
      <comments>https://sevity.tistory.com/302#entry302comment</comments>
      <pubDate>Fri, 13 Jun 2025 21:51:34 +0900</pubDate>
    </item>
    <item>
      <title>TensorRT-LLM</title>
      <link>https://sevity.tistory.com/301</link>
<description>&lt;p data-ke-size=&quot;size16&quot;&gt;Characteristics&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;TensorRT-LLM is a project officially open-sourced by NVIDIA.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It runs only on NVIDIA GPUs with the TensorRT runtime.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Advantages&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1628&quot; data-start=&quot;1566&quot;&gt;&lt;b&gt;Top-tier inference performance&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1628&quot; data-start=&quot;1591&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1628&quot; data-start=&quot;1591&quot;&gt;Low latency&amp;middot;high throughput&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1687&quot; data-start=&quot;1629&quot;&gt;&lt;b&gt;Quantization&amp;middot;memory optimization&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1687&quot; data-start=&quot;1656&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1687&quot; data-start=&quot;1656&quot;&gt;FP16/INT8 conversion cuts VRAM usage&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;1742&quot; data-start=&quot;1688&quot;&gt;&lt;b&gt;Gets the most out of the GPU&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;1742&quot; data-start=&quot;1714&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1742&quot; data-start=&quot;1714&quot;&gt;Direct control over the NVIDIA driver&amp;middot;TensorRT&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Disadvantages&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;231&quot; data-start=&quot;158&quot;&gt;vLLM can be started with nothing more than pip install vllm, and even without a GPU you can try it right away in CPU mode.&lt;/li&gt;
&lt;li data-end=&quot;305&quot; data-start=&quot;234&quot;&gt;TensorRT-LLM requires working through CUDA and TensorRT SDK version compatibility, plus debugging the build&amp;middot;conversion scripts.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;305&quot; data-start=&quot;234&quot;&gt;A one-time model conversion is also required&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;412&quot; data-start=&quot;324&quot;&gt;With vLLM, dynamic batching, tokenizer changes, and pipeline hooks take effect during development with a simple code change.&lt;/li&gt;
&lt;li data-end=&quot;495&quot; data-start=&quot;415&quot;&gt;With TensorRT-LLM, the converted engine (.plan) is a fixed graph, so changing the model structure or tokenizer means converting all over again.&lt;/li&gt;
&lt;li data-end=&quot;589&quot; data-start=&quot;518&quot;&gt;&lt;b&gt;TensorRT-LLM is NVIDIA-only&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;589&quot; data-start=&quot;544&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;589&quot; data-start=&quot;544&quot;&gt;It runs only on NVIDIA GPUs with the TensorRT runtime.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-end=&quot;730&quot; data-start=&quot;590&quot;&gt;&lt;b&gt;vLLM is general-purpose&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;730&quot; data-start=&quot;608&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;666&quot; data-start=&quot;608&quot;&gt;It runs on NVIDIA GPUs, AMD GPUs (ROCm), or even CPU alone.&lt;/li&gt;
&lt;li data-end=&quot;730&quot; data-start=&quot;669&quot;&gt;It supports a &amp;ldquo;does it run?&amp;rdquo; &amp;rarr; &amp;ldquo;quick performance test&amp;rdquo; &amp;rarr; &amp;ldquo;move to production&amp;rdquo; workflow across many environments.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Therefore:&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;1454&quot; data-start=&quot;1426&quot;&gt;develop features quickly with &lt;b&gt;vLLM&lt;/b&gt;, and&lt;/li&gt;
&lt;li data-end=&quot;1501&quot; data-start=&quot;1457&quot;&gt;once things stabilize, switch to &lt;b&gt;TensorRT-LLM&lt;/b&gt; to squeeze out final performance&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Hands-on experience&lt;/h3&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Installing TensorRT-LLM&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Installation itself was not easy: there were dependency issues with the C++ compiler, NVCC, MPI, and more.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;After installing, the workflow was to download the Llama 2 7B model, convert it, build the engine, start the server, and then test.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Model conversion appears to rewrite the Hugging Face storage format (checkpoint) into TensorRT's format,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;and the engine build seems to mean compiling it down to the kernel level and building the engine from that.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Downloading the Llama 2 7B model&lt;/h4&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Even the download required requesting access on Hugging Face and getting email approval before git clone would work. Approval did come quickly, within about an hour.&lt;/p&gt;
&lt;pre id=&quot;code_1749828454855&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;mkdir -p ~/models &amp;amp;&amp;amp; cd ~/models
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-hf&lt;/code&gt;&lt;/pre&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Model conversion&lt;/h4&gt;
&lt;pre id=&quot;code_1749828408742&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;cd ~/workspace/TensorRT-LLM/examples/models/core/llama

python3 convert_checkpoint.py \
  --model_dir ~/models/Llama-2-7b-hf \
  --output_dir ~/workspace/TensorRT-LLM/trt_llama2_7b_ckpt_int8 \
  --dtype float16 \
  --use_weight_only \
  --weight_only_precision int8 \
  --per_channel \
  --calib_dataset wikitext \
  --calib_size 100&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;My graphics card has only 8 GB, so an extra step to shrink the model further to INT8 was added.&lt;/p&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Engine build&lt;/h4&gt;
&lt;pre id=&quot;code_1749828360358&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;trtllm-build \
  --checkpoint_dir trt_llama2_7b_ckpt_int8 \
  --gemm_plugin auto \
  --output_dir trt_llama2_7b_engine_int8_small \
  --max_seq_len 2048 \
  --max_batch_size 1 \
  --paged_state enable&lt;/code&gt;&lt;/pre&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Starting the inference server&lt;/h4&gt;
&lt;pre id=&quot;code_1749828306056&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;trtllm-serve serve trt_llama2_7b_engine_int8_small \
  --tokenizer ~/models/Llama-2-7b-hf \
  --host 0.0.0.0 \
  --port 8002 \
  --max_batch_size 1 \
  --max_seq_len 1024&lt;/code&gt;&lt;/pre&gt;
&lt;h4 data-ke-size=&quot;size20&quot;&gt;Test&lt;/h4&gt;
&lt;pre id=&quot;code_1749828279258&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;curl http://localhost:8002/v1/completions \
  -H &quot;Content-Type: application/json&quot; \
  -d '{
    &quot;model&quot;: &quot;trt_llama2_7b_engine_int8_small&quot;,
    &quot;prompt&quot;: &quot;안녕, 너 이름이 뭐야?&quot;,
    &quot;max_tokens&quot;: 128
  }'&lt;/code&gt;&lt;/pre&gt;
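The same request can be driven from Python with just the standard library. The model name and port follow the trtllm-serve command above; `build_completion_payload` and `complete` are helper names I made up for this sketch, and the actual HTTP call of course needs the server running.

```python
import json
import urllib.request

def build_completion_payload(model: str, prompt: str, max_tokens: int = 128) -> dict:
    # Same JSON body as the curl example (OpenAI-style /v1/completions).
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def complete(base_url: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    payload = build_completion_payload(
        "trt_llama2_7b_engine_int8_small", "안녕, 너 이름이 뭐야?"
    )
    print(complete("http://localhost:8002", payload))  # needs the server up
```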
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Programming/LLM RAG</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/301</guid>
      <comments>https://sevity.tistory.com/301#entry301comment</comments>
      <pubDate>Fri, 13 Jun 2025 21:46:21 +0900</pubDate>
    </item>
    <item>
      <title>SGLang</title>
      <link>https://sevity.tistory.com/299</link>
<description>&lt;h3 data-ke-size=&quot;size23&quot;&gt;Overview&lt;/h3&gt;
&lt;div style=&quot;background-color: #ffffff; color: #000000; text-align: start;&quot;&gt;Why use sglang?&lt;/div&gt;
&lt;div style=&quot;background-color: #ffffff; color: #000000; text-align: start;&quot;&gt;- vLLM is fundamentally an inference engine for local Hugging Face models&lt;/div&gt;
&lt;div style=&quot;background-color: #ffffff; color: #000000; text-align: start;&quot;&gt;- SGLang can expose an OpenAI API endpoint with a single option like --provider openai&lt;/div&gt;
&lt;div style=&quot;background-color: #ffffff; color: #000000; text-align: start;&quot;&gt;-- Serving local models and OpenAI together, or routing between them conditionally, can also be handled with CLI/config alone&lt;/div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Hands-on&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Run the inference engine with Docker&lt;/p&gt;
&lt;pre id=&quot;code_1749799362762&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;docker run --gpus all \
  -p 8001:8001 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path gpt2 \
    --host 0.0.0.0 \
    --port 8001 \
    --device cuda&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Smoke test&lt;/p&gt;
&lt;pre id=&quot;code_1749799393191&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;curl -X POST http://localhost:8001/generate \
  -H &quot;Content-Type: application/json&quot; \
  -d '{&quot;text&quot;:&quot;테스트 중입니다:&quot;,&quot;sampling_params&quot;:{&quot;max_new_tokens&quot;:20}}'&lt;/code&gt;&lt;/pre&gt;
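As the captured result further down shows, SGLang's native /generate endpoint takes the prompt under `text` and the decoding options under `sampling_params`. Here is the same call from Python with only the standard library; `build_generate_payload` and `generate` are helper names I made up for this sketch.

```python
import json
import urllib.request

def build_generate_payload(text: str, max_new_tokens: int = 20) -> dict:
    # Prompt goes under "text", decoding options under "sampling_params".
    return {"text": text, "sampling_params": {"max_new_tokens": max_new_tokens}}

def generate(base_url: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Needs the Docker server above running on port 8001.
    print(generate("http://localhost:8001", build_generate_payload("테스트 중입니다:")))
```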
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Result&lt;/p&gt;
&lt;pre id=&quot;code_1749799429253&quot; class=&quot;bash&quot; data-ke-language=&quot;bash&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;sevity@DESKTOP-7500F:~$ curl -X POST http://localhost:8001/generate \
  -H &quot;Content-Type: application/json&quot; \
  -d '{&quot;text&quot;:&quot;테스트 중입니다:&quot;,&quot;sampling_params&quot;:{&quot;max_new_tokens&quot;:20}}'
  
{&quot;text&quot;:&quot;더 고 중입니다. �&quot;,&quot;meta_info&quot;:{&quot;id&quot;:&quot;e01db0f5a42f48d99e02e9d2bcf29216&quot;,&quot;finish_reason&quot;:{&quot;type&quot;:&quot;length&quot;,&quot;length&quot;:20},&quot;prompt_tokens&quot;:20,&quot;completion_tokens&quot;:20,&quot;cached_tokens&quot;:0,&quot;e2e_latency&quot;:0.3301219940185547}}
sevity@DESKTOP-7500F:~$&lt;/code&gt;&lt;/pre&gt;</description>
      <category>Programming/LLM RAG</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/299</guid>
      <comments>https://sevity.tistory.com/299#entry299comment</comments>
      <pubDate>Fri, 13 Jun 2025 16:23:54 +0900</pubDate>
    </item>
    <item>
      <title>트랜스포머</title>
      <link>https://sevity.tistory.com/298</link>
<description>&lt;p data-ke-size=&quot;size16&quot;&gt;A Transformer layer =&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;610&quot; data-start=&quot;477&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;511&quot; data-start=&quot;477&quot;&gt;Multi-Head Attention (특수 레이어)&lt;/li&gt;
&lt;li data-end=&quot;540&quot; data-start=&quot;514&quot;&gt;Residual + Layer Norm&lt;/li&gt;
&lt;li data-end=&quot;581&quot; data-start=&quot;543&quot;&gt;Feed-Forward (일반 Dense Layer)&lt;/li&gt;
&lt;li data-end=&quot;610&quot; data-start=&quot;584&quot;&gt;Residual + Layer Norm&lt;/li&gt;
&lt;/ol&gt;</description>
      <category>AI, ML/ML</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/298</guid>
      <comments>https://sevity.tistory.com/298#entry298comment</comments>
      <pubDate>Thu, 12 Jun 2025 22:52:43 +0900</pubDate>
    </item>
    <item>
      <title>vLLM</title>
      <link>https://sevity.tistory.com/297</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;Introduction&lt;/h3&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;If request volume is low and you simply want to &amp;ldquo;serve one model and handle a request or two,&amp;rdquo; transformers + FastAPI (or Flask) is enough&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Here the role of &amp;ldquo;serving the model&amp;rdquo; is played by Hugging Face's transformers library&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Example&lt;/p&gt;
&lt;pre id=&quot;code_1749734056275&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# main.py
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()

# 1) Load the model with a transformers pipeline
generator = pipeline(
    &quot;text-generation&quot;,
    model=&quot;gpt2&quot;,            # pick the model you want
    device=0                  # 0 if you have a GPU, -1 for CPU only
)

@app.post(&quot;/generate&quot;)
async def generate(payload: dict):
    prompt = payload[&quot;prompt&quot;]
    # 2) Generate text with transformers
    outputs = generator(prompt, max_new_tokens=50)
    return {&quot;choices&quot;: outputs}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Or, when using an external API such as OpenAI:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;If you call models remotely via the &amp;ldquo;OpenAI API&amp;rdquo; or &amp;ldquo;Anthropic API&amp;rdquo; without your own GPU, you don't need vLLM.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 data-ke-size=&quot;size23&quot;&gt;vLLM&lt;/h3&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-end=&quot;498&quot; data-start=&quot;19&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-end=&quot;498&quot; data-start=&quot;298&quot;&gt;&lt;b&gt;When vLLM shines&lt;/b&gt;:
&lt;ol style=&quot;list-style-type: decimal;&quot; data-end=&quot;498&quot; data-start=&quot;325&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-end=&quot;350&quot; data-start=&quot;325&quot;&gt;you have &lt;b&gt;your own GPU&lt;/b&gt;,&lt;/li&gt;
&lt;li data-end=&quot;383&quot; data-start=&quot;353&quot;&gt;you need &lt;b&gt;high concurrency&lt;/b&gt; (tens to hundreds of TPS) and&lt;/li&gt;
&lt;li data-end=&quot;416&quot; data-start=&quot;386&quot;&gt;&lt;b&gt;low latency&lt;/b&gt;, and&lt;/li&gt;
&lt;li data-end=&quot;498&quot; data-start=&quot;419&quot;&gt;you want optimizations like &lt;b&gt;dynamic batching&lt;/b&gt;, &lt;b&gt;request scheduling&lt;/b&gt;, and &lt;b&gt;streaming&lt;/b&gt;&lt;br /&gt;out of the box instead of implementing them yourself&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-is-only-node=&quot;&quot; data-is-last-node=&quot;&quot; data-end=&quot;554&quot; data-start=&quot;500&quot; data-ke-size=&quot;size16&quot;&gt;When all of these conditions line up, vLLM delivers real &amp;ldquo;production&amp;rdquo;-grade value.&lt;/p&gt;
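Of those optimizations, dynamic batching is the easiest to picture. The toy sketch below is a conceptual illustration of the idea only, not vLLM code: incoming requests are grouped into batches up to a maximum size so the GPU processes several prompts per forward pass instead of one at a time. The `batch_requests` helper is my own invention for the sketch.

```python
# Toy illustration of dynamic batching (conceptual sketch, not vLLM code):
# drain a queue of pending requests into batches of at most max_batch_size.
from collections import deque

def batch_requests(queue: deque, max_batch_size: int):
    """Group queued requests into batches of at most max_batch_size."""
    batches = []
    while queue:
        # len(queue) is re-read each round, so the last batch may be smaller.
        batch = [queue.popleft() for _ in range(min(max_batch_size, len(queue)))]
        batches.append(batch)
    return batches

requests = deque([f"prompt-{i}" for i in range(7)])
batches = batch_requests(requests, max_batch_size=3)
print([len(b) for b in batches])  # 7 requests -> batches of 3, 3, 1
```

A real scheduler like vLLM's additionally admits new requests into a running batch and evicts finished ones token by token (continuous batching), which is well beyond this sketch.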
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;/div&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
      <category>Programming/LLM RAG</category>
      <author>sevity</author>
      <guid isPermaLink="true">https://sevity.tistory.com/297</guid>
      <comments>https://sevity.tistory.com/297#entry297comment</comments>
      <pubDate>Thu, 12 Jun 2025 22:15:45 +0900</pubDate>
    </item>
  </channel>
</rss>