Rafi Hasan
    Tag: llm

    1 post

    April 2, 2026

    LLM Serving and the Bus That Never Stops

    In-flight batching is the trick that keeps LLM serving from wasting GPU seats.

    Tags: machine-learning, llm, inference

    All tags

    architecture (1), async (1), await (1), backpropagation (1), concurrency (3), deep-learning (1), generics (1), golang (3), inference (1), leaks (2), llm (1), machine-learning (2), memory (2), neural-network (1), nodejs (2), optimization (3), rails (1), ruby (1), rust (3), word-embedding (1), workerpool (2)

    © 2026 Rafi Hasan.