llama.cpp b9109
spec : parallel drafting support (#22838)

* spec : refactor
* spec : drop support for incompatible vocabs
* spec : update common_speculative_init()
* cont : pass seq_id
* cont : dedup ctx_seq_rm_type
* server : sketch the ctx_dft decode loop
* server : draft prompt cache and checkpoints
* server : improve ctx names
* server, spec : transition to unified spec context
* cont : sync main and drft contexts
* cont : async drft eval when possible
* cont : handle non-ckpt models
* cont : pass correct n_past for drafting
* cont : process images through the draft context
* spec : handle draft running out of context
* server : fix mtmd draft processing
* server : fix URL for draft model
* server : add comment
* server : clean-up + dry
* speculative-simple : update
* spec : fix n_past type
* server : fix slot ctx_drft ptr
* tools : update readme
* naming : improve consistency
* spec : refactor for multi-sequence speculative context
* cont : prepare params
* cont : prepare params
* spec : support parallel drafts
* server : support parallel drafting
* llama : reuse device buffers when possible
* server, spec : clean-up
* cont : clean-up
* cont : minor
* spec : reset `drafting` flag at the end
* spec : introduce `common_speculative_process()`
* spec : allow for…