This article introduces autonomous inferencing: a way to process large volumes of prompts or run compute-intensive agentic systems with performance that surpasses off-the-shelf inference solutions. Take a look at the benchmarks! | outerbounds.com
I’ve been running a podcast called The Work Item for close to half a decade now. Publishing new episodes generally takes a bit of time because of all the prep work that needs to happen beforehand. I now get to use AI to automate a pretty tedious part of the process. | den.dev
Benchmarking Llama 3.1 8B Instruct with vLLM, using BeFOri to measure time to first token (TTFT), inter-token latency, end-to-end latency, and throughput. | blog.ori.co
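For context on what those metrics mean in practice, here is a minimal Python sketch (not BeFOri itself) that derives them from chunk arrival times when streaming from a vLLM server's OpenAI-compatible completions endpoint. The endpoint URL, model name, and the assumption that each streamed chunk carries roughly one token are illustrative, not taken from the linked post.

```python
# Minimal sketch: measure TTFT, inter-token latency, end-to-end latency, and
# throughput by streaming one completion from a vLLM OpenAI-compatible server.
# The URL and model name are assumptions for illustration.
import json
import time

import requests


def measure_latency(prompt: str,
                    url: str = "http://localhost:8000/v1/completions",
                    model: str = "meta-llama/Llama-3.1-8B-Instruct"):
    """Stream one completion and record the arrival time of each chunk."""
    start = time.perf_counter()
    arrivals = []  # timestamps of each streamed SSE chunk

    with requests.post(
        url,
        json={"model": model, "prompt": prompt, "max_tokens": 128, "stream": True},
        stream=True,
        timeout=120,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            json.loads(payload)            # contents ignored; we only need timing
            arrivals.append(time.perf_counter())

    ttft = arrivals[0] - start                          # time to first token
    e2e = arrivals[-1] - start                          # end-to-end latency
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0        # mean inter-token latency
    # Approximate tokens/s, assuming roughly one token per streamed chunk.
    throughput = len(arrivals) / e2e
    return {"ttft_s": ttft, "itl_s": itl, "e2e_s": e2e, "tok_per_s": throughput}


if __name__ == "__main__":
    print(measure_latency("Explain the difference between TTFT and inter-token latency."))
```

A full benchmark would repeat this across many prompts and concurrency levels and report percentiles rather than a single run, which is the kind of sweep the linked post covers.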