A Ruby Backend for the Vercel AI SDK

If you have shipped an AI feature in the last two years, there is a good chance the chat box was a Vercel AI SDK hook. useChat, useCompletion, and useObject are the de facto front end for AI apps: they handle the streaming, the optimistic message list, the tool-call rendering, and the loading and error states, which is all the fiddly UI work you would otherwise rebuild for every project. Drop one in, point it at an endpoint, and you have a polished assistant interface in an afternoon.

The catch is the endpoint. Those hooks do not consume “some JSON.” They consume a very specific wire format, the Data Stream Protocol (also called the UI Message Stream Protocol), and they are strict about it. The protocol is deliberately language-agnostic, and Vercel documents a non-JavaScript backend (Python/FastAPI) speaking it. But the implementations that ship in the box are JavaScript and Python. If your backend was Rails, you were on your own: you had to hand-roll the Server-Sent Events framing and the exact part encoding by reading the TypeScript source, then keep it in sync as the protocol evolved.

So we built the missing piece: ai_stream, a pure-Ruby, zero-dependency encoder for that protocol. It lets a Rails or Rack backend stream text, reasoning, tool calls, sources, files, and custom data to a Vercel-AI-SDK front end with the exact frames it expects — so the polished useChat UI works in front of a Ruby app, unchanged.

What the protocol actually is

Underneath the marketing, the Data Stream Protocol is refreshingly simple, which is the whole reason a small gem can implement it faithfully. The response is a stream of Server-Sent Events. Each event is a single JSON object framed exactly like this:

data: {"type":"text-delta","id":"msg_1","delta":"Hi"}\n\n

The stream is terminated with a sentinel frame:

data: [DONE]\n\n

And the HTTP response must carry one header so the SDK treats the body as a UI message stream rather than something else:

x-vercel-ai-ui-message-stream: v1

Every meaningful event the front end can render is just a JSON object with a type and a handful of fields: start opens a message, text-start / text-delta / text-end stream a block of text token by token, tool calls have their own lifecycle of part types, and finish closes the message before [DONE] closes the connection. That is the entire job: emit the right JSON, frame it as SSE, send the header, terminate cleanly. The difficulty is not conceptual. It is that the SDK is unforgiving about field names (toolCallId, not tool_call_id; inputTextDelta, not delta, on tool-input chunks) and about ordering, and getting those wrong produces a front end that silently renders nothing. The value of a library here is that someone read the TypeScript carefully once so you do not have to.
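To make the framing concrete, here is a minimal sketch of it in plain Ruby, using only the standard library. The helper name sse_frame and the constant DONE_FRAME are made up for illustration and are not part of ai_stream; the part shapes are the ones shown in this article.

```ruby
require "json"

# Hypothetical helper (not the gem's API): frame one part hash as an SSE event.
def sse_frame(part)
  "data: #{JSON.generate(part)}\n\n"
end

# The sentinel that terminates the stream.
DONE_FRAME = "data: [DONE]\n\n"

# A complete, minimal text message in protocol order.
frames = [
  sse_frame(type: "start", messageId: "msg_1"),
  sse_frame(type: "text-start", id: "txt_1"),
  sse_frame(type: "text-delta", id: "txt_1", delta: "Hi"),
  sse_frame(type: "text-end", id: "txt_1"),
  sse_frame(type: "finish"),
  DONE_FRAME
]

puts frames.join
```

Each frame is one line of JSON after "data: " followed by a blank line, which is all the SSE framing amounts to.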

A Writer that does no IO

The lower of the gem's two layers is AiStream::Writer — the encoder. The design decision worth calling out is what it writes to: any object that responds to <<. A String, an IO, a Rack stream, an array buffer — the Writer does not care, and it performs no IO of its own beyond sink << frame.

That sounds like a small thing. It is the thing that makes the whole library trustworthy. Because the Writer is decoupled from any real connection, you can hand it a plain String and assert on the exact bytes it produces:

buf = +""
w   = AiStream::Writer.new(buf)
w.start
id = w.text_start
w.text_delta("Hello", id: id)
w.text_delta(" world", id: id)
w.text_end(id: id)
w.finish
w.done

buf
# => "data: {\"type\":\"start\",\"messageId\":\"...\"}\n\n" +
#    "data: {\"type\":\"text-start\",\"id\":\"...\"}\n\n" + ...

That is exactly how the gem tests itself — no mock server, no HTTP, no API key, just bytes in and bytes out. When a wire protocol is the entire contract, being able to assert on the wire directly is the difference between a library you can vouch for and one you hope works. The 31 tests in CI are all of this shape, and they run green across Ruby 3.0 through 3.4.
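The "anything that responds to <<" idea is easy to see in isolation. The sketch below uses a stand-in class, TinyWriter, which is hypothetical and is not the gem's Writer; it only illustrates why a String, an Array, and an IO-like object can all act as the sink.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for illustration only; not ai_stream's Writer.
class TinyWriter
  def initialize(sink)
    @sink = sink # any object that responds to <<
  end

  # Encode one part and append it to the sink. No other IO happens here.
  def emit(part)
    @sink << "data: #{JSON.generate(part)}\n\n"
    self
  end
end

string_sink = +""
array_sink  = []
io_sink     = StringIO.new

[string_sink, array_sink, io_sink].each do |sink|
  TinyWriter.new(sink).emit(type: "text-delta", id: "t1", delta: "Hi")
end

expected = %(data: {"type":"text-delta","id":"t1","delta":"Hi"}\n\n)
puts string_sink == expected
puts array_sink == [expected]
puts io_sink.string == expected
```

Because the encoder never opens a socket, a test can compare the sink's contents to a literal string and nothing else needs to be running.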

The full part vocabulary

A faithful implementation has to cover the whole protocol, not just the happy path of streaming text, because real assistants reason, call tools, cite sources, and fail. The Writer exposes a method for every part type; the complete part table is in the README.

Below all of that is the low-level emit(type:, ...), which writes any pre-shaped part hash after checking it carries a :type. That escape hatch matters for a protocol that is still evolving: if Vercel adds a part type after this release, you can emit it today without waiting for a gem update. And once you call done to write the [DONE] sentinel, the Writer flips closed — any further emit raises ClosedError rather than corrupting a terminated stream. Small guard, but it turns a subtle protocol violation into a loud, immediate exception.
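A rough sketch of that guard, under stated assumptions: GuardedWriter is a hypothetical class, emit takes a plain hash here rather than the gem's keyword form, and raising ArgumentError for a missing :type is a choice made for this example (the article does not say which error the gem raises). Only the ClosedError name comes from the text above.

```ruby
require "json"

# Name taken from the article; this definition is an illustrative stand-in.
class ClosedError < StandardError; end

# Hypothetical writer showing the two checks: a part needs a :type,
# and nothing may be written after the [DONE] sentinel.
class GuardedWriter
  def initialize(sink)
    @sink = sink
    @closed = false
  end

  def emit(part)
    raise ClosedError, "stream already terminated" if @closed
    raise ArgumentError, "part needs a :type" unless part.key?(:type)

    @sink << "data: #{JSON.generate(part)}\n\n"
    self
  end

  def done
    raise ClosedError, "stream already terminated" if @closed

    @sink << "data: [DONE]\n\n"
    @closed = true
    self
  end
end

buf = +""
writer = GuardedWriter.new(buf)
writer.emit(type: "some-future-part", payload: 1) # made-up part type
writer.done

begin
  writer.emit(type: "text-delta", id: "t1", delta: "late")
rescue ClosedError => e
  warn "rejected: #{e.message}"
end
```

The late write never reaches the sink, so the bytes after [DONE] stay empty and the failure shows up at the call site instead of in the browser.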

A Stream that is lazy and Rack-shaped

The higher layer is AiStream::Stream, and it exists so the common case is one block. You hand it a block that receives a Writer; it gives you back a lazy, re-enumerable, Rack-compatible response body. The block does not run when you construct the Stream — it runs the first time something pulls bytes out of the body. That laziness is what makes the first token flush to the browser immediately instead of buffering the whole answer server-side. And the [DONE] terminator is appended for you automatically, so you cannot forget it.

body = AiStream::Stream.new do |w|
  w.start
  id = w.text_start
  w.text_delta("Hello", id: id)
  w.text_delta(" world", id: id)
  w.text_end(id: id)
  w.finish
end

body.to_s
# data: {"type":"start",...}
#
# data: {"type":"text-start",...}
#
# ... (deltas) ...
#
# data: {"type":"finish"}
#
# data: [DONE]
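The lazy, re-enumerable behaviour described above can be sketched without the gem. LazyBody and FrameSink below are hypothetical names and a simplification, not the gem's Stream; the point is only that the producer block is stored at construction and runs inside #each.

```ruby
# Hypothetical sink: forwards every frame pushed with << to a callback.
class FrameSink
  def initialize(callback)
    @callback = callback
  end

  def <<(frame)
    @callback.call(frame)
    self
  end
end

# Hypothetical lazy body: keeps the producer block and runs it only in #each.
class LazyBody
  DONE = "data: [DONE]\n\n"

  def initialize(&producer)
    @producer = producer
  end

  # Rack-style #each: yields each frame as it is produced, then the sentinel.
  def each(&consumer)
    return enum_for(:each) unless consumer

    @producer.call(FrameSink.new(consumer))
    consumer.call(DONE)
  end

  def to_s
    each.to_a.join
  end
end

runs = 0
body = LazyBody.new do |out|
  runs += 1
  out << %(data: {"type":"start"}\n\n)
  out << %(data: {"type":"finish"}\n\n)
end

puts runs                          # 0: constructing the body ran nothing
body.each { |frame| print frame }  # frames are handed over one at a time
puts runs                          # 1
body.to_s
puts runs                          # 2: enumerating again re-runs the block
```

Since each frame is yielded the moment the producer pushes it, a server that writes inside the #each block can flush the first token before the last one exists.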

Because a Stream is a valid Rack body — it responds to #each and yields complete SSE frames — it drops straight into plain Rack:

run lambda { |env|
  body = AiStream::Stream.new do |w|
    w.start
    w.text("Hi from Rack")
    w.finish
  end
  headers = AiStream::HEADERS.merge("content-type" => "text/event-stream")
  [200, headers, body]
}

And in Rails it pairs with ActionController::Live to stream straight to the client as tokens arrive from whatever model you are using:

class ChatController < ApplicationController
  include ActionController::Live

  def create
    AiStream::HEADERS.each { |k, v| response.headers[k] = v }
    response.headers["Content-Type"] = "text/event-stream"

    AiStream::Stream.new do |w|
      w.start
      id = w.text_start
      # Pipe tokens from any source — here, ruby_llm:
      RubyLLM.chat.ask(params[:prompt]) do |chunk|
        w.text_delta(chunk.content, id: id)
      end
      w.text_end(id: id)
      w.finish
    end.each { |frame| response.stream.write(frame) }
  ensure
    response.stream.close
  end
end

The front end is whatever you already wrote — useChat({ api: "/chat" }), untouched. That is the point: the React side does not know or care that the bytes are now coming from Ruby.

Tool calls, the streaming way

Tool calls are where the protocol earns its complexity, and where a half-built encoder falls down. The lifecycle lets you stream a tool's arguments as the model produces them, run the tool, and stream the result — all bracketed into steps the front end can render as distinct phases:

AiStream::Stream.new do |w|
  w.start
  w.start_step
  w.tool_input_start(tool_call_id: "t1", tool_name: "get_weather")
  w.tool_input_delta(tool_call_id: "t1", delta: '{"city":')
  w.tool_input_delta(tool_call_id: "t1", delta: '"SF"}')
  w.tool_input_available(tool_call_id: "t1", tool_name: "get_weather",
                         input: { city: "SF" })
  w.tool_output_available(tool_call_id: "t1", output: { temp: 64 })
  w.finish_step
  w.start_step
  w.text("It's 64°F in San Francisco.")
  w.finish_step
  w.finish
end
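On the wire, the Ruby-style keywords above have to become the camelCase field names the SDK checks for. The sketch below shows that mapping with two hypothetical helpers, camelize_key and tool_part, which are not the gem's API. It assumes the part type strings mirror the method names (tool-input-start and so on), and it spells the delta keyword as input_text_delta so the conversion stays mechanical, whereas the gem's method takes delta:.

```ruby
require "json"

# Hypothetical helper: snake_case Ruby keyword to camelCase wire key.
def camelize_key(key)
  key.to_s.gsub(/_([a-z])/) { $1.upcase }
end

# Hypothetical helper: build one tool part and frame it as an SSE event.
# Only top-level keys are converted; nested input/output data is left alone.
def tool_part(type, **fields)
  part = { "type" => type }
  fields.each { |key, value| part[camelize_key(key)] = value }
  "data: #{JSON.generate(part)}\n\n"
end

print tool_part("tool-input-start", tool_call_id: "t1", tool_name: "get_weather")
print tool_part("tool-input-delta", tool_call_id: "t1", input_text_delta: '{"city":')
print tool_part("tool-input-available", tool_call_id: "t1",
                tool_name: "get_weather", input: { city: "SF" })
print tool_part("tool-output-available", tool_call_id: "t1", output: { temp: 64 })
```

Every part in one call carries the same toolCallId, which is how the front end ties the streamed arguments, the final input, and the output together.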

When you already have the input in hand and just want to show the call and its result, the tool_call convenience collapses it to one line, generating and sharing the toolCallId across both parts:

w.tool_call(tool_name: "search", input: { q: "ruby" }, output: { hits: 3 })

It composes; it does not compete

One deliberate non-goal: ai_stream does not talk to any model. It is provider-agnostic by design, sitting strictly downstream of whatever produced the tokens. Pair it with ruby_llm, ruby-openai, a raw HTTP stream from some other provider, or even canned text in a test — ai_stream's only job is to encode those tokens into frames the Vercel front end understands. It composes with the existing Ruby AI stack instead of trying to replace any of it, and it stays out of the business of API keys, retries, and model selection entirely. That narrowness is a feature: a protocol encoder should encode the protocol and nothing else.

Where it stands

Being honest about maturity: ai_stream is brand-new, version 0.1.0. It is MIT-licensed, has zero runtime dependencies, and ships 31 passing tests run across Ruby 3.0–3.4 in CI. It does not yet have download counts or production deployments to point at — it is new code, offered because the gap it fills is real and nothing else filled it. What it does have is a faithful, fully-covered implementation of the protocol and tests that assert on the actual wire bytes, which is the part that determines whether a useChat front end lights up or sits there blank.

Install it

ai_stream is on RubyGems and works with any Rack or Rails app. Add it the usual way:

# with Bundler (adds the gem to your Gemfile)
bundle add ai_stream

# or standalone
gem install ai_stream

The source is on GitHub (MIT), and the README has the full set of examples — Rack, Rails with ActionController::Live, tool-call lifecycles, and the complete part table. If you are putting a Vercel AI SDK front end in front of a Ruby backend, this is the piece that was missing.