Databricks has a nice party trick: it can host an LLM endpoint that looks like an OpenAI chat-completions API.
Aye Chat has a different party trick: it can treat LLMs like a backend detail, as long as it gets a structured JSON response it can render and (optionally) apply to files.
This post is the handshake between the two.
Scope note: this is a deep dive into the current implementation of Aye Chat’s Databricks plugin (`DatabricksModelPlugin`). It’s not a general “Databricks + LLMs” tutorial.
Where the integration lives
The Databricks integration is implemented as a model plugin:
- File: `plugins/databricks_model.py`
- Class: `DatabricksModelPlugin`
- Main hook: `on_command()`
- Command it intercepts: `local_model_invoke`
In Aye Chat’s plugin architecture, model plugins can intercept `local_model_invoke` and return an LLM response object that the rest of the app can render and/or apply.
Activation: the plugin is silent unless you invite it
This plugin has strong “don’t bother me unless configured” energy.
It only activates when both env vars are present:
- `AYE_DBX_API_URL`
- `AYE_DBX_API_KEY`
The check is centralized:
```python
def _is_databricks_configured() -> bool:
    return bool(os.environ.get("AYE_DBX_API_URL") and os.environ.get("AYE_DBX_API_KEY"))
```
Why this matters:
- If not configured, `on_command()` returns `None` and Aye Chat falls back to other model backends.
- Users who don’t care about Databricks get true “zero config.”
- Users who do care can opt in with two env vars and zero ceremony.
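That fallback behavior is easy to picture as code. Below is a hypothetical, stripped-down skeleton (not the actual `DatabricksModelPlugin`), assuming only the two env vars above:

```python
import os


def _is_databricks_configured() -> bool:
    return bool(os.environ.get("AYE_DBX_API_URL") and os.environ.get("AYE_DBX_API_KEY"))


class SketchPlugin:
    """Hypothetical, simplified sketch of the opt-in dispatch pattern."""

    def on_command(self, command: str, **kwargs):
        # Not configured -> decline, so Aye Chat falls back to other backends.
        if not _is_databricks_configured():
            return None
        if command == "local_model_invoke":
            return self._invoke(**kwargs)
        return None

    def _invoke(self, **kwargs):
        # Placeholder for the actual inference path described below.
        return {"summary": "stub"}
```

The key property: the plugin never raises when unconfigured; it simply declines the command.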
Configuration and model selection
The plugin expects the Databricks endpoint to behave like an OpenAI-compatible chat completions API (or a compatible proxy).
Required

- `AYE_DBX_API_URL`
  - Full URL to `POST` chat completion requests to.
  - Example (illustrative): `https://<workspace-host>/serving-endpoints/<endpoint>/invocations`
- `AYE_DBX_API_KEY`
  - Bearer token used for `Authorization`.

Optional

- `AYE_DBX_MODEL`
  - Defaults to `gpt-3.5-turbo`.
Yes, the default says `gpt-3.5-turbo`. No, this plugin isn’t emotionally attached to that string. It just needs a `model` field to put in the payload.
Lifecycle: new_chat vs local_model_invoke
The plugin handles two command names.
1) new_chat
If configured, `new_chat` resets conversation state by:

- deleting `.aye/chat_history.json` (the Databricks plugin’s history file)
- resetting the in-memory `self.chat_history`
Translation: when you start a fresh session in Aye Chat, the Databricks plugin doesn’t cling to the past.
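A minimal sketch of that reset, assuming the history file path from the section above (`HistorySketch` is a hypothetical stand-in, not the real class):

```python
from pathlib import Path

# Path used by the plugin for on-disk history (per the post).
HISTORY_PATH = Path(".aye/chat_history.json")


class HistorySketch:
    """Hypothetical sketch of the new_chat reset behavior."""

    def __init__(self):
        self.chat_history = {}

    def reset(self, history_path: Path = HISTORY_PATH):
        # Delete the on-disk history file, if present.
        if history_path.exists():
            history_path.unlink()
        # Drop all in-memory conversation state.
        self.chat_history = {}
```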
2) local_model_invoke
This is the main inference path:
- Load existing history from disk
- Build the message list (system + history + new user message)
- POST to the Databricks endpoint
- Extract JSON from the model output (even if it tries to write a novel first)
- Store lightweight history (no repeated file contents)
- Return a parsed LLM response (with optional token usage)
Prompt construction: one message for the model, another for history
Aye Chat can include repo context (files / RAG snippets) in the prompt. The Databricks plugin uses two representations of the same idea.
A) The full user message (sent to the API)
```python
user_message = build_user_message(prompt, source_files)
```
This includes:
- the user’s prompt
- the full contents of `source_files` (as gathered by Aye Chat)
That’s what the model needs to actually do the work.
B) The lightweight history message (saved to disk)
```python
history_message = build_history_message(prompt, source_files)
```
This stores a compact representation (typically prompt + filenames), not full file contents.
Why: if you store full file contents in history on every request, `.aye/chat_history.json` turns into a data hoarder. Performance degrades, diffs get silly, and you start paying a storage bill for your own laziness.
This design is unglamorous, and therefore correct.
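To make the contrast concrete, here is a hypothetical sketch of the two builders. The function names come from the post; the exact formatting inside them is an assumption:

```python
def build_user_message(prompt: str, source_files: dict) -> str:
    """Full message sent to the API: prompt plus complete file contents.
    (Hypothetical formatting - the real builder may differ.)"""
    parts = [prompt]
    for name, content in source_files.items():
        parts.append(f"--- {name} ---\n{content}")
    return "\n\n".join(parts)


def build_history_message(prompt: str, source_files: dict) -> str:
    """Compact message saved to history: prompt plus filenames only."""
    if not source_files:
        return prompt
    return f"{prompt}\n[files: {', '.join(sorted(source_files))}]"
```

Same idea, two sizes: the API gets the heavy version, the history file gets the light one.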
System prompt behavior
The plugin uses the shared `SYSTEM_PROMPT` by default (imported from `aye.model.config`).
But the invoker can override it per request:
```python
effective_system_prompt = system_prompt if system_prompt else SYSTEM_PROMPT
```
Then the plugin builds OpenAI-style messages with the system prompt first:
```python
messages = (
    [{"role": "system", "content": effective_system_prompt}]
    + self.chat_history[conv_id]
    + [{"role": "user", "content": user_message}]
)
```
The request payload (OpenAI-style, by design)
Headers:
```python
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
```
Payload:
```python
payload = {
    "model": model_name,
    "messages": messages,
    "temperature": 0.7,
    "max_tokens": max_output_tokens,
}
```
Notable behavior:
- The plugin assumes an OpenAI-like schema: `model`, `messages`, `temperature`, `max_tokens`.
- Timeout is generous: `LLM_TIMEOUT = 600.0`.
Because sometimes the model needs a minute. And sometimes it needs a minute to think, plus another minute to dramatically clear its throat.
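The request assembly can be isolated as a pure function, which makes it easy to test without a live endpoint. This is a sketch under the assumptions above (`build_request` is a hypothetical helper, not part of the plugin):

```python
LLM_TIMEOUT = 600.0  # generous timeout, matching the plugin


def build_request(api_key: str, model_name: str, messages: list,
                  max_output_tokens: int) -> tuple[dict, dict]:
    """Return the (headers, payload) pair for the OpenAI-style POST."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "model": model_name,
        "messages": messages,
        "temperature": 0.7,
        "max_tokens": max_output_tokens,
    }
    return headers, payload


# The actual call is then roughly:
#   httpx.post(api_url, headers=headers, json=payload, timeout=LLM_TIMEOUT)
```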
Response handling: extracting JSON from “helpful” output
Aye Chat expects the assistant to return a JSON object (usually with `summary` and optional `updated_files`).
In practice, models often return:
- a paragraph of explanation
- a code fence
- three different “final” answers
- then a JSON object
So the plugin uses `_extract_json_object()`.
What _extract_json_object() does
It tries, in order:

- `json.loads(raw_response)` directly
- If that fails, it scans the text for balanced `{ ... }` candidates while being string/escape-aware
- It parses candidates and typically picks the last valid object
It’s not elegant. It’s resilient. (Those are often the same thing.)
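For a feel of what “string/escape-aware” means in practice, here is a best-effort reimplementation of the idea (not the plugin’s actual code):

```python
import json


def extract_json_object(raw: str):
    """Sketch of the _extract_json_object() strategy: try a direct parse,
    then scan for balanced {...} spans and keep the last one that parses."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        pass

    last_valid = None
    depth = 0
    start = -1
    in_string = False
    escaped = False
    for i, ch in enumerate(raw):
        if in_string:
            # Inside a JSON string: braces don't count, and \" doesn't close it.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    last_valid = json.loads(raw[start:i + 1])
                except json.JSONDecodeError:
                    pass
    return last_valid
```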
A small caveat worth knowing
After extraction, the plugin does:
```python
generated_text = json.dumps(generated_json)
```
If extraction fails, `generated_json` may be `None`, which turns into the literal JSON string `null`.
Depending on how parse_llm_response() handles that, you may get confusing parse failures.
If you ever find yourself staring into the abyss wondering “why is the model output null?”, this is the first flashlight to grab.
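The failure mode is easy to reproduce in isolation; the guard at the end is a hypothetical fix, not something the plugin currently does:

```python
import json

generated_json = None  # what a failed extraction yields

generated_text = json.dumps(generated_json)
print(generated_text)  # prints: null

# A defensive guard (hypothetical, not in the plugin) would be:
if generated_json is None:
    generated_text = '{"summary": "Model did not return a JSON object."}'
```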
Chat history: stored locally, lightweight on purpose
The plugin persists history in `.aye/chat_history.json`.
It’s keyed by a conversation id derived from chat_id:
```python
conv_id = get_conversation_id(chat_id)
```
On each successful request it appends:
- the user’s lightweight history message
- the assistant’s response as a JSON string (not raw prose)
```python
self.chat_history[conv_id].append({"role": "user", "content": history_message})
self.chat_history[conv_id].append({"role": "assistant", "content": generated_text})
self._save_history()
```
This keeps future requests grounded without turning your history file into a landfill.
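The load/append/save cycle can be sketched like this (a hypothetical `HistoryStore`, condensing what the plugin spreads across methods):

```python
import json
from pathlib import Path


class HistoryStore:
    """Hypothetical sketch of load/append/save, keyed by conversation id."""

    def __init__(self, path: Path):
        self.path = path
        self.chat_history = {}
        if path.exists():
            self.chat_history = json.loads(path.read_text())

    def append_turn(self, conv_id: str, history_message: str, generated_text: str):
        turns = self.chat_history.setdefault(conv_id, [])
        turns.append({"role": "user", "content": history_message})
        turns.append({"role": "assistant", "content": generated_text})
        self._save()

    def _save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.chat_history))
```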
Parsing into Aye Chat’s internal response shape
Once the plugin has a JSON string in generated_text, it calls:
```python
parsed_response = parse_llm_response(generated_text, self.debug)
```
parse_llm_response() converts the JSON into Aye Chat’s internal response schema.
Typically you’ll see fields like:

- `summary`
- `updated_files: [{ file_name, file_content }, ...]`
Those updated_files are what Aye Chat can apply optimistically to disk (with automatic snapshots so you can restore instantly).
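A hypothetical stand-in for `parse_llm_response()` shows the shape validation involved (field names from the post; the error handling is an assumption):

```python
import json


def parse_response_sketch(generated_text: str) -> dict:
    """Hypothetical sketch: validate {summary, updated_files} shape."""
    data = json.loads(generated_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    parsed = {"summary": data.get("summary", "")}
    updated = []
    for entry in data.get("updated_files", []):
        updated.append({
            "file_name": entry["file_name"],
            "file_content": entry["file_content"],
        })
    parsed["updated_files"] = updated
    return parsed
```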
Token usage passthrough
If the Databricks endpoint includes an OpenAI-like usage block, the plugin passes it through:
```python
usage = result.get("usage")
if usage:
    parsed_response["token_usage"] = {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```
Why you care:
- debugging prompt growth (especially with repo context)
- monitoring costs (when applicable)
- comparing RAG/context strategies across runs
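As a usage example, `token_usage` blocks from several runs can be summed to compare context strategies (a hypothetical helper, not part of Aye Chat):

```python
def accumulate_usage(runs: list[dict]) -> dict:
    """Sum token_usage blocks across runs, e.g. to compare RAG strategies."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for r in runs:
        usage = r.get("token_usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```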
Error handling (aka “tell me what broke, not poetry about failure”)
The plugin distinguishes between:
HTTP status errors
It catches `httpx.HTTPStatusError` and builds messages like:

```
DBX API error: <status_code> - <detail>
```

It tries to parse a JSON error body and extract `error.message` when possible.
Generic exceptions
Anything else becomes:

```
Error calling Databricks API: <exception>
```
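The message construction for the HTTP-error case can be sketched as a pure function (hypothetical; the real code works off an `httpx.Response`):

```python
import json


def format_http_error(status_code: int, body_text: str) -> str:
    """Hypothetical sketch: prefer the JSON body's error.message,
    fall back to the raw body text."""
    detail = body_text
    try:
        body = json.loads(body_text)
        detail = body.get("error", {}).get("message", body_text)
    except (json.JSONDecodeError, AttributeError):
        pass
    return f"DBX API error: {status_code} - {detail}"
```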
Verbose / debug output
- `verbose`: prints status code and raw response text (for non-200 responses)
- `debug`: prints internal message history and response blocks
This is especially useful when wiring new endpoints, where the biggest issues are usually:
- schema mismatch
- response shape differences
- “the model didn’t output JSON like you asked” (shocking)
Minimal setup example
```shell
export AYE_DBX_API_URL="https://.../invocations"
export AYE_DBX_API_KEY="dapi..."
export AYE_DBX_MODEL="your-model-name"
```
Then run Aye Chat normally. If configured, this plugin will intercept `local_model_invoke`.
Troubleshooting checklist
If nothing happens (or worse, something happens but it’s wrong):
- Env vars
  - `AYE_DBX_API_URL` set?
  - `AYE_DBX_API_KEY` set?
- Endpoint compatibility
  - Accepts the `messages` chat format?
  - Accepts `max_tokens`?
- Response shape
  - Does `result["choices"][0]["message"]["content"]` exist?
  - Does `content` contain a JSON object Aye Chat can parse?
- Model output discipline
  - Extra prose is usually fine; `_extract_json_object()` can recover.
  - If extraction becomes `null`, your endpoint likely isn’t returning JSON-like content in the expected place.
- Turn on the lights
  - `verbose on`
  - `debug on`
Summary
The Databricks integration is a clean, opt-in model plugin that:
- activates only when configured via environment variables
- sends OpenAI-style chat completion payloads to your Databricks endpoint
- builds rich prompts with file context, but stores lightweight history
- extracts JSON robustly from messy model output
- returns a structured response Aye Chat can apply to files
- surfaces token usage when the endpoint provides it
If you’re extending or deploying this integration, the two most important things to validate are:
- endpoint schema compatibility (`messages` in, `choices` out)
- response format consistency (a JSON object inside `choices[0].message.content`)
Everything else is just plumbing. Occasionally wet plumbing, but still.
About Aye Chat
Aye Chat is an open-source, AI-powered terminal workspace that brings AI directly into command-line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal - with an optimistic workflow backed by instant local snapshots.
Support Us
- Star our GitHub repository - it helps new users discover Aye Chat.
- Spread the word. Share Aye Chat with your team and friends who live in the terminal.