Databricks has a nice party trick: it can host an LLM endpoint that looks like an OpenAI chat-completions API.
Aye Chat has a different party trick: it can treat LLMs like a backend detail, as long as it gets a structured JSON response it can render and (optionally) apply to files.
This post is the handshake between the two.
Scope note: this is a deep dive into the current implementation of Aye Chat’s Databricks plugin (`DatabricksModelPlugin`). It’s not a general “Databricks + LLMs” tutorial.
Where the integration lives
The Databricks integration is implemented as a model plugin:
- File: `plugins/databricks_model.py`
- Class: `DatabricksModelPlugin`
- Main hook: `on_command()`
- Command it intercepts: `local_model_invoke`
In Aye Chat’s plugin architecture, model plugins can intercept `local_model_invoke` and return an LLM response object that the rest of the app can render and/or apply.
Activation: the plugin is silent unless you invite it
This plugin has strong “don’t bother me unless configured” energy.
It only activates when both env vars are present:
- `AYE_DBX_API_URL`
- `AYE_DBX_API_KEY`
The check is centralized:
```python
def _is_databricks_configured() -> bool:
    return bool(os.environ.get("AYE_DBX_API_URL") and os.environ.get("AYE_DBX_API_KEY"))
```
Why this matters:
- If not configured, `on_command()` returns `None` and Aye Chat falls back to other model backends.
- Users who don’t care about Databricks get true “zero config.”
- Users who do care can opt in with two env vars and zero ceremony.
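That fallback behavior is easy to picture as code. Below is a hypothetical, stripped-down skeleton (not the actual `DatabricksModelPlugin`), assuming only the two env vars above:

```python
import os


def _is_databricks_configured() -> bool:
    return bool(os.environ.get("AYE_DBX_API_URL") and os.environ.get("AYE_DBX_API_KEY"))


class SketchPlugin:
    """Hypothetical, simplified sketch of the opt-in dispatch pattern."""

    def on_command(self, command: str, **kwargs):
        # Not configured -> decline, so Aye Chat falls back to other backends.
        if not _is_databricks_configured():
            return None
        if command == "local_model_invoke":
            return self._invoke(**kwargs)
        return None

    def _invoke(self, **kwargs):
        # Placeholder for the actual inference path described below.
        return {"summary": "stub"}
```

The key property: the plugin never raises when unconfigured; it simply declines the command.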
Configuration and model selection
The plugin expects the Databricks endpoint to behave like an OpenAI-compatible chat completions API (or a compatible proxy).
Required

- `AYE_DBX_API_URL`
  - Full URL to `POST` chat completion requests to.
  - Example (illustrative): `https://<workspace-host>/serving-endpoints/<endpoint>/invocations`
- `AYE_DBX_API_KEY`
  - Bearer token used for `Authorization`.

Optional

- `AYE_DBX_MODEL`
  - Defaults to `gpt-3.5-turbo`.
Yes, the default says `gpt-3.5-turbo`. No, this plugin isn’t emotionally attached to that string. It just needs a `model` field to put in the payload.
Lifecycle: new_chat vs local_model_invoke
The plugin handles two command names.
1) new_chat
If configured, `new_chat` resets conversation state by:

- deleting `.aye/chat_history.json` (the Databricks plugin’s history file)
- resetting the in-memory `self.chat_history`
Translation: when you start a fresh session in Aye Chat, the Databricks plugin doesn’t cling to the past.
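A minimal sketch of that reset, assuming the history file path from the section above (`HistorySketch` is a hypothetical stand-in, not the real class):

```python
from pathlib import Path

# Path used by the plugin for on-disk history (per the post).
HISTORY_PATH = Path(".aye/chat_history.json")


class HistorySketch:
    """Hypothetical sketch of the new_chat reset behavior."""

    def __init__(self):
        self.chat_history = {}

    def reset(self, history_path: Path = HISTORY_PATH):
        # Delete the on-disk history file, if present.
        if history_path.exists():
            history_path.unlink()
        # Drop all in-memory conversation state.
        self.chat_history = {}
```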
2) local_model_invoke
This is the main inference path:
- Load existing history from disk
- Build the message list (system + history + new user message)
- POST to the Databricks endpoint
- Extract JSON from the model output (even if it tries to write a novel first)
- Store lightweight history (no repeated file contents)
- Return a parsed LLM response (with optional token usage)
Prompt construction: one message for the model, another for history
Aye Chat can include repo context (files / RAG snippets) in the prompt. The Databricks plugin uses two representations of the same idea.
A) The full user message (sent to the API)
```python
user_message = build_user_message(prompt, source_files)
```
This includes:
- the user’s prompt
- the full contents of `source_files` (as gathered by Aye Chat)
That’s what the model needs to actually do the work.
B) The lightweight history message (saved to disk)
```python
history_message = build_history_message(prompt, source_files)
```
This stores a compact representation (typically prompt + filenames), not full file contents.
Why: if you store full file contents in history on every request, `.aye/chat_history.json` turns into a data hoarder. Performance degrades, diffs get silly, and you start paying a storage bill for your own laziness.
This design is unglamorous, and therefore correct.
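To make the contrast concrete, here is a hypothetical sketch of the two builders. The function names come from the post; the exact formatting inside them is an assumption:

```python
def build_user_message(prompt: str, source_files: dict) -> str:
    """Full message sent to the API: prompt plus complete file contents.
    (Hypothetical formatting - the real builder may differ.)"""
    parts = [prompt]
    for name, content in source_files.items():
        parts.append(f"--- {name} ---\n{content}")
    return "\n\n".join(parts)


def build_history_message(prompt: str, source_files: dict) -> str:
    """Compact message saved to history: prompt plus filenames only."""
    if not source_files:
        return prompt
    return f"{prompt}\n[files: {', '.join(sorted(source_files))}]"
```

Same idea, two sizes: the API gets the heavy version, the history file gets the light one.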
System prompt behavior
The plugin uses the shared `SYSTEM_PROMPT` by default (imported from `aye.model.config`).
But the invoker can override it per request:
```python
effective_system_prompt = system_prompt if system_prompt else SYSTEM_PROMPT
```
Then the plugin builds OpenAI-style messages with the system prompt first:
```python
messages = (
    [{"role": "system", "content": effective_system_prompt}]
    + self.chat_history[conv_id]
    + [{"role": "user", "content": user_message}]
)
```
The request payload (OpenAI-style, by design)
Headers:
```python
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
```
Payload:
```python
payload = {
    "model": model_name,
    "messages": messages,
    "temperature": 0.7,
    "max_tokens": max_output_tokens,
}
```
Notable behavior:
- The plugin assumes an OpenAI-like schema: `model`, `messages`, `temperature`, `max_tokens`.
- Timeout is generous: `LLM_TIMEOUT = 600.0`.
Because sometimes the model needs a minute. And sometimes it needs a minute to think, plus another minute to dramatically clear its throat.
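The request assembly can be isolated as a pure function, which makes it easy to test without a live endpoint. This is a sketch under the assumptions above (`build_request` is a hypothetical helper, not part of the plugin):

```python
LLM_TIMEOUT = 600.0  # generous timeout, matching the plugin


def build_request(api_key: str, model_name: str, messages: list,
                  max_output_tokens: int) -> tuple[dict, dict]:
    """Return the (headers, payload) pair for the OpenAI-style POST."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "model": model_name,
        "messages": messages,
        "temperature": 0.7,
        "max_tokens": max_output_tokens,
    }
    return headers, payload


# The actual call is then roughly:
#   httpx.post(api_url, headers=headers, json=payload, timeout=LLM_TIMEOUT)
```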
Response handling: extracting JSON from “helpful” output
Aye Chat expects the assistant to return a JSON object (usually with `summary` and optional `updated_files`).
In practice, models often return:
- a paragraph of explanation
- a code fence
- three different “final” answers
- then a JSON object
So the plugin uses `_extract_json_object()`.
What _extract_json_object() does
It tries, in order:

- `json.loads(raw_response)` directly
- If that fails, it scans the text for balanced `{ ... }` candidates while being string/escape-aware
- It parses candidates and typically picks the last valid object
It’s not elegant. It’s resilient. (Those are often the same thing.)
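For a feel of what “string/escape-aware” means in practice, here is a best-effort reimplementation of the idea (not the plugin’s actual code):

```python
import json


def extract_json_object(raw: str):
    """Sketch of the _extract_json_object() strategy: try a direct parse,
    then scan for balanced {...} spans and keep the last one that parses."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        pass

    last_valid = None
    depth = 0
    start = -1
    in_string = False
    escaped = False
    for i, ch in enumerate(raw):
        if in_string:
            # Inside a JSON string: braces don't count, and \" doesn't close it.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    last_valid = json.loads(raw[start:i + 1])
                except json.JSONDecodeError:
                    pass
    return last_valid
```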
A small caveat worth knowing
After extraction, the plugin does:
```python
generated_text = json.dumps(generated_json)
```
If extraction fails, `generated_json` may be `None`, which turns into the literal JSON string `null`.
Depending on how parse_llm_response() handles that, you may get confusing parse failures.
If you ever find yourself staring into the abyss wondering “why is the model output null?”, this is the first flashlight to grab.
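The failure mode is easy to reproduce in isolation; the guard at the end is a hypothetical fix, not something the plugin currently does:

```python
import json

generated_json = None  # what a failed extraction yields

generated_text = json.dumps(generated_json)
print(generated_text)  # prints: null

# A defensive guard (hypothetical, not in the plugin) would be:
if generated_json is None:
    generated_text = '{"summary": "Model did not return a JSON object."}'
```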
Chat history: stored locally, lightweight on purpose
The plugin persists history in `.aye/chat_history.json`.
It’s keyed by a conversation id derived from chat_id:
```python
conv_id = get_conversation_id(chat_id)
```
On each successful request it appends:
- the user’s lightweight history message
- the assistant’s response as a JSON string (not raw prose)
```python
self.chat_history[conv_id].append({"role": "user", "content": history_message})
self.chat_history[conv_id].append({"role": "assistant", "content": generated_text})
self._save_history()
```
This keeps future requests grounded without turning your history file into a landfill.
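The load/append/save cycle can be sketched like this (a hypothetical `HistoryStore`, condensing what the plugin spreads across methods):

```python
import json
from pathlib import Path


class HistoryStore:
    """Hypothetical sketch of load/append/save, keyed by conversation id."""

    def __init__(self, path: Path):
        self.path = path
        self.chat_history = {}
        if path.exists():
            self.chat_history = json.loads(path.read_text())

    def append_turn(self, conv_id: str, history_message: str, generated_text: str):
        turns = self.chat_history.setdefault(conv_id, [])
        turns.append({"role": "user", "content": history_message})
        turns.append({"role": "assistant", "content": generated_text})
        self._save()

    def _save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.chat_history))
```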
Parsing into Aye Chat’s internal response shape
Once the plugin has a JSON string in generated_text, it calls:
```python
parsed_response = parse_llm_response(generated_text, self.debug)
```
parse_llm_response() converts the JSON into Aye Chat’s internal response schema.
Typically you’ll see fields like:

- `summary`
- `updated_files: [{ file_name, file_content }, ...]`
Those updated_files are what Aye Chat can apply optimistically to disk (with automatic snapshots so you can restore instantly).
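A hypothetical stand-in for `parse_llm_response()` shows the shape validation involved (field names from the post; the error handling is an assumption):

```python
import json


def parse_response_sketch(generated_text: str) -> dict:
    """Hypothetical sketch: validate {summary, updated_files} shape."""
    data = json.loads(generated_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    parsed = {"summary": data.get("summary", "")}
    updated = []
    for entry in data.get("updated_files", []):
        updated.append({
            "file_name": entry["file_name"],
            "file_content": entry["file_content"],
        })
    parsed["updated_files"] = updated
    return parsed
```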
Token usage passthrough
If the Databricks endpoint includes an OpenAI-like usage block, the plugin passes it through:
```python
usage = result.get("usage")
if usage:
    parsed_response["token_usage"] = {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```
Why you care:
- debugging prompt growth (especially with repo context)
- monitoring costs (when applicable)
- comparing RAG/context strategies across runs
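As a usage example, `token_usage` blocks from several runs can be summed to compare context strategies (a hypothetical helper, not part of Aye Chat):

```python
def accumulate_usage(runs: list[dict]) -> dict:
    """Sum token_usage blocks across runs, e.g. to compare RAG strategies."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for r in runs:
        usage = r.get("token_usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```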
Error handling (aka “tell me what broke, not poetry about failure”)
The plugin distinguishes between:
HTTP status errors
It catches `httpx.HTTPStatusError` and builds messages like:

```
DBX API error: <status_code> - <detail>
```

It tries to parse a JSON error body and extract `error.message` when possible.
Generic exceptions
Anything else becomes:

```
Error calling Databricks API: <exception>
```
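The message construction for the HTTP-error case can be sketched as a pure function (hypothetical; the real code works off an `httpx.Response`):

```python
import json


def format_http_error(status_code: int, body_text: str) -> str:
    """Hypothetical sketch: prefer the JSON body's error.message,
    fall back to the raw body text."""
    detail = body_text
    try:
        body = json.loads(body_text)
        detail = body.get("error", {}).get("message", body_text)
    except (json.JSONDecodeError, AttributeError):
        pass
    return f"DBX API error: {status_code} - {detail}"
```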
Verbose / debug output
- `verbose`: prints status code and raw response text (for non-200 responses)
- `debug`: prints internal message history and response blocks
This is especially useful when wiring new endpoints, where the biggest issues are usually:
- schema mismatch
- response shape differences
- “the model didn’t output JSON like you asked” (shocking)
Minimal setup example
```shell
export AYE_DBX_API_URL="https://.../invocations"
export AYE_DBX_API_KEY="dapi..."
export AYE_DBX_MODEL="your-model-name"
```
Then run Aye Chat normally. If configured, this plugin will intercept `local_model_invoke`.
Troubleshooting checklist
If nothing happens (or worse, something happens but it’s wrong):
- Env vars
  - `AYE_DBX_API_URL` set?
  - `AYE_DBX_API_KEY` set?
- Endpoint compatibility
  - Accepts the `messages` chat format?
  - Accepts `max_tokens`?
- Response shape
  - Does `result["choices"][0]["message"]["content"]` exist?
  - Does `content` contain a JSON object Aye Chat can parse?
- Model output discipline
  - Extra prose is usually fine; `_extract_json_object()` can recover.
  - If extraction becomes `null`, your endpoint likely isn’t returning JSON-like content in the expected place.
- Turn on the lights
  - `verbose on`
  - `debug on`
Summary
The Databricks integration is a clean, opt-in model plugin that:
- activates only when configured via environment variables
- sends OpenAI-style chat completion payloads to your Databricks endpoint
- builds rich prompts with file context, but stores lightweight history
- extracts JSON robustly from messy model output
- returns a structured response Aye Chat can apply to files
- surfaces token usage when the endpoint provides it
If you’re extending or deploying this integration, the two most important things to validate are:
- endpoint schema compatibility (`messages` in, `choices` out)
- response format consistency (a JSON object inside `choices[0].message.content`)
Everything else is just plumbing. Occasionally wet plumbing, but still.
About Aye Chat
Aye Chat is an open-source, AI-powered terminal workspace that brings AI directly into command-line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal - with an optimistic workflow backed by instant local snapshots.
Support Us
- Star our GitHub repository - it helps new users discover Aye Chat.
- Spread the word. Share Aye Chat with your team and friends who live in the terminal.