client.chat.complete()
Send a chat completion request and return the full response.
Parameters
Model ID. Example:
"kf-reasoning-10b".Conversation history as a list of
{"role": ..., "content": ...} dicts.
Roles: "user", "assistant", "system".System prompt. Prepended automatically as a
system message.Maximum tokens to generate. Default
1024.Sampling temperature between
0.0 and 2.0. Default 0.7.Nucleus sampling probability. Default
1.0.Any additional parameters passed through to the API.
Returns: ChatCompletion
| Field | Type | Description |
|---|---|---|
id | str | Request ID |
model | str | Model used |
choices | List[Choice] | Generated responses |
usage | Usage or None | Token usage |
Choice:
| Field | Type | Description |
|---|---|---|
index | int | Index in choices list |
message | Message | The generated message |
finish_reason | str | Why generation stopped |
Message has role and content fields.
client.chat.stream()
Send a streaming chat request. Returns a ChatStream context manager.
Parameters
Same ascomplete().
Returns: ChatStream
Use as a context manager and iterate over StreamChunk objects.
| Method | Returns | Description |
|---|---|---|
__iter__ | chunks | Yields StreamChunk objects |
get_final_text() | str | Full concatenated text after iteration |
StreamChunk:
| Field | Type | Description |
|---|---|---|
id | str | Request ID |
model | str | Model used |
delta | str | Text content of this chunk |
finish_reason | str or None | "stop" on the final chunk |
