client.chat.complete()

Send a chat completion request and return the full response.
response = client.chat.complete(
    model="kf-reasoning-10b",
    messages=[{"role": "user", "content": "Hello"}],
)

Parameters

model (str, required): Model ID. Example: "kf-reasoning-10b".
messages (List[dict], required): Conversation history as a list of {"role": ..., "content": ...} dicts. Roles: "user", "assistant", "system".
system (str): System prompt. Prepended automatically as a system message.
max_tokens (int): Maximum tokens to generate. Default 1024.
temperature (float): Sampling temperature between 0.0 and 2.0. Default 0.7.
top_p (float): Nucleus sampling probability. Default 1.0.
extra (dict): Any additional parameters passed through to the API.
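
As a rough sketch of how these parameters fit together, the helper below assembles keyword arguments for complete() and checks the documented temperature range before a request is sent. build_chat_kwargs is a hypothetical helper, not part of the client. The 0.0 to 1.0 bound on top_p and the positivity check on max_tokens are assumptions, not stated in this reference.

```python
from typing import Any, Dict, List, Optional


def build_chat_kwargs(
    model: str,
    messages: List[Dict[str, str]],
    system: Optional[str] = None,
    max_tokens: int = 1024,      # documented default
    temperature: float = 0.7,    # documented default
    top_p: float = 1.0,          # documented default
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return kwargs for complete()."""
    # The reference documents temperature as a value between 0.0 and 2.0.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    # Assumption: top_p is a probability, so it is checked against [0.0, 1.0].
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    # Assumption: a non-positive token budget is never meaningful.
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Only include the system prompt when one was given.
    if system is not None:
        kwargs["system"] = system
    return kwargs
```

The resulting dict could then be passed along as client.chat.complete(**kwargs).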

Returns: ChatCompletion

| Field | Type | Description |
| --- | --- | --- |
| id | str | Request ID |
| model | str | Model used |
| choices | List[Choice] | Generated responses |
| usage | Usage or None | Token usage |

Each Choice:

| Field | Type | Description |
| --- | --- | --- |
| index | int | Index in choices list |
| message | Message | The generated message |
| finish_reason | str | Why generation stopped |

Message has role and content fields.
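
To illustrate how the nested fields are reached, here is a minimal sketch that pulls the generated text out of a response. The dataclasses are stand-ins that mirror the documented fields so the example runs on its own; the real classes come from the client library. first_text is a hypothetical helper, and the fields of Usage are not documented here, so usage is left untyped.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


# Stand-in types mirroring the documented fields (not the library's own classes).
@dataclass
class Message:
    role: str
    content: str


@dataclass
class Choice:
    index: int
    message: Message
    finish_reason: str


@dataclass
class ChatCompletion:
    id: str
    model: str
    choices: List[Choice]
    usage: Optional[Any] = None  # documented as "Usage or None"


def first_text(response: ChatCompletion) -> str:
    """Hypothetical helper: content of the first choice, or "" if there are none."""
    if not response.choices:
        return ""
    return response.choices[0].message.content
```

With a real response object, the same access path would be response.choices[0].message.content.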

client.chat.stream()

Send a streaming chat request. Returns a ChatStream context manager.
with client.chat.stream(
    model="kf-reasoning-10b",
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for chunk in stream:
        print(chunk.delta, end="", flush=True)

Parameters

Same as complete().

Returns: ChatStream

Use as a context manager and iterate over StreamChunk objects.

| Method | Returns | Description |
| --- | --- | --- |
| `__iter__` | chunks | Yields StreamChunk objects |
| `get_final_text()` | str | Full concatenated text after iteration |

Each StreamChunk:

| Field | Type | Description |
| --- | --- | --- |
| id | str | Request ID |
| model | str | Model used |
| delta | str | Text content of this chunk |
| finish_reason | str or None | "stop" on the final chunk |
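
The chunk fields above suggest a simple way to accumulate a stream by hand. The sketch below joins each delta and records the last non-None finish_reason. StreamChunk here is a stand-in dataclass mirroring the documented fields, and collect_stream is a hypothetical helper, not part of the client.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


# Stand-in type mirroring the documented fields (not the library's own class).
@dataclass
class StreamChunk:
    id: str
    model: str
    delta: str
    finish_reason: Optional[str] = None


def collect_stream(chunks: Iterable[StreamChunk]) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: join all deltas and return (text, finish_reason)."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        parts.append(chunk.delta)
        # finish_reason is None until the final chunk.
        if chunk.finish_reason is not None:
            finish_reason = chunk.finish_reason
    return "".join(parts), finish_reason
```

Inside a with client.chat.stream(...) as stream: block, the stream itself could be passed to collect_stream. Going by the method table, stream.get_final_text() should return the same text once iteration has finished.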