This page provides an overview of APIs related to View Assistant.
View Assistant is a comprehensive solution built into View AI that enables conversational experiences with protected, on-premises data. It exposes both simple chat APIs, which interact directly with a large language model, and retrieval augmented generation (RAG) APIs, which interact with a large language model using your data as context. It also provides both an easy-to-use built-in conversational AI interface and a standalone, deployable conversational AI interface.
Chat with Model
To chat with a large language model without using your data as context, call POST /v1.0/chat on the Assistant service, which by default listens on port 8331. Submit a request body containing the following parameters:
Question (string): the question to ask the language model
ModelName (string): the name of the model, as understood by Ollama
Stream (bool): set to true to enable streaming; do not use false
OllamaHostname (string): the hostname on which Ollama can be reached
OllamaPort (int): the port on which Ollama can be reached
curl -X POST http://localhost:8331/v1.0/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer [accesskey]" \
-d '
{
  "Question": "Tell a very short joke?",
  "ModelName": "gemma2:2b",
  "Stream": true,
  "OllamaHostname": "localhost",
  "OllamaPort": 11434
}'
The response will be sent using chunked transfer encoding and a content-type of text/event-stream, meaning each chunk in the response will be encoded as an event. An example response, captured using curl -v --raw, is as follows:
curl -v --raw -X POST http://localhost:8331/v1.0/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer [accesskey]" \
-d '
{
  "Question": "Tell a very short joke?",
  "ModelName": "llama3.1:latest",
  "Stream": true,
  "OllamaHostname": "localhost",
  "OllamaPort": 11434
}'
* Host viewdemo:8331 was resolved.
* IPv6: (none)
* IPv4: 192.168.254.129
* Trying 192.168.254.129:8331...
* Connected to viewdemo (192.168.254.129) port 8331
> POST /v1.0/chat HTTP/1.1
> Host: viewdemo:8331
> User-Agent: curl/8.9.1
> Accept: */*
> Content-Type: application/json
> Authorization: default
> Content-Length: 135
>
* upload completely sent off: 135 bytes
< HTTP/1.1 200 OK
< date: Tue, 29 Oct 2024 17:05:07 GMT
< server: uvicorn
< content-type: text/event-stream; charset=utf-8
< Transfer-Encoding: chunked
<
19
data: {"token": "Here"}
17
data: {"token": "'s"}
... removed for brevity ...
Your HTTP client should use chunked transfer encoding and deserialize each line beginning with data: as a payload line. If the string that follows data: is deserializable to JSON, the token property can be extracted and appended to the displayed result. Refer to the View C# SDK for Assistant for more details.
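To make the parsing concrete, the following is a minimal Python sketch of such a client. It is an illustration, not part of the View SDKs: it assumes the third-party requests library, and the URL, model name, and access key are placeholders taken from the examples above.

import json

import requests  # third-party HTTP client: pip install requests

# Placeholder values from the examples above; substitute your own.
url = "http://localhost:8331/v1.0/chat"
headers = {"Authorization": "Bearer [accesskey]"}
body = {
    "Question": "Tell a very short joke?",
    "ModelName": "llama3.1:latest",
    "Stream": True,
    "OllamaHostname": "localhost",
    "OllamaPort": 11434,
}

# stream=True makes requests yield the chunked response incrementally.
with requests.post(url, headers=headers, json=body, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Only lines prefixed with "data:" carry payloads; skip the rest.
        if not line or not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not JSON; ignore this line, per the guidance above
        token = event.get("token", "") if isinstance(event, dict) else ""
        print(token, end="", flush=True)
print()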
Retrieval Augmented Generation
The retrieval augmented generation (RAG) API follows the same syntax as the chat API, but uses a separate endpoint and a request body with more properties. The endpoint for the RAG API is POST /v1.0/rag, and the request body has the following structure:
{
  "PromptPrefix": "You are a helpful AI assistant. Please use the information that follows as context to answer the user question listed below. Do not make up an answer. If you do not know, say you do not know. ",
  "Question": "What information do you have available?",
  "MaxResults": 10,
  "Temperature": 0.1,
  "TopP": 0.95,
  "MaxTokens": 2048,
  "GenerationModel": "llama3.1:latest",
  "GenerationProvider": "ollama",
  "OllamaHostname": "alienwarer10",
  "OllamaPort": 11434,
  "VectorDatabaseHostname": "pgvector",
  "VectorDatabaseName": "vectordb",
  "VectorDatabaseUser": "postgres",
  "VectorDatabasePassword": "password",
  "Stream": true,
  "ContextSort": true,
  "ContextScope": 2,
  "Rerank": true,
  "RerankTopK": 10
}
Similar to the chat API, the RAG API returns its result using chunked transfer encoding and a content-type of text/event-stream, meaning your HTTP client should account for both.
The primary difference between the chat API and the RAG API is that the RAG API retrieves context from relevant documents and includes that context in the prompt before passing the prompt to the language model.
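To tie this together, the following Python sketch submits the request body shown above to the RAG endpoint and prints the streamed answer. It is an illustration under the same assumptions as the chat sketch: the third-party requests library, the default Assistant port, and placeholder hostnames, credentials, and access key that you would replace with your own.

import json

import requests  # third-party HTTP client: pip install requests

url = "http://localhost:8331/v1.0/rag"  # assumes the default Assistant port
headers = {"Authorization": "Bearer [accesskey]"}
body = {
    "PromptPrefix": "You are a helpful AI assistant. ...",  # full text as shown above
    "Question": "What information do you have available?",
    "MaxResults": 10,
    "Temperature": 0.1,
    "TopP": 0.95,
    "MaxTokens": 2048,
    "GenerationModel": "llama3.1:latest",
    "GenerationProvider": "ollama",
    "OllamaHostname": "localhost",        # placeholder; your Ollama host
    "OllamaPort": 11434,
    "VectorDatabaseHostname": "pgvector", # placeholder; your pgvector host
    "VectorDatabaseName": "vectordb",
    "VectorDatabaseUser": "postgres",
    "VectorDatabasePassword": "password",
    "Stream": True,
    "ContextSort": True,
    "ContextScope": 2,
    "Rerank": True,
    "RerankTopK": 10,
}

with requests.post(url, headers=headers, json=body, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # not JSON; skip
        token = event.get("token", "") if isinstance(event, dict) else ""
        print(token, end="", flush=True)
print()

Because both endpoints stream the same data:-prefixed event format, the same parsing loop serves for chat and RAG responses; only the endpoint and request body differ.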