This page provides an overview of APIs related to View Assistant.

View Assistant is a comprehensive solution built into View AI that enables conversational experiences with protected, on-premises data. It exposes both simple chat APIs, which interact directly with a large language model, and retrieval augmented generation (RAG) APIs, which interact with a large language model using your data as context. Alongside these APIs, View Assistant provides an easy-to-use built-in conversational AI interface and a standalone, deployable conversational AI interface.

Chat with Model

To chat with a large language model without using your data as context, send a POST request to /v1.0/chat on the Assistant service, which listens on port 8331 by default. Submit a request body containing the following parameters:

  • Question (string): the question to ask the language model
  • ModelName (string): the name of the model, as understood by Ollama
  • Stream (bool): set to True to enable streaming; do not use False
  • OllamaHostname (string): the hostname on which Ollama can be reached
  • OllamaPort (int): the port on which Ollama can be reached

curl -X POST http://localhost:8331/v1.0/chat \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer [accesskey]" \
     -d '
{
    "Question": "Tell a very short joke?",
    "ModelName": "gemma2:2b",
    "Stream": "True",
    "OllamaHostname": "localhost",
    "OllamaPort": 11434
}'

The response is sent using chunked transfer encoding with a content type of text/event-stream, meaning each chunk in the response is encoded as a server-sent event. An example exchange, captured using curl -v --raw, is as follows:

curl -v --raw -X POST http://localhost:8331/v1.0/chat \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer [accesskey]" \
     -d '
{
    "Question": "Tell a very short joke?",
    "ModelName": "llama3.1:latest",
    "Stream": "True",
    "OllamaHostname": "localhost",
    "OllamaPort": 11434
}'
* Host viewdemo:8331 was resolved.
* IPv6: (none)
* IPv4: 192.168.254.129
*   Trying 192.168.254.129:8331...
* Connected to viewdemo (192.168.254.129) port 8331
> POST /v1.0/chat HTTP/1.1
> Host: viewdemo:8331
> User-Agent: curl/8.9.1
> Accept: */*
> Content-Type: application/json
> Authorization: default
> Content-Length: 135
>
* upload completely sent off: 135 bytes
< HTTP/1.1 200 OK
< date: Tue, 29 Oct 2024 17:05:07 GMT
< server: uvicorn
< content-type: text/event-stream; charset=utf-8
< Transfer-Encoding: chunked
<
19
data: {"token": "Here"}


17
data: {"token": "'s"}

... removed for brevity ...

Your HTTP client should handle chunked transfer encoding and treat each line beginning with data: as a payload line. If the string that follows data: deserializes to JSON, the token property can be extracted and appended to the accumulated output. Refer to the View C# SDK for Assistant for more details.
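
As an illustrative sketch (not an official SDK), the following Python snippet uses the requests library to post the chat request shown earlier and print tokens as they arrive. The URL, access key, and model name are the placeholder values from the example above; substitute your own.

import json

import requests

# Placeholder connection details mirroring the curl example above;
# substitute your own host, port, and access key.
URL = "http://localhost:8331/v1.0/chat"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer [accesskey]",
}
BODY = {
    "Question": "Tell a very short joke?",
    "ModelName": "gemma2:2b",
    "Stream": "True",
    "OllamaHostname": "localhost",
    "OllamaPort": 11434,
}

with requests.post(URL, headers=HEADERS, json=BODY, stream=True) as resp:
    resp.raise_for_status()
    # iter_lines() reassembles the chunked body into individual lines.
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separator lines between events
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue  # ignore payloads that are not valid JSON
        token = event.get("token")
        if token is not None:
            print(token, end="", flush=True)
print()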

Retrieval Augmented Generation

The retrieval augmented generation (RAG) API follows the same syntax as the chat API, but uses a separate endpoint and a request body with more properties. The endpoint for the RAG API is POST /v1.0/rag and the request body has the following structure:

{
    "PromptPrefix": "You are a helpful AI assistant.  Please use the information that follows as context to answer the user question listed below.  Do not make up an answer.  If you do not know, say you do not know.  ",
    "Question": "What information do you have available?",
    "MaxResults": 10,
    "Temperature": 0.1,
    "TopP": 0.95,
    "MaxTokens": 2048,
    "GenerationModel": "llama3.1:latest",
    "GenerationProvider": "ollama",
    "OllamaHostname": "alienwarer10",
    "OllamaPort": 11434,
    "VectorDatabaseHostname": "pgvector",
    "VectorDatabaseName": "vectordb",
    "VectorDatabaseUser": "postgres",
    "VectorDatabasePassword": "password",
    "Stream": true,
    "ContextSort": true,
    "ContextScope": 2,
    "Rerank": true,
    "RerankTopK": 10
}

Like the chat API, the RAG API returns its result using chunked transfer encoding and a content type of text/event-stream, so your HTTP client should account for both.

The primary difference between the chat API and the RAG API is that the RAG API retrieves context from relevant documents and includes that context in the prompt before passing it to the language model.
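
For completeness, here is a similar illustrative Python sketch for the RAG endpoint. It assumes the Assistant service listens on the same port (8331) and accepts the same bearer-token header as the chat example; all body values are the placeholders from the request structure shown above.

import json

import requests

# Placeholder host, port, and access key, mirroring the chat example;
# the body values are the placeholders from the structure shown above.
URL = "http://localhost:8331/v1.0/rag"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer [accesskey]",
}
BODY = {
    "PromptPrefix": (
        "You are a helpful AI assistant.  Please use the information that "
        "follows as context to answer the user question listed below.  "
        "Do not make up an answer.  If you do not know, say you do not know.  "
    ),
    "Question": "What information do you have available?",
    "MaxResults": 10,
    "Temperature": 0.1,
    "TopP": 0.95,
    "MaxTokens": 2048,
    "GenerationModel": "llama3.1:latest",
    "GenerationProvider": "ollama",
    "OllamaHostname": "alienwarer10",
    "OllamaPort": 11434,
    "VectorDatabaseHostname": "pgvector",
    "VectorDatabaseName": "vectordb",
    "VectorDatabaseUser": "postgres",
    "VectorDatabasePassword": "password",
    "Stream": True,
    "ContextSort": True,
    "ContextScope": 2,
    "Rerank": True,
    "RerankTopK": 10,
}

with requests.post(URL, headers=HEADERS, json=BODY, stream=True) as resp:
    resp.raise_for_status()
    # The event stream is parsed exactly as in the chat sketch above.
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            try:
                print(json.loads(line[6:]).get("token", ""), end="", flush=True)
            except json.JSONDecodeError:
                pass
print()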