Nebula Block
  • Overview
  • Getting Started
    • Quickstart
    • Account Setup
    • Billing Information
    • Deploy Products
  • Core Services
    • Inference Models
      • Text Generation
      • Text Generation (Vision)
      • Image Generation
      • Embedding Generation
      • Model List
    • GPU Instances
      • Quickstart
    • Object Storage
      • Get Started
      • Tutorials
        • Linux/Mac
        • Windows
      • SDK
        • Golang
        • Python
        • Java
    • SSH Keys
      • Quickstart
  • API Reference
    • Platform API
      • Authentication
      • Instances
        • List Products
        • Create GPU Instance
        • List User Instances
        • List Deleted User Instances
        • User Instance Detail
        • Delete GPU Instance
        • Start GPU Instance
        • Stop GPU Instance
        • Reboot GPU Instance
      • SSH Keys
        • List SSH Keys
        • Rename SSH Key
        • Delete SSH Key
      • API Keys
        • List API Keys
        • Delete API Key
      • Billing
        • List Invoices
        • Download Invoice
        • Get Payment History
    • Inference API (OpenAI Compatible)
      • List Models
      • Generate Text
      • Generate Text (Vision)
      • Generate Images
      • Generate Embeddings
  • Team
  • Tier
  • Referral
  • Glossary
  • Contact Us
Powered by GitBook
On this page
  • HTTP Request
  • Response Attributes
  • Example
  1. API Reference
  2. Inference API (OpenAI Compatible)

Generate Text (Vision)

Generate Text (Vision).

Return the generated text based on the given textual and image inputs.

HTTP Request

POST {API_URL}/chat/completions

where the API_URL = https://inference.nebulablock.com/v1. The body has the following parameters:

  • messages array: An array of message objects. Each object should have:

    • role string: The role of the message sender (e.g., "user").

    • content list: A list containing the different inputs (recall an input can be either text or image). Each input is represented as a dict with the following key-value pairs:

      • type string: The type of input (e.g., "text" or "image_url").

      • image_url dict: If type is "image_url", this contains a dict representing the URL of the image with the following key-value pair:

        • url string: The URL of the image.

      • text string: If type is "text", the text input.

Note that only one of url or text can be provided in the input dict, and depends on the type.

  • model string: The model to use for generating the response.

  • max_tokens integer or null: The maximum number of tokens to generate. If null, the model's default will be used.

  • temperature float: Sampling temperature. Higher values make the output more random, while lower values make it more focused and deterministic.

  • top_p float: Nucleus sampling probability. The model will consider the results of the tokens with top_p probability mass. In other words, a higher value will result in more diverse outputs, while a lower value will result in more repetitive outputs.

  • stream boolean: Whether to stream the response in chunks or not.

Response Attributes

id string

A unique identifier for the completion request.

created integer

A Unix timestamp representing when the response was generated.

model string

The specific AI model used to generate the response.

object string

The type of response object (e.g., "chat.completion.chunk" for a streamed chunk or chat.completion for a non-chunked completion).

system_fingerprint string

A unique identifier for the system that generated the response, if available.

choices array

An array containing completion objects. Each object has the following fields:

  • finish_reason string: The reason the completion finished.

  • index integer: An index demarking this completion object.

  • message dict: Contains data on the generated output.

    • content string: The generated text for this completion object.

    • role string: Specifies the role of the AI (e.g., "assistant").

    • tool_calls array: Contains information about the tools used in generating the completion, if available.

    • function_calls array: Contains information about the functions used in generating the completion, if available.

usage dict

A dictionary containing information about the inference request, in key-value pairs:

  • completion_tokens integer: The number of tokens generated in the completion for a completion action.

  • prompt_tokens integer: The number of tokens in the prompt.

  • total_tokens integer: The total number of tokens (prompt and completion combined).

  • completion_tokens_details null: Additional details about the completion tokens, if available.

  • prompt_tokens_details null: Additional details about the prompt tokens, if available.

service_tier string

The service tier used for the completion request.

prompt_logprobs array

An array containing the log probabilities of the tokens in the prompt, if available.

Example

Request

curl -X POST "https://inference.nebulablock.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $NEBULA_API_KEY" \
    --data-raw '{
        "messages": [
			{"role":"user","content":[
			{"type":"image_url","image_url":
			{"url":"https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"}},
			{"type":"text","text":"What is this image?"}
		]}],
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",
        "max_tokens": null, 
        "temperature": 1,
        "top_p": 0.9,
        "stream": false
    }'

Response

A successful generation response (non-streaming) will contain a chat.completion object, and should look like this:

    {
    "id": "chatcmpl-7ba48f119a564f4ea02b6a41386a3e40",
    "created": 1740689977,
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "object": "chat.completion",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "This image shows ....",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "usage": {
        "completion_tokens": 91,
        "prompt_tokens": 3604,
        "total_tokens": 3695,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "service_tier": null,
    "prompt_logprobs": null
}

As is the case with text generation, if you set stream to True you can get the entire generated completion in 1 chat.completion object:

{
    "id": "chatcmpl-3812731562554b23a32dd80fbb7d0d09",
    "created": 1740692435,
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "object": "chat.completion.chunk",
    "choices": [
        {
            "index": 0,
            "delta": {
                "content": " setting"
            }
        }
    ]
}
{ 
  ...
}
...
PreviousGenerate TextNextGenerate Images

Last updated 21 days ago

For more examples, see the section.

Inference Models