Text Generation
Use these models to generate text, whether you want to review code, write a story, or anything in between.
Models available
Available text models:
DeepSeek-V3-0324: The leading non-reasoning model.
DeepSeek-R1-Distill-Llama-70B: High-quality text generation for complex queries.
DeepSeek-R1-Distill-Qwen-32B: Efficient text generation with lower resource usage.
Llama3.3-70B: Advanced conversational AI with extensive knowledge.
Qwen-QwQ-32B: A model from the Qwen series designed for reasoning and problem-solving.
Qwen2.5-Coder-32B: Specialized in code generation and completion.
Using the Models
Go to the Nebula Block website.
Log in, and ensure you have enough credits.
Click on the "Inference Models" tab and select your model.
Select your parameters, enter your text (and image, if applicable) and just press Enter!
The parameters you can tweak are outlined below; an example request body that uses them follows the list:
Messages: The current dialogue between the user and the model.
System Prompt: A set of instructions, guidelines, and contextual information that tells the model how to respond to queries.
Output Length: The maximum number of tokens that will be generated for each response.
Temperature: Controls randomness. Higher values increase diversity; lower values make the output more deterministic.
Top P: Nucleus sampling threshold. A higher value results in more diverse outputs, while a lower value results in more focused, repetitive outputs.
Stream: If set to true, the response is streamed in chunks; if set to false, the entire generation is returned in one response.
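These same parameters appear as fields of the request body when you call the API directly (covered in the next section). The snippet below is a minimal sketch, in Python, of a payload that sets a system prompt, a user message, and the sampling parameters described above; the prompt text and parameter values are placeholders you would adjust for your use case.
# Minimal sketch of a request payload using the parameters described above.
# Prompt text and parameter values are placeholders.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a concise technical assistant."},  # System Prompt
        {"role": "user", "content": "Summarize what top_p does."}                 # Messages
    ],
    "max_tokens": 512,     # Output Length
    "temperature": 0.7,    # Temperature
    "top_p": 0.9,          # Top P
    "stream": False        # Stream
}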
Through API Endpoint
This option is to use our API endpoint directly in your projects. Below are some code snippets to get you started!
NOTE: Don't forget to use your API key. See the API Reference and the Overview for more details on authentication.
Using cURL
curl -X POST "https://inference.nebulablock.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NEBULA_API_KEY" \
  --data-raw '{
    "messages": [
      {"role": "user", "content": "Is Montreal a thriving hub for the AI industry?"}
    ],
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "max_tokens": null,
    "temperature": 1,
    "top_p": 0.9,
    "stream": false
  }'
Using Python
import requests
import os

url = "https://inference.nebulablock.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ.get('NEBULA_API_KEY')}"
}
data = {
    "messages": [
        {"role": "user", "content": "Is Montreal a thriving hub for the AI industry?"}
    ],
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "max_tokens": None,
    "temperature": 1,
    "top_p": 0.9,
    "stream": False
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
Using JavaScript
const url = "https://inference.nebulablock.com/v1/chat/completions";
const headers = {
"Content-Type": "application/json",
"Authorization": `Bearer ${process.env.NEBULA_API_KEY}`
};
const data = {
messages: [
{ role: "user", content: "Is Montreal a thriving hub for the AI industry?" }
],
model: "meta-llama/Llama-3.3-70B-Instruct",
max_tokens: null,
temperature: 1,
top_p: 0.9,
stream: false
};
fetch(url, {
method: 'POST',
headers: headers,
body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => {
console.log(JSON.stringify(data, null, 2));
})
.catch(error => console.error('Error:', error));
Selecting a Model
To specify the desired model, use this mapping for the model parameter:
DeepSeek-V3-0324: deepseek-ai/DeepSeek-V3-0324
DeepSeek-R1-Distill-Llama-70B: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-Distill-Qwen-32B: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Llama3.3-70B: meta-llama/Llama-3.3-70B-Instruct
Qwen-QwQ-32B: Qwen/QwQ-32B
Qwen2.5-Coder-32B: Qwen/Qwen2.5-Coder-32B-Instruct
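If you switch models programmatically, one option is to keep this mapping in code. The dictionary below is a hypothetical convenience for your own scripts, not part of the API; the IDs are the ones listed above, and the final line assumes the data payload from the Python example earlier.
# Hypothetical convenience mapping from friendly names to API model IDs.
MODEL_IDS = {
    "DeepSeek-V3-0324": "deepseek-ai/DeepSeek-V3-0324",
    "DeepSeek-R1-Distill-Llama-70B": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "DeepSeek-R1-Distill-Qwen-32B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    "Llama3.3-70B": "meta-llama/Llama-3.3-70B-Instruct",
    "Qwen-QwQ-32B": "Qwen/QwQ-32B",
    "Qwen2.5-Coder-32B": "Qwen/Qwen2.5-Coder-32B-Instruct",
}

# Reuse the "data" payload from the Python example above, swapping in another model.
data["model"] = MODEL_IDS["Qwen2.5-Coder-32B"]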
Response Example
A successful generation response (non-streaming) will contain a chat.completion object, and should look like this:
{
  "id": "chatcmpl-ec0014bc38e2cad1e45d47f7f01f6569",
  "created": 1740432179,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "object": "chat.completion",
  "system_fingerprint": null,
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Yes! Montreal is the home of cutting edge ... research.",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 695,
    "prompt_tokens": 42,
    "total_tokens": 737,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  },
  "service_tier": null,
  "prompt_logprobs": null
}
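Given this structure, the generated text can be pulled out of the non-streaming response in a couple of lines. A minimal sketch, assuming the response object from the Python example above:
# Extract the assistant's reply from a non-streaming chat.completion response.
result = response.json()
reply = result["choices"][0]["message"]["content"]
print(reply)

# Token accounting is reported under "usage".
print(result["usage"]["total_tokens"], "tokens used")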
This represents the entire generated response from the inference. Alternatively, the streaming option (stream: true in the request body) will return several responses, each containing a chat.completion.chunk object, and will look like this:
{
  "id": "chatcmpl-289eb1f670a58c5cde47ddb634aad595",
  "created": 1740432271,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " everyone"
      }
    }
  ]
}
{
...
}
...
where the content of each chunk contains the next generated token. Put together in order, these tokens form the complete response.
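To consume the stream in code, one approach is sketched below. It assumes the endpoint follows the common OpenAI-compatible convention of sending each chunk as a Server-Sent Events line prefixed with "data: " and ending the stream with "data: [DONE]"; check the API Reference for the exact wire format before relying on this.
import json
import os
import requests

url = "https://inference.nebulablock.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ.get('NEBULA_API_KEY')}"
}
data = {
    "messages": [
        {"role": "user", "content": "Is Montreal a thriving hub for the AI industry?"}
    ],
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "stream": True
}

# stream=True tells requests not to download the whole body at once.
with requests.post(url, headers=headers, json=data, stream=True) as response:
    for raw_line in response.iter_lines():
        if not raw_line:
            continue
        line = raw_line.decode("utf-8")
        # Assumed OpenAI-style SSE framing: "data: {json}" per chunk, "data: [DONE]" at the end.
        if line.startswith("data: "):
            line = line[len("data: "):]
        if line == "[DONE]":
            break
        chunk = json.loads(line)
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)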
Feel free to refer to the API Reference for more details.