Text Generation
Use these models to generate text for any task, whether it's reviewing code, writing a story, or anything else.
Available text models:
DeepSeek-R1-Distill-Llama-70B: High-quality text generation for complex queries.
DeepSeek-R1-Distill-Qwen-32B: Efficient text generation with lower resource usage.
Llama3.3-70B: Advanced conversational AI with extensive knowledge.
Llama3.1-8B: General-purpose text generation at low cost.
Qwen2.5-Coder-32B: Specialized in code generation and completion.
To generate text from the web interface:
1. Go to the website.
2. Log in, and make sure you have enough credits.
3. Click on the "Serverless Endpoints" tab and select your model.
4. Select your parameters, enter your text (and image, if applicable), and press Enter.
The parameters you can tweak are outlined below:
Messages: The current dialogue between the user and the model.
System Prompt: A set of instructions, guidelines, and contextual information that tells the model how to respond to queries.
Output Length: The maximum number of tokens that will be generated for each response.
Temperature: Controls randomness in sampling. Higher values produce more diverse output; lower values produce more deterministic output.
Top P: The nucleus-sampling threshold. A higher value results in more diverse outputs, while a lower value results in more focused, repetitive outputs.
Stream: If set to true, the response will be streamed in chunks; if set to false, the entire generation will be returned in a single response.
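As a sketch, here is how these parameters map onto the body of an API request (this assumes the OpenAI-compatible schema implied by the chat.completion response objects described below; the values are illustrative, not defaults):

```python
# Illustrative request payload; values are examples, not defaults.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},  # System Prompt
        {"role": "user", "content": "Write a haiku about the sea."},    # Messages
    ],
    "max_tokens": 256,    # Output Length
    "temperature": 0.7,   # Temperature
    "top_p": 0.9,         # Top P
    "stream": False,      # Stream
}
```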
You can also call our API endpoint directly from your projects. Below are some code snippets to get you started!
To specify the desired model, use this mapping for the model_name:
DeepSeek-R1-Distill-Llama-70B: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-Distill-Qwen-32B: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Llama3.3-70B: meta-llama/Llama-3.3-70B-Instruct
Llama3.1-8B: meta-llama/Llama-3.1-8B-Instruct
Qwen2.5-Coder-32B: Qwen/Qwen2.5-Coder-32B-Instruct
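As a minimal sketch, a chat completion request in Python might look like the following. The base URL here is a placeholder (an assumption, not the real endpoint); substitute the URL from your dashboard, and keep your API key secret:

```python
import requests

# Placeholder endpoint URL (assumption) -- replace with the actual
# base URL from your dashboard.
url = "https://api.example.com/v1/chat/completions"

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

payload = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a binary search does."},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False,
}

# Send the request and print the generated text.
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```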
A successful generation response (non-streaming) will contain a chat.completion object, and should look like this:
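A representative example (the field values are illustrative, and the exact set of fields may vary slightly):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1727000000,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is the generated text..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 34,
    "total_tokens": 46
  }
}
```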
This represents the entire generated response from the inference. Alternatively, the streaming option (stream: true in the request body) will return several responses, each containing a chat.completion.chunk object, and will look like this:
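A representative chunk (again, the field values are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1727000000,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "Hello" },
      "finish_reason": null
    }
  ]
}
```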
The content of each chunk carries the newly generated token; concatenated in order, these tokens form the complete response.
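As a sketch, a client could reassemble the stream like this (assuming server-sent-event framing with a data: prefix and a [DONE] sentinel, the common convention for OpenAI-compatible streaming endpoints; the URL and key are placeholders):

```python
import json
import requests

# Placeholder URL and key (assumptions) -- substitute your own.
url = "https://api.example.com/v1/chat/completions"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": True,
}

# Stream the response and stitch the per-chunk deltas back together.
full_text = []
with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content") or ""
        full_text.append(delta)
        print(delta, end="", flush=True)

print("\nComplete response:", "".join(full_text))
```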
NOTE: Don't forget to include your API key. See the documentation on authentication for more details.
Feel free to explore, and refer to the API documentation for more details.