Inference Demo: Llama 3.1 8B Instruct (4-bit IMQ-quantized)
Behind the scenes, each request goes to an OpenAI-compatible endpoint: POST /v1/chat/completions
Example prompt: "Explain in 3-5 sentences: The future of artificial intelligence is..."

Controls:
- Generate: sends the prompt to the endpoint and displays the result in the Output panel.
- max_tokens: upper bound on the number of tokens the model may generate.
- temperature: sampling temperature; higher values make the output more varied, lower values more deterministic.
- Output: the model's completion is rendered here.
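Since the demo talks to an OpenAI-compatible chat-completions endpoint, the request the Generate button issues can be reproduced directly. A minimal sketch follows; the base URL and the exact model identifier are assumptions (the page does not state them), while the `max_tokens` and `temperature` fields mirror the two UI controls.

```python
import json

# Assumed base URL for the demo's backend; the real host is not shown on the page.
BASE_URL = "http://localhost:8000"

def build_chat_request(prompt: str, max_tokens: int = 256, temperature: float = 0.7):
    """Build the OpenAI-compatible /v1/chat/completions request body
    corresponding to the demo's Generate action."""
    url = f"{BASE_URL}/v1/chat/completions"
    payload = {
        # Model identifier is a guess based on the demo header; the server
        # may expect a different name.
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,       # the demo's max_tokens control
        "temperature": temperature,     # the demo's temperature control
    }
    return url, payload

url, payload = build_chat_request(
    "Explain in 3-5 sentences: The future of artificial intelligence is..."
)
print(url)
print(json.dumps(payload, indent=2))

# Actually sending it could look like:
#   import requests
#   resp = requests.post(url, json=payload, timeout=60)
#   text = resp.json()["choices"][0]["message"]["content"]
```

The response follows the standard chat-completions shape, so the generated text sits at `choices[0].message.content`.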