Inference Demo: Llama 3.1 8B Instruct (4-bit IMQ-quantized)
Behind the scenes, each request goes to an OpenAI-compatible endpoint: POST /v1/chat/completions
Example prompt: "Explain in 3-5 sentences: The future of artificial intelligence is..."

Controls:
- Generate: sends the prompt to the endpoint and displays the result in the Output panel.
- max_tokens: upper bound on the number of tokens the model may generate.
- temperature: sampling temperature; higher values make the output more varied, lower values more deterministic.
- Output: the model's completion is rendered here.
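Since the demo talks to an OpenAI-compatible chat-completions endpoint, the request the Generate button issues can be reproduced directly. A minimal sketch follows; the base URL and the exact model identifier are assumptions (the page does not state them), while the `max_tokens` and `temperature` fields mirror the two UI controls.

```python
import json

# Assumed base URL for the demo's backend; the real host is not shown on the page.
BASE_URL = "http://localhost:8000"

def build_chat_request(prompt: str, max_tokens: int = 256, temperature: float = 0.7):
    """Build the OpenAI-compatible /v1/chat/completions request body
    corresponding to the demo's Generate action."""
    url = f"{BASE_URL}/v1/chat/completions"
    payload = {
        # Model identifier is a guess based on the demo header; the server
        # may expect a different name.
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,       # the demo's max_tokens control
        "temperature": temperature,     # the demo's temperature control
    }
    return url, payload

url, payload = build_chat_request(
    "Explain in 3-5 sentences: The future of artificial intelligence is..."
)
print(url)
print(json.dumps(payload, indent=2))

# Actually sending it could look like:
#   import requests
#   resp = requests.post(url, json=payload, timeout=60)
#   text = resp.json()["choices"][0]["message"]["content"]
```

The response follows the standard chat-completions shape, so the generated text sits at `choices[0].message.content`.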