How to Use Phi-4 GGUF: A Quick Guide
Microsoft’s Phi-4 is an advanced language model that has recently been made available in GGUF format, allowing for local deployment and use. This guide will walk you through the process of setting up and using Phi-4 GGUF on your own machine, enabling you to harness its capabilities for various natural language processing tasks.
Phi-4: Small But Mighty
Phi-4 is the latest iteration in Microsoft’s Phi series of language models. At 14 billion parameters it is small by frontier-model standards, yet it handles a wide range of language tasks with efficiency and accuracy that belie its size. GGUF (GPT-Generated Unified Format) is a file format optimized for efficient loading and inference of large language models on consumer-grade hardware.
Key Features of Phi-4:
- Strong natural language understanding and reasoning, with particular strength in math and science
- A 16K-token context window for longer inputs
- Benchmark performance competitive with much larger models on many NLP tasks
Benefits of GGUF Format:
- Reduced memory footprint
- Faster loading times
- Optimized for consumer hardware
Download Phi-4 GGUF
To begin using Phi-4 GGUF, you first need to download the model files. As of now, an unofficial release is available through a community member’s Hugging Face repository.
Steps to Download:
- Visit the Hugging Face repository: https://huggingface.co/matteogeniaccio/phi-4/tree/main
- Choose the quantization option that suits your needs (Q8_0, Q6_K, or f16; lower-bit quantizations trade a small amount of accuracy for a smaller file and lower memory use)
- Download the selected model file (a command-line alternative is sketched just after this list)
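If you prefer the command line, the Hugging Face CLI can fetch an individual file from the repository. A minimal sketch; the exact filename (phi-4-Q8_0.gguf here) is an assumption, so check the repository’s file listing before running it:
pip install -U "huggingface_hub[cli]"
huggingface-cli download matteogeniaccio/phi-4 phi-4-Q8_0.gguf --local-dir ./models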
Note: The official release from Microsoft is expected in the near future, which may offer additional features or optimizations.
Setting Up Your Environment
Before running Phi-4 GGUF, you need to set up your environment with the necessary tools and dependencies.
Required Software:
- Python 3.7 or higher
- Git (for cloning repositories)
- A compatible inference engine (e.g., llama.cpp or Ollama)
Installation Steps:
- Install Python from the official website if not already installed
- Install Git from git-scm.com if not present on your system
- Choose and install an inference engine (detailed in the next sections); a quick way to verify the first two prerequisites is sketched below
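You can confirm Python and Git from a terminal (on some systems the Python command is python rather than python3):
python3 --version
git --version
Both commands should print a version string; for Python it should read 3.7 or higher.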
Using Phi-4 GGUF with llama.cpp
llama.cpp is a popular inference engine for running large language models locally. Here’s how to set it up for use with Phi-4 GGUF:
Setting Up llama.cpp:
- Clone the llama.cpp repository:
git clone https://github.com/ggerganov/llama.cpp.git
- Navigate to the cloned directory:
cd llama.cpp
- Build the project:
make
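Recent llama.cpp checkouts have deprecated the Makefile in favor of CMake, so if make fails or warns on a current clone, the equivalent CMake build (assuming CMake is installed) is:
cmake -B build
cmake --build build --config Release
With this route, the compiled binaries end up under build/bin rather than the repository root.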
Running Phi-4 with llama.cpp:
- Place your downloaded Phi-4 GGUF file in the models directory
- Run the model using the following command:
./main -m models/phi-4-q8_0.gguf -n 1024 --repeat_penalty 1.1 --temp 0.1 -p "Your prompt here"
Adjust the parameters as needed for your specific use case.
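In this command, -m selects the model file, -n limits the number of tokens generated, --temp sets the sampling temperature (lower values give more deterministic output), and --repeat_penalty discourages repetitive text. If you would rather interact with Phi-4 over HTTP, llama.cpp also bundles an OpenAI-compatible server. A minimal sketch, assuming the same model path as above (the binary is called server in older builds and llama-server in newer ones):
./server -m models/phi-4-q8_0.gguf -c 4096 --port 8080
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Explain GGUF in one sentence."}]}'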
For more details, you can read the pull request that added Phi-4 support in the llama.cpp repository.
Deploying Phi-4 GGUF with Ollama
Ollama is another excellent tool for running language models locally, offering a more user-friendly interface.
Installing Ollama:
- Visit https://ollama.ai/ and download the appropriate version for your operating system
- Follow the installation instructions provided on the website
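On Linux, the website provides a one-line install script; on macOS and Windows you download a desktop installer instead. Either way, you can confirm the CLI is available afterwards:
curl -fsSL https://ollama.com/install.sh | sh
ollama --version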
Running the Phi-4 Model in Ollama:
- The quickest route is to pull a community build of the model directly from the Ollama library:
ollama run vanilj/Phi-4
- Alternatively, you can package the GGUF file you downloaded earlier yourself: create a file named Modelfile and build a local model from it, as sketched below.
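A minimal Modelfile sketch for the second route, assuming the GGUF you downloaded earlier is named phi-4-q8_0.gguf and sits in the current directory (adjust the FROM path to match your file):
FROM ./phi-4-q8_0.gguf
Then build and run it under a local name of your choosing (phi4-local here is arbitrary):
ollama create phi4-local -f Modelfile
ollama run phi4-local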
More details are available on the vanilj/Phi-4 page in the Ollama model library.
Conclusion
Phi-4 GGUF represents a significant step forward in making advanced language models accessible for local deployment. By following this guide, you should now be equipped to download, set up, and use Phi-4 GGUF for various natural language processing tasks. As you explore its capabilities, remember to stay updated with the latest developments and best practices in the rapidly evolving field of AI and language models.