How to Use Phi-4 GGUF: A Quick Guide
Microsoft’s Phi-4 is an advanced language model that has recently been made available in GGUF format, allowing for local deployment and use. This guide will walk you through the process of setting up and using Phi-4 GGUF on your own machine, enabling you to harness its capabilities for various natural language processing tasks.
Phi-4: Small But Mighty
Phi-4 is the latest iteration in Microsoft’s Phi series of language models. At 14 billion parameters it is small by frontier-model standards, yet it handles a wide range of language tasks with efficiency and accuracy that belie its size. GGUF (GPT-Generated Unified Format) is a file format optimized for efficient loading and inference of large language models on consumer-grade hardware.
Key Features of Phi-4:
- Strong natural language understanding and reasoning, with particular strength in math and science
- A 16K-token context window for longer inputs
- Benchmark performance competitive with much larger models on many NLP tasks
Benefits of GGUF Format:
- Reduced memory footprint
- Faster loading times
- Optimized for consumer hardware
Download Phi-4 GGUF
To begin using Phi-4 GGUF, you first need to download the model files. As of now, an unofficial release is available through a community member’s Hugging Face repository.
Steps to Download:
- Visit the Hugging Face repository: https://huggingface.co/matteogeniaccio/phi-4/tree/main
- Choose the quantization option that suits your needs (Q8_0, Q6_K, or f16; lower-bit quantizations trade a small amount of accuracy for a smaller file and lower memory use)
- Download the selected model file (a command-line alternative is sketched just after this list)
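If you prefer the command line, the Hugging Face CLI can fetch an individual file from the repository. A minimal sketch; the exact filename (phi-4-Q8_0.gguf here) is an assumption, so check the repository’s file listing before running it:
pip install -U "huggingface_hub[cli]"
huggingface-cli download matteogeniaccio/phi-4 phi-4-Q8_0.gguf --local-dir ./models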
Note: The official release from Microsoft is expected in the near future, which may offer additional features or optimizations.
Setting Up Your Environment
Before running Phi-4 GGUF, you need to set up your environment with the necessary tools and dependencies.
Required Software:
- Python 3.7 or higher
- Git (for cloning repositories)
- A compatible inference engine (e.g., llama.cpp or Ollama)
Installation Steps:
- Install Python from the official website if not already installed
- Install Git from git-scm.com if not present on your system
- Choose and install an inference engine (detailed in the next sections); a quick way to verify the first two prerequisites is sketched below
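You can confirm Python and Git from a terminal (on some systems the Python command is python rather than python3):
python3 --version
git --version
Both commands should print a version string; for Python it should read 3.7 or higher.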
Using Phi-4 GGUF with llama.cpp
llama.cpp is a popular inference engine for running large language models locally. Here’s how to set it up for use with Phi-4 GGUF:
Setting Up llama.cpp:
- Clone the llama.cpp repository:
git clone https://github.com/ggerganov/llama.cpp.git
- Navigate to the cloned directory:
cd llama.cpp
- Build the project:
make
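Recent llama.cpp checkouts have deprecated the Makefile in favor of CMake, so if make fails or warns on a current clone, the equivalent CMake build (assuming CMake is installed) is:
cmake -B build
cmake --build build --config Release
With this route, the compiled binaries end up under build/bin rather than the repository root.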
Running Phi-4 with llama.cpp:
- Place your downloaded Phi-4 GGUF file in the models directory
- Run the model using the following command:
./main -m models/phi-4-q8_0.gguf -n 1024 --repeat_penalty 1.1 --temp 0.1 -p "Your prompt here"
Adjust the parameters as needed for your specific use case.
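In this command, -m selects the model file, -n limits the number of tokens generated, --temp sets the sampling temperature (lower values give more deterministic output), and --repeat_penalty discourages repetitive text. If you would rather interact with Phi-4 over HTTP, llama.cpp also bundles an OpenAI-compatible server. A minimal sketch, assuming the same model path as above (the binary is called server in older builds and llama-server in newer ones):
./server -m models/phi-4-q8_0.gguf -c 4096 --port 8080
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Explain GGUF in one sentence."}]}'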
For more details, you can read the pull request that added Phi-4 support in the llama.cpp repository.
Deploying Phi-4 GGUF with Ollama
Ollama is another excellent tool for running language models locally, offering a more user-friendly interface.
Installing Ollama:
- Visit https://ollama.ai/ and download the appropriate version for your operating system
- Follow the installation instructions provided on the website
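On Linux, the website provides a one-line install script; on macOS and Windows you download a desktop installer instead. Either way, you can confirm the CLI is available afterwards:
curl -fsSL https://ollama.com/install.sh | sh
ollama --version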
Running the Phi-4 Model in Ollama:
- The quickest route is to pull a community build of the model directly from the Ollama library:
ollama run vanilj/Phi-4
- Alternatively, you can package the GGUF file you downloaded earlier yourself: create a file named Modelfile and build a local model from it, as sketched below.
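A minimal Modelfile sketch for the second route, assuming the GGUF you downloaded earlier is named phi-4-q8_0.gguf and sits in the current directory (adjust the FROM path to match your file):
FROM ./phi-4-q8_0.gguf
Then build and run it under a local name of your choosing (phi4-local here is arbitrary):
ollama create phi4-local -f Modelfile
ollama run phi4-local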
More details are available on the vanilj/Phi-4 page in the Ollama model library.
Conclusion
Phi-4 GGUF represents a significant step forward in making advanced language models accessible for local deployment. By following this guide, you should now be equipped to download, set up, and use Phi-4 GGUF for various natural language processing tasks. As you explore its capabilities, remember to stay updated with the latest developments and best practices in the rapidly evolving field of AI and language models.