GPT2-Based Text Stylization Model with Llama.cpp Wrapper

This repository provides a GPT2-based text stylization model, wrapped with Llama.cpp for compatibility across platforms, including iOS and WebGL. The model transforms generic text into stylized dialogue tailored to a game or a user.

Features

Text Rewriting: A GPT2-based line rewriter for stylizing texts.
Cross-Platform: Compatible with WebGL, iOS, Android, Windows x64, and MacOS (ARM).
Async and Sync Execution: Supports asynchronous execution using C++20 threads, as well as synchronous execution.
Customizable: Allows configuration of Llama.cpp parameters directly within the UnityEditor UI, with setup and execution callbacks.
Quantised Models On Board: The package ships with three quantised versions of the model:
- **_Q4_K_M_**: the most heavily quantised (simplified) version, 110 MB
- **_Q8_0_**: 175 MB
- **_bf16_**: the original-resolution model, 321 MB

Try It Online

Experience the model directly in your browser (the demo runs the **_Q4_K_M_** model).

Model Usage Example

Provide the following input:

<input> How are you today? <inputEnds>
<style> Pirate's Orwellian and Poetic Question <styleEnds>
<output>

And it will generate:

Hey there, how fares ye today?
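
The prompt layout above is easy to get wrong by hand, so a small helper can assemble it. This is a minimal sketch, not part of the package: the `StylePrompt` class and its `Build` method are hypothetical names, and the only thing taken from this README is the tag layout itself.

```csharp
using System;

public static class StylePrompt
{
    // Builds a prompt in the tag layout shown above:
    // <input> ... <inputEnds>, <style> ... <styleEnds>, then an open <output> tag.
    // Hypothetical helper; not part of the package API.
    public static string Build(string line, string style)
    {
        if (string.IsNullOrWhiteSpace(line))
            throw new ArgumentException("Input line is empty.", nameof(line));
        if (string.IsNullOrWhiteSpace(style))
            throw new ArgumentException("Style is empty.", nameof(style));

        return $"<input> {line.Trim()} <inputEnds>\n<style> {style.Trim()} <styleEnds>\n<output>";
    }
}
```

Calling `StylePrompt.Build("How are you today?", "Pirate's Orwellian and Poetic Question")` produces the three-line input shown in the example.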

Model Information

Training Data: Finetuned on a corpus of over 0.5 million dialogue lines (64,343,812 tokens) synthetically generated using GPT-4.
Tools: Output filtering tools provided in C# for easy editing.
Wrapper: Built on Llama.cpp release tag b3490, which supports recent models, including Llama 3.1 405B.
Model Formats: Weights are available in .gguf format at bf16, Q8_0 (int8), and Q4_K_M resolutions.

Recommendations

To get the best results, consider using the following styles:

Mood Styles: Inquisitive, Emotional, Intellectual, Dynamic, Noble, Light, Foreboding.
Writing Styles: Historical, Modern and Contemporary, Genre-Specific, Expressive and Creative.

Tip: Keep your style descriptions short and concise. Combining mood and writing styles often yields the best results.
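
One way to follow the tip above is to compose the style string from a few short descriptors. The sketch below is an assumption about a convenient workflow, not something the package provides: `StyleDescription.Combine` is a hypothetical helper, and the comma-separated form simply mirrors the style strings used elsewhere in this README.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StyleDescription
{
    // Joins short descriptors into one comma-separated style string,
    // dropping blanks and case-insensitive duplicates.
    // Hypothetical helper; not part of the package API.
    public static string Combine(params string[] descriptors)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var parts = descriptors
            .Where(d => !string.IsNullOrWhiteSpace(d))
            .Select(d => d.Trim())
            .Where(d => seen.Add(d)); // Add returns false for repeats
        return string.Join(", ", parts);
    }
}
```

For example, one mood style and one writing style can be combined as `StyleDescription.Combine("Foreboding", "Historical")`.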

Integration Notes

WebGL: Limited to 4 GB of RAM (x86), so published pages may not work on mobile devices with less RAM (e.g., devices running iOS 17).
MacOS/iOS Builds: Require a macOS computer with CMake and Xcode installed.
AI Output: The model may generate characters that certain fonts cannot render.
Model Performance: Optimized for one-liners and single sentences.
C++ Docs: available here; helpful when rebuilding the wrappers.
C# Docs (this documentation): available here.
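
The AI Output note above can be handled by replacing characters the font cannot draw before displaying the text. This is a minimal sketch under stated assumptions: `FontSafeText.Sanitize` is a hypothetical helper, and the check for whether a character is renderable is passed in by the caller. In a Unity project that check could be something like TextMeshPro's `TMP_FontAsset.HasCharacter`, if that fits your setup. The sketch works on single UTF-16 characters, so each half of a surrogate pair is tested on its own.

```csharp
using System;
using System.Text;

public static class FontSafeText
{
    // Replaces every character the font cannot draw with a fallback character.
    // canRender is supplied by the caller; whitespace is always kept.
    // Hypothetical helper; not part of the package API.
    public static string Sanitize(string text, Func<char, bool> canRender, char fallback = '?')
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (canRender == null) throw new ArgumentNullException(nameof(canRender));

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool keep = char.IsWhiteSpace(c) || canRender(c);
            sb.Append(keep ? c : fallback);
        }
        return sb.ToString();
    }
}
```

The predicate keeps the helper independent of any particular font API, which also makes it easy to test outside the Unity Editor.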

License

The model and code are released under Unity3D Asset Store’s standard End User License Agreement. See the full EULA here.

Coding

Motivating usage example:

using System.Collections;
using Assets.StyledLines.Runtime.ModelFilesUI;
using UnityEngine;
 
 
public class MinimalExample : MonoBehaviour
{
    public BinaryAsset binaryFileAsset;
 
    private ModelController _llmController = null;
    private void Start()
    {
        _llmController = new ModelController(binaryFileAsset);
        StartCoroutine(GenerateLine());
    }
 
    private IEnumerator GenerateLine()
    {
        string line = "I love you!";
        string style = "Simple, Confident, Shakespearean";
 
        var prompt = $"<input> {line} <inputEnds>\n<style> {style} <styleEnds>\n<output>";
        var id = _llmController.model.GenerateAsync(prompt);
 
        while (!_llmController.model.IsGenerationReady(id))
        {
            yield return new WaitForSeconds(0.3f);
        }
 
        string result = _llmController.model.GetGenerationResults(id);
        result = ModelController.ExtractFirstPart(result, "<out");
        Debug.Log(result); // prints out something like "My heart is boundless with affection!"
    }
}
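
The example above trims the raw result at the marker `"<out"`. The source of `ExtractFirstPart` is not shown in this README, so the sketch below is only a stand-in that illustrates the idea: keep the text before the first occurrence of a marker. `OutputTrimming.TakeBefore` is a hypothetical name, and the closing tag `<outputEnds>` used in the usage note is assumed by analogy with `<inputEnds>` and `<styleEnds>`.

```csharp
using System;

public static class OutputTrimming
{
    // Hypothetical stand-in, not the package's ExtractFirstPart:
    // returns the text before the first occurrence of marker, trimmed.
    // If the marker is absent or empty, the whole text is returned trimmed.
    public static string TakeBefore(string text, string marker)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(marker)) return text.Trim();

        int index = text.IndexOf(marker, StringComparison.Ordinal);
        return (index >= 0 ? text.Substring(0, index) : text).Trim();
    }
}
```

With this stand-in, `TakeBefore("My heart is boundless! <outputEnds>", "<out")` keeps only the sentence before the tag.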

Types to Look at First

ModelController
Description:
The ModelController class is responsible for setting up and configuring the Llama model for inference. It handles loading model parameters, registering necessary callbacks, and managing the context for the inference operations. This class is foundational for configuring the model and ensuring it is ready for use in various tasks.

Why it's Important:
Understanding how to configure and control the Llama model is critical, and ModelController provides the tools to do just that. It's the backbone of the model configuration process, enabling the proper setup of inference parameters and ensuring the model operates as expected.

References:
- Class: ModelController
- Methods: ExtractFirstPart, GetTempPath

CallbackWrapper
Description:
This static class serves as a bridge between the native library and the managed code, allowing you to register and handle callbacks for logging, token generation, and completion events. It is essential for capturing and responding to various model events during inference operations.

Why it's Important:
Callbacks are critical in asynchronous programming, especially when dealing with model inference. CallbackWrapper allows you to handle these callbacks effectively, enabling responsive and interactive model operations. Without this class, integrating with native library callbacks would be cumbersome and error-prone.
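
When callbacks come from native code, a common pattern is to queue the work and run it later on Unity's main thread. This README does not say which thread CallbackWrapper invokes its callbacks on, so the sketch below assumes they may arrive off the main thread. `MainThreadDispatcher` is a hypothetical class, not part of the package; it uses only the standard `ConcurrentQueue<T>`.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class MainThreadDispatcher
{
    private readonly ConcurrentQueue<Action> _pending = new ConcurrentQueue<Action>();

    // Safe to call from any thread, for example from inside a token callback.
    public void Post(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        _pending.Enqueue(action);
    }

    // Call from the main thread (for example once per frame).
    // Runs all queued actions in order and returns how many ran.
    public int Drain()
    {
        int count = 0;
        while (_pending.TryDequeue(out var action))
        {
            action();
            count++;
        }
        return count;
    }
}
```

A callback would call `Post` with the work that touches Unity objects, and a `MonoBehaviour.Update` method would call `Drain` each frame.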

References:
- Class: CallbackWrapper
- Methods: RegisterLogCallback, RegisterTokenCallback, RegisterCompletionCallback

RunModelAsync
Description:
This class is a detailed example of managing the lifecycle of asynchronous model generation tasks. It handles initialization, generation, and the state transitions of the Llama model, showing how to run and control the model. The RunModelAsync class also provides methods for monitoring and managing model outputs, handling logs, and dispatching events.

Why it's Important:
If you're working with asynchronous operations and need to manage the state and lifecycle of model inference, this class is your go-to. It encapsulates all the necessary functionality to start, monitor, and stop model tasks in an asynchronous environment, making it essential for any non-blocking operations.

References:
- Class: RunModelAsync
- State Enumeration: RunModelAsync.State
- Methods: Generate, DoSetup, Start, Update

DialogManager
Description:
The DialogManager class is an example designed to manage dialog interactions between characters in a scene. It processes dialog scripts, assigns styles to character speech, and handles the display of text above characters using Unity's TextMeshPro. This class integrates with the Llama model to dynamically generate character speech based on predefined styles and input text.

Why it's Important:
DialogManager is essential if you're creating interactive scenes or narrative-driven content where characters engage in dialog. It automates the process of managing dialog sequences, ensuring smooth transitions and dynamic content generation.

References:
- Class: DialogManager
- Methods: StartDialog, ShowNextLine, ReparseDialogAndStyles

MinimalExample
Description:
The MinimalExample class is a simple demonstration of how to set up and use the ModelController to generate text asynchronously. It initializes the model with a binary asset, sends a prompt for text generation, and outputs the generated text to the Unity console.

Why it's Important:
This class provides a basic, hands-on example of how to get started with the LlamaLibrary. It's perfect for beginners who want to see the basics of model interaction in action without diving into more complex systems.

References:
- Class: MinimalExample
- Methods: Start, GenerateLine