Getting started with GPT-2

It’s a bit tricky to see where to begin with OpenAI’s (in)famous GPT-2 model. This blog post is our first in a small series about NLP. We hope it helps!

Getting Python

Our preferred way of installing and managing Python, particularly for machine learning tasks, is to use the Anaconda Environment.

⚠️ Anaconda’s environments don’t quite work like virtualenv, or other Python environment systems that you might be familiar with. They don’t store things in the location you specify, they store things in the system-wide Anaconda directory (e.g. on macOS in “/Users/USER/anaconda3/envs/”). Just remember that once you activate them, all commands are inside the environment.

Anaconda bundles a package manager, an environment manager, and a variety of other tools that make using and managing Python environments easier.

➡️ Download the Anaconda installer for Windows or macOS

Once you’ve installed Anaconda, following these instructions to make an Anaconda Environment to use with GPT-2.

➡️ Create a new Anaconda Environment named GPT2 and running Python 3.x (the version of Python you need to be running to work with GPT-2 at the moment):

conda create -n GPT2 python=3

➡️ Activate the Conda environment:

conda activate GPT2

Getting and using GPT-2

➡️ Clone the GPT-2 repository to your computer:

git clone https://github.com/openai/gpt-2.git

➡️ While inside the activated Conda environment you’re using, install the requirements specified inside the GPT-2 repository:

pip install --upgrade -r requirements.txt

➡️ Use the model downloading script that comes with the GPT-2 repository to download a pre-trained model:

python3 download_model.py 774M

⚠️ You can specify which of the various GPT-2 models you want to download. Above, we’re requesting a download of the 774 million parameter model, which is about 3.1 gigabytes. You can also request the smaller 124M or 355M models, or the much larger 1.5B model.

You might need to wait a little while as the script downloads the model. You’ll see something like this:

Fetching model.ckpt.data-00000-of-00001: 44%|███████ | 1.36G/3.10G [02:16<03:44, 7.77Mit/s]

➡️ You’ll then need to open the file src/interactive_conditional_samples.py using your favourite programming editor, and update the model_name to the one that you downloaded:

def interact_model(
    model_name='774M',
    seed=None,
    nsamples=1,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
    top_p=1,
    models_dir='models',
):

➡️ You can then execute the Python script:

python3 src/interactive_conditional_samples.py

This will fire up the model, allowing you to enter some text. You’ll eventually (likely after some warnings that you can ignore about your CPU and the version of TensorFlow) see something like this:

Model prompt >>>

Enter some text and press return (we recommend only a sentence or two), then wait a bit and see what the model generates in response. This could take a while, depending on your computer’s capabilities.

We’ll be back with a follow-up article, exploring how you can actually use GPT-2 for something useful. Stay tuned!


For more AI content, check out our latest book, Practical Artificial Intelligence with Swift! It covers using Swift for AI in iOS applications, using Apple’s CreateML, CoreML, and Turi Create. If you like filling your brain with words, why not fill them with ours?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s