Robot Whisperer: How to Fine-tune GPT-3 (In 15 Minutes & Without Writing a Single Line of Code)

By Miha Cacic
Updated July 24th, 2023
Fine-tuning GPT-3

If you're like most AI-enthusiasts right now, you:

  • Love the idea of AI

  • Want to use it in your life and business

  • Yet prompting never gives you the speed, quality, or reliability that you need

Six months ago, my co-author (Mark 👋) and I (Miha 👋) were in the same situation.

We loved using ChatGPT on a daily basis as a helpful assistant. It was remarkable how we could scribble down off-the-cuff instructions and get insightful perspectives that fueled our creativity. But when it came to creating automations for our businesses and daily tasks… we hit a wall.

Prompt engineering was not just bad; it was an absolute disaster.

The results were slow, inconsistent, expensive, and overly verbose—I mean, really, how many times does ChatGPT need to say “I’m sorry for the confusion,” in one conversation?

If you've also failed to get meaningful results with prompt engineering and ChatGPT, take comfort in knowing that it's normal and, most importantly, not your fault. And if you have even attempted to solve this problem through fine-tuning, only to feel like you're grappling with an alien language that requires a PhD in computer science, again, you're not alone.

Now here's the good news: in the next 15 minutes, I'm going to equip you with a revolutionary new tool - our proprietary "Robot Whisperer" framework, which, born out of our own struggle and triumph, will allow you to craft your own AI models for any use-case under the sun.

And these models will be lightning fast, high quality, reliable, and cheap.

And the best part? You won't need to write a single line of Python code in the CLI (or even know what that means) to make them! We’ll give you all the tools and know-how you need so that you can simply push the right buttons and get amazing results.

We've developed the "Robot Whisperer" framework after literally no one could teach us how to fine-tune our AI models.

Mark and I run multiple 5, 6, and 7-figure companies, and when we saw what large language models (LLMs) were capable of, we knew we wanted to leverage them to streamline our business operations.

Despite our excitement, we hit roadblocks attempting to integrate ChatGPT into our processes. This propelled us to start exploring the creation of our own models. We joined several Discord groups, coming across a handful of AI enthusiasts who were working on fine-tuning their models. Our hope was rekindled.

However, no one had a good process to follow.

The industry was in its infancy, and the lack of expertise was glaringly evident.

That's when we decided to strike out on our own.

For six grueling months, we relentlessly experimented, trained models, and suffered failure after failure. We didn't leave our rooms. Our efforts seemed futile. The strain began to take its toll. Our mothers feared for our health. Our wives began to cry. We told them everything would be okay…

We persevered, fueled by the belief that we were on the brink of a significant breakthrough.

And indeed, we were.

Our breakthrough came in stages, each success fueling the next.

First, we figured out how to classify incoming emails. Then we made an AI that could categorize photo galleries for Mark's company. Then an AI that could decide whether a business is a good fit for my marketing services. And an AI that could identify profitable Google search terms. And even an eCommerce AI that recommends complementary products based on shopping cart items.

We built AI models that were not just functional but transformative.

In the end, we had replaced several contractors and VAs, enhancing our services and products. It was a rebirth of sorts, akin to discovering a superpower. We had unlocked a potential that could not only revolutionize our lives and businesses, but also countless others.

We named our process the "Robot Whisperer" framework, because it felt like we had discovered the language of machines: enabling us to command them to perform complex tasks that we had previously only dreamt of.

Today, we want to share this secret language with you.

This secret isn't meant only for tech gurus or Silicon Valley bigwigs.

It's a tool that you, regardless of your technical know-how or AI experience, can master and apply today.

So, if you've ever found yourself challenged by the limits of off-the-shelf AI or wished you could customize AI to meet your needs, here's your golden opportunity. You won't have to endure sleepless nights as we did. There'll be no complex coding or alien AI jargon. You'll be given a new superpower: to mold AI to your will.

And if you think that you can't—that Mark and I are somehow special—it's time to shatter that myth.

Jordan Silvera used the "Robot Whisperer" framework to build his own Cold Outreach Personalization Software

Mark and I aren't unique in our success with the "Robot Whisperer" framework. It has been used by individuals from different walks of life, demonstrating the extraordinary impact AI can have when fine-tuned to perfection.

Take Jordan Silvera, for example, a pioneer who successfully used the "Robot Whisperer" framework to create his own Cold Outreach Personalization Software.

In 2023, when Jordan plunged into the world of AI, he identified a massive hurdle marketers have: the time-consuming task of writing personalized emails to potential clients. Like many before him, Jordan first tried solving this problem with prompting.

He logged into ChatGPT, scraped LinkedIn profiles, and fed them in. He created a persona to prime the AI with, gave it detailed instructions, and even provided examples of the desired output. In a creative twist, he even had two AIs converse with each other, critiquing each other's ideas to improve output quality.

Yet, his trials were only met with moderate success. His system worked, but it was far from efficient. It was slow, unreliable, and the quality wasn’t there yet. To make up for this, Jordan began drafting increasingly lengthy prompts and engaging more AI agents. His prompts became so exhaustive that they hit the limit, and generating a single completion took minutes.

I suggested a radical change — abandon the prompts altogether and shift to a leaner, faster, and more cost-effective model. Jordan hesitated, fearing a drop in quality.

“Isn’t GPT-4 the best there is?”

But we reassured him that, armed with hundreds or even thousands of examples of well-executed tasks, a leaner model could become specialized, delivering higher-quality outputs than even ChatGPT or other advanced models could produce using prompts.

What were the implications for Jordan?

Cost Savings: Since prompts no longer needed to be processed each time the model was called, the costs decreased dramatically.

Superior Performance: Jordan's fine-tuned model maintained the same quality output as the slower, more powerful models (like GPT-4), but at a much higher speed.

Enhanced Accuracy: For complex outputs that are nearly impossible to describe in prompt instructions, Jordan found that fine-tuning the model on a multitude of examples, without any explicit explanation, led to the AI intuitively figuring out the intricate rules.

Now, let's demystify the magic behind what we did for Jordan and how we can do the same for you.

Robot Whisperer: The 5-step Framework for Fine-tuning GPT-3

There are five steps to execute the Robot Whisperer framework successfully, which mirror in many ways how you would hire and train a new employee or contractor.

Step 1: Define the goal

You wouldn’t hire a new employee without a job description or a defined role. In the same way, the first step involves clearly setting out your goal for the AI. This is the cornerstone of your fine-tuning efforts.

Step 2: Gather or create training examples

Next, just as you would provide training materials or a demonstration to a new recruit, you'll need to gather or create relevant examples that can guide your AI model.

Step 3: Test which base model is best equipped for the job

Just like there are different candidates to consider for a job, there are many base models to choose from to train an AI. This is where we identify which base model provides the best balance of performance, speed, and cost for our task.

Step 4: Evaluate the model's work

After you train the AI model, the next stage is onboarding your new robot hire. This is where you start grading the model's outputs to make sure it aligns with your expectations. If it’s deficient, you can show it more examples to improve its performance in that area.

Step 5: Integrate the model in your project

Finally, once your model performs up to your standards, you can include it in your operations either by adding it to your Google Sheets, chaining it in Zapier, or integrating it in your software code. I'll show you how to do this in this final step.

Let's dive deeper into our tutorial.

Step #1: Define the Goal

The main question to ask before starting your fine-tune is: What do you want your model to do?

A good rule to remember is to have one goal in mind for one fine-tuned model. Don't try to create the next ChatGPT. Fine-tuning is ultimately the act of constraining or reducing the general intelligence of a model and turning it into a specialist. If you need to achieve multiple goals, pick one to start with, and consider how you can create separate models for each later on.

There are two primary categories of AI models. Ones that:

  • Generate text

  • Classify text

An example of a text generator fine-tuned model would be a "Shopping cart product recommendation engine" like this one:

Shopping Cart Product Recommendation Engine

You simply list items the customer has in their shopping cart, and the fine-tuned model knows immediately what to do with the data — no extra instructions needed.

Text generators don't need to have a rigid input like in the example above. You could prompt them just as you would any general model, and teach them to reply in a certain way. Here's my model that responds to my questions in the ancient writing style of the Greek philosophers, complete with rhetorical questions.

Greek Philosopher Bot OpenAI Playground

The possibilities are truly limitless. One of our users, for example, uploaded his entire customer support chat history, creating an AI model that could accurately answer customer questions in an appropriate tone.

Text classifiers, on the other hand, are designed to sort text into one of a predefined list of categories, like the options in a dropdown. Their output is much shorter — oftentimes just one character.

An example of a text classifier could be this model that categorizes responses to my business proposal emails as positive or negative.

Cold Email sentiment classification AI model

Or a text classifier that determines the search intent behind a keyword that my customers are typing in Google.

SEO Keyword Search Intent Classifier OpenAI Playground

Classifiers are easy to create and incredibly valuable. You could integrate a classifier model like this into Google Sheets or Zapier (or even coding functions!) for all sorts of useful automations that wouldn't have been possible a year ago.

For example, I have a Zap that sends a text message to my phone every time I get a positive response to my business proposal email. Great for following up with leads fast, and way better for my productivity than getting pinged for every response, positive or negative.

Of course you can also integrate generator models with Google Sheets and Zapier.

For example, you could have it draft personalized reply emails to the positive responses, based on your templates. Or something fun: I've integrated my Marcus Aurelius model into a Slack channel where I can get a dose of inspiration every time I don't feel like doing something:

Marcus Aurelius Bot OpenAI

(I'll show you how to integrate your fine-tuned model into Gsheets and Zapier in Step #5.)

Here are some other specific use cases to help you brainstorm:

| Type | Name | Description |
| --- | --- | --- |
| Generator | Legal Document Drafting | A fine-tuned model that drafts legal documents with industry-specific jargon and format. The AI understands and applies legal terminology and guidelines. |
| Generator | Medical Report Writer | A model that generates precise medical reports. It understands and uses medical terms, abbreviations, and can adhere to the specific nuances of medical writing. |
| Generator | Technical Manual Writer | This model produces highly specialized technical manuals, adhering to specific formats and using domain-specific language. |
| Generator | Financial Analysis Report Generator | This model is fine-tuned to generate specific financial analysis reports, understanding financial terms, interpreting financial data, and conveying complex concepts in an easy-to-understand manner. |
| Generator | Scientific Research Summarizer | This model reads and interprets scientific papers, generating concise and clear summaries. It understands complex scientific language, interprets data, and identifies key findings. |
| Generator | Software Code Commenter | Fine-tuned to read software code and generate comments explaining what each part of the code does. It understands various programming languages and can explain complex code in a simple manner. |
| Generator | News Article Writer in a Specific Style | This model generates news articles in the style of a specific author or publication, ensuring stylistic consistency across different topics. |
| Generator | Highly Technical Poetry | The model writes poetry following a very specific structure, but also includes very domain-specific language. |
| Generator | Script Generator for a Specific TV Series | Fine-tuned to write TV scripts that accurately mimic the style, pacing, humor, and character voices of a specific show. |
| Generator | Generating Responses in a Specific Tone or Personality | This model consistently responds in a very specific tone or personality, such as mimicking the language of Shakespeare or always using humor or sarcasm. |
| Generator | Generating Music Lyrics in the Style of a Specific Band/Artist | This model writes lyrics that mimic the style, themes, and language of a specific band or artist. |
| Generator | Specialized Joke Generator | A model that can create jokes specifically for certain age groups or domains (like computer science). |
| Generator | Generating Text in Historical Language or Dialect | This model consistently generates text in a historical version of English or a specific regional dialect. |
| Generator | Highly Specialized Cookbook Generator | A model that creates recipes following highly specific dietary restrictions and cultural traditions. |
| Generator | Generating Advanced Mathematical Proofs | This model is fine-tuned to create advanced mathematical proofs in specific areas of mathematics. |
| Generator | Creating Highly Customized Fitness Plans | This model can generate highly specialized fitness plans, accounting for individuals' specific medical conditions and fitness levels. |
| Generator | Drafting Advanced Scientific Hypotheses | Based on input data, this model drafts scientific hypotheses in specific fields of study. |
| Generator | Content Idea Generator | Generates blog post or content ideas based on specified keywords or phrases. |
| Generator | Social Media Post Composer | Creates engaging social media posts for a specific brand tone and style. |
| Generator | Customer Service Bot | Responds to customer queries with personalized and on-brand responses. |
| Generator | Product Description Writer | Writes detailed and enticing product descriptions for an e-commerce store. |
| Generator | Email Draft Assistant | Helps draft emails based on provided context and recipient details. |
| Generator | Recipe Creator | Generates unique recipes based on specified ingredients. |
| Generator | Personal Trainer Chatbot | Provides fitness advice, workout routines, and motivation in response to user queries. |
| Generator | Custom News Briefing | Generates daily news summaries based on user's preferences and interests. |
| Generator | Medical Information Provider | Responds to health-related queries with accurate, understandable information. |
| Generator | Fiction Story Generator | Writes engaging short stories or novel excerpts based on provided prompts. |
| Classifier | Email Sentiment Analyzer | Categorizes incoming emails as positive, negative, or neutral based on content. |
| Classifier | Product Review Classifier | Classifies user reviews into categories for easy analysis and response. |
| Classifier | Social Media Sentiment Tracker | Tracks sentiment of brand mentions on social media platforms. |
| Classifier | Resume Filter | Sorts incoming resumes based on specified job criteria. |
| Classifier | Support Ticket Prioritizer | Prioritizes incoming support tickets based on urgency detected in the content. |

Note: "Can I do X?" We couldn't include every possible use case in this table. Each business is unique, as is each situation. If you're curious whether a model can be trained for the task you have in mind, it's best to ask those who have done it before. We have a Discord group of AI fine-tuners where you'll surely find a helping hand: https://discord.gg/rHVGXgSV

Step #2: Gather or create training examples

Once you have a clear vision of what you want your model to do, the next step is to gather or create training examples. These will serve as the foundation for the machine learning process.

I can't stress this enough: the quality and volume of your training data will make or break the performance of your fine-tuned model. If your training data is bad, or there's not enough of it, your model might not work as you hoped, or it might make mistakes.

Heads up! Later, I'll show you a trick to get another AI to create the training examples. Think of a slower but powerful GPT-4 doing the heavy lifting of thinking through a problem, then teaching a less capable but faster GPT-3 how to do the task. That's how you solve the volume problem. But if you start with that, then the quality will be limited to what GPT-4 can produce, which might not be as good as what you can produce (or how the examples naturally occur in practice — for example, GPT-4 guessing what kind of emails you're getting instead of using actual examples).

So I always suggest preparing the first 20 high-quality examples by hand!

These initial examples need to be really good. They should clearly represent the task you want your model to do, and they should be diverse enough to cover a variety of different scenarios.

For example, if you're making a model that sorts emails into categories, your training data should include real-life emails from each category. Or if you're building a model to write personalized emails, your training data should be full of examples of personalized emails that someone wrote and that you are extremely happy with.

Here's how I go about it step-by-step.

I open a simple Google Sheet file with "input" in one column and "output" in the other.

Do note that you can have multiple input and output columns. I'll demonstrate this in my example, working on my "AI Clerk" model that recommends what items to add in a shopping cart based on what our customer already has. I want the model to output the suggested item as well as reasoning for choosing that item.

Helpful AI clerk spreadsheet

First, I will spend some time creating 20 examples by hand that I'm really happy with.

Creating examples by hand
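For the curious, here's roughly what that spreadsheet becomes under the hood: OpenAI's fine-tuning endpoint expects a JSONL file with one prompt/completion pair per line. A minimal sketch, assuming two columns named "input" and "output" and a "###" separator (the column names and templates here are my own illustration, not Entry Point's exact output):

```python
import csv
import io
import json

def rows_to_jsonl(csv_text: str) -> str:
    """Convert input/output spreadsheet rows into OpenAI's fine-tuning
    JSONL format: one {"prompt": ..., "completion": ...} object per line."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {
            # "###" marks where the prompt ends and the completion begins
            "prompt": row["input"] + "\n\n###\n\n",
            # completions conventionally start with a space and end with
            # the stop sequence so the model knows when to stop
            "completion": " " + row["output"] + "###",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

sheet = "input,output\nCart: beach towel,Suggest: sunscreen\n"
print(rows_to_jsonl(sheet))
```

Tools like Entry Point generate this file for you, but knowing the shape of it helps when debugging a model that stops too early or rambles on.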

Having 20 high-quality examples is a good start, and for simple tasks—especially classifier-type tasks—you might already see some results with DaVinci, the most powerful model in the fine-tuning series. (More on this in step #3.)

But aim for around 100 examples, because I've found that's enough training data to train every model in the GPT fine-tuning series—Ada, Babbage, Curie, and DaVinci—and get good predictions about which of the four prototype models will be the best fit for your task. (Again, I'll dive deeper into this in Step #3, but it's important that you know why we're shooting for 100 examples.)

At first, Mark and I handcrafted hundreds of examples, which drained a ton of time and energy…

So we tried to find a way to make things easier. And we did. We discovered that you can actually feed the high-quality examples you've created (or gathered from real scenarios) into GPT-4 and let the AI create similar high-quality variations for you!

We call this Synthetic Data.

Get to 100 examples by expanding your manual examples with AI!

We used to do this manually for every project by spending a couple of hours crafting an AI prompt that expanded our examples… then cherry-picking the best ones from the completions… and then adding them to our Google Sheets one by one.

Although still a time-slog, this was fine when working on prototype models that needed 100 examples. However, when creating production-ready models that needed 1,000 examples or more, even this AI strategy became too time-consuming.
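The manual version of this trick is just few-shot prompting: paste your best hand-made examples into GPT-4 and ask it for more in the same format. A rough sketch of how such a synthesis prompt can be assembled (the wording is my own, not Entry Point's actual prompt):

```python
def build_synthesis_prompt(examples, n_new=5):
    """Assemble a few-shot prompt that asks a stronger model (e.g. GPT-4)
    to generate new training examples in the style of the seed examples."""
    shots = "\n\n".join(
        f"Input: {inp}\nOutput: {out}" for inp, out in examples
    )
    return (
        "Below are examples of a task, each as an input/output pair.\n\n"
        f"{shots}\n\n"
        f"Write {n_new} new, varied examples in exactly the same format. "
        "Do not repeat the examples above."
    )

# Seed pairs from my hand-made spreadsheet (illustrative)
seed = [
    ("Cart: yoga mat", "Suggest: resistance bands"),
    ("Cart: beach towel", "Suggest: sunscreen"),
]
prompt = build_synthesis_prompt(seed, n_new=10)
```

You'd then send that prompt to GPT-4, parse the pairs out of the reply, and keep only the ones you'd be happy to see your model imitate.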

Which is why we've developed Entry Point.

Entry Point is a fine-tuning platform that we developed with the entire "Robot Whisperer" framework in mind. It allows you to manage your training examples, generate data synthetically, run fine-tunes without code, predict training costs, compare quality, and monitor the health of your models. Our goal is to make Entry Point "your entry point into the world of AI training".

I'm now going to demonstrate how I expand my 20 hand-written examples with Entry Point.

First, I download the Google Spreadsheet as CSV.

Saving Google Sheets as CSV

And create a new "generator" project in Entry Point.

Create AI generator project

I'll then click on "Import" and "Choose .csv file"

Import CSV file

Next is mapping the fields. You need at least one "Prompt" field and one or more "Completion" fields.

This lets the AI model know which data is the input and which is the output.

Mapping columns

All my examples are now saved in the Examples tab, although I'm not a fan of how the "Completion" looks.

Imported examples

In Entry Point you can edit your examples in bulk via the "Templates", like so:

Edit prompt and completion template

Now that I'm happy with how the examples look, I can generate more in the Synthesis tab:

Entry Point data synthesis settings

Before I start generating the examples, I can select whether I want to use GPT-3.5 Turbo or GPT-4 as the engine (Annotation 1.) I also select how many examples I want to generate in total and the batch size (Annotation 2.); smaller batch sizes are useful when you want to start adding examples to your training dataset while the rest are still generating. At the bottom of the settings you'll see how many tokens Synthesis will burn and the associated cost.

Note: Since Entry Point connects directly with your OpenAI account, any tokens that you use here are tokens that you use on your OpenAI account (we don't take a cut or resell the tokens for a higher price). But this also means you'll need your own API access to use GPT-4 as the engine, which you can get via this link. The wait time is 2-3 days and they pretty much approve anyone.

After we press "Start", synthetic examples will start to appear in batches.

Add the ones you like.

Adding synthesized examples in Entry Point

And don't worry - if you like an example but it's not 100% up to your standards, you have the option to edit it after pressing the Add button.

Saving synthesized training example in Entry Point

I'll keep adding these until I have around 100 examples, which in my case took me 10 minutes and cost roughly $0.15.

Now we're ready to run our first "hiring cycle".

Step #3: Test which base model is best equipped for the job

Now that you have ~100 training examples ready, it's time to test which base model is best suited for your task.

Important Update: Since this article was written, GPT-3.5 Turbo is now live! Read our dedicated guide to fine-tuning GPT-3.5.

This is just like assessing the candidates that signed up for your job. OpenAI's GPT-3 series (which is the only series available for fine-tuning on OpenAI) has four base models to choose from: Ada, Babbage, Curie, and DaVinci. Each pre-trained model has its own strengths and weaknesses:

| Model & description | Typical use case | Cost per 1,000 tokens |
| --- | --- | --- |
| DaVinci: Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | Complex intent, cause and effect, creative generation, search, summarization for specific audience, reasoning | $0.1200 |
| Curie: Very capable, but faster and lower cost than DaVinci. | Language translation, complex classification, sentiment analysis, summarization | $0.0120 (10% of DaVinci) |
| Babbage: Capable of straightforward tasks. Very fast and low cost. | Moderate classification, semantic search | $0.0024 (2% of DaVinci) |
| Ada: Capable of very simple tasks. The fastest and lowest cost model in the GPT-3 series. | Parsing text, simple classification, address and data correction, keywords | $0.0016 (1.3% of DaVinci) |

Note: you're probably puzzled by the relatively high cost of fine-tuned DaVinci and Curie compared to GPT-3.5 ($0.002/1k tokens) and GPT-4 (~$0.05/1k tokens). Yes, they are indeed more expensive per token. However, since you won't be including any instructions in the prompts to your fine-tuned models, you'll burn 5, 10, even 20x fewer tokens than you would with prompting. And this is what makes fine-tuned models more economical in the end.
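To see why, it helps to run the numbers. Suppose a prompt-engineered call needs 1,500 tokens of instructions and examples on every request, while a fine-tuned Curie model only needs the 80-token input itself (the token counts here are illustrative assumptions):

```python
def cost_per_request(tokens: int, price_per_1k: float) -> float:
    """Cost of a single API call at a given per-1,000-token price."""
    return tokens * price_per_1k / 1000

# Prompting: instructions + examples are re-sent on every single call
prompted = cost_per_request(1500 + 80, 0.002)    # GPT-3.5 Turbo pricing
# Fine-tuned Curie: the instructions are baked into the model's weights
fine_tuned = cost_per_request(80, 0.012)

print(f"prompted:   ${prompted:.5f}")    # $0.00316
print(f"fine-tuned: ${fine_tuned:.5f}")  # $0.00096
```

Despite Curie's 6x higher per-token price, the fine-tuned call comes out roughly 3x cheaper, and the gap widens as your prompts grow.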

But you never know for sure which model will work for your use case until you test them all.

Keep in mind that the best model for your task may not necessarily be the most powerful one. Sometimes, a weaker model can perform just as well or even better on specific tasks, especially if it has been fine-tuned with a large number of high-quality examples. The weaker ones are also faster.

Which is exactly what we're doing in this step, and which is why we've prepared 100 examples in Step #2.

To do this, you'll need to run a fine-tuning process on each of the four base models using your training examples. This will give you an idea of how well each model can handle the task and help you make an informed decision on which one to use.

Here's how to run a fine-tune for each base model without writing any code, reformatting your training set into a JSON or JSONL file, or anything like that.

In Entry Point, go under "Fine-tunes" (Annotation 1.) and click the plus icon (Annotation 2.):

Starting a new fine-tune in Entry Point

Here, you can select the platform and base model of your choice (Annotation 1.) and see how many examples and tokens the job will use, plus how much it will cost (Annotation 2.):

Start a fine-tune user interface

You can choose the base model (1) as well as set any custom fine-tuning parameters (2).

Clicking "Show Advanced" will open advanced options where you can set the hyperparameters like the number of training epochs, prompt weight loss, learning rate multiplier, and batch size.

Advanced GPT fine-tune options in Entry Point

I recommend you experiment with parameters later, as you get familiar with the process. For now, let's leave the default values and allow OpenAI to set the rest dynamically.

Next, simply click the Start button and the fine-tuning process will begin!

Fine-tunes started

I've started fine-tunes with all four base models: DaVinci, Curie, Babbage, and Ada.

It will take about an hour or two for the fine-tunes to finish, at which point they'll be marked as "complete" and you'll be able to use the models in OpenAI's playground, Zapier, and other API calls.
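If you're curious what happens behind the scenes, launching a fine-tune is a single request to OpenAI's legacy fine-tunes endpoint per base model. A sketch of the payloads a tool like Entry Point would submit (the training file ID and suffix here are placeholders):

```python
def fine_tune_payload(base_model: str, training_file_id: str) -> dict:
    """Request body for OpenAI's legacy /v1/fine-tunes endpoint."""
    return {
        "training_file": training_file_id,  # ID of the uploaded JSONL file
        "model": base_model,
        "suffix": "ai-clerk",  # appears in the fine-tuned model's name
    }

# One job per candidate base model, mirroring the "hiring cycle"
jobs = [
    fine_tune_payload(model, "file-abc123")
    for model in ("ada", "babbage", "curie", "davinci")
]
```

Entry Point fires these off for you; the point is that each "candidate" is its own independent training job on the same data.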

Step #4: Pick the most appropriate model and start grading its work

All my models from the previous step have finished training:

Fine-tuned LLM's in Entry Point

Now it's time to evaluate each model’s performance and choose the most appropriate one for the task — or the one that we think has the potential to perform up to our standards if we just gave it more examples.

Now, there are two ways to assess how well a model performs. Through:

  1. Validation examples while fine-tuning

  2. Manual testing

(I suggest using a validation set when creating a classifier, and manual testing when creating a generator type of model.)

Keep in mind that the main variable for which models will perform well is the number of training examples. If you have a lot of examples, you can often downgrade to a less expensive and a faster model.

Edit: In theory, of course, we want models to be as accurate, fast, and cheap as possible. In reality, though, we need to be practical. Take into account how much you're going to use the model (100 requests per month vs. a million), what damage mistakes can cause (fraud detection is far more important to get right than, say, writing a joke), and what the minimum acceptable speed and cost are. Answering these questions will give you an estimate of how much effort you should invest in training a model.

Classifier - Testing the models with validation examples:

Validation examples are examples that Entry Point sends to OpenAI whenever we launch a new fine-tuning job. After the model is fine-tuned, its accuracy is immediately tested against them.

I'll demonstrate this on one of my classifier models. Let's open it.

Fine-tune to open

Here we can see that it was trained on 113 training examples (Annotation 1.) and evaluated on 16 validation examples achieving a 93.8% accuracy score (Annotation 2.)

Fine-tune detail with validation score

The more validation examples you provide, and the more varied they are, the better you can test the model. Scrolling through the results, you'll quickly see what kinds of mistakes your model is making. For example, it might be overly aggressive with one type of categorization. When you see that, you can prepare more training examples targeting that weakness and rerun your fine-tune.
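That accuracy score is simply the fraction of held-out validation examples the model classified exactly right (in my case, 15 of 16, which rounds to 93.8%). A minimal sketch of the calculation:

```python
def classification_accuracy(predictions, labels):
    """Fraction of validation examples where the model's output exactly
    matches the expected label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# 16 validation examples with a single miss, as in my classifier
labels = ["positive"] * 8 + ["negative"] * 8
preds = labels[:15] + ["positive"]  # the last prediction is wrong
acc = classification_accuracy(preds, labels)
print(f"{acc:.1%}")  # 93.8%
```

Exact-match accuracy works well for classifiers precisely because the outputs are short and constrained; for generators you'll want the manual testing described next.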

Generator - Testing the models manually:

If you haven't provided validation examples, or if you're fine-tuning a generator type of model, you can use OpenAI's Playground to interact with each fine-tuned model and see how well it performs on your task.

You can immediately select the model in the playground by clicking this "open in a new window" icon:

Fine-tune Playground link

Now we have opened OpenAI.com and the model has been selected:

OpenAI playground with custom fine-tune selected

Write a prompt you want to test followed by "###", which you should also add in the "Stop sequence" box.

("###" is the default stop sequence Entry Point uses. If you don't include it, the model will run amok. If you changed your stop sequence under Settings in Entry Point, make sure to use the one you chose.)
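In API terms, the separator goes at the end of your prompt and the same string is passed as the stop parameter, so generation halts where your training completions ended. A sketch of the request the Playground is effectively making (the model ID is a placeholder):

```python
def completion_request(user_input: str, model_id: str,
                       separator: str = "###") -> dict:
    """Build a completions request the way a fine-tuned model expects:
    the prompt ends with the training separator, and the same token is
    passed as the stop sequence so the output ends cleanly."""
    return {
        "model": model_id,  # e.g. "curie:ft-your-org-2023-07-24-..."
        "prompt": f"{user_input}\n\n{separator}\n\n",
        "stop": [separator],
        "max_tokens": 100,
        "temperature": 0,  # deterministic output is easiest to evaluate
    }

req = completion_request("Cart: beach towel, flip flops",
                         "curie:ft-placeholder")
```

If you ever forget the stop sequence and the model "runs amok", this is why: nothing tells it where a completion is supposed to end.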

Stop sequence

The completion will then show up, highlighted in green below.

In my case, it's wind chimes. I really like that suggestion and the reasoning the model used.

Instead of using OpenAI's Playground for testing I can also use Entry Point's Playground.

It has a few benefits:

  • I don't have to come up with test examples on my own

  • I can get and compare multiple completions at once

Here's what it looks like.

Entry Point playground

Click on Playground in the navigation tab (Annotation 1.) and "Generate example" button (Annotation 2.) Now select the fine-tuned model you want to test (Annotation 3.) and how many completions you want to get (Annotation 4.)

Entry Point Playground generator example and completions

That's it!

That's how you test the performance of your models.

This will give you a better understanding of their strengths and weaknesses, and help you decide if a model needs more training or if it's ready for production.

Remember, you can always use a cheaper and faster model, as long as the quality of its output matches the slower, more expensive model. And if a weaker model's quality isn't there just yet but you still want it for the cost savings and speed, keep in mind that every time you double your training data, the quality of the model tends to improve linearly. In other words, consider whether it's worth expanding your dataset to bring the faster model's quality up to standard.

After selecting the best model, you can start using it in your applications and automations.

Step #5: Integrate the model in your project

Once you have your fine-tuned GPT-3 model and you're happy with its performance, the next step is to start using it in your project. I see models being most commonly used in:

  1. Google Sheets (model turned into a function)

  2. Zapier (model chained in a task)

  3. Code (embedding model's output into a software application)

Let's go over each.

1. Google Sheets Integration

In the next minute we'll create a custom Google Sheets function that calls your fine-tuned AI model.

I'm going to demonstrate this on my use case: classifying search terms in Google Ads. Here's what that's gonna look like:

AI keyword classifier in Google Sheets

To call your fine-tuned model inside Google Sheets, you'll have to create a simple Apps Script function.

Create an App Script in Google Sheets

This is what we should see.

New Apps Script in Google Sheets

I'll rename the "Untitled project" to "PPC Keyword Classifier" since that's what I'm working on in this Google Sheet: classifying the Google Ads keywords in column B.

I'll also delete the myFunction function and replace it with this code, which calls my fine-tuned model through OpenAI. (There's an in-depth explanation of what each line of the code does in the doc above, along with the changes you need to make for your own use case.)

PPC Keyword Classifier Script

Editor's note: to make this work, you'll have to add your OpenAI API key in Google Apps Script. Be careful with this: anyone who can see the script can steal your secret key and burn through your OpenAI credits! Make sure no one else has access to your Google Sheet, or use a tool like Google Cloud Secret Manager.

To save the code, press the floppy disk icon.

App Script save button

Once you do, additional controls will become available:

App Script controls

Click on "Run".

When you run an Apps Script function in a Google Sheet for the first time, you'll trigger an "Authorization required" dialog:

App Script first run Authorization Required box

Click "Review permissions" and log in with the email address that owns the Google Sheet file:

Google Sheets authorization flow

Then this scary looking message will pop up.

Click the tiny "Advanced" text (Annotation 1) and then "Go to [your project name]" (Annotation 2).

App Script scary message

Finally, click "Allow"

Google Sheets App Script allow access

Ignore the cache issue that shows up in the Execution log.

It happens because we ran the function with an empty prompt.

App Script cache issue can be safely ignored

Now go back to your spreadsheet and run your model by typing the name of your function, selecting the prompt you want to feed into your model, and wait a few seconds:

AI keyword classifier in Google Sheets

Voila ✨

(The "Loading…" error always happens while a custom Apps Script function is processing. There is no error, and as of June 2023 there's no way to hide it.)

2. Zapier Integration

You can connect your fine-tuned model with any software that you use—without writing a single line of code—with the help of Zapier (or any other workflow automation software).

I'll demonstrate this by creating an AI filter for my incoming email.

First, create an account on Zapier and create a new Zap:

zapier-1

Pick the platform that you want to enhance with your AI model.

In my case, it's Gmail. But you can pick Slack, Google Calendar, your CRM of choice, or literally anything else.

zapier-2

Now pick an event that will trigger the automation. Note that this is different for every platform.

In my case, I want the Zap to trigger whenever I get a new Email.

zapier-3

Next, you'll be asked to select your Gmail account, and Zapier will pull one of your recent emails to test the connection. The email, in this case, won't look anything like what you're used to; this is a raw API call. The good news is that you don't have to worry about it. It'll all get demystified in the next step.

zapier-4

After clicking Continue, pick the platform that you want to process the email with.

zapier-5

In our case, it's OpenAI (GPT-3, DALLE, Whisper). Type it in the search box:

zapier-6

Now, if we want to call our custom AI model, we'll have to select the "Send Prompt" event.

zapier-7

In the "Model" box, select "model" and scroll down to find your AI fine-tune.

zapier-8

In the "Prompt" box, I'm going to build my prompt by including Email Subject and Email Body, which is a variable that will be different with every new email that I get. Make sure that your prompt here has the same structure as the prompts you've used for fine-tuning your AI model.

zapier-9

Next, tweak the prompt parameters if you need to. Since I'm going to use this model for classification, I want the temperature to be 0. I want the model to be deterministic, not creative.

Also, it's very important that you set the "Stop sequence", which should be the same as what you've set in Entry Point. By default, it's "###". Add it here.

zapier-10

3. Code

When you fine-tune a GPT-3 model in Entry Point, you or your developer can interact with it directly using your secret API key and the OpenAI create completions API endpoint.

You will be sending a POST request to the following endpoint:

https://api.openai.com/v1/completions

(The full reference for this endpoint is at https://platform.openai.com/docs/api-reference/completions/create.)

In the request headers, you need to set a value for the Authorization key, formatted as "Bearer YOUR_SECRET_KEY". Make sure to have the word "Bearer", followed by a space, and then your secret key. It does not need to be Base64-encoded or anything like that; just your plain-text secret key.

It's a good practice to specify "application/json" for both Accept and Content-Type on the request also, so that it's clear we are sending JSON in and expecting JSON out.

You can also set a header value for OpenAI-Organization with your organization ID. This is optional. If you don't set it, your request will use the default organization linked to your account, which will always be correct if you only have one organization.

Now let's talk about the request body. Make sure to set the following parameters:

  • model will be the unique name of your fine-tuned model, which looks like davinci:ft-personal:some-project-name-2023-07-28-17-11-43. A common mistake is to pass the model ID instead; even though that might seem more idiomatic, it's the model name that goes here.

  • prompt will be your input text: make sure to follow the prompt template from Entry Point that you trained the model on, because consistent structure is very important for consistent output. The prompt also needs to include your separator at the end, which you can get from project settings in Entry Point (usually -> or ###). We typically also prepend each prompt with a space, which is best practice since most tokens include a leading space. When in doubt, export your project as a JSONL file and look at the final formatting of each example.

  • stop will be your stop sequence, which you can also get from project settings in Entry Point. Make sure to replace any line breaks with the newline character \n.

  • temperature should be a value between 0 and 2. When in doubt, start with 0 to get the most consistent output.

  • n is for the number of completions. Set this to 1 unless you want multiple.

  • max_tokens can prevent your output from going off the rails and being too long. It can also cut off your output before you want it to! Set it to a reasonable number based on the size of the content you're expecting to generate. A low number like 12 should be fine for a classifier, whereas for long-form text generation you might need it to be much higher. In general, start with a safe number like 256, because you can get an error if you set it too high without accounting for the length of your input prompt.

Here is an example in Kotlin, using the Unirest HTTP library to fetch completions:

val json = JSONObject()
with(json) {
    put("model", "davinci:ft-personal:some-project-name-2023-07-28-17-11-43")
    put("prompt", " your prompt here ###") // Note the separator at the end and space at the beginning
    put("stop", "\n\n###\n\n")
    put("temperature", 0)
    put("n", 1) // Just get one completion back
    put("max_tokens", 256)
}

val response = Unirest.post("https://api.openai.com/v1/completions")
    .header("Authorization", "Bearer YOUR_SECRET_KEY")
    .header("Accept", "application/json")
    .header("Content-Type", "application/json")
    .header("OpenAI-Organization", "YOUR_ORGANIZATION_ID") // Optional
    .body(json)
    .asJson()

if (response.status != 200) {
    // Something went wrong, throw an error!
}

// Get an array of completions (only one value unless you set n > 1)
val completions = response.body.`object`.getJSONArray("choices")

Note: Typically, you would want to make an API call like this from your secure web backend. Any public web page that uses JavaScript to get completions directly from OpenAI would expose your secret API key.

Pro tip: To convert this example to work with any programming language and HTTP library, open up ChatGPT, paste the example above in and ask for a conversion!

Ready to get started? Choose a plan and create your first custom AI model 15 minutes from now! 😱