Microsoft and Google are stealing headlines as they race to see who can integrate AI into their products the fastest.
But what about us? The little guys… The underdogs…
Real founders and developers working on normal-sized projects.
We don’t have hundreds of product teams to direct at a moment’s notice. But, considering the pace of AI innovation, it feels like we need them. The alternative is being left behind in the sprint of technology & innovation.
Maybe you already have a product. Maybe you also have an idea for an AI-driven feature. Or, perhaps you are launching an entire AI-based startup from the ground up.
Six months ago I was in the same boat. I saw the potential of AI for businesses and started building Entry Point—a tool that leverages AI to help you train AI. This was my first time incorporating AI into a product, and I learned a lot building it.
Today I get to share those insights with you.
To help you move just a little bit faster and with just a little bit more clarity.
Here are 3 of the most important insights I’ve learned about integrating AI into a product and what to do about them.
I used to think prompt engineering just meant writing good prompts.
After all, when I’m working with ChatGPT I can scribble down some off-the-cuff instructions and get pretty good results on the first try.
But when you are going to run the same prompt hundreds, thousands, or even millions of times in your product, that’s where the "engineering" part comes in.
Suddenly, you encounter edge cases and serious reliability problems.
By that I mean you can’t predict what ChatGPT is going to spit out, given the variety of possible user inputs that could be injected into your prompt. On top of that, the same prompt can give different outputs on different days, since ChatGPT is constantly being updated.
This problem becomes especially apparent when working with structured data. Examples of this could include HTML, tables, or CSV files. When I was building the Data Synthesis feature in Entry Point, I needed ChatGPT to package my data into JSON format.
The first time I tried it, BOOM! Success. "This is too easy!" I thought.
Then I ran it a few more times.
"JSON parsing failed," my runtime logs complained to me.
Turns out that ChatGPT (3.5 turbo) likes to randomly add a trailing comma after the last item in an array. That is allowed in many programming languages, but it is invalid JSON. When it happened, the output was unsalvageable… all because of a simple comma.
I thought of many possible solutions:
I could try to loosen up the parsing standards, but that wasn’t a native option with the library I had selected.
I could run it through ChatGPT twice, the second time to fix JSON formatting—but that would be hacky and slow.
I could reduce the temperature parameter—except I need the completions to be creative so I want the model to take chances.
I could ask it for JSONL, which puts each object on its own line, so there would be no commas in the parent array. But I have arrays within arrays, so the issue could still crop up if I don’t fix it for good.
I could make the prompt more specific. Hey, that sounds easy.
So I engineered the prompt.
I added a JSON example without trailing commas that I wanted ChatGPT to output.
But trailing commas would STILL creep into my data—maybe 1 out of 20 times.
Not good enough.
Next, I added pseudo-code comments to the JSON example, explicitly pointing out where there should be "no trailing comma."
I was making progress, but this trailing comma phenomenon kept happening, albeit less frequently.
Next, I added a section in the prompt saying, “The JSON should validate perfectly according to the ECMA-404 standard and when parsed by the Jackson object-mapping library.”
That finally overcame the trailing comma issue.
…or did it just make it less frequent, and I’ll only find out when more users join the platform? That question will linger for some time.
Surely, there must be a better way.
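In the meantime, a safety net in code can catch whatever the prompt misses. Here is a minimal Python sketch of that idea: try a strict parse first, and only if it fails, strip commas that sit directly before a closing bracket or brace and try again. It is an illustration, not the code Entry Point runs, and `strip_trailing_commas` and `parse_lenient` are hypothetical helper names. It only repairs this one failure mode; anything else still raises an error.

```python
import json


def strip_trailing_commas(text: str) -> str:
    """Drop commas that directly precede ] or }, ignoring string contents."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Copy string contents untouched, tracking escapes like \"
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "]}":
            # Look back past whitespace; remove a comma if that is what we find.
            i = len(out) - 1
            while i >= 0 and out[i].isspace():
                i -= 1
            if i >= 0 and out[i] == ",":
                del out[i]
        out.append(ch)
    return "".join(out)


def parse_lenient(text: str):
    """Strict parse first; fall back to a trailing-comma repair on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(strip_trailing_commas(text))


if __name__ == "__main__":
    raw = '{"items": [{"tags": ["a", "b",], "note": "keep, ] this"},\n]}'
    print(parse_lenient(raw))
```

Logging every time the fallback path runs would also tell you how often the model still slips, which is exactly the number the prompt tweaks leave you guessing about.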
The next issue I ran into was speed.
I'm just gonna say it—LLMs are kinda slow.
When I started using ChatGPT for casual Q&A, I felt like it was pretty fast. I didn’t think much of it.
That’s because it streams one word or so at a time and you read it in real-time. Since most of us aren’t savant-level speed readers, that gives us a nice illusion that it’s fast.
However, when you’re asking for structured output like JSON, you can’t parse that until it’s finished (or at least, it’s not obviously practical). So you just gotta wait.
And guess what? It feels like forever.
We are spoiled by the speed of our web responses these days. Unless you have a convenient chat use case where you can stream one word at a time to keep your user’s attention, or you can run your completions asynchronously in the background so users can happily come back later to see them, there are going to be usability issues.
For example, with the Data Synthesis feature in Entry Point, my ideal scenario was to generate about 10 synthetic data points at a time for our users. But generating 10 took much longer than generating 1, and generating 1 at a time wasted tokens because every request included the entire prompt (and I keep “engineering” it to be even longer).
So I had to compromise.
I exposed a setting to the end-user to adjust the batch size, and then made it feel faster by generating more batches in the background while you’re reading through the first batch.
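The background-batch idea can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not Entry Point's implementation: `BatchPrefetcher` is a hypothetical class, and `fake_generate_batch` is a stand-in for whatever slow model call your product makes.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class BatchPrefetcher:
    """Hands out batches while keeping the next one generating in the background."""

    def __init__(self, generate_batch: Callable[[int], List[str]], batch_size: int):
        self._generate = generate_batch
        self._batch_size = batch_size
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Future = self._submit()  # start the first batch immediately

    def _submit(self) -> Future:
        return self._pool.submit(self._generate, self._batch_size)

    def next_batch(self) -> List[str]:
        batch = self._pending.result()  # waits only if generation is still running
        self._pending = self._submit()  # kick off the following batch right away
        return batch

    def close(self) -> None:
        self._pending.cancel()  # no effect if the batch is already running
        self._pool.shutdown(wait=True)


def fake_generate_batch(size: int) -> List[str]:
    """Stand-in for a slow LLM call; a real version would call your model API."""
    time.sleep(0.05)
    return [f"synthetic example {i}" for i in range(size)]


if __name__ == "__main__":
    prefetcher = BatchPrefetcher(fake_generate_batch, batch_size=3)
    print(prefetcher.next_batch())  # the user reads this batch...
    print(prefetcher.next_batch())  # ...while this one was already underway
    prefetcher.close()
```

The tradeoff is that the sketch always generates one batch ahead, so if the user walks away, the tokens spent on that last batch are wasted.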
That’s not going to be an option for every AI-driven feature—sometimes it will have to be synchronous.
In those cases, I’ll show you what to do at the end of this article.
We talked earlier about edge cases that can arise from including user inputs in your prompt.
Now, depending on what you do with your structured output, there are also security issues that can arise from those edge cases.
Think of it like this—people are constantly trying to hack ChatGPT to get around its safety measures.
For example, ChatGPT will refuse to teach you how to steal a car. But what if you ask it to write a story about a villain who steals a car? That might just be enough to get around its safety training. That kinda sneaky stuff.
When your users start pulling these kinds of shenanigans inside your product, it’s called a prompt injection attack: input crafted to override the instructions in your prompt.
With engineered prompts and AI chat APIs, users can potentially modify their part of the prompt in a way that changes the output you get.
They will take screenshots and embarrass you online.
It can lead to malicious outcomes if your system trusts the model’s output and takes further action based on it.
Engineered prompts are susceptible to these attacks, and trying to prevent them with more prompt engineering is a never-ending game.
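Since prompt-side defenses are never finished, one complementary habit is to treat the model's output as untrusted input and check it in ordinary code before your system acts on it. Below is a rough Python sketch, assuming a feature that expects a JSON object with an `action` and a `text` field. The field names, the allowed actions, and the length limit are all made up for illustration.

```python
import json

# Hypothetical values: the only actions this imaginary feature may perform.
ALLOWED_ACTIONS = {"summarize", "translate", "tag"}
MAX_TEXT_LENGTH = 2000


class UntrustedOutputError(ValueError):
    """Raised when model output does not match what the feature expects."""


def validate_model_output(raw: str) -> dict:
    """Parse and check model output before any downstream code acts on it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise UntrustedOutputError("output is not valid JSON") from exc

    if not isinstance(data, dict) or set(data) != {"action", "text"}:
        raise UntrustedOutputError("unexpected structure or extra fields")

    action, text = data["action"], data["text"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise UntrustedOutputError("action is not on the allow-list")
    if not isinstance(text, str) or len(text) > MAX_TEXT_LENGTH:
        raise UntrustedOutputError("text is missing or too long")
    return data


if __name__ == "__main__":
    print(validate_model_output('{"action": "tag", "text": "hello"}'))
    try:
        validate_model_output('{"action": "delete_account", "text": "hi"}')
    except UntrustedOutputError as err:
        print("rejected:", err)
```

A check like this will not stop a user from coaxing embarrassing text out of the model, but it does limit what an injected prompt can make your system do.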
What can be done about all these edge cases, reliability issues, speed vs cost tradeoff, ever-growing engineered prompt lengths, and risk of prompt injection attacks?
The answer is a work in progress.
But I believe a big part of it is moving toward fine-tuned LLMs.
Basically, fine-tuning allows you to provide examples to an LLM and customize it to your specific task. There are many benefits to this:
Fine-tuned models give more predictable output. The formatting is part of the training, not part of the prompt. This helps considerably with both the structured formatting issues and the risk of injection attacks.
They can work faster. Instead of using the best and most advanced model (which also tends to be slower) with a long prompt (thus pricier), you can select a much faster model that does a job of the same quality once it has been trained. Don't just take my word for it: AI21 offers a case study where a custom model ends up processing requests 5x faster than with prompt engineering.
Bonus: You’ll save dramatically on token costs. The prompt is shorter and the faster models are also typically much cheaper.
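To make "the formatting is part of the training" concrete, here is a small Python sketch of how training examples might be prepared. It assumes a JSONL file with one `prompt`/`completion` pair per line. That layout is an assumption for illustration, the exact field names and file format vary by provider, and `build_training_lines` is a hypothetical helper.

```python
import json
from typing import Iterable, Tuple


def build_training_lines(examples: Iterable[Tuple[str, dict]]) -> str:
    """Turn (user input, desired output) pairs into JSONL training records."""
    lines = []
    for user_input, desired_output in examples:
        record = {
            # Short prompt: just the input, with no formatting instructions.
            "prompt": user_input,
            # Serialized by code, so every example shows valid JSON.
            "completion": json.dumps(desired_output),
        }
        lines.append(json.dumps(record))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    examples = [
        ("A cozy cafe", {"name": "Bean There", "tags": ["coffee", "cozy"]}),
        ("A bike repair shop", {"name": "Spoke Easy", "tags": ["bikes"]}),
    ]
    print(build_training_lines(examples), end="")
```

Because each completion is produced by `json.dumps`, every example the model learns from is well-formed, and the prompt itself no longer needs to carry formatting rules.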
In the short-term, prompt engineering is a wonderful tool for prototyping your features.
But once you create a proof of concept with chat-based APIs, it’s time to migrate it to a fine-tuned LLM that can give you the quality, speed, and lower costs your product deserves.
Enter Entry Point AI—our platform makes fine-tuning easy.
To learn more about how to fine-tune an LLM for your product needs on Entry Point, take a look at this demo video.
I’d also love it if you try our beta and send us your feedback to email@example.com.