When integrating LLMs into your products, you often want to generate structured data, like JSON. With the help of function calling (released June 13th 2023), this process has become much simpler! In this post I will explore the new API.
Not sure if it's smart to champion JSON Schemas as the way to go, since it's a resource format that brought challenges in the past, leading to the emergence of GraphQL, which eventually established capability-oriented design over resource-oriented imperative implementations and solved a long-standing integration nightmare. So it's a neat way to "get started" with the ChatGPT API specifically, but it should not be over-invested in. OpenAI is aware of this, and they are working on an implementation more similar to https://lmql.ai/#cot : being able to express natural-language prompts that also contain code.
Dino, it's cool to see you here!
I agree with your sentiment. JSON Schema is a cool new tool in the toolbox for OpenAI API users, and you can even "hack" it into doing some "programming", but at the end of the day, it's a hack.
LMQL does feel like magic from time to time, it's really cool!
Just a big thanks for an easy guide on how to use this. :) OpenAI's own documentation is lacking.
Great post. I'm intrigued: how would you use the schema and function to do chain-of-thought prompting?
The idea would be something like this. (I would recommend testing the impact this has on the quality of results; it might make things worse, or better!)
"steps": {
"type": "array",
"description": "Lets think step by step, before outputting the result",
"items": { "type": "string" }
},
"result": { "type": "string" }
Thanks. That's a creative idea. That was my #1 concern with functions: not being able to give the model space to think. 👍
If you tell GPT to think before it starts to output, will it actually think through its output before it starts outputting?
I don't think so. That's not how LLMs work, no?
GPT can only "think" by outputting, and it outputs each token after spending the same amount of compute on it (regardless of the cognitive complexity of the task).
The intuition here is that you can give GPT a scratchpad to output its ideas (result.steps), which we ignore but which GPT's attention mechanism uses to reach a conclusion and later output it to result.result.
This is conceptually similar to how we humans can hold an inner monologue with ourselves before answering a question.
See also https://platform.openai.com/docs/guides/gpt-best-practices/tactic-instruct-the-model-to-work-out-its-own-solution-before-rushing-to-a-conclusion
Oh.
So basically it's "thinking" by writing/outputting to a space that the user never sees?
Thank you :D
Yup!
Did you create the image of the Recipe Creator app yourself? It's very well-drawn and aesthetically pleasing.
Yes, I made it with Excalidraw. https://excalidraw.com/
Check it out, it's awesome!
Oh wow. Thank you. I didn’t realize a tool like this existed. It’s just what I’ve been looking for. 🙏🏼
Is there a way to produce JSON in different languages? I used to be able to do so with prompt engineering, but how can we do that now?
Looks like OpenAI added a note since you wrote this:
> (note: the model may generate invalid JSON or hallucinate parameters)
This suggests that it’s not masking the token posterior with the schema, and just relying on the system message and improved steering.
Sad. I was getting excited at having access to something other than just bulk inference out of these most advanced models
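For context, "masking the token posterior" (what jsonformer-style constrained decoding does) works roughly like this: at every generation step, ban every token that would break the schema and sample only from the rest. Below is a toy sketch with a hand-rolled vocabulary, a hard-coded grammar, and uniform fake logits; nothing here is OpenAI's actual implementation:

```python
import math

# Toy constrained decoding: only tokens that keep the output a valid prefix
# of '{"result": "..."}' survive the mask. Real implementations (e.g. the
# jsonformer library) derive the mask from the schema and the model's
# actual vocabulary.
vocab = ['{', '"result"', ':', ' "', 'hello', '"', '}', 'DROP TABLE']

def allowed(prefix_tokens):
    # Hard-coded "grammar": the whitelist of tokens allowed at each position
    grammar = [['{'], ['"result"'], [':'], [' "'], ['hello'], ['"'], ['}']]
    pos = len(prefix_tokens)
    return grammar[pos] if pos < len(grammar) else []

def sample(logits, prefix_tokens):
    ok = allowed(prefix_tokens)
    # Mask: disallowed tokens get probability zero (logit -inf)
    masked = [(l if t in ok else -math.inf) for t, l in zip(vocab, logits)]
    best = max(range(len(vocab)), key=lambda i: masked[i])
    return vocab[best]

prefix = []
while allowed(prefix):
    # Pretend the model emitted uniform logits; the mask does all the work
    prefix.append(sample([0.0] * len(vocab), prefix))

print("".join(prefix))  # {"result": "hello"}
```

Note that the 'DROP TABLE' token can never be emitted, no matter what the "model" wants: that is the property the note above suggests the real API does *not* guarantee.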
Thanks! Updated the post to reflect this. I can't manage to make it generate invalid JSON, even on 3.5 (but can easily do so if I just pass the json schema). Very interesting!
There is no evidence they are using jsonformer. The performance is exactly the same as if you were to just feed it a JSON Schema and tell it to format the output. This is literally the same as prompt engineering; they just do it for you.
I think it's unlikely that they are literally pasting the JSON Schema into the prompt:
- The naive approach of literally pasting the JSON Schema would use up 371 tokens for the schema alone (whereas I was billed 126)
- Adding multiple functions does not increase the token usage by as much as you'd expect if they were pasting the JSON Schemas into the prompt
- There are specific JSON Schema features that are unsupported in the API, for example consts and if/else cases (these are ignored by functions)
- My tests show that the model is pretty robust against malicious attempts to make it not output JSON or break the syntax. GPT-3.5 did not withstand the same tests. However, I believe more testing is needed to rule this out.
I am not stating with 100% confidence that they are using the same approach as jsonformer, but it would be my best guess given my observations.
Your JSON Schema is 134 tokens when minified.
https://imgur.com/a/pQi7xtO
Here is a similar-size prompt that does the same thing on 3.5 with just prompt engineering.
https://chat.openai.com/share/cdfbe292-bb6f-4f45-ae26-0a8d61c48f6c
And GPT4 was always robust against malicious attempts
https://chat.openai.com/share/a7fe1531-a504-4e36-a727-10cf0d0743ad
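One way to sanity-check such token counts yourself is to minify the schema before counting. A stdlib-only sketch (the schema is the thread's example; exact token counts would require the tiktoken package, shown only in a comment):

```python
import json

# The thread's example schema fragment, as a Python dict
schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "description": "Let's think step by step, before outputting the result",
            "items": {"type": "string"},
        },
        "result": {"type": "string"},
    },
}

pretty = json.dumps(schema, indent=2)
minified = json.dumps(schema, separators=(",", ":"))  # no spaces between keys/values

print(len(pretty), len(minified))  # minified is substantially shorter
# Actual token counts would come from tiktoken, e.g.:
#   enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
#   len(enc.encode(minified))
```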
Thanks! I have updated the post to reflect the uncertainty about the method used.
By the way, the fact that we can now rely on GPT 3.5 for JSON generation is cool, even if GPT 4 could already do it!
How could JSON schema be a Turing complete language?
Other type systems, like TypeScript's, have been proven to be Turing complete: https://github.com/microsoft/TypeScript/issues/14833, https://github.com/ronami/meta-typing
I don't think JSON Schema on its own is Turing complete (I haven't seen any examples of it), but my intuition tells me that the necessary control-flow primitives for Turing completeness are in JSON Schema with the help of the $ref and anyOf/allOf primitives. As for memory, you might be able to get a helping hand from GPT. That would get you pretty close to having a Turing-complete execution engine inside the JSON Schema.
The motivation here would be to pass down strategies with several branches and loops in a single API call. The more expressive the JSON Schema language is, the more complex algorithms you can run in a single API call.
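A sketch of what such a schema could look like, using $ref for recursion and anyOf as a branch. The key names (`repeat_while`, `body`) are made up for illustration, and a 2020-12-style `$defs` section is assumed; whether this actually reaches Turing completeness is exactly the open question above:

```python
# A recursive "strategy" schema: $ref lets a step contain further steps,
# and anyOf acts like a branch between step kinds. (Illustrative only.)
strategy_schema = {
    "$defs": {
        "step": {
            "anyOf": [
                {"type": "string"},  # a leaf action, described in prose
                {  # a loop-like construct: repeat a sub-strategy while a condition holds
                    "type": "object",
                    "properties": {
                        "repeat_while": {"type": "string"},
                        "body": {
                            "type": "array",
                            "items": {"$ref": "#/$defs/step"},  # recursion
                        },
                    },
                    "required": ["repeat_while", "body"],
                },
            ]
        }
    },
    "type": "array",
    "items": {"$ref": "#/$defs/step"},
}
```

A single function definition built this way could, in principle, carry a whole branching-and-looping plan for the model to fill in within one API call.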
There is a `function_call` parameter which lets you demand a specific function to be called.
Yup. You might get gibberish if it's completely irrelevant.
Well you _can_ give it the choice of picking a function. In this example, there was no choice to be made so I set `function_call` to a specific function. You can avoid setting `function_call` to let the model decide :)
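As request fragments, the two modes look roughly like this (function names are made up; the June-2023 `functions`/`function_call` request parameters are assumed):

```python
# Forcing a specific function vs. letting the model decide.
# These dicts are keyword-argument fragments for a ChatCompletion request.

force_call = {
    "functions": [
        {"name": "create_recipe", "parameters": {"type": "object", "properties": {}}},
    ],
    "function_call": {"name": "create_recipe"},  # the model MUST call this function
}

model_decides = {
    "functions": [
        {"name": "create_recipe", "parameters": {"type": "object", "properties": {}}},
        {"name": "small_talk", "parameters": {"type": "object", "properties": {}}},
    ],
    # Omitting "function_call" (or passing "auto") lets the model pick a
    # function, or answer in plain text if none of them fits.
}
```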