I found a short article summarizing the octopus test I mentioned earlier: Emily Bender (kottke.org)
And here's an extra article on what these models understand in general: Does GPT-4 Really Understand What We’re Saying? - Nautilus