I recently had a "convert from Python to TypeScript" task for which I created a massive prompt with various docs pasted in.
All told it was 183 KB of text, about 21k words. I'm not sure exactly how many tokens that is, but I expected modern state-of-the-art models to handle it easily, and indeed they did.
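(For the curious: OpenAI's tiktoken library can give an exact count for their models. Here's a minimal sketch, assuming the prompt is saved as `prompt.txt`, a hypothetical filename, and using the `o200k_base` encoding; Gemini uses its own tokenizer, so treat the number as a ballpark.)

```python
# Count the tokens in a saved prompt using OpenAI's tiktoken library.
# "prompt.txt" is a hypothetical filename standing in for the pasted prompt.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # tokenizer used by GPT-4o-era models

with open("prompt.txt", encoding="utf-8") as f:
    text = f.read()

print(f"{len(enc.encode(text))} tokens")
```

A common rule of thumb is roughly four characters of English per token, which would put 183 KB somewhere in the tens of thousands of tokens — well within modern context windows.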
The Gemini UI ground to a halt when I pasted the prompt in, and I had to wait a good ten seconds for it to become responsive again so I could press enter, but once it did everything worked fine.
I used o1 pro via OpenAI's native Mac app, which had no such problems.
Most surprising to me was that Gemini 2.0 Experimental Advanced (yes, that's a mouthful) seemed to outperform o1 pro in two ways:

1. It finished faster, which is always welcome since you have to run whatever the LLM outputs to see whether it actually works.
2. Its output required fewer changes to get working.
In both cases the output didn't work out of the box; each had a few type errors. With just a few changes, though, the Gemini output worked as expected. After I fixed the type errors in the o1 output it still failed, and not with any error message: it failed silently. Since the Gemini version was working, I didn't end up debugging it.
# Conclusion?
This is yet another `n=1` test, so it's hard to draw any meaningful conclusion. Still, I was very happy with the results from Gemini, so I intend to keep paying for it, while I will most likely cancel my OpenAI subscription at the end of the month.