Trying AI (and failing)

Disneyland's Rise of the Resistance, March 2020 just days before COVID shut everything down

I'm generally skeptical about AI, mostly because we're currently living in a hype bubble about it, with stories both good and bad, everywhere.

It's clearly not going away so I decided that instead I should treat AI like any other technology that may have good or bad qualities, by evaluating it on its own merits whenever I could. I've heard enough good stories from people I know doing amazing things with AI that save them time and effort that I'm still willing to give it a try from time to time.

Lately, everyone is into AI agents and Paul Ford wrote a great piece about it that mirrors what I've heard from programmer friends also using them. I even dipped my own toes into this water last week when I asked Claude to code up an iOS app that I've had in my head for years, and in less than two minutes it cranked out the Swift code and files I need to dump into Xcode to test my new app out (haven't done that part yet but it definitely shows promise).

So let's try and save time with AI!

The things I'm most interested in are automations that reduce tedious things I have to do by hand, so asking AI to look up anything I've written in the past on a certain subject and find it for me, or turn a meeting transcript into an outline of things to do next on a project are both things that seem easy enough for a pattern-recognition engine to complete.

Someone at my work asked me to compile a list of previous articles I've written over the past year that had already been edited and vetted by our in-house experts, so we could re-use them as internal documentation or in other contexts.

I thought this could be a perfect job for AI to save some time for me.

I tend to write a handful of pieces each week, so after a year of working there, I've got a couple hundred articles in my Google account. Google Docs' "last opened by you" interface means it's not easy to look up all my previous docs on a single screen, much less be able to go through them easily. It would take literal hours to open hundreds of docs by hand, read them a bit, then categorize them myself.

Claude trips and falls

I fired up Claude.ai and asked it to look at everything I authored in my Google Docs account, and produce a list of article titles that linked to the docs, all grouped into categories of related subjects.

Claude told me it didn't have access to my Google account, even though I use Google sign-on as my login. Claude said I needed to install a Google plugin, and while digging around for that I figured out I was already connected to Google's API (again, from my login).

I told Claude and it apologized with an "oh right, my bad," which meant 4-5 minutes wasted tracking down an already existing Google connection. Not a great sign.

So I asked Claude to dig up my old articles and found out Claude doesn't have API access to search specific files in my Google Drive or Docs, so instead it asked me to create, then copy and paste, a "share link" for every single thing I've written.

I decided it was time for me to try using Google's own AI instead.

Gemini save me!

I started my first ever session with Gemini, as I figured it should have no issues dealing with Google APIs, so I asked it to compile a list of pieces I've written over the last year about disaster recovery and toss them into some categories for me.

At first it gave me a list of 17 articles spread across four or five categories. I knew this was barely scratching the surface, so from memory I started expanding my requests, asking for all the things I wrote about fires and floods in addition to the stuff it found about hurricanes.

It gathered 30 pieces and eventually over 40 titles of old content and the categories expanded to over a dozen. As I read through results, I kept thinking of more things I wasn't seeing and I'd ask it to gather additional pieces on specific subjects.

What's wild is each time I'd ask for an updated list, it would return a different number of items. Sometimes it would drop back down to 17, other times over 50 would be returned. Most of the time, the list would include about 30 docs I wrote. Every two or three times I'd make a new request, it'd stop numbering my list and go back to plain bullet points. Then I'd ask it to reinstate the numbers.

I kept going, getting frustrated with the inconsistent output. After an hour of futzing, I eventually tailored my request to docs I'd been the original author of, that had more than a couple hundred words, and to omit any meeting transcripts, since a lot of those were brainstorm sessions with 3-4 people throwing ideas around. Gemini thought some of the ideas pitched in meetings were actual articles and some of my resulting items were things I once pitched but never wrote.

The wheels come off the bus

When I thought my list was finally looking comprehensive, I started checking each of the links to my old Google Docs. A few article titles linked to meeting transcripts. A few linked to my resume which I guess I uploaded at some point. I started seeing a couple articles I know I didn't write, like something about earthquake strapping on buildings to prevent damage, and when I'd follow the link to see what Gemini was using as a source, it was a piece I wrote on how to protect your home from prolonged freezing temperatures.

I know I'm supposed to mention errors to AI so it can fix things. So I asked Gemini why it was hallucinating an article I never wrote about earthquakes, and why it linked to an unrelated piece. I got apologies and muddled excuses. I noticed half the links to my shared Google Docs linked to a google.com search instead of a docs.google.com URL. Again I asked Gemini to be sure and link directly to actual pieces and I got more apologies.
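The bad links described above are the kind of thing a quick script can flag before anyone clicks through them one by one. Here is a minimal sketch in Python, assuming the list has been copied out as (title, URL) pairs and that a real doc link lives on docs.google.com under a /document/ path. The function names and the sample entries are hypothetical, made up for illustration.

```python
from urllib.parse import urlparse


def is_doc_link(url: str) -> bool:
    """Return True only if the URL points at a Google Docs document.

    Assumption: real doc links use the docs.google.com host and a path
    that starts with /document/. Anything else is treated as suspect.
    """
    parsed = urlparse(url)
    return parsed.hostname == "docs.google.com" and parsed.path.startswith("/document/")


def split_links(items):
    """Partition (title, url) pairs into good links and suspect links."""
    good, suspect = [], []
    for title, url in items:
        if is_doc_link(url):
            good.append((title, url))
        else:
            suspect.append((title, url))
    return good, suspect


# Hypothetical sample entries, not real documents.
sample = [
    ("Protecting your home from a freeze", "https://docs.google.com/document/d/EXAMPLE_ID/edit"),
    ("Earthquake strapping", "https://www.google.com/search?q=earthquake+strapping"),
]

good, suspect = split_links(sample)
for title, url in suspect:
    print(f"Check by hand: {title} -> {url}")
```

A check like this only catches links that go to the wrong place. It can't tell whether a link that does reach a doc reaches the right one, so the hallucinated earthquake article would still need a human to spot it.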

I asked Gemini to save the list output as a new doc in my account and was told it wasn't allowed to create new files even though it's Google's Gemini in my Google account accessing my Google Docs.

Eventually, I copied and pasted the results from a few dozen Gemini outputs into one giant long document. I found about 50 unique pieces I had to categorize and share, and as I worked on those by hand, the titles continued to jog my memory about other pieces I'd written that weren't mentioned, so I'd search Google Docs and add them to the list myself.
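Merging a few dozen overlapping lists into one set of unique pieces is mechanical work that a short script can handle. Here is a rough sketch in Python, assuming each pasted output has already been turned into a list of (title, URL) pairs and that two entries count as the same piece when their titles match after ignoring case and extra spaces. The function name and sample data are hypothetical.

```python
def merge_unique(batches):
    """Merge several lists of (title, url) pairs into one list.

    The first copy of each title is kept. Titles are compared after
    lowercasing and collapsing whitespace, which is an assumption about
    what counts as "the same piece" here.
    """
    seen = set()
    merged = []
    for batch in batches:
        for title, url in batch:
            key = " ".join(title.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append((title.strip(), url))
    return merged


# Hypothetical sample batches standing in for separate chatbot outputs.
batch_one = [("Hurricane prep checklist", "url-a"), ("Flood cleanup basics", "url-b")]
batch_two = [("hurricane prep  checklist", "url-a"), ("Wildfire evacuation tips", "url-c")]

unique = merge_unique([batch_one, batch_two])
print(len(unique), "unique pieces")
```

Matching on titles alone will miss pieces that were listed under slightly different names in different runs, so the merged list still deserves a read-through.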

Conclusion: not good

A thing I thought might take five minutes and save me oodles of tedious work ended up taking over two hours to produce a long list, half done by hand, half by Gemini, after removing erroneous links and straight-up hallucinations.

I think my request of "Hey Gemini, show me a list of all the articles I wrote over the last year and arrange them into categories by subject" is a straightforward one, and I came away from this experience surprised that Google shipped these Gemini features as bleeding-edge AI to customers when they never really delivered for me.

I've had a few good experiences with AI actually saving me time and effort (especially around transcribing audio and video to text I can later edit/transform into stories) but in this case I can't believe we're boiling oceans and burning forests and firing everyone in the tech industry for AI systems that can't complete fairly simple tasks.

Newport, Oregon sand dunes in 2013
