Underwhelming LLMs

Some people, when confronted with a problem, think “I know, I’ll use an LLM.” Now they have two problems.

Today I was writing a relatively simple PowerShell script to alert before secrets expire. This was my first attempt:

$config = Get-Content $configFile | ConvertFrom-Json -Depth 100 -AsHashtable

$config.GetEnumerator() | ForEach-Object {

    $alert = $_.Name
    $ExpiryDate = [datetime]::ParseExact($_.Value.ExpiryDate, "yyyy-MM-dd", $null)
    $description = $_.Value.Description
    $NoticePeriodInDays = $_.Value.NoticePeriodInDays
    $daysToExpire = ($ExpiryDate - (get-date)).Days
    $shouldNotify = (get-date).AddDays($NoticePeriodInDays) -gt $ExpiryDate

    if ($shouldNotify) {

        Invoke-RestMethod -Method post -ContentType 'application/json' -uri $TeamsWebhookUrl `
        -Body "{""text"":""$alert is expiring on $($ExpiryDate.ToString("yyyy-MM-dd")). This is $daysToExpire days away. See further details here: $description""}"

    }
    else {
        Write-Host "$alert is expiring on $($ExpiryDate.ToString("yyyy-MM-dd")). This is $daysToExpire days away."
    }
}
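Part of what makes this version awkward to test is that it reads the clock twice with get-date and mixes the notify decision with the HTTP call. A minimal sketch of how the date logic could be pulled out, assuming a hypothetical helper named Get-ExpiryStatus (the name and the -Now parameter are illustrative and do not come from any of the LLM outputs):

```powershell
# Hypothetical helper: same date arithmetic as the script above, with the
# current time passed in so a test can pin it to a fixed value.
function Get-ExpiryStatus {
    param(
        [Parameter(Mandatory)] [datetime] $ExpiryDate,
        [Parameter(Mandatory)] [int] $NoticePeriodInDays,
        [datetime] $Now = (Get-Date)
    )

    [pscustomobject]@{
        # Whole days between "now" and the expiry date.
        DaysToExpire = ($ExpiryDate - $Now).Days
        # True once "now" plus the notice period passes the expiry date.
        ShouldNotify = $Now.AddDays($NoticePeriodInDays) -gt $ExpiryDate
    }
}

# Fixed dates so the result does not depend on when this runs.
$expiry = [datetime]::ParseExact('2030-03-01', 'yyyy-MM-dd', $null)
$now    = [datetime]::ParseExact('2030-02-01', 'yyyy-MM-dd', $null)

$early = Get-ExpiryStatus -ExpiryDate $expiry -NoticePeriodInDays 14 -Now $now
$late  = Get-ExpiryStatus -ExpiryDate $expiry -NoticePeriodInDays 30 -Now $now
```

With the clock injected like this, $early.ShouldNotify is false and $late.ShouldNotify is true, and both report 28 days to expiry, all without touching the Teams webhook.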

I then asked various LLMs (Gemini, DeepSeek, OpenAI and Claude) to refactor it and write tests for it.

The refactors went well but not a single one managed to provide working tests the first time around.

I then tried feeding each LLM its own refactored code and simply asking it to write tests for the script. The results weren’t great:

DeepSeek required minor changes (dot sourcing required the script parameters) to get to 3 failures out of 14 tests.
ChatGPT worked out of the box but all 11 tests failed.
Gemini required minor changes (dot sourcing required the script parameters) to get to 16 failed tests out of 16.
Claude worked out of the box and got 4 failures out of 11 tests.
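The dot-sourcing change is easy to illustrate. A rough sketch, assuming the refactored script declares mandatory parameters at the top; the file name Check-SecretExpiry.ps1, the parameter names and the Get-DaysToExpire function are all hypothetical stand-ins, not the actual generated code:

```powershell
# Write a hypothetical stand-in for a refactored script to a temp file:
# mandatory script parameters followed by a function a test would call.
$scriptPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Check-SecretExpiry.ps1'
@'
param(
    [Parameter(Mandatory)] [string] $ConfigFile,
    [Parameter(Mandatory)] [string] $TeamsWebhookUrl
)

function Get-DaysToExpire {
    param([datetime] $ExpiryDate, [datetime] $Now)
    ($ExpiryDate - $Now).Days
}
'@ | Set-Content -Path $scriptPath

# Dot-sourcing with no arguments would prompt for the mandatory parameters
# (or fail in a non-interactive session), so placeholder values are passed.
. $scriptPath -ConfigFile 'dummy.json' -TeamsWebhookUrl 'https://example.invalid/webhook'

# The function defined in the script is now available in the current scope.
$days = Get-DaysToExpire -ExpiryDate ([datetime]'2030-01-31') -Now ([datetime]'2030-01-01')

Remove-Item -Path $scriptPath
```

A generated test file that dot-sources the script without those arguments stalls or errors before a single test runs, which matches the kind of minor change described above.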

I didn’t do a deep analysis of the tests, but some were a bit questionable, essentially testing basic PowerShell functionality. But hey, if you aim for 100% coverage, then

This is the way

I suppose

I never cease to be both amazed and immensely frustrated by these LLMs.