AI Lessons: Why I had to stop my AI from over-patching my Kernel

Today, I went down a rabbit hole that every systems administrator dreads: a piece of hardware that simply refused to exist.

My Lenovo Miix has a Micro SD slot, but lspci and lsusb showed nothing. It was a ghost.

I turned to an LLM to help me hunt it down, and while we eventually won, the journey taught me more about how to manage AI than it did about Linux kernel flags.

This handy little tablet/laptop hybrid packs a punch and is great for not-so-demanding workloads on the go. It’s not my daily driver, but when you need to pack light and a simple tablet just won’t cut it, just slap some Linux on one of these bad boys and call it a day. It’s almost perfect, except that my Linux Mint install refused to see the onboard Micro SD card reader, and I can’t go to bed without fixing this first.

The Mission: Find the device, wake up the ghost.

The goal was simple: get the Realtek RTS5229 PCI Express Card Reader to show up.

When I first ran lspci, the device wasn’t there, and lsusb saw nothing either.
So I spun up an Ollama session on my server and fed my AI the reference output of three commands:
lspci, lsusb, and dmesg | tail -n 20

My AI correctly identified that these tablets often have aggressive power management that “hides” the hardware.

I inserted a Micro SD card into the slot and fed the AI the outputs from lspci, lsusb, and dmesg | tail -n 20 again. Bingo! Now we could see errors!

I identified a PCI device failing to start due to power constraints. The AI was right on its first hint.
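The capture-and-compare loop used here (and repeated after every change later on) can be scripted instead of copy-pasted. What follows is only a minimal Python sketch: the function names (run_command, last_lines, snapshot, new_lines) are made up for illustration, and it assumes lspci, lsusb, and dmesg are installed. On some systems dmesg needs root, in which case the sketch just records the failure message.

```python
import subprocess


def run_command(argv):
    """Return a command's stdout, or a short note if it is missing or fails."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        return f"(could not run {argv[0]}: {exc})"
    if result.returncode != 0:
        return f"(exit code {result.returncode}) {result.stderr.strip()}"
    return result.stdout


def last_lines(text, n=20):
    """Python equivalent of piping text through `tail -n 20`."""
    return "\n".join(text.splitlines()[-n:])


def snapshot(runner=run_command):
    """Collect the three outputs that get pasted into the AI session."""
    return {
        "lspci": runner(["lspci"]),
        "lsusb": runner(["lsusb"]),
        "dmesg | tail -n 20": last_lines(runner(["dmesg"]), 20),
    }


def new_lines(before, after):
    """Lines present in `after` but not in `before`, in their original order."""
    seen = set(before.splitlines())
    return [line for line in after.splitlines() if line not in seen]
```

Taking one snapshot() before inserting the card and one after, then calling new_lines(before["lspci"], after["lspci"]), shows at a glance whether a new device appeared.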

The “Software Patch” (Kernel Boot Parameters)

After analyzing the outputs, my LLM suggested modifying my GRUB configuration at /etc/default/grub to include: pci=nommconf pcie_aspm=off

The line in my GRUB configuration file now looks like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nommconf pcie_aspm=off"
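Since this line gets edited several times in this story, here is a small Python sketch of how adding and removing flags on it could be done without typos. The helper name edit_cmdline is hypothetical, and the sketch only transforms the text of the line. It does not touch /etc/default/grub itself.

```python
KEY = "GRUB_CMDLINE_LINUX_DEFAULT"


def edit_cmdline(line, add=(), remove=()):
    """Return the GRUB_CMDLINE_LINUX_DEFAULT line with flags added or removed."""
    name, sep, value = line.strip().partition("=")
    if name != KEY or not sep:
        raise ValueError(f"not a {KEY} line: {line!r}")
    # Drop the surrounding quotes, then split the value into individual flags.
    flags = value.strip().strip("\"'").split()
    flags = [flag for flag in flags if flag not in remove]
    for flag in add:
        if flag not in flags:  # never add the same flag twice
            flags.append(flag)
    joined = " ".join(flags)
    return f'{KEY}="{joined}"'


original = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'
patched = edit_cmdline(original, add=["pci=nommconf", "pcie_aspm=off"])
print(patched)
```

Keeping the untouched original line around, as the sketch does, makes the later rollback a one-line change.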

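After running sudo update-grub (which applies the edited file to the boot configuration on Mint) and rebooting, I could finally see my Micro SD card! It worked! But it came with a price.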

This is why it’s always a smart move to examine the results an AI gives you: don’t just put them to the test, scrutinize the steps taken and the logs as well. Anything can break your system, and an AI that has not been specifically trained on your systems and scenarios excels at breaking things.

So, rinse and repeat: after the reboot, I fed the AI the outputs from lspci, lsusb, and dmesg | tail -n 20 again. Changing parameters in the GRUB configuration always feels like performing open-heart surgery on the OS. As I expected (and you should always expect this when using AI models), I had broken something.

The Side Effect: The Screaming WiFi

As soon as the system booted and the SD card woke up, my system logs (dmesg) started screaming. My Qualcomm Atheros WiFi card began flooding the screen with PCIe Bus Error messages (BadDLLP and RxErr) every millisecond.
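To put a number on “screaming”, the flood can be counted rather than eyeballed. The Python sketch below is an illustration only: count_errors is a made-up helper, and it simply counts log lines that mention the two error names from my logs, without assuming anything else about the message format.

```python
import re
from collections import Counter

# The two error names that showed up in the flood described above.
ERROR_NAMES = ("BadDLLP", "RxErr")


def count_errors(dmesg_text, names=ERROR_NAMES):
    """Count how many log lines mention each error name as a whole word."""
    counts = Counter({name: 0 for name in names})
    for line in dmesg_text.splitlines():
        for name in names:
            if re.search(rf"\b{re.escape(name)}\b", line):
                counts[name] += 1
    return counts
```

Feeding it the full dmesg output after each reboot gives one comparable number per configuration, which is far easier to reason about than a scrolling terminal.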

This is where the interaction with the AI got interesting—and dangerous. The AI’s instinct was to patch the symptom. It suggested:

“Let’s add pci=noaer to silence the error logs.”
“Let’s add pcie_aer_mask_override=0x1 to mask the specific bit.”

This is where I stopped. This is where we draw the line.

The system worked fine before any changes: no errors, no alarming logs, just a missing device. Then we found the device. But let me digress for a moment and explain what the two parameters I added to the GRUB configuration file actually do:

pci=nommconf: Tells the kernel to stop using memory-mapped configuration space (MMCONFIG) and fall back to the older way of accessing PCI configuration through I/O ports. This often fixes “disappearing” devices on Intel chipsets.

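pcie_aspm=off: Turns off the kernel’s handling of PCIe “Active State Power Management” (ASPM). Strictly speaking, the kernel documentation describes this as leaving the ASPM configuration untouched, as the firmware set it, rather than forcing it off in hardware. The idea is to stop the link to the SD card reader from being put to sleep over and over, which is what the power-related errors in the first logs pointed to.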
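Before blaming or crediting either flag, it is worth confirming that it actually reached the running kernel, which exposes its boot parameters in /proc/cmdline. Here is a minimal Python sketch of that check. The helper names parse_cmdline and is_active are hypothetical. It relies on the fact that pci= options can be combined with commas (for example pci=nommconf,noaer), so a plain substring search would not be reliable.

```python
from pathlib import Path


def parse_cmdline(text):
    """Map each parameter name to the list of its comma-separated values."""
    params = {}
    for token in text.split():
        name, _, value = token.partition("=")
        params.setdefault(name, []).extend(value.split(",") if value else [])
    return params


def is_active(text, flag):
    """True if `flag` (e.g. 'pci=nommconf' or 'quiet') is on the command line."""
    name, _, value = flag.partition("=")
    params = parse_cmdline(text)
    if name not in params:
        return False
    return value == "" or value in params[name]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()
```

On the tablet itself this would be called as is_active(read_cmdline(), "pcie_aspm=off") right after each reboot.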

The new parameters suggested by my AI would only patch over or silence the errors, so “they won’t bother me” or “fill my logs”. This time, patching did not mean fixing: the errors would keep happening in the background.

The AI was trying to “medicate” the system with more and more kernel flags. We were building a house of cards. This is where the human factor is more important than ever. The AI could be a state-of-the-art system with billions or even trillions of parameters, but it can confidently make the same mistakes over and over if human logic is left out of the mix.

The Turning Point: Stop, Rollback, and Reflect

I realized we were playing a high-stakes game of Whack-a-Mole. I had just fixed one symptom, and in doing so I had created new “noise.” I had to stop the AI in its tracks.

I told the AI: “Stop and think. Less is more. We had no WiFi errors before the implementation. Either pci=nommconf or pcie_aspm=off is the culprit. What if we roll back?”

The “Less is More” Strategy

I needed to find out which one of those two flags actually fixed the SD card. If only one was needed, the other was just causing unnecessary trouble.

I stripped everything away. First I removed the power management flag (pcie_aspm=off), rebooted, and checked the logs. Then I did the same with the legacy PCI flag (pci=nommconf) removed, and checked the logs again.

I went back to a “clean” GRUB and compared the logs.

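So, we had two candidate fixes and three scenarios to compare: the power management flag removed, the legacy PCI flag removed, and no flags at all.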
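With only two flags the test matrix fits in your head, but the same elimination approach scales to longer flag lists if you enumerate the combinations up front. A minimal Python sketch follows. The function name scenarios is made up for illustration, and the base flags are assumed to be the stock “quiet splash” from my GRUB line.

```python
from itertools import combinations


def scenarios(base, candidates):
    """Yield every kernel command line to test, from fewest extra flags to most."""
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            yield " ".join([*base, *subset])


for cmdline in scenarios(["quiet", "splash"], ["pci=nommconf", "pcie_aspm=off"]):
    print(cmdline)
```

Starting from the fewest flags is deliberate: the first configuration that works is also the smallest change, which is the whole point of “less is more”.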

The Best Outcome: After a clean reboot with zero flags, the SD card stayed visible, and the WiFi errors vanished.

Why did “Doing Nothing” work?

The initial “aggressive” flags acted like a defibrillator. They shocked the hardware out of a “stuck” sleep state (D3). Once the hardware was awake and the kernel had latched onto it, Mint could see the device from then on, and it didn’t need the medicine anymore. The system just needed to be “kicked” once, not permanently medicated.

Lessons for Troubleshooting with LLMs

If you’re using an LLM to troubleshoot technical issues, here is the lesson I learned today:

AI is an Optimizer, not always an Architect: LLMs are built to “solve” the current prompt. If you show one an error, it will find a patch. It doesn’t always realize that the patch is cluttering your system.
Beware the “Patch Spiral”: AI tends to solve the immediate error you give it. Be wary when an AI keeps asking you to add more lines of code or more config flags to fix the problems created by the previous suggestion.
The Power of the Rollback: Sometimes the best troubleshooting step is to use the AI to “shock” the system, then immediately strip the fix away to see if the system can now stand on its own.
Human Reflection vs. Machine Logic: As the human in the loop, your job is to look for the simplest path. If the AI suggests a complex workaround, ask: “What is the absolute minimum I need to change to get a result?”
Holistic over Granular: Always ask the AI to explain the root cause. If the explanation sounds like it’s just covering up a symptom, hit the brakes.