
    The App Worked. SonarQube Still Found More Than 800 Issues

    What happened when I scanned a working AI-built application, why a green dashboard can be misleading, and what I changed afterward.

    QUALITY & SECURITY · 5 MIN READ · JUNE 21, 2026

    By the time I connected InvoShift to SonarQube, the application had already been in development for four or five months.

    It was not an empty demonstration. It included user registration, invoicing, a calendar, a built-in time tracker, email workflows, and many smaller features around them.

    The application worked.

    Then the first scan reported more than 800 issues and over 60 security hotspots.

    For the first few seconds, I was not really thinking at all. I was simply staring at the numbers.

    Then I started estimating how many prompts, tokens, and actual money it might take to fix everything. Replit displayed the cost of prompts at the time, so this was not an abstract concern. Even a rough estimate made the cleanup feel like it might not be worth the effort.

    That reaction taught me something important: working software and trustworthy software are not the same thing.

    Not every finding was a vulnerability

    The security hotspots received my attention first.

    Some looked alarming but turned out to be harmless. InvoShift uses Mailtrap for email functionality, with separate templates for account activation, email confirmation, payment confirmation, and other messages. Those templates were referenced by UUIDs.

    SonarQube flagged some of those UUIDs as possible clear-text passwords.

    They were not passwords or credentials. They were template identifiers, so those findings could be reviewed and marked as safe.
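    One habit that can make this kind of review quicker is to keep template identifiers in a single typed, clearly named map instead of scattering string literals through the email code. The TypeScript sketch below is an illustration under assumptions: the template names and placeholder UUIDs are hypothetical (the article does not list InvoShift's real values), and the guard only checks the UUID format. It may not change what SonarQube reports, but it makes it obvious to a reviewer that these values are identifiers rather than secrets.

```typescript
// Hypothetical template names; the article does not list InvoShift's actual templates.
type TemplateName = "accountActivation" | "emailConfirmation" | "paymentConfirmation";

// Mailtrap template identifiers are UUIDs, not credentials.
// The values below are placeholders for illustration only.
const MAILTRAP_TEMPLATE_IDS: Record<TemplateName, string> = {
  accountActivation: "00000000-0000-4000-8000-000000000001",
  emailConfirmation: "00000000-0000-4000-8000-000000000002",
  paymentConfirmation: "00000000-0000-4000-8000-000000000003",
};

// Canonical 8-4-4-4-12 hexadecimal UUID layout (version and variant not enforced).
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Look up a template identifier and fail loudly if the configured value is not
// shaped like a UUID, for example if the wrong kind of value was pasted in.
function templateId(name: TemplateName): string {
  const id = MAILTRAP_TEMPLATE_IDS[name];
  if (!UUID_PATTERN.test(id)) {
    throw new Error(`Template "${name}" is not configured with a valid UUID`);
  }
  return id;
}
```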

    Other findings represented real problems. Weak cryptography had been used in multiple places. Some regular expressions needed safer alternatives. There were also many duplicated issues where one poor implementation choice appeared repeatedly throughout the codebase.

    That made the total more manageable. Eight hundred findings did not necessarily mean eight hundred unrelated engineering problems. Sometimes one corrected pattern removed many instances of the same problem.
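    The article does not say which algorithms were involved, so the following only illustrates the "one corrected pattern" idea, assuming a Node.js runtime. If several features had each inlined a weak construction, such as an MD5 digest or a `Math.random()` token (hypothetical examples), routing them all through one small module means the fix lands once. The helper names `createToken`, `digest`, and `digestsMatch` are hypothetical; `randomBytes`, `createHash`, and `timingSafeEqual` are standard `node:crypto` functions. A plain SHA-256 digest suits high-entropy values such as random tokens; passwords would need a slow key-derivation function such as scrypt instead.

```typescript
import { Buffer } from "node:buffer";
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical shared helpers. Each feature that needs a token or a digest
// calls these instead of inlining its own cryptography, so a later change
// (a different algorithm, a longer token) happens in one place.

// Unpredictable token from a cryptographically secure source, hex-encoded
// (two characters per byte).
function createToken(byteLength = 32): string {
  return randomBytes(byteLength).toString("hex");
}

// SHA-256 digest, hex-encoded. Suitable for storing random tokens, not passwords.
function digest(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

// Compare two hex digests in constant time; digests of different lengths never match.
function digestsMatch(a: string, b: string): boolean {
  const left = Buffer.from(a, "hex");
  const right = Buffer.from(b, "hex");
  return left.length === right.length && timingSafeEqual(left, right);
}
```

    In a hypothetical confirmation-link flow, the raw token would go into the email link and only `digest(token)` would be stored, so a leaked database row would not reveal a usable token.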

    Little by little, I started fixing them.
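    For the regular-expression findings, a common culprit is a nested quantifier, which can make a backtracking engine take exponential time on input that almost matches. The article does not show InvoShift's actual patterns, so the two below are a generic TypeScript illustration. Both accept the same non-empty strings (they must start with a word character and never contain two whitespace characters in a row), but only the second avoids the nested quantifier. The `isSimplePhrase` helper and its length cap are hypothetical.

```typescript
// Risky: "(\w+\s?)*" can split a run of word characters in exponentially many
// ways, so a long input such as "aaaa...a!" can trigger catastrophic backtracking.
// Shown for comparison only; do not run it on untrusted input.
const riskyPattern = /^(\w+\s?)*$/;

// Safer: every whitespace character must be followed by a new word run or end
// the string, so the engine has essentially one way to match each character.
const saferPattern = /^\w+(?:\s\w+)*\s?$/;

// Hypothetical validator built on the safer pattern, with a length cap as an
// extra guard against oversized input.
function isSimplePhrase(input: string, maxLength = 200): boolean {
  return input.length <= maxLength && saferPattern.test(input);
}
```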

    The same problems kept returning

    The fixes were usually implemented through the same process used to build the application.

    I collected the findings, discussed them with an AI assistant, prepared detailed instructions, and gave those instructions to Replit. The prompt might require stronger cryptography, a safer regular expression, better validation, or another specific correction.

    The frustrating part was that fixing a problem did not mean it was gone forever.

    Replit could generate the same class of issue again in a later feature. It did not reliably learn from the previous correction. The code changed, but the development behavior did not.

    This is why I eventually started separating two different goals:

    1. Help the coding agent avoid repeating known mistakes.
    2. Prevent bad code from passing through the delivery pipeline even when the agent repeats those mistakes.

    The first goal is informational. A read-only SonarQube MCP server can show the coding agent recurring rules and historical patterns across projects. I do not expect this to eliminate mistakes, but it should reduce how often the same mechanical problems return.
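    The sketch below does not use the SonarQube MCP server or its API. It only shows the underlying idea in TypeScript, under assumptions: findings have already been exported as simple `{ rule, file }` records (a hypothetical shape), and the rule names in the example are made up. Counting findings per rule and turning the most frequent ones into short reminders gives the agent a concrete list of mistakes to avoid.

```typescript
// Hypothetical, simplified finding record; real scanner output has many more fields.
interface Finding {
  rule: string;
  file: string;
}

// Count findings per rule and return the most frequent ones as short reminder
// lines that could be added to a coding agent's standing instructions.
// Ties are broken alphabetically so the output is stable between runs.
function recurringRuleReminders(findings: Finding[], top = 3): string[] {
  const counts = new Map<string, number>();
  for (const finding of findings) {
    counts.set(finding.rule, (counts.get(finding.rule) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, top)
    .map(([rule, count]) => `Avoid ${rule} (seen ${count} times in previous scans)`);
}
```

    A list like this is advisory only: the agent can still ignore it.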

    The second goal is enforcement. That is where automated checks matter.

    The mistake of making the dashboard green

    At one point, during a brainstorming session, an AI assistant suggested changing the SonarQube Quality Gate.

    I accepted the suggestion without thinking carefully enough about what it meant.

    Later, I noticed that dozens of issues had disappeared. The code had not improved. The dashboard had.

    When I reviewed the gate configuration, I understood what had happened. I had not solved the problems. I had changed the measurement so that fewer of them mattered.

    It was a poor decision, and I reverted it.

    That experience changed how I look at quality tools. A green dashboard is not the objective. A green dashboard should be evidence that the code meets a standard that is appropriate for production.

    Weakening a rule because it is noisy can be reasonable when the rule genuinely does not apply. Weakening a gate because fixing the code is inconvenient is something else entirely.

    The distinction matters even more when AI is involved. An AI assistant is often very good at finding the shortest route to the requested result. If the result is "make the pipeline pass," changing the pipeline may technically satisfy the instruction.

    The instruction must instead be: resolve the underlying production risk, preserve the standard, and explain any exception.
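    One way to make "preserve the standard" mechanical is to compare the current gate configuration with a committed baseline and flag every condition that became weaker. The TypeScript sketch below rests on assumptions: the condition shape is loosely modelled on SonarQube gate conditions (a metric key, a `GT` or `LT` operator, and an error threshold, here already parsed to a number), exporting the configuration is left out, and `findRelaxations` is a hypothetical helper. The metric keys in the example are illustrative.

```typescript
// Assumed shape, loosely modelled on a SonarQube Quality Gate condition.
// "GT": the gate fails when the measured value is greater than `error`.
// "LT": the gate fails when the measured value is less than `error`.
interface GateCondition {
  metric: string;
  op: "GT" | "LT";
  error: number;
}

// List every way `current` is weaker than `baseline`: a condition that is
// missing (a changed operator also counts as missing) or a threshold moved
// in the permissive direction. Tightened or added conditions are not reported.
function findRelaxations(baseline: GateCondition[], current: GateCondition[]): string[] {
  const relaxations: string[] = [];
  for (const expected of baseline) {
    const actual = current.find((c) => c.metric === expected.metric && c.op === expected.op);
    if (actual === undefined) {
      relaxations.push(`${expected.metric}: condition removed`);
    } else if (expected.op === "GT" && actual.error > expected.error) {
      // A higher ceiling lets more issues through.
      relaxations.push(`${expected.metric}: threshold raised from ${expected.error} to ${actual.error}`);
    } else if (expected.op === "LT" && actual.error < expected.error) {
      // A lower floor accepts weaker results, such as less test coverage.
      relaxations.push(`${expected.metric}: threshold lowered from ${expected.error} to ${actual.error}`);
    }
  }
  return relaxations;
}
```

    Run in CI, a non-empty result could fail the build unless the change comes with a written exception, which is the same rule stated above in executable form.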

    SonarQube is only one part of the process

    I have a personal preference for SonarQube because it provides a dedicated web interface. I can inspect projects, trends, issues, hotspots, and Quality Gate results in one place.

    That does not mean it is the only check, or necessarily the most powerful one.

    The current ARAG workflow combines several controls:

    • ESLint for code quality, security, and unsafe regular-expression patterns
    • TypeScript checks
    • Semgrep rules
    • CodeQL analysis in GitHub
    • Unit tests and coverage
    • Application smoke tests
    • SonarQube analysis and an enforced Quality Gate

    Some of these tools report findings through build logs. CodeQL has its own GitHub security interface. SonarQube gives me the dashboard I prefer. Their interfaces differ, but their purpose is shared: code should not proceed simply because it runs.
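    The shared purpose of these controls can be written as a single rule: delivery proceeds only when every required check has run and passed, and a check that did not run counts as a failure. The TypeScript sketch below assumes each tool's outcome has already been collected as a pass/fail result. The check names mirror the list above, but the result format and the `canProceed` function are hypothetical and not part of any of these tools.

```typescript
// Check names mirroring the controls listed in the article.
const REQUIRED_CHECKS = [
  "eslint",
  "typescript",
  "semgrep",
  "codeql",
  "unit-tests",
  "smoke-tests",
  "sonarqube-quality-gate",
] as const;

type CheckName = (typeof REQUIRED_CHECKS)[number];

// Hypothetical result record produced by whatever runs each tool.
interface CheckResult {
  name: CheckName;
  passed: boolean;
}

// Delivery may proceed only if every required check reported a pass.
// A missing result is treated like a failure, so skipping a tool cannot
// turn the pipeline green.
function canProceed(results: CheckResult[]): { proceed: boolean; blocking: CheckName[] } {
  const blocking = REQUIRED_CHECKS.filter(
    (name) => !results.some((result) => result.name === name && result.passed),
  );
  return { proceed: blocking.length === 0, blocking };
}
```

    For example, `canProceed([{ name: "smoke-tests", passed: true }])` is blocked by the six other checks: a passing smoke test shows that the application runs, but on its own it is not enough.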

    What the first scan actually changed

    The initial scan did not make me abandon InvoShift.

    It changed the order in which I would build the next application.

    If I started InvoShift again today, I would integrate the quality and security checks from day one. I would settle the design and marketing direction before repeatedly changing logos, fonts, and colors during development. I would spend more time deciding which features small businesses and freelancers actually need.

    Most importantly, I would plan version one and version two before asking an agent to start building.

    The lesson was not that AI-generated code is unusable. InvoShift is live, and its core functionality was built surprisingly quickly.

    The lesson was that delayed quality control creates a debt that is difficult to see while the application appears to work.

    SonarQube did not create those 800-plus issues. It only made them visible.