GEP-25
Abstract
This GEP proposes an optional, opt-in @TypeChecked extension — TaintChecker — that performs
compile-time taint tracking over Groovy code, together with two type-qualifier annotations,
@Tainted and @Untainted. It joins the existing family of pluggable checkers in the
groovy-typecheckers module (NullChecker, RegexChecker, FormatStringChecker, PurityChecker,
and the shape checkers CombinerChecker / MonadicChecker).
Taint tracking is the classic security dataflow analysis: data originating from an untrusted source
(request parameters, environment, files, deserialized input) is flagged as @Tainted, and the checker
reports a compile error when tainted data reaches a sink that requires @Untainted data (a SQL
query, a shell command, an HTML response, a file path) without first passing through a sanitizer.
The whole analysis is a two-point lattice, @Untainted <: @Tainted, which keeps it a fast, solver-free
dataflow pass (sound when run in strict mode) rather than a heavyweight verification engine.
Two design choices distinguish this from a straight port of existing Java taint checkers. First, the
analysis is aware of Groovy’s own structural mechanisms — most notably GString, which already
gives groovy.sql.Sql safe-by-construction, parameterized queries; the checker can recognize and
reward that existing idiom rather than ignore it. Second, the @Tainted / @Untainted vocabulary is
deliberately the two extremes of a security lattice, so it remains forward-compatible with richer,
external information-flow tools without those tools needing their own annotations.
Motivation
Injection flaws (CWE-89 SQL injection, CWE-79 cross-site scripting, CWE-78 OS command injection, CWE-22 path traversal) remain perennial fixtures of the OWASP Top Ten. They share one structure: untrusted data flows into a trusting operation without sanitization. That structure is exactly what taint analysis catches. Catching it at compile time, in the same pass as type checking and with the developer’s IDE underlining the offending flow, is materially cheaper than catching it in a separate SAST tool, in code review, or in production.
Groovy is a frequent host for precisely the code where this matters: build scripts, server-side web
handlers (Grails, Micronaut, Ratpack, raw servlets), SQL via groovy.sql.Sql, markup via
MarkupBuilder and template engines, and shell-outs via "cmd".execute(). A Groovy-native taint
checker meets that code where it lives.
The broader trajectory matters too. As AI-assisted security tooling — vulnerability detection,
suggested fixes, automated pull requests — continues to gain momentum, a language that exposes a
typed, machine-checkable taint substrate is far better positioned than one that offers only freeform
source to reason about. An AI can propose @Tainted / @Untainted / sanitizer annotations; the
compiler then verifies the resulting flow soundly. The model proposes, the checker disposes. That
turns a probabilistic suggestion into a checked guarantee, and it gives AI-generated Groovy code a
compile-time backstop against the injection bugs such code is prone to. Putting the substrate in place
now is what lets Groovy ride those advances rather than retrofit them later.
Design principles
- Opt-in and composable: like every other checker in the family, TaintChecker is enabled per class or per method via @TypeChecked(extensions = ['groovy.typecheckers.TaintChecker']) and composes with the other extensions on the same compile. Code that does not opt in is completely unaffected.
- Low-noise by default: a taint checker that cries wolf gets uninstalled. The default qualifier is @Untainted; only known sources are @Tainted (via the shipped model library or user annotation), so a diagnostic fires only when data the checker knows is tainted reaches a sink. This trades a measure of soundness (an un-modeled source is missed) for trust, with a stricter, sound-by-default mode available behind a flag.
- Two annotations, role by position: @Tainted and @Untainted are the entire user-facing surface (plus the optional @PolyTainted refinement for pass-through methods, described below). A source is a @Tainted return; a sink is an @Untainted parameter; a sanitizer is an @Untainted return that accepts @Tainted input. There is no separate @Source/@Sink/@Sanitizer vocabulary and no lattice-configuration machinery; a two-point order needs none.
- Reward existing Groovy safety: Groovy already has safe-by-construction idioms (GString-based parameterized SQL, escaping builders). The checker recognizes these as built-in sanitizing sinks rather than forcing developers to annotate around them.
- Honest scope, loud where it must skip: Groovy’s dynamic dispatch cannot always be tracked statically. Where the checker cannot follow a flow it says so (a "flow not tracked" note) rather than silently passing or silently failing, mirroring the honesty of the sibling checkers.
- Forward-compatible vocabulary: @Tainted and @Untainted are the top and bottom of a security lattice. Richer external analyses can consume the same annotations as the extremes of a finer ordering, so annotating for TaintChecker is never wasted effort.
The annotations
Two annotations, usable in TYPE_USE position (and on methods, parameters, and fields):
package groovy.transform   // package tentative

import java.lang.annotation.*
import static java.lang.annotation.ElementType.*
import static java.lang.annotation.RetentionPolicy.RUNTIME

@Documented @Retention(RUNTIME) @Target([TYPE_USE, METHOD, PARAMETER, FIELD, LOCAL_VARIABLE])
@interface Tainted {}

@Documented @Retention(RUNTIME) @Target([TYPE_USE, METHOD, PARAMETER, FIELD, LOCAL_VARIABLE])
@interface Untainted {}
The qualifier lattice is two points with @Untainted <: @Tainted: an untainted value may be used
wherever a tainted value is accepted, but not the reverse. The role an annotation plays is determined
entirely by where it appears:
| Position | Role |
|---|---|
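| @Tainted on a method return | Source: the method yields untrusted data |
| @Untainted on a parameter | Sink: the method requires trusted data |
| @Untainted on a return, with @Tainted parameter(s) | Sanitizer: the method launders taint |
| @Tainted / @Untainted on a local, field, or type use | A tracked value of that qualifier |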
This is the whole model. There is no separate declassifier annotation: an @Untainted return is the
declassification, because in a two-point world there is only one direction worth naming
(tainted → untainted).
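The ordering can be pinned down in a few lines. The sketch below is a hypothetical model (the enum Qualifier and its methods are illustrative names, not part of the proposed API) showing the subtyping check applied at sinks and the join applied when two values combine:

```groovy
// Hypothetical model of the two-point qualifier lattice: UNTAINTED <: TAINTED.
enum Qualifier {
    UNTAINTED, TAINTED   // declared bottom-first, so ordinal() follows the order

    // May a value of this qualifier flow where `expected` is required?
    boolean isSubtypeOf(Qualifier expected) { ordinal() <= expected.ordinal() }

    // Least upper bound, used when two values combine (e.g. concatenation).
    Qualifier join(Qualifier other) { ordinal() >= other.ordinal() ? this : other }
}

assert Qualifier.UNTAINTED.isSubtypeOf(Qualifier.TAINTED)     // untainted is accepted anywhere
assert !Qualifier.TAINTED.isSubtypeOf(Qualifier.UNTAINTED)    // tainted may not reach an @Untainted sink
assert Qualifier.UNTAINTED.join(Qualifier.TAINTED) == Qualifier.TAINTED
```

Encoding the order in ordinal() keeps both operations one comparison each, which is all a two-point lattice needs.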
Features
Sources, sinks, and the basic flow
import groovy.sql.Sql
import groovy.transform.TypeChecked
import groovy.transform.Tainted                     // package tentative
import javax.servlet.http.HttpServletRequest

@TypeChecked(extensions = ['groovy.typecheckers.TaintChecker'])
class Handler {
    @Tainted String userName(HttpServletRequest req) {
        req.getParameter('name')                    // request params are modeled @Tainted
    }

    void run(Sql sql, HttpServletRequest req) {
        @Tainted String name = userName(req)
        // ERROR: a @Tainted String reaches a query sink built by concatenation
        sql.execute('SELECT * FROM users WHERE name = \'' + name + '\'')
    }
}
The diagnostic names the flow, in the style of the other checkers:
[Static type checking] - Tainted value 'name' reaches the untainted parameter of Sql.execute; data from Handler.userName (a @Tainted source) flows here without sanitization
Sanitizers (the only declassification needed)
A method whose return is @Untainted launders taint — the checker trusts the annotation and treats
the result as untainted regardless of input:
@Untainted String escapeHtml(@Tainted String s) { /* ... escaping ... */ }

void render(Writer out, @Tainted String comment) {
    out.write(escapeHtml(comment))   // OK: the sink receives untainted data
}
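For illustration only, here is one plausible body for the escapeHtml stub above. It is a hypothetical, minimal HTML-entity escaper; the proposed qualifiers appear as comments because the annotations do not exist yet, and production code would more likely delegate to an established encoder library:

```groovy
// Hypothetical sanitizer: escapes the five HTML-significant characters.
// The commented-out qualifiers mark where @Untainted / @Tainted would go.
/* @Untainted */ String escapeHtml(/* @Tainted */ String s) {
    Map<String, String> entities = ['&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;']
    // collectReplacements visits one Character at a time; returning null keeps it unchanged
    s?.collectReplacements { Character ch -> entities[ch.toString()] }
}

assert escapeHtml('<b>"x" & \'y\'</b>') == '&lt;b&gt;&quot;x&quot; &amp; &#39;y&#39;&lt;/b&gt;'
```

Because each character is replaced independently, '&' cannot be escaped twice, which is the usual pitfall of chained replace() calls.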
For the rare audited-inline case — "I have validated this value by other means" — an explicit, greppable suppression is provided rather than a silent cast:
@SuppressWarnings('groovy.taint')
@Untainted String trusted = afterMyOwnValidation(raw)
Suppressions are the audit trail: a security reviewer greps groovy.taint to find every point where
a human overrode the analysis, exactly as Declassify-style markers serve in richer tools.
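Because suppressions are plain annotations, the audit a reviewer performs can be scripted. A minimal sketch, assuming only that sources are available as text (the helper name suppressionSites is hypothetical):

```groovy
// Hypothetical audit helper: lists every groovy.taint suppression in a source text,
// tolerating either quote style and optional whitespace inside the parentheses.
List<String> suppressionSites(String fileName, String source) {
    List<String> hits = []
    source.readLines().eachWithIndex { String line, int i ->
        if (line =~ /@SuppressWarnings\(\s*['"]groovy\.taint['"]\s*\)/) {
            hits << "${fileName}:${i + 1}: ${line.trim()}".toString()
        }
    }
    hits
}
```

Each hit is reported as file:line, the same shape a plain grep would give, so the output can feed a review checklist directly.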
Propagation and flow sensitivity
Taint propagates through the expression grammar: assignment, string concatenation and GString
interpolation (see below), arithmetic, and method calls (a value derived from tainted arguments is
tainted unless the callee declares an @Untainted return). The analysis is flow-sensitive: a
variable may be tainted at one program point and untainted at a later one after sanitization —
@Tainted String x = source()
sink(x) // ERROR
x = escapeHtml(x) // x is @Untainted from here on
sink(x) // OK
— but it is not value-dependent: a value’s qualifier never depends on the runtime value of other state. That is the property that keeps the analysis a boolean dataflow pass with no solver. (It is also the natural boundary at which a deeper, external tool would take over; see Forward compatibility.)
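The flow-sensitive, value-independent behavior can be replayed with a toy interpreter. The statement encoding below (maps with kind/target/arg keys) is entirely hypothetical and models only straight-line code; the real checker would work on Groovy's AST:

```groovy
enum Q { UNTAINTED, TAINTED }

// Walks the statements in order, keeping one qualifier per variable at each program point.
List<String> analyze(List<Map> program) {
    Map<String, Q> env = [:]
    List<String> errors = []
    program.eachWithIndex { Map stmt, int i ->
        switch (stmt.kind) {
            case 'source':   env[stmt.target] = Q.TAINTED; break                      // x = source()
            case 'sanitize': env[stmt.target] = Q.UNTAINTED; break                    // x = escapeHtml(x)
            case 'copy':     env[stmt.target] = env[stmt.from] ?: Q.UNTAINTED; break  // y = x
            case 'sink':                                                              // sink(x)
                if ((env[stmt.arg] ?: Q.UNTAINTED) == Q.TAINTED) {
                    errors << "statement ${i + 1}: tainted '${stmt.arg}' reaches sink".toString()
                }
                break
        }
    }
    errors
}

// The four-line example from the text:
List<Map> program = [
    [kind: 'source',   target: 'x'],   // @Tainted String x = source()
    [kind: 'sink',     arg: 'x'],      // sink(x)            -> reported
    [kind: 'sanitize', target: 'x'],   // x = escapeHtml(x)
    [kind: 'sink',     arg: 'x']       // sink(x)            -> accepted
]
assert analyze(program) == ["statement 2: tainted 'x' reaches sink"]
```

The environment holds only one bit per variable and never inspects runtime values, which is what keeps this a plain dataflow pass.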
Pass-through methods and polymorphism
Many methods neither taint nor sanitize; String.trim(), .toLowerCase(), and .substring(int) simply pass
taint through. Annotating these @Untainted would launder taint and hide real flows; annotating them
@Tainted would over-report. A third, polymorphic qualifier, @PolyTainted, expresses "the result’s
taint matches the receiver/argument’s taint":
@PolyTainted String trim(@PolyTainted String self) { ... } // trim(tainted) is tainted; trim(untainted) is untainted
@PolyTainted is what makes the model library for the JDK’s string API usable. It is part of the
design but is presented as a refinement: a first cut can model pass-through methods conservatively
(propagate taint) and add @PolyTainted where over-reporting proves painful.
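Resolving a @PolyTainted call site amounts to joining the qualifiers of the actual arguments bound to polymorphic positions. A hypothetical sketch, treating the receiver as the first parameter as the trim(self) signature above does; the string encoding of declared qualifiers is invented for the example:

```groovy
enum Taint {
    UNTAINTED, TAINTED
    Taint join(Taint other) { ordinal() >= other.ordinal() ? this : other }
}

// declaredReturn / declaredParams use a made-up encoding: 'tainted', 'untainted', 'poly', or 'none'.
Taint resultQualifier(String declaredReturn, List<String> declaredParams, List<Taint> actualArgs) {
    if (declaredReturn == 'tainted') return Taint.TAINTED      // a source
    if (declaredReturn == 'untainted') return Taint.UNTAINTED  // a sanitizer
    // 'poly': the result is as tainted as the most tainted argument in a poly position
    Taint result = Taint.UNTAINTED
    declaredParams.eachWithIndex { String p, int i ->
        if (p == 'poly') result = result.join(actualArgs[i])
    }
    result
}

// trim(self): the receiver is the single poly position
assert resultQualifier('poly', ['poly'], [Taint.TAINTED]) == Taint.TAINTED
assert resultQualifier('poly', ['poly'], [Taint.UNTAINTED]) == Taint.UNTAINTED
```

Non-poly positions (such as an int index) are simply ignored here; whether the real design should do the same is an open choice, not something this GEP specifies.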
Leveraging existing Groovy mechanisms
Groovy already encodes injection safety structurally, before any annotation exists, and the checker
should reward that rather than ignore it. The headline case is GString.
groovy.sql.Sql special-cases GString arguments: a query written as an interpolated string is
compiled to a parameterized PreparedStatement, with the interpolated values bound as parameters
rather than concatenated into SQL. The type of the argument — GString versus String — already
distinguishes the safe path from the dangerous one:
sql.execute("SELECT * FROM users WHERE name = ${name}") // GString -> parameterized -> SAFE even if name is @Tainted
sql.execute('SELECT * FROM users WHERE name = ' + name) // String -> concatenated -> ERROR if name is @Tainted
TaintChecker recognizes the GString-accepting Sql methods as built-in sanitizing sinks: the
interpolated holes become bound parameters, so tainted values inside a GString destined for such a
sink are safe, while the same data flattened into a String (via concatenation or .toString())
and passed to the raw-String overload is reported. This generalizes: a GString carries its
structure (fixed fragments versus interpolated values), and APIs that consume that structure
(parameterized queries, attribute-escaping builders) are inherently safer than their flat-String
equivalents. The checker can be taught these GString-aware APIs out of the box, retrofitting taint
semantics onto idioms Groovy developers already use — no new annotations on user code required.
This is a distinctively Groovy advantage: the language’s structured-string type gives the checker a
ready-made, zero-annotation safety boundary that a String-only language does not have.
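The structure the checker relies on is observable from plain Groovy: a GString keeps its fixed fragments (getStrings()) separate from its interpolated values (getValues()). The sketch below shows that split and a rough placeholder rewrite; the rewrite only approximates what a parameterizing API such as groovy.sql.Sql does and is not its actual implementation:

```groovy
String name = "x' OR '1'='1"    // pretend this arrived from a @Tainted source
GString query = "SELECT * FROM users WHERE name = ${name} LIMIT 1"

// The fixed SQL text and the interpolated hole are stored separately.
assert (query.strings as List) == ['SELECT * FROM users WHERE name = ', ' LIMIT 1']
assert (query.values as List) == [name]

// Approximation of parameterization: fragments joined by '?', holes bound as parameters.
String sqlText = query.strings.join('?')
List<Object> params = query.values as List
assert sqlText == 'SELECT * FROM users WHERE name = ? LIMIT 1'

// Flattening to a String discards the structure: the payload becomes part of the SQL text.
String flat = query.toString()
assert flat == "SELECT * FROM users WHERE name = x' OR '1'='1 LIMIT 1"
```

The tainted payload never appears in sqlText, only in params; that separation is exactly what lets the checker treat GString-accepting sinks as sanitizing.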
Defaulting and the soundness / noise trade-off
The default qualifier is @Untainted. String and character literals are untainted; unannotated
locals, parameters, fields, and returns default untainted; the shipped model library marks known
external-input APIs @Tainted and known dangerous APIs as @Untainted-requiring sinks. The
consequence is explicit and worth stating plainly:
- Optimistic default (proposed): low false-positive rate; a diagnostic fires only for flows the checker has been told about. The cost is potential false negatives: an un-modeled source is treated as untainted and its flows are missed.
- Strict mode (flag): external inputs (and, optionally, all unannotated boundaries) default to @Tainted. Sound, but noisier; appropriate for security-critical code that accepts the annotation burden.
The optimistic default is proposed as the out-of-the-box behavior precisely to protect the checker’s reputation for trustworthiness (see Risks).
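The defaulting rule reduces to a short precedence chain. A hypothetical sketch (the parameter names and the strict flag are illustrative, not a proposed configuration API):

```groovy
enum Qual { UNTAINTED, TAINTED }

// Hypothetical resolution order for the qualifier of a boundary value.
Qual effectiveQualifier(Qual declared, boolean modeledSource, boolean externalInput, boolean strict) {
    if (declared != null) return declared                 // an explicit annotation always wins
    if (modeledSource) return Qual.TAINTED                // listed as a source in the model library
    if (strict && externalInput) return Qual.TAINTED      // strict mode: unmodeled inputs are suspect
    return Qual.UNTAINTED                                 // optimistic default
}

// An external input the model library does not know about:
assert effectiveQualifier(null, false, true, false) == Qual.UNTAINTED   // missed by default
assert effectiveQualifier(null, false, true, true) == Qual.TAINTED      // caught in strict mode
```

The two assertions are the trade-off stated above in miniature: the same un-modeled input is a false negative in the default mode and a reported flow in strict mode.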
Scope and limitations
- Intraprocedural plus same-unit interprocedural: flows are tracked within a method and across calls whose callee is resolvable at compile time (same compilation unit, or modeled in the library). Whole-program, cross-jar taint is out of scope for a type-checker extension.
- Dynamic dispatch: a call whose target cannot be resolved statically yields a loud "flow not tracked" note rather than a silent verdict.
- Explicit flows only (initially): implicit flows (taint induced by branching on tainted data) are not tracked in the first version; they are a well-known source of noise and are deferred to an opt-in mode. Most real injection bugs are explicit flows.
- Collections are coarse: a container is tracked with a single taint bit rather than per element; per-element precision is value-dependent and is exactly the boundary handed off to richer tools.
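The cost of coarse collection tracking is easy to see in miniature. In this hypothetical model a container carries a single taint bit that only ever rises, so an untainted element read back out is reported as tainted (a deliberate, conservative false positive):

```groovy
enum Bit { UNTAINTED, TAINTED }

// One bit per container: the join of everything ever stored in it.
class CoarseList {
    Bit taint = Bit.UNTAINTED

    void add(Bit elementTaint) {
        if (elementTaint == Bit.TAINTED) taint = Bit.TAINTED   // once tainted, stays tainted
    }

    Bit elementAt(int index) { taint }   // the index is ignored: no per-element precision
}

def comments = new CoarseList()
comments.add(Bit.UNTAINTED)   // a trusted literal
comments.add(Bit.TAINTED)     // a request parameter
assert comments.elementAt(0) == Bit.TAINTED   // the trusted literal now reads back as tainted
```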
A worked example — composing with another checker
The value of the checker family is that several run on one compile, each on its own concern:
@TypeChecked(extensions = ['groovy.typecheckers.NullChecker',
                           'groovy.typecheckers.TaintChecker'])
class Comments {
    @Tainted String body(HttpServletRequest req) { req.getParameter('body') }

    void post(Sql sql, Writer page, @NonNull HttpServletRequest req) {
        @Tainted String raw = body(req)
        @Untainted String safe = escapeHtml(raw)
        sql.execute("INSERT INTO comments(text) VALUES (${safe})")  // GString sink: parameterized, OK
        page.write(safe)                                             // untainted to an HTML sink: OK
    }

    @Untainted String escapeHtml(@Tainted String s) { /* ... */ }
}
NullChecker verifies the request is non-null; TaintChecker verifies no tainted comment reaches the database or the page. Distinct concerns, distinct error channels, one compile.
Implementation
TaintChecker is an org.codehaus.groovy.transform.stc.TypeCheckingExtension (the same SPI as the
other checkers). During type checking it:
- assigns each expression a taint qualifier, propagating along the dataflow rules above;
- reads @Tainted/@Untainted/@PolyTainted on declarations, and the shipped model library, to fix qualifiers at boundaries;
- checks assignments to @Untainted targets and arguments at @Untainted parameters, reporting any flow that delivers tainted data;
- recognizes the GString-aware safe sinks (e.g. groovy.sql.Sql) as built-in sanitizers.
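The four steps can be illustrated with a toy checker over a hand-built expression tree. Everything here is hypothetical: the node classes stand in for Groovy's AST, the maps stand in for annotations and the model library, and a real extension would hook the type-checking callbacks instead of walking a tree itself:

```groovy
// Hypothetical expression nodes (stand-ins, not Groovy's real AST classes).
abstract class Expr {}
class Lit extends Expr { String text }
class Var extends Expr { String name }
class Concat extends Expr { Expr left, right }
class GStr extends Expr { List<Expr> holes }
class Call extends Expr { String method; List<Expr> args }

enum T {
    UNTAINTED, TAINTED
    T join(T other) { ordinal() >= other.ordinal() ? this : other }
}

class ToyTaintChecker {
    Map<String, String> returns = [:]         // step 2: 'tainted' | 'untainted'; absent = pass-through
    Map<String, List<String>> params = [:]    // step 2: a per-parameter 'untainted' marks a sink
    Set<String> gstringSafeSinks = [] as Set  // step 4: sinks that parameterize GString holes
    Map<String, T> locals = [:]
    List<String> errors = []

    void assign(String name, Expr value) { locals[name] = qualifierOf(value) }

    // Step 1: compute a qualifier for every expression, bottom-up.
    T qualifierOf(Expr e) {
        switch (e) {
            case Lit:    return T.UNTAINTED
            case Var:    return locals[e.name] ?: T.UNTAINTED
            case Concat: return qualifierOf(e.left).join(qualifierOf(e.right))
            case GStr:   return e.holes.inject(T.UNTAINTED) { T acc, Expr h -> acc.join(qualifierOf(h)) }
            case Call:
                List<T> argQs = e.args.collect { Expr a -> qualifierOf(a) }
                checkCall(e, argQs)
                if (returns[e.method] == 'tainted') return T.TAINTED
                if (returns[e.method] == 'untainted') return T.UNTAINTED
                return argQs.inject(T.UNTAINTED) { T acc, T q -> acc.join(q) }   // conservative pass-through
        }
        throw new IllegalArgumentException("unknown node ${e}")
    }

    // Step 3: tainted data at an @Untainted parameter is an error, except at step 4's GString sinks.
    void checkCall(Call call, List<T> argQs) {
        if (call.method in gstringSafeSinks && call.args && call.args[0] instanceof GStr) return
        List<String> declared = params[call.method] ?: []
        argQs.eachWithIndex { T q, int i ->
            if (i < declared.size() && declared[i] == 'untainted' && q == T.TAINTED) {
                errors << "Tainted value reaches the untainted parameter ${i} of ${call.method}".toString()
            }
        }
    }
}

def checker = new ToyTaintChecker(
        returns: [userName: 'tainted', escapeHtml: 'untainted'],
        params: [execute: ['untainted'], write: ['untainted']],
        gstringSafeSinks: ['execute'] as Set)

checker.assign('name', new Call(method: 'userName', args: []))
// execute('... = \'' + name + '\'')  -> reported
checker.qualifierOf(new Call(method: 'execute', args: [new Concat(left: new Lit(text: "name = '"), right: new Var(name: 'name'))]))
// execute("... = ${name}")           -> accepted (parameterized)
checker.qualifierOf(new Call(method: 'execute', args: [new GStr(holes: [new Var(name: 'name')])]))
// write(escapeHtml(name))            -> accepted (sanitized)
checker.qualifierOf(new Call(method: 'write', args: [new Call(method: 'escapeHtml', args: [new Var(name: 'name')])]))

assert checker.errors == ['Tainted value reaches the untainted parameter 0 of execute']
```

Unknown callees default to joining their arguments, the conservative pass-through treatment the text allows before @PolyTainted is modeled.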
The non-trivial deliverable is the model library: a curated set of qualifier annotations (or external
stub files) for the JDK and common Groovy/Java libraries. It covers sources (HttpServletRequest,
System.getenv, Files.readString, Scanner, deserialization), sinks (Statement,
Runtime.exec/ProcessBuilder, file-path construction via File and Path, response writers, MarkupBuilder's
unescaped-output sinks), sanitizers, and @PolyTainted pass-through string methods. The accuracy and
currency of this library, not the propagation engine, is the bulk of the work, and the bulk of the
ongoing maintenance.
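As a rough picture of the data involved, the model library boils down to a lookup from a method to its role. The entry format below is hypothetical (this GEP leaves the choice between annotations and stub files open); the listed JDK and Servlet methods are real APIs, though the table is far from complete:

```groovy
enum Role { SOURCE, SINK, SANITIZER, PASS_THROUGH }

// Hypothetical entry format: "owner#method" -> role. Constructors use the JVM name <init>.
Map<String, Role> model = [
    'javax.servlet.http.HttpServletRequest#getParameter': Role.SOURCE,
    'java.lang.System#getenv'                           : Role.SOURCE,
    'java.nio.file.Files#readString'                    : Role.SOURCE,
    'java.sql.Statement#executeQuery'                   : Role.SINK,
    'java.lang.Runtime#exec'                            : Role.SINK,
    'java.lang.ProcessBuilder#<init>'                   : Role.SINK,
    'java.lang.String#trim'                             : Role.PASS_THROUGH
]

// A miss means "not modeled": under the optimistic default the value stays untainted.
Role roleOf(Map<String, Role> model, String owner, String method) {
    model["${owner}#${method}".toString()]
}

assert roleOf(model, 'java.lang.System', 'getenv') == Role.SOURCE
assert roleOf(model, 'com.example.Legacy', 'readInput') == null   // an un-modeled source is missed
```

The null result for the un-modeled method is the concrete form of the maintenance risk described here: every gap in this table is a potential false negative.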
Prior art
| Tool | Approach | Relationship to this proposal |
|---|---|---|
| Checker Framework: Tainting Checker (Java) | Pluggable type qualifiers | Closest analogue; this proposal adapts the same two-point model to a Groovy type-checking extension |
| Ballerina taint analysis | Built-in | Demonstrates a mainstream language shipping taint checking as a first-class, low-ceremony feature |
| FindSecBugs / SpotBugs, SonarQube | Bytecode/AST heuristics, separate tool | Post-hoc and tool-external; this proposal is in-compiler and annotation-directed |
| CodeQL, Semgrep | Whole-program / query-based dataflow | Deeper and cross-jar, but heavyweight and out-of-band; complementary, not a substitute |
| OWASP ESAPI, encoder libraries | Runtime sanitizers | The sanitizers this checker’s annotations would mark; the checker enforces that they are called |
Forward compatibility
The @Untainted / @Tainted qualifiers are, respectively, the bottom and top of a security lattice. This
keeps the vocabulary deliberately open-ended:
- Richer information-flow tools: an external analysis offering multi-level lattices, value-dependent labels, or proof-backed declassification (for example an SMT-backed verifier such as groovy-verify) can consume the same @Tainted/@Untainted annotations as the extremes of a finer ordering. Code annotated for TaintChecker therefore remains meaningful to a deeper analysis with no re-annotation, and the two compose cleanly: the cheap checker covers all opted-in code, and the deeper tool adds precision and proof where it reaches. This GEP does not depend on any such tool and does not specify its internals; it simply avoids closing the door on them.
- AI-assisted security: as automated vulnerability detection and repair mature, the annotation layer is the natural integration point. A tool (or model) proposes source/sink/sanitizer annotations, and the compiler turns those proposals into checked guarantees rather than unverified suggestions. Shipping the substrate positions Groovy to gain security posture from that ecosystem as it develops, and to give AI-generated Groovy a compile-time injection backstop.
Excluded and deferred features
| Feature | Status | Rationale |
|---|---|---|
| Implicit (control-flow) taint | Deferred (opt-in) | A known source of false positives; most injection bugs are explicit flows |
| Per-element collection taint | Deferred | Value-dependent; the boundary at which a richer external tool takes over |
| Whole-program / cross-jar tracking | Not planned | Out of scope for a type-checker extension; CodeQL/Semgrep territory |
| Configurable multi-level lattices | Not planned (here) | A two-point checker needs none; multi-level is left to external tools sharing the vocabulary |
| @PolyTainted | Refinement | Part of the design; a first cut may model pass-through methods conservatively and add it incrementally |
| Strict (sound-by-default) mode | Behind a flag | Available for security-critical code that accepts the annotation burden |
Compatibility
Backwards compatibility
The feature is entirely opt-in and additive. No existing program changes behavior:
- The checker runs only where @TypeChecked(extensions = ['groovy.typecheckers.TaintChecker']) is applied; absent that, nothing changes.
- @Tainted/@Untainted are RUNTIME-retained marker annotations with no effect on code that does not enable the checker; they are ignored like any unrecognized annotation.
- The annotations and the checker ship in the existing groovy-typecheckers module; core Groovy semantics are untouched.
Interaction with other checkers
TaintChecker composes with the other extensions on a single @TypeChecked declaration, each
reporting on its own concern through its own diagnostics, as in the worked example above.
Risks, maintenance, and status
This GEP is deliberately frank about why it is a placeholder rather than a committed deliverable.
- Taint checkers carry a false-positive / maintenance reputation. Tools in this space are routinely uninstalled because they are too noisy, or quietly distrusted because they are too lax. The proposal leans hard on the low-noise default, GString-awareness, explicit greppable suppression, and loud skips precisely to earn and keep developer trust, but the reputational risk is real and must be managed by conservative defaults and good diagnostics, not wished away.
- The model library is an ongoing commitment. The propagation engine is modest; the durable cost is curating and keeping current the source/sink/sanitizer annotations for the JDK and the common Groovy/Java ecosystem. This is the part the Groovy team would need to be comfortable owning (or governing as a community-maintained, versioned artifact) before the feature should ship. An out-of-date or incomplete model library is itself a source of both false negatives and false positives.
- Soundness is a chosen posture, not a guarantee. The optimistic default is explicitly unsound (it can miss un-modeled sources) in exchange for trustworthiness. Users who need soundness opt into strict mode and its noise. The GEP does not claim to prove the absence of injection, only to catch the flows it has been told to look for.
- Scope is bounded and honest. Dynamic dispatch, cross-jar flows, implicit flows, and per-element precision are out of scope or deferred; the checker says so rather than pretending otherwise.
Accordingly the status is Draft (placeholder). The near-term purpose is to fix the annotation
vocabulary (@Tainted / @Untainted, and the role-by-position convention) so Groovy 6.x can align to
it, to gather feedback, and to find out whether there is both community appetite and committed
maintenance for the model library. Implementation in Groovy 7.0 is conditional on that.