How to Build a Funnel Classification System Worth Trusting – NoirQuartz

[gtm] [funnel-architecture] [ga4] [classification] [measurement] [signal-quality]

NoirQuartz 2026 12 min read

Most funnel classification systems are built on guesswork that looks like data. A marketer picks three page views for MOF and two sessions for BOF because those numbers felt reasonable. The system runs, produces classifications, sends emails, feeds ad platforms, gates experiments. Everything appears operational. The classifications are fiction. Every downstream decision inherits that fiction.

The GTM funnel architecture described in the preceding article is only as good as the behavioral criteria it classifies against. Build that architecture on guessed thresholds and you have a sophisticated system producing confident-looking outputs that don’t correspond to actual customer intent. The confidence is the problem. A system that looks broken gets fixed. A system that looks operational while producing wrong classifications runs indefinitely, compounding bad decisions across every channel it feeds.

This article is about how to build the classification criteria correctly — from your own customer behavior data, with documented thresholds, and a maintenance protocol that keeps the system accurate as your business changes.

One framing point before the build: a funnel classification system is not a reporting layer. It is a routing layer for intervention. Its purpose is not to describe users accurately in a dashboard — it is to change what you do to a user based on where they are in their decision journey. The standard for whether the system is working is not whether the stage distribution looks coherent in GA4. It is whether BOF users convert at materially higher rates when you act on their classification, whether stage-based experiments isolate distinct behavioral populations, whether CRM sequences produce lift rather than nuisance. A system that classifies users correctly but doesn’t change any intervention is expensive instrumentation with no return. Build it to route, not to report.

// 00 The threshold problem

The dangerous property of a decaying classification system is that it keeps producing outputs that look credible. GTM still fires. GA4 still receives events. VWO still runs experiments. Email sequences still send. There is no error flag. No dashboard turns red. The system fails quietly and every downstream decision fails with it.

The failure mode has a specific shape. A marketer defines BOF as: two or more sessions, one product page view, one pricing page view. Those criteria made sense when they were set — they described the behavioral pattern of near-buyers at that point in time, for that product, with that traffic mix. Eighteen months later the product has a new pricing page, a new traffic source is sending users who browse differently, and the competitive landscape has shifted consideration from 14 days to 28 days. The criteria haven’t changed. The behavior they were calibrated against has.

Users who were genuinely MOF are now being classified as BOF. Your BOF email sequence is firing at people who aren’t ready. Your VWO BOF experiment is running on a diluted population. Your Meta CAPI is sending BOF-tagged events for users whose intent doesn’t match the label. Every platform downstream receives a confident signal that is confidently wrong.

The only protection against this is building the classification criteria from empirical behavioral data in the first place — and recalibrating them against new data on a defined schedule. Intuition as the starting point produces a system that is wrong from day one. Data as the starting point produces a system that is approximately right from day one and becomes more accurate over time.

// 01 Phase 0 — Instrument before you classify

You cannot identify meaningful behavioral thresholds from data that doesn’t exist yet. The correct sequencing is clean instrumentation first, data accumulation second, threshold identification third, classification deployment fourth. Most implementations skip the first two phases entirely and deploy classification logic against an empty or unreliable data foundation.

The minimum viable event set that needs to be firing cleanly in GTM before any classification logic is built:

Event 01 Page view with content type parameter

Every page view needs a content_type parameter attached — homepage, category, product, pricing, contact, blog, case-study. Without this, GA4 sees page views but cannot distinguish a product page view from a homepage view in the same session. Your depth signal analysis becomes impossible because all page views look identical.

Set this as a GTM variable that reads the page URL and returns a content type classification. Verify in GTM Preview that every page type on your site is returning the correct value before proceeding.
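As a rough sketch of the mapping logic, the following Python function classifies a URL into a content type. In GTM itself this would live in a Custom JavaScript variable or a RegEx Table variable; Python is used here only to show the logic. The path patterns and the `other` fallback are assumptions for illustration, not a real site structure.

```python
import re
from urllib.parse import urlparse

# Hypothetical path patterns; replace with your site's real URL structure.
# Order matters: the first matching rule wins.
CONTENT_TYPE_RULES = [
    (r"^/$", "homepage"),
    (r"^/pricing(/|$)", "pricing"),
    (r"^/contact(/|$)", "contact"),
    (r"^/blog(/|$)", "blog"),
    (r"^/case-studies(/|$)", "case-study"),
    (r"^/products/[^/]+", "product"),
    (r"^/products/?$", "category"),
]


def content_type(url: str) -> str:
    """Return the content_type label for a page URL, or 'other' if no rule matches."""
    path = urlparse(url).path or "/"
    for pattern, label in CONTENT_TYPE_RULES:
        if re.search(pattern, path):
            return label
    return "other"
```

Returning an explicit `other` value rather than an empty string makes unmapped page types easy to spot during the GTM Preview check.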

Event 02 Scroll depth on high-intent pages

25%, 50%, 75%, and 90% scroll depth thresholds on your pricing page, contact page, and any long-form case study or service pages. A user who scrolled 90% of your pricing page is behaviorally different from one who scrolled 25% and left. GA4’s default engagement metrics do not capture this distinction.

Use GTM’s built-in Scroll Depth trigger. Confirm the events are reaching GA4 with the correct page path parameter attached so you can filter scroll depth by page type in Explorations.

Event 03 Session number variable

A GTM variable that reads from a first-party cookie tracking how many sessions this user has had on your site. GTM has no native session counter, so you need a custom JavaScript variable that reads and increments a cookie on each session start. This is your frequency signal foundation. GA4 does attach its own ga_session_number parameter to events, but the classification logic runs in GTM and needs a session count it can read in the browser at the moment a tag fires.

Verify the counter is persisting correctly across sessions in GTM Preview by visiting the site in two separate browser sessions and confirming the variable increments.

Event 04 Engagement time on key pages

Time-on-page for your pricing page, contact page, and primary service or product pages, fired as a custom event at 30-second intervals or at exit. GA4’s default engagement time is session-level, not page-level. A user who spent 4 minutes on your pricing page shows different intent from one who spent 18 seconds, and the default metric obscures that difference.

Event 05 Micro-conversion events

Every named action that indicates commitment short of your primary conversion. Contact form start, calculator interaction, pricing enquiry click, brochure download, video play on a product page, chat initiation, wishlist addition, cart addition. Define each one explicitly. Fire a distinct named event for each. These will become your strongest BOF signal candidates and they need clean, individual event names — not a generic click event — to be analytically useful.

// minimum run time before touching a threshold

Eight weeks minimum from clean instrumentation deployment before any threshold identification work begins. Less than eight weeks and seasonal variation, campaign bursts, and sample size limitations will bias the data. A one-week promotional campaign in week three will inflate product page view counts and produce artificially high frequency signals that don’t represent steady-state customer behavior. Let the data accumulate across varied conditions before drawing conclusions from it.

// 02 Phase 1 — Category calibration

Before identifying which signals map to which stages, you need to establish the baseline parameters of your specific customer journey. These parameters constrain everything that follows — they tell you how long your consideration window is, how many touchpoints you have to classify intent from, and where the natural friction points in your funnel sit.

Four questions. Each answered from your own GA4 data, not from category benchmarks.

// GA4 · Explorations · Funnel Exploration

Question 1: What is the median days between first session and first conversion?

Build a funnel exploration in GA4 with first session as step one and your primary conversion event as the final step. Set the completion window to 90 days. Export the time-to-completion distribution. The median is your consideration window. This number calibrates your recency decay logic — stage classifications should expire at approximately 1.5x this value for users who haven’t progressed or returned.
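The arithmetic behind this question is small enough to sketch. The example below assumes you have exported one days-to-conversion value per converting user; the function names and the sample values are hypothetical.

```python
from statistics import median


def consideration_window(days_to_convert: list[float]) -> float:
    """Median days between first session and first conversion."""
    if not days_to_convert:
        raise ValueError("need at least one converting user")
    return median(days_to_convert)


def stage_expiry_days(window_days: float, multiplier: float = 1.5) -> float:
    """Days of inactivity after which a stage classification expires."""
    return window_days * multiplier


# Hypothetical export: one value per converting user.
sample = [1, 3, 9, 14, 14, 20, 41]
window = consideration_window(sample)  # 14
expiry = stage_expiry_days(window)     # 21.0
```

The median is used rather than the mean because a few very slow converters (the 41-day user in the sample) would otherwise drag the window outward.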

// GA4 · Reports · Acquisition · User Acquisition

Question 2: What is the average session count before conversion?

Pull the sessions per user metric filtered to converting users over the prior 90 days. The distribution (what percentage converted in 1 session, 2 sessions, 3-5 sessions, 6+ sessions) tells you how many touchpoints you have to accumulate behavioral signals from before conversion typically occurs. If 60% of your conversions happen in session one, your classification system needs to work from limited within-session signals. If 70% convert in session three or later, you have rich cross-session behavioral history to work with.
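A minimal sketch of the bucketing, assuming an exported list with one sessions-before-conversion count per converting user. The bucket edges here are non-overlapping (1, 2, 3-5, 6+) and the function names are hypothetical.

```python
from collections import Counter

BUCKETS = ("1", "2", "3-5", "6+")


def session_bucket(n: int) -> str:
    """Map a session count to a non-overlapping bucket label."""
    if n <= 1:
        return "1"
    if n == 2:
        return "2"
    if n <= 5:
        return "3-5"
    return "6+"


def session_distribution(sessions_before_conversion: list[int]) -> dict[str, float]:
    """Share of converting users in each session-count bucket."""
    if not sessions_before_conversion:
        raise ValueError("need at least one converting user")
    counts = Counter(session_bucket(n) for n in sessions_before_conversion)
    total = len(sessions_before_conversion)
    return {bucket: counts.get(bucket, 0) / total for bucket in BUCKETS}
```

Reading the result is the point: a large share in the "1" bucket means classification has to work from within-session signals, while weight in "3-5" and "6+" means cross-session history is available.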

// GA4 · Explorations · Funnel Exploration

Question 3: Where is the primary friction point?

Build a step-by-step funnel exploration using your content type categories as steps: homepage to category to product to pricing to contact or checkout. The step with the highest drop-off rate is your primary friction point. BOF sits at this step: users who have reached it but not yet converted. This is where near-buyer intent concentrates and where intervention has the highest leverage.

// GA4 · Explorations · Path Exploration

Question 4: What does the path to conversion actually look like?

Build a path exploration starting from first session event and ending at your conversion event, filtered to users who converted within your median consideration window. This shows you the actual sequence of content types and events that converting users moved through — not the sequence you assumed they moved through. The divergence between the assumed path and the observed path is often significant and invalidates classification criteria built on the assumption.

// 03 Phase 2 — Signal identification

With category parameters established, the next step is identifying which specific behavioral signals on your site are genuinely predictive of stage progression — not which signals feel intuitively meaningful, but which signals empirically appear in the behavioral history of users who actually progressed and converted.

The signal categories below apply across most businesses. Within each category, the specific threshold will vary by site.

Depth signals
What it measures: How far into your content or product architecture the user went. Surface pages vs high-intent pages.
Stage relevance: TOF: homepage only. MOF: category + product. BOF: pricing + contact + micro-conversion pages.
Predictive strength: HIGH. Page type visited is the single most reliable intent proxy available from site behavior.

Frequency signals
What it measures: How many times the user has returned. Single session vs multi-session behavior.
Stage relevance: TOF: session 1. MOF: sessions 2-3. BOF: session 3+ with depth signals present.
Predictive strength: HIGH. Return visits indicate active consideration. Must be combined with depth signals to avoid classifying low-intent return visitors as MOF.

Micro-conversion signals
What it measures: Actions that indicate commitment without primary conversion. Form starts, calculator use, content downloads, cart additions.
Stage relevance: Strong BOF anchor. A micro-conversion action is behaviorally distinct from passive browsing regardless of session count or page depth.
Predictive strength: HIGH. The strongest intent signal available. Should anchor BOF classification for most businesses when present.

Recency signals
What it measures: How recently the user engaged. Days since last session relative to your consideration window.
Stage relevance: Applies as a modifier across all stages. A user who was BOF three weeks ago and hasn’t returned may have cooled toward MOF.
Predictive strength: MEDIUM. Powerful as a decay modifier. Weak as a standalone classifier. Always combine with behavioral signals.

Engagement time signals
What it measures: Time spent on high-intent pages. Distinguishes genuine reading from a quick scroll and exit.
Stage relevance: MOF to BOF discriminator. A user who spent 4+ minutes on your pricing page is further along than one who visited it for 15 seconds.
Predictive strength: MEDIUM. Useful as a secondary signal to tighten stage boundaries. Requires clean page-level time tracking to be reliable.

Negative signals
What it measures: Behavior indicating exit from consideration. Extended inactivity after high engagement. Repeated visits to the same page without progression.
Stage relevance: Downgrade triggers. A user who was MOF but has shown no progression in 2x their typical session interval should be reclassified.
Predictive strength: CONTEXTUAL. Essential for preventing classification inflation. Without downgrade logic, BOF populations grow indefinitely regardless of actual intent.

The signal validation test. Before any signal anchors a stage classification criterion, it needs to pass a discrimination check — not just a prevalence check. This distinction matters because a signal can appear in 85% of converting user paths and still be useless as a classifier if it also appears in 80% of non-converting paths. High prevalence among converters is not the same as high discrimination between converters and non-converters.

Run the path exploration twice: once filtered to converting users, once filtered to users who visited multiple times but did not convert within your consideration window. For each signal candidate, calculate the lift ratio — how much more common is the signal among converters than non-converters. A pricing page visit that appears in 80% of converting paths and 20% of non-converting paths has a lift ratio of 4x. A pricing page visit that appears in 80% of converting paths and 70% of non-converting paths has a lift ratio of 1.1x and is nearly worthless as a classifier despite its high converter prevalence.
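The lift ratio calculation can be sketched as follows. It assumes each exported path has been reduced to the set of signal names that appeared in it; the function names and that data shape are assumptions for illustration.

```python
def signal_prevalence(paths: list[set[str]], signal: str) -> float:
    """Share of user paths in which the signal appears at least once."""
    if not paths:
        raise ValueError("need at least one path")
    return sum(signal in path for path in paths) / len(paths)


def lift_ratio(converter_rate: float, non_converter_rate: float) -> float:
    """How many times more common a signal is among converters."""
    if not (0 <= converter_rate <= 1 and 0 <= non_converter_rate <= 1):
        raise ValueError("rates must be between 0 and 1")
    if non_converter_rate == 0:
        # Signal never appears among non-converters; ratio is unbounded.
        return float("inf")
    return converter_rate / non_converter_rate


# The two pricing-page cases from the text above.
strong = lift_ratio(0.80, 0.20)  # 4.0
weak = lift_ratio(0.80, 0.70)    # about 1.14
```

An infinite ratio on a small sample usually means too few non-converting paths contained the signal, so treat it as a prompt to check sample size rather than as a perfect discriminator.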

The thresholds that follow are lift-ratio based, not prevalence based:

// signal discrimination: converter vs non-converter frequency

Example: converters 80%, non-converters 20%, lift ratio 4.0x. Strong discriminator, primary criterion.

3x+

Signal is 3x or more common among converters than non-converters. Strong discriminatory power. Use as a primary classification criterion. A signal at this lift ratio is genuinely separating intent levels, not just describing general site engagement.

1.5–3x

Signal is 1.5x to 3x more common among converters. Moderate discrimination. Use as a secondary criterion in combination with a stronger primary signal. Alone it will produce too many false positives — users classified as MOF or BOF who have no genuine near-buyer intent.

<1.5x

Signal is less than 1.5x more common among converters than non-converters. Weak discrimination regardless of how often it appears in converting paths. Do not use as a classification criterion. It will classify a large proportion of non-converting users as high-intent and dilute your stage populations.
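The three tiers reduce to a small lookup. The sketch below treats exactly 3x as primary and exactly 1.5x as secondary, which is one reading of the boundaries; the tier labels are hypothetical names.

```python
def discrimination_tier(lift: float) -> str:
    """Map a lift ratio to the role the signal may play in classification."""
    if lift >= 3.0:
        return "primary"    # strong discriminator
    if lift >= 1.5:
        return "secondary"  # only in combination with a primary signal
    return "reject"         # too weak to use as a criterion
```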

// segment before you generalise

A signal’s lift ratio is not universal across your traffic. A pricing page visit from a branded search user and a pricing page visit from a broad Meta campaign user are not the same behavioral signal — the intent at landing is structurally different. When your traffic mix contains meaningfully different acquisition sources, calculate lift ratios by source segment before setting global thresholds. A global threshold that averages across high-intent and low-intent traffic sources will underclassify the former and overclassify the latter simultaneously. Traffic source segmentation is the minimum conditioning that produces defensible thresholds for most sites.
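A sketch of the per-segment calculation, assuming one record per user of the form (source, converted, has_signal). The record shape, function name, and source labels are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional


def lift_by_segment(
    users: Iterable[tuple[str, bool, bool]],
) -> dict[str, Optional[float]]:
    """Lift ratio of one signal per acquisition source.

    Returns None for a source where the ratio cannot be computed
    (no converters, no non-converters, or the signal never appears
    among non-converters).
    """
    # Per source: [users_with_signal, total_users] for each outcome group.
    tallies = defaultdict(lambda: {"conv": [0, 0], "non": [0, 0]})
    for source, converted, has_signal in users:
        group = tallies[source]["conv" if converted else "non"]
        group[0] += int(has_signal)
        group[1] += 1

    result: dict[str, Optional[float]] = {}
    for source, tally in tallies.items():
        (c_sig, c_tot), (n_sig, n_tot) = tally["conv"], tally["non"]
        if c_tot == 0 or n_tot == 0 or n_sig == 0:
            result[source] = None
            continue
        result[source] = (c_sig / c_tot) / (n_sig / n_tot)
    return result
```

Averaging can hide a lot. If one source shows 80% vs 60% (lift about 1.33) and another shows 80% vs 10% (lift 8.0), pooling the two can produce a ratio near 3x that describes neither segment.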

// 04 Phase 3 — Setting thresholds with documented rationale

Once signals are identified and validated through converter vs non-converter discrimination analysis, thresholds are set from the distributional data — not from intuition, not from round numbers, not from what someone else’s framework suggests.

The methodology: for each signal that passed the lift ratio test, pull the distribution of that signal across your converting user population. If pricing page visits are a high-discrimination BOF signal, pull the distribution of pricing page visit counts among converters. If 85% of converters visited the pricing page exactly once before converting and 15% visited twice or more, your BOF threshold for that signal is one visit — not two, because two would exclude 85% of your actual near-buyers.
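One way to read a threshold off the distribution is to ask what share of converters each candidate value would still include. The sketch below assumes a list with one signal count per converting user; the `min_coverage` target is an assumption, not a value the method prescribes.

```python
def coverage(counts: list[int], threshold: int) -> float:
    """Share of converters whose signal count meets the threshold."""
    if not counts:
        raise ValueError("need at least one converting user")
    return sum(c >= threshold for c in counts) / len(counts)


def highest_threshold_with_coverage(counts: list[int], min_coverage: float) -> int:
    """Largest threshold that still includes min_coverage of converters.

    Returns 0 if even a threshold of 1 falls below the target.
    """
    best = 0
    for k in range(1, max(counts) + 1):
        if coverage(counts, k) >= min_coverage:
            best = k
        else:
            break  # coverage never increases as the threshold rises
    return best


# The pricing-page example: 85% visited once, 15% visited twice.
pricing_visits = [1] * 85 + [2] * 15
```

With this sample, a threshold of one visit covers every converter and a threshold of two covers only 15%, which is the exclusion the paragraph above warns about.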

One thing to hold clearly while setting thresholds: the rule set is a practical proxy for probabilistic intent, not a literal map of buyer psychology. A user who meets your BOF criteria has a higher conditional probability of converting than one who doesn’t. They are not guaranteed to be a near-buyer. The classification is a bet, not a fact. Design the thresholds accordingly — and understand that every threshold is a choice about error profile, not just a choice about accuracy.

Every threshold that enters your GTM classification variable needs three pieces of documentation before it’s deployed:

// threshold documentation requirement

Source: which GA4 report or exploration produced the data this threshold is based on.
Date range: the date range of the data used. Thresholds derived from data that is more than 6 months old should be flagged for recalibration.
Metric value: the specific distributional data point that justified this threshold. Not “pricing page visits are important” — “82% of converting users visited the pricing page at least once in the 14 days before conversion, based on path exploration data from January–March 2026.”
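The three documentation fields map naturally onto a small record type. This is a sketch under assumptions: the class and field names are hypothetical, age is measured from the end of the date range, and six months is approximated as 183 days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ThresholdRecord:
    signal: str        # e.g. "pricing_page_view"
    threshold: str     # the rule as deployed in GTM
    source: str        # GA4 report or exploration that produced the data
    data_start: date   # start of the date range used
    data_end: date     # end of the date range used
    metric_value: str  # the specific data point that justified the rule

    def needs_recalibration(self, today: date, max_age_days: int = 183) -> bool:
        """True when the underlying data is older than roughly six months."""
        return (today - self.data_end).days > max_age_days


record = ThresholdRecord(
    signal="pricing_page_view",
    threshold="at least 1 view in the 14 days before conversion",
    source="GA4 path exploration",
    data_start=date(2026, 1, 1),
    data_end=date(2026, 3, 31),
    metric_value="82% of converting users visited the pricing page at least once",
)
```

Keeping these records next to the GTM container (in version notes or a shared sheet) gives the later recalibration a concrete baseline to verify against.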

This documentation is not administrative overhead. It is the mechanism that makes future recalibration tractable. When you return to the system in six months to check whether the thresholds still hold, you need to know exactly what data justified each one and where to look to verify it. Without documentation, recalibration becomes a rebuild from scratch rather than a targeted verification.

Threshold design is a tradeoff between two error types, not a search for a single correct number. Tight BOF thresholds — requiring multiple high-intent signals before classification — reduce false positives: fewer users are misclassified as near-buyers when they aren’t. But they increase false negatives: some genuine near-buyers who met most but not all criteria are excluded from your BOF population and never receive a BOF intervention. Loose BOF thresholds catch more genuine near-buyers but dilute your BOF population with lower-intent users, degrading experiment validity and email sequence lift. The right threshold is the one whose error profile fits your intervention. If your BOF intervention is aggressive — direct sales outreach, high-pressure offer — optimise for low false positives: tighter thresholds, cleaner population. If your BOF intervention is lightweight — a gentle reminder email — you can tolerate more false positives and optimise for low false negatives: looser thresholds, higher coverage.

// threshold tradeoff: loose vs tight error profile

Loose thresholds: false positives HIGH (wrong users in BOF), false negatives LOW (real buyers missed). Recommended for lightweight interventions such as email nudges and soft retargeting: high coverage, accept some noise in BOF population.

// you are calibrating on observable behavior, not total behavior

The GA4 path exploration data your thresholds are derived from is drawn from a specific subset of your actual customer population: users who consented to tracking, on the device and browser where your GTM classification cookie persisted, without Safari ITP clearing their session history, within a single observable consideration window. The consent-denied cohort, cross-device users whose sessions don't stitch, Safari users whose localStorage was cleared, and users who converted through channels that left no on-site behavioral trace — none of these appear in your path exploration data. Your thresholds describe near-buyers as observed through your measurement system. They do not describe near-buyers as they actually exist across your full customer base. That gap doesn't invalidate the system. It means the thresholds are historically contingent proxies, not ground truth, and should be held with appropriate epistemic humility when making capital allocation decisions based on them.

Recency decay logic

Every stage classification needs an expiry. A user classified as BOF in January who has not returned by March is not still BOF. Their consideration has either converted elsewhere, been abandoned, or cooled significantly. Treating them as BOF in April means your BOF email sequence and your VWO experiment population contain a significant proportion of users whose intent evaporated months ago.

Set stage expiry at 1.5x your empirical consideration window from Phase 1. If your median days-to-conversion is 14 days, BOF classifications expire after 21 days of no return visit. If your window is 45 days, expiry is at 67.5 days, or 68 in practice. The 1.5x multiplier provides a buffer for the tail of your consideration window distribution (users who take longer than median) without holding classifications open so long they become meaningless.

Expiry doesn't mean the user is forgotten. It means their stage classification resets to the last stage their current behavioral signals support. A BOF user who expires and still has MOF-level signals in their history reclassifies to MOF, not TOF. Only users with no recent behavioral history beyond a single initial session expire fully back to TOF.
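The expiry and reset rule can be sketched as a single function. It assumes you can separately compute `supported_stage`, the stage the user's remaining behavioral signals still justify; that input, the function name, and the choice that expiry never promotes a user are assumptions made for this example.

```python
STAGES = ("TOF", "MOF", "BOF")  # ordered from lowest to highest intent


def apply_decay(
    stage: str,
    days_since_last_visit: float,
    window_days: float,
    supported_stage: str,
    multiplier: float = 1.5,
) -> str:
    """Return the stage after applying recency expiry.

    A classification is kept up to and including the expiry day. After
    that it falls back to the stage the remaining signals support, and
    never moves upward as a result of expiry.
    """
    if stage not in STAGES or supported_stage not in STAGES:
        raise ValueError("unknown stage")
    if days_since_last_visit <= window_days * multiplier:
        return stage
    return min(stage, supported_stage, key=STAGES.index)
```

For a 14-day window, `apply_decay("BOF", 22, 14, "MOF")` falls back to MOF, while the same user with only a single initial session in their history (`supported_stage="TOF"`) drops to TOF.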

// stage classification over time: decay and reset

Timeline: Day 0, BOF set. Day 21, expiry. Day 35, reset.

Stage BOF: user classified as near-buyer. BOF sequences active. Experiment eligible.

// 05 The maintenance protocol

A classification system without a maintenance protocol is a depreciating asset. It starts at approximately the right calibration and drifts toward wrong over time at a rate determined by how fast your business, traffic mix, and customer journey are changing.

Two distinct trigger types require different responses and shouldn't be conflated.

Scheduled recalibration — quarterly

Time-based. Runs regardless of whether anything appears wrong. The purpose is to catch drift before it surfaces as visible performance degradation — the early stage of decay is silent, and by the time it shows in conversion rates the system has been producing wrong classifications for weeks or months.

The quarterly recalibration re-runs the Phase 2 signal identification analysis on the most recent 8-12 weeks of data and compares the resulting thresholds against what your GTM variable is currently using. If the empirical thresholds have shifted meaningfully — more than 15-20% on any criterion — the GTM variable needs updating. Document the change, log the reason, update the threshold source date.
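The comparison step can be sketched as a relative-shift check. The sketch uses 15% as the default tolerance (the lower end of the range above); the criterion names and values are hypothetical.

```python
def relative_shift(current: float, recalibrated: float) -> float:
    """Absolute change relative to the value currently deployed."""
    if current == 0:
        return 0.0 if recalibrated == 0 else float("inf")
    return abs(recalibrated - current) / abs(current)


def drifted_criteria(
    current: dict[str, float],
    recalibrated: dict[str, float],
    tolerance: float = 0.15,
) -> list[str]:
    """Names of criteria whose empirical value moved beyond the tolerance."""
    return sorted(
        name
        for name, value in current.items()
        if name in recalibrated
        and relative_shift(value, recalibrated[name]) > tolerance
    )


deployed = {"pricing_time_sec": 240, "bof_min_sessions": 2, "scroll_90_share": 0.80}
fresh = {"pricing_time_sec": 180, "bof_min_sessions": 2, "scroll_90_share": 0.74}
flagged = drifted_criteria(deployed, fresh)  # ["pricing_time_sec"]
```

For small integer thresholds such as a minimum session count, any change at all exceeds the tolerance, so those criteria deserve a manual look rather than an automatic update.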

Event-driven recalibration — immediate

Five events that require recalibration within two to four weeks of occurrence, regardless of where you are in the quarterly cycle:

01 //

New traffic source reaching 15%+ of total sessions

Different acquisition channels bring users with structurally different behavioral patterns on arrival. A user from branded search has different intent at landing than one from a broad interest Meta campaign. If the new channel becomes a meaningful share of traffic, your TOF classification criteria — calibrated on the prior traffic mix — will misclassify a significant portion of new arrivals. Recalibrate within four weeks of any channel reaching 15% of total sessions.

02 //

Pricing or site architecture change affecting high-intent pages

A pricing page restructure, a new contact flow, a new service page, a reconfigured checkout — any change that affects the pages your depth and micro-conversion signals are based on invalidates the thresholds calibrated against the previous architecture. A pricing page view that was a strong BOF signal on a deep, content-rich page may be a weak signal on a shallow restructured one. Recalibrate within two weeks of any structural change to your high-intent pages.

03 //

Seasonal purchase cycle compression or extension

Categories with strong seasonality have different consideration windows at peak versus off-peak. A user who takes 21 days to decide in January may take 4 days in November with a deadline creating urgency. If your recency decay logic is calibrated on off-peak data and you're running peak season campaigns, your BOF population will be undersized — users are moving faster than your classification expects and converting before GTM has updated their stage. For seasonal businesses, maintain documented peak and off-peak threshold sets and switch between them on a calendar.

04 //

BOF-to-conversion rate dropping more than 20% over three weeks without a campaign change

Before treating this as a creative or offer problem, treat it as a classification problem. Check whether BOF classification volume has increased disproportionately to traffic growth. If BOF volume is up 40% but conversions are flat, your BOF threshold has become too loose and is admitting users who aren't genuinely near-buyers. The signal is in the ratio, not the absolute numbers.

05 //

New product or service launch

A new offering changes what consideration looks like. New product pages, new pricing tiers, new comparison touchpoints — all of these alter the behavioral path that leads to conversion. The path exploration data from Phase 2 no longer fully describes how users evaluate your expanded offering. Run a new signal identification pass on users who engaged with the new product or service specifically, and update the classification criteria to reflect the new conversion path.
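Triggers 01 and 04 are numeric, so they can be checked mechanically. The sketch below assumes weekly session counts by source and BOF-to-conversion rates are already available; the function names, source labels, and the idea of a "known sources" set are hypothetical.

```python
def new_sources_over_share(
    sessions_by_source: dict[str, int],
    known_sources: set[str],
    share: float = 0.15,
) -> list[str]:
    """Sources not in the calibration traffic mix that now hold the given share."""
    total = sum(sessions_by_source.values())
    if total == 0:
        return []
    return sorted(
        source
        for source, sessions in sessions_by_source.items()
        if source not in known_sources and sessions / total >= share
    )


def bof_rate_dropped(
    baseline_rate: float, current_rate: float, max_drop: float = 0.20
) -> bool:
    """True when the BOF-to-conversion rate fell by more than max_drop."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (baseline_rate - current_rate) / baseline_rate > max_drop
```

The second check captures the ratio logic of trigger 04: if BOF volume rises 40% while conversions stay flat, a rate of 100/1000 becomes 100/1400, a drop of roughly 29%, which trips the 20% limit.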

The diagnostic sequence when performance degrades

When conversion rate drops or experiment results stop making sense, run this diagnostic sequence before adjusting bids, budgets, or creative. Classification system failure is a common and frequently undiagnosed cause of performance degradation in GTM-instrumented accounts.

Check stage volume ratios against historical baseline

Pull weekly TOF, MOF, BOF user counts from GA4 for the past 12 weeks. Plot the ratios — what percentage of classified users are in each stage week over week. A sudden spike in BOF proportion without corresponding conversion growth indicates threshold decay. A collapsing MOF population indicates TOF-to-MOF transition signals have become harder to achieve, possibly because a key engagement touchpoint changed on the site. Stable ratios with declining conversion indicate the classification is holding but something else has changed — attribution, offer, creative, or external demand.

Check stage progression rates against calibration baseline

What percentage of TOF users are progressing to MOF within your established consideration window? What percentage of MOF users are progressing to BOF? Compare current rates against the rates you documented during Phase 2 calibration. Declining progression rates indicate either acquisition quality degradation (genuinely lower-intent users entering the funnel) or classification criteria that have drifted from the actual behavioral path users are now taking.

Check micro-conversion signal firing rates

Are the events that anchor your BOF classification still firing at the same rate per session as during calibration? If your BOF threshold requires a pricing page scroll depth event and scroll depth events per session have dropped because the pricing page was restructured to be shorter, your BOF population will shrink for structural reasons unrelated to intent. The fix is a threshold recalibration, not a campaign change.

Check recency decay rate

What proportion of your classified BOF users are expiring back toward TOF without converting? If this proportion is rising, either your consideration window has lengthened and your expiry timer is too aggressive, cutting users off before they've reached their natural conversion point, or you're generating BOF classifications from users who never had genuine near-buyer intent and are naturally falling out of the cohort. The two explanations require opposite fixes: lengthen the expiry window for the first, tighten the BOF entry threshold for the second.
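The first check in this sequence, stage volume ratios against baseline, can be sketched like this. It assumes a list of weekly user counts per stage, oldest first; the 1.25 spike factor is an assumption for illustration, not a calibrated value.

```python
def stage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of classified users in each stage for one week."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified users in this week")
    return {stage: n / total for stage, n in counts.items()}


def bof_share_spiked(
    weekly_counts: list[dict[str, int]], factor: float = 1.25
) -> bool:
    """True when the latest week's BOF share exceeds the prior-week average by factor."""
    if len(weekly_counts) < 2:
        raise ValueError("need at least one baseline week plus the latest week")
    shares = [stage_shares(week).get("BOF", 0.0) for week in weekly_counts]
    baseline = sum(shares[:-1]) / len(shares[:-1])
    return shares[-1] > baseline * factor
```

A spike on its own is not the diagnosis. It points to threshold decay only when conversions have not grown with it, so compare the flagged week against conversion counts before changing anything.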

// 06 The honest timeline

Weeks 1–8

Phase 0

Clean instrumentation run

Deploy the five-event minimum in GTM. Verify clean firing in GTM Preview and GA4 DebugView. Let the data accumulate across varied campaign conditions, organic fluctuations, and at least one full business cycle. Do not touch threshold logic during this window.

Weeks 9–10

Phase 1–2

Category calibration and signal identification

Run the four category calibration questions from GA4. Run path explorations on converting and non-converting users. Calculate lift ratios for each signal candidate and apply the 3x / 1.5x discrimination thresholds. Document every finding with source, date range, and metric value.

Week 11

Phase 3

GTM classification build and testing

Write the GTM classification variable logic against the documented thresholds. Set recency decay expiry at 1.5x consideration window. Test in GTM Preview by simulating TOF, MOF, and BOF behavioral paths. Confirm correct stage assignment at each transition point before publishing.

Week 12+

Ongoing

Quarterly recalibration + event-driven updates

Every quarter, re-run Phase 2 signal identification on recent data and compare against current thresholds. Update GTM variable logic where thresholds have drifted. Log every change with documented rationale. Respond to event-driven triggers within two to four weeks of occurrence.

Eleven weeks from clean deployment to a defensible classification system. Most clients won't accept this timeline. The right response is that deploying classification logic before this data exists produces a system that looks operational while classifying users against criteria with no empirical basis. Every downstream decision — email RCT population, VWO experiment gate, Meta CAPI signal quality, Google RLSA audiences — inherits that error for the entire life of the programme.

The eleven weeks of patience compounds into more accurate decisions across every channel the classification system feeds, for as long as the programme runs. Skipping it produces confident-looking output from a system built on guesswork.

// low-volume sites — provisional heuristic model

The 8-week instrumentation window assumes sufficient converting user volume to run a meaningful path exploration and lift ratio analysis — typically 50+ conversions minimum for the signal identification to be statistically reliable. Sites with fewer than 50 conversions over 8 weeks cannot derive defensible empirical thresholds from their own data alone.

For low-volume sites, the correct approach is a provisional heuristic model: set rough thresholds from whatever data exists plus category-level reasoning about your purchase cycle, deploy them in shadow mode — GTM classifies users but the classifications don't yet gate live experiments or activate email sequences — and run them in parallel with continued data accumulation until you have sufficient converting user volume to run the full signal identification analysis. Treat every classification the shadow model produces as a hypothesis to be validated, not a label to act on. When you reach sufficient volume, run the Phase 2 analysis on the accumulated data and replace the heuristic thresholds with empirically derived ones. Then activate.

// what the system looks like when it's working

Your BOF-to-conversion rate is stable and predictable. Your VWO BOF experiments are running on populations that actually convert at near-buyer rates. Your email BOF sequences show genuine lift in holdout tests. Your Meta CAPI events tagged as BOF are producing higher Event Match Quality scores than your generic events. Your Google RLSA BOF audiences are converting at rates that justify their bid adjustments. These outcomes are not reliably achievable from a classification system built on guessed thresholds — and when they appear by accident, they don't hold. They are reliably achievable from one built and maintained the way this article describes, because the classification criteria are grounded in behavioral evidence and updated as the evidence changes.

root@noirquartz:~$ ./classify --validate

▶ instrumentation: 5-event minimum firing clean ✓

▶ calibration: median consideration window from GA4 funnel exploration ✓

▶ signals: validated by lift ratio (converter vs non-converter), segment-conditioned ✓

▶ thresholds: documented with source, date range, metric value ✓

▶ decay logic: expiry set at 1.5x empirical consideration window ✓

▶ maintenance: quarterly schedule + 5 event-driven triggers defined ✓

// a classification system is a model. models decay. maintain accordingly.