Every assumption, exposed

Methodology

Numeric constants in DRISHTI are not hidden fudge factors: they appear here so any reviewer can challenge them. This follows the precedent SPIRIT and OPTIMA set with their methodology drawers.

Compliance classifier · evaluation

Metric	Value
Accuracy	94.4%
Macro F1	91.9%
Weighted F1	94.3%

3,919 train · 980 test · GradientBoosting (200 estimators · depth 3)

Confusion matrix

Actual ↓ / Predicted →	Compliant	Watch	Flag	Total
Compliant	578 (98.1%)	10 (1.7%)	1 (0.2%)	589
Watch	14 (5.1%)	253 (92.7%)	6 (2.2%)	273
Flag	18 (15.3%)	6 (5.1%)	94 (79.7%)	118

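Diagonal cells (emerald) are correct predictions; off-diagonal cells (rose) are confusions. Percentages are shares of each actual-class row. Of the 55 misclassifications, 24 fall between Compliant and Watch, and 18 are actual Flag cases predicted as Compliant, the largest single off-diagonal cell. The model is conservative on calling Flag (79.7% recall, 101 predicted against 118 actual), which we consider acceptable for an enforcement-targeting tool.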
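The headline metrics can be re-derived from the confusion matrix alone, which gives a reviewer a quick consistency check. The following is a minimal sketch in plain Python, assuming the counts in the table cover the full 980-row test set. The function name `summarize` is illustrative and is not taken from the DRISHTI codebase.

```python
# Confusion matrix copied from the table: rows = actual, columns = predicted.
LABELS = ["Compliant", "Watch", "Flag"]
MATRIX = [
    [578, 10, 1],
    [14, 253, 6],
    [18, 6, 94],
]


def summarize(matrix):
    """Derive accuracy, per-class recall, and macro/weighted F1 from a square matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    support = [sum(matrix[i]) for i in range(n)]  # actual count per class
    predicted = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Per-class F1 = 2 * TP / (actual count + predicted count).
    f1 = [2 * matrix[i][i] / (support[i] + predicted[i]) for i in range(n)]
    return {
        "accuracy": correct / total,
        "recall": [matrix[i][i] / support[i] for i in range(n)],
        "macro_f1": sum(f1) / n,
        "weighted_f1": sum(f * s for f, s in zip(f1, support)) / total,
    }


metrics = summarize(MATRIX)
for name in ("accuracy", "macro_f1", "weighted_f1"):
    print(f"{name}: {metrics[name]:.1%}")
# accuracy: 94.4%
# macro_f1: 91.9%
# weighted_f1: 94.3%
```

The gap between macro F1 and weighted F1 comes from the Flag class: it has the lowest per-class F1 and the smallest support, so it pulls the unweighted mean down more than the support-weighted one.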

Principles

▶ Every synthetic field is labelled in the UI.
▶ A calibration test gates the build: synthetic FIR totals must match real seizure totals within 2%.
▶ All ML inference uses real models, never random outputs. Synthetic training data is fine; synthetic inference is not.
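The calibration gate in the second principle can be sketched as a relative-difference check per year. This is an assumption about how such a gate might look, not the project's actual test: the function names and the sample totals are hypothetical, and the 2% tolerance is the only value taken from this page.

```python
def within_tolerance(synthetic_total, real_total, tolerance=0.02):
    """True when the synthetic total is within `tolerance` (relative) of the real total."""
    if real_total <= 0:
        raise ValueError("real_total must be positive")
    return abs(synthetic_total - real_total) / real_total <= tolerance


def calibration_failures(synthetic_by_year, real_by_year, tolerance=0.02):
    """Return the years whose synthetic FIR total drifts beyond the tolerance."""
    failures = []
    for year, real_total in sorted(real_by_year.items()):
        # A year missing from the synthetic data counts as a total of zero.
        synthetic_total = synthetic_by_year.get(year, 0)
        if not within_tolerance(synthetic_total, real_total, tolerance):
            failures.append(year)
    return failures


# Illustrative numbers only, not DRISHTI data.
real = {2023: 1000, 2024: 1200}
synthetic = {2023: 1015, 2024: 1230}  # 1.5% and 2.5% off, respectively

failures = calibration_failures(synthetic, real)
print(failures)  # [2024]
```

A build step could fail whenever the returned list is non-empty, for example by asserting `calibration_failures(...) == []` in the test suite.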

Constants

Name	Value	Rationale
simulator_seed	1947	Independence year — easy to remember, distinct from SPIRIT (7) and OPTIMA (42)
ap_border_points	18	Hardcoded border vertices (lifted from SPIRIT etl/load.py); border_km is the minimum haversine distance to any vertex
stations_total	208	Per DRISHTI pitch deck slide 8 — 'Empowering 208 Excise Stations'
stations_min_per_district	4	Smallest AP districts (Alluri Sitharama Raju, Parvathipuram Manyam) still need a coverage floor
compliance_score_weights	{"recidivism":0.45,"behavior":0.35,"location":0.2}	Composite 0–100 score weighting — recidivism dominates per the pitch's 'Historical Recidivism' emphasis (slide 5)
compliance_score_target_accuracy	0.9	Pitch deck slide 5 commits to '90%+ Risk Scoring Accuracy'
border_zone_km	50	Outlets within 50 km of state border are flagged as border-zone (lifted from SPIRIT)
fir_calibration_tolerance_pct	0.02	Synthetic FIR yearly totals must stay within 2% of real seizure aggregates (PLAN.md §5.3); despite the _pct suffix, the value is stored as a fraction (0.02 = 2%)
resource_corr_threshold	0.6	Pearson correlation between synthetic resource_usage z-scores and real ID-seizure counts must be ≥ 0.6 for the hotspot story to be defensible
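To make three of the constants above concrete, here is a hedged sketch of how they might be applied. Only the constant values come from the table. The function names, the 0 to 100 scale for each sub-score, the inclusive 50 km comparison, the Earth radius, and the sample coordinates are assumptions made for illustration.

```python
import math

COMPLIANCE_SCORE_WEIGHTS = {"recidivism": 0.45, "behavior": 0.35, "location": 0.2}
BORDER_ZONE_KM = 50
RESOURCE_CORR_THRESHOLD = 0.6
EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def composite_score(components):
    """Weighted sum of sub-scores, each assumed to already be on a 0-100 scale."""
    return sum(w * components[name] for name, w in COMPLIANCE_SCORE_WEIGHTS.items())


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def border_km(lat, lon, vertices):
    """Minimum haversine distance from a point to any border vertex."""
    return min(haversine_km(lat, lon, vlat, vlon) for vlat, vlon in vertices)


def in_border_zone(lat, lon, vertices):
    """Border-zone flag; treating exactly 50 km as inside is an assumption."""
    return border_km(lat, lon, vertices) <= BORDER_ZONE_KM


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Illustrative inputs only, not DRISHTI data.
score = composite_score({"recidivism": 80, "behavior": 50, "location": 20})
vertices = [(0.0, 1.0), (0.0, 0.25)]  # hypothetical border vertices
near_border = in_border_zone(0.0, 0.0, vertices)
corr_ok = pearson([0.1, 0.9, 1.8], [12, 30, 41]) >= RESOURCE_CORR_THRESHOLD
```

Because border_km measures distance to vertices rather than to the border segments between them, it can overstate the true distance to the border when the 18 points are widely spaced; the 50 km threshold should be read with that in mind.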

Data lineage

Table	Kind	Source
retailers	REAL	Retailer Info.xlsx
brands	REAL	Brand & Supplier Info.xlsx
labels	REAL	Label Approvals_2025_2026.xlsx
sales_inflow	REAL	Retailer Wise Sales(in).xlsx
sales_yearwise	REAL	Retailer Sales Year wise -1.xlsx
distillery_quota	REAL	Statement Pharma Molasses and Distillery March 26.xlsx
seizures	REAL	MONTH WISE DATA ID NDPL DPL GANJA …xlsx
excise_stations	SIMULATED	etl/simulate/excise_stations.py (seed=1947)
retailer_station	DERIVED	haversine(retailer, station_centroid) within district
officers	SIMULATED	etl/simulate/officers.py (TODO)
offenders	SIMULATED	etl/simulate/offenders.py (TODO)
vehicles	SIMULATED	etl/simulate/vehicles_phones.py (TODO)
phones	SIMULATED	etl/simulate/vehicles_phones.py (TODO)
firs	SIMULATED	etl/simulate/firs_inspections.py (TODO)
inspection_reports	SIMULATED	etl/simulate/firs_inspections.py (TODO)
gps_trips	SIMULATED	etl/simulate/gps_trips.py (TODO — extends SPIRIT Lock-2 to ~6,000 trips)
resource_usage	SIMULATED	etl/simulate/resource_usage.py (TODO)