Every assumption, exposed

Methodology

Numeric constants in DRISHTI are not hidden fudge factors: they appear here so any reviewer can challenge them. This follows the precedent SPIRIT and OPTIMA set with their methodology drawers.

Compliance classifier · evaluation

Accuracy 94.4% · Macro F1 91.9% · Weighted F1 94.3%
3,919 train · 980 test · GradientBoosting (200 estimators · depth 3)
Confusion matrix
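Rows are actual classes, columns are predicted classes; percentages are shares of each row.

Actual \ Predicted | Compliant | Watch | Flag | Total
Compliant | 578 (98.1%) | 10 (1.7%) | 1 (0.2%) | 589
Watch | 14 (5.1%) | 253 (92.7%) | 6 (2.2%) | 273
Flag | 18 (15.3%) | 6 (5.1%) | 94 (79.7%) | 118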

Diagonal cells (emerald) are correct predictions; off-diagonal cells (rose) are misclassifications. Of the 55 errors, 24 fall between Compliant and Watch, and the largest single off-diagonal cell is actual Flag predicted as Compliant (18 cases, 15.3% of Flag). The model is conservative on calling Flag (79.7% recall), which we consider acceptable for an enforcement-targeting tool.
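As a cross-check, the headline metrics can be recomputed from the confusion matrix alone. The sketch below is not the project's evaluation code; the function name `summarize` is hypothetical and only the standard definitions of accuracy and F1 are assumed.

```python
# Confusion matrix from the evaluation above: rows = actual, columns = predicted.
LABELS = ["Compliant", "Watch", "Flag"]
CM = [
    [578, 10, 1],   # actual Compliant
    [14, 253, 6],   # actual Watch
    [18, 6, 94],    # actual Flag
]


def summarize(cm, labels=LABELS):
    """Return accuracy, macro F1, weighted F1 and per-class F1 for a square matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    f1s, supports = [], []
    for i in range(n):
        support = sum(cm[i])                          # rows: actual count of class i
        predicted = sum(cm[r][i] for r in range(n))   # columns: predicted count of class i
        # F1 = 2*TP / (actual + predicted), the harmonic mean of precision and recall.
        f1s.append(2 * cm[i][i] / (support + predicted))
        supports.append(support)
    return {
        "accuracy": correct / total,
        "macro_f1": sum(f1s) / n,
        "weighted_f1": sum(f * s for f, s in zip(f1s, supports)) / total,
        "per_class_f1": dict(zip(labels, f1s)),
    }


metrics = summarize(CM)
for name in ("accuracy", "macro_f1", "weighted_f1"):
    print(f"{name}: {metrics[name]:.1%}")
```

Rounded to one decimal place, the three values agree with the reported 94.4%, 91.9% and 94.3%. The per-class F1 values also show that Flag is the weakest class, which is why macro F1 sits below weighted F1.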

Principles

  • Every synthetic field is labelled in the UI.
  • Calibration test gates the build: synthetic FIR totals must match real seizure totals within 2%.
  • All ML inference uses real models — never random outputs. Synthetic training data is fine; synthetic inference is not.
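
The calibration gate in the second principle could look roughly like the sketch below. It assumes the comparison is made per year on relative error against the real total; the function name `calibration_failures` and the sample numbers are hypothetical and do not come from the project.

```python
FIR_CALIBRATION_TOLERANCE = 0.02  # the 2% tolerance stated in the principle


def calibration_failures(synthetic_totals, real_totals, tolerance=FIR_CALIBRATION_TOLERANCE):
    """Return {year: relative_error} for every year outside the tolerance."""
    failures = {}
    for year, real in real_totals.items():
        synthetic = synthetic_totals.get(year, 0)
        if real == 0:
            # Assumption: a zero real total only matches a zero synthetic total.
            rel_err = 0.0 if synthetic == 0 else float("inf")
        else:
            rel_err = abs(synthetic - real) / real
        if rel_err > tolerance:
            failures[year] = rel_err
    return failures


# Made-up totals, purely for illustration.
real = {2023: 1000, 2024: 1200}
synthetic = {2023: 1015, 2024: 1250}

failures = calibration_failures(synthetic, real)
# 2023 is off by 1.5% and passes; 2024 is off by about 4.2% and fails.
if failures:
    print("calibration gate would fail:", failures)
```

A build script could exit non-zero whenever `failures` is non-empty, which is one way to make the test gate the build.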

Constants

Name | Value | Rationale
simulator_seed | 1947 | Independence year; easy to remember, distinct from SPIRIT (7) and OPTIMA (42)
ap_border_points | 18 | Hardcoded border vertices (lifted from SPIRIT etl/load.py); border_km is the minimum haversine distance to any vertex
stations_total | 208 | Per DRISHTI pitch deck slide 8: 'Empowering 208 Excise Stations'
stations_min_per_district | 4 | Smallest AP districts (Alluri Sitharama Raju, Parvathipuram Manyam) still need a coverage floor
compliance_score_weights | {"recidivism":0.45,"behavior":0.35,"location":0.2} | Composite 0–100 score weighting; recidivism dominates per the pitch's 'Historical Recidivism' emphasis (slide 5)
compliance_score_target_accuracy | 0.9 | Pitch deck slide 5 commits to '90%+ Risk Scoring Accuracy'
border_zone_km | 50 | Outlets within 50 km of state border are flagged as border-zone (lifted from SPIRIT)
fir_calibration_tolerance_pct | 0.02 | Synthetic FIR yearly totals must stay within 2% of real seizure aggregates (PLAN.md §5.3)
resource_corr_threshold | 0.6 | Pearson correlation between synthetic resource_usage z-scores and real ID-seizure counts must be ≥ 0.6 for the hotspot story to be defensible
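
To make two of these constants concrete, here is a minimal sketch of how compliance_score_weights and resource_corr_threshold might be applied. It assumes each component score is already on a 0–100 scale and that the composite is a plain weighted sum; the function names and sample inputs are hypothetical, not taken from the DRISHTI code.

```python
import math

WEIGHTS = {"recidivism": 0.45, "behavior": 0.35, "location": 0.2}  # compliance_score_weights
RESOURCE_CORR_THRESHOLD = 0.6                                       # resource_corr_threshold


def composite_score(components, weights=WEIGHTS):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(weights[name] * components[name] for name in weights)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up inputs, purely for illustration.
score = composite_score({"recidivism": 80, "behavior": 60, "location": 40})
# 0.45*80 + 0.35*60 + 0.2*40 = 36 + 21 + 8 = 65

z_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]   # stand-in for synthetic resource_usage z-scores
seizures = [2, 5, 4, 8, 9]               # stand-in for real ID-seizure counts
gate_passes = pearson(z_scores, seizures) >= RESOURCE_CORR_THRESHOLD
```

Because the weights sum to 1, the composite stays on the same 0–100 scale as its inputs.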

Data lineage

Table | Kind | Source
retailers | REAL | Retailer Info.xlsx
brands | REAL | Brand & Supplier Info.xlsx
labels | REAL | Label Approvals_2025_2026.xlsx
sales_inflow | REAL | Retailer Wise Sales(in).xlsx
sales_yearwise | REAL | Retailer Sales Year wise -1.xlsx
distillery_quota | REAL | Statement Pharma Molasses and Distillery March 26.xlsx
seizures | REAL | MONTH WISE DATA ID NDPL DPL GANJA …xlsx
excise_stations | SIMULATED | etl/simulate/excise_stations.py (seed=1947)
retailer_station | DERIVED | haversine(retailer, station_centroid) within district
officers | SIMULATED | etl/simulate/officers.py (TODO)
offenders | SIMULATED | etl/simulate/offenders.py (TODO)
vehicles | SIMULATED | etl/simulate/vehicles_phones.py (TODO)
phones | SIMULATED | etl/simulate/vehicles_phones.py (TODO)
firs | SIMULATED | etl/simulate/firs_inspections.py (TODO)
inspection_reports | SIMULATED | etl/simulate/firs_inspections.py (TODO)
gps_trips | SIMULATED | etl/simulate/gps_trips.py (TODO; extends SPIRIT Lock-2 to ~6,000 trips)
resource_usage | SIMULATED | etl/simulate/resource_usage.py (TODO)
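
The derived retailer_station table and the border_km measure both rest on haversine distance. The sketch below shows one way they might be computed. The record layout (dicts with `id`, `district`, `lat`, `lon`), the function names and all coordinates are assumptions for illustration, and "within 50 km" is read here as less than or equal to 50.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
BORDER_ZONE_KM = 50       # border_zone_km from the Constants table


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp guards float error


def nearest_station(retailer, stations):
    """Closest station centroid among stations in the retailer's own district, or None."""
    candidates = [s for s in stations if s["district"] == retailer["district"]]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda s: haversine_km(retailer["lat"], retailer["lon"], s["lat"], s["lon"]),
    )


def border_km(lat, lon, border_points):
    """Minimum haversine distance from a point to any border vertex."""
    return min(haversine_km(lat, lon, blat, blon) for blat, blon in border_points)


# Hypothetical records and coordinates, purely for illustration.
retailer = {"id": "R1", "district": "A", "lat": 16.50, "lon": 80.60}
stations = [
    {"id": "S1", "district": "A", "lat": 16.55, "lon": 80.65},
    {"id": "S2", "district": "A", "lat": 17.00, "lon": 81.00},
    {"id": "S3", "district": "B", "lat": 16.51, "lon": 80.61},  # nearer, but another district
]
assigned = nearest_station(retailer, stations)
in_border_zone = border_km(16.50, 80.60, [(16.60, 80.70), (18.00, 83.00)]) <= BORDER_ZONE_KM
```

Filtering by district before taking the minimum means a retailer is never assigned to a station across a district line, even when that station is geographically closer.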