Methodology
Numeric constants in DRISHTI are not hidden fudge factors; they are listed here so that any reviewer can challenge them. This follows the precedent SPIRIT and OPTIMA set with their methodology drawers.
Compliance classifier · evaluation
| Metric | Value |
|---|---|
| Accuracy | 94.4% |
| Macro F1 | 91.9% |
| Weighted F1 | 94.3% |
3,919 train · 980 test · GradientBoosting (200 estimators · depth 3)
Confusion matrix
| Actual \ Predicted | Compliant | Watch | Flag | Total |
|---|---|---|---|---|
| Compliant | 578 (98.1%) | 10 (1.7%) | 1 (0.2%) | 589 |
| Watch | 14 (5.1%) | 253 (92.7%) | 6 (2.2%) | 273 |
| Flag | 18 (15.3%) | 6 (5.1%) | 94 (79.7%) | 118 |
Diagonal cells (emerald) are correct predictions; off-diagonal cells (rose) are misclassifications. Percentages are shares of each actual-class row. Confusions between Compliant and Watch form the largest group (24 of 55 errors), and the model is conservative on calling Flag: 18 of 118 actual Flag cases (15.3%) are predicted Compliant. We consider this acceptable for an enforcement-targeting tool.
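The headline metrics can be re-derived from the confusion matrix alone, which gives a reviewer a quick consistency check. The sketch below is illustrative and is not DRISHTI's evaluation code; the function name `classification_summary` is an assumption introduced here. It uses the standard definitions: per-class F1 = 2·TP / (actual + predicted), macro F1 as the unweighted mean, and weighted F1 as the support-weighted mean.

```python
# Confusion matrix from the evaluation table: rows = actual, columns = predicted.
LABELS = ["Compliant", "Watch", "Flag"]
CM = [
    [578, 10, 1],
    [14, 253, 6],
    [18, 6, 94],
]


def classification_summary(cm):
    """Return accuracy, per-class F1, macro F1 and weighted F1 for a square matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    f1_scores = []
    supports = []
    for i in range(n):
        tp = cm[i][i]
        actual = sum(cm[i])                          # row total (support)
        predicted = sum(cm[r][i] for r in range(n))  # column total
        denom = actual + predicted
        f1_scores.append(2 * tp / denom if denom else 0.0)
        supports.append(actual)

    macro_f1 = sum(f1_scores) / n
    weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / total
    return {
        "accuracy": accuracy,
        "f1_per_class": f1_scores,
        "macro_f1": macro_f1,
        "weighted_f1": weighted_f1,
    }


summary = classification_summary(CM)
print(f"accuracy    {summary['accuracy']:.1%}")     # 94.4%
print(f"macro F1    {summary['macro_f1']:.1%}")     # 91.9%
print(f"weighted F1 {summary['weighted_f1']:.1%}")  # 94.3%
for label, f1 in zip(LABELS, summary["f1_per_class"]):
    print(f"F1[{label}] {f1:.3f}")
```

The per-class output shows that Flag has the lowest F1 of the three classes, which is why macro F1 sits below accuracy and weighted F1.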
Principles
- Every synthetic field is labelled in the UI.
- A calibration test gates the build: synthetic FIR totals must match real seizure totals within 2%.
- All ML inference uses real models, never random outputs. Synthetic training data is fine; synthetic inference is not.
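A calibration gate of this kind could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the project's actual test: the function name `calibration_failures`, the per-year dictionary layout, the inclusive boundary at exactly 2%, and the example totals are all hypothetical. Only the 2% tolerance comes from this page.

```python
# Tolerance taken from this page (2%); everything else here is illustrative.
FIR_CALIBRATION_TOLERANCE = 0.02


def calibration_failures(synthetic_totals, real_totals,
                         tolerance=FIR_CALIBRATION_TOLERANCE):
    """Return {year: relative_error} for every year outside the tolerance.

    An empty dict means the gate passes.
    """
    failures = {}
    for year, real in real_totals.items():
        synthetic = synthetic_totals.get(year, 0)
        if real == 0:
            # Avoid division by zero: only an exact match passes.
            rel_err = 0.0 if synthetic == 0 else float("inf")
        else:
            rel_err = abs(synthetic - real) / real
        if rel_err > tolerance:
            failures[year] = rel_err
    return failures


# Made-up totals, used only to exercise the check.
real = {2023: 1000, 2024: 1200}
synthetic = {2023: 1015, 2024: 1230}

failures = calibration_failures(synthetic, real)
for year, err in failures.items():
    print(f"{year}: off by {err:.1%}, limit is {FIR_CALIBRATION_TOLERANCE:.0%}")
# 2023 is off by 1.5% and passes; 2024 is off by 2.5% and is reported.
```

In a build pipeline, a non-empty result would fail the step, for example by raising `SystemExit(1)`.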
Constants
| Name | Value | Rationale |
|---|---|---|
| simulator_seed | 1947 | Independence year — easy to remember, distinct from SPIRIT (7) and OPTIMA (42) |
| ap_border_points | 18 | Hardcoded border vertices (lifted from SPIRIT etl/load.py); border_km is the minimum haversine distance to any vertex |
| stations_total | 208 | Per DRISHTI pitch deck slide 8 — 'Empowering 208 Excise Stations' |
| stations_min_per_district | 4 | Smallest AP districts (Alluri Sitharama Raju, Parvathipuram Manyam) still need a coverage floor |
| compliance_score_weights | {"recidivism":0.45,"behavior":0.35,"location":0.2} | Composite 0–100 score weighting — recidivism dominates per the pitch's 'Historical Recidivism' emphasis (slide 5) |
| compliance_score_target_accuracy | 0.9 | Pitch deck slide 5 commits to '90%+ Risk Scoring Accuracy' |
| border_zone_km | 50 | Outlets within 50 km of state border are flagged as border-zone (lifted from SPIRIT) |
| fir_calibration_tolerance_pct | 0.02 | Synthetic FIR yearly totals must stay within 2% of real seizure aggregates (PLAN.md §5.3) |
| resource_corr_threshold | 0.6 | Pearson correlation between synthetic resource_usage z-scores and real ID-seizure counts must be ≥ 0.6 for the hotspot story to be defensible |
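To make the `compliance_score_weights` row concrete, here is a minimal sketch of a weighted composite. It assumes each component is already scaled to 0–100 before weighting; that scaling, the function name `compliance_score`, and the example inputs are assumptions for illustration, not details confirmed by this page.

```python
# Weights copied from the constants table.
WEIGHTS = {"recidivism": 0.45, "behavior": 0.35, "location": 0.20}


def compliance_score(components, weights=WEIGHTS):
    """Weighted sum of component sub-scores.

    Assumes every component is on a 0-100 scale, so the result is too.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    score = 0.0
    for name, weight in weights.items():
        value = components[name]
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be within 0-100, got {value}")
        score += weight * value
    return score


# Hypothetical sub-scores for a single outlet.
example = {"recidivism": 80, "behavior": 60, "location": 40}
print(round(compliance_score(example), 2))  # 0.45*80 + 0.35*60 + 0.20*40 = 65.0
```

Because the weights sum to 1, the composite stays on the same 0–100 scale as its inputs, and a 10-point change in recidivism moves the score by 4.5 points versus 2 points for location.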
Data lineage
| Table | Kind | Source |
|---|---|---|
| retailers | REAL | Retailer Info.xlsx |
| brands | REAL | Brand & Supplier Info.xlsx |
| labels | REAL | Label Approvals_2025_2026.xlsx |
| sales_inflow | REAL | Retailer Wise Sales(in).xlsx |
| sales_yearwise | REAL | Retailer Sales Year wise -1.xlsx |
| distillery_quota | REAL | Statement Pharma Molasses and Distillery March 26.xlsx |
| seizures | REAL | MONTH WISE DATA ID NDPL DPL GANJA …xlsx |
| excise_stations | SIMULATED | etl/simulate/excise_stations.py (seed=1947) |
| retailer_station | DERIVED | haversine(retailer, station_centroid) within district |
| officers | SIMULATED | etl/simulate/officers.py (TODO) |
| offenders | SIMULATED | etl/simulate/offenders.py (TODO) |
| vehicles | SIMULATED | etl/simulate/vehicles_phones.py (TODO) |
| phones | SIMULATED | etl/simulate/vehicles_phones.py (TODO) |
| firs | SIMULATED | etl/simulate/firs_inspections.py (TODO) |
| inspection_reports | SIMULATED | etl/simulate/firs_inspections.py (TODO) |
| gps_trips | SIMULATED | etl/simulate/gps_trips.py (TODO — extends SPIRIT Lock-2 to ~6,000 trips) |
| resource_usage | SIMULATED | etl/simulate/resource_usage.py (TODO) |
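The `retailer_station` derivation (haversine distance from a retailer to station centroids within the same district) can be sketched as follows. This is an illustrative reading of that one-line description, not the ETL code itself: the record layout, the function names, the district labels, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(retailer, stations):
    """Pick the closest station centroid among stations in the retailer's district.

    Returns None when the district has no station.
    """
    candidates = [s for s in stations if s["district"] == retailer["district"]]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda s: haversine_km(retailer["lat"], retailer["lon"], s["lat"], s["lon"]),
    )


# Made-up records; "D1" and "D2" stand in for district names.
retailer = {"id": "R1", "district": "D1", "lat": 16.50, "lon": 80.60}
stations = [
    {"id": "S1", "district": "D1", "lat": 16.52, "lon": 80.62},
    {"id": "S2", "district": "D1", "lat": 17.00, "lon": 81.00},
    {"id": "S3", "district": "D2", "lat": 16.50, "lon": 80.60},  # same spot, other district
]

match = nearest_station(retailer, stations)
print(match["id"])  # S1: S3 is closer but lies in a different district
```

The same `haversine_km` helper would also serve the `border_km` constant, taken as the minimum distance from an outlet to any of the border vertices.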