Datasets
About half of DRISHTI's surface runs on real RTGS data; the other half is synthetic because the RTGS drop contains no PII and no enforcement records. Every table here carries a badge: REAL (loaded from an RTGS file), DERIVED (computed from other tables), or SIMULATED (generated by a script under etl/simulate/).
Live tables
| Table | Kind | Rows | Source | Description |
|---|---|---|---|---|
| retailers | REAL | 4,899 | Retailer Info.xlsx | Active outlet master — Code, Name, Address, District, Circle, Depot, Lat/Lon, Status, VendorType |
| brands | REAL | 1,457 | Brand & Supplier Info.xlsx | Brand catalogue with SKU, MRP, basic price, supplier, distillery |
| labels | REAL | 1,373 | Label Approvals_2025_2026.xlsx | Approved label registry FY26 with front/back image URLs |
| sales_inflow | REAL | 6,32,025 | Retailer Wise Sales(in).xlsx | Daily inflow ledger by vendor — date, vendorId, district, cases, bottles, sale value |
| sales_yearwise | REAL | 6,42,955 | Retailer Sales Year wise -1.xlsx | Year-wise retail sales by shop — date, retailer, depot, cases, bottles, sale value |
| distillery_quota | REAL | 42 | Statement Pharma Molasses and Distillery March 26.xlsx | Per-distillery quota: allotted vs lifted PL/BL, with AC office grouping |
| seizures | REAL | 430 | MONTH WISE DATA ID NDPL DPL GANJA …xlsx | Aggregated month-wise seizure counts 2019-2026 (ID/NDPL/DPL/Spurious/Ganja) |
| excise_stations | SIMULATED | 208 | etl/simulate/excise_stations.py (seed=1947) | 208 stations allocated in proportion to outlet density and placed at K-means centroids within each district. Real boundaries replace them at pilot. |
| retailer_station | DERIVED | 4,460 | haversine(retailer, station_centroid) within district | Outlet → nearest station mapping (depends on simulated stations, real outlets) |
| officers | SIMULATED | 409 | etl/simulate/officers.py (TODO) | 1–3 officers per station with rank, tenure, arrest history |
| offenders | SIMULATED | 12,030 | etl/simulate/offenders.py (TODO) | ~12,000 offender personas; ~400 organized into 25 networks |
| vehicles | SIMULATED | 214 | etl/simulate/vehicles_phones.py (TODO) | Vehicle plates linked to offender networks |
| phones | SIMULATED | 7,913 | etl/simulate/vehicles_phones.py (TODO) | Phone numbers linked to offender networks |
| firs | SIMULATED | 68,482 | etl/simulate/firs_inspections.py (TODO) | FIR records calibrated to real per-district seizure totals |
| inspection_reports | SIMULATED | 27,316 | etl/simulate/firs_inspections.py (TODO) | Routine inspection reports with findings text |
| gps_trips | SIMULATED | 6,000 | etl/simulate/gps_trips.py (TODO — extends SPIRIT Lock-2 to ~6,000 trips) | Synthetic depot→retailer GPS polylines with deviation/dwell/speed features |
| resource_usage | SIMULATED | 1,92,312 | etl/simulate/resource_usage.py (TODO) | Village × month molasses/sugar/jaggery offtake — correlated with real ID-seizure intensity for hotspot model |
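
The retailer_station row describes its derivation only as `haversine(retailer, station_centroid) within district`. A minimal sketch of that idea follows, assuming each outlet and station is a plain record carrying a district and a latitude/longitude pair. The field names (`code`, `station_id`, `district`, `lat`, `lon`), the function names, and the choice to skip outlets without coordinates are illustrative assumptions, not the actual ETL code.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def map_retailers_to_stations(retailers, stations):
    """Map each outlet to the nearest station centroid in the same district.

    Both arguments are lists of dicts; the key names are hypothetical.
    Outlets with missing coordinates, or in a district that has no
    station, are skipped (an assumption made for this sketch).
    """
    # Group stations by district so each outlet only scans its own district.
    by_district = {}
    for station in stations:
        by_district.setdefault(station["district"], []).append(station)

    mapping = []
    for r in retailers:
        candidates = by_district.get(r["district"], [])
        if r["lat"] is None or r["lon"] is None or not candidates:
            continue
        # Pair every candidate with its distance and keep the closest one.
        dist, nearest = min(
            ((haversine_km(r["lat"], r["lon"], s["lat"], s["lon"]), s) for s in candidates),
            key=lambda pair: pair[0],
        )
        mapping.append(
            {"code": r["code"], "station_id": nearest["station_id"], "distance_km": dist}
        )
    return mapping


# Made-up sample records, for illustration only.
sample_retailers = [
    {"code": "R1", "district": "A", "lat": 16.50, "lon": 80.60},
    {"code": "R2", "district": "B", "lat": 16.51, "lon": 80.61},
]
sample_stations = [
    {"station_id": "S1", "district": "A", "lat": 16.52, "lon": 80.62},
    {"station_id": "S2", "district": "B", "lat": 17.50, "lon": 81.50},
]
for row in map_retailers_to_stations(sample_retailers, sample_stations):
    print(row["code"], row["station_id"], round(row["distance_km"], 2))
```

Restricting the search to the outlet's own district means an outlet near a district border maps to a station in its own district even when a station across the border is closer, which matches the "within district" wording in the table.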