Background & aims Given ongoing challenges in the non-invasive diagnosis of non-alcoholic fatty liver disease (NAFLD), we sought to validate an alanine aminotransferase (ALT)-based NAFLD phenotype using measures readily available in electronic health records (EHRs) and population-based studies by leveraging the clinical and genetic data in the Million Veteran Program (MVP), a multi-ethnic mega-biobank of US Veterans.
Methods MVP participants with ALT >40 units/L for men and >30 units/L for women, and without other causes of liver disease, were compared to controls with normal ALT. Genetic variants spanning eight NAFLD risk or ALT-associated loci (LYPLAL1, GCKR, HSD17B13, TRIB1, PPP1R3B, ERLIN1, TM6SF2, PNPLA3) were tested for NAFLD associations, with sensitivity analyses adjusting for metabolic risk factors and alcohol consumption. A manual EHR review assessed performance characteristics of the NAFLD phenotype with imaging and biopsy data as gold standards. Genetic associations with advanced fibrosis were explored using FIB4, NAFLD Fibrosis Score and platelet counts.
Results Among 322,259 MVP participants, 19% met non-invasive criteria for NAFLD. Trans-ethnic meta-analysis replicated associations with previously reported genetic variants in all but the LYPLAL1 and GCKR loci (P < 6×10⁻³), without attenuation when adjusted for metabolic risk factors and alcohol consumption. At the previously reported LYPLAL1 locus, the established genetic variant did not appear to be associated with NAFLD; however, the regional association plot showed a significant association with NAFLD 279 kb downstream. In the EHR validation, the ALT-based NAFLD phenotype yielded a positive predictive value of 0.89 and 0.84 against liver biopsy and abdominal imaging, respectively (inter-rater reliability: Cohen’s kappa = 0.98). The HSD17B13 and PNPLA3 loci were associated with advanced fibrosis.
Conclusions We validate a simple, non-invasive ALT-based NAFLD phenotype using EHR data by leveraging previously established NAFLD risk-associated genetic polymorphisms.
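The case definition in the Methods and the positive predictive value reported in the Results can be illustrated with a minimal Python sketch. It is not the study's actual pipeline: the `Participant` record, its field names, the "M"/"F" sex encoding and both helper functions are hypothetical, and only the ALT cutoffs (>40 units/L for men, >30 units/L for women) and the exclusion of other causes of liver disease come from the text.

```python
from dataclasses import dataclass

# Sex-specific ALT cutoffs in units/L, as stated in the Methods.
# The "M"/"F" keys are an assumed encoding, not taken from the study.
ALT_CUTOFF = {"M": 40.0, "F": 30.0}


@dataclass
class Participant:
    """Hypothetical record; field names are illustrative only."""
    sex: str                    # "M" or "F"
    alt: float                  # ALT in units/L
    other_liver_disease: bool   # any other known cause of liver disease


def meets_alt_phenotype(p: Participant) -> bool:
    """Return True if the participant meets the ALT-based case criteria.

    Participants with another cause of liver disease never qualify as cases.
    The comparison is strict (>), matching the cutoffs in the text.
    """
    if p.other_liver_disease:
        return False
    return p.alt > ALT_CUTOFF[p.sex]


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = confirmed cases / all phenotype-positive cases reviewed."""
    return true_pos / (true_pos + false_pos)
```

For example, `meets_alt_phenotype(Participant("F", 35.0, False))` is true, while the same ALT value in a man falls below the 40 units/L cutoff. The counts passed to `positive_predictive_value` would come from the manual chart review; none are given in the abstract, so any numbers used here are placeholders.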