BACKGROUND AND OBJECTIVES: Clinical registries are critical for modern surgery and underpin outcomes research, device monitoring, and trial development. However, existing approaches to registry construction are labor-intensive, costly, and prone to manual error. Natural language processing techniques combined with electronic health record (EHR) data sets can, in principle, automate the construction and maintenance of registries. Our aim was to automate the generation of a spine surgery registry at an academic medical center using regular expression (regex) classifiers developed by neurosurgeons to combine domain expertise with interpretable algorithms. METHODS: We used a Hadoop data lake consisting of all the information generated by an academic medical center. Using this database and structured query language queries, we retrieved every operative note written in the department of neurosurgery since our transition to the EHR. Notes were parsed using regex classifiers, and the classifier output was compared with manual review of a random subset of 100 notes. RESULTS: A total of 31 502 operative cases were downloaded and processed using regex classifiers. The codebase required 5 days of development and 3 weeks of validation, and the software generated the autoregistry in less than 1 hour. Regex classifiers had an average accuracy of 98.86% at identifying both spinal procedures and the relevant vertebral levels, and they correctly identified the entire list of defined surgical procedures in 89% of patients. To monitor outcomes and quality metrics, we were also able to identify patients who required additional operations within 30 days. CONCLUSION: This study demonstrates the feasibility of automatically generating a spine registry using the EHR and an interpretable, customizable natural language processing algorithm, which may reduce pitfalls associated with manual registry development and facilitate rapid clinical research.
- Clinical registries
- Natural language processing
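
As an illustration of the approach summarized in the abstract, the following is a minimal sketch of a regex classifier that tags an operative note with procedures and vertebral levels, plus a helper that flags patients with a second operation within 30 days. The patterns, the function names (`extract_levels`, `classify_note`, `reoperations_within`), and the input format are all hypothetical assumptions made for this example; the abstract does not give the study's actual expressions or code, and real operative notes would need far more patterns and safeguards.

```python
import re
from datetime import date

# Hypothetical procedure patterns (assumed for illustration; not the study's own).
PROCEDURE_PATTERNS = {
    "laminectomy": re.compile(r"\blaminectom(?:y|ies)\b", re.IGNORECASE),
    "fusion": re.compile(r"\b(?:fusion|arthrodesis)\b", re.IGNORECASE),
    "discectomy": re.compile(r"\b(?:micro)?dis[ck]ectom(?:y|ies)\b", re.IGNORECASE),
}

# Matches a single level ("L4") or a range ("L4-L5", "L4-5", "L5-S1").
LEVEL_PATTERN = re.compile(
    r"\b([CTLS])(\d{1,2})(?:\s*-\s*([CTLS])?(\d{1,2}))?\b", re.IGNORECASE
)
REGION_ORDER = {"C": 0, "T": 1, "L": 2, "S": 3}


def extract_levels(note):
    """Return the vertebral levels mentioned in a note, in anatomical order."""
    levels = set()
    for m in LEVEL_PATTERN.finditer(note):
        region, start = m.group(1).upper(), int(m.group(2))
        levels.add((region, start))
        if m.group(4) is not None:
            end_region = (m.group(3) or region).upper()
            end = int(m.group(4))
            if end_region == region and end >= start:
                # Same-region range: expand to every level in between.
                levels.update((region, n) for n in range(start, end + 1))
            else:
                # Cross-region range (e.g. L5-S1): keep endpoints only,
                # a simplification made for this sketch.
                levels.add((end_region, end))
    ordered = sorted(levels, key=lambda lv: (REGION_ORDER[lv[0]], lv[1]))
    return [f"{region}{number}" for region, number in ordered]


def classify_note(note):
    """Tag a note with the procedures and levels its text matches."""
    procedures = sorted(
        name for name, pattern in PROCEDURE_PATTERNS.items() if pattern.search(note)
    )
    return {"procedures": procedures, "levels": extract_levels(note)}


def reoperations_within(cases, days=30):
    """Return patient ids with two operations at most `days` apart.

    `cases` is an iterable of (patient_id, operation_date) pairs, an assumed
    input format. Two records on the same day also count as a match here.
    """
    by_patient = {}
    for patient_id, op_date in cases:
        by_patient.setdefault(patient_id, []).append(op_date)
    flagged = set()
    for patient_id, dates in by_patient.items():
        dates.sort()
        # The smallest gap always occurs between consecutive sorted dates.
        if any((b - a).days <= days for a, b in zip(dates, dates[1:])):
            flagged.add(patient_id)
    return flagged


if __name__ == "__main__":
    print(classify_note("Procedure: L4-L5 laminectomy and posterior fusion L4-5."))
    print(reoperations_within([("A", date(2020, 1, 1)), ("A", date(2020, 1, 20))]))
```

Because each classifier is a named, readable pattern, a clinician can inspect or adjust it directly, which is the interpretability advantage the abstract attributes to regex classifiers.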