LARGE LANGUAGE MODELS FOR INTELLIGENT DATA ENGINEERING: AUTOMATING SCHEMA DESIGN, LINEAGE, AND QUALITY CONTROL

Authors

  • Godavari Modalavalasa

DOI:

https://doi.org/10.46121/pspc.50.2.4

Keywords:

Large Language Models, Data Engineering Automation, Schema Design, Data Lineage, Quality Control, Generative AI

Abstract

The emergence of large language models (LLMs) has introduced transformative capabilities for automating complex knowledge work that previously required human expertise. This research investigates the application of LLMs to data engineering, examining how these systems can automate schema design, data lineage tracking, and quality control, three processes that traditionally consume significant manual effort. The study explores how the natural language understanding, code generation, and reasoning capabilities of LLMs can be leveraged to enhance data engineering workflows while maintaining accuracy and reliability. Through a comprehensive analysis of contemporary LLM capabilities and data engineering challenges, this paper presents an integrated framework that combines language models with traditional data engineering tools to create intelligent automation that augments human expertise. The findings demonstrate that LLM-powered data engineering can reduce schema design time by approximately 70%, improve lineage documentation completeness by over 80%, and accelerate quality issue detection by 65%. This research contributes practical implementation patterns and evaluation frameworks that enable organizations to adopt LLM-based automation while managing the risks associated with model hallucinations and accuracy limitations.

Published

2022-05-25