Databricks, the data and AI company and pioneer of the data lakehouse paradigm, today announced data lineage for Unity Catalog, significantly expanding data governance capabilities on the lakehouse. Data lineage describes how data flows throughout an organization. Using this new feature of Unity Catalog, customers are able to gain visibility into where data in their lakehouse came from, who created it and when, how it has been modified over time, how it’s being used, and much more. Data lineage for Unity Catalog is now available for preview on AWS and Microsoft Azure.
Organizations deal with an influx of data from multiple sources, and understanding where that data came from, how it’s moving and changing, who has access to it, and how it’s being used is extraordinarily difficult. However, having that understanding is paramount to ensuring trust and assessing risk. With data lineage for Unity Catalog, data teams can see all the downstream consumers impacted by data changes, such as applications, dashboards, machine learning models, and data sets, and easily understand the severity of the impact so they can quickly notify the relevant stakeholders.
Data lineage empowers data consumers, such as data scientists, data engineers, and data analysts, to be context-aware as they perform analyses, resulting in better-quality outcomes. Additionally, data stewards can see which data sets are no longer accessed or have become obsolete and retire unnecessary data, both reducing risk and ensuring end users only use high-quality data. The new capabilities within Unity Catalog give businesses a complete view of the entire data lifecycle so data leaders can understand how data is being collected, whether it was updated, and which processes were used.