Databricks strengthens capabilities for AI, data management and governance within Lakehouse Platform
Databricks, the data and AI company, announced new features at its Data + AI Summit that enable organisations to harness all their data and enhance their AI capabilities. These features are designed to break down silos, increase efficiency and accelerate organisations' AI journey.
"We are transforming Databricks into the most open and flexible lakehouse platform for data, analytics and AI. In doing so, we help our customers unify all their data, no matter where it resides and no matter the format," said Matei Zaharia, co-founder and chief technologist at Databricks.
LakehouseIQ: secure, AI-driven access to all your data
Many companies still struggle to give employees direct access to relevant internal data, because they have only a limited number of overburdened data scientists and their overall data models are insufficient.
LakehouseIQ addresses this problem with an interface that learns from a company's unique datasets, organisational structure and jargon. Every employee gains access to internal data to make informed decisions and drive innovation, without requiring specialised technical skills. In the process, LakehouseIQ interprets the intent behind each search to generate the necessary insights. LakehouseIQ is powered by Unity Catalog, so users can reach only the data they are authorised to access, which mitigates security and compliance risks.
Unity Catalog offers better discoverability of fragmented data
Data is often scattered across disparate operational and analytical systems within an organisation. This makes it difficult for data teams to discover all available information and hampers compliance teams in maintaining consistent governance. Moreover, merging this data is costly and time-consuming, as integration processes rely on complex data engineering.
New and upcoming capabilities in Databricks' Unity Catalog address these issues. A new query federation interface allows users to find, secure, audit and share all organisational data from a single system, with optimised query performance across multiple platforms. Unity Catalog also provides consistent governance for access to all registered datasets, including data that lives outside Databricks. In the future, users will be able to define a single data access policy and enforce it across multiple platforms. Finally, the recently announced Hive Metastore Interface allows all Apache Hive-compatible software to connect to Unity Catalog, further simplifying data management and governance across multiple platforms.
Lakehouse AI accelerates generative AI transformation
Demand for generative AI is high, but data processes are complex and unreliable when data and AI platforms are separate. To help overcome this challenge, Databricks introduces Lakehouse AI. This solution unifies data and AI platforms and enables customers to develop their generative AI solutions more successfully and quickly by bringing together data, AI models, LLMOps, monitoring and governance.
Several newly announced capabilities support this. For instance, Vector Search helps manage and edit vector embeddings from Unity Catalog, and allows developers to add query filters to make generative AI answers more reliable. Furthermore, Databricks AutoML now allows customers to securely fine-tune LLMs on their own datasets, giving them ownership of the resulting models. These models can then be easily shared, monitored and controlled via MLflow, Unity Catalog and Model Serving integrations. Finally, Databricks Marketplace provides a curated list of open source models with optimised Lakehouse AI capabilities such as Databricks Model Serving. This results in peak performance and cost optimisation for generative AI use cases.
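To make the idea of query filters on a vector search more concrete, the following is a minimal sketch in plain Python. It does not use the Databricks Vector Search API; the names `INDEX`, `cosine_similarity` and `filtered_search`, as well as the sample documents and the `department` metadata field, are hypothetical and chosen for illustration only. The sketch assumes that each stored embedding carries metadata, and that a filter restricts the candidates before they are ranked by similarity.

```python
import math

# Hypothetical in-memory index: each entry has an id, an embedding and metadata.
INDEX = [
    {"id": "doc-1", "embedding": [1.0, 0.0], "metadata": {"department": "finance"}},
    {"id": "doc-2", "embedding": [0.9, 0.1], "metadata": {"department": "hr"}},
    {"id": "doc-3", "embedding": [0.0, 1.0], "metadata": {"department": "finance"}},
]


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_search(index, query_vector, filters, top_k=2):
    """Keep only entries whose metadata matches every filter, then rank by similarity."""
    candidates = [
        row for row in index
        if all(row["metadata"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda row: cosine_similarity(row["embedding"], query_vector),
        reverse=True,
    )
    return [row["id"] for row in ranked[:top_k]]


if __name__ == "__main__":
    # Without a filter, the closest documents come from any department.
    print(filtered_search(INDEX, [1.0, 0.0], filters={}))
    # With a filter, only finance documents are considered.
    print(filtered_search(INDEX, [1.0, 0.0], filters={"department": "finance"}))
```

In this sketch the filter narrows the search to documents a given answer should draw on, which is one way such filters could improve the reliability of generated answers. A production system would use an approximate nearest-neighbour index rather than a full scan.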