In the six months since the launch of ChatGPT, the world has woken up to the enormous potential of AI, and companies have flocked to the new technology en masse. For example, the use of APIs from online LLMs such as ChatGPT increased by 1310% between November 2022 and May 2023. That is one of the findings of Databricks' new research report, The State of Data and AI 2023. Another finding: we are at the dawn of a golden age of data and AI.
For this study, Databricks analyzed anonymized usage data from more than 9,000 customers to discover trends in data and AI usage. The analysis shows a steady increase in AI usage from February 2022, with an explosion following the launch of ChatGPT.
In addition to LLM usage, Databricks sees that natural language processing (NLP) has become more popular: some 49% of Python data science library usage relates to NLP. Transformer models also remain widely used, although their usage grew by "only" 82% between November 2022 and May 2023.
All this AI usage results in organizations deploying more models than ever. The number of machine learning models in use increased 411% year on year, and the number of machine learning experiments increased 54% year on year.
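Percentage increases of this size are easy to misread: a 1310% increase does not mean 13.1 times the starting level, but 14.1 times, because the original 100% still counts. The short Python sketch below converts the growth figures quoted above into multipliers. The function name `growth_multiplier` and the dictionary labels are illustrative and do not come from the report.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiple of the starting value."""
    # An increase of p% leaves you with the original 100% plus p%.
    return 1 + percent_increase / 100


# Growth figures as quoted in this article (labels are shorthand, not report terms).
reported_increases = {
    "LLM API usage (Nov 2022 to May 2023)": 1310,
    "Transformer model usage (Nov 2022 to May 2023)": 82,
    "ML models in use (year on year)": 411,
    "ML experiments (year on year)": 54,
}

for label, pct in reported_increases.items():
    multiple = growth_multiplier(pct)
    print(f"{label}: +{pct}% is {multiple:.2f}x the starting level")
```

Read this way, the 54% rise in experiments corresponds to roughly one and a half times the earlier level, while LLM API usage ended the period at about fourteen times where it started.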
Open source paves the way for data and AI
Open source software is incredibly popular in the world of data and AI: eight of the ten most widely used data and AI products are based on open source. Databricks sees the fastest-growing adoption for dbt, a data transformation tool, followed by Fivetran and Informatica (the only two enterprise software solutions on the list). The most widely used solutions in absolute numbers are Microsoft Power BI, Plotly and Tableau.
Finally, more and more companies are switching to a Lakehouse solution. For example, data volume in Delta Lake grew 304% year over year, and 61% of new Lakehouse users migrated from on-prem and cloud-based data warehouses.