
JAKARTA - Databricks, a San Francisco-based startup valued at 38 billion US dollars (Rp564 trillion), on Wednesday 12 April released a dataset that it says businesses and researchers can use to train chatbots similar to ChatGPT.

The data, based on questionnaires filled out by Databricks employees, fills an important gap in the company's efforts to create a commercially usable tool for training AI systems that could offer an alternative to Microsoft-backed OpenAI.

Databricks said it had spent the last few weeks collecting 15,000 questions and answers from 5,000 employees in 40 countries and then checking the data for quality, an effort that CEO Ali Ghodsi estimated cost millions of dollars.

Databricks sells software for building AI systems.

Ghodsi told Reuters that the company released the training data for free in the hope that other companies would use it to create their own AI systems, perhaps using Databricks software to do so.

The free dataset follows Databricks' release last month of Dolly, an open-source large language model, as a technology base for chatbots. However, that model cannot be used in commercial products because the data used to train it was generated by OpenAI's ChatGPT, and OpenAI prohibits the use of that data to develop commercial AI systems that compete with it.

Using data generated by AI to train other AI systems has become common. New chatbots published this year by Stanford University and the University of California, Berkeley, for example, use machine-generated data from ChatGPT. Both institutions state that their models cannot be used for commercial purposes.

Ghodsi admitted that the dataset is still far from perfect because it draws only on Databricks employees, who tend to be male. Users will be able to check the training data themselves, something that cannot be done for models like ChatGPT or Alphabet Inc's Bard, whose training data has not been released.

"We don't claim that this is an unusual dataset," said Ghodsi. "We're just trying to push the community toward greater transparency, with more people having their own models rather than just a few that we have to trust."

