Research From University Of Innsbruck Austria Reveals ChatGPT's Weaknesses In Understanding Time Related
JAKARTA - A research team from the University of Innsbruck in Austria has developed a method to determine how well artificial intelligence (AI) systems can understand 'temporal validity,' a parameter that could have a significant impact on the use of generative AI products such as ChatGPT in the fintech sector.
Temporal validity refers to how relevant one statement remains to another statement over time. In essence, it is the time-based value of a paired statement.
An AI evaluated on its ability to predict temporal validity is given a series of statements and asked to choose the one most closely related in time.
In a recently published pre-print research paper entitled "Temporal Validity Change Prediction," Georg Wenzel and Adam Jatowt used the example of a statement in which a person is said to be reading a book on a bus.
In that example, the most valid context statement is "I only have a few more pages left, then I'm done." Because the target statement describes a bus passenger who is currently reading, the other two candidate statements are deemed irrelevant.
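The task format described above can be pictured as a small data structure. What follows is only an illustrative sketch in Python, not the researchers' actual dataset schema: the class name, the field names, the wording of the target statement and the two placeholder distractors are all assumptions made for illustration. Only the quoted context statement comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class TemporalValiditySample:
    """One hypothetical benchmark item: a target plus candidate context statements."""
    target: str            # the statement whose temporal validity is in question
    candidates: list[str]  # context statements the model must choose between
    label: int             # index of the candidate annotated as most valid


# The target wording and both distractors are placeholders (assumptions);
# the first candidate is the context statement quoted in the article.
sample = TemporalValiditySample(
    target="I am reading a book on the bus.",
    candidates=[
        "I only have a few more pages left, then I'm done.",
        "<hypothetical distractor statement A>",
        "<hypothetical distractor statement B>",
    ],
    label=0,
)


def is_correct(item: TemporalValiditySample, predicted_index: int) -> bool:
    """Return True when the model picked the annotated candidate."""
    return predicted_index == item.label
```

A model under evaluation would receive the target and the candidates, return an index, and be scored with a check like `is_correct`.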
The researchers created a labelled dataset of training examples, which they used to build a benchmarking task for large language models (LLMs). They chose ChatGPT as the baseline model for testing because of its popularity among users and found that it was outperformed by less general models.
"ChatGPT ranks among the lower-performing models, which is consistent with other studies of TCS understanding. Its shortcomings may be due to the few-shot learning approach and a lack of knowledge of the specific characteristics of the dataset," the researchers said.
This suggests that situations where temporal validity plays a role in determining usefulness or accuracy, such as producing news articles or evaluating financial markets, will likely be handled better by targeted AI models than by more general services such as ChatGPT.
The researchers also showed that experimenting with temporal value change prediction during an LLM's training cycle could potentially lead to higher scores on the temporal change benchmarking task.
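To make the idea of a benchmark score concrete, here is a minimal sketch of how accuracy on a labelled set of such items might be computed. It assumes a simple accuracy metric, which the article does not confirm the paper uses; the `accuracy` function, the `always_first` predictor and the toy samples are all hypothetical and do not represent the researchers' data or results.

```python
from typing import Callable, Sequence

# A sample is (target statement, candidate context statements, index of labelled answer).
Sample = tuple[str, list[str], int]


def accuracy(samples: Sequence[Sample],
             predict: Callable[[str, list[str]], int]) -> float:
    """Fraction of samples where the predictor picks the labelled candidate."""
    if not samples:
        raise ValueError("need at least one labelled sample")
    correct = sum(
        1 for target, candidates, label in samples
        if predict(target, candidates) == label
    )
    return correct / len(samples)


def always_first(target: str, candidates: list[str]) -> int:
    """Trivial stand-in predictor (assumption): always chooses the first candidate."""
    return 0


# Toy placeholder data, not taken from the paper.
toy_samples: list[Sample] = [
    ("target 1", ["context a", "context b", "context c"], 0),
    ("target 2", ["context a", "context b", "context c"], 2),
]

score = accuracy(toy_samples, always_first)
```

Replacing `always_first` with a call to a general-purpose or a specialised model would allow the kind of side-by-side comparison the article describes.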
While the paper does not specifically discuss implications beyond the experiment itself, one current limitation of generative AI systems is their inability to distinguish between past and present events in a corpus of text.
Teaching these systems how to determine the most relevant statements across a corpus, with timeliness as a determining factor, could revolutionize the ability of AI models to make strong real-time predictions in large-scale sectors such as the crypto and stock markets.