Researchers from Yale and Google DeepMind Unlock Math Problem-Solving Success with Advanced Fine-Tuning Techniques on Large Language Models
Even the most advanced large language models (LLMs), such as GPT-4 and PaLM 2, find mathematical problems difficult, since these require creativity, mathematical reasoning, and computation. However, the chance of an LLM finding a correct answer is considerably higher when it is permitted to attempt the problem many times, so LLMs already demonstrate the potential to improve on this math problem-solving challenge. For instance, the pre-trained PaLM 2-L reaches about 33.4% accuracy with greedy decoding, yet when 64 solutions are sampled with temperature sampling, at least one of them is correct 79.4% of the time (pass@64; Table 1).
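The gap between greedy accuracy and pass@64 can be made concrete with the standard unbiased pass@k estimator (the numbers below are illustrative, not figures from the paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    solutions drawn (without replacement) from n generated samples, of
    which c are correct, is correct."""
    if n - c < k:
        # Every size-k draw must contain at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: if roughly a third of 256 samples are correct,
# pass@1 sits near 33%, while pass@64 is essentially 1.0.
print(pass_at_k(256, 85, 1))   # ~0.332
print(pass_at_k(256, 85, 64))  # very close to 1.0
```

This is why sampling many candidates and then *selecting* well among them (the focus of the reranking methods below) can recover so much accuracy.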
Table 1: Supervised solution fine-tuning results, comparing two sources of training data: the MATH dataset and the PRM800K dataset.
This significant performance disparity shows that LLMs can generate accurate answers but have difficulty distinguishing correct solutions from erroneous ones. To narrow the gap described above, the researchers investigate task-specific fine-tuning techniques that might enhance the LLM's capacity for both solution generation and solution evaluation.
They examine three fine-tuning techniques:
(1) Supervised step-by-step solution fine-tuning (SSFT). As a baseline, they study whether pre-trained LLMs benefit from a supervised fine-tuning step, tuning the models to produce the complete step-by-step solution and final answer.
(2) Solution-cluster reranking (SCR). To improve the LLM's ability to evaluate solutions, they continue fine-tuning the generator as a solution evaluator for reranking candidate solutions. While earlier research has explored such sample-and-rerank approaches, they offer a novel method that combines the advantages of majority voting with reranking while lowering ranking costs. More precisely, as in the first stage of majority voting, they first group the candidate answers into clusters based on their mathematical equivalence. Then, to improve on the majority-vote result, they apply the solution evaluator only to the solutions in the most frequent clusters.
(3) Sequential multi-task fine-tuning. Beyond the solution evaluation task, they are also interested in improving the LLM's performance on solution generation, and in determining whether the evaluation task's training objective can help the model generate better solutions.
To achieve this, they provide a sequential multi-task learning environment where the solution assessment task is framed as a natural language generation problem, such that its training goal may offer a valuable supervision signal to the solution generation model. In further detail, they adjust the model in three stages: (1) as a generator (SSFT), (2) as a solution evaluator (SCR), and (3) again as a generator (SSFT).
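The three-stage schedule can be sketched as a simple training pipeline. Here `finetune` is a hypothetical helper that runs supervised fine-tuning on (input, target) text pairs; the key point is that the evaluation task is cast as text generation (e.g., emitting a correctness label), so all three stages use the same training interface.

```python
def sequential_multitask_finetune(model, solution_data, eval_data, finetune):
    """Three-stage schedule: generator (SSFT) -> evaluator -> generator (SSFT).

    solution_data: list of (problem, step_by_step_solution) pairs
    eval_data:     list of (problem, candidate_solution, label) triples,
                   where label is a natural-language judgment like "correct"
    finetune:      hypothetical helper running supervised fine-tuning on
                   (input_text, target_text) pairs
    """
    # Stage 1: SSFT — learn to generate step-by-step solutions.
    model = finetune(model, [(q, sol) for q, sol in solution_data])
    # Stage 2: evaluation framed as generation — predict a correctness label.
    model = finetune(model, [(q + "\n" + cand, label)
                             for q, cand, label in eval_data])
    # Stage 3: SSFT again — return to solution generation.
    model = finetune(model, [(q, sol) for q, sol in solution_data])
    return model
```

Framing evaluation as natural language generation is what lets its supervision signal flow through the same objective the generator is trained with.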
They conduct extensive experiments with PaLM 2-S* and PaLM 2-L, the small and large variants of PaLM 2, on the challenging MATH dataset, reaching the following conclusions:
• The quality and style of the step-by-step solutions can significantly influence the fine-tuned model, since SSFT benefits most from fine-grained, well-formatted solutions.
• Reranking only the most common solution clusters can result in better performance than reranking all of the solutions, and it can also improve computational efficiency, which is why they think it would be a better standard practice for future work.
• They demonstrate the benefit of training the model for both solution generation and evaluation tasks and present a successful attempt at leveraging the learning signal of a binary evaluation task for a generation model. Their proposed multi-task sequential fine-tuning can more effectively improve the performance of the solution generation model compared with supervised solution fine-tuning only.
Check out the Paper. All credit for this research goes to the researchers on this project.