Apple AI Researchers Propose ‘Plan-then-Generate’ (PlanGen) Framework To Improve The Controllability Of Neural Data-To-Text Models

On Oct 28, 2021

Source: https://arxiv.org/pdf/2108.13740.pdf

In recent years, advances in neural networks have driven progress in data-to-text generation. However, neural data-to-text models offer little control over the structure of their output, which can be limiting in real-world applications that require more specific formatting.

Researchers from Apple and the University of Cambridge propose a novel Plan-then-Generate (PlanGen) framework to improve the controllability of neural data-to-text models. PlanGen consists of two components: a content planner and a sequence generator. The content planner first predicts the most likely content plan that the output should follow. The sequence generator then produces the output, taking both the input data and the content plan as input.
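The two-stage flow described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `plan_content` and `generate` are hypothetical stand-ins for PlanGen's neural content planner and sequence generator, replaced here by a fixed slot ordering (`SLOT_PRIORITY`, also hypothetical) and a simple template so the example runs on its own. The input row is made-up data.

```python
from typing import Dict, List

# Hypothetical fixed ordering used by the toy planner; PlanGen's planner is a learned model.
SLOT_PRIORITY = ["name", "city", "opened", "capacity"]


def plan_content(record: Dict[str, str]) -> List[str]:
    """Stage 1 (toy content planner): return an ordered list of slot keys to mention."""
    ranked = [key for key in SLOT_PRIORITY if key in record]
    # Slots the toy ordering does not know about go last, alphabetically.
    rest = sorted(key for key in record if key not in SLOT_PRIORITY)
    return ranked + rest


def generate(record: Dict[str, str], plan: List[str]) -> str:
    """Stage 2 (toy sequence generator): verbalize the record in the order given by the plan."""
    phrases = [f"{key}: {record[key]}" for key in plan if key in record]
    return "; ".join(phrases) + "."


def plan_then_generate(record: Dict[str, str]) -> str:
    """Run both stages: predict a plan, then generate conditioned on the data and the plan."""
    plan = plan_content(record)
    return generate(record, plan)


if __name__ == "__main__":
    row = {"capacity": "30,000", "name": "Example Arena", "city": "Springfield"}
    print(plan_then_generate(row))              # planner-chosen order
    print(generate(row, ["capacity", "name"]))  # a user-supplied plan changes the structure
```

Passing a different plan to `generate` reorders (or drops) the mentioned fields, which is the kind of structural control the framework is designed to provide; in PlanGen both stages are learned rather than rule-based.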

To further improve the controllability of the PlanGen model, the research group also proposes a structure-aware reinforcement learning objective that encourages the generated output to follow the structure of the content plan. The content plan itself is represented as an ordered list of tokens, chosen for its simplicity and broad applicability. For tabular data, each token in the content plan is a slot key from the table; for graph data stored in RDF format, each token represents the predicate of a triple.
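As a rough illustration of the plan format described above, the sketch below builds a content plan as an ordered list of tokens: slot keys for a table and predicates for RDF triples. The function names and the example data are hypothetical. `plan_follow_score` is likewise a hypothetical proxy for "how closely an output follows its plan" (the fraction of planned values found scanning left to right, each after the previous match); it is not the reward used in the paper's structure-aware reinforcement learning objective.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def table_plan(slots: Sequence[Tuple[str, str]]) -> List[str]:
    """Tabular input: each plan token is a slot key from the table."""
    return [key for key, _ in slots]


def rdf_plan(triples: Sequence[Triple]) -> List[str]:
    """RDF graph input: each plan token is the predicate of a triple."""
    return [predicate for _, predicate, _ in triples]


def plan_follow_score(text: str, values_in_plan_order: Sequence[str]) -> float:
    """Hypothetical proxy, not PlanGen's reward: fraction of planned values that
    appear in `text` in plan order, using a greedy left-to-right scan."""
    if not values_in_plan_order:
        return 1.0
    position, matched = 0, 0
    for value in values_in_plan_order:
        found = text.find(value, position)
        if found != -1:
            matched += 1
            position = found + len(value)  # next value must appear after this one
    return matched / len(values_in_plan_order)


if __name__ == "__main__":
    triples = [
        ("Example_Bridge", "location", "Riverton"),
        ("Example_Bridge", "architect", "J. Doe"),
    ]
    print(rdf_plan(triples))
    objects = [obj for _, _, obj in triples]
    print(plan_follow_score("Example Bridge is in Riverton and was designed by J. Doe.", objects))
    print(plan_follow_score("Example Bridge, designed by J. Doe, is in Riverton.", objects))
```

The second sentence mentions both facts but in the opposite order to the plan, so the greedy scan credits only one of them; a learned objective would typically be more nuanced than this simple check.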

Source: https://arxiv.org/pdf/2108.13740.pdf

The researchers validated the proposed model on two benchmarks with different data structures: the ToTTo dataset (tabular data) and the WebNLG dataset (graph data). Both human and automatic evaluations show that the model outperforms previous state-of-the-art approaches, and that its outputs have highly controllable structure while maintaining generation quality.

Key takeaways:

The research group proposed the PlanGen framework, which consists of a content planner and a sequence generator.
The researchers validated their proposed model on two benchmarks with different data structures: the ToTTo dataset (tabular data) and the WebNLG dataset (graph data).
The research group conducted an in-depth analysis revealing the merits of the proposed approach in terms of controllability and diversity.

Paper: https://arxiv.org/pdf/2108.13740.pdf

Github: https://github.com/yxuansu/plangen

Dataset: https://github.com/google-research-datasets/ToTTo

