Late 2022 and 2023 so far have been a great time for building AI applications, made possible by a series of AI advancements from non-profit researchers. Here is a list of them:
ALiBi
ALiBi (Attention with Linear Biases) is a method that efficiently tackles the problem of text extrapolation in Transformers: handling text sequences at inference time that are longer than those the model was trained on. ALiBi is simple to implement, does not affect runtime or require extra parameters, and lets models extrapolate by changing only a few lines of existing transformer code.
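As a rough illustration, here is a minimal sketch of the ALiBi idea in PyTorch: instead of adding positional embeddings, a distance-proportional penalty is added to the attention logits. The slope schedule below follows the paper's geometric rule for head counts that are powers of two; shapes and names are illustrative, not taken from any particular codebase.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # One slope per head: 2^(-8/n), 2^(-16/n), ... for n heads (power of two)
    slopes = torch.tensor([2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = torch.arange(seq_len)
    # distances[i, j] = j - i: zero on the diagonal, negative for earlier keys
    distances = positions[None, :] - positions[:, None]
    # Bias of shape (num_heads, seq_len, seq_len), added to attention logits;
    # positions after i (positive entries) are removed by the causal mask anyway.
    return slopes[:, None, None] * distances[None, :, :]

scores = torch.randn(8, 128, 128)        # (heads, queries, keys) attention logits
scores = scores + alibi_bias(8, 128)     # a few added lines, no new parameters
```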
Scaling Laws of RoPE-based Extrapolation
This work presents a framework that enhances the extrapolation capabilities of transformers. The researchers found that fine-tuning a Rotary Position Embedding (RoPE) based LLM with either a smaller or a larger rotary base, while keeping the pre-training context length, can lead to better extrapolation performance.
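To make the knob concrete, here is a minimal sketch of RoPE with an adjustable base, the quantity this line of work tunes before fine-tuning. The default base of 10000 and the function names follow common open-source RoPE implementations, not this paper's code, and the specific base values at the end are purely illustrative.

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # Inverse frequencies base^(-2i/d); the base controls how fast angles grow with position
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)            # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # x: (seq_len, head_dim); rotate each (even, odd) pair by a position-dependent angle
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(2048, 64)
q_small_base = apply_rope(q, rope_angles(64, 2048, base=500.0))      # smaller rotary base
q_large_base = apply_rope(q, rope_angles(64, 2048, base=1000000.0))  # larger rotary base
```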
FlashAttention
Transformers are powerful models for processing textual information, but they require a large amount of memory when working with long text sequences. FlashAttention is an IO-aware exact attention algorithm that reduces reads and writes between levels of GPU memory, training transformers faster than existing baselines.
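One common way to use the algorithm without the standalone flash-attn package is through PyTorch's scaled_dot_product_attention, which can dispatch to a FlashAttention kernel on supported GPUs. The kernel-selection context manager below exists in PyTorch 2.x but its name has changed across versions, so treat this as a sketch rather than the canonical usage.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) in half precision on a CUDA device
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)

# Ask PyTorch to use only the flash backend; it errors out if unsupported on this setup
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```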
Branchformer
Conformers (a convolution-augmented variant of Transformers) are very effective in speech processing. They apply convolution and self-attention layers sequentially, which makes the architecture hard to interpret. Branchformer is a flexible and interpretable encoder alternative that uses parallel branches to model dependencies in end-to-end speech-processing tasks, as sketched below.
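The sketch below shows the parallel-branch idea in simplified form: a self-attention branch for global context and an MLP-style branch standing in for the convolutional gating MLP used for local context, with their outputs merged. Layer sizes and the simple averaging merge are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class BranchformerBlockSketch(nn.Module):
    def __init__(self, d_model: int = 256, num_heads: int = 4):
        super().__init__()
        self.norm_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm_local = nn.LayerNorm(d_model)
        # Stand-in for the convolutional gating MLP branch of the real Branchformer
        self.local_branch = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.norm_attn(x)
        global_out, _ = self.attn(a, a, a)                 # global-context branch
        local_out = self.local_branch(self.norm_local(x))  # local-context branch
        return x + 0.5 * (global_out + local_out)          # merge branches, residual connection

x = torch.randn(2, 100, 256)          # (batch, frames, features)
y = BranchformerBlockSketch()(x)
```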
Latent Diffusion
Although Diffusion Models achieve state-of-the-art performance in numerous image processing tasks, they are computationally very expensive, often consuming hundreds of GPU days. Latent Diffusion Models are a variation of Diffusion Models that run the diffusion process in a compressed latent space rather than in pixel space, achieving high performance on various image-based tasks while requiring significantly fewer resources.
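Stable Diffusion is a widely used latent diffusion model, and the Hugging Face diffusers library exposes it through a pipeline API. The checkpoint ID and prompt below are just one example; any latent-diffusion checkpoint works the same way.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a public latent diffusion checkpoint in half precision on the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Diffusion runs in a compressed latent space; only the final latents are decoded to pixels
image = pipe("a watercolor painting of a lighthouse at dawn", num_inference_steps=30).images[0]
image.save("lighthouse.png")
```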
CLIP-Guidance
CLIP-Guidance is a method for text-to-3D generation that does not require large-scale labelled datasets. It works by taking guidance from a pretrained vision-language model such as CLIP, which can associate text descriptions with images; the researchers use this signal to steer generation so that the output matches a text description of the 3D object.
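Here is a minimal sketch of what CLIP guidance means in code: a frozen CLIP model scores how well a rendered view matches the text prompt, and that score is used as a loss to optimize the underlying representation. The model ID and loss form are common choices rather than the setup of any single paper, and in a real pipeline the image preprocessing would have to stay differentiable for gradients to reach the renderer.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_guidance_loss(rendered_image, prompt: str) -> torch.Tensor:
    # rendered_image: an image (e.g. a rendered view of the 3D object being optimized)
    inputs = processor(text=[prompt], images=rendered_image, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    # Lower loss = rendered view agrees more with the text description
    return 1.0 - (image_emb * text_emb).sum(dim=-1).mean()
```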
GPT-NeoX
GPT-NeoX is an autoregressive language model consisting of 20B parameters. It performs reasonably well on various knowledge-based and mathematical tasks. Its model weights have been made publicly available to promote research in a wide range of areas.
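The released weights can be loaded through the Hugging Face transformers library. Note that the 20B checkpoint needs tens of GB of memory; device_map="auto" lets the accelerate backend shard it across available devices. The prompt is just an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", device_map="auto")

inputs = tokenizer("The three laws of thermodynamics state that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```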
QLoRA
QLoRA is a fine-tuning approach that dramatically reduces memory usage, making it possible to fine-tune a 65-billion-parameter model on a single 48 GB GPU while preserving full 16-bit fine-tuning task performance. With QLoRA fine-tuning, models can achieve state-of-the-art results, surpassing previous SoTA models even with smaller architectures.
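Here is a hedged sketch of a QLoRA-style setup with the Hugging Face stack: the base model is loaded in 4-bit NF4 precision and small LoRA adapters are trained on top. The model ID and hyperparameters are illustrative, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # double quantization to save extra memory
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in 16-bit while weights stay 4-bit
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(r=64, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                         lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)   # only the small adapter weights are trainable
model.print_trainable_parameters()
```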
RWKV
The Receptance Weighted Key Value (RWKV) model is a novel architecture that combines the strengths of Transformers and Recurrent Neural Networks (RNNs) while bypassing their key drawbacks. RWKV gives performance comparable to Transformers of similar size, paving the way for more efficient models in the future.
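The sketch below shows, in a numerically naive form, the "WKV" recurrence at the heart of RWKV: attention-like mixing computed as an RNN in time linear in sequence length, so inference needs no growing key/value cache. The per-channel decay w and current-token bonus u are learned in the real model, which also uses a numerically stabilized version of this loop.

```python
import torch

def wkv_recurrence(k: torch.Tensor, v: torch.Tensor, w: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    # k, v: (seq_len, channels); w, u: (channels,)
    seq_len, channels = k.shape
    out = torch.empty_like(v)
    num = torch.zeros(channels)   # running weighted sum of past values
    den = torch.zeros(channels)   # running sum of past weights
    for t in range(seq_len):
        e_k = torch.exp(k[t])
        # Mix past state with the current token (boosted by u), like a softmax average
        out[t] = (num + torch.exp(u) * e_k * v[t]) / (den + torch.exp(u) * e_k)
        # Decay the past and fold in the current token for the next step
        num = torch.exp(-w) * num + e_k * v[t]
        den = torch.exp(-w) * den + e_k
    return out

out = wkv_recurrence(torch.randn(16, 8), torch.randn(16, 8), torch.rand(8), torch.zeros(8))
```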
All credit for this research goes to the researchers of these individual projects.