Etan Ginsberg, Co-Founder of Martian – Interview Series

Etan Ginsberg is the Co-Founder of Martian, a platform that dynamically routes every prompt to the best LLM. Through routing, Martian achieves higher performance and lower cost than any individual provider, including GPT-4. The system is built on the company’s unique Model Mapping technology that unpacks LLMs from complex black boxes into a more interpretable architecture, making it the first commercial application of mechanistic interpretability.

Etan has been coding, designing websites, and building e-businesses for clients since he was in middle school. A polymath, Etan is a World Memory Championships competitor and placed 2nd at the World Speed Reading Championships in Shenzhen, China.

He is an avid hackathon competitor. Past awards include 3rd prize at TechCrunch SZ, top-7 finalist at the Princeton Hackathon, and 3 industry awards at the Yale Hackathon.

You have previously founded two startups, what were these companies and what did you learn from this experience?

My first company was the first platform for the promotion and advancement of the sport of American Ninja Warrior. Back in 2012, I viewed American Ninja Warrior as an underground sport (akin to MMA in the 90s) and I made the first platform where people could buy blueprints, order obstacles, and find gyms to train. I consulted for companies looking to start their own gyms including assisting the US Special Forces with a training course and scaling a facility from napkin sketch to $300k in revenue in the first 3 months. Although I was in high school, I had my first experience managing teams of 20+ workers and learned about effective management and interpersonal relationships.

My second company was an alternative asset management company I co-founded in 2017, prior to the ICO wave in crypto. This was my first exposure to NLP: we used sentiment analysis of social media data as an investment strategy.

I learned a lot of the hard and soft skills that go into running a startup — from how to manage a team to the technical aspects of NLP. At the same time, I also learned a lot about myself and about what I wanted to work in. I believe that the most successful companies are started by founders who have a broader vision or goal driving them. I left crypto in 2017 to focus on NLP because augmenting and understanding humanity’s intelligence is something that really drives me. I was glad to discover that.

While attending the University of Pennsylvania you did some AI research, what were you researching specifically?

Our research originally focused on building applications of LLMs. In particular, we worked on educational applications of LLMs and were building the first LLM-powered cognitive tutor. The results were pretty good – we saw a 0.3 standard deviation improvement in student outcomes in initial experimentation – and our system has been used from the University of Pennsylvania to the University of Bhutan.

Can you discuss how this research then led you to Co-Founding Martian?

Because we were some of the first people building applications on top of LLMs, we were also some of the first people to encounter the problems people face when they build applications on top of LLMs. That guided our research towards the infrastructure layer. For example, quite early on, we were fine-tuning smaller models on the outputs of larger models like GPT-3, and fine-tuning models on specialized data sources for tasks like programming and math problem solving. That eventually led us to problems about understanding model behavior and about model routing.

The origins of the Martian name and its relationship to intelligence is also interesting, could you share the story of how this name was chosen?

Our company was named after a group of Hungarian-American scientists known as “The Martians”. This group, which lived in the 20th century, was composed of some of the smartest people to have ever lived:

  • The most famous among them was John von Neumann; he invented game theory, the modern computer architecture, automata theory, and made fundamental contributions in dozens of other fields.
  • Paul Erdős was the most prolific mathematician of all time, having published over 1500 papers.
  • Theodore von Kármán established the fundamental theories of aerodynamics and helped found the American space program. The human-defined boundary between Earth and outer space is named the “Kármán line” in recognition of his work.
  • Leo Szilard invented the atomic bomb, radiation therapy, and particle accelerators.

These scientists and 14 others like them (including the inventor of the hydrogen bomb, the man who introduced group theory into modern physics, and fundamental contributors to fields like combinatorics, number theory, numerical analysis and probability theory) shared a remarkable similarity – they all were born in the same part of Budapest. That led people to question: what was the source of so much intelligence?

In response, Szilard joked that, “Martians are already here, and they call themselves Hungarians!” In reality… nobody knows.

Humanity finds itself in a similar position today with respect to a new set of potentially superintelligent minds: Artificial Intelligence. People know that models can be incredibly smart, but have no idea how they work.

Our mission is to answer that question – to understand and harness modern superintelligence.

You have a history of incredible memory feats, how did you get immersed into these memory challenges and how did this knowledge assist you with the concept of Martian?

In most sports, a professional athlete can perform about 2-3X as well as the average person (compare how far an average person can kick a field goal, or how fast they can throw a fastball, to a professional). Memory sports are fascinating because the top athletes can memorize 100x or even 1000x more than the average person, with less training than most sports require. Moreover, these are often people with average natural memory who credit their performance to specific techniques that anyone can learn. I want to maximize humanity’s knowledge, and I saw the world memory championships as an underappreciated insight into how we can drive extraordinary returns in increasing human intelligence.

I wanted to deploy memory techniques throughout the education system, so I started exploring how NLP and LLMs could reduce the setup costs that prevent most effective educational methods from being used in mainstream education. Yash and I created the first LLM-powered cognitive tutor, and that led to us discovering the problems with LLM deployment that we now help solve today.

Martian is essentially abstracting away the decision of what Large Language Model (LLM) to use, why is this currently such a pain point for developers?

It’s becoming easier and easier to create language models – the cost of compute is going down, algorithms are becoming more efficient, and more open source tools are available to create these models. As a result, more companies and developers are creating custom models trained on custom data. As these models have different costs and capabilities, you can get better performance by using multiple models, but it’s difficult to test them all and to find the right ones to use. We take care of that for developers.

Can you discuss how the system understands what LLM is best used for each specific task?

Routing well is fundamentally a problem about understanding models. To route between models effectively, you want to be able to understand what causes them to fail or succeed. Being able to understand these characteristics with model-mapping allows us to determine how well any given model will perform on a request without having to run that model. As a result, we can send that request to the model which will produce the best result.
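The core idea described here, predicting how well each model will handle a request without actually running it, can be sketched as a learned scoring function over cheap request features. The following is a hypothetical illustration only: the model names, features, and scores are assumptions, not Martian's actual model-mapping system.

```python
# Hypothetical sketch of prediction-based routing: estimate each model's
# quality on a request from cheap features, then pick the best model.
# Model names, features, and scores are illustrative assumptions.

def extract_features(prompt: str) -> dict:
    """Cheap surface features; a real router would use learned representations."""
    return {
        "length": len(prompt.split()),
        "has_code": "```" in prompt or "def " in prompt,
    }

def predict_score(model_name: str, features: dict) -> float:
    """Stand-in for a learned predictor of per-model output quality."""
    base = {"small-model": 0.60, "medium-model": 0.75, "large-model": 0.90}[model_name]
    # Illustrative assumption: the small model degrades on long prompts.
    if model_name == "small-model" and features["length"] > 200:
        base -= 0.20
    return base

def route(prompt: str, models: list[str]) -> str:
    """Send the request to the model with the highest predicted quality,
    without ever running the models themselves."""
    features = extract_features(prompt)
    return max(models, key=lambda m: predict_score(m, features))

best = route("Explain quicksort.", ["small-model", "medium-model", "large-model"])
```

The key property this sketch captures is that the routing decision costs only a feature extraction and a cheap prediction, not an inference call to every candidate model.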

Can you discuss the type of cost savings that can be seen from optimizing what LLM is used?

We let users specify how they trade off between cost and performance. If you only care about performance, we can outperform GPT-4 on openai/evals. If you are looking for a specific cost in order to make your unit economics work, we let you specify the max cost for your request, then find the best model within that budget. And if you want something more dynamic, we let you specify how much you’re willing to pay for a better answer – that way, if two models have similar performance but a big difference in cost, we can route you to the less expensive model. Some of our customers have seen up to a 12x decrease in cost.
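The two selection modes described above, a hard cost cap and a dynamic willingness-to-pay, can be sketched as a simple utility calculation. This is a hypothetical illustration: the model names, per-request costs, quality scores, and the `pay_per_quality` parameter are all assumptions, not Martian's actual API or pricing.

```python
# Hypothetical cost-aware model selection. All names and numbers
# below are illustrative assumptions, not real pricing or quality data.

MODELS = [
    # (name, cost per request in USD, predicted quality in [0, 1])
    ("small-model", 0.001, 0.70),
    ("medium-model", 0.004, 0.80),
    ("large-model", 0.030, 0.85),
]

def select(max_cost=None, pay_per_quality=None):
    """Choose a model under a hard cost cap, or by a quality/cost tradeoff.

    max_cost: exclude any model costing more than this per request.
    pay_per_quality: dollars the caller is willing to pay per unit of quality;
    higher values favor stronger, more expensive models.
    """
    candidates = MODELS
    if max_cost is not None:
        candidates = [m for m in MODELS if m[1] <= max_cost]
    if pay_per_quality is not None:
        # Utility = quality expressed in dollars, minus the request cost.
        return max(candidates, key=lambda m: m[2] * pay_per_quality - m[1])[0]
    # Otherwise maximize quality within whatever cap was given.
    return max(candidates, key=lambda m: m[2])[0]
```

With these illustrative numbers, a caller who values quality at $0.05 per unit ends up on the mid-tier model: it scores nearly as well as the large one at a fraction of the cost, which is exactly the situation where routing produces large savings.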

What is your vision for the future of Martian?

Each time we improve our fundamental understanding of models, it results in a paradigm shift for AI. Fine-tuning was the paradigm driven by understanding outputs. Prompting is the paradigm driven by understanding inputs. That single difference in our understanding of models is much of what differentiates traditional ML (“let’s train a regressor”) and modern generative AI (“let’s prompt a baby AGI”).

Our goal is to consistently deliver breakthroughs in interpretability until AI is fully understood and we have a theory of intelligence as robust as our theories of logic or calculus.

To us, this means building. It means creating awesome AI tooling and putting it into people’s hands. It means releasing things which break the mold, which no one has done before, and which – more than anything else – are interesting and useful.

In the words of Sir Francis Bacon, “Knowledge is power”. Accordingly, the best way to be sure that we understand AI is to release powerful tools. In our opinion, a model router is a tool of that kind. We’re excited to build it, grow it, and put it in people’s hands.

This is the first of many tools we’re going to release in the coming months. To discover a beautiful theory of artificial intelligence, to enable entirely new types of AI infrastructure, to help build a brighter future for both man and machine – we can’t wait to share those tools with you.

Thank you for the great interview, readers who wish to learn more should visit Martian.
