Jay Mishra is the Chief Operating Officer (COO) at Astera Software, a rapidly growing provider of enterprise-ready data solutions. Astera helps business users bridge the data-to-insight gap with a suite of user-friendly yet high-performance data extraction, data quality, data integration, data warehousing, and electronic data interchange solutions, used by both midsize and Fortune 500 companies across a range of industries.
What initially attracted you to computer science?
I come from a mathematics background. In fact, my undergraduate degree is in Mathematics and Computer Science. I have been fascinated with mathematics from the beginning, and getting into computer science felt like a natural extension of logic and mathematics. That's how I got my undergraduate education. I then found certain areas of computer science very attractive, such as the way algorithms work, particularly advanced algorithms. I wanted to specialize in that area, and that's how I got my Master's in Computer Science with a specialty in algorithms. Since then it's been a very close relationship; I still keep myself updated on what is going on in the field.
You’re currently the COO of Astera, could you share with us what your day-to-day role entails?
My official title is COO. We are in growth mode, but we have been building our products for a long time, and I have been involved from the beginning in all the different areas of the company: building the product, which means actually coding it, making sure the features meet customers' requirements, working closely with customers, and then sales and marketing as well. My role is an extension of all of that.
I have had my hands in pretty much all areas from the beginning, and at this point it of course includes other responsibilities, such as ensuring the company is meeting its revenue goals and that we are adding the right features and the right products to expand our market. That is additional responsibility on top of the core responsibility of building the product and taking it to market.
For readers who are unfamiliar with this term, what is data warehousing?
Data warehousing is an architectural pattern used to bring all of your enterprise data together so that you have one place from which you can generate any kind of analytics, reports, or dashboards that present a true picture of where your business stands, and also forecast how the business is going to do in the future. To cater to all of that, you bring your data together in a certain way, and that architecture is called a data warehouse.
The term is actually taken from a real-life warehouse, where you bring in your products, you have shelves, and you organize the products for storage. In the data world, you are bringing your data from various sources: your production systems, your website, your customers, your sales and marketing, your finance department, your human resources department. You bring all of that data together into one place, and that is what is called a data warehouse. It is designed in a certain way so that reporting, especially reporting based on a timeline, is easy. That is the core purpose of a data warehouse.
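As a rough illustration of that idea (not Astera's implementation), the sketch below uses pandas and hypothetical source tables to show how data from separate departments might be conformed to a shared date key and consolidated into a single daily fact table suitable for timeline-based reporting.

```python
import pandas as pd

# Hypothetical extracts from two departments (assumed schemas, for illustration only).
sales = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-06"],
    "customer_id": [101, 102],
    "amount": [250.0, 80.0],
})
marketing = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-06"],
    "campaign_spend": [40.0, 55.0],
})

# Conform both sources to a shared date key, as a warehouse load would.
sales["date_key"] = pd.to_datetime(sales["order_date"]).dt.date
marketing["date_key"] = pd.to_datetime(marketing["date"]).dt.date

# A simple daily fact table: revenue and spend side by side, ready for reporting.
daily_sales = sales.groupby("date_key", as_index=False)["amount"].sum()
fact_daily = daily_sales.merge(
    marketing[["date_key", "campaign_spend"]], on="date_key", how="outer"
)
print(fact_daily)
```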
What are some of the key trends in data warehousing today?
Data warehousing has evolved quite a bit in the past 20-25 years. About 10 years ago, automated data warehousing, that is, using software products to build data models, build data warehouses, and populate them, got started, and it has accelerated quite a bit in the recent past, I would say going back two to three years. The focus is on automation. We already know the patterns; they have been around for a long time and they are repetitive. There are a lot of repetitive tasks, and automation's goal is to free users from that repetition so they don't have to spend time doing similar tasks again and again. Since the patterns are already defined, you can use automation tools to take care of them, and that brings down the amount of time and resources spent on building and maintaining a data warehouse. Automation has been a key trend in the past few years, and it ranges from the design and building of a data warehouse to loading and maintaining it; all of that can be automated.
Our product is one of those that can do the entire automation, including the ETL pipelines, data modeling, and loading data into your star schemas or data vault automatically, and also maintaining it using CDC. That has been one of the key trends, and one of the most recent ones is the addition of artificial intelligence, specifically generative AI, to make automation even better. AI can help with the configuration of your data warehousing artifacts and your pipelines, and with the points where the user has to decide which way to go and which way not to go. Those decision-making points can be catered to using artificial intelligence, and we have seen a lot of intersection between artificial intelligence and data warehousing in the recent past, I would say going back about a year or so.
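For a sense of what "maintaining it using CDC" involves, here is a minimal, generic sketch (not Astera's engine) of applying a batch of change-data-capture events as deletes and upserts against a target table. The table and column names are invented for the example.

```python
import pandas as pd

# Current state of a hypothetical dimension table in the warehouse.
target = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cara"],
}).set_index("customer_id")

# Simulated CDC feed: each row is a change captured from the source system.
changes = pd.DataFrame({
    "customer_id": [2, 3, 4],
    "name": ["Robert", None, "Dina"],
    "op": ["update", "delete", "insert"],
})

# Apply deletes first, then merge inserts/updates keyed on customer_id.
deletes = changes.loc[changes["op"] == "delete", "customer_id"]
target = target.drop(index=deletes, errors="ignore")

upserts = (changes[changes["op"].isin(["insert", "update"])]
           .set_index("customer_id")[["name"]])
target = upserts.combine_first(target)  # upserted rows win; untouched rows remain
print(target)
```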
What are the four fundamental principles that businesses should consider for their data warehouse development?
- What kind of data do you need?
- Architectural patterns
- Toolsets
- Team
Why do companies need a modern data stack?
It depends on how we define modern, and that keeps changing by the year, the month, and even by the day now. I would say modern toolsets are those designed with the requirements of the new kind of data we are receiving in view, and that has changed in the past few years. The volume, of course, has changed; we have big data now. The data being produced by your e-commerce websites, your production databases, and the data going to different areas of your business is changing in nature. Earlier it used to be mostly structured data; now a lot of unstructured data is coming into play. So that is changing, and the velocity of the data is changing as well.
How quickly the data is being generated, how quickly it is arriving and being made available for use: since the nature of the data is changing, we have to keep looking at modern toolsets that are able to address those changes.
The new, or modern, data stack is designed to handle all the variations in the structure and velocity of the data. It is able to account for the new architectural patterns that have come up in the past few years, and it addresses the advancements in general that are happening around the data world.
If you want to make the best use of your data, you have to look at modernizing your data stack; that is the only way to keep up with the new data challenges.
Second, we have seen that people sometimes treat building a solution as a one-time effort, but the nature of data is that it keeps changing. You have to keep watching it, see the changes happening in the data, and respond to them, and with existing solutions you may not be able to do that. You have to keep up with the advancements and keep adding to your stack.
What are some of the current data management challenges that are seen in the industry?
- Speed
- Varying data formats
- Data publishing
What are some ways that Astera has integrated AI into customer workflows?
- Using Gen AI to enhance usability
- AI integration in RM and other modules
- AI functionality as a toolset
What are some of the best practices to leverage AI and ML models in data management for large companies?
This area of large language models is still evolving, and evolving very rapidly, and we were early adopters in this area. We have tried to use generative AI to enhance the usability of our own product and to cater to certain use cases. Internally we are using OpenAI, and now we are going with Llama and other large language models as well, using low-rank adaptation (LoRA).
By fine-tuning these LLMs, we are able to use smaller models, around 8 to 13 billion parameters, and deploy them locally. That is something that has worked really well for us, and what we recommend is that instead of just picking one model over another, you try out different base models and different configurations and see which one works for you.
What we have done is create a configuration where you are able to pick from a large list of options. Pretty much everything that is available to a developer or data scientist working with open-source libraries and going through their own data science journey, we have brought within our product.
You are now able to experiment with different large language models and different configurations, test them, deploy them, and see which one makes sense for your scenario. From our experience, we have definitely seen that it is advisable to have the model fine-tuned and deployed locally, dedicated to your scenario, instead of relying on APIs. That has not worked as well for us, because APIs introduce delays, and for data-centric products that is not acceptable. Especially with large volumes, it becomes an issue.
We recommend experimenting with all possible options in the open-source libraries and keeping the fine-tuned model localized and customized for your scenario.
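As a minimal sketch of the approach described above (LoRA fine-tuning of an open base model for local deployment), the snippet below uses the Hugging Face transformers and peft libraries. It is not Astera's pipeline; the model name and hyperparameters are placeholders you would swap for your own choices.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "your-org/your-8b-base-model"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Low-rank adaptation: train small adapter matrices instead of all weights,
# which is what makes fine-tuning an 8-13B parameter model locally practical.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical attention projections; model-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

# From here, a standard training loop over your domain data fine-tunes the adapters,
# and the resulting model can be served locally to avoid API latency.
```

A usage note: the target modules and rank shown here are common starting points, but as the interview suggests, it is worth comparing several base models and configurations before settling on one.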
Why is Astera a superior solution to competing platforms?
- Usability (code-free, drag-and-drop UI and enhanced usability using AI)
- Automation
- Unified, end-to-end data management platform
Thank you for the great interview; readers who wish to learn more should visit Astera Software.