Salesforce AI Open Sources CausalAI Library for Causal Analysis of Time Series and Tabular Data

On Feb 2, 2023

Causal analysis is the process of identifying the causes and effects of a problem. Rather than treating symptoms, it helps pinpoint the root cause so that the symptoms can be reduced at their source. Consider a scenario in which airline tickets are becoming prohibitively expensive. The first step is to determine what drives fluctuations in airfares so that a suitable macroeconomic measure can be found to bring them down. One key variable that significantly affects airfares is the price of crude oil. If oil prices rise, airfares tend to rise as well, because airlines pass on the higher fuel cost. On the other hand, if airlines raise their fares for reasons unrelated to oil prices, that increase should not affect oil prices. It is therefore reasonable to conclude that oil prices influence airfares but not the other way around.

This example illustrates the idea of intervening on one variable and predicting the impact on another, which is central to causal analysis. Using historical data alone, causal analysis can help researchers discover such cause-effect relationships automatically. It is also useful for obtaining a numerical estimate of how much the value of a feature changes when its causal predecessors are changed. Although the crude oil and airfare example is reasonably straightforward, causal analysis can be difficult in a system with many variables.
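The oil and airfare example can be made concrete with a toy simulation. The sketch below does not use the CausalAI Library; it is a hand-written structural model in NumPy, and the function name `simulate`, the coefficients, and the noise levels are all assumptions chosen for illustration. It shows that forcing oil prices to a new value shifts airfares, while forcing airfares to a new value leaves oil prices unchanged.

```python
import numpy as np

def simulate(n, rng, do_oil=None, do_fare=None):
    """Sample from a toy model in which oil price drives airfare.

    do_oil / do_fare force a variable to a fixed value (an intervention),
    which cuts it off from its usual causes. All numbers are made up.
    """
    if do_oil is None:
        oil = 80.0 + 10.0 * rng.standard_normal(n)   # oil has no parents
    else:
        oil = np.full(n, float(do_oil))
    if do_fare is None:
        fare = 100.0 + 2.0 * oil + 5.0 * rng.standard_normal(n)
    else:
        fare = np.full(n, float(do_fare))            # fare no longer listens to oil
    return oil, fare

rng = np.random.default_rng(0)

# Intervene on oil: raise it by 10 units and compare average fares.
_, fare_low = simulate(10_000, rng, do_oil=80)
_, fare_high = simulate(10_000, rng, do_oil=90)
effect_oil_on_fare = fare_high.mean() - fare_low.mean()   # close to 2.0 * 10

# Intervene on fare: oil is generated exactly as before, so it does not move.
oil_a, _ = simulate(10_000, rng, do_fare=250)
oil_b, _ = simulate(10_000, rng, do_fare=400)
effect_fare_on_oil = oil_b.mean() - oil_a.mean()          # close to 0

print(f"effect of oil on fare: {effect_oil_on_fare:.2f}")
print(f"effect of fare on oil: {effect_fare_on_oil:.2f}")
```

In a real analysis the structural equations are not known in advance; estimating them from observational data is the job of causal discovery and causal inference methods.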

To make causal analysis simpler for researchers, Salesforce researchers recently unveiled the CausalAI Library, an open-source library for causal analysis on observational data. The library provides algorithms that can handle linear and non-linear causal relationships between variables, and it supports both tabular and time series data of discrete or continuous type. The Salesforce CausalAI Library aims to offer a one-stop solution for the many supporting needs of causal analysis, ranging from data generation to multi-processing for speed-up. The researchers also provide a no-code user interface for performing causal analysis. The library's main objective is to offer a quick and user-friendly solution to a range of causality-related problems.

The Salesforce CausalAI Library addresses two problems: causal discovery and causal inference. Using observational data, causal discovery aims to answer questions such as which variable in a multivariable system affects which other variable. In other words, the goal of causal discovery is to uncover the directed causal graph that underlies the observational data, where the variables are the nodes and the edges are unknown and must be inferred. Causal inference, on the other hand, means calculating a numerical estimate of how one set of variables influences another variable. Unlike inference in machine learning models, which is based on correlation, causal inference traverses the causal graph to determine how changes in one variable affect the target variable. The distinction matters because two or more variables can be correlated without any causal link between them, in which case changing one of them may have no impact on the other.
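The gap between correlation and causation can be shown with a small simulation. The sketch below is plain NumPy rather than the CausalAI Library, and the variable names and coefficients are hypothetical. A hidden common cause makes `x` and `y` strongly correlated, yet once `x` is set by intervention, independently of that common cause, the correlation with `y` disappears.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical common cause that drives both x and y.
common = rng.standard_normal(n)
x = common + 0.5 * rng.standard_normal(n)
y = common + 0.5 * rng.standard_normal(n)

# Observational data: x and y are correlated although neither causes the other.
corr_observed = np.corrcoef(x, y)[0, 1]

# Intervention on x: values are assigned at random, ignoring the common cause.
# The mechanism that generates y is unchanged, and it never referred to x.
x_do = rng.standard_normal(n)
y_after = common + 0.5 * rng.standard_normal(n)
corr_intervened = np.corrcoef(x_do, y_after)[0, 1]

print(f"correlation in observational data: {corr_observed:.2f}")
print(f"correlation after intervening on x: {corr_intervened:.2f}")
```

A purely correlational model trained on the observational data would predict that changing `x` changes `y`; a model that respects the causal graph would not.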

The library’s causal discovery module generates an output causal graph from an input that consists of an observational data object and an optional prior knowledge object. The causal inference module receives a causal graph as input that can either be directly provided by the user or estimated by the causal discovery module, along with the user-defined interventions, and outputs the estimated effect on a target variable.

Beyond key features such as support for different data types, synthetic data generation with structural equation models, and distributed computing, the library offers several others. One is targeted causal discovery, where the user is interested only in learning the causes of a single variable of interest rather than the complete causal graph. Users can also incorporate partial prior knowledge and visualize tabular and time series causal graphs. For causal discovery, the library supports the PC algorithm, Granger causality, and VARLINGAM for time series data, and the PC algorithm for tabular data. For causal inference, conditional models are learned based on the causal graph in order to imitate the data generation process.
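To give a feel for one of the listed methods, here is a simplified sketch of the idea behind Granger causality for time series: a series `x` is said to Granger-cause `y` if past values of `x` improve the prediction of `y` beyond what the past of `y` alone achieves. The code is written with NumPy only and is not the CausalAI Library's implementation; the helper names `lagged_rss` and `granger_gain` are hypothetical, and the formal F-test is replaced by a plain relative reduction in residual error.

```python
import numpy as np

def lagged_rss(target, predictors, lag):
    """Residual sum of squares from regressing target[t] on lagged predictors."""
    n = len(target)
    cols = [np.ones(n - lag)]                       # intercept
    for series in predictors:
        for k in range(1, lag + 1):
            cols.append(series[lag - k : n - k])    # values at time t - k
    X = np.column_stack(cols)
    y = target[lag:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def granger_gain(cause, effect, lag=1):
    """Relative drop in RSS when past values of `cause` are added as predictors."""
    restricted = lagged_rss(effect, [effect], lag)
    full = lagged_rss(effect, [effect, cause], lag)
    return 1.0 - full / restricted

# Synthetic data in which x drives y with a one-step delay (made-up coefficients).
rng = np.random.default_rng(2)
n = 2_000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

gain_x_to_y = granger_gain(x, y)   # large: the past of x helps predict y
gain_y_to_x = granger_gain(y, x)   # near zero: the past of y does not help predict x

print(f"gain x -> y: {gain_x_to_y:.3f}")
print(f"gain y -> x: {gain_y_to_x:.3f}")
```

A full implementation would apply a statistical test to decide whether the gain is significant and would handle multiple variables and lags, which is the kind of work a dedicated library takes care of.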

Its parallelization support and user-friendly interface are presented as the CausalAI library's main advantages over other libraries for causal analysis. The Salesforce team continues to develop the library. In future work, the researchers intend to expand its set of algorithms for causal discovery and inference. Other goals include support for latent variables, GPU-based computing, and heterogeneous data types (mixed continuous and discrete). More details regarding the Salesforce CausalAI library can be found below.

Check out the GitHub and Paper. All credit for this research goes to the researchers on this project.

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys deepening her technical knowledge by taking part in challenges.
