Data Science and Marketplace Lessons From Stanford Professor Ramesh Johari, Former Advisor to AirBnB and Uber

April 11, 2024

Ramesh Johari is a professor at Stanford University highly regarded for his research and expertise in market design and data science. He holds degrees from three of the most prestigious institutions in the world — Harvard, Cambridge and MIT. Academia, though, was something that he admittedly just “fell into.”

“At the time I was going to college, if you happened to be good at research, then faculty naturally encouraged you to take the next step in academia. For me, that eventually led to becoming a professor at Stanford.”

It wasn’t clear then whether his work had a bigger purpose, or even how his research translated outside the ivory tower. So when he received tenure in 2011, he went out into industry to apply his research in practice and figure out: “What impact am I really having on the world?”

As it turns out: a lot. Ramesh has been an advisor to many industry-leading marketplaces such as Airbnb, Upwork, Uber, Stitch Fix and Stripe that have revolutionized how we live, work, vacation, transact and travel.

We were excited to have him join us for an AMA fireside chat with the Reach community. Our conversation touched on how data science is the DNA of business, what to look for in data science hires, and the importance of having an open, flexible mindset when it comes to experimentation. Below are highlights from that conversation, condensed and edited for clarity.

On how data science is part of the DNA of any digital business — particularly marketplaces

When you look at how most companies are organized, data science is usually treated as its own department, like product or marketing. But one of the most important lessons from my experience is that data science is actually the DNA of any digital business. It’s not so much another vertical as it is a horizontal function that cuts across everything you do.

This is especially true for marketplaces. The most important way to think about them is that you’re not selling the thing that people are going to buy. What you’re actually selling is the ability to make it easier to get that thing. Airbnb doesn’t sell lodging, though that’s what most people often have first in mind. What they’re really selling is the ability to make it easier to get lodging. They’re removing frictions.

Frictions get worse the bigger the marketplace is. How do I comb through the millions of Airbnb listings to find the right one for me? The only tools we have to help with that come from data science. Let’s take all the data we’ve collected, and use that to help the next set of people make better matches.

This creates a data flywheel, and it is why you can’t treat data science as a separate unit. As you collect data about what kinds of preferences guests have, what kinds of listings hosts are offering, and what matches get made, you’re getting more data that improves your marketplace, which improves the matches, which generates more data, and so on. It’s important to see data science as a process of continuous improvement. Data science is influencing the experience for both the guest and the host and making sure that both parties find the best match.

A good metaphor for a marketplace is the HOV (carpool) lane on the highway. What you’re paying for is the removal of congestion.

But it is also important to think about both sides of the marketplace — which do you optimize for? Many marketplaces need to somewhat preferentially treat the sellers relative to the customers, because some sellers are making a living from the platform, whereas most customers will just use it once or twice. Understanding what friction you’re taking away (which customer are you the HOV lane for?) is critical to decision making around product and monetization.

A different way to think about correlation vs causation

Most people have heard the phrase “correlation is not causation.” Despite this, it’s a very human mistake to conflate the two.

One of the most common ways it happens is when we build machine learning models to estimate the lifetime value (LTV) for a customer to see how much he or she will spend on the platform. This is purely a prediction problem, and this model can be used to inform promotions and discounts. We may then say: Why don’t I rank all my customers by LTV and give discounts to the highest ones because they’re the most valuable?

In principle, that’s a defensible argument. But there’s a flaw if what we really care about is making sure those discounts have an impact. As it turns out, giving discounts to the highest LTV customers doesn’t really change their LTV in any way — they’re still going to spend a lot. What is more effective instead is to target discounts at folks who are more in the middle of the LTV range, because these are the people who are engaged with your platform enough, but not getting over the hump. And that discount might be what makes the difference.

What I like to tell people is to replace the word “correlation” with “prediction” and replace “causation” with “decision.” When we built that LTV model, we were making predictions. But when we asked who we should give the promotion to, we were making decisions. 

“Correlation is not causation” is a high school statement. To bring it up to the level of what founders should keep in their back pocket, it’s that predictions are not decisions. Just because a model predicts something does not mean it has made the decision for you. What you’re asking your data science team for is: help me make this decision.
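To make the prediction-versus-decision distinction concrete, here is a minimal sketch that was not part of the conversation. It assumes six hypothetical customers, each with a predicted LTV and a made-up "uplift" (the extra spend a discount would cause). The uplift values are invented to mirror the claim above that mid-range customers respond most; in practice they would have to be estimated, for example through an experiment. The names `Customer`, `target_by_prediction` and `target_by_decision` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    predicted_ltv: float    # output of a prediction model (made-up value)
    discount_uplift: float  # extra spend caused by a discount (made-up value)


# Hypothetical customers. The uplift numbers are assumptions chosen to mirror
# the claim in the text; they are not real data.
CUSTOMERS = [
    Customer("A", predicted_ltv=1000.0, discount_uplift=5.0),
    Customer("B", predicted_ltv=900.0, discount_uplift=10.0),
    Customer("C", predicted_ltv=400.0, discount_uplift=60.0),
    Customer("D", predicted_ltv=350.0, discount_uplift=80.0),
    Customer("E", predicted_ltv=50.0, discount_uplift=15.0),
    Customer("F", predicted_ltv=30.0, discount_uplift=5.0),
]


def target_by_prediction(customers, budget):
    """Give discounts to the `budget` customers with the highest predicted LTV."""
    ranked = sorted(customers, key=lambda c: c.predicted_ltv, reverse=True)
    return ranked[:budget]


def target_by_decision(customers, budget):
    """Give discounts to the `budget` customers whose spend changes the most."""
    ranked = sorted(customers, key=lambda c: c.discount_uplift, reverse=True)
    return ranked[:budget]


def incremental_spend(targeted):
    """Total extra spend caused by discounting the targeted customers."""
    return sum(c.discount_uplift for c in targeted)


if __name__ == "__main__":
    for label, policy in [("by prediction", target_by_prediction),
                          ("by decision", target_by_decision)]:
        chosen = policy(CUSTOMERS, budget=2)
        print(label, [c.name for c in chosen], incremental_spend(chosen))
```

Under these invented numbers, ranking by predicted LTV spends the discount budget on customers A and B, who would have spent heavily anyway, while ranking by uplift picks the mid-range customers C and D. The prediction model is the same in both cases; only the question being asked of it changes.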

On what to look for in a data scientist for an early-stage company

I don’t think you necessarily need to hire someone with a computer science or engineering background. These folks tend to focus on questions of infrastructure, software stack and methodology. It’s rare that their training has focused on the human side of the business. 

You want a data scientist who can dive deep into what the business is doing. Someone who understands the touch points between the product and the users, the touch points between the demand and supply sides, and all the different ways that decisions are made that affect the user experience, from monetization to operations and customer support. 

You’d want to interview for the same kinds of skills you’d look for in someone on the product side, such as knowing how to ask good questions. Having a data scientist who understands the business and customers will allow them to contribute across multiple functions, which is crucial when your company is small. On the technical side, learning a thousand frameworks, syntaxes and languages is no longer as useful. Nowadays, you can use AI tools to parse and translate code into any programming language, and to work with complex datasets.

In my view, this puts a premium on someone who is a jack of all trades. Someone who doesn’t just think about machine learning, but also thinks about how people make decisions. These are individuals who may come from a very diverse set of fields, like econometrics, political science, or neuroscience. For example, neuroscience is a great field to hire from because these data scientists simultaneously think about the messiness of how humans think about the world, while also engaging with the most advanced data methods.

Do you need to hire a data analyst alongside this data scientist?

Here’s a bold statement: In a year, it should almost be a red flag if you feel like you need to hire a data analyst alongside the data scientist. Someone who fits the mold of the data scientist that I just described should have learned how to leverage current AI tools as part of their toolkit.

As a result, I’ve prioritized generative AI in my teaching. My data science class at Stanford effectively forces students to use generative AI, because the world they’re going to enter is one where there’s not going to be any kind of honor code that says generative AI is off limits. They’re on the job and they need to do things in the fastest way possible, so I want to get them comfortable with AI as a data science tool.

On the importance of being flexible with approaching experimentation 

Experimentation is critical even for the absolute newest business. But it’s also important to be a little flexible in your notion of what experimentation means. 

What I often tell students is that not every decision has to be data-driven; insisting otherwise is an extreme viewpoint that can be self-limiting. You just want to try to quantify the decisions you’re making. 

We sometimes go too far in saying everything needs to be a gold-standard clinical trial, or else there’s no evidence whatsoever. The world does not work that way. All decisions, even those based on rigorous experiments, are informed by prior belief and knowledge.

You never want perfect to be the enemy of good. Limited experimenting is better than none at all. Don’t wait until you’ve got a scaled-up business with a full-stack experimentation platform before you experiment. Get started on day one; just recognize that the less data you have, the more you need to lean on your beliefs to come to a decision. This is okay! When you’re starting a company, you’re banking your success in part on your prior beliefs.

But you also have to be open to belief revision. All good leaders have to be able to say the data is telling them something different from the beliefs they held. By the same token, they also have to be willing to augment the data with those beliefs when it’s appropriate.
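One way to picture how beliefs and data trade off is a simple Bayesian update. The sketch below was not part of the conversation; it assumes a Beta-Binomial model for a conversion rate, with a hypothetical prior belief of roughly 10% expressed as 2 imagined conversions and 18 imagined non-conversions. The function names and all numbers are illustrative assumptions.

```python
def posterior_mean(prior_successes, prior_failures, successes, trials):
    """Posterior mean of a conversion rate under a Beta-Binomial model.

    The prior is Beta(prior_successes, prior_failures); the data are
    `successes` conversions observed in `trials` attempts.
    """
    alpha = prior_successes + successes
    beta = prior_failures + (trials - successes)
    return alpha / (alpha + beta)


def prior_weight(prior_successes, prior_failures, trials):
    """Share of the posterior mean that comes from the prior rather than the data."""
    prior_size = prior_successes + prior_failures
    return prior_size / (prior_size + trials)


if __name__ == "__main__":
    # Hypothetical prior: about a 10% conversion rate, worth 20 observations.
    prior = (2, 18)

    # Day-one experiment: 3 conversions out of 10 visitors (made-up data).
    print(posterior_mean(*prior, successes=3, trials=10),
          prior_weight(*prior, trials=10))

    # Scaled-up experiment: 300 conversions out of 1000 visitors (made-up data).
    print(posterior_mean(*prior, successes=300, trials=1000),
          prior_weight(*prior, trials=1000))
```

With only 10 observations, the prior carries about two thirds of the weight and the estimate stays close to the original belief. With 1,000 observations showing the same 30% rate, the prior carries about 2% of the weight and the estimate moves almost all the way to the data, which is the belief revision described above.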

Thank you Wayee Chu for facilitating this conversation!