The Coming Rise of Synthetic Data in AI

More and more often, I am hearing startups talk about “synthetic data.”  I’ve seen my existing startup investments start to use it, and I have seen entire companies formed around it.  So, what is it?

Put simply, it is data created by a machine.  Now, why would we do that?  Imagine that you want to train a machine vision model to identify a Tesla.  Now imagine you only have 10 pictures of Teslas to train on, so you need a bigger data set to train a better model.  One way to get a bigger data set is to go get thousands more Tesla pictures.  Or, you could do some simple manipulation of the pictures you have to create new pictures instead.

For example, maybe you don’t have a picture of a red Tesla.  You could photoshop one of your other pictures to make the Tesla red, and add that red Tesla to your data set so your model performs better at classifying Teslas.  Most people use synthetic data to cover different conditions.  They take an image and change the lighting, shadows, viewing angle, etc. to simulate those conditions, so the machine learning model learns what an object looks like across all of them.
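To make the “photoshop it red” and “change the lighting” ideas concrete, here is a minimal Python sketch. It assumes a toy image stored as rows of (r, g, b) tuples rather than a real image library, and the helper names (`adjust_brightness`, `shift_red`, `flip_horizontal`, `augment`) are hypothetical. The transforms are deliberately crude stand-ins for the real color and geometric transforms a production pipeline would use.

```python
from typing import List, Tuple

# Toy RGB image (an assumption for this sketch): rows of (r, g, b) pixels, each channel 0..255.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def clamp(value: float) -> int:
    """Round and keep a channel value inside the valid 0..255 range."""
    return max(0, min(255, int(round(value))))


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every channel; factor < 1 darkens (a crude 'night' variant), > 1 brightens."""
    return [[(clamp(r * factor), clamp(g * factor), clamp(b * factor)) for r, g, b in row]
            for row in img]


def shift_red(img: Image, delta: int) -> Image:
    """Push the red channel up, a very rough stand-in for 'repaint the car red'."""
    return [[(clamp(r + delta), g, b) for r, g, b in row] for row in img]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right, as if seen from the other side."""
    return [list(reversed(row)) for row in img]


def augment(dataset: List[Tuple[str, Image]]) -> List[Tuple[str, Image]]:
    """Return each original plus four synthetic variants; every variant keeps its label."""
    out: List[Tuple[str, Image]] = []
    for label, img in dataset:
        out.append((label, img))
        out.append((label, adjust_brightness(img, 0.4)))  # darker: simulated dusk/night
        out.append((label, adjust_brightness(img, 1.3)))  # brighter: simulated midday glare
        out.append((label, shift_red(img, 80)))           # redder paint
        out.append((label, flip_horizontal(img)))         # mirrored viewpoint
    return out


# Usage: one tiny 2-pixel "photo", duplicated to stand in for 10 real photos.
tesla = [[(120, 120, 130), (30, 30, 35)]]
augmented = augment([("tesla", tesla)] * 10)
print(len(augmented))  # 50: each photo yields itself plus 4 variants
```

The key property is that every variant inherits the original label, so 10 labeled photos become 50 labeled examples without anyone labeling anything new.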

A common use of synthetic data now is building data sets for autonomous vehicles.  You could create an entire machine-generated city, drive around that city obeying traffic laws, and feed that data into the autonomous vehicle model.  This lets you simulate things that are harder to capture in real life (e.g. a car running a stop sign).

Now, synthetic data isn’t always good for a model.  In NLP applications, one criticism is that synthetic data sets generated for training are often very simple (because our language generation techniques are still weak compared to other areas of AI).  So a model trained on this synthetic language data fails to capture the nuance and vagaries of messy, real human language.  But in other areas, like machine vision, synthetic data tends to work really well.

From a business perspective, there are a few ways to think about synthetic data.  First, can you use it to generate new variations of things that are valid for training?  Second, can you use it to label data so that humans no longer need to label it?  And finally, should you create the synthetic data yourself, or not?

My current hypothesis is that synthetic data will mostly be produced by a few third-party platforms, in a market that develops into an oligopoly.  I think synthetic data workflows will evolve to look the way software debugging works today:  report a bug -> code a fix -> test in a staging environment -> deploy to production and verify.  It would look like this:  report a model failure (e.g., the model doesn’t detect things well at night) -> use a synthetic data platform to generate new items that increase the data for that problem (what things look like at night) -> rebuild the model -> test the model -> deploy the new model to production.  Someday it will be push-button easy.

If I am right, this means synthetic data business opportunities will come in 2 flavors.  The first is synthetic data for common objects, where there is lots of data.  These platforms will win by being the easiest to use, connecting to the most workflows, and having the most common options for data generation.  The second is for cases where generating the synthetic data is hard because of the nature of the problem space and the lack of existing data sets to start with.  This will lead to specialized providers who can master specific domains.

If you are working on AI, soon you will need a synthetic data strategy.  And if you are a company in the space, please reach out if you are looking for investment.

What Ethan Zuckerman Could Teach Googlers About Debate Culture

When I read recently about Google’s attempts to corral internal debates, I thought the way Ethan Zuckerman handled leaving the MIT Media Lab was a very good model for how to handle debate in general. I’ve met Ethan twice while sitting on panels with him, and I have always been impressed with how he handles himself. On one of the panels he even criticized many of my personal views about AI and the news media, but did so in a way that was very factual and respectful.

Here is the key text I like from Zuckerman’s response:

That’s okay. I feel good about my decision, and I’m hoping my decision can open a conversation about what it’s appropriate for people to do when they discover the institution they’ve been part of has made terrible errors. My guess is that the decision is different for everyone involved.

The decision is “different for everyone involved.”

As someone who has spent a lot of time in debates over the years, one thing I’ve noticed is that the reasoning behind debates, in general, has gotten worse. People rarely take Zuckerman’s view that rational people can reach different decisions or conclusions about the same event, or take different stances on a major topic. I suspect that if most college students today had to write a resignation letter from the MIT Media Lab, they would demand that everyone else resign too, and that the lab be shut down, or something like that.

What struck me about Zuckerman’s response is that he doesn’t demand anyone do anything else. He is mostly concerned with his own behavior and feelings, and he acknowledges that others may feel differently and that’s ok. It is an extremely mature perspective in today’s world, and I wanted to highlight it because I feel like it should be the norm, not the exception.

I’ve had a lot of friends leave Google over the past couple of years, saying “it’s not the same.” I know some of their consternation is due to the attitudes of Googlers, the way debates happen there, and the my-way-or-the-highway types who often dominate these discussions. Maybe Googlers could learn something from Zuckerman about respect, perspective, and maturity.

Hiring, Reputations, and Why Backupify Was Really Successful

I remember doing reference calls on the very first executive I tried to hire at Backupify. I was 31 years old, a first-time CEO, and had never hired a senior executive before. I had a candidate, and I mined LinkedIn for 3 people who had worked with him before to do reference checks. The reference calls went like this:

Person 1: Yeah he was fine. I’d work with him again. Not much to add.

Person 2: Oh he’s awesome – a great boss. I loved working for him and I’d follow him anywhere. I’d love to work with him again.

Person 3: That asshole? I hate him. I would never work with that !#@!%#! again.

Now, you might think this is a bad sign, but I think most people who have done a lot of hiring will tell you that hiring is hard. Even candidates who come with 3 glowing recommendations may not work out, or may not do great work at your company. Sometimes people with mixed references like this are just opinionated and rub some people the wrong way.

I remember another time when I hired a programmer for our marketing team because he wanted to move into a marketing role. He was terrible at it, and after just a few months I fired him. I told him he sucked at this role, but that I thought he would be great in a developer relations role, and I recommended him to another company I knew had such an opening.

He wrote a really nasty blog post about me, and Backupify, and how terrible the culture was, and how he never should have gone to work there. Then he interviewed at the place I referred him and he got the job. Three weeks after he started, he was in the audience at an event where I was a panelist. He came up afterwards and hugged me, and said this new role was such a good fit.

Hiring is hard. Part of the reason it is hard is that we rely a lot on people’s reputations. A big part of the hiring process at most places is reference checks, but I’ve found reference checks to be almost useless. I basically just do them in case someone says the candidate did something highly unethical. Other than that, I’ve learned that it is difficult to untangle someone’s performance from their situation, so whether or not they did a great job somewhere else may not tell you much about how they will do at your company.

Why Backupify Was Successful

Backupify struggled from 2009 to 2011. Our product was initially consumer-focused, then became an SMB-focused Gmail backup. If you ask people why we ultimately had a 9-figure exit, you’ll hear it was because we hired this or that person, because we finally figured it out or “cracked the code,” or maybe because I finally grew into a good CEO and made the right decisions. But do you want to know what really happened? The market moved in our favor.

I was sitting at Google’s partner conference in 2012, and almost every announcement, session, or presentation was about Google Drive. Google had come to see Dropbox and Box as competition and was going all-in on Google Drive. I called a management meeting for later that day, walked the team through it all, and said we had to change strategy to focus the product and the marketing on Google Drive. Suddenly, customers who were meh about Gmail backup were putting their key documents in the cloud via Google Drive and were very concerned about backing them up. Our revenue curve took off, and 2 years later we got acquired.

But the key lesson here is that we didn’t have any great vision. We had some great people, and they did make a difference, but none of them had a key insight that changed the trajectory of the company. It was a lot of little insights, and blocking and tackling, that all added up to success. The real breakthrough, the reason we were successful, was out of our control: the market moved in our favor. We were already doing Google Docs backup, so following the shift to Google Drive was easy.

There are a lot of companies like this. It’s not that the team doesn’t matter; it does. But the real key is what is happening in the market and whether you are well positioned to take advantage of it. Good teams find that spot in the market, if it exists. The Backupify team was great, but the market really made the company.

And that brings me back to hiring, and reputations. Some people are just in the right place at the right time. Some people are awesome, but in bad markets or bad companies. Some people ride the coattails of the awesome people, and get an undeserved reputation.

When I moved to Boston in 2010, I had that experience. I’d meet with someone in the startup scene who would say “oh, you have to meet person X – he’s awesome.” Then before I’d meet person X, someone else would say “that guy? don’t waste your time.” It was hard to figure out who was “good” and who wasn’t. There are very few people that are considered rockstars, across the board, by everyone.

The lesson here, and the reason I’m writing about this, is that when you go to raise money, VCs will talk trash about other VCs. Some of it is true, some of it isn’t, but it’s all contextual, and you probably don’t know the context. When you go to hire executives, you will hear many different things. Some of it is true, some of it isn’t, but it’s all contextual. You get my point, right? Reputation is contextual. Some people have very stable reputations because they have been in very stable contexts.

Sometimes you just have to dig in and take chances on people: employees, executives, and investors. People change and adapt and learn. But some don’t. So reputations matter, but be careful. Do your own diligence, and don’t be afraid to take risks on people who are unproven.

What Horses, Watches, And Bookstores Can Teach Us About Why Automation Won’t Kill Jobs

A few months ago I spoke on a panel and was asked a question I always get about AI and automation: will it kill jobs? And if so, what do we do? The answer I gave surprised many people, who said they had never heard an answer like it before, so I thought I would write about it here.

Here’s a heuristic that will guide the way – things we no longer need don’t always go away. Sometimes they die briefly, but then are resurrected as status symbols.

When I was working on my MBA in 1998, my business school professors were freaking out because they had no idea what business would look like once the Internet really took hold. One professor told us that in 5 years all brick-and-mortar retail would be dead, and that we would only shop online. Obviously they were very wrong. Take bookstores as an example: you might be surprised that while Amazon initially put a big dent in the bookstore industry, there are now more bookstores in the U.S. than there were before Amazon. Amazon’s algorithms are super efficient, but they don’t provide the serendipity, the community, or the curation we sometimes desire.

I remember playing golf with a friend in 2007 who said “wow, you still wear a watch?” He told me he had no need for one because he had a cellphone and it always had the time. The cellphone was, for a while, definitely killing the wristwatch, but much like independent bookstores, it appears that watches are starting to come back as more of a status symbol than ever before.

Two hundred fifty years ago, everyone owned a horse. That’s how you got somewhere. When cars came around, they didn’t kill off the horse industry altogether. Instead, horses became expensive status symbols. Owning a racehorse, or participating in equestrian events, is its own culture and community, filled mostly with wealthy people.

So what does all of this tell us about AI, automation, and jobs?

I believe that as jobs get automated away, employing people to do a thing, instead of robots, will become a status symbol. Humans are always competing with each other for status, and that won’t stop. Bookstores, watches, and horses didn’t go away, although they did change in different ways.

But some things do go away forever. For example, almost no one washes clothes by hand now that we have washing machines. How do you explain that?

My theory is that there are two kinds of jobs. There are jobs machines are better at, and will always be better at. And then there are jobs that really can’t be done much better by a machine, but could be done faster or more efficiently by one. Jobs in the latter category will emerge as status symbols when you have a human still do them. Those jobs will pay well because they become “luxury services.”

Looking forward, I see some new jobs emerging as a result of AI automation (data curator, data annotator, model trainer, AI designer, etc.) and some jobs becoming status symbols as mentioned above. Factor in that hybrid human-AI partnerships will upskill some low-skilled workers and enable them to keep working, and I’m actually pretty bullish that, at least for the next few decades, we won’t have a problem with automating jobs away. There will be economic upheaval, and some industries will be like the clothes washing industry and suffer greatly, but at the macro level I believe the story will actually be all right.

The Half Court Ventures Story: What I’ve Learned From 4 Years of Angel Investing

Backupify was acquired in December of 2014. In June of 2015 I made my first angel investment. Along the way, I started a fund with my friend Todd Earwood, and learned a lot about investing. This post chronicles that path.

Deciding To Angel Invest

The first thing I did post-exit was talk to lots of smart people about angel investing. If you do this in Boston, where I am based, most people say this: “don’t do it, you will lose all your money.” Time and time again I heard a story where someone made a bit of cash, then spent a year studying angel investing, looking at deals, figuring out their strategy, and then pulled the trigger, doing 4 deals in their first year of investing. Then they waited and watched while 2 went out of business, 1 became the walking dead, and they got squashed out in a recap of the other one. Almost everyone tried to talk me out of it. “You’ve worked too hard for this money, don’t lose it” was a refrain I heard many times.

But I was 38 years old. My theory was that I could lose all of it, and that would be entirely ok. I don’t have a lavish lifestyle, and I don’t really want one. (People are always surprised I drive a 2009 Toyota Tacoma stick shift.) My strategy was to work and invest as aggressively as if I had never made any money at all, figuring that even if I lost everything, I would learn a lot, improving my ability to hopefully make it back.

I was lucky that 3 of the best angel investors in the world were investors in Backupify: Jason Calacanis, Chris Sacca, and Dharmesh Shah. I talked with all 3 of them about angel investing, and they all gave me advice that was very different from what other people advised. All 3 said that to be successful, you have to do a lot of deals. 20 was the number they threw out. I was told that even if the first 15 bombed, I should still do the last 5. Do 20 deals.

The second major piece of advice was to brand myself in a way that I would get good deal flow. One reason many angel investors fail is that they don’t see the best deal flow, so all they see is second and third tier investment opportunities. This piece of advice was part of what inspired me to start my AI newsletter. It has been, and continues to be, a very good source of deal flow.

The third piece of advice was to focus on the size of a possible outcome over its likelihood. This ties back to Nassim Taleb’s idea of playing games that favor convexity. If you do it right, your hit rate can be worse than average but your hits pay out disproportionately, so you end up fine. More on this later.
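Here is a rough Python sketch of that arithmetic, using two made-up 20-deal portfolios with equal check sizes. The multiples are purely illustrative assumptions, not Half Court’s actual results, and the function names are hypothetical. The point is only that a portfolio that is right 10% of the time but has one huge winner can beat one that is right half the time with capped upside.

```python
from typing import List


def portfolio_multiple(outcomes: List[float]) -> float:
    """Total returned / total invested, assuming the same check size in every deal."""
    return sum(outcomes) / len(outcomes)


def hit_rate(outcomes: List[float]) -> float:
    """Fraction of deals that returned more than the money put in."""
    return sum(1 for m in outcomes if m > 1.0) / len(outcomes)


# Hypothetical 20-deal portfolios; each number is a deal's multiple on invested capital.
convex = [0.0] * 14 + [1.0] * 4 + [3.0, 40.0]  # rarely right, but one huge winner
capped = [0.0] * 10 + [2.0] * 10               # right half the time, limited upside

for name, outcomes in [("convex", convex), ("capped", capped)]:
    print(name, f"hit rate {hit_rate(outcomes):.0%}",
          f"multiple {portfolio_multiple(outcomes):.2f}x")
# convex hit rate 10% multiple 2.35x
# capped hit rate 50% multiple 1.00x
```

Almost all of the convex portfolio’s return comes from the single 40x deal, which is why the advice is to do enough deals to give yourself a shot at one.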

Setting Up Half Court Ventures

The idea to set up a venture fund wasn’t mine. Some of my original angel investors in Backupify were based in Louisville, KY, where I am from, and they reached out to ask if I was going to angel invest. When I said yes, they said “you will see deals in Boston, NYC, and San Fran that we will never see here, please throw some of our money into them along with yours.” This was easier to do with a fund structure, so we created Half Court Ventures. It turned out to be a $3M fund, and I was the largest LP. As the fund was getting set up, I realized I wasn’t going to do this full time and might need some help, so I asked my friend Todd Earwood if he would be a general partner in the fund as well. Todd and I have done a bunch of business things together over the years, are close personal friends, and have made and lost money together on various things, so there is a lot of mutual trust. Plus, our skill sets are very complementary. (You can see Todd’s presentation here on telling AI stories.)

We chose the name Half Court Ventures because we both love basketball, and Full Court Ventures was taken. But over time, we evolved the story to say that it references a half court shot, which is difficult to make but still possible, and more likely with practice, similar to startups. Now, when we tell you that story, you will know it’s a myth. The original naming story is that we just weren’t that thoughtful about it.

Half Court Fund 1 made 48 investments in 3.5 years, mostly in AI companies. It did so well (on paper) that the LPs all wanted to come back and do a second fund, and we just did the first close on Half Court 2.

Deal Flow, Evaluation, And Investing

All in all, I’ve made 68 early-stage investments as of the day I’m writing this post. 55 of them are through Half Court, via fund 1, fund 2, and our AngelList syndicate. The other 13 were personal: investments in friends, in companies founded by people who worked for me, or in weird non-venture-style stuff. Half Court stayed focused on AI.

Angel and seed-stage investing is nerve-wracking. Companies are a rocket ship one year and flat the next. They are dying, then suddenly raise massive rounds. We’ve seen companies get large markups, then crash, and we’ve seen entrepreneurs pull magic out of a company on the verge of death. The only thing that is consistent is that whatever the entrepreneur tells you will happen will most definitely not happen.

At these stages, there aren’t many metrics to use to judge the company, and even if they have some metrics, they are usually meaningless. I don’t do much competitive diligence, because I’m not sure it matters at this stage. Smart entrepreneurs will navigate those dynamics. Mostly I look at teams and markets. Is the market big? Will the team figure it out? If what they are telling me doesn’t work out (because it often won’t) are there tangential spaces for them to move?

We tend to evaluate deals based on what I’ve learned from reading Nassim Taleb’s work on convexity. I know all the things that can go wrong in an early-stage company. I’m not trying to mitigate those risks. I’m trying to figure out how big this could be in the unlikely scenario that everything goes right.

At the angel/seed stage, pretty much every idea looks kinda brilliant and kinda dumb. I could craft a story for why it will succeed or fail. So, I try not to waste time figuring out what could go wrong. That said, I also tend to stick to things that I already know a bit about, which helps.

Our deals come from a bunch of places, but the best sources tend to be my newsletter, and existing portfolio CEOs. One of the things I’ve been focused on, and am very passionate about (and will write about more on this blog) is AI hardware. Half Court has already invested in Mythic, Rain, and Koniku. I think investing is a lot about finding the trend everyone else is missing, and AI hardware is one of those trends.

Lessons Learned

It’s really hard to know, 4 years in, how much we’ve been lucky and how much we’ve been good. Ask me in a decade and maybe I will have a better idea. At the moment though, Half Court 1 is doing really well, despite the fact that we made a ton of mistakes.

  • We were easily taken in by charismatic entrepreneurs who had no grit and gave up easily when things got tough.
  • We took on (and continue to take on) syndicate risk, meaning we’ve done deals when the company has no lead. We’ve been the very first check into 9 deals (usually just $50K) and almost every deal we have done has been pre-revenue, many pre-product. The results have been mixed. Some entrepreneurs don’t close their rounds. But others we get into because we’ve already committed, and the lead shuts out new angels.
  • We’ve done deals that would have been good but the valuation was bad.
  • We didn’t reserve much capital for pro ratas, and we missed chances to invest more in some of our best deals.
  • We’ve been burned by SAFEs (still hate them) when entrepreneurs have a walking dead company and the investors have no way to even force a conversation about what’s next, or when someone doesn’t honor the original spirit of the SAFE. It’s just this no man’s land of legal rights. Convertible notes or equity make a deal waaaay more attractive to us.
  • Your handful of big winners really do drive returns more than anything. Everyone says this and my experience is the same.
  • Some entrepreneurs just make things work. I was the very first angel into, and it was entirely based on the gumption and grit of Alana and Ali. They have been through multiple ideas and business models to get to their current success. Finding these entrepreneurs is hard. Always invest in them when you can.
  • Sometimes I invest in someone I really like even when I hate their idea. I think of it as getting to know them so that even if this company fails, I’ll get a shot at their next one.

A side note about big exits: it seems like there really should be a different way. There are tons of good tech ideas that aren’t venture scale but could be nice businesses sold for $25M. But when you do a $10M cap SAFE on a slide deck, there isn’t an opportunity for an investor to make much money on that $25M exit. I don’t understand why so many companies get funded on roughly the same terms despite being vastly different businesses. Why aren’t there more $1M on $3M pre rounds? Or $2M on $2M pre? If you did a 2 on 2, got to cash flow breakeven on that, and sold for $25M in 5 years, the founders would each make about $6M (assuming 2 co-founders and no further dilution), and investors would get roughly a 6x. Everybody wins. No one is financing that model.
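Here is a quick Python sketch of that 2-on-2 math, under simplifying assumptions real deals won’t meet: a single priced round, no later dilution, no option pool, no liquidation preferences, and co-founders splitting the remaining equity equally. The `round_outcome` helper is hypothetical. The SAFE comparison also assumes a hypothetical $1M raised with the $10M cap treated as a post-money valuation, since the amount raised isn’t stated above.

```python
from typing import Tuple


def round_outcome(raise_amt: float, pre_money: float, exit_value: float,
                  founders: int = 2) -> Tuple[float, float, float]:
    """Return (investor stake, investor multiple, proceeds per founder) for one priced round.

    Assumes no later dilution, no option pool, no liquidation preferences, and that
    founders own all non-investor equity, split equally.
    """
    post_money = pre_money + raise_amt
    investor_stake = raise_amt / post_money
    investor_multiple = exit_value * investor_stake / raise_amt
    per_founder = exit_value * (1 - investor_stake) / founders
    return investor_stake, investor_multiple, per_founder


# $2M raised on a $2M pre-money valuation, company sold for $25M.
stake, multiple, per_founder = round_outcome(2e6, 2e6, 25e6)
print(f"{stake:.0%} stake, {multiple:.2f}x, ${per_founder / 1e6:.2f}M per founder")
# 50% stake, 6.25x, $6.25M per founder

# Hypothetical comparison: $1M on a SAFE converting at a $10M post-money cap
# (equivalent ownership to a priced round at $9M pre), same $25M exit.
stake, multiple, _ = round_outcome(1e6, 9e6, 25e6)
print(f"{stake:.0%} stake, {multiple:.2f}x")
# 10% stake, 2.50x
```

The same $25M outcome is a 6x-plus for the 2-on-2 investor and about 2.5x for the SAFE investor in this sketch, which is the gap the paragraph is pointing at.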

How Do You Have Time To Do This?

The biggest question I get is how I have time to do this. I run Talla full time, I have two school-aged kids, and I write posts like this. It’s actually not that hard because I don’t have many other hobbies. I watch zero television except during college basketball season, and I don’t even have a Netflix account. So instead of “Netflix and chill” I do “read decks and chill” on a Friday night. And I try to involve my kids in it a bit. They’ve met a lot of the entrepreneurs I’ve invested in, been to dinners with some of them, and I hold a quarterly AI poker night at my house that 15-20 of them usually attend. I let my kids hang out at those because I think entrepreneurs are good role models for them.

Also, I believe it helps me at Talla. We are in an early market, so seeing the insides of so many other AI companies gives me ideas and makes me smarter about the market. I will say my investors are split on it. About half would tell you it’s made me a smarter CEO, and about half would say it sucks time away from Talla and is a distraction. Boston is more of a do-one-thing-only tech community, so my Boston investors tend to be a little more on the “this is a distraction” side than my Silicon Valley investors.

And finally, when you get a process down, and stick to areas you know, it can become more efficient than you think.

In Closing

I think early stage investing is incredibly fun and rewarding. I actually think growth stage investing would be too, but, I can’t really play at that level yet.

If you are an AI entrepreneur, I hope you will reach out and send me a deck. If you are an investor and want to collaborate on deals, please reach out as well.

If you are new to angel investing, and want to chat, I’m always happy to share my experiences.

Welcome To Coconut Headsets

I’ve been blogging since 2003, when I wrote the Businesspundit blog, which I sold in 2008. Then I moved to, but in 2014 moved to Medium, feeling like it provided better distribution for my thoughts. I don’t really like it anymore so I’m moving back here as my primary source of writings. I’m also the same Rob May that writes the Inside AI newsletter. I run Talla as CEO, and am a partner at Half Court Ventures, the fund I started with Todd Earwood. We are the most active AI seed investors on the East Coast.

If you want to know what Coconut Headsets means, read up on cargo cults. I chose the name because in startups and investing, there are many erroneous ideas out there that follow a similar vein. I try to break those mindsets. I hope you can find something on this blog you disagree with, that challenges your thinking. Otherwise, I haven’t done my job.