Notion | Software Engineers (Full stack, Backend/Infra, Mobile; all experience levels) | San Francisco, CA | Onsite
Notion's goal is to create the general purpose work tool for a post-file, post-Microsoft Office world. Tools for knowledge workers haven't advanced much since the 90s—the state of the art is something like Google Docs, Quip, Dropbox Paper, or rigid SaaS apps—and we're working hard to figure out what a coherent technology platform for work looks like in the modern era.
We are a bunch of design-focused nerds in SF. The business is profitable, well-funded and financially strong, and growing very fast. We have a beautiful loft in the SF Mission district, with the best investors out there (notion.so/about).
For now, AI is working as a good deflection layer in front of the workforce. I think that will continue and get a lot better over time, but it will likely still just be a combination of deflection up front and enablement for agents, by providing good summary resources that may make responding faster.
Neither of those removes the workforce at scale in my opinion.
Also, AI is still a long way from replacing more advanced support teams doing "tier 2" and "tier 3" support, which require a lot more work outside of the email/chat thread.
+1, very much agree with this. AI-driven chatbots are a piece of the puzzle for many of our users -- they cover commonly asked questions and can deflect a lot of them. But that’s still just a portion of the work (even if a majority of volume). For example, a lot of “hard” support questions involve jumping between systems and following complicated (often nonexistent) procedures, quite apart from needing models that understand intent.
Also to be fair to our team, we do have a bunch of problems that we no longer think of as AI, but are still super algorithm-intensive: forecasting, modeling of queues, and schedule optimization.
We’re really sorry that you ran into this. There are a couple things we can help you with:
- We give users the ability to disable the default rules and mark transactions as safe if you write in to support@stripe.com (so in the example you gave of a user clearing transactions with his or her bank, if Radar incorrectly blocked a subsequent payment because of high card velocity, you could mark it as safe and subsequent payments would be allowed).
- Radar surfaces the primary reason a transaction is believed to be high-risk, but that is never the only reason (so the primary reason might be that the IP is in country X, but that doesn’t mean there’s a blanket ban on X—just that that reason combined with everything else we saw across thousands of signals resulted in our giving the payment a high score). It didn’t quite make it in under the wire for today’s launch, but we’re working on making the explanations more detailed.
We believe that Radar 2.0 (and in particular Radar for Fraud Teams) should also be helpful here:
- With Radar for Fraud Teams, you can customize the threshold at which Radar blocks charges—so if false positives are very costly for your business (e.g., because you have large margins), you can tune Radar to reflect that; you can also specify lists of trusted payment attributes to “allow” (if you have known good card numbers, e-mail addresses, IPs, etc.)
- Radar 2.0’s custom machine-learning models (for businesses that have enough data with Stripe) should adapt to the unique circumstances/patterns/trends of your business, and
- Radar 2.0’s ML overall has substantially improved performance, which you should see after you’ve upgraded.
> so the primary reason might be that the IP is in country X, but that doesn’t mean there’s a blanket ban on X—just that that reason combined with everything else we saw across thousands of signals resulted in our giving the payment a high score
It might not be that one reason, but could it be something like the following two? Because that would be practically equivalent in terms of unacceptability:
1. Customer IP is in a high risk location, and
2a. Vendor rarely does business successfully with people in that location, OR
2b. Vendors around this region don't usually do business successfully with customers in that region.
("Does business successfully" here was meant to both encompass lack of transactions as well as lots of unsuccessful transactions; I'm asking about both possibilities.)
A little more color: Stripe’s incentives are aligned with those of our users in that we want to let through as many legitimate customers as possible. We keep a very close eye on false positives.
Our machine learning models examine thousands of attributes for each payment and make predictions based on how frequently the observed attributes were associated with fraud in the past. The comparison here is never just on one or two or three attributes (as in your example), and no logic is hard-coded.
For any fraud detection scheme (or any binary classification scheme, really) there’s a tradeoff between false positives and false negatives. The Radar 2.0 updates—particularly Radar for Fraud Teams—will help here in a few ways:
- With Radar for Fraud Teams, you can customize the threshold at which Radar blocks charges—so if false positives are very costly for your business (e.g., because you have large margins), you can tune Radar to reflect that,
- Radar 2.0’s custom machine-learning models (for businesses that have enough data with Stripe) should adapt to the unique circumstances/patterns/trends of your business, and
- Radar 2.0’s ML overall has substantially improved performance, which you should see after you’ve upgraded.
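To make the false-positive/false-negative tradeoff concrete, here is a minimal sketch with toy data (the `rates` helper and the numbers are hypothetical, not Stripe's implementation): raising the block threshold lets more fraud through, lowering it blocks more good customers.

```python
def rates(scores_and_labels, threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    scores_and_labels: list of (risk_score, is_fraud) pairs; a payment
    is blocked when risk_score >= threshold.
    """
    fp = fn = legit = fraud = 0
    for score, is_fraud in scores_and_labels:
        if is_fraud:
            fraud += 1
            if score < threshold:   # fraud we let through
                fn += 1
        else:
            legit += 1
            if score >= threshold:  # good payment we blocked
                fp += 1
    return fp / legit, fn / fraud

# Toy data: (score, is_fraud). Fraud tends to score higher, imperfectly.
payments = [(5, False), (12, False), (30, False), (70, False),
            (55, False), (35, True), (75, True), (88, True), (95, True)]

for t in (40, 60, 80):
    fpr, fnr = rates(payments, t)
    print(f"threshold {t}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

A business with fat margins might prefer the low threshold (block little, eat some chargebacks); a business selling easily-resold goods might prefer the high one.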
Engineering manager for Stripe Radar here. Today’s update has been almost a year in the making and we’re excited to help Stripe businesses fight fraud more effectively. Here's more on what's new: https://stripe.com/blog/radar-2018
The entire Radar team and I are on hand to answer any questions you may have!
One of the issues that I faced during my short stint building ML models for fraud detection in debit card transactions was dealing with class imbalance. I was not completely convinced that oversampling or undersampling techniques would work; my initial experiments just resulted in more false positives. Just curious if you guys faced similar problems.
The other point I bring up is rather rhetorical: there are no open standards, model baselines, or datasets in the fraud domain. Compare building a model for fraud detection to building one for image recognition or object detection: there is a standard baseline and standard datasets, and your model competes against that baseline. Because of the open nature of image recognition, those models have improved astronomically. I feel that the lack of such openness in fraud is holding back innovation. I could be wrong in this assessment, so please correct me if so.
I agree that the lack of standards and baselines in the fraud detection space isn't ideal. One example: some fraud products will build models using human labels as the target to be predicted. Radar, on the other hand, tries to predict whether a charge actually turns out to be fraudulent (we use dispute/chargeback data we get directly from card issuers/networks). These are in fact different problems and the fact that the industry generally doesn't have a consistent target makes discourse and comparisons more muddled.
(And on class imbalance: we spent quite a bit of time experimenting/analyzing how to deal with it—we found that sampling rate has a marginal impact on performance but not a huge one.)
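For concreteness, random undersampling of the majority (non-fraud) class — one of the techniques the parent comment mentions — can be sketched as follows (toy code and data, not Stripe's; the `undersample` helper is hypothetical):

```python
import random

def undersample(examples, rate, seed=0):
    """Keep every minority (fraud) example; keep each majority (legit)
    example with probability `rate`. examples: list of (features, label),
    where label 1 = fraud."""
    rng = random.Random(seed)
    return [(x, label) for x, label in examples
            if label == 1 or rng.random() < rate]

# Toy imbalanced dataset: roughly 1 fraud per 100 legitimate payments.
data = [((i,), 1 if i % 100 == 0 else 0) for i in range(10_000)]
for rate in (1.0, 0.1, 0.01):
    sample = undersample(data, rate)
    frauds = sum(label for _, label in sample)
    print(f"rate={rate}: {len(sample)} examples, {frauds} fraud")
```

One caveat worth noting: sampling changes the base rate the model sees, so predicted probabilities need recalibration afterwards if you care about their absolute values rather than just the ranking.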
The only problem I ran into with the old Radar was a situation where a card was declined, and the customer contacted his bank to clear it up. The bank said they had no idea, they didn't decline the charge. When I followed up with Stripe, turns out Stripe declined the charge, and it never got to the bank.
Is there a way to tell when this happens in the Dashboard? There wasn't at the time, but I'm hoping this is maybe now visible somehow? It's obviously helpful to know when trying to help a customer resolve the situation.
PM on Radar here. There is! If you see a payment blocked for high risk in the Stripe Dashboard, it means that Radar blocked it before the card was charged. That is, the customer’s bank would have no record of the charge. In addition to the risk evaluation, Radar also provides the primary reason a transaction is believed to be high-risk (for example, the card has been linked to an unusually large number of card payments in the Stripe network over the past 24 hours).
A problem we've run into with Radar is that it only kicks in when you attempt to create a charge, and not when you attach a card to a customer.
This means that if your business model involves "try before you buy" or usage-based billing, you'd better be sure to make an initial charge, otherwise the customer might incur costs before Radar decides to block the charges.
Even if you do require an initial charge, if you allow customers to change their credit card between recurring charges, the new card could be extra risky and "fly under the Radar" until the first charge attempt.
Are there any plans to offer fraud risk evaluation and blocking when attaching a card to a customer, or will it still be limited to just blocking charges? With Stripe's new emphasis on recurring billing, it seems like this would be important.
We currently see Radar as a liability for us. It might block the occasional fraud and avoid a chargeback, but it also allows customers to incur costs with dodgy cards before we know they're dodgy, and then blocks charges outright before we know.
My perspective on this is colored by selling SaaS.
In software sold on a free-trial model, you assume most trials don’t convert (overwhelmingly due to declining to pay but with a bit of fraud) and then the cost to provision the service (COGS) is, effectively, a marketing expense. COGS in SaaS are typically negligible to low; this is why the industry is OK with providing services on, basically, a digital handshake. If you want to allow users to try out high-COGS services (or highly-abused services) prior to verifying capacity/willingness to pay, you’d need some way to credit score potential customers outside the context of a particular payment.
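As a toy illustration of that economics (all numbers here are made up, not from the comment): treating trial COGS as marketing spend, the cost is spread over the trials that convert.

```python
def cost_per_customer(trials, conversion_rate, cogs_per_trial):
    """Trial COGS spread over the trials that actually convert."""
    return (trials * cogs_per_trial) / (trials * conversion_rate)

# Low-COGS SaaS: provisioning trials on a "digital handshake" is cheap.
print(cost_per_customer(1000, 0.05, 0.50))   # 10.0 per paying customer
# High-COGS service, same conversion rate: the handshake gets expensive,
# which is why you'd want to credit-score prospects before provisioning.
print(cost_per_customer(1000, 0.05, 20.00))  # 400.0 per paying customer
```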
To date, we’ve generally focused the bulk of our ML efforts on things which apply to the majority of our users, but as we get better at customizing these technologies to specific industries at scale and even on a per-account basis, we could certainly imagine applying them in contexts that are more relevant in your model. I’d love to hear more detail about your use case; feel free to email me (my HN username at stripe.com). If we get closer to shipping something that is probably interesting, we’d be happy to give you a heads up.
Most of our ML stack has been developed internally given the unique constraints we have for Radar. Among other things, we need to be able to
- compute a huge number of features, many of which are quite complex (involving data collected from throughout the payment process), in real-time: e.g. how many distinct IP addresses have we seen this card from over its entire history on Stripe, how many distinct cards have we seen from the IP address over its history, and do payments from this card usually come from this IP address?
- train custom models for all Stripe users who have enough data to make this feasible, necessitating the ability to train large numbers of models in parallel,
- provide human-readable explanations as to why we think a payment has the score that it does (which involves building simpler “explanation models”—which are themselves machine learning models—on top of the core fraud models),
- surface model performance and history in the Radar dashboard,
- allow users to customize the risk score thresholds at which we action payments in Radar for Fraud Teams,
- and so forth.
We found that getting everything exactly right on the data-ML-product interactions necessitated our building most of the stack ourselves.
That said, we do use a number of open source tools—we use TensorFlow and PyTorch for our deep learning work, XGBoost for training boosted trees, and Scalding and Hadoop for our core data processing, among others.
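The card/IP aggregate features in the first bullet above can be sketched with simple set-valued counters (a toy in-memory version; all names are hypothetical, and a real system would need a low-latency distributed store and time-windowed counts rather than all-time sets):

```python
from collections import defaultdict

class AggregateFeatures:
    """Toy card/IP history counters for features like 'how many distinct
    IPs have we seen this card from?'"""

    def __init__(self):
        self.ips_by_card = defaultdict(set)
        self.cards_by_ip = defaultdict(set)

    def features_for(self, card, ip):
        """Compute features *before* recording the current payment, so the
        payment being scored doesn't count itself."""
        feats = {
            "distinct_ips_for_card": len(self.ips_by_card[card]),
            "distinct_cards_for_ip": len(self.cards_by_ip[ip]),
            "card_seen_from_ip_before": ip in self.ips_by_card[card],
        }
        self.ips_by_card[card].add(ip)
        self.cards_by_ip[ip].add(card)
        return feats

agg = AggregateFeatures()
agg.features_for("card_A", "1.2.3.4")
agg.features_for("card_A", "1.2.3.4")
# Third payment on the same card, but from a new IP:
print(agg.features_for("card_A", "5.6.7.8"))
```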
Broadly speaking, what approach do you use to "build simpler 'explanation models'" from the more complicated "core fraud models"? Do you learn the models separately over the training data, or does the more complicated model somehow influence the training of the simpler model?
Why are you so stubborn about IP addresses? It's not a holy grail! I've used a proxy for some years now, and many times when I want to buy something on a storefront “powered by Stripe” my card is declined due to an “unknown error”. The moment I turn off my VPN, the transaction goes through. I expect this to be a huge problem for Stripe or anyone basing fraud decisions largely on IP. These days if I find a cool product and see “powered by Stripe”, I simply end up on Amazon purchasing the same product for a similar price. Worst part—your clients don’t even know!
I’m sorry that you had this experience. We vehemently agree that any one signal (such as IP address or use of a proxy) is a pretty poor predictor of fraud in isolation. We are trying to move the industry towards holistic evaluation rather than inflexible blacklists; not everyone behind a TOR exit node is a fraudster, for example.
While we can’t fix the previous experience you had, we’ve rebuilt almost every component of our fraud detection stack over the past year. We’ve added hundreds of new signals to improve accuracy, each payment is now scored using thousands of signals, and we retrain models every day.
We hope these improvements will help. We want our customers to be able to provide you services; that’s what keeps the lights on here. We’d be happy to look into what happened if you have specific websites in mind—feel free to shoot me a note at mlm@stripe.com.
The rough idea is that you look at all the decisions made by the fraud model (sample 1 is fraud, sample 2 is not fraud) and the world of possible "predicates" ("feature 1 > x1", "feature 1 > x2", ..., "feature 10000 > z1," etc.) and try to find a collection of explanations (which are conjunctions of these predicates) that have high precision and recall over the fraud model's predictions. For example, if "feature X > X0 and feature Y < Y0" is true for 20% of all payments the fraud model thinks are fraudulent, and 95% of all payments matching those conditions are predicted by the fraud model to be fraud, that's a good "explanation" in terms of its recall and precision.
It's a little tough to talk about this in an HN comment but please feel free to shoot me an e-mail (mlm@stripe.com) if you'd like to talk more.
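A toy version of that predicate-mining idea — with hypothetical names and only two-predicate conjunctions, far simpler than a production system — might look like:

```python
from itertools import combinations

def mine_explanations(samples, preds, predicates, min_precision=0.9):
    """Score two-predicate conjunctions by their precision/recall against
    the fraud model's own predictions (not against ground-truth labels).

    samples: list of feature dicts; preds: parallel list of booleans
    (True = the core fraud model predicts fraud); predicates: name -> fn.
    Returns (explanation, precision, recall) triples.
    """
    n_pos = sum(preds)
    results = []
    for (n1, p1), (n2, p2) in combinations(predicates.items(), 2):
        matched = [p1(s) and p2(s) for s in samples]
        hits = sum(m and pr for m, pr in zip(matched, preds))
        total = sum(matched)
        if total and hits / total >= min_precision:
            results.append((f"{n1} AND {n2}", hits / total, hits / n_pos))
    return results

predicates = {"x > 5": lambda s: s["x"] > 5, "y < 2": lambda s: s["y"] < 2}
samples = [{"x": 8, "y": 1}, {"x": 9, "y": 0}, {"x": 7, "y": 1},
           {"x": 2, "y": 1}, {"x": 8, "y": 5}]
preds = [True, True, True, False, False]  # the core model's verdicts
print(mine_explanations(samples, preds, predicates))
```

The key property, as described above, is that the explanations are fit to the fraud model's predictions, so each surviving conjunction is a human-readable summary of a region where the model says "fraud".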
We’ve been working on Radar, and releasing updates and improvements, continuously since we first launched. What we’re announcing today is (1) a completely revamped machine learning system (which we couldn’t release in pieces—we needed to finish every layer of the stack before we could launch it, though we’ve been running it in beta for a percentage of users since late last year) and (2) a new package of features specifically designed for teams working on fraud prevention.
(I work at Stripe) We've thought about this a little, and while we'd be excited to make Radar available more broadly (i.e., even to businesses not using Stripe for payments), for now we're focusing our efforts on improving the product.
For credit card processing, automated fraud prevention does in fact come standard on all Stripe accounts and takes into account a great number of signals including some that you brought up below, like whether the location of the IP address used matches the location of the billing address.
For ACH, we've built even more stringent verification—in addition to any other signals we take into account, purchasers have to demonstrate, through microdeposits or Plaid, that they have direct access to the bank account in question.
It's impossible to prevent all fraud, but we take it very seriously and are investing heavily in continually improving and expanding what we do there. It sounds like you may have had some specific bad experiences, and we'd love to hear your feedback and suggestions for improvement. Please do reach out--I'm mlm@stripe.com.
Your description does not match Stripe's documentation. Specifically, there's no mention of matching geo-IP to billing address.
The problem I see with Stripe's fraud prevention is that it appears to be "all or none", with no visibility or control on our end.
The only control knob appears to be whether to allow Stripe to automatically decline things it thinks are fraud. It doesn't appear to provide any detail on why it thought something was fraud.
So it seems to leave two choices:
- Allow Stripe to decline things, and try to address false positives manually. There is no "go ahead and charge this" button, by the way—you would have to contact the customer.
- Set Stripe not to decline things, but lose even the limited visibility into "Stripe would have declined this".
The real problem, in my mind, is that fraud risk for both the banks and providers like Stripe is really small. You risk very little.
The vast majority of the cost goes back to the individual merchants/sellers, who have the least visibility into what the risk of an individual transaction is.
Thus, there's very little incentive for the banks, or Stripe, to provide decent tools. A shame, because your ability to see the bigger picture means your tools would always be better than anything we can build ourselves.
We've been thinking about (and working on) a lot of these issues--we're definitely aware that users would like (totally reasonably!) more visibility and control. If you have any more feedback, I'd love to hear it!
Appreciate the response. I suppose the simplest change would be visibility into "would have declined" when the "automatically decline detected fraud" setting is off.
That would allow the flexibility for merchants to make their own decisions without the hassle of manually dealing with false positives. The trouble with the false positives is that you have to talk the customer into entering all their data again, without having anything specific to tell them, like "Er, you were declined, but I have no idea why...can you try again?".
Adyen does a good job providing a developer-centric platform with robust controls around fraud rules and other elements. I'd encourage you to benchmark them. You and your colleagues can also drop me an email anytime to discuss: ben.brown@firstannapolis.com.
(I work at Stripe) Buyers won't have to contact Stripe directly, just as they would not contact a third-party fraud detection service directly. They'll contact businesses, who can use the dashboard or API to mark the charge as safe and retry it without Stripe intervening. In our experience, card networks don't catch much fraud, and not all businesses have the time or resources to integrate a third-party solution—we don’t want them to be unprotected.
And, to be clear, the fact that Stripe is doing fraud protection isn't new—we've always blocked some fraudulent payments, as does every other major payment company. What we're launching is a much better system and, especially, one that businesses using Stripe can train so that there are fewer false positives over time.
This is our product: https://notion.so
We are looking for people to fill the following roles: https://www.notion.so/jobs
You can read more about the company here: https://www.nytimes.com/2020/04/01/technology/notion-startup... https://www.notion.so/about