"The check grounding API returns an overall support score of 0 to 1, which indicates how much the answer candidate agrees with the given facts. The response also includes citations to the facts supporting each claim in the answer candidate.
Perfect grounding requires that every claim in the answer candidate must be supported by one or more of the given facts. In other words, the claim is wholly entailed by the facts. If the claim is only partially entailed, it is not considered grounded."
There's an example input with grounded output scores showing how the model splits the answer into claims, decides whether each claim needs grounding, and the resulting entailment score for each claim: https://cloud.google.com/generative-ai-app-builder/docs/chec...
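As an illustration of the grounding rule quoted above (a toy sketch, not the actual API -- the real entailment check is a learned model, and `entails` here is a made-up stand-in), a scorer over claims and facts might look like:

```python
# Illustrative sketch of the grounding rule described above -- NOT the real
# Check Grounding API. `entails` is a toy stand-in for a learned NLI model.
def entails(fact: str, claim: str) -> bool:
    # Toy criterion: a claim counts as wholly entailed if every word in it
    # appears in the fact. A real system uses a trained entailment model.
    return set(claim.lower().split()) <= set(fact.lower().split())

def support_score(facts: list[str], claims: list[str]) -> float:
    """Fraction of claims wholly entailed by at least one fact."""
    grounded = sum(any(entails(f, c) for f in facts) for c in claims)
    return grounded / len(claims) if claims else 1.0

facts = ["the eiffel tower is in paris france"]
claims = ["the eiffel tower is in paris",   # wholly entailed
          "the eiffel tower is in london"]  # not entailed
print(support_score(facts, claims))  # → 0.5
```

Note the all-or-nothing rule per claim: a partially entailed claim contributes nothing, matching the "wholly entailed" requirement in the quote.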
Do you know by any chance how the `embedding_v1` vectors were generated? The data field description says "Machine-learned vector embedding based on document contents and metadata, where two documents that have similar technical content have a high dot product score of their embedding vectors."
Could this be word2vec, GloVe, or something else like that? Maybe produced from the tf-idf-transformed sum of the word tokens in the title+abstract of each patent?
We (I run Google Patents) generated them using Wsabie (https://research.google.com/pubs/archive/37180.pdf) trained on the set of words of the full text -> Cooperative Patent Classification codes. So: summed word embeddings trained for a classification task, which also works well for similarity.
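A toy sketch of the summed-embedding idea (the vectors below are made up by hand for illustration; the real embedding_v1 vectors come from the Wsabie training described above):

```python
import math

# Hand-made toy vectors -- the real ones are learned, not hand-crafted.
word_vecs = {
    "battery": [1.0, 0.0, 0.0],
    "anode":   [0.8, 0.2, 0.0],
    "cathode": [0.7, 0.3, 0.0],
    "gear":    [0.0, 0.0, 1.0],
    "shaft":   [0.0, 0.1, 0.9],
}

def doc_embedding(words):
    """Sum the word vectors, then L2-normalize so dot products are comparable."""
    v = [sum(word_vecs[w][i] for w in words if w in word_vecs) for i in range(3)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = doc_embedding(["battery", "anode", "cathode"])  # battery-tech doc
b = doc_embedding(["battery", "cathode"])           # similar doc
c = doc_embedding(["gear", "shaft"])                # mechanical doc
print(dot(a, b) > dot(a, c))  # → True: shared vocabulary, higher dot product
```

The point the comment makes carries over: even though the per-word vectors were trained for classification, documents with similar technical vocabulary end up with a high dot product.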
I run Google Patents (patents.google.com) - we've thought about adding a way to search for expired patents (we have the expiration indicators), but in my opinion it gives a false sense of security. There's also http://freeip.mtu.edu/home/index.php, which searches over only expired patents.
The patent in question could have been improved upon, and that improvement can still be in force. Say someone patents a widget A + B, and later files a continuation A + B + C (https://en.wikipedia.org/wiki/Continuing_patent_application). The first patent could be expired, but while building your copy of A + B you might come to the same conclusion that the invention also needs to include C (which is still in force) to actually work.
Google Patents focuses on improving patent quality. There's still uncertainty about whether a granted patent is actually valid. If we can improve the prior art finding process for inventors and examiners, then fewer overly-broad patents will be granted, and it will be easier to tell if an invention actually infringes a patent.
Then we can start to think about making patent information more useful for part of the original purpose - as a transfer of knowledge to the public domain in exchange for a temporary exclusive right.
Thanks for chiming in, Ian! Google Patents is awesome :-)! You're right, there are a number of edge cases where the expiry date is incorrect - the USPTO even provides a patent expiry calculator [1] (as an Excel file, no less ;)...). For patents issued 20 years ago, "issue date + 20 years" should hold true as a very basic rule of thumb (disregarding any improvements that may still be in force). I hope that nobody builds a business based on one of the displayed patents without first checking what other, newer patents may also affect said invention - the site is merely meant as a starting point to get inspired; the actual research has to follow.
> Say someone patents a widget A + B, and later files a continuation A + B + C (https://en.wikipedia.org/wiki/Continuing_patent_application). The first patent could be expired, but while building your copy of A + B you might come to the same conclusion that the invention also needs to include C (which is still in force) to actually work.
This is a perfect example of why patents are harmful. The patent on C did not in any way help the person copying A + B; they invented C on their own. But now this new person can be punished simply for being equally as creative as the other guy. It's a huge disincentive to innovation. I understand the theory behind patents, but I have a really hard time believing anyone in history has ever actually used a patent to find out how a thing was made and then improved on it, or couldn't have done so just as easily without patents. Even if you can point to one or two such examples, they would still have to more than balance out the loss in innovation and the increased legal costs created by uncertainty over IP legality.
This solar sail patent http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=H... was educational to me (at some other site which showed the figures too) -- but it's literally the only time that's ever happened in my multi-decade career. It's written in a more helpful style than usual, and undoubtedly never saw commercial use in its lifetime. I would not have even looked if it were a patent in my own field.
The majority of software companies don't hold any patents (aside from copyrights). As such, patents don't seem to be a necessary means to profit from "inventing."
But as you say, they have intellectual property protection. Isn't the copyright protection that software developers get over their software analogous in this sense to the protection that patent owners get over their inventions?
Unless you just meant to split hairs over copyright vs. patent, in which case I agree. I should have said something like "intellectual property protection" instead of "patents".
Patents and copyright work quite differently and patents aren't really appropriate for software. Patents allow you to be too granular in what you protect to make sense for software. I think a good comparison would be protecting combinations of two musical notes in music. This would dramatically impact creation of new songs because you are protecting something too trivial. The equivalent happened when software patents first were introduced.
I feel like you're attacking our specific implementation of patents (and especially software patents) in America. I'm talking about the idea of copy protection. Is it an incorrect assertion that in a world without any copy protection, people would be free to copy any invention they wanted? Thus, the original inventors' version of the product (which would be exactly the same as the copied version) would have no differentiating factors, and there would be no guarantee of profit from invention.
Yes, there are a lot of flaws in copy protection. There are a lot of instances where patent trolls and patents are detrimental to innovation.
But the main point that I was originally making was: Wouldn't a world without copy protection be a world where every invention gets literally copied, resulting in lesser gains for the actual inventors?
I think you may have misunderstood me: I did not mean to support software patents or our current implementation of software patents at all. Instead, I was countering the gp assertion that (patents are rare in the software world + software world doesn't have rampant issues with ripoffs) implies that patents are not necessary to profit off of inventing.
My counterargument was that the software world may not have many patents, but it does still have copyright, and thus their counterargument does not invalidate my overall point/question: "copy protection is necessary in order to not have a world full of ripoffs"
Yes, that's a very different point that I mostly agree with. In addition, the secrecy caused by rampant copying would be (and historically was) a massive issue, and was the driving force behind the introduction of patents. Of course, patenting "XOR" and protecting the stupid mouse forever are a totally different issue, as you pointed out.
I don't think I believe you. That's the theory of course, but I haven't lived in a society without patents so I can only trust people who say that's how it works.
My mental heuristic says that software engineering seems like it would be only improved by patents not existing. I can't speak to hardware inventions, though.
It could be, however, if you limited patent lifetime to reasonable market turnover rates. Say, a year or two. It's enough to establish the brand, and after that, competition is healthy.
Patents were meant to be a carrot - a way to enrich society with inventions by hacking the market to incentivize it. Instead, companies now are given a free carrot factory, so they can (and do) show middle finger to society and profit off inventions while keeping them away from people for a very long time.
>Google Patents focuses on improving patent quality. //
Could you go into that a bit? There have been full-text, full-image databases with proximity searching and bespoke classifications and such for a while; how has Google Patents improved on that? How is it better than Espacenet, for example?
When I looked at patents a lot the biggest help would probably have been being able to get decent translations of Korean/Japanese/Chinese/etc. patents - I gather you were involved in an EPO project to do that. What other goodies have I missed?
>but while building your copy of A + B you might come to the same conclusion //
Worth noting is that this is fine in most jurisdictions as long as you're not trying to commercialise the build.
I think they meant improving the quality of actual patents (by giving people tools to combat bogus patents), not necessarily improving on the patent browsing experience (though I'm sure they would argue that they try to do that too).
Great insight on continuing patent applications. I agree wholeheartedly with your assessment that overly broad patents, particularly with respect to technology, need serious review. It is way too easy to patent a concept instead of an invention.
Which basically says, I haven't actually invented anything useful, but if anyone else does they should have to pay me because I thought of something super generic first.
Why wouldn't the continuation (or CIP as you might have meant) expire at the same time as the original? Same priority date, so unless it's old enough to fall under the 17-years-from-issuance regime...
thanks for the great work and insightful comments! what are the top three things you would do if you could reform the patent system? and do you support/oppose abolishing patents for software (or any other kinds of products)?
I made a painless way to integrate native libraries into Java (Android NDK support too) with some automatic JNI generation [1]. It uses GWT's javascript comment syntax (edit: see [2] for examples).
I've been working on a tool (https://then.io) that solves all of these problems with the same effort that a todo list requires. Your tasks are scheduled between your calendar events and within the spans of time you set for them to be executed. They are ordered so that all of your due dates are met, with padding for mistakes.
Paradox of choice, lack of context, lack of commitment devices, and heterogeneous complexity are all solved by the nature of then.io. I would argue that priority doesn't matter, only due dates do. If you have too many tasks to finish before a due date, then you need to make the decision of which due date to push back.
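The due-date ordering described above can be illustrated with earliest-deadline-first scheduling. To be clear, this is only my guess at the kind of ordering a tool like then.io might use, not its actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    duration: int  # minutes of work required
    due: int       # minutes from now

# Earliest-deadline-first: on a single resource, if any ordering meets every
# due date, then sorting by due date does too. (An assumption about how such
# a tool might order tasks -- not then.io's actual implementation.)
def schedule(tasks):
    plan, now = [], 0
    for task in sorted(tasks, key=lambda t: t.due):
        start, now = now, now + task.duration
        plan.append((task.name, start, now, now <= task.due))  # on-time flag
    return plan

for name, start, end, on_time in schedule([Task("report", 60, 180),
                                           Task("email", 15, 30)]):
    print(name, start, end, on_time)
# email runs first (earlier due date); both finish on time
```

This also matches the comment's point about priorities: when the sorted plan produces an `on_time == False` entry, no reordering will fix it, and some due date has to be pushed back.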
The author found the right solution for constructing these expressions: store the inputs in a trie. Because the generated regular expressions match only the inputs, this approach can yield a much more compact test when the inputs overlap heavily.
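A minimal sketch of the trie idea in Python (frak itself is written in Clojure and handles escaping, character classes, and many more cases than this does):

```python
import re

# Build a trie of the inputs, then emit a regex that factors shared
# prefixes out once. Simplified sketch -- no escaping of regex metachars.
def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-word marker
    return root

def to_regex(node):
    branches = [ch + to_regex(child) for ch, child in node.items()
                if ch != "$"]
    if not branches:
        return ""
    body = branches[0] if len(branches) == 1 else "(?:" + "|".join(branches) + ")"
    # If a word also ends at this node, the continuation is optional.
    return "(?:" + body + ")?" if "$" in node else body

rx = to_regex(build_trie(["foo", "foot", "food"]))
print(rx)  # → foo(?:(?:t|d))?
assert re.fullmatch(rx, "foot") and not re.fullmatch(rx, "fool")
```

The shared prefix "foo" appears once in the output instead of three times, which is where the compactness on overlapping inputs comes from.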
It would be cool if the inputs weren't matched exactly, and frak could figure out a general pattern for your inputs (decimals, capitalized words, etc). That could help newcomers with a starting expression that matches their inputs.
This was the subject of my honours thesis[1] back in 2000. I was actually inferring DTDs from sample XML documents, but it's exactly the same problem, since DTDs use regular expressions and I only had positive examples to go on.
Based on existing methods, my solution started with the same trie and then generalised to a more flexible DFA by merging states. I used information theory (specifically Minimum Message Length) to turn it into an optimisation problem and tried a few different algorithms; adding Ant Colony Optimisation to an existing algorithm produced the best results in my tests. (They were pretty limited, though.)
> It would be cool if the inputs weren't matched exactly, and frak could figure out a general pattern for your inputs (decimals, capitalized words, etc).
The problem is that there is a huge number of possible solutions for any given input. For example, you could always give a trivial solution: ".*"
For something like that to work, you'd need a large dictionary of common patterns, and then you'd want to compare against the dictionary to see if it matches a sequence of common patterns.
I can't imagine that sort of thing being too useful.
An alternative solution is to provide a list of matches and non-matches, and look for the shortest or simplest regex that separates the two.
(Finding the actual minimal regex, instead of just a reasonable guess, might be a computationally tough problem. I guess it's in coNP and might be in NP, too. An algorithm in P would be nice to find.)
Update: Finding any separating regex would be in P. A separating finite automaton is easy to find, and then you just convert that into a regex. Now, how do you find the minimal regex?
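To make the existence half of that claim concrete, here's a sketch: when the positive and negative sets are disjoint, alternating the escaped positives always yields a separating regex in polynomial time, though it's nowhere near minimal:

```python
import re

# Existence demo for "finding *any* separating regex is in P": alternate the
# escaped positive examples. This always separates disjoint sets, but it is
# far from the minimal regex the comment is asking about.
def separating_regex(positives):
    return "|".join(map(re.escape, positives))

pos, neg = ["cat", "car"], ["dog", "cart"]
rx = separating_regex(pos)
print(all(re.fullmatch(rx, p) for p in pos))  # → True
print(any(re.fullmatch(rx, n) for n in neg))  # → False
```

The hard part, as the comment says, is minimality: this construction grows linearly with the positive set, and shrinking it down to the shortest separating regex is the open question.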
It depends on what the grandparent means by 'not exactly'. If it means 'within a certain edit distance', it is very doable. You can store your dictionary in an automaton and construct a Levenshtein[1] automaton in linear time. The intersection of the dictionary automaton and the Levenshtein automaton gives all words within the given edit distance. My library (Dictomaton), which I mention in another comment, implements this. There is a different implementation in recent Lucene versions that is used for finding documents whose keywords approximately match the query.
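For concreteness, here's a brute-force Python version of the matching criterion; the Levenshtein-automaton intersection described above computes the same set far more efficiently than scanning the whole dictionary like this:

```python
# Brute-force "all dictionary words within edit distance k". The automaton
# approach avoids computing a full DP table per dictionary word.
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein DP, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def within(dictionary, query, k):
    return [w for w in dictionary if edit_distance(w, query) <= k]

print(within(["rotterdam", "amsterdam", "utrecht"], "roterdam", 1))
# → ['rotterdam']
```

Same output either way; the automaton construction just restricts the search to dictionary paths that can still stay within distance k.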
The problem is that this is unlikely to give what you want, since, for example, if my inputs are "123", "2315", "12451", ..., then what I'm looking for is probably going to be something along the lines of "\d+", and not the set of words that are within a small edit distance of a subset of integers.
It depends on the application. For instance, in search engines approximate matching is very useful. E.g. if people want to search for Rotterdam, they make typos like Roterdam, or they're foreigners and don't know that Rotterdam has a double t.
You can rank candidates afterwards. Either way, it's a good solution for finding words that are close to a mistyped word.
I have successfully used such techniques in a spellchecker and various search engines.
Actually, frak doesn't generate patterns that require an exact match. I use Clojure's `re-matches` function (for exact matches) in the example and the tests to show and ensure the generated patterns work as expected. Clojure has another function called `re-find` which is a bit more relaxed. An expression like #"bar" will fail with the string "barn" using `re-matches` but succeed with `re-find`. I plan to include an option for generating regexes that require exact matches.
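For readers who don't know Clojure, the same distinction exists in Python's standard library: `re.fullmatch` behaves like `re-matches` (the whole string must match), while `re.search` behaves like `re-find` (a match anywhere suffices):

```python
import re

# Python analogue of the Clojure re-matches / re-find distinction above.
print(bool(re.fullmatch("bar", "barn")))  # → False (whole string must match)
print(bool(re.search("bar", "barn")))     # → True  (substring match is enough)
```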
1. Work with Congress to enact reforms to the PATRIOT Act. He said the US is not allowed to listen to any phone calls without a warrant.
2. Work with Congress to improve the public's confidence in FISA. Consider changes: a judge reviewing a request only hears one side of the story, but we could allow an outside party to defend civil liberties and privacy "in certain cases".
3. Be more transparent. Create a website to provide more transparency.
4. Form an outside group of intelligence experts to review surveillance technologies.
Asked whether Snowden is a patriot: "No, he is not. I called for a thorough review of surveillance programs before the Snowden leaks. A thoughtful, fact-based debate. I signed an executive order that provided protections for whistleblowers in the intelligence community, so there were better avenues for leaking information."
It's hard to find something to say that isn't hopeless and nihilistic, so I'll just say this: it's disappointing to see someone who used to be a professor of constitutional law ignore the obvious dangers to free speech of a set of programs and secret laws that he himself oversees.
"Q: Can I run a server from my home?
A: Our Terms of Service prohibit running a server. However, use of applications such as multi-player gaming, video-conferencing, home security and others which may include server capabilities but are being used for legal and non-commercial purposes are acceptable and encouraged."