I haven't jumped on the GraphQL train yet, largely for the reasons the original author calls out. I see the benefits, but they don't outweigh the costs of converting our existing API surface area.
As with most of the tools we choose to use (or not use), there are trade-offs. The original tweet and post fail to recognize why GraphQL might make sense, even with its caveats. GraphQL makes the API more flexible for the front-end to consume. This reduces the number of requests a UI might need to make in order to render something, which makes clients (particularly mobile ones) faster. It also means a team of specialists working on the UI can probably add or adjust features faster, as the backend is more dynamic.
So if you're serving a certain audience (lots of clients where network requests are expensive) or have a large, specialized front-end team that's distinctly separated from the team that's responsible for the API, then GraphQL might be worth the trade-offs. Sure, it'll come with some downsides, but all things do -- it's our job to be careful and deliberate about the tools we choose to use.
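To make the "fewer requests" point concrete, here's a rough sketch in Python. It isn't real GraphQL or a real HTTP client: `CountingBackend`, `rest_get`, `graph_query`, and the sample data are all made up for illustration, and the only thing it models is how many round trips a UI needs to render a user's name plus their post titles.

```python
# Hypothetical in-memory data standing in for a backend's storage.
USERS = {1: {"id": 1, "name": "Ada", "post_ids": [10, 11]}}
POSTS = {
    10: {"id": 10, "title": "First", "body": "long text"},
    11: {"id": 11, "title": "Second", "body": "more long text"},
}


class CountingBackend:
    """Toy backend that counts round trips instead of doing network I/O."""

    def __init__(self):
        self.round_trips = 0

    def rest_get(self, path):
        # One resource per call, like GET /users/1 or GET /posts/10.
        self.round_trips += 1
        kind, raw_id = path.strip("/").split("/")
        table = USERS if kind == "users" else POSTS
        return table[int(raw_id)]

    def graph_query(self, user_id, post_fields):
        # One call in which the client names the nested fields it wants.
        self.round_trips += 1
        user = USERS[user_id]
        posts = [
            {field: POSTS[pid][field] for field in post_fields}
            for pid in user["post_ids"]
        ]
        return {"name": user["name"], "posts": posts}


def render_with_rest(backend, user_id):
    # Fetch the user, then each post separately: 1 + N round trips.
    user = backend.rest_get(f"/users/{user_id}")
    titles = [
        backend.rest_get(f"/posts/{pid}")["title"] for pid in user["post_ids"]
    ]
    return user["name"], titles


def render_with_graph(backend, user_id):
    # Ask for exactly what the view needs in a single round trip.
    data = backend.graph_query(user_id, post_fields=["title"])
    return data["name"], [post["title"] for post in data["posts"]]
```

With two posts, the REST-style path makes three round trips and the single-query path makes one. A purpose-built REST endpoint could also return everything in one call, of course; the trade-off is that someone on the API team has to build and maintain that endpoint for each new view.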
AI2 is a non-profit research institute working to apply AI research and engineering efforts towards the common good. Part of this, of course, involves writing a lot of code.
You might write code for scheduling machine learning experiments across both on-premise and cloud hardware, or work on a platform we have for handling inference at runtime. Or you might contribute to Semantic Scholar, an AI-powered, open academic search index, or to projects like EarthRanger and Skylight, which use technology to combat illegal poaching and fishing.
We're a small, open institution with a lot of really smart, motivated people. There's no shortage of fun problems to work on, and folks are given a ton of autonomy to drive and shape their projects.
Take a look at our job offerings, and send me a note if you have any questions:
> What are the limitations?
There are several known limitations. Tables are currently extracted from PDFs as images, which are not accessible. Mathematical content is either extracted from PDFs with low fidelity or not extracted at all. Processing of LaTeX source and PubMed Central XML may lack some of the features implemented for PDF processing. We are working to improve these components, but please let us know if you would like some of these features prioritized over others.
There are a lot of amazing people here, doing really great work. It's a really inspiring place to be. I feel really lucky to work with such great people on interesting, important problems.
Yay, glad to hear it! If you end up viewing one of these on your Kindle, let us know how well (or not) things work.
We're not sure if it's something that we can distribute as OSS just yet. It relies on a few internal libraries that would also need to be publicly released, so it's not as simple as adjusting a single repository's visibility.
> One comment is that the slowest page to load was the Gallery [0] as it loads an ungodly amount of PNG files from what appears to be a single IP (a GCP Compute instance?)
Yup. There's no CDN or anything like that right now. We kept things simple to get this out the door. But we definitely intend to make improvements like this as we improve the tool.
The more adoption we see, the more it motivates these types of fixes!
> P.S. Also, the paper linked below [1] seems to have a few conversion problems -- I see "EQUATION (1): Not extracted; please refer to original document", and also some (formula? Greek?) characters that seem out of place after the words "and the next token is generated by sampling"
Thanks for the catch. As you noted, there are still a fair number of extraction errors for us to correct!