codeviking · on May 6, 2022

I haven't jumped on the GraphQL train yet, largely for many of the reasons the original author calls out. I see the benefits, but they don't outweigh the costs of converting our existing API surface area.

Like most of the tools we choose to use (or not use) there are trade-offs. The original tweet and post fail to recognize why GraphQL might make sense, even with its caveats. GraphQL makes the API more flexible for the front-end to consume. This reduces the number of requests a UI might need to make in order to render something, which makes clients (particularly mobile ones) faster. It also means a team of specialists working on the UI can probably add or adjust features faster, as the backend is more dynamic.
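
To make the request-count point concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the `CountingServer` class, its methods, and the sample data are hypothetical, and no real GraphQL library or API is involved. It just counts simulated round trips for a UI that lists posts with their authors' names, once with REST-style per-resource calls and once with a single nested query.

```python
# Hypothetical in-memory data; this stands in for a real backend.
AUTHORS = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
POSTS = [
    {"id": 10, "title": "Hello", "author_id": 1},
    {"id": 11, "title": "World", "author_id": 2},
]


class CountingServer:
    """Fake backend that counts simulated network round trips."""

    def __init__(self):
        self.requests = 0

    # REST-style endpoints: one resource type per request.
    def get_posts(self):
        self.requests += 1
        return [dict(p) for p in POSTS]

    def get_author(self, author_id):
        self.requests += 1
        return dict(AUTHORS[author_id])

    # GraphQL-style endpoint: the client asks for nested fields in one
    # request, roughly: query { posts { title author { name } } }
    def query_posts_with_authors(self):
        self.requests += 1
        return [
            {"title": p["title"], "author": {"name": AUTHORS[p["author_id"]]["name"]}}
            for p in POSTS
        ]


def render_rest(server):
    # One request for the list, then one more per post for its author.
    rows = []
    for post in server.get_posts():
        author = server.get_author(post["author_id"])
        rows.append(f'{post["title"]} by {author["name"]}')
    return rows


def render_graphql(server):
    # A single request returns everything the view needs.
    return [
        f'{p["title"]} by {p["author"]["name"]}'
        for p in server.query_posts_with_authors()
    ]


rest_server = CountingServer()
graphql_server = CountingServer()
rest_rows = render_rest(rest_server)
graphql_rows = render_graphql(graphql_server)
```

Both versions render the same rows, but the REST-style path makes one request per post on top of the initial list call, while the nested query makes one in total. A real REST API could close that gap with a purpose-built endpoint, which is exactly the kind of backend change the front-end team would otherwise have to wait on.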

So if you're serving a certain audience (lots of clients where network requests are expensive) or have a large, specialized front-end team that's distinctly separated from the team that's responsible for the API, then GraphQL might be worth the trade-offs. Sure, it'll come with some downsides, but all things do -- it's our job to be careful and deliberate about the tools we choose to use.

codeviking · on Feb 2, 2022

AI2 | Full Time, Seattle (REMOTE or ONSITE) | Engineering Managers and Software Engineers | https://allenai.org/careers#current-openings

AI2 is a non-profit research institute working to apply AI research and engineering efforts towards the common good. Part of this, of course, involves writing a lot of code.

You might write code for scheduling machine learning experiments across both on-premise and cloud hardware, or work on a platform we have for handling inference at runtime. Or you might contribute to Semantic Scholar, an AI-powered, open academic search index, or to projects like EarthRanger and Skylight that use technology to combat illegal poaching and fishing activities.

We're a small, open institution with a lot of really smart, motivated people. There's no shortage of fun problems to work on, and folks are given a ton of autonomy to drive and shape their projects.

Take a look at our job offerings, and send me a note if you have any questions:

https://allenai.org/careers#current-openings

sams [at] allenai.org

codeviking · on Sept 16, 2021

Yup, this is a known limitation:

> What are the limitations? There are several known limitations. Tables are currently extracted from PDFs as images, which are not accessible. Mathematical content is either extracted with low fidelity or not being extracted at all from PDFs. Processing of LaTeX source and PubMed Central XML may lack some of the features implemented for PDF processing. We are working to improve these components, but please let us know if you would like some of these features prioritized over others.

But we intend to fix this!

codeviking · on Sept 16, 2021

I agree!

Maybe we'll work on vi bindings next...

codeviking · on Sept 15, 2021

Yup, we've tried a lot of different tools in combination. All of them have their own trade-offs and extraction errors.

This system uses GROBID and some extraction techniques of our own. We're working on a GROBID replacement too, which should help us make things better.

codeviking · on Sept 15, 2021

Thanks!

There are a lot of amazing people here, doing really great work. It's an inspiring place to be. I feel really lucky to work with such great people on interesting, important problems.

Also, I should mention...we're hiring!

https://allenai.org/careers#current-openings

codeviking · on Sept 15, 2021

Yay, glad to hear it! If you end up viewing one of these on your Kindle, let us know how well (or not) things work.

We're not sure if it's something that we can distribute as OSS just yet. It relies on a few internal libraries that would also need to be publicly released, so it's not as simple as adjusting a single repository's visibility.

codeviking · on Sept 15, 2021

> all of the math and code parts were broken.

Yup, this is a known issue that we're working towards fixing.

> But clearly it is a nice idea and I can't wait that such tools work better!

Glad to hear it!

codeviking · on Sept 15, 2021

Thanks for the feedback. There's two hard problems n' all that... :)

codeviking · on Sept 15, 2021

> One comment is that the slowest page to load was the Gallery [0] as it loads an ungodly amount of PNG files from what appears to be a single IP (a GCP Compute instance?)

Yup. There's no CDN or anything like that right now. We kept things simple to get this out the door. But we definitely intend to make improvements like this as we improve the tool.

The more adoption we see, the more it motivates these types of fixes!

> P.S. Also, the paper linked below [1] seems to have a few conversion problems -- I see "EQUATION (1): Not extracted; please refer to original document", and also some (formula? Greek?) characters that seem out of place after the words "and the next token is generated by sampling"

Thanks for the catch. As you noted, there's still a fair number of extraction errors for us to correct!

mintplant · on Sept 15, 2021

Another sample paper that caused some trouble with figure extraction: https://www.cs.utexas.edu/~hovav/dist/vera.pdf

Very cool project, looking forward to seeing how it develops!

codeviking · on Sept 15, 2021

Thanks, I'll pass this example along!