Announcing Excalibur, a Web Interface to Extract Tabular Data from PDFs

Last week, Camelot trended at #1 on Hacker News and GitHub, and at #5 on Product Hunt. Thank you for the love! There's still a lot to do to make it more awesome. You can follow the roadmap on its GitHub wiki. You can also check out my previous blog post on …


Announcing Camelot, a Python Library to Extract Tabular Data from PDFs

I originally wrote this post for the SocialCops engineering blog, and then published it on Hacker Noon.


The PDF (Portable Document Format) was born out of The Camelot Project to create “a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks”. Basically …


Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy

I originally wrote this post for the SocialCops engineering blog, and then published it on Hacker Noon.


In our last post on Apache Airflow, we mentioned how it has taken the data engineering ecosystem by storm. We also talked about how we’ve been using it to move data across …


How to Create a Workflow in Apache Airflow to Track Disease Outbreaks in India

I originally wrote this post for the SocialCops engineering blog, and then published it on Hacker Noon.


What is the first thing that comes to your mind upon hearing the word ‘Airflow’? Data engineering, right? For good reason, I suppose. You are likely to find Airflow mentioned in every other …


Starry Night at Inchhapuri

I'm a Star Trek fan. Space has always fascinated me. So when I saw a friend reading this, I just had to ask him to lend it to me. Nice book BTW.

And after watching Cosmos: A Spacetime Odyssey, I've been looking for ways to dive into astronomy. I started …
