May 5

[Alexa] NLU in Alexa

How Alexa performs NLU.


April 20

[Transformers] Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

Converting structured data to natural language explanations using transformers
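To make the recipe concrete, here is a minimal sketch of the generic linearize-and-generate approach: flatten the structured record into a string and let a seq2seq transformer produce the text. The record, the "describe:" prefix, and the off-the-shelf t5-small model are my own illustrative choices, not the paper's actual system, which goes further to enforce semantic fidelity.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Off-the-shelf model used only to illustrate the pipeline; it is not
# fine-tuned for data-to-text, so the output will be rough.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Linearize a structured record into a plain string the model can consume.
record = {"name": "Blue Spice", "eatType": "coffee shop", "area": "city centre"}
linearized = " | ".join(f"{k} = {v}" for k, v in record.items())

inputs = tokenizer("describe: " + linearized, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```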

[Transformers] Answer sentence selection using local and global context in Transformer models

Tricks for finding the sentence in a document that contains the answer to a question, by adding local and global context to the transformer input.
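The rough idea, as I understand it, can be sketched as a cross-encoder whose input carries the question, the candidate sentence, its neighbouring sentences (local context), and something document-level like the title (global context). The input layout and the helper below are my own guess at the general shape, not the paper's exact recipe, and the classification head here is untrained.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def score_candidate(question, candidate, prev_sent, next_sent, doc_title):
    # Pack the candidate together with local context (neighbouring sentences)
    # and global context (document title) as the second segment.
    context = f"{candidate} [SEP] {prev_sent} {next_sent} [SEP] {doc_title}"
    inputs = tokenizer(question, context, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability that this candidate sentence answers the question.
    return logits.softmax(dim=-1)[0, 1].item()
```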


[Startups] Valuation in Four Lessons | Aswath Damodaran | Talks at Google

If you have spent time around the tech-startup world, you have probably wondered how these companies are valued. This talk explains how startups are valued differently from traditional companies, such as a food company.


April 12

[MLOps] Machine Learning: The High-Interest Credit Card of Technical Debt

If you work at a small company, chances are you don’t follow many best practices when building and deploying your models. In this paper, Google describes the common forms of technical debt companies incur when building ML projects.

[PyTorch] Automatic differentiation in PyTorch

I became curious about how PyTorch implements automatic differentiation of complex functions on scalars and matrices. Although the paper is titled Automatic differentiation in PyTorch, I learned more about it from the references at the end of the paper.
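As a refresher for myself, here is a minimal sketch of the user-facing behaviour the paper describes (not the internals): PyTorch records operations on tensors with requires_grad=True and replays them in reverse to compute gradients for both scalars and matrices.

```python
import torch

# Scalar case: y = x^2 + 3x, so dy/dx = 2x + 3.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)  # tensor(7.) because 2*2 + 3 = 7

# Matrix case: differentiate a scalar loss with respect to a weight matrix.
W = torch.randn(3, 3, requires_grad=True)
v = torch.randn(3)
loss = (W @ v).sum()
loss.backward()
print(W.grad)  # every row equals v: d(sum(W @ v))/dW_ij = v_j
```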


April 9

[NLP] Google’s BigBird transformer model - video

The video describes Google’s BigBird transformer model. If you know BERT, you are probably aware of the quadratic complexity of its attention layer, which makes training on longer sequences time-consuming. BigBird tackles this problem with a combination of window, global, and random attention.
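To see why this helps, here is a toy sketch of the kind of sparse attention mask BigBird builds: each token attends to a local window, a handful of global tokens, and a few random positions, so the number of attended pairs grows roughly linearly with sequence length instead of quadratically. The function and its parameters are my own illustration; the real implementation uses blocked sparse attention kernels.

```python
import torch

def bigbird_style_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    """Boolean mask where mask[i, j] = True means token i may attend to token j."""
    g = torch.Generator().manual_seed(seed)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)

    # 1. Window attention: each token sees its local neighbourhood.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 2. Global attention: a few tokens (e.g. [CLS]) attend to and are attended by everyone.
    mask[:n_global, :] = True
    mask[:, :n_global] = True

    # 3. Random attention: each token also sees a few random positions.
    for i in range(seq_len):
        mask[i, torch.randint(0, seq_len, (n_random,), generator=g)] = True
    return mask

mask = bigbird_style_mask(seq_len=16)
print(mask.float().mean())  # fraction of attended pairs; far below 1.0 for long sequences
```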

[Writing] How to write usefully by Paul Graham

An essay by Y Combinator’s founder Paul Graham describing his approach to writing essays. I have been meaning to make my technical writing more engaging and useful, and this essay definitely helped. To summarize, your writing should be useful, correct, novel, and important. I am oversimplifying, though; you should definitely read the whole thing.

[Personal Finance] How to get rich without getting lucky - Naval Ravikant

If you have been following the startup ecosystem on Twitter, you must have come across one of Naval’s tweets. The title says what the podcast is about, or, as he puts it, “In 1,000 parallel universes, you want to be wealthy in 999 of them. You don’t want to be wealthy in the 50 of them where you got lucky. We want to factor luck out of it.”


April 8

[NLP] Explainable Prediction of Medical Codes With Knowledge Graphs

The paper describes a method to extract information from unstructured medical text like discharge summaries. In medicine, ICD10 codes are standardized entities used to describe a symptom, diagnosis, procedure, medicine, or examination. Extracting relevant ICD10 codes from unstructured text has been a hard problem. The paper combines a CNN (on text) with knowledge graphs to solve this very problem.
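For intuition, here is a minimal sketch of just the CNN-over-text half of such a model, framed as multi-label classification with one sigmoid per ICD code. The class name, hyperparameters, and the omission of the knowledge-graph component are all my simplifications, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TextCNNForICD(nn.Module):
    """Toy text CNN for multi-label ICD code prediction (knowledge graph omitted)."""
    def __init__(self, vocab_size, embed_dim, num_codes, kernel_sizes=(3, 4, 5), channels=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(nn.Conv1d(embed_dim, channels, k) for k in kernel_sizes)
        self.fc = nn.Linear(channels * len(kernel_sizes), num_codes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # one logit per ICD code

model = TextCNNForICD(vocab_size=30_000, embed_dim=128, num_codes=50)
logits = model(torch.randint(0, 30_000, (2, 400)))  # two dummy discharge summaries
probs = torch.sigmoid(logits)                       # independent probability per code
```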

[Graph ML] Traversing Knowledge Graphs in Vector Space

A paper from 2015 that talks about how to traverse a knowledge graph using semantic vectors. A great read for anyone who has some idea about Knowledge Graphs but wants to know how machine learning is being used to identify relations between two nodes in the graph.
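The core trick is easy to sketch: if entities and relations live in the same vector space (TransE-style), a path query can be answered by starting from the entity vector, adding the relation vectors along the path, and returning the nearest entity. The tiny example below uses random (untrained) vectors, so the answer is meaningless here; it only shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
entities = {name: rng.normal(size=dim) for name in ["paris", "france", "europe"]}
relations = {name: rng.normal(size=dim) for name in ["capital_of", "located_in"]}

def traverse(start, path):
    """Follow a path of relations from a start entity, purely in vector space."""
    vec = entities[start].copy()
    for rel in path:
        vec = vec + relations[rel]  # TransE-style traversal operator
    return vec

def nearest_entity(query_vec):
    """Answer the path query with the closest entity embedding."""
    return min(entities, key=lambda e: np.linalg.norm(entities[e] - query_vec))

# "paris --capital_of--> ? --located_in--> ?" as a compositional vector query.
print(nearest_entity(traverse("paris", ["capital_of", "located_in"])))
# With trained embeddings this would ideally print "europe".
```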


April 7

[NLP] Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

The paper demonstrates how transformer language models like BERT, which are pre-trained on gigabytes of generic text, can benefit from even further pre-training. The authors broadly define two types of this extended pre-training - Domain-Adaptive Pre-Training (DAPT) and Task-Adaptive Pre-Training (TAPT).

DAPT involves continued pre-training on a large set of unlabelled data from the target domain - medicine, sports, etc. TAPT involves continued pre-training on a much smaller set of unlabelled data that is specific to the downstream task. The idea is that task-specific data, although smaller, is more focused.
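Mechanically, both DAPT and TAPT are just continued masked-LM pre-training on new text before the usual fine-tuning; only the corpus changes. The sketch below, with placeholder text, model choice, and hyperparameters of my own, shows what that continued pre-training step could look like with Hugging Face Transformers - it is not the authors' code.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# DAPT: a large unlabelled in-domain corpus (e.g. biomedical abstracts).
# TAPT: the unlabelled text of the downstream task's own training set.
texts = ["... one unlabelled domain or task sentence per entry ..."]
encodings = tokenizer(texts, truncation=True, max_length=256)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="adapted-lm", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
# The adapted checkpoint is then fine-tuned on the labelled task as usual.
```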