Resume

Experience

Senior Software Engineer Sep 2022 - present

Fanatics Ecommerce

Implemented a custom Apache Arrow Flight server in Go that runs as a sidecar to Informix, enabling faster data ingestion and updates through Informix external tables. This made data transfers at least 32% faster for workloads of thousands of rows, with even higher throughput gains for larger updates.
Implemented a backend client library for consuming Pulsar events in our application backend. The client supports JWT authorization along with AWS KMS-based encryption to protect PII within event messages.
Designed and implemented automatic schema detection and decoding in the Pulsar backend client, which fetches the schema used to encode each message from the Pulsar schema registry and decodes the message with it. This reduced schema incompatibility issues for our event consumers.
Authored a Pulsar Avro code generator library for Python that takes Avro schemas and generates Pulsar-compatible schema classes, which can be installed as libraries for faster development.
Authored a JSON Schema-based Avro schema validator that suggests corrections, giving hints on how to fix invalid schemas.

SDE III Jan 2021 - Sep 2022

Fanatics Ecommerce

Authored a custom Python logging client library that extends the standard Python logging module to stream logs to our internal Kafka logging pipeline, where they are queryable via Graylog (backed by Elasticsearch).
Implemented a custom Vault client library that lets our backend applications easily connect to our internal Vault instance, supporting both EC2 and IAM authentication.

Data Scientist Jan 2019 - Dec 2020

Toppr.com

Reduced similar-questions service latency from ~3 s to under 300 ms by improving OCR API inputs and Elasticsearch queries, even for questions with many text tokens.
Designed a search history pipeline using Kinesis and Lambda to push search data to an S3 data lake in real time, while using DynamoDB with the bucket pattern to store paginated search history across namespaces. This kept API latency under 10 ms without compromising analytics use cases.
Improved similar-question suggestions for LaTeX/equation-based questions from 84% to 96% matching by creating custom mappings and adding decay functions to the relevant queries.
Deployed content search pipelines that reliably and asynchronously update searchable content across a wide range of dynamically generated, personalized syllabi.
Designed a zero-maintenance serverless solution for the solver app service using the WhatsApp API, backed by Lambda and DynamoDB.
Deployed search pipelines on AWS Step Functions to improve monitoring of the seeding flow.
Designed a question recommendation evaluation interface to help identify lapses in recommendation quality.
Implemented a brand name identification model inspired by Dalvi et al., "Deduplicating a places database" (Nilesh Dalvi, Marian Olteanu, Manish Raghavan, and Philip Bohannon; Proceedings of the 23rd International Conference on World Wide Web (WWW '14), ACM, New York, NY, USA, 2014, pp. 409-418).
Designed a data pipeline that clusters school names using expectation maximization, allowing the marketing team to target user groups.
Contributor to NLTK: provided an implementation of the METEOR score (Lavie, A., & Agarwal, A. (2007, June). METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 228-231). Association for Computational Linguistics).
Migrated a legacy Solr cluster to a containerized SolrCloud deployment, which increased the SLA from 76% to almost 99% and made monitoring and maintenance much easier.

Senior Data Scientist Apr 2018 - Dec 2018

Merilytics Inc

Built and maintained a containerized job orchestrator that extracts sentiment, keywords, and topics from news articles and news feed emails, running as both online and batch jobs.
Built, maintained, and improved a containerized, real-time, multi-threaded keyword extraction service (Key Extraction) using Flask and Gunicorn, deployed on Docker Swarm.
Implemented a containerized concept extraction service that combined the keyword extraction techniques above with n-gram extraction based on frequent n-grams, RAKE, and TextRank to extract unigrams, bigrams, and trigrams, then queried DBpedia to link each keyword to a referenceable topic.
Built trend APIs using MongoDB aggregation queries to display current trends in supported industries across various time segments.
Built a containerized document clustering service in Python with automatic cluster label generation.
Created build and deployment pipelines for the keyword extraction, concept extraction, document clustering, and article mining services using Bitbucket Pipelines and docker-compose.
Built an extensible, targeted article mining platform using Scrapy, with spiders that mine both article text and article HTML from 20+ sites with very high accuracy.
Built an email news feed mining application that mines and processes email news feeds in parallel with the article mining platform.
Built a multi-level text classification model using a series of semi-supervised and supervised learning methods, making it possible to work with very small amounts of tagged data.
Speaker for data science at a company-wide knowledge management conference.
Engineered features for multiple large financial datasets (TU datasets), each with more than 200M records, using Spark (PySpark) on Databricks.

Data Scientist Jul 2017 - Apr 2018

Merilytics Inc

Built an end-to-end conjoint analysis pipeline for a growing trip-based startup.
Built an end-to-end market basket analysis service on sales data for a large Brazilian supermarket chain, using Eclat in R and MS SQL Server for automated data import and export.
Built a topic extraction module on gensim that can quickly swap between LDA and LSI without changing the output format.
Built multiple keyword extraction modules for news and social data, and integrated each into the product pipeline as a Dockerized HTTP service using Flask and Gunicorn, deployable via docker-compose. Key implementations include:

Implemented a keyword extraction module with multi-threaded training, based on Parameswaran et al., "Towards the Web of Concepts" (Proceedings of the VLDB Endowment, vol. 3, no. 1-2, 2010, pp. 566-577, doi:10.14778/1920841.1920914), and extended it with n-gram-based NLP techniques to improve keyword quality and ranking.
Built a keyword extraction module based on SG-Rank to extract industry-relevant keywords from news articles.
Built a keyword extraction module for scientific text using n-gram-based techniques and TextRank.

Built a custom data tagging interface on PYBOSSA, tuned specifically for tagging articles for NLP tasks. Tagged data could be exported to CSV or Excel, where downstream pipelines picked it up for model training. These pipelines significantly reduced our training time.
Added support for importing custom formats into the tagging platform above.

Product Development Intern Jul 2016 - Dec 2016

Blue Yonder (formerly JDA Software)

Designed UI screens for JDA's Master Data Management suite using Ext JS, following the MVC pattern.
Developed code to return appropriate HTTP status codes from the Spring-based REST API of the Data Management suite.
Designed a client-side connector back end for the UI in Java.
Wrote unit test cases for the Master Data Management suite.

Engineering Intern Jun 2015 - Jul 2015

Carborundum Universal Limited

Investigated the steps required to change the configuration of the Poggi machine and the time taken for each, highlighting the most time-consuming steps.
Suggested eliminating several redundant steps, which reduced setup time by 23%.
Collected data on the time and number of workers required for each task, and authored a comprehensive report on optimizing each task in the setup of the Poggi machine.
Investigated leakage during machine loading and suggested ways to reduce spillage.

Education

Birla Institute of Technology and Science, Pilani 2013 - 2017

Bachelor of Engineering (Hons.)

Joint Director of Scio Benevolent Foundation, a not-for-profit organization; member of the lead team that made it possible to reach more than 16,000 needy students.
2nd place in the Ground Reality business plan competition, BITS Pilani Hyderabad.
Winner, Hult Prize at BITS Pilani Dubai.

Sri Chaitanya Junior Kalasala 2011 - 2013

High School

The Hyderabad Public School, Ramanthapur 2001 - 2013

High School

1st in computer science during the academic year 2007-2008.
3rd in Science fair during the academic year 2009-2010.

Honors

Jul 2018

Grand Prize winner - 6th Edition LVPEI Engineering the Eye hackathon

Designed and built a cost-effective prototype ophthalmoscope to diagnose Retinopathy of Prematurity (ROP). ROP, if not treated swiftly, causes permanent blindness in infants within a matter of days. In India alone, ~150K infants go undiagnosed and ~23M infants are at significant risk of ROP.

Feb 2018

1st Prize winner - Merilytics Hackathon

Built a Microsoft Teams chatbot that summarizes the article at any link provided. All summarized articles can then be queried with simple chat-based search commands.

Mar 2016

2nd Place in Ground Reality

Dec 2015

Finalist in the innovation contest by TBI at BITS Pilani Hyderabad

Finalist in the innovation contest organized by the Technology Business Incubator, BITS Pilani Hyderabad.

Dec 2015

Winner, Hult Prize at BITS Pilani Dubai

Dec 2008

First in Computer Science

Awarded a merit certificate for the highest score in computer science during the academic year 2007-2008.

Dec 2008

3rd in Science fair

3rd in the science fair during the academic year 2009-2010 for Electronic Watchdog, a project that detects people using infrared light.

Skills

Python; R; Machine Learning; Data Analysis; Shell Scripting; Web Design; Chemical Engineering; Social Entrepreneurship; Android Development; Multithreading; Statistics; Regression Testing; Big Data; Web Applications; Linux; Microsoft Excel; Java; C; JavaScript; HTML; CSS; Photoshop; AutoCAD; Microsoft Word; PowerPoint; PHP; SQL; Microsoft Office; ANSYS; Matlab; MongoDB; Microsoft Azure; Amazon Web Services (AWS); Keras; Pandas; Web Scraping; Aspen Plus; Natural Language Processing; Regression Analysis; Containerization; Predictive Modeling; Business Insights; Go

Certifications

Machine Learning by Andrew Ng on Coursera (YS6ZMFEQLN23)