Programmer who knows ML. I spend my free time building stuff, or building bots to do my stuff.
SDE III, Jan 2021 -
Created an automatic DAG generation program to better explain the selection logic for stock replenishment.
Implemented a Python Pulsar client library from scratch with JWT-based authorization for consuming from or producing to a topic. Also added KMS-based encryption and decryption support to the client to protect PII data in transit.
Designed and implemented automatic schema detection and decoding in the Pulsar client library, using the Pulsar schema registry to fetch the schema used to encode each message. This reduced schema incompatibility issues on consumers.
Implemented a custom Vault client for Python to simplify connecting to the internal Vault instance with both EC2 and IAM auth.
Designed a Consumer Manager application to simplify management of Pulsar consumers. The application abstracts away scaling, dead-letter queuing, and logging, making consumers easier to write.
Authored a Pulsar Avro code generator library for Python, which takes an Avro schema and generates Pulsar-compatible schema classes for use in development.
Authored a JSON Schema-based Avro schema validator with error correction suggestions that hint at how to fix an invalid schema.
Authored a custom Python logging client library, extending Python's logging module, to stream logs to the internal Kafka logging pipeline, where they are queryable via Graylog (an Elasticsearch wrapper).
Implemented a custom Apache Arrow Flight server in Go to work as a sidecar for Informix, enabling faster data ingestion and updates through Informix's external table capabilities. This yielded 32% higher data throughput when updating large numbers of rows.
Data Scientist, Jan 2019 -
Reduced similar-questions service latency from ~3s to <300ms by improving OCR API inputs and Elasticsearch queries, even for questions with many text tokens.
Designed a search history pipeline using Kinesis and Lambda to push search data to an S3 data lake in real time, while using DynamoDB to store paginated search history across different namespaces with the bucket pattern. This kept API latency under 10 ms without compromising analytics use cases.
Improved similar-question suggestions for LaTeX/equation-based questions from 84% to 96% matching by creating custom mappings and adding decay functions to the relevant queries.
Deployed content search pipelines that asynchronously update searchable content reliably across a wide range of dynamically generated personalized syllabi
Designed a zero-maintenance serverless solution for the solver app service using the WhatsApp API, backed by Lambda and DynamoDB.
Deployed search pipelines to AWS Step Functions to improve monitoring of the seeding flow.
Designed a question recommendation evaluation interface to aid in identifying lapses in question recommendation quality
Implemented a brand name identification model inspired by: Nilesh Dalvi, Marian Olteanu, Manish Raghavan, and Philip Bohannon. 2014. "Deduplicating a places database." In Proceedings of the 23rd International Conference on World Wide Web (WWW '14). ACM, New York, NY, USA, 409-418.
Designed a data pipeline to identify school name clusters using expectation maximization, allowing the marketing team to target user groups.
Contributor to NLTK; provided an implementation of the METEOR score, based on: Lavie, A., & Agarwal, A. (2007, June). "METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments." In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 228-231). Association for Computational Linguistics.
Migrated an older Solr cluster to a containerized SolrCloud deployment, which raised the SLA from 76% to almost 99% and made monitoring and maintenance much easier.
Senior Data Scientist, Apr 2018 -
Built and maintained a containerized job orchestrator that extracts sentiment, keywords, and topics from news articles and news feed emails as both online and batch jobs.
Built, maintained, and improved a containerized, real-time, multi-threaded keyword extraction service (Key Extraction) using Flask and Gunicorn, deployed on Docker Swarm.
Implemented a containerized concept extraction service that combined the previously mentioned keyword extraction techniques with n-gram extraction based on frequent n-grams, RAKE, and TextRank to extract uni-, bi-, and tri-grams, then linked each keyword to a referenceable topic by querying DBpedia.
Built trend APIs using MongoDB aggregation queries to display current trends in supported industries across various time segments.
Built a containerized document clustering service with automatic cluster label generation capabilities in Python.
Built build and deployment pipelines for the keyword extraction, concept extraction, document clustering, and article mining services using Bitbucket Pipelines and docker-compose.
Built an extensible, targeted article mining platform using Scrapy, with spiders that could mine both article text and article HTML content from more than 20 sites with very high accuracy.
Built an email news feed mining application that could mine and process email news feeds in parallel with the article mining platform.
Built a multi-level text classification model using a series of semi-supervised and supervised learning methods to enable working with very low quantities of tagged data
Speaker for data science at a company-wide knowledge management conference.
Engineered features for multiple large financial datasets (TU datasets, >200M each) using Spark (PySpark) on Databricks.
Data Scientist, Jul 2017 -
Built an end-to-end conjoint analysis pipeline for a growing trip-based startup.
Built an end-to-end service to perform market basket analysis on sales data for a large Brazilian supermarket chain, using Eclat in R with MS SQL Server for automated data import and export.
Built a topic extraction module based on gensim to quickly swap between LDA and LSI without affecting the format of the output.
Built multiple keyword extraction modules to extract keywords from news and social data. Integrated each module into the product pipeline as a Dockerized HTTP service using Flask and Gunicorn, deployable via docker-compose. A few of the key implementations are as follows.
Implemented a keyword extraction module with multi-threaded training capability based on: Parameswaran, Aditya, et al. "Towards the Web of Concepts." Proceedings of the VLDB Endowment, vol. 3, no. 1-2, 2010, pp. 566-577, doi:10.14778/1920841.1920914. Extended it with n-gram-based NLP techniques to improve quality and rank the keywords.
Built a keyword extraction module based on SG-Rank to extract industry-relevant keywords from news articles.
Built a keyword extraction module for extracting keywords from scientific text using n-gram-based techniques and TextRank.
Built a custom data tagging interface based on PyBossa, tuned specifically for tagging articles for NLP tasks. Tagged data could be exported to CSV or Excel, from which other pipelines could pick it up for model training. These pipelines significantly reduced our training time.
Added support for importing custom formats into the tagging platform mentioned above.
Product Development Intern, Jul 2016 -
Blue Yonder (formerly JDA Software)
Designed UI screens for JDA's Master Data Management suite using Ext JS, following the MVC pattern.
Developed code to display appropriate HTTP status codes on the Spring-based REST API of the Data Management Suite.
Designed the client-side connector back end for the UI in Java.
Engineered unit test cases for the Master Data Management suite.
Engineering Intern, Jun 2015 -
Carborundum Universal Limited
Investigated the steps required to change the configuration of the Poggi machine and the time taken for each step, highlighting the most time-consuming steps.
Suggested eliminating various redundant steps, which reduced setup time by 23%.
Collected data on the time and number of workers required for each task, and authored a comprehensive report on optimizing each task during setup of the Poggi machine.
Investigated leakage during loading of the machine and provided suggestions to decrease spillage.
Birla Institute of Technology and Sciences Pilani, 2013 -
Bachelor of Engineering (Hons.)
Joint Director of a not-for-profit organization, Scio Benevolent Foundation; member of the spearhead team that made it possible to reach more than 16,000 needy students.
2nd place in the Ground Reality business plan competition, BITS-Pilani Hyderabad
Winner, Hult Prize at BITS-Pilani Dubai.
Sri Chaitanya Junior Kalasala, 2011 -
The Hyderabad Public School, Ramanthapur, 2001 -
1st in computer science during the academic year 2007-2008.
3rd in Science fair during the academic year 2009-2010.
Grand Prize winner - 6th Edition LVPEI Engineering the Eye Hackathon
Successfully designed and built a cost-effective prototype ophthalmoscope to diagnose Retinopathy of Prematurity (ROP). ROP, if not treated swiftly, causes permanent blindness in infants within a matter of days. In India alone, ~150K infants go undiagnosed and ~23M infants are at significant risk of ROP.
1st Prize winner - Merilytics Hackathon
Built a Microsoft Teams-based chatbot that summarizes the article at any link provided. Additionally, all the summarized articles could be queried with simple chat-based search commands.
2nd Place in Ground Reality
Finalist in innovation contest by TBI at BITS Pilani Hyderabad
Was a finalist in the innovation contest organised by the Technology Business Incubator, BITS Pilani Hyderabad.
Winner, Hult Prize at BITS Pilani Dubai
First in Computer Science
Was awarded a merit certificate for scoring the highest in computer science during the academic year 2007-2008.
3rd in Science fair
3rd in the science fair during the academic year 2009-2010 for the project "Electronic Watchdog", which detects people using infrared light.
Machine Learning by Andrew Ng on Coursera (YS6ZMFEQLN23)