V Venkata Sri Harsha

Ex Data Scientist at Jio | Ex Senior MLE at ExaWizards India | Founder of Ascent Intelligent Technologies

Engineer with 5+ years of experience as a data scientist / MLE, looking to fuel my passion for technology by developing next-generation, data-driven solutions that increase the efficiency and accuracy of systems. Capable of working in highly challenging and demanding environments, applying my skills and knowledge to the best of my abilities and contributing positively to the growth of the organization.


Work Experience

Senior Machine Learning Engineer

ExaWizards India, Hyderabad | April 2022 - June 2023

  • Started a Terraform module that is used across the organisation to create and maintain AWS resources without accessing the console.
  • Organised a hackathon as part of knowledge sharing to help the data scientists on the team understand the engineering side of things.
  • Architected, designed, built and maintained the entire cloud side of an event-driven platform for CCTV cameras that runs ML on the edge and raises events such as intrusion, person in-out, suspicious activity and attendance.
  • Managed and maintained AWS and local infrastructure with proper access controls; wrote and maintained the policies for both.
  • Led a team building the AI Smart Camera project from concept stage to POC stage and finally to product stage.
  • Wrote CI/CD pipelines for deployments to services such as Lambda, ECS and Amplify.
  • Guided internship fellows on NLP (Japanese speech-to-text with wav2vec) and Terraform-related activities.

Data Scientist

Jio, Hyderabad, Telangana | Nov 2020 - April 2022

  • Designed, developed and deployed an end-to-end article recommendation system.
  • Designed and developed an entity disambiguation model that identifies medical entities in free text (based on pretrained ML models and lookups).
  • Benchmarked the performance of Habana (Intel Goya card) against NVIDIA by running multiple ML models (X-ray classification, article classification).
  • Designed, deployed and maintained infrastructure and CI/CD pipelines for an OCR project that extracts metadata from diagnostic reports.
  • Built POCs for an article classifier and benchmarked different OCR engines.

Data Scientist

GEP, Hyderabad, Telangana | Oct 2018 - Oct 2020

  • Developed and trained machine learning models to solve problems in the procurement space.
  • Designed, built and maintained CI/CD (DevOps) pipelines and architecture to deploy machine learning models on Azure cloud services.
  • Built multiple POCs and demos for clients, which brought business to the organization.
  • Coordinated with other teams to deploy and integrate ML models that can be consumed by the UI/frontend.

Data Scientist

Algoleap Technologies, Hyderabad | May 2018 - Oct 2018

  • Designed, programmed and deployed multiple chatbots (keyword-based, script-based, FAQ/intelligent string matching based) using Python on the chat platform.
  • Pre-processed data, coded RNN-based deep learning models for question vs. non-question detection, and trained and tuned the models for better accuracy and generalization.
  • Configured and deployed an open-source chat platform (Mattermost) on Amazon Web Services.
  • Built a Dash (Plotly) based interactive dashboard and hosted it on Amazon Web Services.

Robotics and Engineering Teacher

Quarrylane School, Dublin, CA | Aug 2015 - Aug 2017

  • Taught robotics, programming and engineering to middle and high school students.
  • Designed the engineering curriculum for school students.
  • Trained students for robotics competitions.
  • Our middle school VEX robotics team won the “Design Award” twice and was a finalist in California state competitions. We participated in VEX Worlds 2016 and 2017.
  • Led the high school robotics team in designing, assembling, programming and testing robots for the FIRST Tech Challenge in 2016 and 2017.

Projects

Smart Camera System (July 2022 - May 2023)

To make reactive CCTV cameras proactive using an event-driven system that runs ML algorithms on the edge device and raises events.

Tech used : AWS cloud (EC2, ECS, Lambda, CloudFront, Cognito, DynamoDB, RDS, Kinesis Video Streams, IoT Core), gRPC, TensorRT, YOLO

  • Developed an end-to-end system comprising a user-facing web application built with Next.js, cloud infrastructure hosted on AWS, backend APIs implemented in Python, and edge software using gRPC modules.
  • Designed and implemented the cloud infrastructure on AWS, using services such as EC2, S3 and RDS to ensure scalability, security and reliability.
  • Developed robust and efficient backend APIs using Python, enabling seamless data flow between the web application and the cloud components.
  • Ensured end-to-end security and data privacy by implementing encryption protocols and access controls across the system.

IoT Sensor Data Pipeline (May 2022 - July 2022)

To build a dashboard that shows the real-time data produced by multiple IoT sensors and applies an anomaly detection algorithm to detect and display anomalies in the time series data.

Tech used : AWS EC2, Kinesis Data Firehose, Kinesis Data Streams, Athena on S3, Grafana for dashboarding, Lambda, autoencoders

  • Developed and deployed an efficient data pipeline to capture real-time data from IoT sensors and process it for analysis.
  • Implemented a seamless integration between IoT devices and the pipeline, ensuring reliable and continuous data ingestion.
  • Designed a scalable architecture to store the raw sensor data in Amazon S3, enabling cost-effective and durable storage.
  • Leveraged AWS Athena to perform ad-hoc queries directly on the raw data stored in S3, providing real-time insights without the need for preprocessing or data transformation.
  • Configured dynamic dashboards in Grafana to monitor and analyze key metrics and trends derived from the IoT sensor data.

Article Recommendation System (Nov 2021 - Apr 2022)

To build a recommendation system based on the interests and entities the user interacted with.

Tech used : Recommendation system, NLP, Doc2Vec, Word2Vec

  • Developed and implemented an article recommendation system within the Jio Health Hub app, leveraging user interactions with entities such as diseases, medicines, symptoms …
  • Implemented content-based filtering techniques to address the cold-start problem.
  • Implemented collaborative filtering techniques to analyze user behavior and preferences, identifying patterns and similarities to generate accurate recommendations.
  • Integrated natural language processing (NLP) techniques to extract key information from articles and user interactions, enhancing the relevance and quality of recommendations.
  • Implemented algorithms and strategies to ensure diversity in recommendations, promoting exploration and discovery of new health-related topics.
  • Utilized feedback loops and metrics to evaluate the performance of the recommendation system, fine-tuning algorithms and parameters for optimal results.

Entity Search Service (Nov 2020 - April 2022)

To build an API that extracts medical entities from free text and maps them to the appropriate entities in the DB.

Tech used : Elasticsearch, FastAPI, NER

  • Developed, deployed and maintained a FastAPI service on top of Elasticsearch that performs NER (Named Entity Recognition).
  • This system is used across the app wherever entity resolution is required.
  • The API also supports autosuggest and look-ahead search.
  • Built a POC for scalable, unsupervised-learning-based named entity recognition for a semantic search use case.

Item Master Data Management (Jan 2020 - Oct 2020)

To solve the master data management problem with machine learning at different stages. The key challenges are item duplicate detection, category (noun-modifier) identification and attribute extraction from item descriptions (short text).

Tech used : Elasticsearch, MongoDB, Azure Web App, Azure storage tables, Azure Data Factory, Support Vector Machine, Named Entity Recognition, POC on LUIS

  • Built Azure Data Factory pipelines to run batch jobs on millions of rows.
  • Built a feedback loop for the extraction algorithm to update the extraction regexes based on user decisions.
  • Started with a simple SVM classifier for category detection and upgraded to a deep learning based (embeddings training) semi-supervised identifier.
  • Trained a custom NER attribute extractor.
  • Improved master data search to give better contextual results by identifying the parts of a search query.
  • Designed and built CI/CD pipelines to ease development and deployment.
  • As part of supplier details validation, implemented all the validation rules in Python. Also implemented real-time address validation using a Google API.

Contracts Meta Data Extraction (Jan 2019 - Oct 2020)

To automate the extraction of metadata and clauses from scanned, signed contracts. This reduced the manual effort of the internal team by more than 30% and was later productized so that clients can use it directly.

Tech used : py-tesseract, Azure web-app for containers, Azure DevOps pipelines, Azure cognitive services, Keras, Docker, Flask, CNN, RNN, Decision trees, Mongodb (Azure Cosmos DB), Azure tables services, Azure blob storage.

  • Trained an image classification model using a CNN to identify the different types of pages in contracts.
  • Trained a decision tree model to determine the type of contract from its first 2 paragraphs.
  • Developed and trained a recurrent neural network (LSTM) model for capturing addresses from natural text.
  • Developed and trained a multi-class sentence classifier for identifying the types of clauses in a contract.
  • Built, deployed and maintained APIs to extract metadata using all these machine learning models.
  • Designed and built CI/CD pipelines to ease development and deployment; also designed and built an automated feedback mechanism to train and deploy models automatically.
  • Engineered table extraction and structure preservation from scanned documents after OCR.

Reverse Image Search (Oct 2018 - Dec 2018)

Given an image, detect known objects in it and find similar images in a database of images.

Tech used : Deep Learning using Keras, Transfer learning, CNN, Object-detection and Image Classification, KNN for Finding closest images

  • Used a pre-trained ResNet-50 architecture to encode the images and a modified k-nearest neighbor search to find the images for recommendation.
  • Used classification models to decrease the false-positive rate and object detection (YOLO) to identify multiple objects in one image and recommend accordingly.
  • Deployed the entire project with the models as an API and built it as a framework to update the database.
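
The retrieval step can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes the image embeddings have already been computed (for example by a ResNet-50 encoder), uses plain cosine-similarity k-nearest neighbors rather than the modified variant mentioned above, and the names `cosine_similarity`, `nearest_images` and the tiny `database` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_images(query, database, k=3):
    """Return the ids of the k database images most similar to the query."""
    scored = [
        (cosine_similarity(query, vector), image_id)
        for image_id, vector in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:k]]


# Hypothetical 3-dimensional embeddings; real encoders produce far larger vectors.
database = {
    "shoe_1": [0.9, 0.1, 0.0],
    "shoe_2": [0.8, 0.2, 0.1],
    "chair_1": [0.0, 0.1, 0.9],
}

similar = nearest_images([1.0, 0.1, 0.0], database, k=2)
```

A brute-force scan like this is fine for a small database; a larger one would typically use an approximate nearest-neighbor index instead.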

Kaggle - Retail case study (May 2018 - May 2018)

Given the sales or revenue of previous years for a US-based retailer, forecast the sales for the next year.

Tech used : Implemented linear and polynomial regression, LSTM (RNN) and decision tree regression, combining macroeconomic and weather data with the given data. Performed grid search for hyperparameter tuning.

  • Ranked 2/93; won Gold for getting the lowest RMS error.

Projects/POC at AlgoLeap (June 2018 - Sep 2018)

Designed, built and deployed multiple POCs and working prototypes in a very short time to demo for clients.

Tech used : YOLO - CNN, RNN, chatbots

  • Set up a Mattermost server on an AWS EC2 instance and deployed 3 different chatbots that can be triggered in a group chat.
  • Trained an RNN-based question vs. non-question classifier using the SQuAD dataset, which is used to automate question extraction from a chat transcript.
  • Built a real-time dashboard using Dash (Plotly) that shows graphs and counts of people and vehicles.
  • Built a YOLO-based object detection model to analyze CCTV footage at a gate and count people and vehicles going in and out.
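
The in-out counting idea can be sketched as below. This is a simplified illustration under stated assumptions, not the original code: it assumes an upstream detector and tracker already provide per-object centroid tracks, and the function `count_crossings`, the virtual line position and the sample `tracks` are all hypothetical.

```python
def count_crossings(tracks, line_y):
    """Count objects crossing a horizontal virtual line.

    tracks maps a track id to its list of (x, y) centroids over time.
    In image coordinates y grows downward, so moving from above the line
    to on or below it counts as "in", and the reverse counts as "out".
    """
    counts = {"in": 0, "out": 0}
    for centroids in tracks.values():
        # Compare each centroid with the next one in the same track.
        for (_, y_prev), (_, y_curr) in zip(centroids, centroids[1:]):
            if y_prev < line_y <= y_curr:
                counts["in"] += 1
            elif y_curr < line_y <= y_prev:
                counts["out"] += 1
    return counts


# Hypothetical tracks: one person entering, one car leaving, one person loitering.
tracks = {
    "person_1": [(50, 10), (52, 40), (55, 80)],
    "car_1": [(200, 90), (198, 60), (195, 20)],
    "person_2": [(120, 10), (121, 20), (119, 30)],
}

gate_counts = count_crossings(tracks, line_y=50)
```

Counting per class (people vs. vehicles) would only need the class label from the detector attached to each track.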

Academic - Laser Scanning System (Jan 2015 - Mar 2015)

Designed, built and tested the prototype of a laser range finder for detecting breakthroughs in underwater boreholes (Subsea level).

Tech used : CMU camera for object detection, Arduino microcontroller, MATLAB for real-time 3D visualization, SolidWorks for the CAD model.

  • Master’s Project

Academic - Educational Line Follower (Aug 2013 - Dec 2013)

We designed and built a line follower robot with an option to vary the control parameters of its PID system.

Tech used : Worked on Arduino Microcontroller, Tuned PID control parameters.

  • Used to demonstrate the performance of a system with varying P, I and D values.
  • Master’s semester project
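
The effect of varying the P, I and D values can be illustrated with a minimal sketch. It assumes a discrete PID controller and a toy plant in which the steering command directly sets the rate of change of the robot's offset from the line; the original robot ran on an Arduino, so the Python names `PID` and `simulate` and all numeric values here are hypothetical.

```python
class PID:
    """Discrete PID controller (illustrative, not the original Arduino code)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        self.integral += error * self.dt
        # Skip the derivative term on the first sample to avoid a spike.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(kp, ki, kd, steps=200, dt=0.05, offset=1.0):
    """Return the offset from the line after each control step (toy model)."""
    pid = PID(kp, ki, kd, dt)
    history = []
    for _ in range(steps):
        error = 0.0 - offset  # the setpoint is the line centre
        command = pid.update(error)
        offset += command * dt  # assumed plant: d(offset)/dt = command
        history.append(offset)
    return history


slow = simulate(kp=1.0, ki=0.0, kd=0.0)
fast = simulate(kp=4.0, ki=0.0, kd=0.0)
```

Comparing `slow` and `fast` at the same step shows how a larger proportional gain pulls the robot back to the line sooner in this toy model.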

Academic - Autonomous Biped Robot (Jan 2012 - May 2012)

Built a Biped robot using servo motors and Arduino.

Tech used : Solidworks, Arduino

  • Programmed an Arduino board to control the servo motors, used SolidWorks to design the links, and performed structural analysis of the biped.

Academic - Mech Rubik’s Cube Solver (May 2009 - Sep 2009)

We designed, engineered and assembled a working model of a machine that replicates the three hand motions through which Rubik’s cubes are solved.

Tech used : Expert with lathe, milling, CNC, cutting and drilling machine tools.

  • This project was awarded the best project in the class by IIT Kanpur.

Information

Course work

CS231n, CS224n, Advanced Engineering Mathematics, Discrete Mathematics, Robotics, Modern Control Systems, Linear Algebra, Probability, Mechatronics, Dynamic systems, Inverse kinematics.

Mini Projects based on Arduino

Rock-paper-scissors (ultrasonic sensor, LCD screen, servo motors), Go for green (IR sensors and LEDs), obstacle-avoiding robot (ultrasonic sensor), step lights (pressure sensors, LEDs), robotic arm (3 servo motors).