Hello, I'm Pratham

I'm an enthusiastic Computer Science graduate specializing in AI and ML, experienced in machine learning models, data analysis, and algorithm optimization. I'm skilled in problem solving, proficient in Python, and excited to apply my expertise to innovative ventures.

Education

Present

MPS in Applied Machine Intelligence

IIIT-B + Northeastern University
Bengaluru, India + Boston, USA
Expected Graduation: Dec 2025

2020-24

B.Tech in Computer Science (AI & ML)

Jain (Deemed-to-be University)
Jayanagar, Bengaluru, India
Graduation: May 2024
CGPA: 8.86

2019-20

12th Grade

Shree Bhagwan Mahaveer Jain College
VV Puram, Bengaluru, India
Completed: June 2020
Percentage: 78%

2017-18

10th Grade

BGS Central School
Karehalli, Bhadravathi, Karnataka
Completed: April 2018
Percentage: 72%

Certifications

Machine Learning with Python

09/2023
Provider: Cognitive Class (IBM)

Microsoft AI-900

01/2024
Provider: Microsoft

Generative AI with LLMs

08/2023
Provider: DeepLearning.AI (Coursera)

Prompt Engineering

02/2024
Provider: Futurense Technologies

Data Analysis with Pandas and Python

01/2023
Provider: Udemy

Data Visualization and Storytelling

01/2024
Provider: Futurense Technologies

Data Warehousing and Business Intelligence

12/2022
Provider: University of California, Irvine (Coursera)

Hands-on Python

05/2020
Provider: Hunar Pro

Python Programming

12/2020
Provider: Great Learning

C++ for Beginners

10/2020
Provider: Great Learning

Introduction to Augmented Reality and ARCore

07/2023
Provider: Daydream (Coursera)

Experience

Jan 2024 - May 2024

Data Science Intern

Futurense Technologies

During my internship, I cleaned and analyzed large datasets from various sectors, including the Indian census, housing, and healthcare. I resolved data inconsistencies and created visualizations to inform policy decisions, identifying regions with inadequate hospital bed capacity and toilet facilities. I also analyzed Airbnb listings in Seattle, extracting insights on pricing, amenities, and user reviews, visualizing trends to identify the key factors affecting listing prices, and performing sentiment analysis on the reviews.


During my internship, I worked extensively with large datasets from the Indian census, housing, and healthcare sectors. My primary responsibilities included cleaning and analyzing these datasets to ensure accuracy and consistency. I addressed and resolved various data issues such as inconsistencies and missing values, which are critical for reliable analysis.

A significant part of my work involved creating visualizations to inform healthcare policy decisions. For instance, I identified regions with inadequate hospital bed capacity and toilet facilities by analyzing the cleaned data. These visualizations provided clear insights that could aid policymakers in making informed decisions to improve healthcare infrastructure in underserved areas.

Additionally, I analyzed a comprehensive dataset of Airbnb listings in Seattle. My analysis focused on extracting meaningful insights regarding pricing, amenities, and user reviews. I employed various plots and heatmaps to visualize trends and correlations within the data. Through this analysis, I identified key factors affecting listing prices and conducted sentiment analysis on user reviews to understand guest experiences better.

Overall, my internship provided valuable experience in data cleaning, analysis, and visualization, enabling me to contribute to significant policy and business insights.

Research & Publications

AR in Fashion Industries

Authors:
Dwaj Ranka
Pratham Chopra
Ranvir M Mehta

Published by: IEEE Xplore, 2022

Augmented reality overlays virtual elements on a view of the real world, enriching the information a program can present and changing how the user perceives their surroundings. The aim of this project is a virtual trial room that lets the user simulate cloth movement in augmented reality. Early applications worked on static images of the user and draped static garments over them; this project targets live video instead. The application requires only OpenCV and a webcam. Once video is captured, it runs a processing pipeline to separate the subject, over which the cloth will be simulated, from the background. OpenCV provides an efficient set of interrelated functions for identifying textures and objects within a frame. After analyzing the color palette, each frame is converted to grayscale and thresholded to raise the intensity of the subject while pushing the background toward a much darker intensity; further thresholding passes can be applied as required, since thresholding separates objects from their background at the pixel level. Once the contour hierarchy is obtained, the objects are extracted and detection locks onto the required object, with the application focusing on it. The application then computes the regions where the cloth is to be placed and simulated onto the real world. The texture, color, pattern, and material are selected according to the user's preferences. The key difference between this technology and previously existing virtual trial rooms is that it needs very little dedicated hardware.
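
The image-processing steps described above (grayscale conversion, thresholding, contour extraction with OpenCV) can be sketched roughly as follows; this is only an illustrative sketch with assumed parameters (webcam index 0, a fixed threshold of 127, OpenCV 4.x), not the published implementation:

import cv2

# Capture video, separate the subject from the background, and find its contour.
cap = cv2.VideoCapture(0)  # webcam index is an assumption

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Raise the subject's intensity and push the background toward black.
    _, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Extract the contour hierarchy and focus on the largest object (the subject).
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        subject = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(subject)
        # In the full system, the garment texture would be warped onto this region.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("virtual trial room (sketch)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()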

Robotics and AI in Industry 4.0

Authors:
Dwaj Ranka
Neell Ravindra Ambere
Pratham Chopra
Ranvir M Mehta

The extensive implementation of information technology and the advent of the fourth industrial revolution, often referred to as Industry 4.0, have brought about substantial transformations in how enterprises and organizations operate. Robotic Process Automation (RPA) plays a pivotal role in automating diverse organizational operations, and its integration with Artificial Intelligence (AI) algorithms and methodologies can significantly augment operational efficiency. This study examines the incorporation of RPA and AI technologies within the framework of Industry 4.0. Combined with artificial neural networks, text mining, and natural language processing, RPA becomes a powerful tool for tasks such as data extraction, recognition, classification, forecasting, and process optimization. Its advantages are manifold: uninterrupted provision of services, the flexibility to scale and adjust to evolving demands, greater data precision, time savings through automation, streamlined workflows, increased productivity, fewer errors, and cost savings. Notwithstanding these benefits, deploying RPA presents several hurdles, including financial expense, the need for technical proficiency, significant alterations to existing processes, and the possibility of duplicated effort. At present, RPA is widely employed to automate workplace tasks; its primary functions include extracting data from client systems, monitoring purchase orders, expediting order fulfillment, and ultimately improving overall operational efficiency.

LAI (Life-Like AI): Voice Assistant with Emotional Response

Authors:
Dwaj Ranka
Neell Ravindra Ambere
Pratham Chopra
Ranvir M Mehta

This study offers a paradigm that gives voice assistants emotional intelligence. Using machine learning and audio preprocessing, the system captures the user's emotions and uses Natural Language Processing (NLP) to generate written responses that are contextually relevant. An algorithmic method combines context awareness with sentiment analysis to choose appropriate responses, and infusing emotion into the text data enriches the assistant's interactions. With this framework, voice assistants can better comprehend and react to their users' emotions, resulting in more engaging conversations. Experiments show that context-based responses, emotionally nuanced interactions, and precise emotion perception are achievable. This work advances the human-computer interface of artificial intelligence by developing emotionally intelligent systems.

Skills

  • Python: 75%
  • C: 60%
  • HTML: 70%
  • CSS: 65%
  • Git: 65%
  • SQL: 60%
  • WordPress: 70%
  • Data Analysis: 70%
  • Machine Learning: 75%
  • Artificial Intelligence: 40%
  • TensorFlow: 65%
  • Deep Learning: 60%
  • Computer Vision: 55%
  • NumPy and Pandas: 75%
  • Automatic Speech Recognition: 70%
  • PyTorch: 70%
  • Keras: 65%
  • Natural Language Processing: 60%
  • Large Language Models: 55%
  • LangChain: 70%

Blogs

  • Getting Started with Python for Data Analysis
  • Best Practices for Data Cleaning and Preprocessing

Getting Started with Python for Data Analysis

Data analysis is a critical skill in today's data-driven world. Python, with its powerful libraries and simple syntax, has become the go-to language for data analysts and scientists. This blog will guide you through the essentials of getting started with Python for data analysis, providing you with the tools and knowledge to begin your journey.

1. Why Python for Data Analysis?

Python is popular for data analysis due to its simplicity, readability, and extensive ecosystem of libraries. It offers several advantages:

  • Ease of Learning: Python's syntax is straightforward and readable, making it accessible for beginners.
  • Comprehensive Libraries: Libraries like Pandas, NumPy, Matplotlib, and Seaborn provide robust tools for data manipulation, analysis, and visualization.
  • Community Support: A large community means extensive documentation, tutorials, and forums to help you overcome challenges.

2. Setting Up Your Environment

Before diving into data analysis, you need to set up your Python environment. Here’s how:

Installing Python

Download and install Python from the official website. Ensure you add Python to your system's PATH during installation.

Installing Jupyter Notebook

Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. Install Jupyter Notebook using pip:

pip install jupyter

To start Jupyter Notebook, simply run:

jupyter notebook

This command will open a new tab in your default web browser with the Jupyter interface.


Installing Essential Libraries

Install the following libraries using pip:

pip install numpy pandas matplotlib seaborn

3. Introduction to Key Libraries

NumPy

NumPy is the fundamental package for scientific computing with Python. It provides support for arrays, matrices, and many mathematical functions.

import numpy as np

# Create an array
data = np.array([1, 2, 3, 4, 5])
print(data)

Pandas

Pandas is a powerful data manipulation tool that provides data structures like Series and DataFrame.

import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32]}
df = pd.DataFrame(data)
print(df)

Matplotlib and Seaborn

Matplotlib is a plotting library for creating static, animated, and interactive visualizations. Seaborn builds on Matplotlib and provides a high-level interface for drawing attractive statistical graphics.

import matplotlib.pyplot as plt
import seaborn as sns

# Create a simple plot
x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 17, 19]
plt.plot(x, y)
plt.title('Simple Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
# Create a Seaborn plot
sns.set(style="darkgrid")
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips)
plt.show()

4. Loading and Inspecting Data

Pandas makes it easy to load and inspect data. You can read data from various formats like CSV, Excel, and SQL databases.

# Load a CSV file
df = pd.read_csv('data.csv')

# Display the first few rows
print(df.head())

# Get a summary of the DataFrame
print(df.info())
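
The same workflow applies to other sources. A small sketch below reads an Excel sheet and a SQLite table; the file names and table name are placeholders, not files that accompany this post:

import sqlite3
import pandas as pd

# Excel files load the same way as CSVs (requires the openpyxl package).
df_excel = pd.read_excel('data.xlsx', sheet_name=0)

# For SQL databases, pass a query and an open connection.
with sqlite3.connect('data.db') as conn:
    df_sql = pd.read_sql('SELECT * FROM my_table', conn)

print(df_excel.head())
print(df_sql.head())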

5. Data Cleaning and Preparation

Data often needs to be cleaned and prepared before analysis. This includes handling missing values, removing duplicates, and transforming data.

Handling Missing Values

# Check for missing values
print(df.isnull().sum())

# Fill missing values
df = df.ffill()  # forward-fill; fillna(method='ffill') is deprecated in recent pandas

Removing Duplicates

# Remove duplicate rows
df.drop_duplicates(inplace=True)

Transforming Data

# Convert data types
df['column'] = df['column'].astype('int')

6. Exploratory Data Analysis (EDA)

EDA involves summarizing the main characteristics of a dataset, often using visual methods.

Descriptive Statistics

# Summary statistics
print(df.describe())

Data Visualization

# Histogram
df['column'].hist()
plt.show()

# Scatter plot
plt.scatter(df['column1'], df['column2'])
plt.show()

7. Conclusion

Getting started with Python for data analysis involves setting up your environment, understanding key libraries, and learning basic data manipulation and visualization techniques. As you gain more experience, you can explore advanced topics like machine learning, deep learning, and big data processing.


Best Practices for Data Cleaning and Preprocessing

Data cleaning and preprocessing are essential steps in the data science pipeline. These processes ensure that your data is accurate, consistent, and ready for analysis. In this blog post, we will explore best practices for data cleaning and preprocessing to help you achieve high-quality, reliable datasets.

1. Understand Your Data

Before diving into cleaning and preprocessing, it's crucial to understand the data you're working with. This includes knowing:

  • The source of the data
  • The structure of the data
  • The meaning of each feature (column)
  • The type of data (categorical, numerical, etc.)
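
A quick pandas pass answers most of these questions for a tabular dataset; the file name below is just a placeholder:

import pandas as pd

df = pd.read_csv('data.csv')   # placeholder file name

print(df.shape)       # structure: number of rows and columns
print(df.dtypes)      # type of each feature (numerical, object/categorical, ...)
print(df.head())      # sample rows to sanity-check what each column means
print(df.describe())  # ranges and summary statistics for numerical features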

2. Handle Missing Values

Missing data can skew your analysis and lead to inaccurate conclusions. Here are a few strategies for handling missing values:

  • Remove Missing Values: If the dataset is large and the number of missing values is small.
  • Impute Missing Values: Use statistical methods (mean, median, mode) or predictive models to fill in missing values.
  • Flag and Fill: Create a new column indicating the presence of missing values and fill them with a placeholder.
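
Here is a minimal pandas sketch of all three strategies, using a small made-up DataFrame rather than any particular dataset:

import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, np.nan, 40, 31],
                   'city': ['Bengaluru', 'Boston', None, 'Boston']})

# 1. Remove rows with missing values (fine when only a few rows are affected).
dropped = df.dropna()

# 2. Impute: fill numerical gaps with the median, categorical gaps with the mode.
imputed = df.copy()
imputed['age'] = imputed['age'].fillna(imputed['age'].median())
imputed['city'] = imputed['city'].fillna(imputed['city'].mode()[0])

# 3. Flag and fill: keep a marker column, then fill with a placeholder value.
flagged = df.copy()
flagged['age_missing'] = flagged['age'].isna()
flagged['age'] = flagged['age'].fillna(-1)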

3. Remove Duplicates

Duplicate data can distort your analysis. Ensure to:

  • Identify duplicates using key columns.
  • Remove exact duplicates or decide on a method for handling near-duplicates based on the context.
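
A short pandas sketch of both cases, using a made-up `id` column as the assumed key:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3, 3],
                   'value': [10, 10, 20, 30, 31]})

# Exact duplicates: every column identical (rows 0 and 1 here).
exact = df.drop_duplicates()

# Near-duplicates: choose the key columns that define "the same record"
# ('id' is the assumed key here) and keep the first occurrence.
by_key = df.drop_duplicates(subset=['id'], keep='first')

print(df.duplicated().sum())               # count of exact duplicate rows
print(df.duplicated(subset=['id']).sum())  # count of duplicated keys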

4. Handle Outliers

Outliers can significantly impact the results of your analysis. Depending on the context:

  • Remove Outliers: If they are errors or irrelevant.
  • Cap or Transform Outliers: If they are valid but extreme, consider capping values at a certain percentile or transforming the data using techniques like log transformation.
Figure: a box plot showing the identification of outliers in a dataset.
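
As a rough illustration (the `price` column and thresholds below are assumptions, not a prescription), the IQR rule can flag outliers, which you can then remove, cap, or transform:

import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [120, 95, 110, 105, 100, 2500]})  # one extreme value

q1, q3 = df['price'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: remove rows flagged as outliers.
removed = df[df['price'].between(lower, upper)]

# Option 2: cap (winsorize) values at the 1st and 99th percentiles.
capped = df.copy()
capped['price'] = capped['price'].clip(df['price'].quantile(0.01),
                                       df['price'].quantile(0.99))

# Option 3: log-transform valid but highly skewed values.
capped['log_price'] = np.log1p(capped['price'])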

5. Standardize and Normalize Data

Ensuring that your data is on a comparable scale is vital, especially for algorithms that are sensitive to the scale of input features.

  • Standardization: Rescale the data to have a mean of 0 and a standard deviation of 1.
  • Normalization: Scale the data to a [0, 1] range (min-max scaling).
Figure: comparison of raw, standardized, and normalized data.
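
A small sketch of both scalers using scikit-learn (not otherwise covered in this post) on a toy column:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[10.0], [20.0], [30.0], [100.0]])

# Standardization: mean 0, standard deviation 1.
X_std = StandardScaler().fit_transform(X)

# Normalization (min-max scaling): values rescaled to the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X)

print(X_std.ravel())
print(X_norm.ravel())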

6. Encode Categorical Variables

Many machine learning algorithms require numerical input. Convert categorical variables into numerical format using:

  • Label Encoding: Assign a unique number to each category.
  • One-Hot Encoding: Create binary columns for each category.
Figure: categorical data before and after label and one-hot encoding.
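
A brief sketch of both encodings, using pandas and scikit-learn on an illustrative `city` column:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'city': ['Bengaluru', 'Boston', 'Boston', 'Mysuru']})

# Label encoding: each category gets a unique integer
# (best suited to ordinal data or tree-based models).
df['city_label'] = LabelEncoder().fit_transform(df['city'])

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df['city'], prefix='city')

print(df)
print(one_hot)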

7. Feature Engineering

Creating new features from existing ones can improve model performance. Examples include:

  • Date Features: Extracting day, month, year, or creating time-based features.
  • Interaction Features: Combining features to capture interactions.
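
For instance (the column names below are hypothetical), date parts and an interaction feature can be derived like this:

import pandas as pd

df = pd.DataFrame({'order_date': pd.to_datetime(['2024-01-05', '2024-03-20']),
                   'price': [250.0, 100.0],
                   'quantity': [2, 5]})

# Date features: split a timestamp into parts a model can use directly.
df['order_year'] = df['order_date'].dt.year
df['order_month'] = df['order_date'].dt.month
df['order_dayofweek'] = df['order_date'].dt.dayofweek

# Interaction feature: combine existing columns to capture their joint effect.
df['revenue'] = df['price'] * df['quantity']

print(df)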

8. Data Splitting

Split your data into training, validation, and test sets to ensure your model's generalizability.

  • Training Set: Used to train the model.
  • Validation Set: Used to tune the model and prevent overfitting.
  • Test Set: Used to evaluate the final model performance.
Figure: a dataset split into training, validation, and test sets.
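
A common recipe with scikit-learn, sketched below on its built-in iris dataset, is to hold out a test set first and then carve a validation set out of what remains; the 70/15/15 proportions are only an example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First hold out 15% of the data as the final test set.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42, stratify=y)

# Then split the remainder into training and validation sets
# (0.1765 of the remaining 85% is roughly 15% of the original data).
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.1765, random_state=42,
    stratify=y_train_val)

print(len(X_train), len(X_val), len(X_test))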

Conclusion

Data cleaning and preprocessing are critical steps in the data science process. By following these best practices, you can ensure that your data is accurate, consistent, and ready for analysis, leading to more reliable and valid results.


Contact Me