Data Science | Assessments

A Data Science internship offers hands-on experience in analyzing large datasets, building predictive models, and using data-driven methods to solve business problems. Interns learn to gather, clean, visualize, and model data using modern tools and techniques in machine learning, statistics, and programming.

Objectives

Understand the full lifecycle of data — from collection to actionable insights.
Gain real-world exposure to machine learning, statistical modeling, and data visualization.
Apply tools like Python, SQL, R, and libraries such as pandas, NumPy, scikit-learn, etc.
Learn to work with structured and unstructured data across domains.

Key Responsibilities

Collect and clean raw datasets from various sources (APIs, databases, CSVs, etc.)
Perform Exploratory Data Analysis (EDA) to find patterns and trends.
Build and test predictive models (e.g., regression, classification, clustering).
Visualize results using tools like Matplotlib, Seaborn, or Power BI/Tableau.
Generate reports and dashboards to communicate findings to stakeholders.
Work closely with Data Engineers and Analysts to improve data pipelines.

Tools & Technologies You’ll Use

Languages: Python, R, SQL
Libraries: pandas, NumPy, scikit-learn, TensorFlow, Keras, PyTorch
Visualization: Matplotlib, Seaborn, Plotly, Tableau, Power BI
Big Data: Hadoop, Spark (optional)
Databases: MySQL, PostgreSQL, MongoDB
Version Control: Git, GitHub

Skills You’ll Gain

Data cleaning and preprocessing
Statistical analysis and hypothesis testing
Predictive modeling and machine learning
Business problem-solving with data
Data visualization and storytelling
Model evaluation and tuning

Important Notice:
Once you start the quiz, you will not be able to pause, exit, or restart it. Please ensure you are ready before beginning.

Data Science D1

1 / 45

1) How to drop rows where all elements are NaN?

a) df.dropna(how='any')

b) df.dropna(how='all')

c) df.dropna(all=True)

d) df.dropna(all=False)
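
A minimal sketch of how the `how` argument of `dropna` changes which rows are removed. The toy DataFrame and the variable names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 1 is entirely NaN, row 2 is only partly NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, np.nan, np.nan]})

dropped_all = df.dropna(how="all")  # removes only rows where every value is NaN
dropped_any = df.dropna(how="any")  # removes rows containing at least one NaN

print(len(dropped_all), len(dropped_any))
```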

2 / 45

2) Remove duplicate rows in Pandas:

a) df.clean()

b) df.drop_duplicates()

c) df.remove_duplicates()

d) df.unique()

3 / 45

3) In Pandas, axis=0 refers to:

a) Column-wise

b) None

c) Diagonal

d) Row-wise
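
A short sketch of how the `axis` argument behaves, using a hypothetical two-column frame. With `axis=0` the operation runs down the rows and yields one result per column; with `axis=1` it runs across the columns and yields one result per row.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

col_totals = df.sum(axis=0)  # walks down the rows: one total per column
row_totals = df.sum(axis=1)  # walks across the columns: one total per row

print(col_totals.to_dict())
print(row_totals.tolist())
```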

4 / 45

4) Convert a categorical column into dummy variables:

a) df.to_dummy()

b) pd.get_dummies()

c) df.encode()

d) pd.dummy()

5 / 45

5) Which method shows memory usage of DataFrame?

a) df.mem()

b) df.memory_usage()

c) df.info(memory=True)

d) df.size()

6 / 45

6) Which data structure is unordered and mutable in Python?

a) Dictionary

b) Tuple

c) List

d) Set

7 / 45

7) Which function gives cumulative sum of a column?

a) df['col'].cum_sum()

b) df.cum_sum('col')

c) df.cumsum('col')

d) df['col'].cumsum()

8 / 45

8) Export a DataFrame to Excel:

a) df.export_excel()

b) df.to_csv()

c) df.to_excel()

d) df.save_excel()

9 / 45

9) Approximate time complexity of dictionary lookup:

a) O(1)

b) O(n^2)

c) O(log n)

d) O(n)

10 / 45

10) Pandas method to compute rolling mean:

a) df.mean(rolling)

b) df.shift().mean()

c) df.moving_avg()

d) df.rolling().mean()
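
A minimal sketch of a rolling mean on a hypothetical Series. Note that `rolling` needs a `window` argument in practice; the quiz options omit it for brevity. The window size of 3 is an arbitrary choice for this example.

```python
import pandas as pd

# Hypothetical data for illustration.
s = pd.Series([1, 2, 3, 4, 5])

# Each value is the mean of the current and two previous observations.
# The first two positions are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

print(rolling_mean.tolist())
```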

11 / 45

11) Which function returns unique values across the entire DataFrame?

a) np.unique(df.values)

b) df.unique_all()

c) df.values.unique()

d) df.unique()

12 / 45

12) Which method calculates the rolling median?

a) df.rolling_median()

b) df.rolling().median()

c) df.rolling.median()

d) df.median_rolling()

13 / 45

13) How to interpolate missing numeric values?

a) df.interpolate()

b) df.fillna(interpolate=True)

c) df.fillna(method='interpolate')

d) df.replace(np.nan, interpolate)

14 / 45

14) Which method returns the first valid index in a Series?

a) series.first_valid_index()

b) series.valid_index()

c) series.first_index()

d) series.idxmin()

15 / 45

15) Which function calculates quantiles?

a) df.quant()

b) df.percentile()

c) df.quantile()

d) df.q()

16 / 45

16) How to shuffle rows of a DataFrame?

a) df.shuffle()

b) df.randomize()

c) df.permute()

d) df.sample(frac=1)

17 / 45

17) Which method joins DataFrames by index?

a) df.combine()

b) df.merge()

c) df.concat()

d) df.join()

18 / 45

18) Difference between np.dot() and np.matmul():

a) No difference

b) np.dot() only works for scalars

c) np.matmul() works for strings

d) np.dot() works for 1D/2D; np.matmul() mainly for 2D+

19 / 45

19) Method to add/update a key-value pair in a dict:

a) dict.append()

b) dict.add()

c) dict.insert()

d) dict.update()

20 / 45

20) NumPy broadcasting allows:

a) Iterating over lists

b) Splitting arrays

c) Operations on arrays of different shapes

d) Creating copies
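
A small sketch of NumPy broadcasting, assuming a hypothetical 2×3 matrix and a length-3 vector. The shapes differ, yet the addition works because the vector is stretched across each row.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])         # shape (3,)

# The 1-D array is broadcast along the first axis and added to every row.
result = matrix + row

print(result.shape)
print(result.tolist())
```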

21 / 45

21) Which function converts a DataFrame to NumPy array?

a) df.values

b) All of these

c) df.to_numpy()

d) np.array(df)

22 / 45

22) Pandas method to compute rolling mean:

a) df.shift().mean()

b) df.moving_avg()

c) df.rolling().mean()

d) df.mean(rolling)

23 / 45

23) How to convert column types in Pandas?

a) df.set_type('col','int')

b) df['col'].astype('int')

c) df.change_type('col','int')

d) df['col'].convert('int')

24 / 45

24) Which function computes pairwise distances between rows?

a) scipy.spatial.distance.pdist()

b) df.pairwise_dist()

c) pdist()

d) np.dist()

25 / 45

25) Which function creates a 3×3 identity matrix?

a) np.eye(3)

b) list.set()

c) np.identity(3,3)

d) np.ones(3,3)

26 / 45

26) What is the output of: a = [1,2,3]; print(a*2)

a) [1,2,3,1,2,3]

b) Error

c) [2,4,6]

d) [1,2,3,2]
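
A quick sketch contrasting list repetition with element-wise doubling in plain Python. The variable names are illustrative only.

```python
a = [1, 2, 3]

repeated = a * 2              # the * operator repeats the list
doubled = [x * 2 for x in a]  # a comprehension doubles each element instead

print(repeated)
print(doubled)
```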

27 / 45

27) Which of the following is used to create a dictionary?

a) {}

b) []

c) ()

28 / 45

28) Which expression returns the rows that contain missing values?

a) df[df.isnull().any(axis=1)]

b) df.isnull().index()

c) df.isnull().any()

d) df.isnull().sum()

29 / 45

29) Which method slices rows by label range?

a) df.iloc['start':'end']

b) df.range('start','end')

c) df.slice('start','end')

d) df.loc['start':'end']

30 / 45

30) Compute correlation between numeric columns:

a) df.corr_matrix()

b) df.correlation()

c) df.relate()

d) df.corr()

31 / 45

31) Which method performs one-hot encoding in Pandas?

a) df.dummies('col')

b) df.encode_onehot('col')

c) df.one_hot('col')

d) pd.get_dummies(df['col'])
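
A minimal sketch of one-hot encoding with `pd.get_dummies`, assuming a hypothetical `color` column. The output dtype (boolean or integer) depends on the pandas version, so the example casts to `int` before printing.

```python
import pandas as pd

# Hypothetical categorical column for illustration.
df = pd.DataFrame({"color": ["red", "green", "red"]})

# One indicator column is created per distinct category.
encoded = pd.get_dummies(df["color"])

print(list(encoded.columns))
print(encoded.astype(int).values.tolist())
```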

32 / 45

32) NumPy broadcasting allows:

a) Iterating over lists

b) Creating copies

c) Operations on arrays of different shapes

d) Splitting arrays

33 / 45

33) Which parameter in read_csv handles large file chunks?

a) chunks

b) chunksize

c) blocksize

d) size
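
A short sketch of reading a CSV in pieces with the `chunksize` parameter of `read_csv`. An in-memory string stands in for a large file here; the column name and the chunk size of 2 are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a large file on disk.
csv_text = "value\n1\n2\n3\n4\n5\n"

total = 0
n_chunks = 0
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += int(chunk["value"].sum())
    n_chunks += 1

print(total, n_chunks)
```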

34 / 45

34) Which method returns descriptive stats for categorical columns?

a) df.describe(include=['object'])

b) df.describe_cats()

c) df.stats_categorical()

d) df.cat_summary()

35 / 45

35) Why is vectorization in NumPy faster?

a) Uses recursion

b) Uses compiled C code internally

c) Uses dictionaries

d) Skips memory

36 / 45

36) .agg() in Pandas is used for:

a) Merge DataFrames

b) Apply multiple aggregations

c) Filter rows

d) Plot data

37 / 45

37) Function to create a pivot table in Pandas:

a) df.group()

b) pd.table()

c) df.pivoting()

d) df.pivot_table()

38 / 45

38) Which function is used to pivot a table by index and columns?

a) df.pivot_table(index='i', columns='c', values='v')

b) df.pivot_all(index='i', columns='c', values='v')

c) df.pivot(index='i', columns='c', values='v')

d) df.pivoting(index='i', columns='c', values='v')

39 / 45

39) Which method returns the first valid index in a Series?

a) series.first_index()

b) series.valid_index()

c) series.first_valid_index()

d) series.idxmin()

40 / 45

40) What will be printed? x = {1,2,2,3,4}; print(x)

a) {1,2,2,3,4}

b) Error

c) {1,2,3,4}

d) [1,2,3,4]
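
A brief sketch showing how a Python set handles repeated values in a literal. Because sets are unordered, the example sorts the elements before comparing them.

```python
x = {1, 2, 2, 3, 4}

# The repeated 2 is stored only once.
print(len(x))
print(sorted(x))
```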

41 / 45

41) Python sets do not allow:

a) Numbers

b) Strings

c) Iteration

d) Duplicates

42 / 45

42) arr = np.arange(6).reshape(2,3); print(arr.T.shape)

a) (3,2)

b) Error

c) (2,3)

d) (6,)
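
A minimal sketch of how transposing affects an array's shape, reusing the array from the question.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns

# .T swaps the axes, so rows become columns and vice versa.
transposed_shape = arr.T.shape

print(arr.shape, transposed_shape)
```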

43 / 45

43) Which is a benefit of list comprehension?

a) All of these

b) Often more readable

c) Shorter code

d) Faster than loops

44 / 45

44) Which function computes pairwise distances between rows?

a) pdist()

b) np.dist()

c) scipy.spatial.distance.pdist()

d) df.pairwise_dist()

45 / 45

45) Function to compute standard deviation in NumPy:

a) np.std()

b) np.var()

c) np.mean()

d) np.dev()

The average score is 84%

Data Science D2

1 / 45

1) What does the .isnull() function in Pandas do?

a) Checks for missing values

b) Drops missing values

c) Returns dataset shape

d) Fills missing values

2 / 45

2) Which of the following is NOT part of the Data Science workflow?

a) Data Cleaning

b) Model Training

c) Data Collection

d) Music Production

3 / 45

3) Logistic Regression is used for:

a) Clustering customers

b) Dimensionality reduction

c) Predicting continuous values

d) Predicting categorical outcomes

4 / 45

4) Which of the following is used for data visualization in Python?

a) Matplotlib

b) NumPy

c) Pandas

d) SQL

5 / 45

5) What is the role of hypothesis testing in Data Science?

a) Making decisions based on sample data

b) Collecting raw data

c) Scaling data

d) Visualizing data

6 / 45

6) Which metric is used for classification evaluation?

a) RMSE

b) MSE

c) Precision, Recall, F1-score

d) R-squared

7 / 45

7) Which of the following best describes a DataFrame in Pandas?

a) 1D labeled array

b) Scalar value

c) Multidimensional tensor

d) 2D tabular data structure

8 / 45

8) Which of the following is a regression algorithm?

a) K-Means

b) KNN

c) Linear Regression

d) Decision Tree

9 / 45

9) In supervised learning, the dataset is divided into:

a) Labeled and Unlabeled sets

b) Training and Testing sets

c) Only one dataset

d) Images and Texts

10 / 45

10) Which data visualization is best for time-series data?

a) Line chart

b) Pie chart

c) Scatter plot

d) Bar chart

11 / 45

11) Which of these is NOT a data visualization tool?

a) Power BI

b) Matplotlib

c) Docker

d) Tableau

12 / 45

12) Which algorithm is used for classification tasks?

a) PCA

b) KNN (K-Nearest Neighbors)

c) Apriori

d) Linear Regression

13 / 45

13) Which visualization is best for showing correlation between two variables?

a) Scatter plot

b) Histogram

c) Bar chart

d) Pie chart

14 / 45

14) Which of the following describes overfitting?

a) Model performs well on both training and test data

b) Model ignores training data

c) Model performs poorly on training data

d) Model performs well on training but poorly on test data

15 / 45

15) Which visualization is best for showing correlation between two variables?

a) Bar chart

b) Pie chart

c) Scatter plot

d) Histogram

16 / 45

16) Which of the following is an ensemble method in machine learning?

a) KNN

b) Linear Regression

c) Random Forest

d) Decision Tree

17 / 45

17) Which database is often used for unstructured big data?

a) SQLite

b) PostgreSQL

c) MongoDB

d) MySQL

18 / 45

18) Which machine learning library is commonly used for building models?

a) Matplotlib

b) Flask

c) Pandas

d) Scikit-learn

19 / 45

19) In Data Science, what does EDA stand for?

a) Extended Data Analytics

b) Enhanced Data Algorithm

c) Easy Data Application

d) Exploratory Data Analysis

20 / 45

20) In Data Science, “feature scaling” is required because:

a) It makes features comparable in scale

b) It removes missing values

c) It creates new features

d) It reduces dataset size

21 / 45

21) Which type of chart is best to visualize categorical data distribution?

a) Histogram

b) Bar chart

c) Line chart

d) Scatter plot

22 / 45

22) Data Science is mainly a combination of:

a) Statistics, Computer Science, and Domain Knowledge

b) Drawing, Music, and Literature

c) Biology, Chemistry, and Physics

d) Networking, Hardware, and Security

23 / 45

23) Which SQL command is used to extract data from a database?

a) SELECT

b) UPDATE

c) INSERT

d) DELETE

24 / 45

24) What does NumPy mainly provide?

a) Machine learning algorithms

b) Graph plotting functions

c) Database management

d) High-performance multidimensional arrays

25 / 45

25) Which of the following is an unsupervised learning algorithm?

a) Linear Regression

b) Logistic Regression

c) K-Means

d) Decision Trees

26 / 45

26) Which of these is an AI-based data visualization tool?

a) Hadoop

b) Power BI

c) NumPy

d) TensorFlow

27 / 45

27) Which of these plots is best to visualize data distribution?

a) Pie chart

b) Line chart

c) Histogram

d) Scatter plot

28 / 45

28) Which programming language is most popular in Data Science?

a) PHP

b) C++

c) Python

d) HTML

29 / 45

29) Which of the following techniques reduces dimensionality?

a) PCA (Principal Component Analysis)

b) Logistic Regression

c) K-Means

d) Random Forest

30 / 45

30) The process of cleaning and preparing raw data is called:

a) Data Modeling

b) Data Mining

c) Data Preprocessing

d) Data Visualization

31 / 45

31) Which Python library is most widely used for data analysis?

a) TensorFlow

b) Matplotlib

c) Pandas

d) Scikit-learn

32 / 45

32) Which function is used in Pandas to view the first few rows of data?

a) .info()

b) .tail()

c) .head()

d) .describe()

33 / 45

33) Which of the following is an example of supervised learning?

a) Grouping documents

b) Predicting house prices

c) Market basket analysis

d) Clustering customers

34 / 45

34) What does one-hot encoding do?

a) Normalizes data

b) Removes duplicate values

c) Combines multiple datasets

d) Converts categorical values into numerical form

35 / 45

35) What does the Pandas function .groupby() do?

a) Joins multiple DataFrames

b) Groups data based on conditions

c) Cleans missing values

d) Splits dataset into train/test

36 / 45

36) In statistics, which measure shows the spread of data?

a) Variance

b) Mode

c) Mean

d) Median

37 / 45

37) Which of these is a common file format for datasets?

a) .exe

b) .mp3

c) .csv

d) .jpg

38 / 45

38) Which metric is used to evaluate regression models?

a) Confusion Matrix

b) Mean Squared Error (MSE)

c) Accuracy

d) Precision

39 / 45

39) Which function in Pandas is used to join two datasets?

a) .join()

b) .merge()

c) .concat()

d) All of these

40 / 45

40) Which of the following describes “Big Data”?

a) Data only stored in Excel

b) Data too large and complex for traditional tools

c) Small structured datasets

d) Simple flat files

41 / 45

41) A confusion matrix is used to evaluate:

a) Clustering models

b) Regression models

c) Dimensionality reduction

d) Classification models

42 / 45

42) Which measure of central tendency is affected most by extreme values?

a) Mean

b) Mode

c) Median

d) All of the above
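
A small sketch, using only the standard library, of how a single extreme value shifts the mean and the median of a hypothetical sample.

```python
from statistics import mean, median

# Hypothetical sample, then the same sample with one extreme value added.
values = [10, 12, 11, 13, 12]
with_outlier = values + [1000]

print(mean(values), median(values))
print(mean(with_outlier), median(with_outlier))
```

In this sample the median is unchanged by the added value, while the mean moves substantially.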

43 / 45

43) Which Python package provides tools for statistical modeling?

a) NumPy

b) TensorFlow

c) Matplotlib

d) Statsmodels

44 / 45

44) Which of these Python libraries is best for numerical computation?

a) Seaborn

b) NumPy

c) Pandas

d) Matplotlib

45 / 45

45) Which cloud platforms provide Data Science services?

a) All of these

b) Azure

c) AWS

d) Google Cloud

The average score is 96%

Data Analyst D2

1 / 45

1) What does ETL stand for?

a) Execute, Transfer, Log

b) Encrypt, Translate, Locate

c) Enter, Test, Learn

d) Extract, Transform, Load

2 / 45

2) Which tool can be integrated with Power BI for advanced analytics?

a) Both R and Python

b) R

c) Python

d) Photoshop

3 / 45

3) What is Big Data mainly characterized by?

a) Size only

b) Value, Vision, Validation

c) Volume, Velocity, Variety

d) SQL queries

4 / 45

4) In ETL, the "Transform" step includes:

a) Storing raw data

b) Importing charts

c) Data cleaning and formatting

d) Creating dashboards

5 / 45

5) A primary key in a database is:

a) A column allowing duplicates

b) A chart type

c) A foreign key

d) A unique identifier for each record

6 / 45

6) What is the main benefit of using dashboards?

a) Storing raw files

b) Interactive and real-time insights

c) Replacing databases

d) Writing long reports

7 / 45

7) Which of the following is the first step in Data Analytics?

a) Data Collection

b) Data Cleaning

c) Data Visualization

d) Model Deployment

8 / 45

8) In SQL, which clause is used to filter records?

a) WHERE

b) FROM

c) SELECT

d) JOIN

9 / 45

9) Which of these is NOT a data type in Power BI?

a) Whole Number

b) Decimal Number

c) Music File

d) Date/Time

10 / 45

10) Cleaning data involves:

a) Handling missing values

b) Correcting errors

c) Removing duplicates

d) All of the above

11 / 45

11) Which language in Power BI is used to create calculated columns?

a) NumPy

b) DAX

c) Python

d) SQL

12 / 45

12) Which SQL command is used to combine rows from two tables?

a) SELECT

b) UPDATE

c) JOIN

d) DELETE

13 / 45

13) In Power BI, data can be imported from:

a) All of these

b) Cloud services

c) SQL databases

d) Excel

14 / 45

14) Data storytelling in analytics refers to:

a) Presenting data with narratives and visuals

b) Writing novels

c) Coding in SQL

d) Using data to entertain

15 / 45

15) Prescriptive analytics helps to:

a) Visualize data

b) Suggest actions to optimize results

c) Identify what happened

d) Delete unwanted data

16 / 45

16) Which analytical method helps to find hidden patterns in data?

a) Data Cleaning

b) Data Mining

c) Data Reporting

d) Data Loading

17 / 45

17) Which type of join keeps only the matching records from two tables?

a) Inner Join

b) Outer Join

c) Left Join

d) Cross Join
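
A minimal sketch of join behavior, written with pandas rather than SQL. The two tables and their column names are hypothetical; the `how` argument of `merge` selects the join type.

```python
import pandas as pd

# Hypothetical tables: id 1 has no order, id 4 has no customer.
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"id": [2, 3, 4], "total": [50, 75, 20]})

inner = customers.merge(orders, on="id", how="inner")  # ids present in both tables
left = customers.merge(orders, on="id", how="left")    # every customer, matched or not

print(inner["id"].tolist())
print(left["id"].tolist())
```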

18 / 45

18) In analytics, KPI stands for:

a) Knowledge Processing Input

b) Key Performance Indicator

c) Kernel Process Integration

d) Key Power Index

19 / 45

19) Which component of Power BI is used for creating reports?

a) Power BI Service

b) Power BI Mobile

c) Power Query

d) Power BI Desktop

20 / 45

20) In Power BI, relationships are built between:

a) Charts

b) Tables

c) DAX formulas

d) Reports

21 / 45

21) Which visualization shows parts of a whole?

a) Histogram

b) Line chart

c) Scatter plot

d) Pie chart

22 / 45

22) Data Analytics mainly focuses on:

a) Creating new programming languages

b) Collecting data only

c) Computer hardware maintenance

d) Analyzing past data for insights

23 / 45

23) Which of these is an open-source tool for data visualization?

a) Tableau

b) Google Data Studio

c) Power BI

d) All of the above

24 / 45

24) Which database language is widely used in analytics?

a) SQL

b) Java

c) HTML

d) C++

25 / 45

25) DAX in Power BI stands for:

a) Database Access XML

b) Data Automated Execution

c) Data Analysis Expressions

d) Data Array Exchange

26 / 45

26) Which of these is an example of descriptive analytics?

a) Optimizing supply chain

b) Predicting sales

c) Automating workflows

d) Explaining past sales trends

27 / 45

27) Which tool is most widely used for business data visualization?

a) Notepad

b) Power BI

c) MS Word

d) Photoshop

28 / 45

28) Predictive analytics is used to:

a) Clean datasets

b) Create pie charts

c) Describe the past

d) Forecast future outcomes

29 / 45

29) Which chart is best to show correlation between two variables?

a) Line chart

b) Pie chart

c) Scatter plot

d) Bar chart

30 / 45

30) Which chart is best for showing trends over time?

a) Histogram

b) Scatter plot

c) Pie chart

d) Line chart

31 / 45

31) Which of these is NOT a stage in the analytics process?

a) Data Cleaning

b) Space Exploration

c) Data Collection

d) Data Visualization

32 / 45

32) The ultimate goal of Data Analytics is to:

a) Build hardware devices

b) Generate insights for better decisions

c) Collect large amounts of raw data

d) Write computer programs

33 / 45

33) A slicer in Power BI is used for:

a) Creating charts

b) Filtering data interactively

c) Formatting reports

d) Joining tables

34 / 45

34) What is the purpose of data transformation?

a) Store data in raw form

b) Erase unnecessary data permanently

c) Convert data into usable format

d) Visualize the data

35 / 45

35) Which cloud platform is popular for analytics?

a) AWS

b) Google Cloud

c) All of these

d) Azure

36 / 45

36) What is the role of a Data Analyst?

a) Extract insights from data

b) Develop operating systems

c) Build computer hardware

d) Write networking protocols

37 / 45

37) A foreign key is used to:

a) Link two tables together

b) Store passwords

c) Define a unique record

d) Visualize dashboards

38 / 45

38) What is the main purpose of a dashboard in analytics?

a) Collect social media data

b) Store raw data

c) Present key insights visually

d) Train machine learning models

39 / 45

39) Which chart is suitable for showing cumulative values?

a) Scatter plot

b) Line chart

c) Area chart

d) Pie chart

40 / 45

40) Which visualization is best for frequency distribution?

a) Line graph

b) Area chart

c) Histogram

d) Pie chart

41 / 45

41) What is data visualization mainly used for?

a) Encrypt data

b) Increase data errors

c) Make insights easier to understand

d) Hide data

42 / 45

42) Which function in Power BI creates aggregate measures?

a) SUM()

b) AVERAGE()

c) COUNT()

d) All of the above

43 / 45

43) Power BI dashboards can be shared through:

a) All of these

b) Power BI Service

c) Emails

d) Cloud storage

44 / 45

44) A bar chart is best suited for:

a) Displaying proportions

b) Comparing categories

c) Showing trends over time

d) Showing correlations

45 / 45

45) What does a Data Warehouse store?

a) Historical and structured data

b) Live gaming data

c) Music files

d) Emails only

The average score is 79%

Data Science D3

1 / 45

1) Which technique helps reduce variance in machine learning models?

a) Bagging

b) Boosting

c) Normalization

d) Feature scaling

2 / 45

2) Which Python library is most widely used for handling structured data?

a) Pandas

b) Scikit-learn

c) NumPy

d) Matplotlib

3 / 45

3) Bagging in machine learning stands for:

a) Boosted Aggregation

b) Binary Aggregation

c) Bayesian Aggregation

d) Bootstrap Aggregation
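
A rough sketch of the idea behind bagging, using only the standard library. Each "model" here is just the mean of a bootstrap resample, which is a deliberate simplification; the function name `bagged_mean` and its parameters are hypothetical.

```python
import random
from statistics import mean


def bagged_mean(data, n_models=25, seed=0):
    """Average the estimates of several models, each fit on a bootstrap sample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_models):
        # Bootstrap step: draw len(data) items with replacement.
        sample = [rng.choice(data) for _ in data]
        # Stand-in "model": the sample mean.
        estimates.append(mean(sample))
    # Aggregation step: combine the individual estimates by averaging.
    return mean(estimates)


data = [2.0, 4.0, 6.0, 8.0]
estimate = bagged_mean(data)
print(estimate)
```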

4 / 45

4) Which algorithm is used for clustering in Data Science?

a) Linear Regression

b) Logistic Regression

c) Decision Trees

d) K-Means

5 / 45

5) Which sampling method ensures all groups are represented?

a) Random Sampling

b) Systematic Sampling

c) Cluster Sampling

d) Stratified Sampling

6 / 45

6) Which programming language is most popular in Data Science?

a) Java

b) C++

c) Python

d) Ruby

7 / 45

7) Which method is used to handle multicollinearity in regression models?

a) PCA

b) Bagging

c) Clustering

d) Logistic Regression

8 / 45

8) Which library in Python is used for machine learning algorithms?

a) Seaborn

b) Matplotlib

c) Pandas

d) Scikit-learn

9 / 45

9) Which step comes last in a Data Science workflow?

a) Deployment & Monitoring

b) Data Collection

c) Model Building

d) Data Cleaning

10 / 45

10) What does normalization do to data?

a) Adds noise to data

b) Scales data to a fixed range

c) Removes missing values

d) Converts data into categories
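
A minimal sketch of min-max normalization, one common way to rescale values into the range 0 to 1. The helper name `min_max_scale` is hypothetical, and returning zeros for constant input is an assumption made to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: no spread to rescale, so return zeros by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 40])
print(scaled)
```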

11 / 45

11) One-hot encoding is used for:

a) Scaling continuous features

b) Reducing dimensionality

c) Encoding categorical variables

d) Handling missing data

12 / 45

12) The process of filling missing values in a dataset is called:

a) Encoding

b) Imputation

c) Normalization

d) Sampling

13 / 45

13) Which SQL clause is used to group rows for aggregation in Data Science workflows?

a) WHERE

b) ORDER BY

c) HAVING

d) GROUP BY

14 / 45

14) Which Pandas method is used to check missing values?

a) df.clean()

b) df.isnull()

c) df.hasnull()

d) df.isna()

15 / 45

15) What does “overfitting” mean in machine learning models?

a) Model has too few parameters

b) Model performs well on training but poorly on new data

c) Model ignores training data

d) Model performs poorly on both training and test data

16 / 45

16) Which plot is used to visualize the relationship between two continuous variables?

a) Heatmap

b) Scatter plot

c) Histogram

d) Pie chart

17 / 45

17) In supervised learning, the dataset must contain:

a) Only input features

b) Random data

c) Input features and labeled outputs

d) Only categorical variables

18 / 45

18) Which data type is best to represent categorical variables in Pandas?

a) int

b) float

c) bool

d) object

19 / 45

19) Which evaluation metric is better for imbalanced classification datasets?

a) Recall

b) Accuracy

c) R-squared

d) Mean Squared Error

20 / 45

20) In Python, which function gives the first five rows of a DataFrame?

a) df.sample()

b) df.top()

c) df.first()

d) df.head()

21 / 45

21) Which Python function removes duplicates from a DataFrame?

a) df.clean()

b) df.drop_duplicates()

c) df.unique()

d) df.remove()

22 / 45

22) Which Python library is best for creating interactive visualizations?

a) Seaborn

b) NumPy

c) Matplotlib

d) Plotly

23 / 45

23) What does feature engineering involve?

a) Creating new features from raw data

b) Encoding missing data

c) Training models

d) Deleting features

24 / 45

24) Which of the following is the first step in the Data Science process?

a) Model Evaluation

b) Problem Definition

c) Data Visualization

d) Data Cleaning

25 / 45

25) What is the main disadvantage of decision trees?

a) High computational cost

b) Cannot handle categorical variables

c) Prone to overfitting

d) Do not support regression tasks

26 / 45

26) The term “Big Data” usually refers to:

a) Balanced datasets

b) Small datasets

c) Unstructured and large datasets

d) Labeled data

27 / 45

27) Which of these is an unsupervised learning algorithm?

a) Decision Tree

b) Logistic Regression

c) K-Means Clustering

d) Linear Regression

28 / 45

28) What is the purpose of data wrangling?

a) Cleaning and preparing data

b) Creating models

c) Deploying models

d) Visualizing data

29 / 45

29) Which of the following is NOT a Data Science task?

a) Game Design

b) Data Collection

c) Data Cleaning

d) Model Deployment

30 / 45

30) What is the main purpose of feature scaling?

a) To bring all features to the same range

b) To add categorical variables

c) To increase missing values

d) To improve visualization

31 / 45

31) A confusion matrix is mainly used for:

a) Regression tasks

b) Data Cleaning

c) Classification tasks

d) Clustering tasks

32 / 45

32) Which of these is an example of unstructured data?

a) Text documents

b) Excel sheet

c) CSV file

d) SQL table

33 / 45

33) In Python, what does df.describe() do?

a) Shows column names

b) Provides summary statistics

c) Shows first few rows

d) Displays null values

34 / 45

34) What is the purpose of Exploratory Data Analysis (EDA)?

a) To deploy applications

b) To visualize patterns and insights in data

c) To optimize algorithms

d) To train machine learning models

35 / 45

35) What does R² (R-squared) measure in regression?

a) Proportion of variance explained

b) Error rate

c) Precision

d) Accuracy

36 / 45

36) Which metric is most appropriate for evaluating a regression model?

a) F1 Score

b) Accuracy

c) Precision

d) Mean Squared Error

37 / 45

37) Which visualization technique is most suitable for checking correlation?

a) Pie chart

b) Heatmap

c) Bar chart

d) Line chart

38 / 45

38) Which distribution is used for binary classification problems?

a) Uniform distribution

b) Bernoulli distribution

c) Normal distribution

d) Poisson distribution

39 / 45

39) Which is a key challenge in Data Science?

a) Simple model selection

b) Lack of data

c) Handling large and messy datasets

d) Writing C++ code

40 / 45

40) Which of the following is an ensemble method?

a) PCA

b) Linear Regression

c) Logistic Regression

d) Random Forest

41 / 45

41) Which type of graph best shows the distribution of a numeric variable?

a) Histogram

b) Bar chart

c) Scatter plot

d) Pie chart

42 / 45

42) What does cross-validation help with?

a) Checking model generalization

b) Reducing features

c) Preventing data cleaning

d) Increasing dataset size
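
A small sketch of how k-fold cross-validation partitions a dataset, written in plain Python. The helper `k_fold_indices` is hypothetical and splits without shuffling; libraries such as scikit-learn provide fuller implementations.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once as test."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Training on each train split and scoring on the matching test split gives k scores whose average estimates performance on unseen data.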

43 / 45

43) Which operation in Pandas merges datasets based on a common column?

a) join()

b) concat()

c) merge()

d) append()

44 / 45

44) ROC curve is used to measure:

a) Regression accuracy

b) Data preprocessing

c) Clustering efficiency

d) Classification performance

45 / 45

45) Which of the following reduces the number of dimensions in data?

a) Regression

b) Classification

c) Decision Tree

d) PCA

The average score is 91%

Data Science D4

1 / 43

1) Which of these is NOT a step in the CRISP-DM framework?

a) Deployment

b) Data Understanding

c) Data Preparation

d) Game Development

2 / 43

2) Backpropagation is used in:

a) Random Forest

b) Decision Trees

c) Clustering

d) Neural Networks

3 / 43

3) Which algorithm is best for spam email classification?

a) PCA

b) Naïve Bayes

c) Linear Regression

d) K-Means

4 / 43

4) Which library is most used for deep learning?

a) Pandas

b) Matplotlib

c) Seaborn

d) TensorFlow

5 / 43

5) Which metric is used for classification models?

a) RMSE

b) Accuracy

c) MSE

d) MAE

6 / 43

6) Which algorithm is called a "lazy learner"?

a) Decision Tree

b) KNN

c) SVM

d) Naïve Bayes

7 / 43

7) Which type of variable has numeric values?

a) Nominal

b) Ordinal

c) Categorical

d) Continuous

8 / 43

8) F1-score is the harmonic mean of:

a) Precision and Recall

b) Precision and RMSE

c) Accuracy and Recall

d) Recall and RMSE
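
A minimal sketch of computing the F1-score from raw counts. The helper name `f1_from_counts` and the example counts are assumptions for illustration; returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from true positives, false positives, and false negatives."""
    if tp == 0:
        # With no true positives, precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of the two component metrics.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.8, recall = 0.5.
f1 = f1_from_counts(tp=8, fp=2, fn=8)
print(round(f1, 4))
```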

9 / 43

9) Which ML technique is inspired by human brain neurons?

a) Regression

b) Neural Networks

c) Decision Trees

d) Clustering

10 / 43

10) Which of the following is used for topic modeling in NLP?

a) LDA

b) RNN

c) PCA

d) CNN

11 / 43

11) What is Data Science primarily concerned with?

a) Extracting insights from data

b) Building websites

c) Creating animations

d) Designing circuits

12 / 43

12) Which cloud service provides ML tools?

a) MS Word

b) GitHub

c) AWS SageMaker

d) Photoshop

13 / 43

13) Hyperparameters are:

a) Set before training

b) Predictions

c) Model outputs

d) Learned during training

14 / 43

14) Which library is used for numerical computing in Python?

a) Seaborn

b) NumPy

c) Matplotlib

d) Sklearn

15 / 43

15) Which of these is a data visualization library in Python?

a) NumPy

b) Pandas

c) Matplotlib

d) TensorFlow

16 / 43

16) Confusion matrix is used in:

a) PCA

b) Classification

c) Clustering

d) Regression

17 / 43

17) Which clustering algorithm is density-based?

a) DBSCAN

b) K-Means

c) Hierarchical

d) PCA

18 / 43

18) Data Science lifecycle ends with:

a) Deployment

b) Visualization

c) Cleaning

d) Model Building

19 / 43

19) Which concept allows machines to learn from data?

a) Machine Learning

b) Compilers

c) Web Development

d) Networking

20 / 43

20) ROC curve is used for:

a) Data Visualization

b) Regression model evaluation

c) Classification model evaluation

d) Data Cleaning

21 / 43

21) Tokenization in NLP means:

a) Removing duplicates

b) Combining words

c) Splitting text into smaller units

d) Encrypting text

22 / 43

22) What is the full form of NLP?

a) Network Language Programming

b) Natural Language Processing

c) Neural Learning Process

d) National Learning Protocol

23 / 43

23) In NLP, removing stop words is part of:

a) None

b) Visualization

c) Model Training

d) Preprocessing

24 / 43

24) Cross-validation is used to:

a) Reduce dataset size

b) Increase model complexity

c) Test model generalization

d) Remove missing values

25 / 43

25) Which ML algorithm is most interpretable?

a) Decision Trees

b) Random Forest

c) Gradient Boosting

d) Neural Networks

26 / 43

26) Which is an activation function in Neural Networks?

a) All of the above

b) Sigmoid

c) ReLU

d) Tanh

27 / 43

27) Which of these is an unsupervised learning algorithm?

a) K-Means

b) Logistic Regression

c) Decision Trees

d) Linear Regression

28 / 43

28) The ultimate goal of Data Science is:

a) Extract knowledge and insights from data

b) Create databases

c) Build applications

d) Develop operating systems

29 / 43

29) Bagging technique helps in:

a) Data cleaning

b) Visualization

c) Increasing bias

d) Reducing variance

30 / 43

30) Gradient Descent is used for:

a) Data Collection

b) Optimization

c) Visualization

d) Data Cleaning
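
For reference, the following is a minimal sketch of what gradient descent does, assuming a one-dimensional objective. The function name `gradient_descent`, the objective f(x) = (x - 3)^2, and the step size are illustrative assumptions, not part of the quiz.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to shrink the objective."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Hypothetical objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

With these assumed settings the iterate moves toward x = 3, the point where the example objective is smallest.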

31 / 43

31) Which of the following is NOT supervised learning?

a) Linear Regression

b) Logistic Regression

c) Decision Trees

d) K-Means Clustering

32 / 43

32) Which of these languages is most popular for Data Science?

a) Python

b) CSS

c) C#

d) HTML

33 / 43

33) Deep learning is mainly based on:

a) Neural Networks

b) SVM

c) Decision Trees

d) Regression

34 / 43

34) Which ML model is good for image classification?

a) Naïve Bayes

b) Decision Trees

c) Logistic Regression

d) CNN

35 / 43

35) Which of the following evaluates regression models?

a) RMSE

b) F1-score

c) Precision

d) Recall

36 / 43

36) Which method handles class imbalance?

a) PCA

b) SMOTE

c) K-Means

d) DBSCAN

37 / 43

37) Which of these is a classification algorithm?

a) Linear Regression

b) KNN

c) PCA

d) Gradient Descent

38 / 43

38) Which of the following is NOT a type of bias in data science?

a) Confirmation bias

b) Sampling bias

c) Musical bias

d) Algorithmic bias

39 / 43

39) Which of the following is NOT supervised learning?

a) Decision Trees

b) K-Means Clustering

c) Logistic Regression

d) Linear Regression

40 / 43

40) Which algorithm is widely used for recommendation systems?

a) K-Means

b) Linear Regression

c) Collaborative Filtering

d) Decision Trees

41 / 43

41) Which of these is a data preprocessing step?

a) Model deployment

b) None

c) Data cleaning

d) Data visualization

42 / 43

42) Which ML algorithm is most interpretable?

a) Gradient Boosting

b) Neural Networks

c) Decision Trees

d) Random Forest

43 / 43

43) Which is an ensemble method?

a) K-Means

b) Random Forest

c) Logistic Regression

d) PCA

Data Science D6

1 / 42

1) In statistics, p-value is used for:

a) Classification

b) Regression

c) Hypothesis testing

d) Clustering

2 / 42

2) Which Python library is most commonly used for numerical computations?

a) Seaborn

b) NumPy

c) Pandas

d) SciPy

3 / 42

3) Which of these is a regression evaluation metric?

a) All of the above

b) RMSE

c) R²

d) MAE

4 / 42

4) Which process splits data into training and test sets?

a) Sampling

b) Cross-validation

c) Bootstrapping

d) Data splitting

5 / 42

5) Overfitting occurs when a model:

a) Ignores training data

b) Has low complexity

c) Performs well on test data

d) Learns noise in data

6 / 42

6) Which metric evaluates classification models?

a) Recall

b) Precision

c) All of the above

d) Accuracy

7 / 42

7) Which is an unsupervised technique?

a) Naive Bayes

b) Random Forest

c) Logistic Regression

d) K-Means

8 / 42

8) Which type of learning uses labeled datasets?

a) Supervised

b) Reinforcement

c) Unsupervised

d) Semi-supervised

9 / 42

9) Which of the following detects overfitting?

a) Low training accuracy

b) High test accuracy only

c) High accuracy in both sets

d) High training accuracy, low test accuracy

10 / 42

10) Which algorithm is suitable for market basket analysis?

a) Linear Regression

b) KNN

c) SVM

d) Apriori

11 / 42

11) Data Science is an interdisciplinary field combining?

a) All of the above

b) Domain Knowledge

c) Statistics

d) Computer Science

12 / 42

12) Feature scaling techniques include:

a) Normalization

b) Both

c) Standardization

d) None

13 / 42

13) Which of these is a cloud platform for Data Science?

a) Azure ML

b) AWS SageMaker

c) Google Colab

d) All of the above

14 / 42

14) Logistic regression is mainly used for:

a) Regression tasks

b) Classification tasks

c) Clustering

d) Time-series

15 / 42

15) Which library is used for deep learning in Python?

a) TensorFlow

b) Pandas

c) NumPy

d) Matplotlib

16 / 42

16) Which of the following models handles non-linear data well?

a) Linear Regression

b) Decision Tree

c) Logistic Regression

d) PCA

17 / 42

17) Cross-validation is used to:

a) Improve generalization

b) Increase noise

c) Reduce training data

d) Merge datasets

18 / 42

18) Which time-series model is commonly used?

a) ARIMA

b) PCA

c) K-Means

d) Decision Tree

19 / 42

19) In NLP, TF-IDF is used for:

a) Visualization

b) Data cleaning

c) Feature extraction

d) Model training

20 / 42

20) Bagging mainly reduces a model's:

a) Variance

b) None

c) Bias

d) Noise

21 / 42

21) Which of these is a supervised algorithm?

a) K-Means

b) Linear Regression

c) PCA

d) Apriori

22 / 42

22) Which regularization technique adds absolute value penalties?

a) None

b) L1 (Lasso)

c) L2 (Ridge)

d) ElasticNet
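
As a rough illustration of how the two common penalty terms differ, the sketch below computes them for a small weight vector. The helper names `l1_penalty` and `l2_penalty`, the weights, and the strength `alpha` are assumptions chosen for the example.

```python
def l1_penalty(weights, alpha):
    """Penalty built from the absolute values of the weights."""
    return alpha * sum(abs(w) for w in weights)


def l2_penalty(weights, alpha):
    """Penalty built from the squared values of the weights."""
    return alpha * sum(w * w for w in weights)


# Hypothetical model weights and regularization strength.
weights = [0.5, -2.0, 0.0]
l1_value = l1_penalty(weights, alpha=0.1)
l2_value = l2_penalty(weights, alpha=0.1)
```

Either term would be added to the training loss, so larger weights cost more under both.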

23 / 42

23) Which evaluation metric is best for imbalanced datasets?

a) Accuracy

b) Mean Squared Error

c) R-Squared

d) F1 Score

24 / 42

24) Which evaluation metric is NOT for regression?

a) RMSE

b) R²

c) Accuracy

d) MAE

25 / 42

25) K in KNN refers to:

a) Number of clusters

b) Number of neighbors

c) Number of records

d) Number of features

26 / 42

26) In machine learning, classification problems predict:

a) Missing values

b) Categories

c) Random numbers

d) Continuous values

27 / 42

27) Gradient Descent is an algorithm for:

a) Data collection

b) Visualization

c) Optimization

d) Data cleaning

28 / 42

28) Which algorithm is used in recommendation systems?

a) Collaborative filtering

b) K-Means

c) PCA

d) Linear regression

29 / 42

29) In NLP, stop words are:

a) Mathematical symbols

b) Very important words

c) Numbers

d) Words filtered out

30 / 42

30) In statistics, variance measures:

a) Central value

b) Frequency

c) Count of data

d) Spread of data

31 / 42

31) In decision trees, leaf nodes represent:

a) Splits

b) Root nodes

c) Final outcomes

d) Features

32 / 42

32) Which of the following is the primary goal of Data Science?

a) Collect hardware

b) Delete data

c) Extract insights

d) Store data

33 / 42

33) Which plot is best for visualizing correlation?

a) Pie chart

b) Bar chart

c) Histogram

d) Scatter plot

34 / 42

34) Which step comes first in a Data Science project?

a) Visualization

b) Data collection

c) Feature engineering

d) Model deployment

35 / 42

35) Which of the following measures similarity between vectors?

a) Euclidean distance

b) Cosine similarity

c) All of the above

d) Manhattan distance
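
The measures named in this question can be sketched in a few lines of standard-library Python. The function names and the two example vectors are assumptions for illustration. Note that Euclidean and Manhattan are distances (smaller means more alike), while cosine similarity is a similarity (larger means more alike).

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two hypothetical, perpendicular unit vectors.
u, v = [1.0, 0.0], [0.0, 1.0]
```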

36 / 42

36) Which of these is a dimensionality reduction method?

a) PCA

b) None

c) t-SNE

d) Both

37 / 42

37) Which library is commonly used for machine learning in Python?

a) NumPy

b) Matplotlib

c) Pandas

d) Scikit-learn

38 / 42

38) What is the purpose of a confusion matrix?

a) Summarize classification results

b) Show correlations

c) Collect raw data

d) Plot graphs

39 / 42

39) Which type of bias occurs when training data isn’t representative?

a) Selection bias

b) Measurement bias

c) Confirmation bias

d) Survivorship bias

40 / 42

40) Which of these is an ensemble method?

a) Random Forest

b) Decision Tree

c) KNN

d) Linear Regression

41 / 42

41) The F1-score is the harmonic mean of:

a) Recall and Specificity

b) Precision and Recall

c) Sensitivity and Accuracy

d) Accuracy and Precision
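
For reference, a minimal sketch of how the F1-score is computed from raw counts. The function name `precision_recall_f1` and the counts used are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Because the harmonic mean is pulled toward the smaller value, a model cannot score a high F1 with only one of the two quantities being high.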

42 / 42

42) Which visualization helps detect skewness?

a) Histogram

b) Bar chart

c) Scatter plot

d) Pie chart

Data Science D5

1 / 43

1) Supervised learning is based on:

a) No data

b) Unlabeled data

c) Random data

d) Labeled data

2 / 43

2) Which visualization is best for correlation analysis?

a) Heatmap

b) Line chart

c) Pie chart

d) Histogram

3 / 43

3) Which step ensures model generalization?

a) All of the above

b) Using test data

c) Cross-validation

d) Regularization

4 / 43

4) Which method reduces overfitting in decision trees?

a) Pruning

b) All of the above

c) Bagging

d) Boosting

5 / 43

5) Which model is best for sequence data like time series?

a) CNN

b) Decision Trees

c) RNN (Recurrent Neural Network)

d) Random Forest

6 / 43

6) Which metric is used for clustering quality?

a) RMSE

b) Accuracy

c) R-squared

d) Silhouette score

7 / 43

7) Gradient Boosting builds models:

a) Independently

b) Without trees

c) Sequentially

d) Randomly

8 / 43

8) Ensemble learning means:

a) Reducing dataset size

b) Combining multiple models for better accuracy

c) Using deep learning

d) Using one strong model

9 / 43

9) Which library is mainly used for linear algebra in Python?

a) Matplotlib

b) Flask

c) NumPy

d) Pandas

10 / 43

10) Random Forest is based on:

a) Regression only

b) Boosting technique

c) Single decision tree

d) Bagging technique

11 / 43

11) A feature in Data Science means:

a) A model output

b) A variable or input column

c) An error measure

d) A prediction

12 / 43

12) Confusion matrix is used in:

a) Visualization

b) Classification

c) Regression

d) Clustering

13 / 43

13) Principal Component Analysis (PCA) is used for:

a) Regression

b) Dimensionality reduction

c) Classification

d) Clustering

14 / 43

14) Which of the following is a deep learning library?

a) TensorFlow

b) All of the above

c) PyTorch

d) Keras

15 / 43

15) Which optimizer is widely used in deep learning?

a) Adam

b) SGD

c) RMSProp

d) All of the above

16 / 43

16) Bag of Words is a technique used in:

a) Regression

b) NLP

c) Image processing

d) Clustering

17 / 43

17) Which metric is used in regression problems?

a) Accuracy

b) RMSE (Root Mean Square Error)

c) Precision

d) Recall

18 / 43

18) ReLU is a type of:

a) Regularizer

b) Loss function

c) Activation function

d) Optimizer

19 / 43

19) Which evaluation method is best for time-series forecasting?

a) Cross-validation with random splits

b) Bootstrap sampling

c) Time-based split

d) Random subsampling

20 / 43

20) Which evaluation metric is best for imbalanced data classification?

a) Precision, Recall, F1-score

b) Mean Absolute Error

c) RMSE

d) Accuracy

21 / 43

21) Which dataset is commonly used as a beginner dataset in Data Science?

a) Titanic

b) All of the above

c) MNIST

d) Iris

22 / 43

22) Logistic regression is used for:

a) Visualization

b) Binary classification

c) Predicting continuous values

d) Clustering

23 / 43

23) Which technique converts categorical data into numerical?

a) Standardization

b) Normalization

c) PCA

d) One-hot encoding
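
As an illustration of turning a categorical column into numbers, here is a small one-hot encoding sketch in plain Python. The helper `one_hot_encode` and the colour values are assumptions. Libraries such as pandas and scikit-learn provide their own encoders, which this does not reproduce.

```python
def one_hot_encode(values):
    """Map each value to a 0/1 vector with one slot per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical column.
categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
```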

24 / 43

24) Which programming languages are most used in Data Science?

a) Swift and Kotlin

b) Python and R

c) C and C++

d) JavaScript and PHP

25 / 43

25) Hyperparameter tuning can be done using:

a) Bayesian Optimization

b) Random Search

c) Grid Search

d) All of the above

26 / 43

26) NLP stands for:

a) Neural Layer Prediction

b) None

c) Natural Language Processing

d) Non-linear Programming

27 / 43

27) L1 and L2 penalties are used in:

a) Regularization

b) Clustering

c) PCA

d) Normalization

28 / 43

28) What is the primary goal of Data Science?

a) Building websites

b) Storing large datasets

c) Extracting insights and knowledge from data

d) Making animations

29 / 43

29) Which plot shows model accuracy over training epochs?

a) Histogram

b) Pie chart

c) Scatter plot

d) Learning curve

30 / 43

30) Which method is used to split data into training and testing sets?

a) random_split()

b) data_partition()

c) divide_data()

d) train_test_split()
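
To show what a shuffle-based hold-out split does conceptually, the sketch below uses only the standard library. The function `simple_train_test_split` is a hypothetical stand-in written for this example and is not scikit-learn's implementation.

```python
import random


def simple_train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 20 row identifiers.
train_rows, test_rows = simple_train_test_split(range(20))
```

Fixing the seed keeps the split reproducible, which makes later model comparisons fair.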

31 / 43

31) In clustering, the “elbow method” is used to:

a) Choose the number of clusters

b) Handle missing data

c) Train the model faster

d) Reduce dimensions

32 / 43

32) In Python, which library is commonly used for machine learning?

a) Scikit-learn

b) Flask

c) BeautifulSoup

d) Django

33 / 43

33) Which is an example of unsupervised learning?

a) K-Means clustering

b) Linear regression

c) Decision trees

d) Logistic regression

34 / 43

34) Which technique is used to handle class imbalance?

a) Undersampling

b) SMOTE

c) All of the above

d) Oversampling

35 / 43

35) The ROC curve shows:

a) Relationship between Precision and Recall

b) Relationship between True Positive Rate and False Positive Rate

c) Model accuracy

d) Distribution of errors

36 / 43

36) Which algorithm is best for image recognition tasks?

a) CNN (Convolutional Neural Network)

b) Logistic regression

c) Linear regression

d) Decision tree

37 / 43

37) Which measure is best for classification accuracy?

a) R-squared

b) Mean Squared Error

c) Correlation coefficient

d) Precision and Recall

38 / 43

38) Which of the following is a key step in Data Science workflow?

a) Data collection

b) Data cleaning

c) All of the above

d) Model building

39 / 43

39) Which algorithm is NOT supervised learning?

a) Decision trees

b) Logistic regression

c) Linear regression

d) K-Means clustering

40 / 43

40) Which distance measure is often used in KNN algorithm?

a) Cosine similarity

b) All of the above

c) Euclidean distance

d) Manhattan distance

41 / 43

41) Which type of learning requires reward-based training?

a) Reinforcement learning

b) Unsupervised learning

c) Supervised learning

d) Semi-supervised learning

42 / 43

42) In machine learning, “overfitting” means:

a) Model is too simple

b) Model performs well on training data but poorly on new data

c) Model generalizes well

d) Model performs badly on both datasets

43 / 43

43) Which of the following is NOT a machine learning algorithm?

a) SQL Join

b) Random Forest

c) SVM

d) Naïve Bayes

Data Science D7

1 / 45

1) Which of the following best defines Data Science?

a) Building websites with data

b) Study of databases only

c) Extracting knowledge and insights from data

d) Managing Excel sheets

2 / 45

2) Which of the following is a feature selection method?

a) Logistic regression

b) Decision tree pruning

c) Gradient descent

d) Chi-square test

3 / 45

3) Which model is commonly used for binary classification?

a) K-means

b) Logistic Regression

c) PCA

d) Linear Regression

4 / 45

4) Which technique prevents overfitting in decision trees?

a) Standardization

b) Normalization

c) Splitting

d) Pruning

5 / 45

5) Which of these is NOT a supervised learning algorithm?

a) K-means

b) Decision Tree

c) Linear Regression

d) Random Forest

6 / 45

6) Which machine learning task does clustering belong to?

a) Supervised learning

b) Unsupervised learning

c) Reinforcement learning

d) Semi-supervised learning

7 / 45

7) What is the main use of cross-validation?

a) Improve accuracy on training data

b) Estimate model performance on unseen data

c) Increase dataset variance

d) Reduce dataset size

8 / 45

8) Which Python library is used for statistical modeling?

a) TensorFlow

b) Statsmodels

c) Matplotlib

d) Scikit-learn

9 / 45

9) Which data visualization is best for correlation matrices?

a) Histogram

b) Bar chart

c) Heatmap

d) Pie chart

10 / 45

10) Which is a supervised learning task?

a) Anomaly detection

b) Classification

c) Clustering

d) Association rule mining

11 / 45

11) Which of the following is an example of regression?

a) Predicting spam emails

b) Classifying animals

c) Predicting house prices

d) Predicting movie ratings

12 / 45

12) Which metric is commonly used for regression tasks?

a) R² Score

b) Accuracy

c) Precision

d) Recall

13 / 45

13) Which of these is an unsupervised learning algorithm?

a) K-means

b) Logistic Regression

c) Naive Bayes

d) Random Forest

14 / 45

14) Which distribution is often assumed in statistics?

a) Normal distribution

b) Binomial distribution

c) Uniform distribution

d) Poisson distribution

15 / 45

15) Which technique reduces multicollinearity in regression?

a) Cross-validation

b) Overfitting

c) Normalization

d) Regularization

16 / 45

16) Which ML algorithm is known as a lazy learner?

a) SVM

b) Decision Trees

c) Naive Bayes

d) KNN (K-Nearest Neighbors)

17 / 45

17) Which of the following is NOT a step in the Data Science lifecycle?

a) Data Visualization

b) Data Collection

c) Hardware Manufacturing

d) Data Cleaning

18 / 45

18) Which algorithm is best suited for grouping customers by behavior?

a) Naive Bayes

b) Decision Trees

c) Logistic Regression

d) K-means Clustering

19 / 45

19) Which of these is a feature scaling method?

a) Robust Scaler

b) All of the above

c) Standardization

d) Min-Max Normalization
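
Two of the scaling approaches named here can be sketched with the standard library alone. The function names and the sample data are assumptions, and both helpers assume the values are not all identical.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical numeric feature.
data = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(data)
z_scores = standardize(data)
```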

20 / 45

20) Which ML algorithm is inspired by the biological nervous system?

a) Decision Trees

b) Regression

c) KNN

d) Neural Networks

21 / 45

21) Which type of variable has no natural order?

a) Categorical (Nominal)

b) Continuous

c) Ordinal

d) Interval

22 / 45

22) Which measure of central tendency is affected most by outliers?

a) Median

b) None

c) Mean

d) Mode
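
A quick way to see how outliers influence measures of central tendency is to add one extreme value to a small sample and compare. The numbers below are made up for the illustration.

```python
import statistics

# Hypothetical sample, then the same sample with one extreme value appended.
values = [10, 12, 11, 13, 12]
with_outlier = values + [1000]

mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)
```

Comparing `mean_shift` with `median_shift` shows which statistic moved more when the extreme value was added.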

23 / 45

23) What does PCA (Principal Component Analysis) do?

a) Builds decision trees

b) Reduces dimensionality

c) Creates clusters

d) Increases dataset size

24 / 45

24) Which data type is continuous?

a) Country

b) Eye color

c) Temperature

d) Gender

25 / 45

25) Which evaluation metric is used for regression models?

a) RMSE (Root Mean Square Error)

b) F1-score

c) Precision

d) Accuracy

26 / 45

26) Which Python library is most commonly used for data manipulation?

a) NumPy

b) TensorFlow

c) Pandas

d) Seaborn

27 / 45

27) Which of the following is an example of classification?

a) Predicting stock prices

b) Predicting rainfall amount

c) Predicting email as spam or not spam

d) Predicting house prices

28 / 45

28) Which is a key difference between Data Science and Data Analytics?

a) Data Science ignores machine learning

b) Data Science focuses more on predictive modeling

c) Data Analytics focuses only on visualization

d) Data Analytics always uses AI

29 / 45

29) Which evaluation metric is useful for imbalanced classification?

a) Precision and Recall

b) Accuracy

c) R² Score

d) RMSE

30 / 45

30) Which dataset is commonly used for image recognition experiments?

a) MNIST

b) Boston Housing

c) Titanic

d) FIFA 21 Dataset

31 / 45

31) Which term refers to selecting the best parameters for a model?

a) Scaling

b) Model tuning

c) Data cleaning

d) Cross-validation

32 / 45

32) Which type of learning involves agents interacting with an environment?

a) Supervised Learning

b) Unsupervised Learning

c) Semi-supervised Learning

d) Reinforcement Learning

33 / 45

33) Which method is used for splitting datasets in machine learning?

a) divide_data()

b) train_test_split()

c) partition()

d) split_data()

34 / 45

34) Which algorithm is suitable for market basket analysis?

a) SVM

b) Apriori

c) KNN

d) Random Forest

35 / 45

35) What does “bias” in a machine learning model indicate?

a) Random fluctuations

b) Error from incorrect assumptions

c) Overfitting

d) Error due to variance

36 / 45

36) Which library in Python is used for deep learning?

a) NumPy

b) Pandas

c) Seaborn

d) TensorFlow

37 / 45

37) Which technique combines bootstrapping with decision trees?

a) Bagging

b) All of the above

c) Random Forest

d) Boosting

38 / 45

38) Which optimization technique is used in training neural networks?

a) PCA

b) Gradient Descent

c) Clustering

d) Sampling

39 / 45

39) Which of these is a Python library for machine learning?

a) Matplotlib

b) Scikit-learn

c) BeautifulSoup

d) Flask

40 / 45

40) Which step comes after data cleaning in the DS process?

a) Data Modeling

b) Data Reporting

c) Data Archiving

d) Data Collection

41 / 45

41) Which model combines predictions from multiple models?

a) Regression model

b) Ensemble model

c) Neural network

d) Decision tree

42 / 45

42) Which activation function is commonly used in deep learning?

a) All of the above

b) Sigmoid

c) Tanh

d) ReLU

43 / 45

43) What is overfitting in machine learning?

a) Model ignores irrelevant features

b) Model performs poorly on both train and test data

c) Model works well on training but poorly on test data

d) Model fits all test data perfectly

44 / 45

44) Which type of learning uses both labeled and unlabeled data?

a) Unsupervised

b) Semi-supervised

c) Supervised

d) Reinforcement

45 / 45

45) Which algorithm is commonly used for anomaly detection?

a) Isolation Forest

b) Logistic Regression

c) Decision Trees

d) Linear Regression
