Data Science | Internboot Assessments

A Data Science internship offers hands-on experience in analyzing large datasets, building predictive models, and using data-driven methods to solve business problems. Interns learn to gather, clean, visualize, and model data using modern tools and techniques in machine learning, statistics, and programming.

Objectives

Understand the full lifecycle of data — from collection to actionable insights.
Gain real-world exposure to machine learning, statistical modeling, and data visualization.
Apply tools like Python, SQL, and R, and libraries such as pandas, NumPy, and scikit-learn.
Learn to work with structured and unstructured data across domains.

Key Responsibilities

Collect and clean raw datasets from various sources (APIs, databases, CSVs, etc.)
Perform Exploratory Data Analysis (EDA) to find patterns and trends.
Build and test predictive models (e.g., regression, classification, clustering).
Visualize results using tools like Matplotlib, Seaborn, or Power BI/Tableau.
Generate reports and dashboards to communicate findings to stakeholders.
Work closely with Data Engineers and Analysts to improve data pipelines.

Tools & Technologies You’ll Use

Languages: Python, R, SQL
Libraries: pandas, NumPy, scikit-learn, TensorFlow, Keras, PyTorch
Visualization: Matplotlib, Seaborn, Plotly, Tableau, Power BI
Big Data: Hadoop, Spark (optional)
Databases: MySQL, PostgreSQL, MongoDB
Version Control: Git, GitHub

Skills You’ll Gain

Data cleaning and preprocessing
Statistical analysis and hypothesis testing
Predictive modeling and machine learning
Business problem-solving with data
Data visualization and storytelling
Model evaluation and tuning

Important Notice:
Once you start the quiz, you will not be able to pause, exit, or restart it. Please ensure you are ready before beginning.

Data Science L1

1 / 100

1) Which SQL command is used to remove duplicate records from a query result?

a) SELECT

b) DELETE

c) SELECT DISTINCT

d) REMOVE
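
For reference, here is a minimal sketch using Python's built-in sqlite3 module. The `visits` table and its `city` column are hypothetical names chosen for illustration. It shows that `SELECT DISTINCT` collapses duplicates in the result set only, while the stored rows stay untouched.

```python
import sqlite3

# In-memory database with a hypothetical table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (city TEXT)")
conn.executemany(
    "INSERT INTO visits (city) VALUES (?)",
    [("Pune",), ("Delhi",), ("Pune",), ("Delhi",), ("Mumbai",)],
)

# DISTINCT removes duplicates from the result set only.
distinct_cities = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT city FROM visits")
)

# The underlying table still holds all five rows.
total_rows = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]

print(distinct_cities)  # ['Delhi', 'Mumbai', 'Pune']
print(total_rows)       # 5
```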

2 / 100

2) In time series forecasting, which component represents seasonal variation?

a) Noise

b) Level

c) Trend

d) Seasonality

3 / 100

3) In which ML model is the sigmoid function used?

a) Logistic Regression

b) Decision Tree

c) Linear Regression

d) Random Forest
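
As a small illustration, the sigmoid function squashes any real-valued score into the interval (0, 1), so the output can be read as a probability. The weight, bias, and input below are made-up values, not taken from any real model.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weight, bias, and input chosen only for illustration.
weight, bias = 2.0, -1.0
x = 0.5
probability = sigmoid(weight * x + bias)  # the linear score here is 0.0

print(sigmoid(0.0))  # 0.5
print(probability)   # 0.5
```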

4 / 100

4) What is the purpose of the predict() function in ML?

a) Fit the model

b) Make predictions

c) Tune model parameters

d) Evaluate the model

5 / 100

5) What does drop(columns=[]) do in Pandas?

a) Drops index

b) Drops specific rows

c) Drops NaN values

d) Drops selected columns

6 / 100

6) What does .head() in Pandas return?

a) First 5 rows

b) Only column headers

c) Random 5 rows

d) Last 5 rows

7 / 100

7) What does CSV stand for?

a) Character Sorted Values

b) Common Standard Values

c) Column Separated Values

d) Comma Separated Values

8 / 100

8) Which chart is most suitable for showing parts of a whole?

a) Scatter plot

b) Pie chart

c) Histogram

d) Line chart

9 / 100

9) What type of ML is used when no labelled data is available?

a) Supervised

b) Unsupervised

c) Transfer Learning

d) Reinforcement

10 / 100

10) Which function is used to combine two DataFrames in Pandas?

a) join()

b) All of the above

c) concat()

d) merge()

11 / 100

11) Which algorithm works based on similarity/proximity?

a) Naive Bayes

b) Decision Tree

c) Linear Regression

d) KNN

12 / 100

12) Which method is used to normalize data?

a) One-hot encoding

b) Tokenizer

c) StandardScaler

d) LabelEncoder

13 / 100

13) Which function in NumPy returns evenly spaced values?

a) random()

b) split()

c) linspace()

d) arrange()

14 / 100

14) What is cross-validation used for in ML?

a) Avoiding overfitting

b) Combining models

c) Reducing data size

d) Increasing accuracy on training set

15 / 100

15) Which metric is best for imbalanced classification problems?

a) F1 Score

b) Accuracy

c) Recall

d) Precision
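
To see why metric choice matters on imbalanced data, the sketch below scores a deliberately useless model that always predicts the majority class. The labels are hypothetical, and the helper functions are hand-rolled for illustration (scikit-learn provides equivalents in `sklearn.metrics`).

```python
# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class.
y_pred = [0] * 100

def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # convention used here when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(accuracy(y_true, y_pred))  # 0.95
print(f1_score(y_true, y_pred))  # 0.0
```

Accuracy looks strong even though the model never finds a single positive case, while the F1 score exposes the failure.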

16 / 100

16) Which function is used to group data in Pandas?

a) aggregate()

b) sort()

c) map()

d) groupby()

17 / 100

17) Which plot best visualizes the distribution of a single numeric variable?

a) Line plot

b) Bar chart

c) Scatter plot

d) Histogram

18 / 100

18) Which method is used to find correlation in Pandas?

a) .map()

b) .corr()

c) .groupby()

d) apply()

19 / 100

19) Which of the following is NOT a Python data type?

a) list

b) graph

c) set

d) tuple

20 / 100

20) Which library is commonly used for data visualization in Python?

a) Scikit-learn

b) NumPy

c) Pandas

d) Matplotlib

21 / 100

21) Which of the following is NOT a valid data type in R?

a) decimal

b) numeric

c) logical

d) character

22 / 100

22) Which command in SQL is used to remove a table?

a) REMOVE

b) DROP

c) CLEAR

d) DELETE
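
The difference between deleting rows and removing a table can be sketched with Python's built-in sqlite3 module. The `scores` table and the `table_exists` helper are hypothetical names used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, points INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("a", 1), ("b", 2)])

def table_exists(name):
    """Check SQLite's catalog for a table with the given name."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row[0] == 1

# DELETE removes rows but leaves the (now empty) table in place.
conn.execute("DELETE FROM scores")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
exists_after_delete = table_exists("scores")

# DROP TABLE removes the table definition itself.
conn.execute("DROP TABLE scores")
exists_after_drop = table_exists("scores")

print(rows_after_delete, exists_after_delete, exists_after_drop)  # 0 True False
```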

23 / 100

23) Which of the following is a type of structured data?

a) Images

b) Video streams

c) Excel spreadsheets

d) Audio files

24 / 100

24) What is the default axis for dropna() in Pandas?

a) axis=0 (rows)

b) axis='index'

c) axis=1 (columns)

d) axis=None

25 / 100

25) Which statistics type generalizes from a sample to the population?

a) Both

b) Inferential

c) Descriptive

d) None

26 / 100

26) What does NaN stand for in data science?

a) New added Number

b) Not a Number

c) Not a Name

d) Not a Node

27 / 100

27) What does the value_counts() function do in Pandas?

a) Counts all NaN values

b) Counts column values

c) Counts characters in a string

d) Counts unique values in a Series

28 / 100

28) Which of the following is NOT a supervised learning task?

a) Classification

b) Linear Regression

c) Decision Tree

d) Clustering

29 / 100

29) Which method is used to fill missing values with the previous one?

a) fillna('mean')

b) dropna()

c) fillna(method='pad')

d) replace()

30 / 100

30) What function in Pandas checks for missing values?

a) isna()

b) dropna()

c) fillna()

d) checkna()

31 / 100

31) Which algorithm works well for linearly separable data?

a) SVM

b) Naive Bayes

c) Random Forest

d) K-Means

32 / 100

32) Which of these is NOT a distance metric?

a) Cosine similarity

b) Chi-square test

c) Manhattan distance

d) Euclidean distance
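
For comparison, the geometric measures named above can be computed in a few lines of plain Python. This is an illustrative sketch with made-up points. Note that cosine similarity measures similarity rather than distance, and `1 - cosine_similarity` is commonly used when a distance is needed.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

p, q = (0.0, 0.0), (3.0, 4.0)  # hypothetical points
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))  # 0.0
```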

33 / 100

33) What is the primary goal of clustering?

a) Normalize data

b) Predict labels

c) Test hypotheses

d) Group similar items

34 / 100

34) What does NLP stand for?

a) Natural Language Programming

b) Neural Language Program

c) Natural Logic Processing

d) Natural Language Processing

35 / 100

35) Which type of plot is best for bivariate analysis of numerical data?

a) Pie chart

b) Scatter plot

c) Line chart

d) Bar chart

36 / 100

36) Feedback-based ML is known as:

a) Supervised

b) Unsupervised

c) Semi-supervised

d) Reinforcement Learning

37 / 100

37) In Python, how do you check the data type of a variable?

a) gettype()

b) typeof()

c) type()

d) datatype()

38 / 100

38) What is the purpose of one-hot encoding?

a) Convert categorical data to numerical

b) Reduce features

c) Detect null values

d) Normalize data
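
A hand-rolled sketch of one-hot encoding is shown below, using a hypothetical list of colour labels. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would usually do this job. The plain-Python version only illustrates the idea of one 0/1 indicator per category.

```python
colors = ["red", "green", "blue", "green"]  # hypothetical categorical column

# Fix a category order, then emit one 0/1 indicator per category.
categories = sorted(set(colors))  # ['blue', 'green', 'red']
one_hot = [[1 if value == cat else 0 for cat in categories] for value in colors]

print(categories)
for value, row in zip(colors, one_hot):
    print(value, row)
# red [0, 0, 1]
# green [0, 1, 0]
# blue [1, 0, 0]
# green [0, 1, 0]
```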

39 / 100

39) Which ML task predicts continuous numeric values?

a) Clustering

b) Classification

c) Regression

d) Dimensionality Reduction

40 / 100

40) Which is an ensemble learning method?

a) Naive Bayes

b) Random Forest

c) Decision Tree

d) KNN

41 / 100

41) Which function in NumPy returns the mean of an array?

a) np.mean()

b) np.average()

c) np.median()

d) np.sum()

42 / 100

42) Which of these evaluation metrics is used for regression?

a) F1 Score

b) RMSE

c) Precision

d) Recall

43 / 100

43) Which data type does not support mathematical operations directly in Python?

a) float

b) int

c) string

d) bool

44 / 100

44) What is the first step in the CRISP-DM data science process model?

a) Data Understanding

b) Modeling

c) Data Preparation

d) Business Understanding

45 / 100

45) Which of the following is a supervised learning algorithm?

a) Linear Regression

b) K-Means

c) DBSCAN

d) Apriori

46 / 100

46) What does the fit() function do in machine learning models?

a) Trains the model

b) Predicts output

c) Scales features

d) Tests the model

47 / 100

47) Which of the following is used for text vectorization?

a) Box Plot

b) TF-IDF

c) Heatmap

d) ROC curve

48 / 100

48) Which ML algorithm is based on probability?

a) Naive Bayes

b) SVM

c) KNN

d) Random Forest

49 / 100

49) Which of the following is a classification metric?

a) MAE

b) AUC

c) RMSE

d) R²

50 / 100

50) Which R function is used to create a histogram?

a) hist()

b) pie()

c) plot()

d) graph()

51 / 100

51) Which of the following is used to handle missing values in a dataset?

a) dropna()

b) All of the above

c) isnull()

d) fillna()

52 / 100

52) Which library is commonly used for NLP in Python?

a) Keras

b) Flask

c) NLTK

d) OpenCV

53 / 100

53) Which data analysis aims to recommend actions for desired outcomes?

a) Descriptive

b) Diagnostic

c) Predictive

d) Prescriptive

54 / 100

54) What is a confusion matrix used for?

a) Data cleaning

b) Feature scaling

c) Visualizing model accuracy

d) Analyzing classification performance
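
As an illustration, a binary confusion matrix can be tallied by counting (actual, predicted) pairs. The labels below are hypothetical. The layout, with actual classes as rows and predicted classes as columns, follows the convention scikit-learn uses, which is an assumption about how you would want to read it.

```python
from collections import Counter

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical predictions

# Count each (actual, predicted) combination.
counts = Counter(zip(y_true, y_pred))
tn, fp = counts[(0, 0)], counts[(0, 1)]
fn, tp = counts[(1, 0)], counts[(1, 1)]

# Rows are actual classes, columns are predicted classes.
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[3, 1], [1, 3]]
```

The off-diagonal cells separate false positives from false negatives, which a single accuracy figure cannot do.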

55 / 100

55) How do you remove duplicate rows in Pandas?

a) drop_duplicates()

b) unique()

c) remove()

d) dropna()

56 / 100

56) What does A/B testing primarily evaluate?

a) Model accuracy

b) Two variants of a product or strategy

c) Website performance

d) Clustering quality

57 / 100

57) What is bootstrapping in statistics?

a) Sampling without replacement

b) Creating synthetic features

c) Repeating experiments

d) Sampling with replacement
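
A minimal bootstrapping sketch follows, assuming a small made-up sample and an arbitrary choice of 1000 resamples. Each resample draws as many values as the original sample, with replacement, and the spread of the resampled means gives a rough estimate of the standard error of the mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

sample = [12, 15, 9, 20, 14, 11, 18, 16]  # hypothetical observations

def bootstrap_means(data, n_resamples=1000):
    """Resample with replacement and collect the mean of each resample."""
    means = []
    for _ in range(n_resamples):
        resample = random.choices(data, k=len(data))  # draws with replacement
        means.append(statistics.mean(resample))
    return means

means = bootstrap_means(sample)
# The spread of the resampled means approximates the standard error of the mean.
print(statistics.mean(means), statistics.stdev(means))
```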

58 / 100

58) What does ROC stand for in classification problems?

a) Rate of Change

b) Region of Confidence

c) Regression Output Curve

d) Receiver Operating Characteristic

59 / 100

59) Which of the following are types of supervised learning?

a) Both Regression and Classification

b) Regression

c) Clustering

d) Classification

60 / 100

60) Which of the following is used for feature selection?

a) Lasso Regression

b) Ridge Regression

c) Naive Bayes

d) Linear Regression

61 / 100

61) What is a common use of dimensionality reduction?

a) Increase training time

b) Add more features

c) Reduce overfitting

d) Increase model complexity

62 / 100

62) Which function is used to get summary statistics in Pandas?

a) info()

b) describe()

c) stats()

d) summary()

63 / 100

63) What type of data is most appropriate for a box plot?

a) Numerical data

b) Ordinal data

c) Text data

d) Categorical data

64 / 100

64) What is the purpose of sampling data?

a) Decrease dimensionality

b) Select representative subset

c) Decrease dataset size

d) Increase dataset size

65 / 100

65) Feedback-based ML is known as:

a) Unsupervised

b) Supervised

c) Semi-supervised

d) Reinforcement Learning

66 / 100

66) Which of the following is a dimensionality reduction technique?

a) SVM

b) Decision Tree

c) PCA

d) Logistic Regression

67 / 100

67) What does np.array() do in NumPy?

a) Creates a plot

b) Creates an array

c) Creates a matrix of zeros

d) Creates a new DataFrame

68 / 100

68) Which of the following is a hashing technique used in NLP?

a) Hashing Vectorizer

b) Count Vectorizer

c) Stemming

d) Lemmatization

69 / 100

69) Which algorithm is best suited for classification tasks?

a) Linear Regression

b) K-Means

c) Logistic Regression

d) PCA

70 / 100

70) What does time.time() in Python return?

a) seconds since Jan 1 1970 GMT

b) Current time in ms only

c) ms since Jan 1 1970 GMT

d) Past 1 hour time
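
A quick way to check this is to inspect the value directly. The conversion to milliseconds below is an optional extra step, shown only to contrast the two units.

```python
import time

now = time.time()         # float: seconds since the Unix epoch (1970-01-01 00:00:00 UTC)
millis = int(now * 1000)  # convert to whole milliseconds only if you need them

print(type(now).__name__)  # float
print(now, millis)
```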

71 / 100

71) Which of the following is NOT a Python loop structure?

a) while

b) for

c) repeat

d) None of the above

72 / 100

72) What is the main purpose of exploratory data analysis (EDA)?

a) Data cleaning

b) Feature scaling

c) Model selection

d) Data visualization and summary

73 / 100

73) What is overfitting in machine learning?

a) Model performs well on test data but poorly on training data

b) Model does not train

c) Model performs well on training data but poorly on test data

d) Model performs equally on training and test data

74 / 100

74) What is the main role of the activation function in neural networks?

a) Normalize inputs

b) Introduce non-linearity

c) Calculate weights

d) Reduce dimensionality

75 / 100

75) Which library in Python is used for machine learning?

a) Matplotlib

b) Flask

c) Seaborn

d) Scikit-learn

76 / 100

76) In feature scaling, which technique centers the data around zero?

a) Label encoding

b) Standardization

c) Normalization

d) One-hot encoding
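
The two scaling techniques can be contrasted with a short sketch on a hypothetical feature column. Standardization (the z-score idea behind scikit-learn's `StandardScaler`) yields mean 0 and unit standard deviation, while min-max normalization rescales values to the range [0, 1].

```python
import statistics

values = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature column

# Standardization (z-score): subtract the mean, divide by the standard deviation.
mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation
standardized = [(v - mean) / std for v in values]

# Min-max normalization: rescale to the range [0, 1].
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

print(statistics.mean(standardized))  # 0.0
print(normalized)
```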

77 / 100

77) Which is not suitable for importing CSV files in R?

a) read.csv()

b) Both read.csv() and read.table()

c) read.table()

d) read_excel()

78 / 100

78) Which keyword imports external libraries in Python?

a) use

b) include

c) import

d) load

79 / 100

79) Which R function is used to create a histogram?

a) plot()

b) pie()

c) hist()

d) graph()

80 / 100

80) What does the 'k' represent in the K-Means algorithm?

a) The number of iterations

b) The number of clusters

c) The learning rate

d) The number of features

81 / 100

81) What is a real-world application of Data Science?

a) Hardware repair

b) Web development

c) Image recognition

d) Desktop publishing

82 / 100

82) What does ROC stand for in classification problems?

a) Receiver Operating Characteristic

b) Region of Confidence

c) Regression Output Curve

d) Rate of Change

83 / 100

83) What is the full form of KPI in data analytics?

a) Known Predictive Insight

b) Key Performance Indicator

c) Known Process Input

d) Key Predictive Indicator

84 / 100

84) Which is not suitable for importing CSV files in R?

a) read_excel()

b) Both read.csv() and read.table()

c) read.csv()

d) read.table()

85 / 100

85) Which R package includes class()?

a) base

b) utils

c) stats

d) class

86 / 100

86) Which Python function returns a sequence of numbers?

a) list()

b) int()

c) slice()

d) range()

87 / 100

87) Which of these is a continuous probability distribution?

a) Binomial

b) Poisson

c) Geometric

d) Normal

88 / 100

88) What does NaN stand for in data science?

a) New added Number

b) Not a Node

c) Not a Number

d) Not a Name

89 / 100

89) What is the output of len("Data Science") in Python?

a) 12

b) 11

c) 10

d) 13
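
The count can be verified by breaking the string into its parts. `len()` counts every character, including whitespace.

```python
text = "Data Science"

# len() counts every character, including the space between the words.
print(len("Data"))     # 4
print(len("Science"))  # 7
print(len(text))       # 12
```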

90 / 100

90) What is the full form of SQL?

a) Structured Question Language

b) Sequential Query Language

c) Structured Query Language

d) Simple Query Language

91 / 100

91) Which of the following is a hyperparameter in decision trees?

a) Root node

b) Gini index

c) max_depth

d) Accuracy

92 / 100

92) Which technique is used to reduce multicollinearity?

a) Ridge

b) Forward selection

c) Backward elimination

d) Lasso

93 / 100

93) What does df.shape return in Pandas?

a) List of column names

b) Number of rows only

c) (rows, columns)

d) Data types

94 / 100

94) Which of the following is an unsupervised learning algorithm?

a) Decision Tree

b) K-Means

c) Linear Regression

d) Logistic Regression

95 / 100

95) What is the full form of RMSE in regression analysis?

a) Root Mean Squared Error

b) Root Model Squared Evaluation

c) Relative Mean Squared Error

d) Random Mean Squared Estimate
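
As a worked illustration, RMSE is the square root of the average squared difference between targets and predictions. The numbers below are made up, and the `rmse` helper is a hand-written sketch rather than a library function.

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean of the squared residuals."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [1.0, 5.0, 7.0, 11.0]

# Squared errors are 4, 0, 0, 4, so the result is sqrt(8 / 4) = sqrt(2).
print(rmse(y_true, y_pred))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is in the same units as the target.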

96 / 100

96) What is the output of type(5.0) in Python?

a) number

b) int

c) double

d) float

97 / 100

97) Which of the following is a continuous variable?

a) Height

b) Blood Type

c) Gender

d) Country

98 / 100

98) Which Python library is primarily used for data manipulation?

a) Matplotlib

b) Pandas

c) Seaborn

d) NumPy

99 / 100

99) Which file format is commonly used to store machine learning models?

a) .pkl

b) .txt

c) .py

d) .csv
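
For context, Python's built-in pickle module serializes an object to bytes that can be written to a file and loaded back later. The dictionary below is a hypothetical stand-in for a trained model, since any picklable object behaves the same way.

```python
import pickle

# Stand-in for a trained model: any picklable Python object works the same way.
model = {"weights": [0.4, -1.2], "bias": 0.1}  # hypothetical parameters

# Serialize to bytes; pickle.dump(model, file) writes the same bytes to an open file.
blob = pickle.dumps(model)

# Deserialize; only unpickle data from sources you trust.
restored = pickle.loads(blob)

print(restored == model)  # True
```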

100 / 100

100) Which of the following is not a type of machine learning?

a) Reinforcement learning

b) Unsupervised learning

c) Descriptive learning

d) Supervised learning
