Data Science

August 6, 2025

A Data Science internship offers hands-on experience in analyzing large datasets, building predictive models, and using data-driven methods to solve business problems. Interns learn to gather, clean, visualize, and model data using modern tools and techniques in machine learning, statistics, and programming.

Objectives

  • Understand the full lifecycle of data — from collection to actionable insights.
  • Gain real-world exposure to machine learning, statistical modeling, and data visualization.
  • Apply tools like Python, SQL, and R, along with libraries such as pandas, NumPy, and scikit-learn.
  • Learn to work with structured and unstructured data across domains.

Key Responsibilities

  • Collect and clean raw datasets from various sources (APIs, databases, CSV files, etc.).
  • Perform Exploratory Data Analysis (EDA) to find patterns and trends.
  • Build and test predictive models (e.g., regression, classification, clustering).
  • Visualize results using tools like Matplotlib, Seaborn, or Power BI/Tableau.
  • Generate reports and dashboards to communicate findings to stakeholders.
  • Work closely with Data Engineers and Analysts to improve data pipelines.
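
The first three responsibilities (collect, clean, model) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a prescribed internship workflow; the column names `ad_spend` and `sales` and all data values are made up for the example.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: the third data row has a missing "sales" value.
RAW_CSV = """ad_spend,sales
10,25
20,45
30,
40,85
"""

def load_rows(text):
    """Collect: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: drop rows with missing fields, convert the rest to floats."""
    return [
        {key: float(value) for key, value in row.items()}
        for row in rows
        if all(value not in (None, "") for value in row.values())
    ]

def fit_line(xs, ys):
    """Model: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

rows = clean(load_rows(RAW_CSV))
xs = [row["ad_spend"] for row in rows]
ys = [row["sales"] for row in rows]
slope, intercept = fit_line(xs, ys)
print(f"kept {len(rows)} rows; sales = {slope:.2f} * ad_spend + {intercept:.2f}")
```

In practice the same steps would usually be done with pandas and scikit-learn, which appear in the tools list for this internship.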

Tools & Technologies You’ll Use

  • Languages: Python, R, SQL
  • Libraries: pandas, NumPy, scikit-learn, TensorFlow, Keras, PyTorch
  • Visualization: Matplotlib, Seaborn, Plotly, Tableau, Power BI
  • Big Data: Hadoop, Spark (optional)
  • Databases: MySQL, PostgreSQL, MongoDB
  • Version Control: Git, GitHub

Skills You’ll Gain

  • Data cleaning and preprocessing
  • Statistical analysis and hypothesis testing
  • Predictive modeling and machine learning
  • Business problem-solving with data
  • Data visualization and storytelling
  • Model evaluation and tuning

Important Notice:
Once you start the quiz, you will not be able to pause, exit, or restart it. Please ensure you are ready before beginning.


Data Science D1

1) Function to compute standard deviation in NumPy:
2) Which function computes pairwise distances between rows?
3) What is the output of: a = [1,2,3]; print(a*2)
4) How to shuffle rows of a DataFrame?
5) Export a DataFrame to Excel:
6) In Pandas, axis=0 refers to:
7) How to drop rows where all elements are NaN?
8) arr = np.arange(6).reshape(2,3); print(arr.T.shape)
9) Which method shows memory usage of DataFrame?
10) Which method slices rows by label range?
11) Why is vectorization in NumPy faster?
12) NumPy broadcasting allows:
13) Which method calculates the rolling median?
14) Difference between np.dot() and np.matmul():
15) Which method returns the first valid index in a Series?
16) How to interpolate missing numeric values?
17) Remove duplicate rows in Pandas:
18) NumPy broadcasting allows:
19) Which function calculates quantiles?
20) Function to create a pivot table in Pandas:
21) Convert a categorical column into dummy variables:
22) Pandas method to compute rolling mean:
23) How to convert column types in Pandas?
24) .agg() in Pandas is used for:
25) Which method returns descriptive stats for categorical columns?
26) Which method returns the indices of missing values?
27) Which function gives cumulative sum of a column?
28) Which function is used to pivot a table by index and columns?
29) Which function creates a 3×3 identity matrix?
30) Which is a benefit of list comprehension?
31) Python sets do not allow:
32) Compute correlation between numeric columns:
33) Approximate time complexity of dictionary lookup:
34) Method to add/update a key-value pair in a dict:
35) Which method joins DataFrames by index?
36) Which parameter in read_csv handles large file chunks?
37) Which function converts a DataFrame to NumPy array?
38) Pandas method to compute rolling mean:
39) Which function returns unique values across the entire DataFrame?
40) Which function computes pairwise distances between rows?
41) Which of the following is used to create a dictionary?
42) Which method returns the first valid index in a Series?
43) Which data structure is unordered and mutable in Python?
44) Which method performs one-hot encoding in Pandas?
45) What will be printed? x = {1,2,2,3,4}; print(x)
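
Several questions in this quiz (3, 30, 31, 34, and 45) concern plain Python built-ins rather than pandas or NumPy. The short sketch below shows the behavior they ask about; the variable names `scores` and `squares` are illustrative only.

```python
# List repetition: * repeats the list, it does not multiply the elements.
a = [1, 2, 3]
doubled = a * 2

# A set literal silently drops duplicate values.
x = {1, 2, 2, 3, 4}

# Assigning to a key adds it if missing and overwrites it if present;
# update() does the same for several keys at once.
scores = {"alice": 90}
scores["bob"] = 85
scores["alice"] = 95
scores.update({"carol": 70})

# A list comprehension builds a list in a single expression.
squares = [n * n for n in range(5)]

print(doubled)
print(len(x))
print(scores)
print(squares)
```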

The average score is 84%

Data Science D2

1) What does NumPy mainly provide?
2) Which function in Pandas is used to join two datasets?
3) Which metric is used for classification evaluation?
4) Which machine learning library is commonly used for building models?
5) In statistics, which measure shows the spread of data?
6) Which measure of central tendency is affected most by extreme values?
7) Which visualization is best for showing correlation between two variables?
8) In supervised learning, the dataset is divided into:
9) Which Python package provides tools for statistical modeling?
10) Which of the following is an unsupervised learning algorithm?
11) Which of the following describes “Big Data”?
12) Which of the following is an ensemble method in machine learning?
13) Which metric is used to evaluate regression models?
14) Which of the following describes overfitting?
15) What is the role of hypothesis testing in Data Science?
16) Which Python library is most widely used for data analysis?
17) Data Science is mainly a combination of:
18) Which data visualization is best for time-series data?
19) Which of the following techniques reduces dimensionality?
20) Which of the following best describes a DataFrame in Pandas?
21) A confusion matrix is used to evaluate:
22) The process of cleaning and preparing raw data is called:
23) Which cloud platforms provide Data Science services?
24) Which programming language is most popular in Data Science?
25) Which of these is an AI-based data visualization tool?
26) Which of these is NOT a data visualization tool?
27) Which of the following is NOT part of the Data Science workflow?
28) Which of these plots is best to visualize data distribution?
29) Which of these is a common file format for datasets?
30) Which database is often used for unstructured big data?
31) Which type of chart is best to visualize categorical data distribution?
32) What does the “.isnull()” function in Pandas do?
33) Logistic Regression is used for:
34) What does the Pandas function .groupby() do?
35) Which SQL command is used to extract data from a database?
36) In Data Science, what does EDA stand for?
37) What does one-hot encoding do?
38) Which visualization is best for showing correlation between two variables?
39) Which of the following is an example of supervised learning?
40) Which of the following is a regression algorithm?
41) Which of these Python libraries is best for numerical computation?
42) Which of the following is used for data visualization in Python?
43) Which algorithm is used for classification tasks?
44) In Data Science, “feature scaling” is required because:
45) Which function is used in Pandas to view the first few rows of data?
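
Questions 5 and 6 touch on spread and on how extreme values affect measures of central tendency. The sketch below, using Python's standard `statistics` module, compares the mean, median, and population standard deviation before and after adding one extreme value. The salary figures are invented for illustration.

```python
from statistics import mean, median, pstdev

salaries = [30, 32, 35, 38, 40]      # hypothetical values, in thousands
with_outlier = salaries + [400]      # the same data plus one extreme value

mean_before, mean_after = mean(salaries), mean(with_outlier)
median_before, median_after = median(salaries), median(with_outlier)
spread_before, spread_after = pstdev(salaries), pstdev(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
print(f"stdev:  {spread_before:.1f} -> {spread_after:.1f}")
```

The mean and the standard deviation move a long way when the outlier is added, while the median barely changes.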

The average score is 96%

Data Analyst D2

1) What is the role of a Data Analyst?
2) What is data visualization mainly used for?
3) In analytics, KPI stands for:
4) Which database language is widely used in analytics?
5) Which visualization shows parts of a whole?
6) Data Analytics mainly focuses on:
7) Which chart is suitable for showing cumulative values?
8) Which chart is best to show correlation between two variables?
9) Which function in Power BI is used to create calculated columns?
10) Which SQL command is used to combine rows from two tables?
11) Cleaning data involves:
12) A slicer in Power BI is used for:
13) What is Big Data mainly characterized by?
14) Which of the following is the first step in Data Analytics?
15) In ETL, the "Transform" step includes:
16) What is the main benefit of using dashboards?
17) Which analytical method helps to find hidden patterns in data?
18) What is the purpose of data transformation?
19) A foreign key is used to:
20) A bar chart is best suited for:
21) Which of these is an open-source tool for data visualization?
22) What does a Data Warehouse store?
23) What does ETL stand for?
24) Which tool is most widely used for business data visualization?
25) In Power BI, relationships are built between:
26) Which function in Power BI creates aggregate measures?
27) A primary key in a database is:
28) DAX in Power BI stands for:
29) Data storytelling in analytics refers to:
30) In Power BI, data can be imported from:
31) Power BI dashboards can be shared through:
32) Which of these is NOT a data type in Power BI?
33) Which chart is best for showing trends over time?
34) Which type of join keeps only the matching records from two tables?
35) In SQL, which clause is used to filter records?
36) Predictive analytics is used to:
37) Which tool can be integrated with Power BI for advanced analytics?
38) Which visualization is best for frequency distribution?
39) Which of these is NOT a stage in the analytics process?
40) Prescriptive analytics helps to:
41) Which component of Power BI is used for creating reports?
42) Which cloud platform is popular for analytics?
43) The ultimate goal of Data Analytics is to:
44) Which of these is an example of descriptive analytics?
45) What is the main purpose of a dashboard in analytics?
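
The SQL questions in this quiz (10, 19, 27, 34, and 35) deal with keys, joins, and filtering. The sketch below runs a few such statements against an in-memory SQLite database through Python's standard `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,                      -- primary key: unique per row
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),  -- foreign key to customers
    amount      REAL
);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0);
""")

# INNER JOIN keeps only customers that have at least one matching order,
# so 'Chen' (no orders) does not appear in the result.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# WHERE filters individual rows before any grouping takes place.
large_orders = conn.execute(
    "SELECT id FROM orders WHERE amount > 40 ORDER BY id"
).fetchall()
conn.close()

print(totals)
print(large_orders)
```

Note that SQLite only enforces the `REFERENCES` constraint when `PRAGMA foreign_keys = ON` is set; here the clause serves to document the relationship.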

The average score is 79%

Data Science D3

1) What does R² (R-squared) measure in regression?
2) Which algorithm is used for clustering in Data Science?
3) Which operation in Pandas merges datasets based on a common column?
4) Which sampling method ensures all groups are represented?
5) Which of the following reduces the number of dimensions in data?
6) The process of filling missing values in a dataset is called:
7) What is the main disadvantage of decision trees?
8) Which of these is an example of unstructured data?
9) Which Python library is most widely used for handling structured data?
10) One-hot encoding is used for:
11) Bagging in machine learning stands for:
12) Which of the following is the first step in the Data Science process?
13) What does “overfitting” mean in machine learning models?
14) What does normalization do to data?
15) Which is a key challenge in Data Science?
16) ROC curve is used to measure:
17) Which of the following is NOT a Data Science task?
18) What is the purpose of Exploratory Data Analysis (EDA)?
19) Which of these is an unsupervised learning algorithm?
20) In Python, which function gives the first five rows of a DataFrame?
21) Which distribution is used for binary classification problems?
22) Which programming language is most popular in Data Science?
23) Which step comes last in a Data Science workflow?
24) Which metric is most appropriate for evaluating a regression model?
25) A confusion matrix is mainly used for:
26) Which evaluation metric is better for imbalanced classification datasets?
27) Which of the following is an ensemble method?
28) In supervised learning, the dataset must contain:
29) In Python, what does df.describe() do?
30) What does feature engineering involve?
31) Which Pandas method is used to check missing values?
32) What is the main purpose of feature scaling?
33) What is the purpose of data wrangling?
34) Which method is used to handle multicollinearity in regression models?
35) Which library in Python is used for machine learning algorithms?
36) Which technique helps reduce variance in machine learning models?
37) Which Python function removes duplicates from a DataFrame?
38) Which data type is best to represent categorical variables in Pandas?
39) Which type of graph best shows the distribution of a numeric variable?
40) Which visualization technique is most suitable for checking correlation?
41) Which Python library is best for creating interactive visualizations?
42) Which SQL clause is used to group rows for aggregation in Data Science workflows?
43) The term “Big Data” usually refers to:
44) What does cross-validation help with?
45) Which plot is used to visualize the relationship between two continuous variables?
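
Questions 14 and 32 ask about normalization and feature scaling. Two common variants, min-max scaling and z-score standardization, can be sketched in plain Python as follows. The function names and the `ages` data are illustrative; libraries such as scikit-learn provide their own scalers.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 60]           # hypothetical feature column
scaled = min_max_scale(ages)
z_scores = standardize(ages)

print(scaled)
print([round(z, 3) for z in z_scores])
```

Both transformations put features on comparable scales, which matters for methods that rely on distances or gradient-based optimization.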

The average score is 91%

Data Science D4

1) Bagging technique helps in:
2) Which of these is a classification algorithm?
3) Which ML technique is inspired by human brain neurons?
4) ROC curve is used for:
5) Which of the following is used for topic modeling in NLP?
6) Which of these is a data visualization library in Python?
7) Tokenization in NLP means:
8) Which of the following is NOT supervised learning?
9) Which library is used for numerical computing in Python?
10) Which of the following evaluates regression models?
11) Which of these is an unsupervised learning algorithm?
12) Which clustering algorithm is density-based?
13) Which cloud service provides ML tools?
14) Which of these languages is most popular for Data Science?
15) Which is an activation function in Neural Networks?
16) Deep learning is mainly based on:
17) Cross-validation is used to:
18) What is the full form of NLP?
19) Which ML algorithm is most interpretable?
20) In NLP, removing stop words is part of:
21) Which library is most used for deep learning?
22) Which of the following is NOT a type of bias in data science?
23) Which concept allows machines to learn from data?
24) Which method handles class imbalance?
25) F1-score is the harmonic mean of:
26) Which is an ensemble method?
27) Gradient Descent is used for:
28) Which of these is a data preprocessing step?
29) Which of the following is NOT supervised learning?
30) Which algorithm is best for spam email classification?
31) Which ML model is good for image classification?
32) Which metric is used for classification models?
33) The ultimate goal of Data Science is:
34) Which of these is NOT a step in CRISP-DM framework?
35) Data Science lifecycle ends with:
36) Which algorithm is called "lazy learner"?
37) Backpropagation is used in:
38) Which ML algorithm is most interpretable?
39) Which algorithm is widely used for recommendation systems?
40) Confusion matrix is used in:
41) Which type of variable has numeric values?
42) What is Data Science primarily concerned with?
43) Hyperparameters are:
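
Questions 25 and 40 refer to the confusion matrix and the F1-score. The sketch below counts the four confusion-matrix cells for a binary problem and derives precision, recall, F1, and accuracy from them. The labels in `y_true` and `y_pred` are made up, and the helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1          # correctly predicted positive
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        elif actual == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # correctly predicted negative
    return tp, fp, fn, tn

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # hypothetical ground truth
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # hypothetical model output

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
accuracy = (tp + tn) / len(y_true)

print(tp, fp, fn, tn)
print(precision, recall, f1, accuracy)
```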

The average score is 95%

Data Science D6

1) In NLP, TF-IDF is used for:
2) Which of these is a supervised algorithm?
3) In statistics, variance measures:
4) What is the purpose of a confusion matrix?
5) Feature scaling techniques include:
6) In machine learning, classification problems predict:
7) Which of these is an ensemble method?
8) Which type of learning uses labeled datasets?
9) Logistic regression is mainly used for:
10) Which plot is best for visualizing correlation?
11) Which library is commonly used for machine learning in Python?
12) Bagging improves model:
13) Which Python library is most commonly used for numerical computations?
14) Which algorithm is used in recommendation systems?
15) Which evaluation metric is NOT for regression?
16) Which is an unsupervised technique?
17) Which step comes first in a Data Science project?
18) Which time-series model is commonly used?
19) Which type of bias occurs when training data isn’t representative?
20) K in KNN refers to:
21) Cross-validation is used to:
22) Which of these is a cloud platform for Data Science?
23) Which visualization helps detect skewness?
24) Overfitting occurs when a model:
25) Gradient Descent is an algorithm for:
26) Which library is used for deep learning in Python?
27) Which of the following is the primary goal of Data Science?
28) Which of these is a dimensionality reduction method?
29) In decision trees, leaf nodes represent:
30) Which of the following detects overfitting?
31) Which process splits data into training and test sets?
32) Which algorithm is suitable for market basket analysis?
33) The F1-score is the harmonic mean of:
34) Which of the following measures similarity between vectors?
35) Which metric evaluates classification models?
36) Which of the following models handles non-linear data well?
37) Which evaluation metric is best for imbalanced datasets?
38) Data Science is an interdisciplinary field combining:
39) In NLP, stop words are:
40) Which regularization technique adds absolute value penalties?
41) In statistics, p-value is used for:
42) Which of these is a regression evaluation metric?
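
Question 34 asks about measuring similarity between vectors. Cosine similarity is one common choice, and a plain-Python sketch is shown below; the function name and the example vectors are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])     # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 1])               # perpendicular vectors
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])        # opposite direction

print(same_direction, orthogonal, opposite)
```

The result depends only on direction, not magnitude, which is why it is often applied to text vectors such as TF-IDF weights.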

The average score is 0%

Data Science D5

1) A feature in Data Science means:
2) Which metric is used in regression problems?
3) Which of the following is a key step in Data Science workflow?
4) Which metric is used for clustering quality?
5) In clustering, the “elbow method” is used to:
6) Which dataset is commonly used as a beginner dataset in Data Science?
7) Which of the following is a deep learning library?
8) Principal Component Analysis (PCA) is used for:
9) Which measure is best for classification accuracy?
10) Which evaluation method is best for time-series forecasting?
11) Which visualization is best for correlation analysis?
12) Which evaluation metric is best for imbalanced data classification?
13) Which model is best for sequence data like time series?
14) NLP stands for:
15) Which distance measure is often used in KNN algorithm?
16) The ROC curve shows:
17) Hyperparameter tuning can be done using:
18) Which library is mainly used for linear algebra in Python?
19) Supervised learning is based on:
20) Logistic regression is used for:
21) Which is an example of unsupervised learning?
22) Confusion matrix is used in:
23) L1 and L2 penalties are used in:
24) Which of the following is NOT a machine learning algorithm?
25) Random Forest is based on:
26) Which technique is used to handle class imbalance?
27) Ensemble learning means:
28) What is the primary goal of Data Science?
29) Bag of Words is a technique used in:
30) ReLU is a type of:
31) Which technique converts categorical data into numerical?
32) Which method reduces overfitting in decision trees?
33) Gradient Boosting builds models:
34) Which type of learning requires reward-based training?
35) Which plot shows model accuracy over training epochs?
36) Which algorithm is best for image recognition tasks?
37) In machine learning, “overfitting” means:
38) In Python, which library is commonly used for machine learning?
39) Which step ensures model generalization?
40) Which programming languages are most used in Data Science?
41) Which method is used to split data into training and testing sets?
42) Which optimizer is widely used in deep learning?
43) Which algorithm is NOT supervised learning?
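
Question 15 concerns the distance measure in k-nearest neighbors (KNN). The sketch below is a minimal KNN classifier that uses Euclidean distance, a frequent default, via `math.dist` (available in Python 3.8 and later). The function name `knn_predict`, the training points, and the labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    train: list of (features, label) pairs; features are equal-length tuples.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two hypothetical, well-separated groups of 2-D points.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]

near_a = knn_predict(train, (2, 2))
near_b = knn_predict(train, (7, 7))
print(near_a, near_b)
```

There is no separate training step: all the work happens at prediction time, when distances to the stored points are computed.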

The average score is 0%

Data Science D7

1) Which of these is a Python library for machine learning?
2) Which machine learning task does clustering belong to?
3) Which method is used for splitting datasets in machine learning?
4) Which of these is a feature scaling method?
5) Which data visualization is best for correlation matrices?
6) What does PCA (Principal Component Analysis) do?
7) Which evaluation metric is useful for imbalanced classification?
8) Which step comes after data cleaning in the DS process?
9) What does “bias” in a machine learning model indicate?
10) Which measure of central tendency is affected most by outliers?
11) Which of these is NOT a supervised learning algorithm?
12) Which of the following is NOT a step in the Data Science lifecycle?
13) Which algorithm is best suited for grouping customers by behavior?
14) Which library in Python is used for deep learning?
15) Which algorithm is commonly used for anomaly detection?
16) What is the main use of cross-validation?
17) Which technique reduces multicollinearity in regression?
18) Which model is commonly used for binary classification?
19) Which ML algorithm is inspired by the biological nervous system?
20) Which metric is commonly used for regression tasks?
21) Which is a supervised learning task?
22) Which of these is an unsupervised learning algorithm?
23) Which type of variable has no natural order?
24) Which type of learning uses both labeled and unlabeled data?
25) Which dataset is commonly used for image recognition experiments?
26) Which Python library is used for statistical modeling?
27) Which technique prevents overfitting in decision trees?
28) Which activation function is commonly used in deep learning?
29) Which distribution is often assumed in statistics?
30) Which of the following is a feature selection method?
31) What is overfitting in machine learning?
32) Which data type is continuous?
33) Which of the following is an example of classification?
34) Which of the following is an example of regression?
35) Which type of learning involves agents interacting with an environment?
36) Which technique combines bootstrapping with decision trees?
37) Which evaluation metric is used for regression models?
38) Which Python library is most commonly used for data manipulation?
39) Which model combines predictions from multiple models?
40) Which is a key difference between Data Science and Data Analytics?
41) Which algorithm is suitable for market basket analysis?
42) Which ML algorithm is known as a lazy learner?
43) Which term refers to selecting the best parameters for a model?
44) Which optimization technique is used in training neural networks?
45) Which of the following best defines Data Science?
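
Question 44 asks about optimization in neural network training. Gradient descent is one widely used technique, and its core update rule can be shown on a toy problem. The sketch below minimizes the hypothetical one-parameter loss (w - 3)^2; the learning rate and step count are arbitrary choices for illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_final = gradient_descent(start=0.0)
print(f"w = {w_final:.6f}, loss = {loss(w_final):.2e}")
```

Real networks apply the same rule to many parameters at once, with gradients obtained through backpropagation.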

The average score is 0%