Data Science

August 6, 2025

A Data Science internship offers hands-on experience in analyzing large datasets, building predictive models, and using data-driven methods to solve business problems. Interns learn to gather, clean, visualize, and model data using modern tools and techniques in machine learning, statistics, and programming.

Objectives

  • Understand the full lifecycle of data — from collection to actionable insights.
  • Gain real-world exposure to machine learning, statistical modeling, and data visualization.
  • Apply tools such as Python, SQL, and R, along with libraries such as pandas, NumPy, and scikit-learn.
  • Learn to work with structured and unstructured data across domains.

Key Responsibilities

  • Collect and clean raw datasets from various sources (APIs, databases, CSV files, etc.).
  • Perform Exploratory Data Analysis (EDA) to find patterns and trends.
  • Build and test predictive models (e.g., regression, classification, clustering).
  • Visualize results using tools like Matplotlib, Seaborn, or Power BI/Tableau.
  • Generate reports and dashboards to communicate findings to stakeholders.
  • Work closely with Data Engineers and Analysts to improve data pipelines.
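
The responsibilities above (collect, clean, explore, model, test) can be sketched end to end in a few lines. This is only a minimal illustration, assuming pandas and NumPy are installed; the `ad_spend` and `revenue` columns and their values are made up, and the data is deliberately exactly linear so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one duplicated row and one missing predictor value.
raw = pd.DataFrame({
    "ad_spend": [10.0, 20.0, 20.0, 30.0, 40.0, np.nan, 60.0],
    "revenue":  [25.0, 45.0, 45.0, 65.0, 85.0, 105.0, 125.0],
})

# Clean: drop exact duplicates, then rows where the predictor is missing.
clean = raw.drop_duplicates().dropna(subset=["ad_spend"]).reset_index(drop=True)

# Explore: summary statistics and the correlation between the two columns.
summary = clean.describe()
correlation = clean["ad_spend"].corr(clean["revenue"])

# Model: fit revenue = slope * ad_spend + intercept by least squares.
slope, intercept = np.polyfit(clean["ad_spend"], clean["revenue"], deg=1)

# Test: compare the fitted values with the observed ones (root mean squared error).
predicted = slope * clean["ad_spend"] + intercept
rmse = float(np.sqrt(((predicted - clean["revenue"]) ** 2).mean()))

print(summary)
print(f"correlation={correlation:.3f} slope={slope:.2f} "
      f"intercept={intercept:.2f} rmse={rmse:.6f}")
```

On real data the fit would not be exact, and the error would normally be measured on rows held out from training rather than on the rows used to fit the line.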

Tools & Technologies You’ll Use

  • Languages: Python, R, SQL
  • Libraries: pandas, NumPy, scikit-learn, TensorFlow, Keras, PyTorch
  • Visualization: Matplotlib, Seaborn, Plotly, Tableau, Power BI
  • Big Data: Hadoop, Spark (optional)
  • Databases: MySQL, PostgreSQL, MongoDB
  • Version Control: Git, GitHub

Skills You’ll Gain

  • Data cleaning and preprocessing
  • Statistical analysis and hypothesis testing
  • Predictive modeling and machine learning
  • Business problem-solving with data
  • Data visualization and storytelling
  • Model evaluation and tuning

Important Notice:
Once you start the quiz, you will not be able to pause, exit, or restart it. Please ensure you are ready before beginning.



Data Science D1

1) How to drop rows where all elements are NaN?
2) Remove duplicate rows in Pandas:
3) In Pandas, axis=0 refers to:
4) Convert a categorical column into dummy variables:
5) Which method shows memory usage of DataFrame?
6) Which data structure is unordered and mutable in Python?
7) Which function gives cumulative sum of a column?
8) Export a DataFrame to Excel:
9) Approximate time complexity of dictionary lookup:
10) Pandas method to compute rolling mean:
11) Which function returns unique values across the entire DataFrame?
12) Which method calculates the rolling median?
13) How to interpolate missing numeric values?
14) Which method returns the first valid index in a Series?
15) Which function calculates quantiles?
16) How to shuffle rows of a DataFrame?
17) Which method joins DataFrames by index?
18) Difference between np.dot() and np.matmul():
19) Method to add/update a key-value pair in a dict:
20) NumPy broadcasting allows:
21) Which function converts a DataFrame to NumPy array?
22) Pandas method to compute rolling mean:
23) How to convert column types in Pandas?
24) Which function computes pairwise distances between rows?
25) Which function creates a 3×3 identity matrix?
26) What is the output of: a = [1,2,3]; print(a*2)
27) Which of the following is used to create a dictionary?
28) Which method returns the indices of missing values?
29) Which method slices rows by label range?
30) Compute correlation between numeric columns:
31) Which method performs one-hot encoding in Pandas?
32) NumPy broadcasting allows:
33) Which parameter in read_csv handles large file chunks?
34) Which method returns descriptive stats for categorical columns?
35) Why is vectorization in NumPy faster?
36) .agg() in Pandas is used for:
37) Function to create a pivot table in Pandas:
38) Which function is used to pivot a table by index and columns?
39) Which method returns the first valid index in a Series?
40) What will be printed? x = {1,2,2,3,4}; print(x)
41) Python sets do not allow:
42) arr = np.arange(6).reshape(2,3); print(arr.T.shape)
43) Which is a benefit of list comprehension?
44) Which function computes pairwise distances between rows?
45) Function to compute standard deviation in NumPy:
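
Several of the operations these questions refer to can be tried directly in an interpreter. The snippet below is a small sketch rather than an answer key, assuming pandas and NumPy are installed; the `city` and `sales` columns are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one repeated row and one row that is entirely missing.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Delhi", None],
    "sales": [10.0, 20.0, 20.0, np.nan],
})

no_empty_rows = df.dropna(how="all")            # drop rows where every value is missing
deduplicated = no_empty_rows.drop_duplicates()  # drop repeated rows
dummies = pd.get_dummies(deduplicated["city"])  # one indicator column per category
running_total = deduplicated["sales"].cumsum()  # cumulative sum of a column
rolling_mean = deduplicated["sales"].rolling(window=2).mean()  # 2-row rolling mean

identity = np.eye(3)              # 3x3 identity matrix
arr = np.arange(6).reshape(2, 3)  # shape (2, 3); its transpose has shape (3, 2)

repeated = [1, 2, 3] * 2          # list repetition, not element-wise multiplication
unique = {1, 2, 2, 3, 4}          # a set keeps one copy of each element

print(deduplicated)
print(dummies.columns.tolist(), running_total.tolist(), rolling_mean.tolist())
print(identity.shape, arr.T.shape, repeated, unique)
```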


Data Science D2

1) What does the “.isnull()” function in Pandas do?
2) Which of the following is NOT part of the Data Science workflow?
3) Logistic Regression is used for:
4) Which of the following is used for data visualization in Python?
5) What is the role of hypothesis testing in Data Science?
6) Which metric is used for classification evaluation?
7) Which of the following best describes a DataFrame in Pandas?
8) Which of the following is a regression algorithm?
9) In supervised learning, the dataset is divided into:
10) Which data visualization is best for time-series data?
11) Which of these is NOT a data visualization tool?
12) Which algorithm is used for classification tasks?
13) Which visualization is best for showing correlation between two variables?
14) Which of the following describes overfitting?
15) Which visualization is best for showing correlation between two variables?
16) Which of the following is an ensemble method in machine learning?
17) Which database is often used for unstructured big data?
18) Which machine learning library is commonly used for building models?
19) In Data Science, what does EDA stand for?
20) In Data Science, “feature scaling” is required because:
21) Which type of chart is best to visualize categorical data distribution?
22) Data Science is mainly a combination of:
23) Which SQL command is used to extract data from a database?
24) What does NumPy mainly provide?
25) Which of the following is an unsupervised learning algorithm?
26) Which of these is an AI-based data visualization tool?
27) Which of these plots is best to visualize data distribution?
28) Which programming language is most popular in Data Science?
29) Which of the following techniques reduces dimensionality?
30) The process of cleaning and preparing raw data is called:
31) Which Python library is most widely used for data analysis?
32) Which function is used in Pandas to view the first few rows of data?
33) Which of the following is an example of supervised learning?
34) What does one-hot encoding do?
35) What does the Pandas function .groupby() do?
36) In statistics, which measure shows the spread of data?
37) Which of these is a common file format for datasets?
38) Which metric is used to evaluate regression models?
39) Which function in Pandas is used to join two datasets?
40) Which of the following describes “Big Data”?
41) A confusion matrix is used to evaluate:
42) Which measure of central tendency is affected most by extreme values?
43) Which Python package provides tools for statistical modeling?
44) Which of these Python libraries is best for numerical computation?
45) Which cloud platforms provide Data Science services?
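
The questions on central tendency and spread are easy to explore with Python's standard `statistics` module. The sketch below uses an invented list of salaries to show how a single extreme value moves the mean far more than the median.

```python
import statistics

# Hypothetical salaries (in thousands); the last value added is an outlier.
salaries = [30, 32, 35, 36, 38]
with_outlier = salaries + [400]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# Sample standard deviation: one common measure of spread.
spread = statistics.stdev(salaries)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
print(f"sample standard deviation (no outlier): {spread:.3f}")
```

Running it shows the mean shifting by tens of units while the median moves by half a unit, which is why the median is often preferred for skewed data.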


Data Analyst D2

1) What does ETL stand for?
2) Which tool can be integrated with Power BI for advanced analytics?
3) What is Big Data mainly characterized by?
4) In ETL, the "Transform" step includes:
5) A primary key in a database is:
6) What is the main benefit of using dashboards?
7) Which of the following is the first step in Data Analytics?
8) In SQL, which clause is used to filter records?
9) Which of these is NOT a data type in Power BI?
10) Cleaning data involves:
11) Which function in Power BI is used to create calculated columns?
12) Which SQL command is used to combine rows from two tables?
13) In Power BI, data can be imported from:
14) Data storytelling in analytics refers to:
15) Prescriptive analytics helps to:
16) Which analytical method helps to find hidden patterns in data?
17) Which type of join keeps only the matching records from two tables?
18) In analytics, KPI stands for:
19) Which component of Power BI is used for creating reports?
20) In Power BI, relationships are built between:
21) Which visualization shows parts of a whole?
22) Data Analytics mainly focuses on:
23) Which of these is an open-source tool for data visualization?
24) Which database language is widely used in analytics?
25) DAX in Power BI stands for:
26) Which of these is an example of descriptive analytics?
27) Which tool is most widely used for business data visualization?
28) Predictive analytics is used to:
29) Which chart is best to show correlation between two variables?
30) Which chart is best for showing trends over time?
31) Which of these is NOT a stage in the analytics process?
32) The ultimate goal of Data Analytics is to:
33) A slicer in Power BI is used for:
34) What is the purpose of data transformation?
35) Which cloud platform is popular for analytics?
36) What is the role of a Data Analyst?
37) A foreign key is used to:
38) What is the main purpose of a dashboard in analytics?
39) Which chart is suitable for showing cumulative values?
40) Which visualization is best for frequency distribution?
41) What is data visualization mainly used for?
42) Which function in Power BI creates aggregate measures?
43) Power BI dashboards can be shared through:
44) A bar chart is best suited for:
45) What does a Data Warehouse store?
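
The SQL ideas in these questions (primary and foreign keys, filtering, joining, aggregating) can be tried without installing a database server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customers` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- uniquely identifies each customer
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
    amount      REAL
);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 90.0), (12, 2, 40.0);
""")

# WHERE filters rows; INNER JOIN keeps only customers that have a matching order.
large_orders = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.amount > 50
    ORDER BY o.amount DESC
""").fetchall()

# GROUP BY collapses the joined rows into one total per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(large_orders)
print(totals)
```

The customer with no orders does not appear in either result, because an inner join returns only rows that match in both tables.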


Data Science D3

1) Which technique helps reduce variance in machine learning models?
2) Which Python library is most widely used for handling structured data?
3) Bagging in machine learning stands for:
4) Which algorithm is used for clustering in Data Science?
5) Which sampling method ensures all groups are represented?
6) Which programming language is most popular in Data Science?
7) Which method is used to handle multicollinearity in regression models?
8) Which library in Python is used for machine learning algorithms?
9) Which step comes last in a Data Science workflow?
10) What does normalization do to data?
11) One-hot encoding is used for:
12) The process of filling missing values in a dataset is called:
13) Which SQL clause is used to group rows for aggregation in Data Science workflows?
14) Which Pandas method is used to check missing values?
15) What does “overfitting” mean in machine learning models?
16) Which plot is used to visualize the relationship between two continuous variables?
17) In supervised learning, the dataset must contain:
18) Which data type is best to represent categorical variables in Pandas?
19) Which evaluation metric is better for imbalanced classification datasets?
20) In Python, which function gives the first five rows of a DataFrame?
21) Which Python function removes duplicates from a DataFrame?
22) Which Python library is best for creating interactive visualizations?
23) What does feature engineering involve?
24) Which of the following is the first step in the Data Science process?
25) What is the main disadvantage of decision trees?
26) The term “Big Data” usually refers to:
27) Which of these is an unsupervised learning algorithm?
28) What is the purpose of data wrangling?
29) Which of the following is NOT a Data Science task?
30) What is the main purpose of feature scaling?
31) A confusion matrix is mainly used for:
32) Which of these is an example of unstructured data?
33) In Python, what does df.describe() do?
34) What is the purpose of Exploratory Data Analysis (EDA)?
35) What does R² (R-squared) measure in regression?
36) Which metric is most appropriate for evaluating a regression model?
37) Which visualization technique is most suitable for checking correlation?
38) Which distribution is used for binary classification problems?
39) Which is a key challenge in Data Science?
40) Which of the following is an ensemble method?
41) Which type of graph best shows the distribution of a numeric variable?
42) What does cross-validation help with?
43) Which operation in Pandas merges datasets based on a common column?
44) ROC curve is used to measure:
45) Which of the following reduces the number of dimensions in data?
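
The questions on confusion matrices and imbalanced datasets are easier to reason about with numbers in hand. The sketch below computes the four confusion-matrix counts and the usual metrics in plain Python; the labels are made up, with 5 positives among 100 samples, and `confusion_counts` is a helper written for this example rather than a library function.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical imbalanced data: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95
# A weak model: finds 2 of the 5 positives and raises 1 false alarm.
y_pred = [1, 1, 0, 0, 0] + [1] + [0] * 94

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Accuracy comes out high because negatives dominate, while recall and F1 expose that most positives were missed.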


Data Science D4

1) Which of these is NOT a step in CRISP-DM framework?
2) Backpropagation is used in:
3) Which algorithm is best for spam email classification?
4) Which library is most used for deep learning?
5) Which metric is used for classification models?
6) Which algorithm is called "lazy learner"?
7) Which type of variable has numeric values?
8) F1-score is the harmonic mean of:
9) Which ML technique is inspired by human brain neurons?
10) Which of the following is used for topic modeling in NLP?
11) What is Data Science primarily concerned with?
12) Which cloud service provides ML tools?
13) Hyperparameters are:
14) Which library is used for numerical computing in Python?
15) Which of these is a data visualization library in Python?
16) Confusion matrix is used in:
17) Which clustering algorithm is density-based?
18) Data Science lifecycle ends with:
19) Which concept allows machines to learn from data?
20) ROC curve is used for:
21) Tokenization in NLP means:
22) What is the full form of NLP?
23) In NLP, removing stop words is part of:
24) Cross-validation is used to:
25) Which ML algorithm is most interpretable?
26) Which is an activation function in Neural Networks?
27) Which of these is an unsupervised learning algorithm?
28) The ultimate goal of Data Science is:
29) Bagging technique helps in:
30) Gradient Descent is used for:
31) Which of the following is NOT supervised learning?
32) Which of these languages is most popular for Data Science?
33) Deep learning is mainly based on:
34) Which ML model is good for image classification?
35) Which of the following evaluates regression models?
36) Which method handles class imbalance?
37) Which of these is a classification algorithm?
38) Which of the following is NOT a type of bias in data science?
39) Which of the following is NOT supervised learning?
40) Which algorithm is widely used for recommendation systems?
41) Which of these is a data preprocessing step?
42) Which ML algorithm is most interpretable?
43) Which is an ensemble method?
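
For the NLP questions, tokenization and stop-word removal can be sketched with the standard library alone. The stop-word list here is a tiny hand-picked set for illustration, and the regular expression is one simple choice among many; NLP libraries ship far more complete versions of both.

```python
import re

# A deliberately small, illustrative stop-word list (an assumption, not a standard).
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]

sentence = "The goal of Data Science is to turn data into decisions."
tokens = tokenize(sentence)
content_tokens = remove_stop_words(tokens)

print(tokens)
print(content_tokens)
```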


Data Science D6

1) In statistics, p-value is used for:
2) Which Python library is most commonly used for numerical computations?
3) Which of these is a regression evaluation metric?
4) Which process splits data into training and test sets?
5) Overfitting occurs when a model:
6) Which metric evaluates classification models?
7) Which is an unsupervised technique?
8) Which type of learning uses labeled datasets?
9) Which of the following detects overfitting?
10) Which algorithm is suitable for market basket analysis?
11) Data Science is an interdisciplinary field combining?
12) Feature scaling techniques include:
13) Which of these is a cloud platform for Data Science?
14) Logistic regression is mainly used for:
15) Which library is used for deep learning in Python?
16) Which of the following models handles non-linear data well?
17) Cross-validation is used to:
18) Which time-series model is commonly used?
19) In NLP, TF-IDF is used for:
20) Bagging improves model:
21) Which of these is a supervised algorithm?
22) Which regularization technique adds absolute value penalties?
23) Which evaluation metric is best for imbalanced datasets?
24) Which evaluation metric is NOT for regression?
25) K in KNN refers to:
26) In machine learning, classification problems predict:
27) Gradient Descent is an algorithm for:
28) Which algorithm is used in recommendation systems?
29) In NLP, stop words are:
30) In statistics, variance measures:
31) In decision trees, leaf nodes represent:
32) Which of the following is the primary goal of Data Science?
33) Which plot is best for visualizing correlation?
34) Which step comes first in a Data Science project?
35) Which of the following measures similarity between vectors?
36) Which of these is a dimensionality reduction method?
37) Which library is commonly used for machine learning in Python?
38) What is the purpose of a confusion matrix?
39) Which type of bias occurs when training data isn’t representative?
40) Which of these is an ensemble method?
41) The F1-score is the harmonic mean of:
42) Which visualization helps detect skewness?
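
Gradient descent, which one of the questions above asks about, fits in a few lines when the function being minimized is simple. The sketch below minimizes the toy loss (w - 3)^2; the loss, the starting point, the learning rate of 0.1, and the step count are all arbitrary choices made for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

def loss(w):
    # Toy loss with a single minimum at w = 3.
    return (w - 3.0) ** 2

def loss_gradient(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w_min = gradient_descent(loss_gradient, start=0.0)
print(f"w = {w_min:.6f}, loss = {loss(w_min):.3e}")
```

With this learning rate each step shrinks the distance to the minimum by a factor of 0.8, so 100 steps land very close to w = 3. A learning rate that is too large would make the updates overshoot and diverge instead.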


Data Science D5

1) Supervised learning is based on:
2) Which visualization is best for correlation analysis?
3) Which step ensures model generalization?
4) Which method reduces overfitting in decision trees?
5) Which model is best for sequence data like time series?
6) Which metric is used for clustering quality?
7) Gradient Boosting builds models:
8) Ensemble learning means:
9) Which library is mainly used for linear algebra in Python?
10) Random Forest is based on:
11) A feature in Data Science means:
12) Confusion matrix is used in:
13) Principal Component Analysis (PCA) is used for:
14) Which of the following is a deep learning library?
15) Which optimizer is widely used in deep learning?
16) Bag of Words is a technique used in:
17) Which metric is used in regression problems?
18) ReLU is a type of:
19) Which evaluation method is best for time-series forecasting?
20) Which evaluation metric is best for imbalanced data classification?
21) Which dataset is commonly used as a beginner dataset in Data Science?
22) Logistic regression is used for:
23) Which technique converts categorical data into numerical?
24) Which programming languages are most used in Data Science?
25) Hyperparameter tuning can be done using:
26) NLP stands for:
27) L1 and L2 penalties are used in:
28) What is the primary goal of Data Science?
29) Which plot shows model accuracy over training epochs?
30) Which method is used to split data into training and testing sets?
31) In clustering, the “elbow method” is used to:
32) In Python, which library is commonly used for machine learning?
33) Which is an example of unsupervised learning?
34) Which technique is used to handle class imbalance?
35) The ROC curve shows:
36) Which algorithm is best for image recognition tasks?
37) Which measure is best for classification accuracy?
38) Which of the following is a key step in Data Science workflow?
39) Which algorithm is NOT supervised learning?
40) Which distance measure is often used in KNN algorithm?
41) Which type of learning requires reward-based training?
42) In machine learning, “overfitting” means:
43) Which of the following is NOT a machine learning algorithm?
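
The k-nearest-neighbours idea behind the KNN question can be written out in plain Python. This is a minimal sketch assuming Euclidean distance and a simple majority vote; the points, the `low`/`high` labels, and the helper names are invented for the example, and it ignores ties and efficiency.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["low", "low", "low", "high", "high", "high"]

near_low = knn_predict(points, labels, (2.0, 2.0))
near_high = knn_predict(points, labels, (8.0, 9.0))
print(near_low, near_high)
```

Nothing is fitted ahead of time: all of the work happens when a prediction is requested, by scanning the stored training points.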


Data Science D7

1) Which of the following best defines Data Science?
2) Which of the following is a feature selection method?
3) Which model is commonly used for binary classification?
4) Which technique prevents overfitting in decision trees?
5) Which of these is NOT a supervised learning algorithm?
6) Which machine learning task does clustering belong to?
7) What is the main use of cross-validation?
8) Which Python library is used for statistical modeling?
9) Which data visualization is best for correlation matrices?
10) Which is a supervised learning task?
11) Which of the following is an example of regression?
12) Which metric is commonly used for regression tasks?
13) Which of these is an unsupervised learning algorithm?
14) Which distribution is often assumed in statistics?
15) Which technique reduces multicollinearity in regression?
16) Which ML algorithm is known as a lazy learner?
17) Which of the following is NOT a step in the Data Science lifecycle?
18) Which algorithm is best suited for grouping customers by behavior?
19) Which of these is a feature scaling method?
20) Which ML algorithm is inspired by the biological nervous system?
21) Which type of variable has no natural order?
22) Which measure of central tendency is affected most by outliers?
23) What does PCA (Principal Component Analysis) do?
24) Which data type is continuous?
25) Which evaluation metric is used for regression models?
26) Which Python library is most commonly used for data manipulation?
27) Which of the following is an example of classification?
28) Which is a key difference between Data Science and Data Analytics?
29) Which evaluation metric is useful for imbalanced classification?
30) Which dataset is commonly used for image recognition experiments?
31) Which term refers to selecting the best parameters for a model?
32) Which type of learning involves agents interacting with an environment?
33) Which method is used for splitting datasets in machine learning?
34) Which algorithm is suitable for market basket analysis?
35) What does “bias” in a machine learning model indicate?
36) Which library in Python is used for deep learning?
37) Which technique combines bootstrapping with decision trees?
38) Which optimization technique is used in training neural networks?
39) Which of these is a Python library for machine learning?
40) Which step comes after data cleaning in the DS process?
41) Which model combines predictions from multiple models?
42) Which activation function is commonly used in deep learning?
43) What is overfitting in machine learning?
44) Which type of learning uses both labeled and unlabeled data?
45) Which algorithm is commonly used for anomaly detection?
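
The cross-validation and dataset-splitting questions come down to how sample indices are partitioned. The sketch below generates k-fold train/test index lists in plain Python; `k_fold_indices` is a helper written for this example, and it uses contiguous, unshuffled folds to keep the output predictable, whereas real workflows usually shuffle first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop

folds = list(k_fold_indices(n_samples=10, k=3))
for number, (train, test) in enumerate(folds, start=1):
    print(f"fold {number}: train={train} test={test}")
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training in the remaining folds.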
