Data Science

August 6, 2025

A Data Science internship offers hands-on experience in analyzing large datasets, building predictive models, and using data-driven methods to solve business problems. Interns learn to gather, clean, visualize, and model data using modern tools and techniques in machine learning, statistics, and programming.

Objectives

  • Understand the full data lifecycle, from collection to actionable insights.
  • Gain real-world exposure to machine learning, statistical modeling, and data visualization.
  • Apply tools such as Python, SQL, and R, along with libraries such as pandas, NumPy, and scikit-learn.
  • Learn to work with structured and unstructured data across domains.

Key Responsibilities

  • Collect and clean raw datasets from various sources (APIs, databases, CSV files, and so on).
  • Perform Exploratory Data Analysis (EDA) to find patterns and trends.
  • Build and test predictive models (e.g., regression, classification, clustering).
  • Visualize results using tools like Matplotlib, Seaborn, or Power BI/Tableau.
  • Generate reports and dashboards to communicate findings to stakeholders.
  • Work closely with Data Engineers and Analysts to improve data pipelines.
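
The first three responsibilities above (collect and clean, explore, model) can be sketched in a few lines. This is only an illustrative sketch: the CSV contents, the column names `ad_spend` and `sales`, and the helpers `load_and_clean`, `fit_line`, and `predict` are hypothetical and not part of the internship material. It uses only the Python standard library so that it runs anywhere; in practice the same steps would more likely be done with pandas and scikit-learn.

```python
import csv
import io

# Hypothetical raw data (not from the source): a tiny CSV containing
# one duplicate row and one row with a missing value.
RAW_CSV = """ad_spend,sales
10,25
20,45
20,45
30,
40,85
50,105
"""


def load_and_clean(text):
    """Parse CSV text, skip rows with missing fields, and drop exact duplicates."""
    rows, seen = [], set()
    for record in csv.DictReader(io.StringIO(text)):
        values = tuple(record.values())
        if any(v is None or v.strip() == "" for v in values):
            continue  # row has a missing field
        if values in seen:
            continue  # exact duplicate of an earlier row
        seen.add(values)
        rows.append({name: float(v) for name, v in record.items()})
    return rows


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Step 1: collect and clean.
rows = load_and_clean(RAW_CSV)
x = [r["ad_spend"] for r in rows]
y = [r["sales"] for r in rows]

# Step 2: a very small exploratory summary.
summary = {"rows": len(rows), "mean_sales": sum(y) / len(y), "max_sales": max(y)}

# Step 3: fit a simple regression model and use it for prediction.
slope, intercept = fit_line(x, y)


def predict(ad_spend):
    """Predict sales for a given ad spend using the fitted line."""
    return slope * ad_spend + intercept


print(summary)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
print(f"predicted sales at ad_spend=60: {predict(60):.1f}")
```

The toy data is deliberately exactly linear so the fitted line is easy to check by hand; real datasets are noisy and need a held-out test set to evaluate the model.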

Tools & Technologies You’ll Use

  • Languages: Python, R, SQL
  • Libraries: pandas, NumPy, scikit-learn, TensorFlow, Keras, PyTorch
  • Visualization: Matplotlib, Seaborn, Plotly, Tableau, Power BI
  • Big Data: Hadoop, Spark (optional)
  • Databases: MySQL, PostgreSQL, MongoDB
  • Version Control: Git, GitHub

Skills You’ll Gain

  • Data cleaning and preprocessing
  • Statistical analysis and hypothesis testing
  • Predictive modeling and machine learning
  • Business problem-solving with data
  • Data visualization and storytelling
  • Model evaluation and tuning

Important Notice:
Once you start the quiz, you will not be able to pause, exit, or restart it. Please ensure you are ready before beginning.


Data Science L1

1) Which SQL command is used to remove duplicate records?

2) In time series forecasting, which component represents seasonal variation?

3) In which ML model is the sigmoid function used?

4) What is the purpose of the predict() function in ML?

5) What does drop(columns=[]) do in pandas?

6) What does .head() in pandas return?

7) What does CSV stand for?

8) Which chart is most suitable for showing parts of a whole?

9) What type of ML is used when no labelled data is available?

10) Which function is used to combine two DataFrames in pandas?

11) Which algorithm works based on similarity/proximity?

12) Which method is used to normalize data?

13) Which function in NumPy returns evenly spaced values?

14) What is cross-validation used for in ML?

15) Which metric is best for imbalanced classification problems?

16) Which function is used to group data in pandas?

17) Which plot best visualizes the distribution of a single numeric variable?

18) Which method is used to find correlation in pandas?

19) Which of the following is NOT a Python data type?

20) Which library is commonly used for data visualization in Python?

21) Which of the following is NOT a valid data type in R?

22) Which command in SQL is used to remove a table?

23) Which of the following is a type of structured data?

24) What is the default axis for dropna() in pandas?

25) Which statistics type generalizes from a sample to the population?

26) What does NaN stand for in data science?

27) What does the value_counts() function do in pandas?

28) Which of the following is NOT a supervised learning task?

29) Which method is used to fill missing values with the previous one?

30) What function in pandas checks for missing values?

31) Which algorithm works well for linearly separable data?

32) Which of these is NOT a distance metric?

33) What is the primary goal of clustering?

34) What does NLP stand for?

35) Which type of plot is best for bivariate analysis of numerical data?

36) Feedback-based ML is known as:

37) In Python, how do you check the data type of a variable?

38) What is the purpose of one-hot encoding?

39) Which ML task predicts continuous numeric values?

40) Which is an ensemble learning method?

41) Which function in NumPy returns the mean of an array?

42) Which of these evaluation metrics is used for regression?

43) Which data type does not support mathematical operations directly in Python?

44) What is the first step in the CRISP-DM data science process model?

45) Which of the following is a supervised learning algorithm?

46) What does the fit() function do in machine learning models?

47) Which of the following is used for text vectorization?

48) Which ML algorithm is based on probability?

49) Which of the following is a classification metric?

50) Which R function is used to create a histogram?

51) Which of the following is used to handle missing values in a dataset?

52) Which library is commonly used for NLP in Python?

53) Which data analysis aims to recommend actions for desired outcomes?

54) What is a confusion matrix used for?

55) How do you remove duplicate rows in pandas?

56) What does A/B testing primarily evaluate?

57) What is bootstrapping in statistics?

58) What does ROC stand for in classification problems?

59) Which of the following are types of supervised learning?

60) Which of the following is used for feature selection?

61) What is a common use of dimensionality reduction?

62) Which function is used to get summary statistics in pandas?

63) What type of data is most appropriate for a box plot?

64) What is the purpose of sampling data?

65) Feedback-based ML is known as:

66) Which of the following is a dimensionality reduction technique?

67) What does np.array() do in NumPy?

68) Which of the following is a hashing technique used in NLP?

69) Which algorithm is best suited for classification tasks?

70) What does time.time() in Python return?

71) Which of the following is NOT a Python loop structure?

72) What is the main purpose of exploratory data analysis (EDA)?

73) What is overfitting in machine learning?

74) What is the main role of the activation function in neural networks?

75) Which library in Python is used for machine learning?

76) In feature scaling, which technique centers the data around zero?

77) Which is not suitable for importing CSV files in R?

78) Which keyword imports external libraries in Python?

79) Which R function is used to create a histogram?

80) What does the 'k' represent in the K-Means algorithm?

81) What is a real-world application of Data Science?

82) What does ROC stand for in classification problems?

83) What is the full form of KPI in data analytics?

84) Which is not suitable for importing CSV files in R?

85) Which R package includes class()?

86) Which Python function returns a sequence of numbers?

87) Which of these is a continuous probability distribution?

88) What does NaN stand for in data science?

89) What is the output of len("Data Science") in Python?

90) What is the full form of SQL?

91) Which of the following is a hyperparameter in decision trees?

92) Which technique is used to reduce multicollinearity?

93) What does df.shape return in pandas?

94) Which of the following is an unsupervised learning algorithm?

95) What is the full form of RMSE in regression analysis?

96) What is the output of type(5.0) in Python?

97) Which of the following is a continuous variable?

98) Which Python library is primarily used for data manipulation?

99) Which file format is commonly used to store machine learning models?

100) Which of the following is not a type of machine learning?
