Python Interview Questions and Answers

Question 1

What is Pandas?

Accepted Answer

Ans: Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.

Question 2

What is Python pandas used for?

Accepted Answer

Ans: Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. pandas is free software released under the three-clause BSD license.

Question 3

What is a Series in Pandas?

Accepted Answer

Ans: A pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index. A Series is comparable to a single column in an Excel sheet.
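As a minimal sketch of the idea of a labeled one-dimensional array (the fruit names and prices are made up for illustration), a Series can be built from a list and read back either by label or by position:

```python
import pandas as pd

# A Series built from a list, with explicit labels as the index.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# Values can be looked up by label or by position.
by_label = prices["banana"]
by_position = prices.iloc[0]

print(prices.index.tolist())
print(by_label, by_position)
```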

Question 4

Mention the different types of data structures in pandas.

Accepted Answer

Ans: The pandas library provides two main data structures, Series and DataFrame, both built on top of NumPy. A Series is one-dimensional and a DataFrame is two-dimensional. Older versions also had Panel, a three-dimensional data structure with the axes items, major_axis, and minor_axis. Panel was deprecated and has been removed from current pandas releases, where a DataFrame with a MultiIndex is used instead.

Question 5

Explain Reindexing in pandas?

Accepted Answer

Ans: Reindexing conforms a DataFrame to a new index with optional filling logic, placing NA/NaN in locations that had no value in the previous index. It can change both the row labels and the column labels of a DataFrame.
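A small sketch of reindexing, assuming a toy DataFrame with a made-up "score" column: existing labels are reordered, and a label that was not in the old index is filled with NaN or with a chosen fill_value.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "c"])

# Labels missing from the original index ("d") are filled with NaN.
reindexed = df.reindex(["c", "a", "d"])

# fill_value replaces NaN for the missing labels.
filled = df.reindex(["c", "a", "d"], fill_value=0)

print(reindexed)
print(filled)
```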

Question 6

What are the key features of the pandas library?

Accepted Answer

Ans: The pandas library has many features. Some of them are: data alignment, memory efficiency, reshaping, merge and join, and time series support.

Question 7

What is pandas used for?

Accepted Answer

Ans: This library is written for the Python programming language for performing operations like data manipulation, data analysis, etc. The library provides various operations as well as data structures to manipulate time series and numerical tables.

Question 8

How can we create a copy of a Series in pandas?

Accepted Answer

Ans: Use Series.copy(deep=True). With the default deep=True, a new object is created with a copy of the data and the indices. With deep=False, neither the indices nor the data are copied; the new object only references the original's data and index. Note that even when deep=True, actual Python objects stored in the Series are not copied recursively; only the references to those objects are copied.
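The following sketch illustrates only the deep-copy case with a made-up three-element Series. How edits to a shallow copy interact with the original depends on the pandas version and its copy-on-write settings, so that case is deliberately left out here.

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

# deep=True (the default) copies both the data and the index.
deep = original.copy(deep=True)
deep["a"] = 100

# The original is unaffected by edits to the deep copy.
print(original["a"])
print(deep["a"])
```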

Question 9

What is Time Series in pandas?

Accepted Answer

Ans: A time series is an ordered sequence of data that represents how some quantity changes over time. pandas contains extensive capabilities and features for working with time series data for all domains. pandas supports: parsing time series information from various sources and formats; generating sequences of fixed-frequency dates and time spans; manipulating and converting date times with timezone information; resampling or converting a time series to a particular frequency; and performing date and time arithmetic with absolute or relative time increments.
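A brief sketch of three of these capabilities (fixed-frequency date generation, resampling, and date arithmetic), using made-up daily values:

```python
import pandas as pd

# Six daily timestamps starting on 2021-01-01.
days = pd.date_range(start="2021-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=days)

# Resample to 2-day bins and sum the values in each bin.
two_day_totals = ts.resample("2D").sum()

# Date arithmetic with a relative time increment.
next_week = days[0] + pd.Timedelta(days=7)

print(two_day_totals)
print(next_week)
```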

Question 10

Explain Categorical Data in Pandas?

Accepted Answer

Ans: Categoricals are a pandas data type corresponding to categorical variables in statistics. A categorical variable takes on a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation, observation time, or rating via Likert scales. All values of categorical data are either in categories or np.nan. The categorical data type is useful in the following cases: (1) A string variable consisting of only a few different values; converting such a string variable to a categorical variable will save some memory. (2) The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”); by converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order. (3) As a signal to other Python libraries that this column should

Question 11

How will you create a series from dict in Python?

Accepted Answer

Ans: A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). Unlike a Python list, a Series holds its data under a single dtype (mixed values are stored with the object dtype). To create a Series from a dictionary, pass the dictionary to the Series() constructor; without the index parameter, the dictionary keys become the index.
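A minimal sketch of building a Series from a dict, with and without the index parameter (the names and ages are illustrative only):

```python
import pandas as pd

ages = {"tom": 10, "nick": 15, "juli": 14}

# Without the index parameter, the dictionary keys become the index.
from_dict = pd.Series(ages)

# With an index, values are looked up by label; unknown labels give NaN.
selected = pd.Series(ages, index=["juli", "tom", "sam"])

print(from_dict)
print(selected)
```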

Question 12

What are operations on Series in pandas?

Accepted Answer

Ans: A pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index. A Series is comparable to a column in an Excel sheet.

Creating a pandas Series: In the real world, a Series is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be created from a list, a dictionary, a scalar value, etc. Here is one of the ways to create a Series.

Creating a Series from an array: To create a Series from an array, import the numpy module and use the array() function.

    # import pandas as pd
    import pandas as pd
    # import numpy as np
    import numpy as np

    # simple array
    data = np.array(['g', 'e', 'e', 'k', 's'])
    ser = pd.Series(data)
    print(ser)

Question 13

What is a DataFrame in pandas?

Accepted Answer

Ans: A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). In other words, the data is aligned in a tabular fashion in rows and columns. A DataFrame consists of three principal components: the data, the rows, and the columns.

Creating a pandas DataFrame: In the real world, a DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from a list, a dictionary, a list of dictionaries, etc. Here is one of the ways to create a DataFrame.

Creating a DataFrame using a list: A DataFrame can be created from a single list or a list of lists.

    # import pandas as pd
    import pandas as pd

    # list of strings
    lst = ['Geeks', 'For', 'Gee

Question 14

What are the different ways in which a DataFrame can be created in Pandas?

Accepted Answer

Ans: A pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is generally the most commonly used pandas object. A DataFrame can be created in multiple ways; one of them is shown here.

Creating a pandas DataFrame from a list of lists:

    # Import pandas library
    import pandas as pd

    # initialize list of lists
    data = [['tom', 10], ['nick', 15], ['juli', 14]]

    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Age'])

    # print dataframe
    df

Question 15

How will you create an empty DataFrame in pandas?

Accepted Answer

Ans: To create a completely empty pandas DataFrame, do the following:

    import pandas as pd
    MyEmptydf = pd.DataFrame()

This creates an empty DataFrame with no columns or rows. To create an empty DataFrame with three empty columns (X, Y, and Z), do:

    df = pd.DataFrame(columns=['X', 'Y', 'Z'])

Question 16

How will you add a column to a pandas DataFrame?

Accepted Answer

Ans: Adding a new column to an existing DataFrame in pandas:

    # Import pandas package
    import pandas as pd

    # Define a dictionary containing students data
    data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
            'Height': [5.1, 6.2, 5.1, 5.2],
            'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

    # Convert the dictionary into a DataFrame
    df = pd.DataFrame(data)

    # Declare a list that is to be converted into a column
    address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

    # Use 'Address' as the column name and assign the list to it
    df['Address'] = address

    # Observe the result
    df

Question 17

How will you retrieve a single column from pandas DataFrame?

Accepted Answer

Ans: Select the column by its label with the indexing operator, for example df['Name']. This returns the column as a Series. Passing a list with a single label, df[['Name']], returns a one-column DataFrame instead.
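For the question above, a minimal sketch of retrieving a single column from a DataFrame; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

# A single label returns the column as a Series.
names = df["Name"]

# A one-element list of labels returns a one-column DataFrame.
names_frame = df[["Name"]]

print(type(names).__name__)
print(type(names_frame).__name__)
```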

Question 18

range() vs xrange() functions in Python?

Accepted Answer

Ans: Python 2 has two functions that produce a sequence of numbers within a given range: range() and xrange(). In Python 3, xrange() has been removed, so there is only one function for this purpose, range(). The range() function of Python 3 works like the xrange() of Python 2. The difference between range() and xrange() is therefore relevant only when you are using Python 2.

a) In Python 2, range() returns a Python list object. For example, range(1, 500, 1) creates a list of 499 integers in memory; range() generates all the numbers at once.

b) xrange() returns an xrange object that evaluates lazily. That means it produces the numbers on demand instead of building the whole list in memory.
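The lazy behaviour of Python 3's range() can be observed with a short sketch that uses only the standard library. The size comparison assumes CPython, where a range object stores only its start, stop, and step.

```python
import sys

# In Python 3, range() is lazy like Python 2's xrange():
# it stores only start, stop, and step, not every number.
small = range(1, 500, 1)
large = range(1, 5_000_000, 1)

print(len(small))
print(sys.getsizeof(small) == sys.getsizeof(large))

# Converting to a list materializes all values at once,
# similar to what range() returned in Python 2.
as_list = list(small)
print(as_list[0], as_list[-1])
```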

Question 19

What is the name of the pandas library tool used to create a scatter plot matrix?

Accepted Answer

Ans: scatter_matrix (available as pandas.plotting.scatter_matrix).

Question 20

What is pylab?

Accepted Answer

Ans: PyLab is a convenience module, installed with Matplotlib, that bulk-imports matplotlib.pyplot and NumPy into a single namespace.

Question 22

Define the different ways a DataFrame can be created in pandas?

Accepted Answer

Ans: We can create a DataFrame in the following ways, among others: from lists, and from a dict of ndarrays.

Example 1: Create a DataFrame using a list:

    import pandas as pd

    # a list of strings
    a = ['Python', 'Pandas']

    # Calling DataFrame constructor on list
    info = pd.DataFrame(a)
    print(info)

Output:

            0
    0  Python
    1  Pandas

Example 2: Create a DataFrame from a dict of ndarrays:

    import pandas as pd

    info = {'ID': [101, 102, 103], 'Department': ['B.Sc', 'B.Tech', 'M.Tech']}
    info = pd.DataFrame(info)
    print(info)

Output:

        ID Department
    0  101       B.Sc
    1  102     B.Tech
    2  103     M.Tech

Question 23

Explain Categorical data in Pandas?

Accepted Answer

Ans: Categorical data is a pandas data type that corresponds to a categorical variable in statistics. A categorical variable takes a limited, and usually fixed, number of possible values. Examples: gender, country affiliation, blood type, social class, observation time, or rating via Likert scales. All values of categorical data are either in categories or np.nan. This data type is useful in the following cases: (1) For a string variable that consists of only a few different values; converting it to a categorical variable saves memory. (2) When the lexical order of a variable is not the same as its logical order (“one”, “two”, “three”); by converting to a categorical and specifying an order on the categories, sorting and min/max use the logical order instead of the lexical order.
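The difference between lexical and logical order can be sketched as follows, using the “one”, “two”, “three” values mentioned above. The variable names are illustrative only.

```python
import pandas as pd

sizes = pd.Series(["two", "one", "three", "one"])

# Plain strings sort lexically: one, one, three, two.
lexical = sizes.sort_values().tolist()

# An ordered categorical sorts by the declared category order.
ordered = sizes.astype(
    pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
)
logical = ordered.sort_values().tolist()

print(lexical)
print(logical)
print(ordered.min(), ordered.max())
```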

Question 24

How will you create a series from dict in Pandas?

Accepted Answer

Ans: A Series is a one-dimensional array that is capable of storing various data types. We can create a pandas Series from a dictionary. If a dictionary is passed as input and the index is not specified, the dictionary keys are used to construct the index (in insertion order in current pandas versions; older versions sorted the keys). If an index is passed, the values corresponding to each label in the index are extracted from the dictionary.

    import pandas as pd

    info = {'x': 0., 'y': 1., 'z': 2.}
    a = pd.Series(info)
    print(a)

Output:

    x    0.0
    y    1.0
    z    2.0
    dtype: float64

Question 25

How can we create a copy of the series in Pandas?

Accepted Answer

Ans: We can create a copy of a Series with the following syntax:

    Series.copy(deep=True)

This makes a deep copy that includes a copy of the data and the indices. If deep is set to False, neither the indices nor the data are copied.

How will you create an empty DataFrame in pandas?

Ans: A DataFrame is a widely used pandas data structure that works as a two-dimensional array with labeled axes (rows and columns). It is a standard way to store data and has two different indexes, the row index and the column index. The code below shows how to create an empty DataFrame in pandas:

    # importing the pandas library
    import pandas as pd

    info = pd.DataFrame()
    print(info)

Output:

    Empty DataFrame
    Columns: []
    Index: []

Question 26

How will you add a column to a pandas DataFrame?

Accepted Answer

Ans: We can add any new column to an existing DataFrame. The code below demonstrates how:

    # importing the pandas library
    import pandas as pd

    info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
            'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
    info = pd.DataFrame(info)

    # Add a new column to an existing DataFrame object
    print("Add new column by passing series")
    info['three'] = pd.Series([20, 40, 60], index=['a', 'b', 'c'])
    print(info)

    print("Add new column using existing DataFrame columns")
    info['four'] = info['one'] + info['three']
    print(info)

Output:

    Add new column by passing series
       one  two  three
    a  1.0    1   20.0
    b  2.0    2   40.0
    c  3.0    3   60.0
    d  4.0    4    NaN
    e  5.0    5    NaN
    f  NaN    6    NaN
    Add new column using existing DataFrame columns
       one  two  three  four
    a  1.0    1   20.0  21.0
    b  2.0    2   40.0  42.0
    c  3.0    3   60.0  63.0
    d  4.0    4    N

Question 27

How to add an Index, row, or column to a Pandas DataFrame?

Accepted Answer

Ans: Adding an index to a DataFrame: pandas lets you pass values to the index argument when you create a DataFrame, which ensures that you get the desired index. If you don't specify one, the DataFrame gets a default integer index that starts at 0 and ends at the last row.

Adding rows to a DataFrame: .loc, .iloc, and .ix have been used to work with rows in the DataFrame. Note that .ix is deprecated and was removed in pandas 1.0, so current code should use .loc or .iloc. .loc works on the labels of the index: loc[4] refers to the values of the DataFrame whose index is labeled 4. .iloc works on the positions in the index: iloc[4] refers to the values of the DataFrame at position 4. .ix is a complex case because if the index is integer-based, we pass a label to ix. The ix[4] means tha

Question 28

Add row with specific index name:

Accepted Answer

    import pandas as pd

    employees = pd.DataFrame(
        data={'Name': ['John Doe', 'William Spark'],
              'Occupation': ['Chemist', 'Statistician'],
              'Date Of Join': ['2018-01-25', '2018-01-26'],
              'Age': [23, 24]},
        index=['Emp001', 'Emp002'],
        columns=['Name', 'Occupation', 'Date Of Join', 'Age'])

    print("\n------------ BEFORE ----------------\n")
    print(employees)

    # Assigning to a new label with .loc appends a row with that index name
    employees.loc['Emp003'] = ['Sunny', 'Programmer', '2018-01-25', 45]

    print("\n------------ AFTER ----------------\n")
    print(employees)

OUTPUT:

    C:\pandas>python example22.py

    ------------ BEFORE ----------------

                     Name    Occupation Date Of Join  Age
    Emp001       John Doe       Chemist   2018-01-25   23
    Emp002  William Spark  Statistician   2018-01-26   24

    ------------ AFTER ----------------

                     Name    Occupation Date Of Join  Age
    Emp001       John Doe       Chemist   2018-01-25   23
    Emp002  William Spark  Statistician   2018-01-26   24
    Emp003          Sunny    Programmer   201

Question 29

How to Delete Indices, Rows or Columns From a Pandas Data Frame?

Accepted Answer

Ans: Deleting an index from your DataFrame: If you want to remove the index from the DataFrame, you can do the following: reset the index of the DataFrame with reset_index(); set df.index.name = None to remove the index name; remove duplicate index values by resetting the index and dropping the duplicate values from the index column; or remove an index together with its row.

Deleting a column from your DataFrame: You can use the drop() method to delete a column from the DataFrame. The axis argument passed to drop() is 0 to drop rows and 1 to drop columns. You can pass the argument inplace=True to delete the column without reassigning the DataFrame. You can also delete duplicate values from a column by using the drop_duplicates() method.

Removing a row from your DataFrame: By using df.drop_duplicates(), we can remove duplicate rows from the DataFrame.
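A minimal sketch of these operations on a made-up three-row DataFrame (the column names and row labels are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Bob"], "age": [30, 25, 25], "city": ["X", "Y", "Y"]},
    index=["r1", "r2", "r3"],
)

# Replace the current index with the default integer index.
no_index = df.reset_index(drop=True)

# axis=1 drops columns; axis=0 (the default) drops rows.
no_city = df.drop("city", axis=1)
no_first_row = df.drop("r1", axis=0)

# Remove duplicate rows (r3 repeats r2).
unique_rows = df.drop_duplicates()

print(no_index.index.tolist())
print(no_city.columns.tolist())
print(no_first_row.index.tolist())
print(unique_rows.index.tolist())
```

Each call returns a new DataFrame, so df itself is unchanged unless inplace=True is passed.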

Question 30

How to Rename the Index or Columns of a Pandas DataFrame?

Accepted Answer

Ans: You can use the .rename method to give different values to the columns or the index values of a DataFrame. There are the following ways to change the index / column names (labels) of a pandas.DataFrame:

Use pandas.DataFrame.rename(): change any index / column names individually with a dict, or change all index / column names with a function.

Use pandas.DataFrame.add_prefix() and pandas.DataFrame.add_suffix(): add a prefix or suffix to the column names.

Update the index / columns attributes of the pandas.DataFrame: replace all index / column names.

The set_index() method, which sets an existing column as the index, is also provided.

For rename(), specify the original name and the new name in a dict like {original name: new name} for the index and / or columns arguments. index is for the index names and columns is for the column names. If you want to change only one of them, you need to specify only one of index or columns.
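The renaming options listed above can be sketched on a small made-up DataFrame; all labels here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r1", "r2"])

# Rename individual labels with dicts of {original name: new name}.
renamed = df.rename(index={"r1": "row1"}, columns={"A": "alpha"})

# Rename all column labels with a function.
lowered = df.rename(columns=str.lower)

# Add a prefix or suffix to every column name.
prefixed = df.add_prefix("col_")
suffixed = df.add_suffix("_v1")

# Replace all labels by assigning to the columns attribute.
replaced = df.copy()
replaced.columns = ["x", "y"]

# Use an existing column as the index.
indexed = df.set_index("A")

print(renamed)
print(indexed)
```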

Question 31

How to iterate over a Pandas DataFrame?

Accepted Answer

Ans: You can iterate over the rows of a DataFrame by using a for loop in combination with an iterrows() call on the DataFrame.

    import pandas as pd

    df = pd.DataFrame([{'c1': 10, 'c2': 100},
                       {'c1': 11, 'c2': 110},
                       {'c1': 12, 'c2': 120}])

    # iterrows() yields (index, row) pairs, where each row is a Series
    for index, row in df.iterrows():
        print(row['c1'], row['c2'])

Output:

    10 100
    11 110
    12 120

Question 32

How to get the items of series A not present in series B?

Accepted Answer

Ans: We can remove the items present in p2 from p1 using the isin() method.

    import pandas as pd

    p1 = pd.Series([2, 4, 6, 8, 10])
    p2 = pd.Series([8, 10, 12, 14, 16])

    # Keep the items of p1 that are not in p2
    p1[~p1.isin(p2)]

Output:

    0    2
    1    4
    2    6
    dtype: int64

Question 33

How to get the items not common to both series A and series B?

Accepted Answer

Ans: We can get all the items of p1 and p2 that are not common to both as in the example below:

    import pandas as pd
    import numpy as np

    p1 = pd.Series([2, 4, 6, 8, 10])
    p2 = pd.Series([8, 10, 12, 14, 16])

    p_u = pd.Series(np.union1d(p1, p2))      # union
    p_i = pd.Series(np.intersect1d(p1, p2))  # intersect

    # Items of the union that are not in the intersection
    p_u[~p_u.isin(p_i)]

Output:

    0     2
    1     4
    2     6
    5    12
    6    14
    7    16
    dtype: int64

Question 34

How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?

Accepted Answer

Ans: We can compute the minimum, 25th percentile, median, 75th percentile, and maximum of p as in the example below:

    import pandas as pd
    import numpy as np

    # Seeded random state, so the values are reproducible
    state = np.random.RandomState(120)
    p = pd.Series(state.normal(14, 6, 22))

    np.percentile(p, q=[0, 25, 50, 75, 100])

Output:

    array([ 4.61498692, 12.15572753, 14.67780756, 17.58054104, 33.24975515])

Question 35

How to get frequency counts of unique items of a series?

Accepted Answer

Ans: We can calculate the frequency counts of each unique value of p as in the example below:

    import pandas as pd
    import numpy as np

    p = pd.Series(np.take(list('pqrstu'), np.random.randint(6, size=17)))
    p.value_counts()

Output (the counts vary between runs because the data is random):

    s    4
    r    4
    q    3
    p    3
    u    3

Question 36

How to convert a numpy array to a dataframe of given shape?

Accepted Answer

Ans: We can reshape the Series p into a DataFrame with 7 rows and 5 columns as in the example below:

    import pandas as pd
    import numpy as np

    # Input: 35 random integers between 1 and 6
    p = pd.Series(np.random.randint(1, 7, 35))

    info = pd.DataFrame(p.values.reshape(7, 5))
    print(info)

Output (the values vary between runs because the data is random):

       0  1  2  3  4
    0  3  2  5  5  1
    1  3  2  5  5  5
    2  1  3  1  2  6
    3  1  1  1  2  2
    4  3  5  3  3  3
    5  2  5  3  6  4
    6  3  6  6  6  5

Question 37

How can we convert a Series to DataFrame?

Accepted Answer

Ans: The pandas Series.to_frame() function is used to convert a Series object to a DataFrame.

    Series.to_frame(name=None)

name: If given, the passed name is substituted for the Series name as the column label. Its default value is None.

    import pandas as pd

    s = pd.Series(["a", "b", "c"], name="vals")
    s.to_frame()

Output:

      vals
    0    a
    1    b
    2    c

Question 38

How can we sort the DataFrame?

Accepted Answer

Ans: A DataFrame can be sorted in two ways: by label and by actual value.

1) By label: The DataFrame can be sorted with the sort_index() method by passing the axis argument and the order of sorting. By default, sorting is done on row labels in ascending order.

    import pandas as pd
    import numpy as np

    unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                               index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                               columns=['col2', 'col1'])

    sorted_df = unsorted_df.sort_index()
    print(sorted_df)

Its output is as follows:

           col2      col1
    0  0.208464  0.627037
    1  0.641004  0.331352
    2 -0.038067 -0.464730
    3 -0.638456 -0.021466
    4  0.014646 -0.737438
    5 -0.290761 -1.669827
    6 -0.797303 -0.018737
    7  0.525753  1.628921
    8 -0.56

Question 39

How to convert String to date?

Accepted Answer

Ans: The code below demonstrates how to convert a string to a date:

    from datetime import datetime

    # Define dates as strings
    dmy_str1 = 'Saturday, July 14, 2018'
    dmy_str2 = '14/7/17'
    dmy_str3 = '14-07-2017'

    # Convert the strings to datetime objects
    dmy_dt1 = datetime.strptime(dmy_str1, '%A, %B %d, %Y')
    dmy_dt2 = datetime.strptime(dmy_str2, '%d/%m/%y')
    dmy_dt3 = datetime.strptime(dmy_str3, '%d-%m-%Y')

    # Print the converted dates
    print(dmy_dt1)
    print(dmy_dt2)
    print(dmy_dt3)

Output:

    2018-07-14 00:00:00
    2017-07-14 00:00:00
    2017-07-14 00:00:00

Question 40

What is Data Aggregation?

Accepted Answer

Ans: The main task of data aggregation is to apply some aggregation to one or more columns. It uses the following:

sum: returns the sum of the values for the requested axis.
min: returns the minimum of the values for the requested axis.
max: returns the maximum of the values for the requested axis.

Examples:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9],
                       [np.nan, np.nan, np.nan]],
                      columns=['A', 'B', 'C'])
    print(df)

    # Aggregate these functions over the rows.
    print(df.agg(['sum', 'min']))

    # Different aggregations per column.
    print(df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']}))

    # Aggregate over the columns.
    print(df.agg("mean", axis="columns"))

    # Aggregate over the rows.
    print(df.agg("mean", axis="rows"))

OUTPUT:

         A    B    C
    0  1.0  2.0  3.0
    1  4.0  5.0  6.0
    2  7.0  8.0  9.0
    3  NaN  NaN  NaN
            A     B     C
    sum  12.0  15.0  18.0
    min   1.0   2.0   3.0
    A B

Question 41

What is Pandas Index?

Accepted Answer

Ans: Indexing in pandas means selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing is also known as subset selection.

Pandas indexing using [ ], .loc[ ], .iloc[ ], and .ix[ ]: There are many ways to pull elements, rows, and columns from a DataFrame. These indexing methods appear very similar but behave very differently. Pandas has supported four types of multi-axes indexing (note that .ix is deprecated and was removed in pandas 1.0):

DataFrame[ ]: also known as the indexing operator.
DataFrame.loc[ ]: used for labels.
DataFrame.iloc[ ]: used for positions, i.e. integer-based.
DataFrame.ix[ ]: This function i

Question 42

Selecting a single row using .ix[] as .loc[]

Accepted Answer

In order to select a single row, we put a single row label in .ix[ ]. It acts like .loc[ ] if we pass a row label as the argument. Note that .ix is deprecated and was removed in pandas 1.0, so the code below only runs on older versions; on current versions, use .loc[ ] instead.

    # importing pandas package
    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving row by ix method (pandas < 1.0 only);
    # the equivalent on current versions is data.loc["Avery Bradley"]
    first = data.ix["Avery Bradley"]

    print(first)
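Since .ix is not available in current pandas releases, the same selection can be sketched with .loc and .iloc. The small DataFrame below is a hypothetical stand-in for the nba.csv file, with made-up column values.

```python
import pandas as pd

# Hypothetical stand-in for the nba.csv data used above.
data = pd.DataFrame(
    {"Team": ["Boston Celtics", "Boston Celtics"], "Number": [0, 28]},
    index=pd.Index(["Avery Bradley", "R.J. Hunter"], name="Name"),
)

# Label-based selection: what .ix did when given a row label.
by_label = data.loc["Avery Bradley"]

# Position-based selection: what .ix did when given an integer
# position on a non-integer index.
by_position = data.iloc[0]

print(by_label.equals(by_position))
```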

Question 43

Define ReIndexing?

Accepted Answer

Ans: Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis. Multiple operations can be accomplished through reindexing, such as: reordering the existing data to match a new set of labels, and inserting missing value (NA) markers in label locations where no data for the label existed.

Example:

    import pandas as pd
    import numpy as np

    N = 20
    df = pd.DataFrame({
        'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
        'x': np.linspace(0, stop=N - 1, num=N),
        'y': np.random.rand(N),
        'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
        'D': np.random.normal(100, 10, size=(N)).tolist()
    })

    # reindex the DataFrame; column 'B' does not exist, so it is filled with NaN
    df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
    print(df_reindexed)

Its output is as follows (the values in column C vary between runs because the data is random):

               A     C   B
    0 2016-01-01   Low NaN
    2 2016-01-03  High NaN
    5 2016-01-06   Low NaN

Question 44

Describe Data Operations in Pandas?

Accepted Answer

Ans: In pandas, there are different useful data operations for a DataFrame, which are as follows:

Row and column selection: We can select any row or column of the DataFrame by passing the name of the row or column. When you select one from the DataFrame, it becomes one-dimensional and is considered a Series.

Filter data: We can filter the data by providing boolean expressions to the DataFrame.

Null values: A null value occurs when no data is provided for an item. Columns may contain no values, which are usually represented as NaN.

Define GroupBy in pandas?

Ans: The pandas DataFrame.groupby() function is used to split the data into groups based on some criteria.
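The operations described above (selection, filtering, null detection, and grouping) can be sketched on a small made-up DataFrame; the "team" and "points" columns are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["A", "A", "B", "B"],
        "points": [10, 20, 5, None],
    }
)

# Column selection returns a one-dimensional Series.
points = df["points"]

# Filter rows with a boolean expression.
high = df[df["points"] > 8]

# Detect null values, which pandas represents as NaN.
missing = df["points"].isna().sum()

# Split the rows into groups by team and sum each group.
totals = df.groupby("team")["points"].sum()

print(high)
print(missing)
print(totals)
```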

Question 45

How will you delete rows from a pandas DataFrame?

Accepted Answer

Ans: Import modules:

    import pandas as pd

Create a DataFrame:

    data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'year': [2012, 2012, 2013, 2014, 2014],
            'reports': [4, 24, 31, 2, 3]}
    df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
    df

                 name  year  reports
    Cochice     Jason  2012        4
    Pima        Molly  2012       24
    Santa Cruz   Tina  2013       31
    Maricopa     Jake  2014        2
    Yuma          Amy  2014        3

Delete rows by passing their index labels to drop():

    df.drop(['Cochice', 'Pima'])

Output:

                name  year  reports
    Santa Cruz  Tina  2013       31
    Maricopa    Jake  2014        2
    Yuma         Amy  2014        3

Question 46

What is Pandas ml?

Accepted Answer

Ans: pandas_ml is a package which integrates pandas, scikit-learn, and xgboost into one package for easy handling of data and creation of machine learning models.

Installation:

    $ pip install pandas_ml

Example:

    >>> import pandas_ml as pdml
    >>> import sklearn.datasets as datasets

    # create ModelFrame instance from sklearn.datasets
    >>> df = pdml.ModelFrame(datasets.load_digits())
    >>> type(df)

    # binarize data (features), not touching target
    >>> df.data = df.data.preprocessing.binarize()
    >>> df.head()
       .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
    0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
    1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
    2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
    3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
    4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
    [5 rows x 65 columns]

    # split to training and test data
    >>> train_df, test_df = df.model_selection.train_test_split(

Question 47

What is Pandas Charm?

Accepted Answer

Ans: pandas-charm is a small Python package for getting character matrices (alignments) into and out of pandas. Use this library to make pandas interoperable with BioPython and DendroPy. It converts between the following objects: BioPython MultipleSeqAlignment and pandas DataFrame; DendroPy CharacterMatrix and pandas DataFrame; “sequence dictionary” and pandas DataFrame. The code has been tested with Python 2.7, 3.5 and 3.6.

Question 48

Installation :

Accepted Answer

    $ pip install pandas-charm

You may consider installing pandas-charm and its required Python packages within a virtual environment in order to avoid cluttering your system’s Python path. See, for example, the environment management system conda or the package virtualenv.

Question 49

Running the tests

Accepted Answer

Testing is carried out with pytest:

    $ pytest -v test_pandascharm.py

Test coverage can be calculated with Coverage.py using the following commands:

    $ coverage run -m pytest
    $ coverage report -m pandascharm.py

The code follows the style conventions in PEP8, which can be checked with pycodestyle:

    $ pycodestyle pandascharm.py test_pandascharm.py setup.py

Question 50

DendroPy CharacterMatrix to pandas DataFrame

Accepted Answer

    >>> import pandas as pd
    >>> import pandascharm as pc
    >>> import dendropy
    >>> dna_string = '3 5\nt1 TCCAA\nt2 TGCAA\nt3 TG-AA\n'
    >>> print(dna_string)
    3 5
    t1 TCCAA
    t2 TGCAA
    t3 TG-AA

    >>> matrix = dendropy.DnaCharacterMatrix.get(
    ...     data=dna_string, schema='phylip')
    >>> df = pc.from_charmatrix(matrix)
    >>> df
      t1 t2 t3
    0  T  T  T
    1  C  G  G
    2  C  C  -
    3  A  A  A
    4  A  A  A

By default, characters are stored as rows and sequences as columns in the DataFrame. If you want rows to hold sequences, just transpose the matrix in pandas:

    >>> df.transpose()
        0  1  2  3  4
    t1  T  C  C  A  A
    t2  T  G  C  A  A
    t3  T  G  -  A  A

Question 51

How will you add a scalar column with same value for all rows to a pandas DataFrame?

Accepted Answer

Ans: Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier. The DataFrame.add() method is used for element-wise addition of a DataFrame and another object (binary operator add). It is equivalent to dataframe + other, but with support for substituting a fill_value for missing data in one of the inputs.

Syntax:

    DataFrame.add(other, axis='columns', level=None, fill_value=None)

Parameters:

other: Series, DataFrame, or constant.
axis: {0, 1, 'index', 'columns'}. For Series input, the axis to match the Series index on.
fill_value: [None or float value, default None] Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing.
level: [int or name] Broadcast across a level, matching Index values on the passed MultiIndex level.
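For the question itself, a minimal sketch of adding a column that holds the same scalar for every row, alongside DataFrame.add() with a scalar. The column names ("Country", "Active") and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"], "Age": [27, 24, 22]})

# Assigning a scalar broadcasts the same value to every row.
df["Country"] = "India"

# assign() does the same but returns a new DataFrame.
with_flag = df.assign(Active=True)

# DataFrame.add() adds a scalar to existing values element-wise.
older = df[["Age"]].add(1)

print(df)
print(with_flag)
print(older)
```

Plain assignment modifies df in place, while assign() leaves df untouched, which can be convenient in method chains.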

Question 52

How can we select a column in pandas DataFrame?

Accepted Answer

Ans: Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier. Let’s discuss the different ways of selecting multiple columns in a pandas DataFrame.

Method #1: Basic method. Given a dictionary which contains employee attributes as keys and lists of their values as values:

    # Import pandas package
    import pandas as pd

    # Define a dictionary containing employee data
    data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
            'Age': [27, 24, 22, 32],
            'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
            'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

    # Convert the dictionary into DataFrame
    df = pd.DataFrame(data)

    # select two columns
    df[['Name', 'Qualification']]

Select second to fourth column.

    # Import pandas package
    imp

Question 53

How can we retrieve a row in pandas DataFrame ?

Accepted Answer

Ans: pandas provides the label-based DataFrame.loc[] indexer for retrieving rows from a DataFrame. Given index labels, it returns a row (as a Series) or a DataFrame, provided the labels exist in the index of the calling DataFrame.

Syntax: pandas.DataFrame.loc[ ]

Parameters: Index label: string or list of strings holding the index labels of rows

Return type: DataFrame or Series, depending on the parameters

Example #1: Extracting a single row. In this example, the Name column is made the index column, and then two single rows are extracted one by one as Series using the index labels of the rows.

    # importing pandas package
    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving row by loc method
    first = data.loc["Avery Bradley"]
    second = data.loc["R.J. Hunter"]

    print(first, "\n\n\n", second)

Output: two Series are returned.
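Since nba.csv is not included here, the following sketch builds a small stand-in DataFrame. The Team and Number values and the third player row are placeholders invented for illustration, not real data. It shows how a single label yields a Series while a list of labels yields a DataFrame:

```python
import pandas as pd

# Stand-in for the CSV file; all values are illustrative placeholders.
data = pd.DataFrame(
    {
        "Name": ["Avery Bradley", "R.J. Hunter", "Player Three"],
        "Team": ["Team A", "Team A", "Team B"],
        "Number": [0, 28, 99],
    }
).set_index("Name")

# A single label returns that row as a Series.
first = data.loc["Avery Bradley"]

# A list of labels returns a DataFrame containing those rows.
both = data.loc[["Avery Bradley", "R.J. Hunter"]]

# iloc retrieves rows by integer position instead of by label.
by_position = data.iloc[0]

print(first)
print(both)
```

Requesting a label that is not in the index with loc raises a KeyError, so it can help to check `label in data.index` first.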

Question 54

How will you convert a DataFrame to an array in pandas?

Accepted Answer

Ans: To perform some high-level mathematical functions, we can convert a pandas DataFrame to a NumPy array with the DataFrame.to_numpy() function. It is called on the DataFrame and returns a numpy.ndarray.

Syntax: DataFrame.to_numpy(dtype=None, copy=False)

Parameters:
dtype: An optional parameter giving the dtype to pass to numpy.asarray().
copy: A boolean, default False. It controls whether to ensure that the returned value is not a view on another array.

Returns: a numpy.ndarray.

Example 1:

    import pandas as pd

    pd.DataFrame({"P": [2, 3], "Q": [4, 5]}).to_numpy()

    info = pd.DataFrame({"P": [2, 3], "Q": [4.0, 5.8]})
    info.to_numpy()

    info['R'] = pd.date_range('2000', periods=2)
    info.to_numpy()

Output (of the last call):

    array([[2, 4.0, Timestamp('2000-01-01 00:00:00')],
           [3, 5.8, Timestamp('2000-01-02 00:00:00')]], dtype=object)
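A small sketch of how the resulting dtype depends on the columns, and of what copy=True is for. The variable names are chosen here for illustration:

```python
import numpy as np
import pandas as pd

ints = pd.DataFrame({"P": [2, 3], "Q": [4, 5]})
mixed = pd.DataFrame({"P": [2, 3], "Q": [4.0, 5.8]})

# All-integer columns give an integer array.
arr_int = ints.to_numpy()

# Mixing int and float columns upcasts the whole array to float.
arr_float = mixed.to_numpy()

# copy=True ensures the array does not share memory with the DataFrame,
# so writing to it leaves the DataFrame untouched.
arr_copy = ints.to_numpy(copy=True)
arr_copy[0, 0] = 100

print(arr_int.dtype, arr_float.dtype)
print(ints)
```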

Question 55

How can you check if a DataFrame is empty in pandas?

Accepted Answer

Ans: A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary data structure of pandas.

The DataFrame.empty attribute checks whether the DataFrame is empty. It returns True if the DataFrame is empty and False otherwise.

Syntax: DataFrame.empty

Parameter: None

Returns: bool

Example #1: Use the DataFrame.empty attribute to check whether the given DataFrame is empty.

    # importing pandas as pd
    import pandas as pd

    # Creating the DataFrame
    df = pd.DataFrame({'Weight': [45, 88, 56, 15, 71],
                       'Name': ['Sam', 'Andrea', 'Alex', 'Robin', 'Kia'],
                       'Age': [14, 25, 55, 8, 21]})

    # Create the index
    index_ = ['Row_1', 'Row_2', 'Row_3', 'Row_4', 'Row_5
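A short, self-contained sketch of DataFrame.empty on a few cases. The filter threshold and the all-NaN frame are made up for illustration. Note that a DataFrame containing only NaN values is not considered empty, because it still has rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Weight": [45, 88], "Name": ["Sam", "Andrea"]})

# A filter that matches nothing leaves the columns but no rows.
no_rows = df[df["Weight"] > 100]

# One row holding only NaN still counts as data.
all_nan = pd.DataFrame({"A": [np.nan]})

print(df.empty)              # False
print(no_rows.empty)         # True
print(all_nan.empty)         # False
print(pd.DataFrame().empty)  # True
```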

Question 56

How will you get the average of values of a column in pandas DataFrame?

Accepted Answer

Please refer to training materials for the detailed answer.
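As a rough sketch, assuming a small employee table like the one used for column selection, the average of a single column can be computed with Series.mean():

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jai", "Princi", "Gaurav", "Anuj"], "Age": [27, 24, 22, 32]}
)

# Mean of a single column; NaN values are skipped by default.
avg_age = df["Age"].mean()

# Mean of every numeric column at once, ignoring non-numeric ones.
numeric_means = df.mean(numeric_only=True)

print(avg_age)  # 26.25
print(numeric_means)
```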

Question 57

How will you apply a function to every data element in a DataFrame?

Accepted Answer

Ans: The apply() function with axis=1 applies a function to every row of a DataFrame. To apply a function to every individual element instead, use DataFrame.applymap(), which was renamed DataFrame.map() in pandas 2.1. The example below shows the row-wise case.

Example:

    # Import pandas package
    import pandas as pd

    # Function to add
    def add(a, b, c):
        return a + b + c

    def main():
        # create a dictionary with three fields each
        data = {'A': [1, 2, 3],
                'B': [4, 5, 6],
                'C': [7, 8, 9]}

        # Convert the dictionary into DataFrame
        df = pd.DataFrame(data)
        print("Original DataFrame:\n", df)

        df['add'] = df.apply(lambda row: add(row['A'], row['B'], row['C']), axis=1)

        print('\nAfter Applying Function: ')
        # printing the new dataframe
        print(df)

    if __name__ == '__main__':
        main()

Output:

60. How will you get the top 2 rows from a DataFrame in pandas?

    # Select the first 2 rows of the Dataframe
    dfObj1 = empDfObj.head(2)
    print("First 2 rows of the Dataframe : ")
    print(dfObj1)

Output: First 2 rows of the
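The sketch below contrasts element-wise and row-wise application and also shows head(2). The square function is a made-up example. To stay independent of the pandas version, the element-wise step uses Series.map on each column rather than DataFrame.applymap() or DataFrame.map():

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

def square(x):
    return x * x

# Element-wise: apply() hands over one column at a time, and Series.map
# calls square on every value in that column.
squared = df.apply(lambda column: column.map(square))

# Row-wise: axis=1 passes each row to the function as a Series.
row_sums = df.apply(lambda row: row["A"] + row["B"] + row["C"], axis=1)

# head(2) returns the first two rows.
top_two = df.head(2)

print(squared)
print(row_sums)
print(top_two)
```

For simple arithmetic such as squaring, vectorised expressions like df ** 2 or df.sum(axis=1) are usually faster than calling a Python function per element or per row.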
