There are many usecases where we need to read data from an Excel file. Simple example is to dump the data in Excel file into MongoDB. My use case is, I want to read test cases from Excel file and run those cases with Selenium Automation framework. Python XLRD module helps in reading the data from Excel file. This article explains how to use and get data from the Excel file using XLRD module in Python 3. Let’s get started.
XLRD can be installed using PIP installer provided along with the Python installation. On Windows, PIP tool is available in “C:\Python35\Scripts\pip3.exe”. Check your install folder to get the correct path. You can use the below command to install XLRD.
C:\Python35\Scripts\pip3.exe install xlrd
What we read from Excel file?
We all know the format of Excel file, but for the sake of understanding I will provide the information here. Below are the typical data fields that we would like to read from an Excel file:
- Number of sheets and Sheet names
- Rows and columns information
- Cell data and its type
Now we will get into the coding part to read all these data fields from Excel file. To get started we need to have an Excel file. I downloaded a sample Excel file (SampleData.xls) from here. Below is the image snapshot of the file. The Excel file contains two sheets, “SalesOrders” and “MyLinks”. We will use the data in the first sheet. There are 7 columns in the first sheet:
- OrderDate – Date field
- Region – Text field
- Rep – Text field
- Item – Text field
- Units – Integer
- Unit cost – Double/float
- Total – Formula field applied on Units * Unit cost
Open a workbook and get sheet information
The below code snippet will open the Excel workbook and print number of sheets and names of the sheets.
import xlrd # Open the workbook book = xlrd.open_workbook("SampleData.xls") # Print number of sheets print(book.nsheets) # Print sheet names for sheet in book.sheet_names(): print(sheet)
A workbook can be opened using “open_workbook” API provided the file path. This API will create a book object. “nsheets” will give you the number of sheets available in the workbook and “sheet_names” API will give the list of sheets.
Reading rows and columns in a sheet
import xlrd # Open the workbook book = xlrd.open_workbook("SampleData.xls") # get the list of sheets sheets = book.sheets() # print number of rows and cols in first sheet print (sheets.nrows) print (sheets.ncols)
The above code uses “sheets()” API to get the list of sheets as sheet objects. “nrows” and “ncols” will print the number of rows and columns in a sheet. The two “print” statements print the number of rows and cols in first sheet.
Reading data in rows and columns in a sheet
Now its time to read data in the rows and columns. We can iterate through rows and columns to read all the information in the sheet. The contents of a cell is provided by the “Cell” object in XLRD. The other most important thing we need to know is about the data format, which is provided by the “ctype” attribute in the “Cell” object. “ctype” is an integer values which is mapped to the below table:
Based on our input Excel file the first column is date so when we read the “ctype” of the cell in first column we should get 3. Below code snippet prints the “ctype” value of row 1 and col 0. We need to ignore the first row as it has headers. Similarly, we can find the types of each cell and assign them to appropriate Python type.
Let’s iterate through all the rows and find the total amount of sales available in the sheet.
import xlrd # Open the workbook book = xlrd.open_workbook("SampleData.xls") # get the list of sheets sheets = book.sheets() total = 0 for row in range(1, sheets.nrows): total = total + sheets.cell_value(row, 6) print (total)
That’s all for now.