Python Reading Excel and CSV Files

Python Tutorial: Reading Excel and CSV Files

Welcome to this comprehensive tutorial on reading Excel and CSV files in Python, brought to you by codeswithpankaj.com. In this tutorial, we will explore various methods and libraries to handle Excel and CSV files, covering their definition, usage, and practical examples. By the end of this tutorial, you will have a thorough understanding of how to work with these file formats effectively in your Python programs.

Table of Contents

  1. Introduction to Excel and CSV Files

  2. Reading CSV Files

    • Using the csv Module

    • Using pandas

  3. Reading Excel Files

    • Using pandas

    • Using openpyxl

  4. Practical Examples

  5. Common Pitfalls and Best Practices


1. Introduction to Excel and CSV Files

What are CSV Files?

CSV (Comma-Separated Values) files are plain text files that store tabular data in a simple format. Each line in a CSV file represents a row in the table, and each value is separated by a comma.

What are Excel Files?

Excel files are spreadsheet files created by Microsoft Excel or other compatible spreadsheet programs. They can store data in a tabular format, including formulas, graphs, and various formatting options. Excel files are typically saved with the .xlsx or .xls file extension.

Why Handle Excel and CSV Files?

Handling Excel and CSV files is essential for data analysis, data manipulation, and data storage. They are commonly used for data exchange between systems, making it crucial to know how to read and process these files in Python.


2. Reading CSV Files

Using the csv Module

The csv module in Python provides functionality to read from and write to CSV files.

Syntax

import csv

with open('filename.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Example

import csv

with open('example.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Using pandas

The pandas library provides powerful data manipulation capabilities and makes it easy to read CSV files.

Syntax

import pandas as pd

df = pd.read_csv('filename.csv')
print(df)

Example

import pandas as pd

df = pd.read_csv('example.csv')
print(df)

3. Reading Excel Files

Using pandas

The pandas library also supports reading Excel files.

Syntax

import pandas as pd

df = pd.read_excel('filename.xlsx', sheet_name='Sheet1')
print(df)

Example

import pandas as pd

df = pd.read_excel('example.xlsx', sheet_name='Sheet1')
print(df)

Using openpyxl

The openpyxl library is used to read and write Excel 2010 xlsx/xlsm/xltx/xltm files.

Syntax

from openpyxl import load_workbook

workbook = load_workbook('filename.xlsx')
sheet = workbook['Sheet1']

for row in sheet.iter_rows(values_only=True):
    print(row)

Example

from openpyxl import load_workbook

workbook = load_workbook('example.xlsx')
sheet = workbook['Sheet1']

for row in sheet.iter_rows(values_only=True):
    print(row)

4. Practical Examples

Example 1: Filtering CSV Data

import pandas as pd

df = pd.read_csv('example.csv')
filtered_df = df[df['Column1'] > 50]
print(filtered_df)

Example 2: Summarizing Excel Data

import pandas as pd

df = pd.read_excel('example.xlsx', sheet_name='Sheet1')
summary = df.describe()
print(summary)

Example 3: Merging Data from Multiple Excel Sheets

import pandas as pd

sheet_names = ['Sheet1', 'Sheet2', 'Sheet3']
dfs = [pd.read_excel('example.xlsx', sheet_name=sheet) for sheet in sheet_names]
merged_df = pd.concat(dfs)
print(merged_df)

Example 4: Writing Filtered Data to a New CSV File

import pandas as pd

df = pd.read_csv('example.csv')
filtered_df = df[df['Column1'] > 50]
filtered_df.to_csv('filtered_example.csv', index=False)

5. Common Pitfalls and Best Practices

Pitfalls

  1. Incorrect File Path: Ensure the file path is correct to avoid FileNotFoundError.

  2. Unsupported File Formats: Verify that the file format is supported by the library being used.

  3. Large Files: Reading large files can consume significant memory. Consider reading in chunks or using efficient libraries.

Best Practices

  1. Use Context Managers: Use with statements to ensure files are properly closed.

  2. Handle Missing Data: Check for and handle missing or NaN values in your data.

  3. Optimize Memory Usage: For large datasets, use memory-efficient methods to read and process data.

# Good example with context manager and handling missing data
import pandas as pd

with open('example.csv', 'r') as file:
    df = pd.read_csv(file)
    df.fillna(0, inplace=True)
    print(df)

This concludes our detailed tutorial on reading Excel and CSV files in Python. We hope you found this tutorial helpful and informative. For more tutorials and resources, visit codeswithpankaj.com. Happy coding!

Welcome to this comprehensive tutorial on reading Excel and CSV files in Python, brought to you by codeswithpankaj.com. In this tutorial, we will explore various methods and libraries to handle Excel and CSV files, covering their definition, usage, and practical examples. By the end of this tutorial, you will have a thorough understanding of how to work with these file formats effectively in your Python programs.

Table of Contents

  1. Introduction to Excel and CSV Files

  2. Reading CSV Files

    • Using the csv Module

    • Using pandas

  3. Reading Excel Files

    • Using pandas

    • Using openpyxl

  4. Practical Examples

  5. Common Pitfalls and Best Practices


1. Introduction to Excel and CSV Files

What are CSV Files?

CSV (Comma-Separated Values) files are plain text files that store tabular data in a simple format. Each line in a CSV file represents a row in the table, and each value is separated by a comma.

What are Excel Files?

Excel files are spreadsheet files created by Microsoft Excel or other compatible spreadsheet programs. They can store data in a tabular format, including formulas, graphs, and various formatting options. Excel files are typically saved with the .xlsx or .xls file extension.

Why Handle Excel and CSV Files?

Handling Excel and CSV files is essential for data analysis, data manipulation, and data storage. They are commonly used for data exchange between systems, making it crucial to know how to read and process these files in Python.


2. Reading CSV Files

Using the csv Module

The csv module in Python provides functionality to read from and write to CSV files.

Syntax

import csv

with open('filename.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Example

import csv

with open('example.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Using pandas

The pandas library provides powerful data manipulation capabilities and makes it easy to read CSV files.

Syntax

import pandas as pd

df = pd.read_csv('filename.csv')
print(df)

Example

import pandas as pd

df = pd.read_csv('example.csv')
print(df)

3. Reading Excel Files

Using pandas

The pandas library also supports reading Excel files.

Syntax

import pandas as pd

df = pd.read_excel('filename.xlsx', sheet_name='Sheet1')
print(df)

Example

import pandas as pd

df = pd.read_excel('example.xlsx', sheet_name='Sheet1')
print(df)

Using openpyxl

The openpyxl library is used to read and write Excel 2010 xlsx/xlsm/xltx/xltm files.

Syntax

from openpyxl import load_workbook

workbook = load_workbook('filename.xlsx')
sheet = workbook['Sheet1']

for row in sheet.iter_rows(values_only=True):
    print(row)

Example

from openpyxl import load_workbook

workbook = load_workbook('example.xlsx')
sheet = workbook['Sheet1']

for row in sheet.iter_rows(values_only=True):
    print(row)

4. Practical Examples

Example 1: Filtering CSV Data

import pandas as pd

df = pd.read_csv('example.csv')
filtered_df = df[df['Column1'] > 50]
print(filtered_df)

Example 2: Summarizing Excel Data

import pandas as pd

df = pd.read_excel('example.xlsx', sheet_name='Sheet1')
summary = df.describe()
print(summary)

Example 3: Merging Data from Multiple Excel Sheets

import pandas as pd

sheet_names = ['Sheet1', 'Sheet2', 'Sheet3']
dfs = [pd.read_excel('example.xlsx', sheet_name=sheet) for sheet in sheet_names]
merged_df = pd.concat(dfs)
print(merged_df)

Example 4: Writing Filtered Data to a New CSV File

import pandas as pd

df = pd.read_csv('example.csv')
filtered_df = df[df['Column1'] > 50]
filtered_df.to_csv('filtered_example.csv', index=False)

5. Common Pitfalls and Best Practices

Pitfalls

  1. Incorrect File Path: Ensure the file path is correct to avoid FileNotFoundError.

  2. Unsupported File Formats: Verify that the file format is supported by the library being used.

  3. Large Files: Reading large files can consume significant memory. Consider reading in chunks or using efficient libraries.

Best Practices

  1. Use Context Managers: Use with statements to ensure files are properly closed.

  2. Handle Missing Data: Check for and handle missing or NaN values in your data.

  3. Optimize Memory Usage: For large datasets, use memory-efficient methods to read and process data.

# Good example with context manager and handling missing data
import pandas as pd

with open('example.csv', 'r') as file:
    df = pd.read_csv(file)
    df.fillna(0, inplace=True)
    print(df)

This concludes our detailed tutorial on reading Excel and CSV files in Python. We hope you found this tutorial helpful and informative. For more tutorials and resources, visit codeswithpankaj.com. Happy coding!

Last updated