The `apply()` function allows you to apply a function along either the row (axis=1) or column (axis=0) of a DataFrame.
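A minimal sketch of the two axis settings, assuming a small hypothetical DataFrame with numeric columns `A` and `B`; the helper `value_range` is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

def value_range(values):
    """Return max minus min of a Series (illustrative helper)."""
    return values.max() - values.min()

# axis=0: the function receives each column as a Series.
col_range = df.apply(value_range, axis=0)

# axis=1: the function receives each row as a Series.
row_range = df.apply(value_range, axis=1)

print(col_range)
print(row_range)
```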
loc is label-based indexing, allowing access to specific rows using row labels or indices.
Element-wise subtraction between two DataFrames is done using the '-' operator, producing a new DataFrame that contains the difference of corresponding elements from the first DataFrame minus the second.
Accessing a specific column of the DataFrame by using the column name, e.g., print(df['Name']) for a single column or print(df[['Name', 'City']]) for multiple columns.
Element-wise addition of two DataFrames is performed using the '+' operator, resulting in a new DataFrame that contains the sum of corresponding elements from both DataFrames.
Indexing in Pandas refers to the method of accessing data in DataFrames and Series using row and column labels or positions.
iloc is integer-location based indexing, which allows access to rows using their positions based on a 0-based index.
The `applymap()` function applies a function element-wise across the entire DataFrame; since pandas 2.1 it is deprecated in favor of the equivalent `DataFrame.map()`.
Merging is the process of using `pd.merge()` to merge DataFrames based on a common column or index.
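As a rough illustration, the sketch below merges two hypothetical tables on a shared `emp_id` column (the table and column names are assumptions, not taken from the source):

```python
import pandas as pd

# Hypothetical tables that share the key column "emp_id".
employees = pd.DataFrame({"emp_id": [1, 2, 3], "Name": ["Alice", "Bob", "Cara"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4], "Salary": [70000, 80000, 65000]})

# An inner merge keeps only the keys present in both tables (1 and 2 here).
merged = pd.merge(employees, salaries, on="emp_id", how="inner")
print(merged)
```

Passing `how="left"`, `"right"`, or `"outer"` instead keeps unmatched keys from one or both sides.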
The `map()` function is used for element-wise transformations in Series objects and can accept a dictionary, a Series, or a function.
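A short sketch of `Series.map()` with a dictionary and with a function; the city codes are hypothetical sample data:

```python
import pandas as pd

# Hypothetical Series of city codes.
codes = pd.Series(["NY", "LA", "NY"])

# Dictionary mapping: values absent from the dictionary would become NaN.
full_names = codes.map({"NY": "New York", "LA": "Los Angeles"})

# Function mapping: the function is called once per element.
lengths = codes.map(len)

print(full_names.tolist())
print(lengths.tolist())
```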
Element-wise division is the operation that divides corresponding elements of one array by another, resulting in a new array.
SciPy is frequently used for statistical computations, signal processing, and solving differential equations.
You can access the 'name' field in a structured array by using the syntax students['name'].
Hierarchical indexing (or MultiIndex) allows for more complex data structures in pandas, where data can be indexed by multiple levels.
The Mean is the average value of the array elements, calculated using the function np.mean(arr).
The ndarray is the core data structure in NumPy, representing a multidimensional array that allows for efficient storage and manipulation of large datasets.
Data wrangling (also known as data cleaning or data preprocessing) involves transforming raw data into a more usable format.
Key features of ndarray include homogeneous data, where all elements must be of the same type, and multi-dimensional capability, allowing the creation of arrays of any number of dimensions (1D, 2D, 3D, etc.).
Boolean indexing in Pandas is a method of filtering data using conditions to select specific rows based on boolean values.
You can compute the average salary by using the syntax `df['Salary'].mean()`.
The df.dropna() function removes rows that contain missing values from a DataFrame (or columns, when called with axis=1).
A 1D NumPy array can be created using the `np.array()` function with a list as an argument, for example, `np.array([1, 2, 3])`.
Fancy indexing allows you to retrieve multiple elements or rows/columns from an array using another array of indices.
The syntax is `df['column1'] + df['column2']`, which allows you to perform arithmetic operations between columns.
It adds elements of DataFrame df1 and df2 element-wise.
You can transpose a matrix in NumPy using the transpose() function or the .T attribute.
You can use df.iloc[0:2, 1:3] to access multiple rows and columns by position.
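The sketch below applies that slice to a hypothetical three-column DataFrame and, as an aside, shows the label-based equivalent; note that `iloc` slices exclude the end position while `loc` slices include the end label:

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0, 1, 2.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara"],
    "Age": [25, 32, 41],
    "City": ["Paris", "Oslo", "Rome"],
})

# Positions: rows 0 and 1, columns 1 and 2 (end positions are excluded).
subset = df.iloc[0:2, 1:3]

# Labels: rows 0 through 1 inclusive, columns chosen by name.
by_label = df.loc[0:1, ["Age", "City"]]

print(subset)
```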
Conditional Selection allows you to filter data based on specific conditions, selecting rows that meet certain criteria.
Data selection in pandas involves accessing specific rows, columns, or elements from a DataFrame.
The map() method is used to substitute values in a Series based on a dictionary or function.
A transposed matrix is obtained by swapping the rows and columns of the original matrix. For example, the transposed form of [[1, 4], [2, 5], [3, 6]] is [[1, 2, 3], [4, 5, 6]].
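The example above can be checked directly in NumPy; this is a minimal sketch using the same values:

```python
import numpy as np

matrix = np.array([[1, 4], [2, 5], [3, 6]])  # shape (3, 2)

# Both forms swap rows and columns.
transposed = matrix.T
also_transposed = np.transpose(matrix)

print(transposed)  # [[1 2 3]
                   #  [4 5 6]]
```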
The 'apply' function in pandas is used to apply a function along an axis of the DataFrame, allowing for transformations or calculations on specific columns or rows.
The function np.max is used to find the maximum value in a NumPy array.
The mean (average) of the elements in the array can be computed using the mean() function.
Element-wise division is the operation where each element of one DataFrame is divided by the corresponding element of another DataFrame, resulting in a new DataFrame.
The standard deviation measures how spread out the values are, and the variance is the square of the standard deviation.
Combining datasets involves merging, concatenating, or joining multiple DataFrames.
The loc[] indexer in Pandas is used for label-based indexing, allowing selection of data based on row and column labels.
Element-wise multiplication of two DataFrames is achieved using the '*' operator, resulting in a new DataFrame that contains the product of corresponding elements from both DataFrames.
Element-wise subtraction is the operation that subtracts corresponding elements of one array from another, resulting in a new array.
An array in NumPy is a grid of values, all of the same type, and is indexed by a tuple of non-negative integers.
A structured array is an array that allows you to define the data types for each column using a list of tuples that specify the name and data type of each field.
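A minimal sketch of such a definition, assuming a hypothetical `students` array with `name`, `age`, and `weight` fields (the field types and values are chosen only for illustration):

```python
import numpy as np

# Each tuple is (field name, data type): a 10-character string,
# a 32-bit integer, and a 32-bit float.
student_dtype = [("name", "U10"), ("age", "i4"), ("weight", "f4")]

students = np.array(
    [("Alice", 21, 55.0), ("Bob", 25, 72.5)],
    dtype=student_dtype,
)

names = students["name"]                  # a single field, as a plain array
age_weight = students[["age", "weight"]]  # several fields, still structured

print(names)
print(age_weight)
```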
Slicing an array means selecting a subset of elements from the array based on specified indices.
Joining is the process of using `df.join()` to join DataFrames on their index.
The output is the first three elements of the array, for example, [10, 20, 30].
The df.fillna(0) function is used to fill missing values in a DataFrame with zero.
Fancy indexing is a method in NumPy that allows access to specific elements of an array using an array of indices.
The natural logarithm function computes the natural logarithm (ln) of each element in the array.
It resets the index of the DataFrame to the default integer index, removing any custom index that was set.
The np.std() function calculates the standard deviation of the elements in an array.
Structured arrays in NumPy are used to handle heterogeneous data, allowing different types of data (e.g., integers, floats, strings) within a single array, similar to tables.
df.loc[] is used to access rows and columns by labels in a DataFrame.
A Series is a one-dimensional labeled array capable of holding any data type; selecting a single DataFrame column by name, as in df['Name'], returns a Series.
passed_students is a boolean array indicating which students have marks greater than 50, signifying they passed.
The iloc[] indexer selects data by row and column positions using a 0-based index.
A method used to fill missing data in a DataFrame with a specified value or method, such as forward filling or backward filling.
NumPy (Numerical Python) is a fundamental package in Python for scientific computing, providing support for arrays, matrices, and a large number of mathematical functions to operate on these data structures efficiently.
You can find the minimum and maximum values in an array using min() and max() functions.
The syntax for slicing in NumPy is array[start:end:step], which allows you to access a subset of an array.
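A few slices on a hypothetical array illustrate the `start:end:step` form; the end index is excluded and any of the three parts may be omitted:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

first_three = arr[:3]      # start defaults to 0        -> [10 20 30]
every_other = arr[1:6:2]   # indices 1, 3, 5            -> [20 40 60]
reversed_arr = arr[::-1]   # negative step walks backwards

print(first_three, every_other, reversed_arr)
```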
The max() function returns the maximum value in an array.
The cumulative sum is computed using the cumsum() function, which returns the cumulative sum of the array elements.
The percentile() function gives the value below which a given percentage of observations fall.
Pandas is a powerful Python library for data manipulation and analysis, providing data structures like Series and DataFrame to work with structured data, such as tabular datasets, facilitating easy data cleaning, preparation, and analysis.
loc is an indexer in Pandas used to access a group of rows and columns by labels or a boolean array.
The np.add() function performs element-wise addition of two arrays.
The corrcoef() function computes the correlation coefficient matrix between two arrays.
You can categorize age by defining a custom function and applying it to the age column using `df['column_name'].apply(function)`.
Structured arrays are useful for handling heterogeneous data, like rows in a table, where each field can have a different data type.
Functions like `np.zeros()`, `np.ones()`, and `np.arange()` can be used to create NumPy arrays with initial values.
Pandas is a powerful Python library used for data manipulation and analysis, providing data structures like DataFrames and Series for easy handling of structured data.
You can create a DataFrame by passing a dictionary to the pd.DataFrame() constructor, where the keys represent column names and the values are lists of column data.
The syntax is `df[df['column_name'] condition]`, which allows you to select rows that satisfy the given condition.
Boolean Logic in NumPy refers to the use of boolean conditions to filter and select elements from arrays based on specified criteria.
Vectorized operations are typically much faster than equivalent Python for loops because the per-element work runs in compiled code rather than in the interpreter.
A NumPy array is a powerful N-dimensional array object that allows for efficient storage and manipulation of numerical data.
A combined condition for filtering can be created using logical operators, such as using & to combine conditions like (df['Age'] > 30) & (df['Salary'] < 85000).
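The sketch below applies that combined condition to hypothetical data. Each comparison is wrapped in parentheses because `&` binds more tightly than `>` and `<`:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara"],
    "Age": [28, 35, 45],
    "Salary": [70000, 80000, 90000],
})

# & is element-wise AND; use | for OR and ~ for NOT.
mask = (df["Age"] > 30) & (df["Salary"] < 85000)
filtered = df[mask]

print(filtered)
```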
Pandas provides methods like 'pd.isna()' or 'pd.isnull()' to detect missing values, which can be used to handle missing data effectively.
The sample data structures are dictionaries named 'data1' and 'data2', each containing lists of values for columns 'A', 'B', and 'C'.
Group By is a method in Pandas that allows you to group data using `df.groupby()` and apply aggregation functions like `sum()`, `mean()`, etc.
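A minimal sketch of grouping followed by aggregation, assuming hypothetical `Department` and `Salary` columns:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Department": ["HR", "IT", "IT"],
    "Salary": [50000, 70000, 80000],
})

# Split the rows by department, then average the salaries in each group.
mean_salary = df.groupby("Department")["Salary"].mean()

print(mean_salary)
```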
You can create a MultiIndex DataFrame using the pd.MultiIndex.from_arrays method, providing arrays for the index levels.
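One possible way to build and query such a DataFrame is sketched below; the level names `group` and `number` and the values are hypothetical:

```python
import pandas as pd

# One array per index level; position i in each array labels row i.
arrays = [["A", "A", "B", "B"], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=["group", "number"])

df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

group_a = df.loc["A"]               # all rows whose first level is "A"
single = df.loc[("B", 2), "value"]  # one cell, addressed by both levels

print(group_a)
print(single)
```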
Indexing in NumPy allows you to access specific elements of an array, with arrays being zero-indexed.
Pandas is the most popular library for data manipulation and analysis in Python, providing powerful tools for handling structured data like DataFrames and Series.
The min() function returns the minimum value in an array.
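During arithmetic operations, Pandas propagates missing values: if either element is NaN, the result is also NaN. Arithmetic methods such as `df1.add(df2, fill_value=0)` accept a `fill_value` parameter that substitutes a value for the missing element; the `+` operator itself has no such option, and if both elements are missing the result is still NaN.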
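The difference between the `+` operator and the `add()` method with `fill_value` can be sketched as follows, using two hypothetical DataFrames that each contain one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each frame.
df1 = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
df2 = pd.DataFrame({"A": [10.0, 20.0], "B": [30.0, np.nan]})

# Operator form: any NaN operand yields NaN.
plain_sum = df1 + df2

# Method form: a missing operand is treated as 0 before adding.
df_sum_with_nan = df1.add(df2, fill_value=0)

print(plain_sum)
print(df_sum_with_nan)
```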
The iloc[] indexer in Pandas is used for position-based indexing, allowing selection of data based on integer positions.
A pivot table is a data processing tool that allows for the summarization and organization of data, typically used to group and aggregate values based on specific categories.
Openpyxl is used for reading and writing Excel files.
NumPy is a fundamental package for numerical computations in Python, providing support for creating and manipulating large arrays and matrices of numeric data.
Python's built-in `csv` module is used for reading and writing CSV files.
A 1D NumPy array is a one-dimensional array created using the np.array() function, which can hold a sequence of elements.
Pandas is used for advanced data wrangling tasks like filtering, transforming, and aggregating data.
A 2D NumPy array can be created using the `np.array()` function with a list of lists as an argument, for example, `np.array([[1, 2], [3, 4]])`.
The sum() function computes the sum of all elements in an array, while the prod() function computes the product of all elements.
You can add the new column using the syntax `df['New_Salary'] = df['Salary'] * 1.1`.
The apply() method applies a function along the axis of the DataFrame.
The np.ones() function creates an array filled with ones, with a specified shape.
You can select a single column by using the syntax `df['column_name']`.
You can select a single or multiple columns using the column name with syntax: `df['column_name']` or `df[['column1', 'column2']]`.
The syntax to select a row using `iloc` is `df.iloc[row_index]`.
NaN stands for 'Not a Number' and is used to represent missing or undefined values in a DataFrame.
The `applymap()` function is used to apply a function to each element of a DataFrame, returning a DataFrame of the same shape.
The median of the elements can be computed using the median() function.
When performing arithmetic operations with a scalar, the operation is applied to each element of the DataFrame, resulting in a new DataFrame; for example, df + 5 adds 5 to every element.
A method used to remove rows or columns from a DataFrame that contain missing data.
You can filter rows where Age is greater than 30 using the expression df[df['Age'] > 30].
You can filter rows where Salary is less than 80000 using the expression df[df['Salary'] < 80000].
Element-wise multiplication is the operation that multiplies corresponding elements of two arrays, resulting in a new array.
SciPy builds on NumPy and provides additional utilities for scientific computing, including modules for optimization, integration, and interpolation.
The cumprod() function computes the cumulative product of the array elements.
The 50th percentile, also known as the median, is the value that separates the higher half from the lower half of the data set.
The exponential function computes e raised to the power of each element in the array.
It sets the 'Name' column as the index of the DataFrame, allowing for easier access to data using names.
A 2D NumPy array is a two-dimensional array created using the np.array() function, which can hold a matrix of elements.
They are powerful tools to apply functions and transformations to DataFrames and Series.
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) in Pandas.
NumPy arrays are a key feature of the NumPy library that provide efficient storage and manipulation of large datasets, especially numerical data.
Numerical computing and array handling.
The np.arange() function creates an array with a range of values, specified by a start, stop, and step size.
You can select these rows using the syntax `high_salary = df[df['Salary'] > 55000]`.
Applying operations on entire arrays (or ndarrays) without explicit loops, allowing for concise and efficient code.
The loc[] indexer is used for label-based indexing to filter specific rows and select specific columns in a DataFrame.
Aggregation functions in NumPy, such as `sum()`, `mean()`, and `std()`, are used to perform common mathematical operations across the elements of arrays.
Data manipulation and analysis.
Label-based indexing for selecting rows/columns in a DataFrame.
A Pivot Table is a data summarization tool in Pandas created using `pd.pivot_table()` that allows you to summarize data by specifying values, index, and columns.
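A small sketch of `pd.pivot_table()` with the three arguments named above; the `Department`, `Year`, and `Salary` columns are hypothetical:

```python
import pandas as pd

# Hypothetical sample data in "long" form.
df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT"],
    "Year": [2023, 2024, 2023, 2024],
    "Salary": [50000, 52000, 70000, 75000],
})

# Rows come from `index`, columns from `columns`, and each cell
# aggregates `values` with `aggfunc`.
pivot = pd.pivot_table(
    df, values="Salary", index="Department", columns="Year", aggfunc="mean"
)

print(pivot)
```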
Vectorized addition is the operation that adds corresponding elements of two arrays without the need for a loop.
Boolean indexing allows you to filter data based on conditions, selecting rows where a specific condition is met.
Element-wise addition is the operation that adds corresponding elements of two arrays, resulting in a new array.
Concatenation is the process of using `pd.concat()` to concatenate DataFrames along rows or columns.
Element-wise Comparison produces boolean arrays that indicate where comparisons between elements hold true.
A 2-dimensional labeled data structure, similar to a table in a database, a spreadsheet, or a dictionary of Series objects.
Boolean Masking allows you to filter out specific elements of an array using a boolean array that indicates which elements to keep.
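A minimal sketch of boolean masking on a hypothetical array, including a combined condition:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# The comparison produces a boolean array of the same shape.
mask = arr > 3

# Indexing with the mask keeps only the elements where it is True.
filtered_arr = arr[mask]

# Conditions are combined element-wise with & (and), | (or), ~ (not).
between = arr[(arr > 2) & (arr < 5)]

print(mask, filtered_arr, between)
```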
You define the data types for fields in a structured array by creating a list of tuples, where each tuple contains the field name and its corresponding data type.
Pandas allows for vectorized operations between columns or between DataFrames, enabling arithmetic calculations directly on DataFrame columns.
iloc is an indexer used to access rows and columns by position in a DataFrame.
Accessing the 'age' and 'weight' fields together, for example with students[['age', 'weight']], returns a structured array restricted to those fields; when printed, each element appears as a tuple of the two values.
`iloc[]` is used for integer-based indexing to select rows or columns in a DataFrame.
The logarithm base 10 function computes the logarithm of each element in the array with base 10.
`loc[]` is used for label-based indexing to select rows or columns in a DataFrame.
The applymap() method applies a function element-wise to all the DataFrame elements.
Series and DataFrame.
You can filter a DataFrame by using boolean indexing, where you specify a condition that returns a boolean Series, and use it to index the DataFrame.
Fancy Indexing in NumPy is the process of indexing or slicing an array using another array of indices, allowing access to multiple elements simultaneously.
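The following sketch shows fancy indexing on a 1D and a 2D array with hypothetical values; unlike basic slicing, the result is a copy rather than a view:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# A list (or array) of indices selects several elements at once.
picked = arr[[0, 2, 4]]

matrix = np.arange(12).reshape(3, 4)

# On a 2D array, a single index list selects whole rows.
rows = matrix[[0, 2]]

print(picked)
print(rows)
```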
A 3D array can be created using np.array() with nested lists, for example, np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]).
High performance for numerical operations.
Position-based indexing for selecting rows/columns in a DataFrame.
Combining Datasets involves merging, concatenating, and joining DataFrames to create a unified dataset.
iloc is an indexer in Pandas used to access rows and columns by integer-location based indexing.
Cumulative sum of elements in an array.
The Standard Deviation measures the amount of variation or dispersion of the array elements, calculated using np.std(arr).
Cumulative product of elements in an array.
The `mean()` function is used to compute the average value of a specified column in a DataFrame.
NumPy is essential for handling numerical data in Python, providing support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions.
A DataFrame can be created using various methods, such as from a dictionary, a list of lists, NumPy arrays, or from an external file like a CSV or Excel.
The `apply()` function is used to apply a specified function to each element in a DataFrame column for data transformation.
df_sum_with_nan is the result of adding two DataFrames (df1 and df2) while filling missing values (NaN) with 0.
Transposing an array means flipping it over its diagonal, changing its rows into columns and vice versa.
It accesses the element at row 3, column 2 (index 1) in the DataFrame.
A 2D NumPy array, also known as a matrix, is an array that contains rows and columns, allowing for the representation of data in two dimensions.
Structured arrays behave like a hybrid of regular arrays and dictionaries, allowing access to individual fields by name.
filtered_arr is an array that contains elements from the original array 'arr' that are greater than 3.
A NumPy array can be created from Python lists or by using built-in functions like np.array(), np.zeros(), np.ones(), etc.
It accesses a specific column named 'Name' in the DataFrame.
Homogeneous data types.
The isin() method allows filtering of DataFrame rows based on whether the values in a specified column are contained in a provided list of values.
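A short sketch of filtering with `isin()`, assuming a hypothetical `City` column:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara"],
    "City": ["Paris", "Oslo", "Rome"],
})

# isin() returns a boolean Series that is then used as a row filter.
selected = df[df["City"].isin(["Paris", "Rome"])]

print(selected)
```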
Data Selection in pandas involves selecting specific rows or columns using indexers like 'loc' or 'iloc'.
You create a 2D array in NumPy by passing a list of lists to the np.array function, such as np.array([[1, 2, 3], [4, 5, 6]]).
The 'print' function is used to output the DataFrame to the console for visualization.
NumPy Basics refer to the foundational concepts of creating arrays and performing operations on them within the NumPy library.
df.loc[2, 'Name'] returns the element in the 'Name' column for the row whose index label is 2; with the default integer index this is the third row.
A newly created ndarray stores its elements in a contiguous block of memory, which allows for fast operations and access; views such as strided slices or transposes may not be contiguous.
Matplotlib is a versatile plotting library used for data visualization.
The np.mean() function calculates the mean (average) of the elements in an array.
Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics.
The np.zeros() function creates an array filled with zeros, with a specified shape.
You can create a scatter plot using Seaborn with the command sns.scatterplot(x='A', y='B', data=df) followed by plt.show().
The np.percentile() function computes the specified percentile of the elements in an array.
The condition represents a logical operation that finds elements in 'arr' that are greater than 2 and less than 5.
You can select multiple columns by using the syntax `df[['column1', 'column2']]`.
The output shows only the rows of the DataFrame that meet the specified condition, such as employees with a salary greater than a certain amount.
Label and integer-based indexing.
Swapping axes 0 and 2 rearranges the dimensions of the 3D array, changing the order of the elements along those axes.
Data Indexing refers to setting and accessing custom indices in a DataFrame, allowing for more flexible data retrieval.
Filtering rows based on conditions in a DataFrame.
You can access multiple rows by providing a range of positions, for example, df.iloc[1:3] returns rows 1 and 2.
Swapping any two axes of an array, for example, using np.swapaxes(array_3d, 0, 2).
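To make the effect concrete, the sketch below swaps axes 0 and 2 of a hypothetical 2×3×4 array and checks how one element moves:

```python
import numpy as np

# Hypothetical 3D array with shape (2, 3, 4).
array_3d = np.arange(24).reshape(2, 3, 4)

# Axes 0 and 2 trade places, so the shape becomes (4, 3, 2).
swapped = np.swapaxes(array_3d, 0, 2)

# The element at [i, j, k] in the original sits at [k, j, i] afterwards.
print(swapped.shape)
print(array_3d[0, 1, 3], swapped[3, 1, 0])
```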
The result is a new array where each element is squared, for example, squared_array = array_1d ** 2 results in [1, 4, 9, 16, 25].
You can slice the ndarray using sub_array = array_2d[:, 1] to extract the second column.
The .T attribute is used to obtain the transposed version of a NumPy array.
The swapaxes() function is used to swap any two axes of an array, making it useful for multi-dimensional arrays.
Boolean masking in NumPy allows for filtering data and applying logical conditions to arrays, enabling selection of elements that meet specific criteria.
The function np.sum calculates the sum of all elements in an array or along a specified axis (rows or columns).
Array with Range refers to a NumPy array that contains a sequence of numbers, typically generated using functions like np.arange or np.linspace.
The function np.min is used to find the minimum value in a NumPy array.
Element-wise arithmetic operations such as addition, subtraction, multiplication, and division between DataFrames.
Performing element-wise operations without explicit loops, such as a + b which adds two arrays element-wise.
The Exponential function in NumPy calculates the value of e raised to the power of each element in an array, resulting in an output like [2.71828183, 7.3890561, 20.08553692].
Trigonometric Functions in NumPy include sine, cosine, and tangent, as well as their inverse functions, allowing for calculations based on the angles in radians.
The Cosine function computes the cosine of each element in an array (in radians), resulting in an output like [0.54030231, -0.41614684, -0.9899925].
'Age_Category' is a new column created to categorize individuals based on their age, using a function applied to the 'Age' column.
An Array of Ones is a NumPy array filled with the value 1, often used for initializing data structures.
The 'pd.DataFrame' function is used to create a DataFrame object from various data structures such as dictionaries, lists, or arrays.
Filters rows where column values match a specified list.
df.iloc[2] returns the row at position 2 (the 3rd row) of the DataFrame.
The Log base 10 function in NumPy calculates the logarithm of each element in an array to the base 10, resulting in an output like [0., 0.30103, 0.47712125].
You can use df.loc[df['Age'] > 30, ['Name', 'Department']] to filter employees older than 30 and select the 'Name' and 'Department' columns.
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) in the Pandas library.
Hierarchical Indexing is a method in Pandas that allows for multi-level indexing, enabling the organization of complex data structures.
You can modify data in specific cells based on a condition, e.g., df.loc[df['Name'] == 'Bob', 'Salary'] = 90000.
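A minimal sketch of that conditional assignment on hypothetical data; the boolean condition selects the rows and the column label selects the cell to overwrite:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [70000, 80000]})

# Only rows where the condition is True are modified, in place.
df.loc[df["Name"] == "Bob", "Salary"] = 90000

print(df)
```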
You can combine row and column access using loc, for example, df.loc[1, ['Name', 'City']] accesses elements in row 1 for the 'Name' and 'City' columns.
NumPy arrays support element-wise operations, eliminating the need for loops and making code concise and faster.
The Square root function computes the square root of each element in an array, producing an output like [1., 1.41421356, 1.73205081].
Aggregation is the process of grouping data and creating pivot tables for summarization, allowing for easier analysis of large datasets.
Accessing elements of an array, for example, arr[0] accesses the first element.
Flipping an array over its diagonal, represented as matrix.T for the transpose of a matrix.
The Sine function computes the sine of each element in an array (in radians), producing an output like [0.84147098, 0.90929743, 0.14112001].
Element-wise operations such as addition, subtraction, multiplication, and division.
NumPy provides powerful tools for slicing arrays and accessing subarrays.
Accessing a subarray, such as arr[:3] which retrieves the first 3 elements.
The Natural log function in NumPy computes the natural logarithm (base e) of each element in an array, producing an output like [0., 0.69314718, 1.09861229].
You can create a 1D ndarray using np.array([1, 2, 3, 4, 5]).
You can create a 2D ndarray (matrix) using np.array([[1, 2, 3], [4, 5, 6]]).
The Tangent function computes the tangent of each element in an array (in radians), producing an output like [1.55740772, -2.18503986, -0.14254654].
The Power function raises each element in an array to a specified power, such as squaring each element, resulting in an output like [1, 4, 9].