Python is a popular, high-level programming language that helps developers create programs quickly and effectively without managing low-level details. Its benefits carry over to Finance, where it can help you automate your trades, optimize your portfolio, and backtest your analysis.
High-level languages are mostly slow, and plain Python loops are no exception. However, libraries such as Numpy and Pandas run their heavy calculations in compiled code, and that speed is another reason we consider Python an exceptional language for Finance.
So, we’re here to guide you through an amazing trio that will enable you to win in Finance. Without further ado, let’s dive in and learn about Python for Finance.
What is Numpy?
Numpy is a Python package for scientific computing. Suppose we have 800 stocks and need to calculate the price-to-earnings ratio. We will have one column for price and one for earnings, both stored in lists.
Now, if we loop through each row in plain Python, the interpreter performs 800 separate iterations, each time converting the calculation into bytecode and then computing the answer. With Numpy, you can cut that work down considerably.
Vectorization applies one instruction to several values at once using Single Instruction Multiple Data (SIMD), and Numpy takes advantage of it. If, for example, four values are processed per instruction, the 800 calculations need only about 200 vector operations.
Let’s show examples of how Numpy makes things easy for a developer and simplifies multiple tasks.
Using Python and a List of Lists (800 Stocks)
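The loop-based approach might look like the minimal sketch below. The data is hypothetical (generated prices and a constant earnings figure), and the names stocks and pe_ratios are chosen only for illustration.

```python
# Hypothetical data: each row is [price, earnings per share] for one stock.
stocks = [[100.0 + i, 5.0] for i in range(800)]

# Loop over every row and divide price by earnings, one stock at a time.
pe_ratios = []
for price, earnings in stocks:
    pe_ratios.append(price / earnings)

print(len(pe_ratios))   # 800
print(pe_ratios[0])     # 20.0
```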
Using Numpy (800 Stocks)
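With Numpy, the same calculation can be written as a single expression. This is only a sketch on made-up data; the array names prices and earnings are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: 800 prices and 800 earnings figures stored as arrays.
prices = 100.0 + np.arange(800)
earnings = np.full(800, 5.0)

# One vectorized expression replaces the explicit Python loop.
pe_ratios = prices / earnings

print(pe_ratios.shape)  # (800,)
print(pe_ratios[0])     # 20.0
```

No explicit loop appears in the code; the element-by-element division happens inside Numpy.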
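From the above-mentioned examples, you can see how Numpy speeds up both execution and development. One thing you should know is that both vectors must have the same shape (for example, two 5-item arrays or two 10-element arrays), or shapes that Numpy can broadcast together. All elements within one array also share a single data type, such as integer or float.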
How to Create a Numpy Array?
Once your data is in a Numpy array, you can easily calculate several statistics, either over the whole array or along a certain axis:
numpy.mean – It returns the average value.
numpy.ndarray.max – It returns the maximum value.
numpy.ndarray.min – It returns the minimum value.
You must be familiar with MS Excel, right? 2D arrays are just like an Excel sheet, where values are organized in rows and columns. By default, the max, min, and mean calculations are performed across all cells of the 2D array (unless we specify an axis).
Similarly, in Numpy, to get the max out of the row, we will specify ndarray.max(axis=1), and to get the max out of the column, we will specify ndarray.max(axis=0).
Note: All fields must be the same type if you’re using these operations.
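As a small sketch of these calls, assume a made-up 2D array named prices where each row is a stock and each column is a day:

```python
import numpy as np

# Hypothetical prices: 3 stocks (rows) over 4 days (columns).
prices = np.array([
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 18.0, 19.0, 21.0],
    [5.0, 6.0, 7.0, 8.0],
])

overall_max = prices.max()       # 21.0, computed over every cell
row_max = prices.max(axis=1)     # one value per row: 13, 21, 8
col_max = prices.max(axis=0)     # one value per column: 20, 18, 19, 21
row_mean = prices.mean(axis=1)   # one value per row: 11.5, 19.5, 6.5
col_min = prices.min(axis=0)     # one value per column: 5, 6, 7, 8
```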
Data Selection in Numpy
Did you ever highlight and copy cells in MS Excel? Data selection in Numpy is similar. To retrieve data from a Numpy 2D array, we index it as array[row, column], where row and column can each be a single value, a slice, or a list.
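The row and column format can be sketched as follows. The array data and its column meaning (price, earnings, volume) are assumptions made up for this example.

```python
import numpy as np

# Hypothetical 3x3 array: rows are stocks; columns are price, earnings, volume.
data = np.array([
    [100.0, 5.0, 1000.0],
    [40.0, 8.0, 2500.0],
    [60.0, 10.0, 800.0],
])

single_value = data[1, 0]      # row 1, column 0 -> 40.0
first_row = data[0, :]         # every column of row 0
price_column = data[:, 0]      # column 0 of every row
sub_block = data[[0, 2], :2]   # rows 0 and 2, first two columns
```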
Data Selection Using Boolean Arrays
Boolean arrays are great because they are easy to read and understand in code. They are produced by comparison operators (>, ==, <), which give a result of either True or False for each element.
From the above example of 800 stocks, to find all stocks with a pe_ratio less than 8, we can use this:
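A minimal sketch of that comparison, assuming the ratios live in a Numpy array named pe_ratio (the five values here are made up and stand in for the full 800):

```python
import numpy as np

# Hypothetical P/E ratios for five stocks.
pe_ratio = np.array([20.0, 5.0, 6.0, 12.5, 7.9])

# The comparison is applied element by element and yields a Boolean array.
mask = pe_ratio < 8

print(mask.tolist())  # [False, True, True, False, True]
```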
This will return a one-dimensional array with either True or False values. Let me give you a table for your better understanding. (Price-to-earnings ratio)
|Company Name||Price||Earnings||P/E Ratio||PE < 8|
Once we get a Boolean array, it’s easy for us to apply it to the original array and get our desired values using this:
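One way to apply the Boolean array is to pass it as the index. In this sketch the ticker names and ratios are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical tickers and P/E ratios; none of these are real companies.
companies = np.array(["AAA", "BBB", "CCC", "DDD"])
pe_ratio = np.array([20.0, 5.0, 6.0, 12.5])

mask = pe_ratio < 8

# Indexing with the mask keeps only the positions where it is True.
cheap_ratios = pe_ratio[mask]
cheap_companies = companies[mask]

print(cheap_companies.tolist())  # ['BBB', 'CCC']
print(cheap_ratios.tolist())     # [5.0, 6.0]
```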
|Company Name||Price||Earnings||P/E Ratio|
Now, let’s deal with a 2-D array. In a 2-D array, we need to ensure the length of the mask matches the dimension of the array that is being filtered: one entry per row to filter rows, or one entry per column to filter columns. Let’s show you how this works with a Numpy 2D array.
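The sketch below illustrates the idea on a made-up array with four rows and two columns; the names row_mask and col_mask are assumptions for this example.

```python
import numpy as np

# Hypothetical array: 4 stocks (rows); columns are price and earnings.
data = np.array([
    [100.0, 5.0],
    [40.0, 8.0],
    [60.0, 10.0],
    [21.0, 3.0],
])

pe_ratio = data[:, 0] / data[:, 1]   # 20, 5, 6, 7
row_mask = pe_ratio < 8              # length 4, matches the 4 rows
cheap_rows = data[row_mask, :]       # keeps rows 1, 2 and 3

col_mask = np.array([True, False])   # length 2, matches the 2 columns
prices_only = data[:, col_mask]      # keeps only the price column
```

A mask whose length does not match the axis it filters raises an IndexError.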
Now that you have seen how we select data, it’s time to modify it. It’s easy now that you have gone through the selection process: we modify the above array by assigning a value to a selection.
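Assignment uses the same selection syntax on the left-hand side. A brief sketch on hypothetical data:

```python
import numpy as np

# Hypothetical array: columns are price and earnings.
data = np.array([
    [100.0, 5.0],
    [40.0, 8.0],
    [60.0, 10.0],
])

data[0, 0] = 105.0                  # overwrite a single cell
data[:, 1] = data[:, 1] * 2         # overwrite a whole column
data[data[:, 0] < 50, 0] = 50.0     # overwrite cells picked by a Boolean mask

print(data.tolist())  # [[105.0, 10.0], [50.0, 16.0], [60.0, 20.0]]
```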
What Is Pandas?
Pandas is a library built on top of Numpy that analyzes and manipulates data and makes it easier for us to perform operations. In simpler words, we can say it’s the cherry on top. Numpy and Pandas together can do wonders for you!
Pandas Data Types
There are two core data structures in Pandas:
Pandas Series – It’s the Pandas version of a Numpy one-dimensional array.
Pandas Dataframe – It’s the Pandas version of a Numpy two-dimensional array.
Now, you might think, if they’re similar, why use them? They are similar, but Pandas adds features that make your computing and manipulation more effective and faster. Also, rows and columns have named labels in the Pandas version.
Dataframe further has three core components:
- Index axis (row)
- Column axis (Column)
- Data (Values)
We’ve attached an image below to make it easy for you to understand data from a Pandas Dataframe.
How to Create Pandas Dataframe?
You can create a pandas dataframe by reading records from JSON, API, CSV, database, or by constructing from a dictionary:
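Constructing from a dictionary might look like the sketch below. The tickers and numbers are made up, and the variable name stocks is only an illustration.

```python
import pandas as pd

# Hypothetical data: dictionary keys become column labels.
stocks = {
    "price": [100.0, 40.0, 60.0],
    "earnings": [5.0, 8.0, 10.0],
}

# The index argument supplies the row labels.
df = pd.DataFrame(stocks, index=["AAA", "BBB", "CCC"])

# A new column can be derived from existing ones.
df["pe_ratio"] = df["price"] / df["earnings"]
print(df)
```

For files, pandas provides readers such as pd.read_csv and pd.read_json, which return a dataframe directly.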
How to Create Pandas Series?
To create the Pandas series, the process is similar (reading records from JSON, API, CSV, or database). Let us give an example of the above dataframe to construct the Pandas series.
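As a sketch, a series can be taken from a dataframe column or built directly. The dataframe here is hypothetical and rebuilt so the example runs on its own.

```python
import pandas as pd

# Hypothetical dataframe with made-up tickers as row labels.
df = pd.DataFrame(
    {"price": [100.0, 40.0, 60.0], "earnings": [5.0, 8.0, 10.0]},
    index=["AAA", "BBB", "CCC"],
)

# Selecting a single column returns a Series that keeps the row labels.
price_from_df = df["price"]

# The same Series can be constructed directly from values and labels.
price_direct = pd.Series(
    [100.0, 40.0, 60.0], index=["AAA", "BBB", "CCC"], name="price"
)
```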
Data comes in two broad types: continuous and categorical. Continuous data consists of numerical values, such as measurements of quantities (price, liters, grams). Categorical data, on the other hand, represents discrete groups, or we can say attributes (high price, weighs more).
Now that you have understood the difference in a layman’s language, it’s time to dive into the technical side. Have a look at the table below:
|General Type||Pandas String Name||NumPy / Pandas Object|
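To see how such types show up in practice, the sketch below builds a small hypothetical dataframe and sets each column's dtype explicitly by its pandas string name. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical columns, each created with an explicit dtype string.
df = pd.DataFrame({
    "price": pd.Series([100.0, 40.0, 60.0], dtype="float64"),     # continuous
    "volume": pd.Series([1000, 2500, 800], dtype="int64"),        # continuous
    "is_cheap": pd.Series([False, True, True], dtype="bool"),     # Boolean
    "rating": pd.Series(["high", "low", "low"], dtype="category"),  # categorical
})

# dtypes lists the data type of every column.
print(df.dtypes)
```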
How to Understand the Data?
dataframe.info – Prints basic information about the dataframe (the shape and the dtypes, which is shorthand for data types)
dataframe.index – Returns row labels
dataframe.columns – Returns column labels
dataframe.to_numpy – Returns dataframe values
series.index – Returns axis labels of the series
series.to_numpy – Returns the values of the series
series.dtype – Returns the type of the series
dataframe.head – Returns the first 5 records by default (starting at the top)
dataframe.tail – Returns the last 5 records by default (starting at the bottom)
series.head – Returns the first n rows (Starting at the top)
series.tail – Returns last n rows (Starting at the bottom)
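The attributes and methods listed above can be tried on a small dataframe. This is a sketch on hypothetical data with six made-up tickers.

```python
import pandas as pd

# Hypothetical dataframe with six rows.
df = pd.DataFrame(
    {
        "price": [100.0, 40.0, 60.0, 21.0, 75.0, 33.0],
        "earnings": [5.0, 8.0, 10.0, 3.0, 15.0, 11.0],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)

df.info()                     # prints index, columns, dtypes and memory usage
print(df.index)               # row labels
print(df.columns)             # column labels
print(df.to_numpy().shape)    # (6, 2)
print(len(df.head()))         # 5, the default number of rows
print(len(df.tail(2)))        # 2, when n is given explicitly

prices = df["price"]
print(prices.dtype)           # float64
print(prices.head(3).tolist())  # [100.0, 40.0, 60.0]
```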
Data selection in pandas can be done using the [], loc, and iloc indexers:
DataFrame[column_selection] – Used for the selection of columns
DataFrame.loc[row_selection, column_selection] – Used to index both rows and columns by label; label slices include the last label.
DataFrame.iloc[row_selection, column_selection] – Used to index rows and columns by integer position; slices exclude the last integer position.
Series.loc[index_selection] – Used for indexing rows by the index label.
Series.iloc[index_selection] – Used for indexing rows by the integer position.
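The difference between label-based and position-based selection can be sketched on a hypothetical dataframe:

```python
import pandas as pd

# Hypothetical dataframe with made-up tickers as row labels.
df = pd.DataFrame(
    {
        "price": [100.0, 40.0, 60.0, 21.0],
        "earnings": [5.0, 8.0, 10.0, 3.0],
        "volume": [1000, 2500, 800, 1200],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# loc slices by label and includes both end labels.
by_label = df.loc["AAA":"CCC", "price":"earnings"]   # 3 rows, 2 columns

# iloc slices by integer position and excludes the end position.
by_position = df.iloc[0:2, 0:2]                      # 2 rows, 2 columns

# The same indexers work on a Series with a single selection.
prices = df["price"]
bbb_price = prices.loc["BBB"]    # 40.0
last_price = prices.iloc[-1]     # 21.0
```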
Data Selection Shortcuts
If the above method looks tiring, we have some shortcuts that may help you select data. You can use…
dataframe['label'] to select a single column, or dataframe[['label1', 'label2']] to select a list of columns.
dataframe['label1':'label3'] to slice rows.
dataframe.loc and dataframe.iloc to perform row selection.
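A short sketch of these shortcuts on hypothetical data (the tickers are made up):

```python
import pandas as pd

# Hypothetical dataframe with made-up tickers as row labels.
df = pd.DataFrame(
    {"price": [100.0, 40.0, 60.0], "earnings": [5.0, 8.0, 10.0]},
    index=["AAA", "BBB", "CCC"],
)

one_column = df["price"]                # a single label returns a Series
two_columns = df[["price", "earnings"]] # a list of labels returns a DataFrame
row_slice = df["AAA":"BBB"]             # a label slice selects rows, end included
one_row = df.loc["CCC"]                 # a single row by label
first_row = df.iloc[0]                  # a single row by position
```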
Pandas series and pandas dataframes each have their own methods for understanding the data. However, several methods work for both.
We get descriptive stats with Series.describe and DataFrame.describe. We can also chain methods: select a column, which returns a pandas series, and then describe it. Please have a look at the example below:
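A minimal sketch of the chained call, on a hypothetical dataframe with made-up values:

```python
import pandas as pd

# Hypothetical dataframe.
df = pd.DataFrame(
    {"price": [100.0, 40.0, 60.0, 21.0], "earnings": [5.0, 8.0, 10.0, 3.0]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Chaining: df["price"] returns a Series, and describe() summarizes it.
price_stats = df["price"].describe()
print(price_stats)

# Calling describe() on the dataframe summarizes every numeric column.
all_stats = df.describe()
```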
We will still need to specify if we want results across the columns or rows for some dataframe methods. If we want results for each column, we will sum across the index values, and if we want results for each row, we will sum across the column values.
To understand axis numbers, remember that axis 0 represents the rows (the index) and axis 1 represents the columns. You don’t have to state axis=0, as it is the default: the calculation runs down the rows and gives one result per column.
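The effect of the axis argument can be sketched with a small hypothetical dataframe; the column names q1 and q2 are made up for illustration.

```python
import pandas as pd

# Hypothetical earnings for two quarters.
df = pd.DataFrame(
    {"q1": [1.0, 2.0, 3.0], "q2": [4.0, 5.0, 6.0]},
    index=["AAA", "BBB", "CCC"],
)

col_totals = df.sum()          # axis=0 by default: one result per column
row_totals = df.sum(axis=1)    # axis=1: one result per row

print(col_totals.tolist())     # [6.0, 15.0]
print(row_totals.tolist())     # [5.0, 7.0, 9.0]
```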
Below is a list of methods and what they return along the requested axis.
- Series.sum and DataFrame.sum return the sum.
- Series.mean and DataFrame.mean return the mean.
- Series.median and DataFrame.median return the median.
- Series.mode and DataFrame.mode return the mode.
- Series.max and DataFrame.max return the maximum value.
- Series.min and DataFrame.min return the minimum value.
- Series.count and DataFrame.count return the number of non-NA elements.
- Series.value_counts returns the unique value counts as a series.
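A few of these methods are sketched below on two hypothetical series, one numeric and one with labels and a missing value:

```python
import pandas as pd

# Hypothetical numeric series.
prices = pd.Series([100.0, 40.0, 60.0, 21.0])
median_price = prices.median()       # 50.0, the average of 40 and 60

# Hypothetical label series with one missing entry.
ratings = pd.Series(["high", "low", "low", None, "high", "low"])
non_missing = ratings.count()        # 5, the missing entry is not counted
counts = ratings.value_counts()      # counts per unique value, as a Series
most_common = ratings.mode()[0]      # "low"

print(counts)
```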
Standard Operations Between Two Series
You can use all Python numeric operations on the Pandas series.
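As a sketch, arithmetic between two series is applied element by element, with values matched by index label. The tickers and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical series that share the same index labels.
price = pd.Series([100.0, 40.0, 60.0], index=["AAA", "BBB", "CCC"])
earnings = pd.Series([5.0, 8.0, 10.0], index=["AAA", "BBB", "CCC"])

pe_ratio = price / earnings    # division, matched label by label
difference = price - earnings  # subtraction works the same way
doubled = price * 2            # a scalar is applied to every element

print(pe_ratio.tolist())       # [20.0, 5.0, 6.0]
```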
Now that we’ve been through understanding and selecting the data, it’s time to assign the values. Now, it’s really a piece of cake for you. You can also use Pandas selection shortcuts for values assignment, including boolean indexing.
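Value assignment can be sketched as follows, assuming a hypothetical dataframe; the rating column and its labels are made up for this example.

```python
import pandas as pd

# Hypothetical dataframe with made-up tickers as row labels.
df = pd.DataFrame(
    {"price": [100.0, 40.0, 60.0], "earnings": [5.0, 8.0, 10.0]},
    index=["AAA", "BBB", "CCC"],
)

df.loc["AAA", "price"] = 105.0                   # assign a single cell by label
df["pe_ratio"] = df["price"] / df["earnings"]    # assign a whole new column
df["rating"] = "normal"                          # assign one value to every row
df.loc[df["pe_ratio"] < 8, "rating"] = "cheap"   # assign through Boolean indexing

print(df)
```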