File, Block and Object Storage:
- FS: Data is stored and managed as individual files
- BS: Data is stored and managed as individual chunks (blocks). Each block receives a unique identifier, but no additional metadata is stored with it
- OS: Data is stored and managed as objects. Each object consists of the data itself, metadata about the data, and a unique identifier
  - Helpful for storing and retrieving growing amounts of data of any type
    - This is why it's the storage model behind Amazon S3 and a good fit for data lakes in general
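To make the object model concrete, here is a minimal in-memory sketch of the three parts listed above (data, metadata, unique identifier). The names StoredObject and ToyObjectStore and the metadata fields are made up for illustration; this is not how S3 or any real object store is implemented.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the data, metadata about it, and a unique identifier."""
    data: bytes
    metadata: dict
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ToyObjectStore:
    """Hypothetical store: a flat namespace keyed by object ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        # The metadata travels with the data, unlike in block storage.
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id):
        return self._objects[object_id]


store = ToyObjectStore()
oid = store.put(b"hello", content_type="text/plain", owner="alice")
obj = store.get(oid)
print(obj.metadata["content_type"])
```

Because every object is addressed only by its ID, the store does not care what type of data it holds, which is what makes this model convenient for growing, mixed collections.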
Amazon S3 - Data Lake; allows you to ingest anything and gives you tools to create a secure, compliant, and auditable data lake
With the data lake in place, you can build on top of this “repo” using other tools to do data warehousing, analytics, and ML directly from the lake

Databases

The data is stored on DISK, so that it is persistent! (You don’t lose it if the database crashes)
RDBMSs provide a way to structure data on disk so that reading from and writing to it is efficient. The data is formatted into precisely defined fields, so that it can be queried easily.

B+ Trees

To efficiently read from and write to disk, databases use a data structure called a B+ tree
Instead of having at most 2 children like a binary tree, each node in this tree can have up to m children.
Each node doesn’t contain just 1 key, but up to m-1 keys (2 keys when m = 3)
Data is actually ONLY stored in the leaf nodes; the rest of the nodes just help you get there
The leaf nodes are linked together in sorted order, in some form of linked list. This means that once you have found a specific piece of information in a leaf node and want the entries next to it, you can follow the links to the next few elements instead of searching from the root again.
If a node has m children, it must have m-1 keys. For example, in the above picture we have m = 3 children, but only 2 keys above them.
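The properties above (data only in the leaves, m-1 keys for m children, leaves linked in sorted order) can be sketched with a small hand-built tree. This is a read-only illustration with m = 3; the class and function names are made up, and it leaves out insertion, node splitting, and on-disk layout, which real databases need.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys        # sorted keys
        self.values = values    # data lives only in the leaves
        self.next = None        # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        # A node with m children holds m - 1 separator keys.
        assert len(children) == len(keys) + 1
        self.keys = keys
        self.children = children


def find_leaf(node, key):
    """Walk from the root down to the leaf that would hold `key`."""
    while isinstance(node, Internal):
        # Keys below keys[0] go to child 0, keys >= keys[0] to child 1, etc.
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    """Return (key, value) pairs with low <= key <= high, in sorted order."""
    result = []
    leaf = find_leaf(root, low)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return result
            if k >= low:
                result.append((k, v))
        leaf = leaf.next  # follow the linked list instead of re-descending
    return result


# Hand-built tree with m = 3: the root has 3 children and 2 keys.
a = Leaf([1, 4], ["one", "four"])
b = Leaf([7, 9], ["seven", "nine"])
c = Leaf([12, 15], ["twelve", "fifteen"])
a.next, b.next = b, c
root = Internal([7, 12], [a, b, c])

print(range_scan(root, 4, 12))
```

The scan descends from the root only once, to find the leaf for the lower bound. After that it walks sideways through the leaf links, which is why range queries are cheap.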
B+ trees also provide indexing
- Indexing is a way to improve the speed of data retrieval operations on a database table, coming at the cost of additional writes and storage space to maintain the index data structure. To give a quick example, if we had a bunch of names that mapped to their respective phone numbers, we could pick the name field as the index, allowing us to retrieve phone numbers faster in the future. However, there are multiple downsides to indexing, which are beyond the scope of this discussion.
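The name-to-phone-number example can be sketched as follows. Note the swap: the index here is a plain Python dict (a hash map), not a B+ tree, purely to keep the sketch short. The function names and sample data are made up, and names are assumed to be unique.

```python
records = []      # the "table": rows in insertion order
name_index = {}   # the index: name -> position of the row in `records`


def insert(name, phone):
    records.append((name, phone))
    # Extra write and extra storage: the cost of keeping the index current.
    name_index[name] = len(records) - 1


def lookup_scan(name):
    """Without an index: check every row until the name matches."""
    for row_name, phone in records:
        if row_name == name:
            return phone
    return None


def lookup_indexed(name):
    """With an index: jump straight to the row."""
    pos = name_index.get(name)
    return None if pos is None else records[pos][1]


insert("Ada", "555-0100")
insert("Grace", "555-0101")
insert("Linus", "555-0102")

print(lookup_scan("Grace"), lookup_indexed("Grace"))
```

Both lookups return the same answer. The scan gets slower as the table grows, while the indexed lookup does not, and every insert pays for that with one extra write.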

How data is stored - Relational Databases

In an SQL database, data is stored inside tables. A table organizes data into rows and columns, where each row is a single record. A primary key uniquely identifies each record.

If we go back to the phone numbers example, let's say we want to uniquely identify each person by their phone number and store their name in a table called People. We must declare the structure of the table first before inserting any records. In SQL, it can be defined as the following:

CREATE TABLE People (
    -- bigint rather than int: a 10-digit phone number overflows a 32-bit int
    PhoneNumber bigint PRIMARY KEY,
    Name varchar(100)
);

varchar (sometimes pronounced var-car) is a data type for variable-length strings. These can include numbers, letters, and special characters.

If we wanted another table that associates each PhoneNumber in our People table with an address, to record where each person lives, we can do so in a table called Homes. We want to ensure that a PhoneNumber that does not exist in People cannot be inserted into Homes. This is an example of a FOREIGN KEY constraint. It means that each row in the Homes table is associated with a row in the People table via the PhoneNumber. Note that the constraint works in one direction: it guarantees that every row in Homes points at an existing person, not that every person has a row in Homes.
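A runnable sketch of this constraint, using Python's built-in sqlite3 module so that it needs no database server. The Address column, the sample rows, and the column types (adapted to SQLite's INTEGER and TEXT) are assumptions for illustration. Other databases use the same FOREIGN KEY ... REFERENCES syntax but do not need the PRAGMA line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE People (
        PhoneNumber INTEGER PRIMARY KEY,
        Name TEXT
    )
""")
conn.execute("""
    CREATE TABLE Homes (
        PhoneNumber INTEGER,
        Address TEXT,
        FOREIGN KEY (PhoneNumber) REFERENCES People(PhoneNumber)
    )
""")

conn.execute("INSERT INTO People VALUES (5550100, 'Ada')")
# Accepted: 5550100 exists in People.
conn.execute("INSERT INTO Homes VALUES (5550100, '1 Example Street')")

try:
    # 5550199 is not in People, so the constraint should reject this row.
    conn.execute("INSERT INTO Homes VALUES (5550199, '2 Nowhere Road')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

home_count = conn.execute("SELECT COUNT(*) FROM Homes").fetchone()[0]
print(rejected, home_count)
```

The second insert into Homes fails with an integrity error, so only the row that matches an existing person is stored.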