Contents
Why do you need a database?
Because the application needs to save the user’s data, such as Word needs to save the user’s document, so that it can continue to edit or copy to another computer next time.
One of the easiest ways to save user data is to write user data to a file. For example, to save information for all students in a class, you can write a CSV file to the file:
id, name, gender, score 1, John, M, 90 2, Jane, F, 95 3, Lana, M, 88 4, Paul, F, 88
If you want to save information about all classes in the school, you can write another CSV file.
However, as the functionality of applications becomes more complex and the amount of data grows larger, how to manage them becomes a big problem:
- Reading and writing files and parsing out the data requires a lot of duplicate code;
- Quickly querying specified data from thousands of data requires complex logic.
If each application writes its own code for reading and writing data, it is inefficient and error-prone. On the other hand, the interface for accessing data by each application is different, and the data is difficult to reuse.
Therefore, the database appears as a software that manages data specifically. Instead of managing the data yourself, the application reads and writes data through the interface provided by the database software. As for how the data itself is stored in the file, that is the matter of the database software, the application itself does not care:
┌──────────────┐ │ application │ └──────────────┘ ▲│ ││ read││write ││ │▼ ┌──────────────┐ │ database │ └──────────────┘
In this way, the ability to read and write data is greatly simplified when writing applications.
Data model
The database organizes, stores, and manages data according to the data structure. In fact, the database has three models:
- Hierarchical model
- Mesh model
- Relational model
A hierarchical model is a way of organizing data in a hierarchical relationship of “up and down”. The data structure of a hierarchical model looks like a tree:
┌─────┐ │ │ └─────┘ │ ┌───────┴───────┐ │ │ ┌─────┐ ┌─────┐ │ │ │ │ └─────┘ └─────┘ │ │ ┌───┴───┐ ┌───┴───┐ │ │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │ │ │ │ │ │ └─────┘ └─────┘ └─────┘ └─────┘
The mesh model connects each data node with many other nodes, and its data structure looks like a road network between many cities:
┌─────┐ ┌─────┐ ┌─│ │──────│ │──┐ │ └─────┘ └─────┘ │ │ │ │ │ │ └──────┬─────┘ │ │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ │ │─────│ │─────│ │ └─────┘ └─────┘ └─────┘ │ │ │ │ ┌─────┴─────┐ │ │ │ │ │ │ ┌─────┐ ┌─────┐ │ └──│ │─────│ │──┘ └─────┘ └─────┘
The relational model treats the data as a two-dimensional table. Any data can be uniquely determined by the row number + column number. Its data model looks like an Excel table:
┌─────┬─────┬─────┬─────┬─────┐ │ │ │ │ │ │ ├─────┼─────┼─────┼─────┼─────┤ │ │ │ │ │ │ ├─────┼─────┼─────┼─────┼─────┤ │ │ │ │ │ │ ├─────┼─────┼─────┼─────┼─────┤ │ │ │ │ │ │ └─────┴─────┴─────┴─────┴─────┘
Over time and market competition, ultimately, the relational model based relational database gained absolute market share.
Why is the relational database the most widely used?
Because the relational model is simpler to understand and use than the hierarchical model and the mesh model.
The relational model of relational databases is based on mathematical theory. We define a domain as a set of values with the same data type. Given a set of fields D1, D2,…, Dn, their Cartesian set is defined as D1 × D2 × … × Dn = {(d1,d2,…,dn)|di∈Di,i=1,2,…,n}, and a subset of D1×D2×…×Dn is called in the domain D1, D2, …, the relationship on Dn, expressed as R (D1, D2, …, Dn), where R means $#%&^@!&$#;!~%¥%:(… I don’t understand at all, and everyone doesn’t need to understand.
Although the relational model based on mathematical theory is quite complicated, the relationship model based on daily life is very easy to understand. Let’s take the school class as an example. A class student can save it in a form and define it as follows:
ID | Name | Class ID | gender | age |
---|---|---|---|---|
1 | Xiao Ming | 201 | M | 9 |
2 | Xiaohong | 202 | F | 8 |
3 | Xiaojun | 202 | M | 8 |
4 | noob | 201 | F | 9 |
Among them, the class ID corresponds to another class table:
ID | name | head teacher |
---|---|---|
201 | Second grade class | Teacher Wong |
202 | Second grade second class | Teacher Li |
By giving a class name, you can find a class record. According to the class ID, you can find multiple student records. In this way, the two-dimensional table establishes a “one-to-many” relationship through ID mapping.
type of data
For a relational table, in addition to defining the name of each column, you also need to define the data type of each column. Standard data types supported by relational databases include values, strings, time, and so on:
name | Types of | Description |
---|---|---|
INT | Integer | 4-byte integer type with a range of approximately +/- 2.1 billion |
BIGINT | Long integer | 8-byte integer type with a range of approximately +/- 922 billion |
REAL | Floating point | 4-byte floating point number, range +/-10 38 |
DOUBLE | Floating point | 8-byte floating point number, range approximately +/-10 308 |
DECIMAL(M,N) | High precision decimal | The decimal specified by the user, for example, DECIMAL (20, 10) represents a total of 20 digits, of which 10 decimal places, usually used for financial calculations |
CHAR(N) | Fixed length string | Stores a string of the specified length. For example, CHAR(100) always stores a string of 100 characters. |
VARCHAR(N) | Variable length string | Store variable-length strings. For example, VARCHAR(100) can store strings of 0~100 characters. |
BOOLEAN | Boolean type | Store True or False |
DATE | Date type | Store the date, for example, 2018-06-22 |
TIME | Time type | Storage time, for example, 12:20:59 |
DATETIME | Date and time type | Store date + time, for example, 2018-06-22 12:20:59 |
The most common data types are listed in the table above. Many data types also have aliases, for example, REALthey can be written FLOAT(24). There are also some data types that are not commonly used, for example, TINYINT(ranging from 0 to 255). Each database vendor also supports specific data types, for example JSON.
When selecting a data type, choose the appropriate type based on the business rules. In general, it BIGINTcan meet the needs of integer storage and VARCHAR(N)meet the needs of string storage. These two types are the most widely used.
Mainstream relational database
At present, the mainstream relational databases are mainly divided into the following categories:
- Commercial databases, such as: Oracle , SQL Server , DB2, etc.;
- Open source databases such as MySQL , PostgreSQL, etc.
- Desktop database, represented by Microsoft Access , suitable for desktop applications;
- Embedded database, represented by Sqlite , is suitable for mobile applications and desktop applications.
SQL
What is SQL? SQL is an abbreviation for Structured Query Language, which is used to access and manipulate database systems. SQL statements can query data in the database, add, update, and delete data in the database, as well as manage and maintain the database. Different databases support SQL, so that we can operate a variety of different databases by learning the language of SQL.
Although SQL has been defined as a standard by ANSI organizations, unfortunately, different databases have less consistent support for standard SQL. Also, most databases have been extended on standard SQL. That is to say, if only standard SQL is used, theoretically all databases can be supported, but if you use the extended SQL of a particular database, you can’t execute a database. For example, Oracle calls its own extended SQL PL/SQL, and Microsoft calls its own extended SQL T-SQL.
The reality is that if we only use the core functionality of standard SQL, then all databases can usually be executed. Infrequently used SQL functions, different databases support different degrees. The respective extended functions supported by each database are usually referred to as “dialects”.
In general, the SQL language defines several capabilities for manipulating databases:
DDL: Data Definition Language
DDL allows users to define data, that is, create tables, delete tables, and modify table structures. Typically, DDL is performed by a database administrator.
DML: Data Manipulation Language
DML provides users with the ability to add, delete, and update data. These are the daily operations of the application against the database.
DQL: Data Query Language
DQL allows users to query data, which is also the most common daily operation of the database.
Grammatical features
SQL language keywords are not case sensitive! ! ! However, for different databases, for table names and column names, some databases are case-sensitive, and some databases are not case-sensitive. The same database, some are case sensitive on Linux, and some are not case sensitive on Windows.
Therefore, this tutorial stipulates that the SQL keyword is always capitalized to highlight, the table name and column name are lowercase.
Discussion about this post