Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ENUM type into DBAL 4 #6465

Closed
michnovka opened this issue Jul 7, 2024 · 10 comments
Closed

Add ENUM type into DBAL 4 #6465

michnovka opened this issue Jul 7, 2024 · 10 comments

Comments

@michnovka
Copy link

Feature Request

When DBAL came out first, there were no PHP enums. Now we have them, they are widely used and enum support in ORM has increased in past versions. However we still map them to varchar or other string types. Most databases support ENUM types, and the performance issues with ENUMs have been debunked (seriously, most of the criticism and hate for ENUM types stems from this article from 2011, and most of the claims are no longer valid). or fixed in the past years.

All supported SQL platforms support ENUM types (or alternative for SQLite CHECK) and for those that do not, a string could be leveraged same way it is now.

I would like to add enum support natively into DBAL

Q A
New Feature yes
RFC yes

Summary

This will be tricky mainly because unlike other types enums have not length, but sets of values. We will need to extend Doctrine\DBAL\Schema\Column and add ?array $_enumOptions field.

Another issue is that the EnumType class should return different types in convertToPHPValue, though all will inherit from BackedEnum

And most diffucult problem, which is one I stumbled on whilst working on doctrine/migrations#1441 , - from PHP point of view, every EnumType is a different class. If you have ColorEnumType and TasteEnumType they generate ENUM('red','black') and ENUM('salty','sweet') but DBAL has no idea how to map these to the right EnumType. It gets even worse if you have 2 PHP enum types, one for ColorEnumType and AnotherColorEnumType both with just values red and black, at which point even if DBAL could guess the propery type based on the ENUM values, it will be confused here.

The ONLY solution I see is to add back comments with typehint. At least for ENUMs. It makes sense here.

If we add comments back, I can even see a way to avoid touching the Doctrine\DBAL\Schema\Column and make it work.

This is a complex task, I am willing to work on it, but before I start, I would like to have a consensus that this is the way we can go, and that adding comments back is something thats not a no-go. We can really use comment just for enums, so that the EnumType would know to look there, while it would be ignored for other types.

@greg0ire
Copy link
Member

greg0ire commented Jul 8, 2024

I think of that article every time somebody mentions enums. Out of the 8 reasons it lists, which ones exactly would you say are no longer valid/fixed and why?

@michnovka
Copy link
Author

michnovka commented Jul 8, 2024

Well,

1. Data isn't being treated like data

The critique that ENUM misplaces data by embedding it in schema metadata rather than storing it as standard data overlooks its purpose of enforcing data integrity. ENUM is beneficial for fields with a stable and limited set of values, ensuring that only valid, predefined options are entered. This is akin to other data constraints that prevent invalid data types or null values in specific fields, thus maintaining data quality and consistency without additional application-layer checks.

2. Changing the member list of ENUM columns is very expensive

The concern that altering ENUM lists in MySQL is resource-intensive has been addressed in more recent versions. MySQL 8.0 and later offer improved capabilities for managing ALTER TABLE operations that involve ENUMs. These versions have been optimized to handle certain schema modifications more efficiently, potentially without requiring a full table rebuild, depending on the specific change and context.

In MySQL 8.0.17 and newer, certain types of ALTER TABLE operations can be performed instantly — specifically, those that don't modify the storage requirements of the underlying ENUM values. For instance, adding new ENUM values that don't cause the number of potential values to exceed 255 (the limit for a single byte representation) can often be handled without a full table copy​. However, operations that change ENUMs from a one-byte to a two-byte representation because the list exceeds 256 distinct values still require a more involved process.

Moreover, the application of InnoDB’s online DDL capabilities allows for many ALTER TABLE operations to be performed in place, which means the database remains available for read and write operations during the schema change, significantly reducing the operational impact​.

3.It's impossible to add additional attributes or related info

The criticism that ENUM types cannot accommodate related attributes like a reference table is valid but overlooks the specific purpose and use cases of ENUM. ENUM is designed for fixed sets of values where additional attributes or frequent changes to data are not expected. It excels in scenarios requiring high data integrity and performance optimization due to its internal handling as integers, which speeds up comparisons and indexing.

Moreover, if we store the value as a varchar (which is current approach in Doctrine) and not as a JOIN TABLE, this reason applies equally, so ENUM vs VARCHAR behaves same in this case.

4. Getting a list of distinct ENUM members is a pain.

Well this is debatable. SELECT COLUMN_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'your_table' AND COLUMN_NAME = 'your_enum_column';. And in case of our SchemaManager, when this is actually needed, this query (or a similar one) is actually already executed, so we get the list of values by default in the column details.

5. ENUM columns may only offer limited or negligible effects on optimization.

The argument that ENUM's performance optimization is negligible until a certain scale is reached overlooks the specific contexts and environments where ENUM's efficiencies become significant.

Efficient Storage and Access: ENUM types are stored as integers internally, which can lead to more efficient data storage and faster access compared to strings, especially in tables with a large number of rows and frequent access patterns that involve these ENUM columns. This storage efficiency reduces the database size and can enhance performance by minimizing I/O operations, which is crucial for high-volume transaction systems.

Reduction of Joins: While modern SQL databases handle joins efficiently, eliminating unnecessary joins can simplify query execution plans. This is particularly beneficial in read-heavy databases where reducing the number of joins can directly impact the response time and system load. ENUMs eliminate the need for joins to reference tables for each query, thereby simplifying the underlying SQL and potentially reducing the execution time for queries involving these ENUM columns.

In conclusion, while ENUM may not always offer significant performance benefits over reference tables, in scenarios where the dataset and query patterns align with ENUM's strengths, it can provide both performance improvements and schema simplifications. Therefore, the decision to use ENUM should be driven by specific needs and empirical data rather than a one-size-fits-all approach.

6. You can't reuse the member-list of an ENUM column in other tables.

The limitation of not being able to reuse ENUM member lists across tables is a valid point but underscores ENUM's intended use for specific, unchanging data sets. ENUM is optimal for attributes with a stable and consistent set of values, enhancing performance by reducing storage and ensuring data integrity without the complexities of join operations. For cases requiring shared and dynamic lists across tables, reference tables are more suitable. The choice between ENUM and reference tables should be based on the specific stability and performance needs of the application, balancing simplicity and flexibility.

7. ENUM columns have noteable gotchas.

The concern that ENUM columns do not robustly enforce data integrity when invalid values are inserted—resulting in the insertion of an empty string (' ') or the numeric index being confused with the value—highlights a need for careful ENUM usage, yes. However just because a data type behaves a bit "strange", does not mean it behaves unexpectedly. As with other types, you have to know what you do when using this specific data type.

Many other types have their own quirks. E.g. TIMESTAMP can only store data until 2038. FROM_UNIXTIME(anything after 2038) gives NULL. FLOAT should not be used for currency and where precision is required, DECIMAL should instead. in Oracle, empty strings are treated as NULL. Etc. So, as long as ENUM behavior is "quirky" and not "undocumented" and "unpredicatable", this should not be an argument.

Strict SQL Mode: When strict SQL mode is enabled in MySQL, the database will reject invalid ENUM values outright instead of substituting them with an empty string or incorrect index. This behavior aligns closer to what you would expect from using a reference table with foreign keys, as it enforces data integrity by preventing invalid data entry at the database level​​.

8. ENUM has limited portability to other DBMS.

Yes, true. There is CHECK for Oracle, SQL Server and SQLite which is NOT the same. But as the article admits:

Generally, migrating to a different database system is something that just doesn't happen that often and everybody assumes will bring out demons anyway, which is why this just squeaks in at number 8 on the list.


I think that it should be up to the developer to decide whether to use ENUMs or not. For me personally, the data inside my DB pose higher value to me than the application. Also, maybe my application is not the only one that will write to the database, and while I can make sure I always write in proper "allowed" values into my VARCHAR representation of enum, will the other application too? With DB ENUM type the boundaries and integrity of my dataset are ensured for anybody writing into the DB.

@derrabus
Copy link
Member

derrabus commented Jul 8, 2024

I would be open to adding an enum type to DBAL. If you want to work on this, please go ahead.

But be warned that this won't be a trivial feature. If we add this type, we need to do it right. This means that you will need to build an extensive functional (!) and portable (!) test suite for this feature, covering:

  • Schema definition and deployment
  • Schema introspection
  • Schema comparison and diff computation
  • Data insertion and retrival

This is an abstraction layer with support for multiple DBMS. While building this feature, you will debug issues with databases you've never worked with before. Be prepared to spin up a DB2 in Docker. ✌🏻

The ONLY solution I see is to add back comments with typehint. At least for ENUMs. It makes sense here.

Absolutely not. The old type comments won't help you here. We must be able to introspect a table that has no comments. And for sure, we must handle the case that the deployed ENUM column and the corresponding PHP enum are out of sync, e.g when you add a new case to the enum and attempt to generate a migration.

This means that the EnumType that we build will need to work without a PHP enum, mapping values to plain old strings in PHP. This will solve your problem of guessing the right enum and my problem of representing a deployed enum that is out of sync. I believe we should even build this type without support for PHP enums in its first iteration.

To me, using an enum in MySQL and using one in PHP are two related but different concerns. I'd like to avoid luring people into thinking that they need to switch the whole database to enum columns just because they use enums in PHP. Mapping a PHP enum to a VARCHAR or INT column is still a good solution. That is especially the case when you have to assume that the list of enum cases might change regularly. Also, mapping a MySQL ENUM to a PHP string can be a valid thing to do, especially if you need to operate on a database with a dynamic/unknown schema.

Also, when you think about it: The ORM is already able to handle the PHP enum hydration, so for a first iteration, it might just be enough to have an EnumType that maps to PHP string.


Regarding that article: It still is valid after all this time even though MySQL has gotten better. The article does not tell us to avoid MySQL ENUMs at all cost. It lists valid reasons for not using it. This means, after taking all of those points into account, we can still make a well-informed decision in favor of a MySQL ENUM.

  1. Data isn't being treated like data

Still valid! I've seen projects where enums are abused for things like "car manufacturer" or "department". And yes, that is data and changing data should not result in the need to altering your schema. If you need integrity checks on this, foreign key constraints got you covered.

  1. Changing the member list of ENUM columns is very expensive

Okay, now it might be expensive and it's not very transparent to the user under which circumstances this operation is expensive. We still need to take into consideration, how often the list of enum values might be changed in the future.

  1. It's impossible to add additional attributes or related info

Still valid. 🤷🏻‍♂️

  1. Getting a list of distinct ENUM members is a pain.

It is! But that's complexity we can bury inside our schema manager or platform classes.

  1. ENUM columns may only offer limited or negligible effects on optimization.

Yes. If you only do it for storage optimization, use a SMALLINT or TINYINT will do nicely.

  1. You can't reuse the member-list of an ENUM column in other tables.

Valid! Change an enum that you've reused in five tables and you need to alter five tables. Yay.

  1. ENUM columns have noteable gotchas.

Valid, but… 🤷🏻‍♂️

  1. ENUM has limited portability to other DBMS.

And here we are at the biggest concern that this type raises for a database abstraction layer. If we build a feature here, it must be portable! A new type that is exclusive to MySQL would be a no-go.


I think that it should be up to the developer to decide whether to use ENUMs or not.

Yes. We're not here to lecture anyone. Using ENUMs in MySQL and friends can be fine. But it's important to me that it's not the natural equivalent to a PHP enum.

@dbu
Copy link
Member

dbu commented Jul 9, 2024

side note: in doctrine ORM, backed PHP enums can already be used in entities (mentioned here in the doc). for the use case of using PHP enums in ORM entities, you don't need enums on database level.

i comment this so people who look for enums in ORM don't get confused. supporting database enums on dbal level would have merit. it would allow to use PHP enums to define the database enum configuration, and would simplify the enum handling explained here in the doc.

@beberlei
Copy link
Member

beberlei commented Jul 9, 2024

Doctrine DBAL is - as you pointed out yourself- not suited to support database level enums. It goes against a few architectural choices we made for Type and will have to workaround this for no real benefit. If you want to use ENUM, just create the DDL SQL yourself. DBAL can only go so far with support for anything databases offer, we have to focus on a common denominator that is maintainable.

So from my POV ENUM support as you outlined with comments is a no merge.

@beberlei
Copy link
Member

beberlei commented Jul 9, 2024

@michnovka also, it feels that your wall of text comment is AI generated, so if it is, i find it disrespectful of you for our time to have us read so much text and expect us to reply to you in a structured way when you let a machine do the thinking for you.

@michnovka
Copy link
Author

@derrabus I've given it some thought over night and did research into ENUM types in other DBMS. Seems that the CHECK alternative is fundamentally different, so it doesn't seem like this can be supported on other platforms than MySQL and friends and PostgreSQL. Hence, since this should be an abstraction layer, probably this is not a good idea to implement.

This idea came out of doctrine/migrations#1441 because Doctrine not recognizing ENUMs is causing schema diff inconsistencies. I thought that while at it, why not just implement an EnumType type once and for all - after all, I use it in almost every project with Doctrine with my custom type (now partially broken due to migrations).

I think I will focus on fixing the migration issue and overall improvement of https://github.com/Elao/PhpEnums which adds the Doctrine ENUM type for MySQL together with some other helpful features for PHP enums.

The old type comments won't help you here. We must be able to introspect a table that has no comments. And for sure, we must handle the case that the deployed ENUM column and the corresponding PHP enum are out of sync, e.g when you add a new case to the enum and attempt to generate a migration.

Fixing migrations to compare actual SQL of old schema instead of generating an "expected" SQL based on deduced type will allow me to use custom ENUM type for my use-case and we will not bloat DBAL with MySQL specific code.

@beberlei I did not generate the text with AI. I was trying to make a strong argument for ENUMs and did research, which I was compiling in Word, so unlike my other replies usually typed on phone it had some structure and grammar/syntax checks. I am sorry if it was too long, I wanted to make a comprehensive argument in one post.

@derrabus
Copy link
Member

derrabus commented Jul 9, 2024

@derrabus I've given it some thought over night and did research into ENUM types in other DBMS. Seems that the CHECK alternative is fundamentally different, so it doesn't seem like this can be supported on other platforms than MySQL and friends and PostgreSQL. Hence, since this should be an abstraction layer, probably this is not a good idea to implement.

Thank you for your answer and for opening this discussion. I'm going to close the issue now.

@PowerKiKi
Copy link
Contributor

@michnovka, you mentioned using https://github.com/Elao/PhpEnums instead. But it also generate VARCHAR instead of ENUM. So why not use the native enumType column attribute, then ? or do you plan to somehow implement support for DB ENUM in elao/enum ?

Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants