DB Table Optimization
@!
Hubzilla Support Forum When looking at table schemas, it is always noticeable that field definitions with CHAR(191) are found very often. This may have historical reasons, which are not clear to me today. The field type CHAR leads to a fixed size in the table rows and in the example to a fixed space requirement of 191 characters (not necessarily 191 bytes). If such a field content is selected by PHP, an automatic trim takes place, so that if 100 characters are used in the field of 191 characters, 100 characters are available in terms of content. So far this is also correct. Only the disadvantage is given that in the storage of the table place for 191 characters is held. This is also correct, because it can happen that values of up to 191 characters are inserted or updated in this field. The more rows such a table has, the more space is wasted unnecessarily. Both mysql and postgresql offer the field type VARCHAR. This has the great advantage that table fields only take up space in storage with their current contents. An additional length information is stored for it, but this requires at most 1 to 3 bytes (depending on the now maximum field size). In the example, the space requirement for VARCHAR would be 100 characters plus 1 byte.
In an actual example with the item table, and on a small hub instance, the difference between CHAR and VARCHAR is considerable:
With 20 fields of type CHAR:
SELECT AVG_ROW_LENGTH FROM information_schema.tables WHERE TABLE_NAME = 'item';
= 33.809
= 182.7 MiB (data 140.5 index 42.2)
With 20 fields test wise as VARCHAR instead of CHAR:
SELECT AVG_ROW_LENGTH FROM information_schema.tables WHERE TABLE_NAME = 'itemvar';
= 18.438
= 101.8 MiB (data 84.6 index 17.3)
Thus, VARCHAR requires only 56% of the storage compared to CHAR.
The item table is one of the largest in Hubzilla and on large Hub instances significant improvements can be made by making slight changes to the table schema.