> While this was a shock in the moment, I would later learn that this attitude was fairly common among Rails developers at the time who were largely encouraged to treat their database as a dumb data store by convention and to let the application do the work.
This is not a bad attitude, depending on exactly what level of "dumbness" you mean. IMHO, the golden rules for database setup are:
- Spend a lot of time designing your schema. Then spend some more time.
- Do not skimp on data integrity measures like proper primary/foreign keys.
- Pay attention to potential usage patterns and build indices for performance; this is not a one-time activity, so keep observing usage and watch for hot spots (rough SQL sketch below).
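A rough sketch of the integrity and indexing points in plain SQL (PostgreSQL syntax assumed; the tables, columns, and index are made up for illustration):

```sql
-- Integrity lives in the schema itself, not just in application code.
CREATE TABLE customers (
    id         bigserial PRIMARY KEY,
    email      text NOT NULL UNIQUE,              -- uniqueness enforced by the DB
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),  -- proper foreign key
    status      text NOT NULL,
    placed_at   timestamptz NOT NULL DEFAULT now()
);

-- Index driven by an observed access pattern ("orders for a customer, newest first");
-- revisit as the query mix changes.
CREATE INDEX orders_customer_placed_at_idx
    ON orders (customer_id, placed_at DESC);
```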
Some other tangential points that are often overlooked:
- Do not push all your business logic into a hairball of nested stored procs and triggers; debugging/testing all that is a bitch compared to writing unit/integration tests or dropping into a debugger in an IDE. You can be a million times more expressive in a full-featured modern language like Java or C#. The main exception is when you need to push/transform tons of data around and all of it is already in the database - a well-tuned stored proc will be orders of magnitude faster. Even then, this is not something that is needed frequently, and needing it often could be symptomatic of a bigger issue.
- Tons of round trips to the database are expensive; an efficient join instead can do wonders for performance. If you are writing the same join all the time, maybe it's time to put it behind a view. Also, cache what you can, and be conservative about it.
- If you are almost always including the same filter clause in your joins, maybe explore partitioning (sketches of both below).
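Rough sketches of both ideas, reusing the made-up tables from above plus a hypothetical events table (PostgreSQL syntax):

```sql
-- A join you keep repeating can be named once as a view:
CREATE VIEW customer_order_summary AS
SELECT c.id             AS customer_id,
       c.email,
       count(o.id)      AS order_count,
       max(o.placed_at) AS last_order_at
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.email;

-- If nearly every query filters on the same column (here, a timestamp range),
-- declarative range partitioning lets the planner skip whole partitions:
CREATE TABLE events (
    id          bigserial,
    occurred_at timestamptz NOT NULL,
    payload     jsonb,
    PRIMARY KEY (id, occurred_at)   -- the partition key must be part of the PK
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE events_2025 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- e.g. SELECT count(*) FROM events WHERE occurred_at >= '2025-06-01'
-- only touches events_2025.
```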
Basically, treat the database like an appliance that does only a few things, but those things are FUNDAMENTAL, so we make sure it does them very, very well. Don't make it make your coffee too (though PostgreSQL could probably do even that efficiently).
Typically, these rules are not what people mean when they say they use their DB as a dumb data store. What they mean is:
- your "schema" is not that important. Sure, you spend a lot of time modeling data in such a way that your application always handles valid data. Once that's done, your DB is "just" a store for that data.
- data integrity is enforced before data reaches the DB; PKs/FKs are mostly there to ensure fast queries.
The idea is that data in your system should be in a sane state well before it reaches the database, and once it's sane, adding more controls at the DB level is just a layer of administrative checks done in a language that is not really well suited for that.
> There were other factors too, such as depending on Rails to validate uniqueness rather than the database. It resulted in a lot of messy data cleanup that I had to deal with which were causing hard to diagnose issues because race conditions will always happen.
> Second, the database is the only place that can make guarantees about data integrity so use them.
That's what I meant.
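To make the race concrete, here is a sketch of the check-then-insert pattern that application-level uniqueness validation boils down to, and the database-side constraint that actually closes the window (hypothetical users table):

```sql
-- Two sessions racing through an application-level uniqueness check:
--
--   Session A: SELECT 1 FROM users WHERE email = 'a@example.com';  -- no row, looks free
--   Session B: SELECT 1 FROM users WHERE email = 'a@example.com';  -- no row, looks free
--   Session A: INSERT INTO users (email) VALUES ('a@example.com'); -- succeeds
--   Session B: INSERT INTO users (email) VALUES ('a@example.com'); -- also succeeds: duplicate row
--
-- Only the database sees both writers, so only it can close the window:
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);

-- With the constraint in place, the second INSERT fails with a unique_violation
-- the application can catch, instead of silently storing a duplicate.
```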