Re: Constructing master lists of data types Lassi Kortela 30 Sep 2019 15:09 UTC
> What are the various levels of Scheme data types we want to support, and how, for databases?

In my experience this kind of task is usually best approached by making a survey and putting the results in a huge table. It gives confidence that we're making the right choices, and the survey usually turns up things that are surprisingly easy or surprisingly hard. That can change the entire design.

> Text is *messy*

Yes. We may have to delegate most of the messes to DB engine configuration or leave them as the user's problem :)

> databases tend to be set to one encoding

Yes, but the percentage of data in a large database that actually uses the claimed encoding is another matter :) I've dealt with huge databases that had continual character encoding problems. Likewise, people who administer long-lived web forums can be heard complaining about this.

> but SQLite3 is more flexible. Suggestions on precisely what to do here are solicited, this is not a thing I know very well.

If the DB engine's encoding is known, and we're running in a Scheme implementation like Gauche that supports many character encodings, we can construct strings with the correct encoding right off the bat. In Schemes that use only one character encoding internally, when the query results use a different encoding, we should probably return the results as bytevectors instead of strings. Then the user can hook up a charset conversion library if they want strings (or reconfigure their database to emit the encoding the Scheme uses). DB engines are big pieces of software, so many of them may have charset conversion engines built in, letting us pick the encoding we want to receive in the connection string (https://www.connectionstrings.com/).

> At the database side, I need to construct something that contains all the types a column (or whatever for e.g. graph databases) can have, which are supported by which databases, and with what quirks.
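The strings-vs-bytevectors decision above can be sketched roughly as follows. This is only an illustration of the idea, written in Python for concreteness (the function name and API are my own, not sdbi's): try to decode the raw column bytes with the database's declared encoding, and fall back to handing back the raw bytes (the bytevector case) when the data doesn't actually match the claimed encoding.

```python
def convert_text(raw: bytes, db_encoding: str):
    """Decode a raw column value using the DB's declared encoding.

    Returns a string when the bytes really are valid in that encoding;
    otherwise returns the raw bytes unchanged (the "bytevector" case),
    so the caller can apply its own charset conversion library.
    """
    try:
        return raw.decode(db_encoding)
    except (UnicodeDecodeError, LookupError):
        return raw

# Data that matches the claimed encoding decodes to a string...
assert convert_text("naïve".encode("utf-8"), "utf-8") == "naïve"
# ...but bytes that lie about their encoding come back untouched.
assert convert_text(b"\xff\xfe", "utf-8") == b"\xff\xfe"
```

The same shape works per-column rather than per-connection, which matters for engines like SQLite3 where the declared encoding is advisory.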
This will eventually become architecture and code to Do The Right Thing in the middle of the sdbi stack, automagically, or at the direction of the user. Auto-conversion is very nice when it works. I suspect the problem domain is complex enough that users should be able to configure which auto-converter (if any) to use for each data type. In scenarios that require high performance on big data sets, it can help to turn off unused conversions. And if you're interfacing to a legacy DB with particularly weird data, some standard conversions may not work right.
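A per-type converter registry like the one described above could look something like this. Again a hypothetical sketch in Python, not sdbi's actual design: a table maps column type names to converter functions, unknown types pass through unchanged, and the user can delete an entry to turn a conversion off for performance or for weird legacy data.

```python
# Hypothetical per-type auto-converter registry; names and API are
# assumptions for illustration, not part of sdbi.
from datetime import date

converters = {
    "DATE": date.fromisoformat,  # auto-convert ISO date strings
    "INTEGER": int,
}

def convert_column(type_name, value, converters):
    """Apply the configured converter for a column type, if any.

    Types with no registered converter pass through unchanged, which
    is also what happens after the user disables a conversion.
    """
    conv = converters.get(type_name)
    return conv(value) if conv else value

assert convert_column("INTEGER", "42", converters) == 42
assert convert_column("DATE", "2019-09-30", converters) == date(2019, 9, 30)

# Turning a conversion off: the raw value now passes through.
del converters["DATE"]
assert convert_column("DATE", "2019-09-30", converters) == "2019-09-30"
```

The survey table proposed earlier would essentially become the default contents of this registry, with one table per DB engine to capture the quirks.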