Priors are then not merely subjective but useful; the OP is about the problem of choosing the best priors. The best options are informative priors (1) and regularizers (2). For example, choosing a Laplace distribution as the prior for the unknown parameters is equivalent to the LASSO, a well-known way of obtaining sparse models with few coefficients. In (2) there is an example in which a prior suggests a useful regularization method for regression. In (3) the author discusses prior modeling.
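The Laplace-prior/LASSO connection is easiest to see in one dimension: with a Gaussian likelihood, the MAP estimate under a Laplace prior is soft-thresholding of the observation. A minimal sketch in plain Python (function name and the unit-variance assumption are mine):

```python
import math

def soft_threshold(y, lam):
    """MAP estimate of theta for y ~ N(theta, 1) with a Laplace(0, 1/lam)
    prior: argmin over theta of 0.5*(theta - y)**2 + lam*abs(theta).
    The L1 penalty (the negative log of the Laplace prior) pushes small
    observations exactly to zero, which is why the LASSO yields sparse models.
    """
    return math.copysign(max(abs(y) - lam, 0.0), y)

# Small observations are zeroed out (sparsity); large ones are shrunk.
print(soft_threshold(0.3, 0.5))  # 0.0
print(soft_threshold(2.0, 0.5))  # 1.5
```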
The platform allows the agents to create new tools and organize knowledge. It's about 60,000 lines of code, roughly six months of work at 10 hours/day, seven days a week. I hope my friend gets the appropriate funding to continue with his project. As Karpathy recommends in one of his talks: first get the system to top capability, then reduce costs.
By solving a hard problem, or by providing a new way of attacking one, an agent can show behavior that mimics intelligence. What kind of problem should this system attack?
>> find the GCD (greatest common divisor) of the smallest and largest numbers in an array
Just for a short comparison, in J the analogous code is <./ +. >./
Where / is the reduce adverb, +. is GCD, and *. is LCM.
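For readers without J at hand, the same computation written out in Python (function name is mine):

```python
from math import gcd

def gcd_min_max(xs):
    """GCD of the smallest and largest numbers in the array,
    the same computation as the J one-liner above."""
    return gcd(min(xs), max(xs))

print(gcd_min_max([12, 8, 30, 18]))  # gcd(8, 30) = 2
```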
The basic idea of J notation is using a small change in a symbol to mean the opposite: for example {. for first and {: for last, {. for take and }. for drop (one symbol can be used as a unary or binary operator with different meanings). So if floor is <. you can guess what the symbol for ceiling will be. As another example, /:~ sorts in ascending order, and I imagine you can guess the symbol for sorting in descending order. In a sense, J notation includes some semantic meaning, and an LLM could use that notation to try to change an algorithm. So perhaps someone could think about how to expand this idea for LLMs to generate new algorithms.
The matrix m, the sums over its rows (the column totals), and the maximum of those sums, in J (separated by ;):
m ; (+/ m) ; >./ +/ m
┌─────┬───────┬──┐
│0 1 2│9 12 15│15│
│3 4 5│ │ │
│6 7 8│ │ │
└─────┴───────┴──┘
To understand this you need to know that >. and <. are the max and min functions, and that in J three functions separated by spaces, f g h, constitute a new function (a "fork") mathematically defined by (f g h)(x) = g(f(x), h(x)). An example is (+/ % #), which applied to a list gives the mean of the list: +/ gives the total, # gives the number of elements, and % is division.
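The fork can be sketched as a higher-order function; a Python version of (+/ % #) under that reading (names are mine):

```python
def fork(f, g, h):
    """J's monadic fork: (f g h)(x) = g(f(x), h(x))."""
    return lambda x: g(f(x), h(x))

# (+/ % #): total divided by count, i.e. the mean.
mean = fork(sum, lambda a, b: a / b, len)
print(mean([1, 2, 3, 4]))  # 2.5
```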
I didn't know about his inequalities, but I found (1), which provides an example of applying Talagrand's inequality to the longest increasing subsequence problem (12 pages, easy to read). It seems to be a broad generalization of the Hamming distance.
I think your comment is very interesting. I have reflected many times on how to differentiate things that appear in the same context from things that are similar. Any big idea here could be the spark that initiates a great startup.
In Spanish we have ensayo = attempt or try; it is frequently used to refer to rehearsals before the real show or performance (at a music festival, in a theater, ...). Ensayo is also exactly equivalent to the "essay" that Montaigne introduced when he used the word for a written composition.
(1) https://en.wikipedia.org/wiki/Prior_probability#Informative_...
(2) https://skeptric.com/prior-regularise/index.html
(3) https://betanalpha.github.io/assets/case_studies/prior_model...