Talk:Boltzmann machine - Misplaced Pages

This is an old revision of this page, as edited by 92.0.230.198 (talk) at 17:31, 27 June 2015 (→Yes, but does it do? What is it for? ;o): new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Global Energy

Why does the global energy function have:

\sum \limits _{i<j}\cdots

Shouldn't this be:

\sum \limits _{i,j}\cdots

But I could be misunderstanding... 129.215.26.79 (talk) 15:31, 13 May 2014 (UTC)

Never mind, I see: it just saves having to divide by two to account for double counting. 129.215.26.79 (talk) 12:43, 15 May 2014 (UTC)

Looking at it from the point of view of a programmer, it says "Don't do all the work twice". ;-) 92.0.230.198 (talk) 17:25, 27 June 2015 (UTC)
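
To make the double-counting point concrete, here is a minimal sketch. The weight matrix and state are made-up values, and it assumes symmetric weights with a zero diagonal, as the article's energy function does. Summing over i<j visits each pair once; summing over all ordered pairs with i≠j visits each pair twice, so it has to be halved.

```python
from itertools import combinations

def pair_sum_upper(w, s):
    """Sum w[i][j]*s[i]*s[j] over unordered pairs i < j (each pair once)."""
    return sum(w[i][j] * s[i] * s[j] for i, j in combinations(range(len(s)), 2))

def pair_sum_full(w, s):
    """Sum over all ordered pairs with i != j (each pair is visited twice)."""
    n = len(s)
    return sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n) if i != j)

# Hypothetical symmetric weights (zero diagonal) and one binary state.
w = [[0, 2, -1],
     [2, 0, 3],
     [-1, 3, 0]]
s = [1, 0, 1]

upper = pair_sum_upper(w, s)  # each pair counted once
full = pair_sum_full(w, s)    # each pair counted twice
```

With these values `full` is exactly `2 * upper`, which is the factor of two the i<j form avoids.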

Training sign

I removed the minus sign from the RHS of this:

{\frac {\partial {G}}{\partial {w_{ij}}}}={\frac {1}{T}}\left[p_{ij}^{+}-p_{ij}^{-}\right]

If p+ is clamped and p- is unclamped, then we want to make the weights more like the correlations of the clamped phase and less like those of the unclamped phase, I think ... please check this! Charles Fox

You're incorrect, the minus sign is needed —Preceding unsigned comment added by 72.137.60.77 (talk) 17:36, 5 April 2009 (UTC)
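
A short worked step may reconcile the two comments. It assumes G is the divergence between the clamped and unclamped distributions that training minimizes, and that the weights are changed by gradient descent with a learning rate \epsilon (the symbol \epsilon is an assumption, not taken from the article). Under those assumptions the minus sign belongs in the gradient, and it cancels in the update rule, so the weights still move toward the clamped correlations:

```latex
\frac{\partial G}{\partial w_{ij}} = -\frac{1}{T}\left[p_{ij}^{+} - p_{ij}^{-}\right]
\quad\Longrightarrow\quad
\Delta w_{ij} = -\epsilon\,\frac{\partial G}{\partial w_{ij}}
             = \frac{\epsilon}{T}\left[p_{ij}^{+} - p_{ij}^{-}\right]
```

So w_{ij} increases when p_{ij}^{+} > p_{ij}^{-}, which matches the intuition in the first comment while keeping the minus sign in the derivative.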

What does "marginalize" mean in the following?

"We denote the converged distribution, after we marginalize it over the visible units V, as P−(V)." There is no other instance of this word in the article. Even a technically-minded reader wouldn't understand this article if the word isn't defined anywhere. - Will

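As a rough illustration of what marginalizing means here: take the joint distribution over all units and sum out the hidden ones, leaving a distribution over the visible units only. The sketch below assumes 0/1 units, the energy E = -(sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i), and a tiny made-up network with two visible units and one hidden unit. All names and numbers are hypothetical.

```python
from itertools import product
from math import exp

def energy(state, w, theta):
    """Global energy for one joint state (pairs counted once, i < j)."""
    n = len(state)
    pair = sum(w[i][j] * state[i] * state[j] for i in range(n) for j in range(i + 1, n))
    bias = sum(theta[i] * state[i] for i in range(n))
    return -(pair + bias)

def joint_distribution(w, theta, temperature=1.0):
    """Boltzmann distribution over every joint state, by brute-force enumeration."""
    states = list(product([0, 1], repeat=len(theta)))
    weights = [exp(-energy(s, w, theta) / temperature) for s in states]
    z = sum(weights)  # partition function
    return {s: wt / z for s, wt in zip(states, weights)}

def marginal_over_visible(joint, n_visible):
    """Sum out the hidden units: P(v) = sum over h of P(v, h)."""
    marginal = {}
    for state, p in joint.items():
        v = state[:n_visible]  # the first n_visible units are treated as visible
        marginal[v] = marginal.get(v, 0.0) + p
    return marginal

# Hypothetical parameters: units 0 and 1 are visible, unit 2 is hidden.
w = [[0.0, 0.5, 1.0],
     [0.5, 0.0, -1.0],
     [1.0, -1.0, 0.0]]
theta = [0.1, -0.2, 0.0]

joint = joint_distribution(w, theta)
p_visible = marginal_over_visible(joint, n_visible=2)
```

Enumeration like this only works for toy sizes; it is meant to show the definition, not a practical way to obtain the distribution.
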
CRF

Is the Boltzmann machine the same as a Conditional Random Field? If so that should be mentioned somewhere!

No, it isn't. A CRF can, however, be viewed as a convexified Boltzmann machine with hand-picked features. - DaveWF 06:10, 19 April 2007 (UTC)

The threshold

What is the importance of the threshold parameter? How is it set?

Learned like any other parameter. Just have a connection wired to '+1' all the time instead of another unit. I should add this. - DaveWF 06:10, 19 April 2007 (UTC)

Can threshold be referred to as bias? Also, the link on threshold takes you to the disambiguation page, which has no articles describing threshold in this context.

The term threshold is wrong in this context. As Boltzmann machines are described in this article, there is no threshold function, and thus no threshold. Instead, Theta is a bias here. If a unit is activated, the bias Theta of that unit is added to the total energy function. I have updated the text accordingly. I assume this is a copy-and-paste error: the term threshold was copied over from the description of Hopfield networks, where it actually makes sense, since Hopfield networks have a threshold function. - sebastian.stueker 23:30, 17 May 2013 (UTC)
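
The suggestion above of wiring a connection to something that is always '+1' can be sketched as follows. It assumes the energy E = -(sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i); the function names and numbers are made up for illustration. Folding each bias into a weight to an extra always-on unit gives the same energy, which is why the bias can be learned like any other weight.

```python
def energy_with_bias(s, w, theta):
    """Energy with explicit per-unit bias terms."""
    n = len(s)
    pair = sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -(pair + sum(theta[i] * s[i] for i in range(n)))

def fold_bias_into_weights(w, theta):
    """Return (n+1)x(n+1) symmetric weights; the extra unit at index n is always on."""
    n = len(theta)
    big = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            big[i][j] = w[i][j]
        big[i][n] = theta[i]  # bias of unit i becomes its weight to the always-on unit
        big[n][i] = theta[i]
    return big

def energy_no_bias(s_ext, w_ext):
    """Energy using pairwise weights only (no separate bias term)."""
    n = len(s_ext)
    return -sum(w_ext[i][j] * s_ext[i] * s_ext[j] for i in range(n) for j in range(i + 1, n))

# Hypothetical two-unit network.
w = [[0, 2],
     [2, 0]]
theta = [-1, 3]
s = [1, 1]

e_bias = energy_with_bias(s, w, theta)
e_folded = energy_no_bias(s + [1], fold_bias_into_weights(w, theta))  # append the always-on unit
```

Both calls give the same energy for every state of the original units, not just the one shown.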

The Training Section

I have a problem understanding what P+(Vα) is. P+ is the distribution of the states after the values for Vα are fixed. So P+(Vα) should be 1 for those fixed values and 0 for any other values of Vα.

Also, what does α iterate over in the summation for G?
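
One possible reading, offered as a sketch rather than a definitive answer: P+(V) is the distribution over visible vectors across the whole training set (not a single clamped example), and α runs over every possible visible vector, with G the Kullback-Leibler divergence between P+(V) and P−(V). Under that assumption, G can be computed as below. The training set and the uniform stand-in for P−(V) are made up.

```python
from collections import Counter
from itertools import product
from math import log

def empirical_distribution(samples):
    """P+(V): relative frequency of each visible vector in the training set."""
    counts = Counter(tuple(v) for v in samples)
    total = len(samples)
    return {v: c / total for v, c in counts.items()}

def kl_divergence(p_plus, p_minus):
    """G = sum over alpha of P+(V_alpha) * ln(P+(V_alpha) / P-(V_alpha)).

    Visible vectors with P+ = 0 contribute nothing, so only vectors that
    occur in the training set need to be visited.
    """
    return sum(p * log(p / p_minus[v]) for v, p in p_plus.items() if p > 0)

# Hypothetical training set of visible vectors.
training_set = [(1, 1, 0), (1, 0, 1), (1, 0, 0), (1, 1, 0)]
p_plus = empirical_distribution(training_set)

# Hypothetical free-running distribution: uniform over all 8 visible vectors.
p_minus = {v: 1 / 8 for v in product([0, 1], repeat=3)}

g = kl_divergence(p_plus, p_minus)
```

G is zero only when the two distributions agree on every visible vector that has nonzero P+, so with several training inputs the network is trained to match their overall distribution.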

The cost function

What is the cost function? What cost does it measure? How do we train the network if we have more than one input?

{-1,1} or {0,1} ?

In the definition of s the article claims that s_i is either -1 or 1. Five lines below, it says that the nodes are in state 0 or 1, which is also what I found in (admittedly older) literature on the subject. Is the {-1,1} simply wrong or am I missing something? —Preceding unsigned comment added by Drivehonor (talk • contribs) 13:56, 7 August 2007

I think either representation should work. But I'm not sure. Can anyone confirm this? —Preceding unsigned comment added by Zholyte (talk • contribs) 19:47, 10 November 2007 (UTC)

You probably just don't understand, because it doesn't matter at all. —Preceding unsigned comment added by 130.15.15.193 (talk) 23:12, 2 December 2009 (UTC)
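
A small check of why the choice may not matter, assuming the energy E = -(sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i) and the change of variables s = 2x - 1 between {-1,1} states s and {0,1} states x. The parameters below are hypothetical. Rescaling the weights and shifting the biases makes the two energies differ only by a constant, and a constant offset cancels in the Boltzmann distribution.

```python
from itertools import product

def energy(state, w, theta):
    """Global energy with pairs counted once (i < j)."""
    n = len(state)
    pair = sum(w[i][j] * state[i] * state[j] for i in range(n) for j in range(i + 1, n))
    return -(pair + sum(theta[i] * state[i] for i in range(n)))

def to_binary_parameters(w, theta):
    """Map parameters for {-1,1} units to equivalent ones for {0,1} units.

    Substituting s = 2x - 1 gives w' = 4w, theta'_i = 2*theta_i - 2*sum_{j != i} w_ij,
    and a state-independent constant.
    """
    n = len(theta)
    w01 = [[4 * w[i][j] for j in range(n)] for i in range(n)]
    theta01 = [2 * theta[i] - 2 * sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    const = sum(theta) - sum(w[i][j] for i in range(n) for j in range(i + 1, n))
    return w01, theta01, const

# Hypothetical symmetric weights (zero diagonal) and biases for {-1,1} units.
w = [[0, 1, -2],
     [1, 0, 3],
     [-2, 3, 0]]
theta = [1, 0, -1]

w01, theta01, const = to_binary_parameters(w, theta)

# Energy gap between the two representations, collected over every state.
gaps = set()
for x in product([0, 1], repeat=3):
    s = tuple(2 * xi - 1 for xi in x)
    gaps.add(energy(s, w, theta) - energy(x, w01, theta01))
```

The set `gaps` ends up holding a single value equal to `const`, so both representations define the same distribution over states, just with different numerical weights and biases.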

Etymology

WHY is it called a Boltzmann machine? Is it named after Ludwig Boltzmann? The Ludwig Boltzmann article references this one... but that can't be conclusive. --Nehushtan (talk) 22:10, 12 January 2009 (UTC)

Yes, it's named for Ludwig Boltzmann. AmiDaniel (talk) 08:31, 26 September 2011 (UTC)

Because the underlying energy minimization strategy involves the Boltzmann Distribution p.r.newman (talk) 13:45, 19 May 2013 (UTC)

Question: the new phase of learning?

Question: "Later, the weights are updated to maximize the probability of the network producing the completed data." What does this mean? Is this a new phase of learning? Does it mean that the $p_{ij}^{+}$ are held constant in this phase, computed once as a characteristic of the training set? For example, if our data set is {{1,1,0},{1,0,1},{1,0,0}}, then $p_{12}^{+}=1/3$, $p_{13}^{+}=1/3$, and $p_{23}^{+}=0$ in all later iterations? Peter 212.76.37.154 (talk) 16:04, 28 January 2009 (UTC)
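
The numbers in the example above can be checked with a short sketch. It assumes every unit is visible and that $p_{ij}^{+}$ is the fraction of training vectors in which units i and j are both on; the function name is made up.

```python
from fractions import Fraction
from itertools import combinations

def clamped_cooccurrence(data):
    """Fraction of training vectors in which units i and j are both on (1-based keys)."""
    n = len(data[0])
    total = len(data)
    return {(i + 1, j + 1): Fraction(sum(v[i] * v[j] for v in data), total)
            for i, j in combinations(range(n), 2)}

# The data set from the question above.
data = [(1, 1, 0), (1, 0, 1), (1, 0, 0)]
p_plus = clamped_cooccurrence(data)
```

In this fully visible case the values depend only on the data, so they can be computed once. If the machine has hidden units, any $p_{ij}^{+}$ involving a hidden unit depends on the current weights and would need to be re-estimated as the weights change.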

Please improve the first paragraph

The first paragraph (the description) fails to describe what a Boltzmann machine is. It only talks about what it is not. Given how many things are not a Boltzmann machine, that is a bit wasteful... The class to which this particular network belongs (and to which this article links) lacks a description (it is a stub, or something automatically generated and not informative). Other explanations are given by counterexample, i.e. the article says that this network is a counterpart of something else and that it can't be used for something. Neither statement helps in understanding what it is or where it can be useful. 79.181.224.222 (talk) 22:19, 16 December 2012 (UTC)

Tidy up of citations needed

I moved the Ackley et al. citation in the article to be an in-line reference but then got daunted by trying to bring the other citations and further readings into line with Misplaced Pages standards. I'll try to get back to it but hope others will feel free to take it on! p.r.newman (talk) 13:45, 19 May 2013 (UTC)

Incorrect statement about scalability?

"the time the machine must be run in order to collect equilibrium statistics grows exponentially with the machine's size". I've talked to ML researchers who have disputed this point. This "fact" has been here for years - do we have a reference on it?

Yes, but what does it do? What is it for? ;o)

The article tells us what it looks like and how to train it but I can't for the life of me see what it takes as input and what it gives as output. A bit more on that and especially an example or two would be a big improvement. 92.0.230.198 (talk) 17:31, 27 June 2015 (UTC)