26, Private, 94936, Assoc-acdm, 12, Never-married, Sales, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
38, Private, 296478, Assoc-voc, 11, Married-civ-spouse, Craft-repair, Husband, White, Male, 7298, 0, 40, United-States, >50K
36, State-gov, 119272, HS-grad, 9, Married-civ-spouse, Protective-serv, Husband, White, Male, 7298, 0, 40, United-States, >50K
33, Private, 85043, HS-grad, 9, Never-married, Farming-fishing, Not-in-family, White, Male, 0, 0, 20, United-States, <=50K
22, State-gov, 293364, Some-college, 10, Never-married, Protective-serv, Own-child, Black, Female, 0, 0, 40, United-States, <=50K
43, Self-emp-not-inc, 241895, Bachelors, 13, Never-married, Sales, Not-in-family, White, Male, 0, 0, 42, United-States, <=50K
...
Census data is about people, and with enough people the law of large numbers kicks in, so things often turn out to be roughly normally distributed. For now, let's forget about the neural net and build a predictive model with the good old Gaussian:
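Written out in its usual textbook form, with mean $\mu$ and standard deviation $\sigma$ (standard notation, assumed here rather than taken from the text):

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```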
While it is a clever way of drawing a bell curve, the Gaussian is inherently unimodal and, because of the square in its exponent, symmetric around its mean. A very perfect function indeed, but maybe too perfect for nature?
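To make the symmetry tangible, here is a minimal sketch that fits a Gaussian to the six ages from the sample rows and compares the density at equal distances on either side of the mean. The names `GaussianSketch`, `fit` and `pdf` are hypothetical helpers for illustration, and estimating the parameters from the sample mean and standard deviation is an assumption about how such a model might be built, not necessarily what is done here.

```scala
// Sketch only: GaussianSketch, fit and pdf are hypothetical names, not library API.
object GaussianSketch {
  // Gaussian density with mean mu and standard deviation sigma.
  def pdf(x: Double, mu: Double, sigma: Double): Double =
    math.exp(-0.5 * math.pow((x - mu) / sigma, 2)) / (sigma * math.sqrt(2 * math.Pi))

  // Estimate the mean and the (population) standard deviation from samples.
  def fit(xs: Seq[Double]): (Double, Double) = {
    val mu = xs.sum / xs.size
    val variance = xs.map(x => math.pow(x - mu, 2)).sum / xs.size
    (mu, math.sqrt(variance))
  }

  def main(args: Array[String]): Unit = {
    // Ages taken from the six sample rows shown at the top.
    val ages = Seq(26.0, 38.0, 36.0, 33.0, 22.0, 43.0)
    val (mu, sigma) = fit(ages)
    println(s"mu = $mu, sigma = $sigma")
    // Symmetry: the same distance left and right of the mean gives the same density.
    val left = pdf(mu - 5, mu, sigma)
    val right = pdf(mu + 5, mu, sigma)
    println(math.abs(left - right) < 1e-12)
  }
}
```

Whatever the data looks like, this curve has exactly one peak and mirrors itself around `mu`, which is the limitation the paragraph above hints at.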
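// Parse (age, label) pairs: label is 1.0 for ">50K" and 0.0 otherwise. Rows with fewer than 15 fields are skipped.
val src = scala.io.Source.fromFile(getResourceFile("file/income.txt")).getLines.map(_.split(",")).flatMap { k =>
  (if (k.length > 14) Some(k(14)) else None).map { over50k => (k(0).toDouble, if (over50k.equals(" >50K")) 1.0 else 0.0) }
}.toArray
// Assumption: `train` is used below but not defined in this excerpt; the text mentions 2000 samples, so take the first 2000.
val train = src.take(2000)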
val f = Sigmoid
val network = Network(Vector(1) :: Dense(20, f) :: Dense(1, f) :: SquaredMeanError())
val maxAge = train.map(_._1).max // largest age in the training set
val xs = train.map(a => Seq(a._1 / maxAge)) // ages scaled to (0, 1]
val ys = train.map(a => Seq(a._2)) // 1.0 if income > 50K, else 0.0
network.train(xs, ys)
Furthermore, we need to map the age to the interval [0, 1], because this is the range the sigmoid operates in. So we divide by the maximum age and start training. After a couple of seconds and a cup of sencha, training with 2000 samples succeeds, so we can normalize and plot both models:
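As a side note on the scaling step, the following minimal sketch shows what dividing by the maximum age does to the six sample ages, next to a plain logistic sigmoid. `ScalingSketch`, `scale` and `sigmoid` are hypothetical names for illustration; the sigmoid here is the textbook logistic function, which is assumed to match what `Sigmoid` stands for in the network definition.

```scala
// Sketch only: ScalingSketch, scale and sigmoid are hypothetical names, not library API.
object ScalingSketch {
  // Logistic sigmoid; its output always lies strictly between 0 and 1.
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Divide every age by the largest one, so all values land in (0, 1].
  // Assumes a non-empty sequence of positive ages.
  def scale(ages: Seq[Double]): Seq[Double] = {
    val maxAge = ages.max
    ages.map(_ / maxAge)
  }

  def main(args: Array[String]): Unit = {
    val ages = Seq(26.0, 38.0, 36.0, 33.0, 22.0, 43.0)
    val scaled = scale(ages)
    println(scaled.mkString(", "))              // the oldest person maps to 1.0
    println(scaled.map(sigmoid).mkString(", ")) // all outputs stay inside (0, 1)
  }
}
```

Because the targets are 0.0 or 1.0 and the sigmoid output never leaves (0, 1), keeping the input on a comparable scale avoids feeding the network raw ages that are orders of magnitude larger than its outputs.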