Slides From Database Talk at Drexel
I gave a lecture at Drexel this week on non-relational databases and “big data”. The slides are up. They are all new since last time; the world of NoSQL and Big Data has changed a whole lot in 2 years :)
Ben Corrie from VMware gave a talk on March 15, 2012 at the San Francisco Java Usergroup on tuning the JVM for a virtual machine. The event was filled to capacity, but fortunately you can find the video, slides, and a more detailed description of the talk below.
The number of Java workloads running on virtualized infrastructure has been increasing exponentially over the last few years. Advancements in processors and hypervisor technology now make virtualizing Java a compelling proposition. However, there are still best practice provisos and considerations, particularly in the area of JVM memory management.
This talk will present much of the innovation, practical insight, and lessons learned over the last year by a senior engineer from VMware who recently developed a Java “ballooning” solution called Elastic Memory for Java (EM4J).
Ben’s Slides:
I really enjoy reverse engineering stuff. I also really like playing video games. Sometimes I get bored and start wondering how the video game I’m playing works internally. Last year, this led me to analyze Tales of Symphonia 2, a Wii RPG. This game uses a custom virtual machine with some really interesting features (including cooperative multithreading) to describe cutscenes, maps, etc. I became very interested in how this virtual machine worked, and wrote a (mostly) complete implementation of it in C++.
However, I recently discovered that some other games also use this same virtual machine for their own scripts. Intrigued, I started analyzing the scripts for these games and trying to find all the improvements between versions of the virtual machine. Three days ago, I started working on the Tales of Vesperia (PS3) scripts, which seem to be compiled into the same format I had analyzed before. Unfortunately, every single file in the scripts directory seemed to be compressed with an unknown format, identified by the magic number “TLZC”.
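The first practical step in a situation like this is usually just finding every file that carries the magic number. Below is a minimal sketch of that step, assuming the magic is the first four bytes of each file (the usual convention, not something the excerpt confirms). The helper names `has_tlzc_magic` and `scan` are hypothetical, and the sketch says nothing about how the TLZC payload itself is encoded, which is the hard part.

```python
import sys
from pathlib import Path

# The magic number mentioned in the post. Assumption: it sits at offset 0.
TLZC_MAGIC = b"TLZC"


def has_tlzc_magic(path: Path) -> bool:
    """Return True if the file starts with the TLZC magic bytes."""
    with path.open("rb") as f:
        # Files shorter than 4 bytes simply return fewer bytes and fail the check.
        return f.read(len(TLZC_MAGIC)) == TLZC_MAGIC


def scan(directory: Path) -> list[Path]:
    """Recursively list regular files under `directory` that carry the magic."""
    return sorted(p for p in directory.rglob("*") if p.is_file() and has_tlzc_magic(p))


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for match in scan(root):
        print(match)
```

Pointed at an extracted scripts directory, it prints the path of every candidate compressed file.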
Very cool reveng post.
Gotta try this soon and see if it helps reduce GC pause times for my apps.
Continuing the Chrome extension hacking (see parts 1 and 2), this time I’d like to draw your attention to the oh-so-popular AdBlock extension. It has over a million users, is actively maintained, and is a great piece of software (heck, even I use it!). However, due to how Chrome extensions work in general, it is still relatively easy to bypass it and display some ads. Let me describe two distinct vulnerabilities I’ve discovered. Both are exploitable in the newest version, 2.5.22.
In Maryland, job seekers applying to the state’s Department of Corrections have been asked during interviews to log into their accounts and let an interviewer watch while the potential employee clicks through wall posts, friends, photos and anything else that might be found behind the privacy wall.
Won’t be working for any of these places. *sigh*
Here’s an experiment anyone can do: Go get your Apple IR remote. The LED emits at 980nm, or about 306THz, in the near-IR spectrum. Relatively speaking, this is just outside of the visible range. Take the remote into the basement, or the darkest room in your house, in the middle of the night, with the lights off. Let your eyes adjust to the blackness.
Can you see the LED flash when you press a button [4]? No? Not even the tiniest amount? Try a few other IR remotes; most use an IR wavelength even closer to the visible band, around 310-320THz. You won’t be able to see them either, even though they would be blindingly, painfully bright if they were in the visible spectrum.
These near-IR LEDs emit at about 20% beyond the visible frequency limit. 192kHz audio extends to 400% of the audible limit. Lest I be accused of comparing apples and oranges, auditory and visual perception drop off similarly toward the edges.
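The wavelength and frequency figures quoted above are related by f = c/λ. Here is a small sketch of the conversion, assuming vacuum wavelengths and c = 299,792,458 m/s; the function names are made up for illustration.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT = 299_792_458.0


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nanometres to a frequency in terahertz (f = c / lambda)."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9) / 1e12


def thz_to_wavelength_nm(frequency_thz: float) -> float:
    """Convert a frequency in terahertz back to a vacuum wavelength in nanometres (lambda = c / f)."""
    return SPEED_OF_LIGHT / (frequency_thz * 1e12) * 1e9


if __name__ == "__main__":
    # The Apple remote's LED, as quoted in the text.
    print(f"980 nm -> {wavelength_nm_to_thz(980):.0f} THz")
    # The 310-320 THz band quoted for other remotes, expressed as wavelengths.
    for f in (310, 320):
        print(f"{f} THz -> {thz_to_wavelength_nm(f):.0f} nm")
```

Running it prints `980 nm -> 306 THz`, matching the quoted figure, and places the 310-320 THz band at roughly 937-967 nm.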
SML: Scalable Machine Learning
Practical information
STATISTICS 241B, COMPUTER SCIENCE C281B
Volume: 3 hours per week (3 credits)
Time: Tuesday, 4-7pm (3 lectures in one block)
Location: 306 SODA
Instructor: Alex Smola (available 1-3pm Tuesdays in Evans 418)
TA: Dapo Omidiran
Grading Policy: Assignments (40%), Project (50%), Midterm project review (10%), Scribe (Bonus 5%)
Piazza discussion board
Updates
02222012 - Slides are online
02222012 - New assignments are live
02222012 - Videos for SVM (first three sets) are uploaded
02222012 - Video for Optimization complete
02052012 - Slides for Streams and Optimization are uploaded
02052012 - Videos now have sound enabled
01252012 - Problem set 1 is uploaded
01252012 - Slides and videos are uploaded
01252012 - Project ideas and datasets are uploaded
01192012 - The graphical models tab has links to video lectures and tutorials on the subject (this is mainly for students who didn’t get to attend the class by Mike Jordan and Martin Wainwright).
01182012 - The systems slides are available now (follow the systems link)
01182012 - Updated project guidelines
Overview
Scalable Machine Learning occurs when Statistics, Systems, Machine Learning and Data Mining are combined into flexible, often nonparametric, and scalable techniques for analyzing large amounts of data at internet scale. This class aims to teach methods which are going to power the next generation of internet applications.
The class will cover systems and processing paradigms, an introduction to statistical analysis, algorithms for data streams, generalized linear methods (logistic models, support vector machines, etc.), large scale convex optimization, kernels, graphical models and inference algorithms such as sampling and variational approximations, and explore/exploit mechanisms. Applications include social recommender systems, real time analytics, spam filtering, topic models, and document analysis.
Prerequisites
Basic probability and statistics. Having attended a machine learning class would be a big plus but is not strictly required; in particular, some knowledge of kernels and graphical models would be useful.
Basic linear algebra (matrices, vectors, eigenvalues). Knowing functional analysis would be great but is not required.
Ability to write code that goes beyond ‘Hello World’, preferably in something other than Matlab or R.
Basic knowledge of optimization. Having attended a convex optimization class would be great.
Page generated 2012-02-22 21:44:22 PST, by jemdoc.
Looks like some really awesome content in here.
NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well studied both in practice and in theory, because specific non-functional properties are often the main justification for using NoSQL, and fundamental results on distributed systems, such as the CAP theorem, apply well to NoSQL systems. At the same time, NoSQL data modeling is not as well studied and lacks the kind of systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.
To explore data modeling techniques, we have to start with a more or less systematic view of NoSQL data models that, preferably, reveals trends and interconnections. The following figure depicts an imaginary “evolution” of the major NoSQL system families, namely Key-Value stores, BigTable-style databases, Document databases, Full Text Search Engines, and Graph databases:
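To make the difference between the first three families concrete, here is a toy in-memory sketch using plain Python dicts. It is not any real database's API, and the record contents (`user:42`, names, cities) are hypothetical. The point is what each model lets the store "see": an opaque blob, a row of column families, or a structured document that can be queried by field.

```python
import json
from collections import defaultdict

# Key-value store: values are opaque blobs the store can only fetch by key.
kv_store: dict[str, bytes] = {}
kv_store["user:42"] = json.dumps({"name": "Alice", "city": "Philadelphia"}).encode()

# BigTable-style store: row key -> column family -> column qualifier -> value.
bigtable = defaultdict(lambda: defaultdict(dict))
bigtable["user:42"]["profile"]["name"] = "Alice"
bigtable["user:42"]["profile"]["city"] = "Philadelphia"
bigtable["user:42"]["visits"]["2012-03-01"] = "/index.html"

# Document store: values are structured documents whose fields the store understands.
documents: dict[str, dict] = {
    "user:42": {"name": "Alice", "city": "Philadelphia", "tags": ["admin"]},
    "user:43": {"name": "Bob", "city": "Boston", "tags": []},
}


def find_by_field(docs: dict[str, dict], field: str, value) -> list[str]:
    """Query on a field inside the value, which a pure key-value store cannot do."""
    return sorted(key for key, doc in docs.items() if doc.get(field) == value)


if __name__ == "__main__":
    # Key-value: the client must fetch and decode the whole blob itself.
    print(json.loads(kv_store["user:42"])["city"])
    # BigTable-style: read a single column family of a single row.
    print(dict(bigtable["user:42"]["profile"]))
    # Document: ask the store for documents matching a field value.
    print(find_by_field(documents, "city", "Boston"))
```

The trade-off the sketch hints at is the usual one: the more structure the store understands, the richer the queries it can answer for you, at the cost of a less uniform, harder-to-partition value format.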
See: my earlier post re: Node.js ;-)