Using MySQL with InnoDB for teaching an RDBMS class?

Folks:

I’m teaching a course in RDBMS programming and Internet application development in January at MIT. Most of my development experience is with Oracle, which is available for anyone to download and install, but it is a short class and I would rather not have the students engaged in sysadmin/dbadmin. MIT does not maintain an Oracle installation for student use. The school does, however, have a MySQL server and associated HTTP server that students can use to build a database-backed Web site. I’m wondering if I can use this installation for teaching and am asking MySQL experts reading this Weblog to comment.

My first assumption is that the InnoDB storage engine will be the best choice due to its support for integrity constraints and isolation of simultaneous users with Oracle-style multiversioning.

Here’s what I want to teach for the RDBMS component:

  • Using integrity constraints such as CHECK, NOT NULL, and foreign key
  • create table as select and INSERT into Table A by querying Table B (so INSERT with a SELECT)
  • open two simultaneous connections and show how Isolation (the “I” in “ACID”) works (one connection will have half-finished transaction and the other one will do a query and see the unmodified data; also want to show locking)
  • JOIN, OUTER JOIN, GROUP BY, HAVING
  • set operations, including UNION, INTERSECT, MINUS
  • using views
  • stored procedures and triggers
  • techniques for dealing with tree-structured data
  • techniques for dealing with time series data

Can MySQL support all of this? Is InnoDB the best storage engine to use?

(Separately I want to have each student build a minimal iPhone application that queries the RDBMS; assume that this will have to be done by basically building a custom Web apge. I also want to have the students build an Android app that queries the RDBMS via a Web request, but it looks as though this can only be done with a heavyweight SDK and a lot of Java experience (which some of the students will have).)

44 thoughts on “Using MySQL with InnoDB for teaching an RDBMS class?

  1. You’ll probably want to go with PostgreSQL. It’s far more focused than MySQL on being SQL-compliant and feature-complete. Compared with Oracle (which I also work with), setting it up is a breeze, given a commodity server, a student aid should be able to set it up in an afternoon. Basically, install any major Linux distribution and use the distribution’s PostgreSQL packages.
    MySQL can do most of what’s on your list. Expect complicated UNION/INTERSECT/MINUS constructs to be unnecessarily slow, but correct. Stored procedures on MySQL are quite limited, but exist. PostgreSQL has PL/pgSQL, which borrows heavily from PL/SQL, so if you’re used to that, you should feel at home. It also has PL/Python, for people who like that sort of thing.
    You’ll hit a wall with the tree-structured stuff on MySQL. Oracle has (as you probably know) its proprietary START WITH / CONNECT BY solution and also supports standard SQL recursive CTEs. PostgreSQL supports the latter, MySQL supports nothing of the sort.

  2. I think InnoDB storage engine is an appropriate choice for MySQL.
    Set operators UNION, INTERSECT, MINUS might not be supported and need some rewriting ( see http://www.bitbybit.dk/carsten/blog/?p=71 )
    However since MySQL has been acquired by Oracle recently, I would use PostgreSQL as a teaching subject in an university.

  3. Another vote for PostgreSQL, which is much closer to Oracle than MySQL.

    You use WITH RECURSIVE (CTEs) for trees, as it does not implement Oracle’s proprietary CONNECT BY. The rest is available, including triggers, stored procedures, window functions and much more, including partitioning.

    You could also get the students to install the free Oracle 10g Express (available for Linux and Windows), it is limited to 4GB and is missing the Enterprise Edition features like partitioning, but is otherwise full-featured.

  4. Rather than have them write native iPhone and Android apps you should consider having them write one that’s crossplatform with something like http://www.phonegap.com/ . It’s a javascript+html platform that offers fully functional gps, sound, vibration, and accelerometer native controls on iphone, android, and blackberry,

    God help us all, but javascript is the new C.

  5. Guys: Thanks for the PostgreSQL suggestion. Of course that would be my preference over MySQL, but it doesn’t help me avoid sysadmin. The only already-set-up RDBMS that is available for student use is MySQL. So if MySQL can do the job I would like to use it.

  6. Ry4an: Thanks for the phonegap suggestion. I had seen that but had not investigated it. Is it practical for a class of 15 students to use? Do they need to register well in advance? Pay fees? Can their apps migrate to real devices? It is going to be a three-day intensive class. Won’t it take longer than that for Steve Jobs to personally approve either one account for all 15 students or 15 accounts for each of the 15?

  7. I would agree with the above in using PostgreSQL. Setup is absolutely a breeze, and, as far as I can tell, is trying to make it as easy as possible for ex-Oracle DBAs/developers to transition to Postgre.

    Postgre handles everything you list quite well (although I have no idea what time series data is, so I can’t say anything about this.)

  8. Josh: How would 15 students, who will arrive with a mix of laptops running different operating systems and versions, install and use PostgreSQL? And then serve Web pages from those installations that would be viewable by other students in the class and ultimately by an Android or iPhone?

  9. Why do you see a need for everyone to have a local instance? My approach would be one Linux server shared by all of them and managed by course staff, with a system account for each student that has its own database and its own ~/public_html with CGI or, if they insist, PHP. Basically what the university already has with MySQL, I assume, only with PostgreSQL.
    MySQL is clearly “good enough” for basic web stuff, proven simply by how many people use it. Your list, though, reads like you want to do “real” SQL, with serious use of constraints, stored procedures and high-abstraction queries. MySQL sucks for that.

  10. Phil, wouldn’t the simplest solution be a virtual machine already setup with Linux and Postgresql or MySQL if desired. VirtualBox is free and you can setup any Linux distro you want with Postgresql ready to go. You might even setup the complete dev environment too. The other great thing about this solution is that you can reset it back to square one or simple copy someone else’s VM over a non-working VM on a student’s computer. Virtual box will work on Windows, Linux, and Macs so that should solve your compatibility problem.

  11. Can you set up a free Amazon EC2 instance (or ask Joyent for a smartmachine for academic purposes) with postgresql on it for the students to use?

  12. If you could deal with the students using a hosted database and running SQL via a web interface you could have them all sign up for accounts on apex.oracle.com.

    With this account it would be a breeze for them to build web pages using the built in Apex framework and they could even expose queries as JSON via REST or XML.

    The beauty of this approach is you get Oracle for free, you get a web host for free, and you don’t have to install anything.

  13. the current version of mysql was impossible to compile and install with innodb so instead I used the drop in replacement mariadb, which is from some of the original mysql developers who didn’t like where mysql was going.

  14. One method to skip having to get each student to build and config a webserver/db/appserver on each of their laptops would be to setup a virtual machine with everything you want them to use and then have all of them install that first so they would have a common baseline for the class.

    VirtualBox would probably be a decent choice: http://www.virtualbox.org/

    With respect to mobile apps, I’d second Ryan’s suggestion to use one of the JS interpreted crossplatform mobile solutions. Titanium Appcellerator is the other big option instead of PhoneGap. Both have opensource / freely downloadable SDK’s etc. However, I think for class like this you are probably best to just skip trying to create an iPhone app that can run on an actual iPhone as you very quickly start to run into significant developer / sysadmin overhead of having to manage certs and provisioning profiles.

  15. Use a virtual machine from Turnkey Linux and Virtualbox. You could configure one the way you want, then provide copies.

    Colin

    http://www.turnkeylinux.org/lapp

    “The LAPP stack is an open source web platform that can be used to run dynamic web sites and servers. It is considered by many to be a powerful alternative to the more popular LAMP stack and includes Linux, Apache, PostgreSQL (instead of MySQL) and PHP, Python and Perl.”

  16. Folks: the virtual machine idea is kind of interesting, though I’m not sure how well it will work given that these machines will be on MIT’s wireless network and won’t have static IP addresses. Also, can that virtual machine truly be supported on Windows XP, Mac OS 10.x, whatever weird build of Linux?

    Greg: I didn’t know about the Oracle Apex service. Thanks for that pointer. If the version hosted by Oracle gives the students the full power of the standard Oracle RDBMS that could be very interesting. Mostly this is an RDBMS course with just a touch of “here’s how you stick a Web/app interface on top”.

  17. While MySQL’s InnoDB engine does support transactions, one of the gotchas is that locks cannot be held across stored procedures. I.e. it does not support nested transactions. Therefore, it’s impossible to write modular stored procedures that maintain transactional integrity. You have to explicitly require users to obtain locks before calling the stored procedure, which means no stored procedure can obtain locks except those at the “top level” (and where exactly _is_ that?). Furthermore, at the top level one must also explicitly lock all tables that are referred to by aliases anywhere down the stored procedure chain, effectively making the aliases akin to global variables. Hope there aren’t any collisions! Seems like it’d be easy for the db server to track what tables aliases map to in the sproc invocation stack, but what do I know?

    Much like yourself, I was a former Sybase user back in the day and have to say I was quite surprised by these basic oversights — they may have been addressed in a more recent release. This was my experience as of March 2010. Sybase circa 1999 handled all of this with no issues. Colleagues consistently recommend Postgres to me over MySQL.

  18. Thanks, Will. Since most MySQL users have no formal training in the RDBMS nor any experience with standard RDBMS implementations, it doesn’t surprise me that they still leave out much of the stuff that standard SQL applications rely on. It seems as though the biggest uses of MySQL are for key/value pair storage and retrieval, which is why many of the users seem delighted to migrate to the “NoSQL” more distributed/higher performance alternatives.

    That said… in an introductory course the point is to show them that writing a stored procedure is possible, not to build a practical application. And the school is already running MySQL for student use… hence my original posting.

  19. I just tried out Oracle Application Express. It is pretty slick for the pure SQL stuff. I haven’t figured out how to generate Web pages from the database yet. Maybe that is the “Websheet applications” tab under Application Builder.

    Oracle is always surprising me with the breadth of the software offered. But then they also sometimes unpleasantly surprise me by discontinuing stuff that proved unpopular!

  20. If your students want to see iPhone apps (including phonegap) run in the iPhone simulator they’ll need an Intel Mac with OSX and be registered with Apple to d’load the iOS dev kit. To run on an actual device you have to pay, though I think schools like Stanford have worked out deals with Apple to avoid students paying (not sure.)

    Android, I presume, is much more flexible in that regard (if you can stand Java.)

  21. ScottE: I thought about that, but figured that at least one student would have a physical iPhone and perhaps at least one student would have a Mac laptop.

  22. Greg: I just tried out Oracle Apex. It seems great for playing around with pure SQL, e.g., creating and querying tables. I couldn’t figure out how to build Web pages, though. It seems very complex, at least as complex as learning PHP and maybe more so. Furthermore, I’m not even sure that it is possible to have an application return simple raw XML for parsing by an Android app or whatever. It seems like it is trying to be way too magic and therefore would end up needing to be the subject of its own three-day course in how to use Oracle Apex app builder (not a very useful long-term skill for students).

  23. Writing native iPhone apps is a course in itself. You can learn it in one dedicated course, but let’s face it: nobody in your class will already know Objective-C and the Cocoa Touch UI toolkit except for those who are already writing Mac, iPhone or iPad apps. Remember that Apple does not allow virtual machines on the iPhone platform; that means no Java and no Flash.

    Instead, offer each student the option to write a phone app in the platform of one’s choice: iPhone for those already doing it, Android for the Java folks, and Ajax for the Web app experts.

  24. VMs are the way to go for a short class. Your MIT grad cloud computing class should have made elastic computing resources available for the undergrads by now on donor funded commodity hardware – don’tcha think?

    Depending on the level of the course, I wander why mix in web apps with RDBMS technology? If you wish to diverge from RDBMS technology and teach tangential subjects, consider the topics of either RDF/triple stores or columnar/nosql technologies. Web apps coding is such drudgery, and besides, from what I hear its ghetto work!

  25. Loke: Coding Web apps can certainly be shopped out to Eastern Europe inexpensively, but the students should learn at least the basics of getting stuff to/from the RDBMS, shouldn’t they? Web browsers and Android/iPhone apps are surely the most important interfaces right now, no?

  26. Phil – given that pro RDBMS designers and DBAs almost never develop web apps, have the kids code projects using stored procs, triggers, pl-sql etc. If non-RDBMS products must be utilized for coding, write simple simulations in Java/Groovy/Perl as server side code examples that could be part of a web app. Or if you want more complexity, expose DBMS ops using simple RESTful web services that are demo-able with REST web tools. I recommend this in lieu of spending time with UI development or web app deployment complexities. Also consider JPA implementations such as Hibernate for a code project.

    I guess I have seem too many coders who lacked the ability to do even fairly simple composite key joins, ergo I am skeptical of churning out more coders who don’t understand a CS technology as fundamental as RDBMSs.

  27. About the iPhone app, you can use the UIWebView to link to a web application or use iPhone web app meta tags to make a web page installable on the iPhone launcher. It doesn’t get easier than writting simple XHR requests from Javascript.

    The non-relational schema less CouchDB has a builtin HTTP JSON interface you can query using Javascript, otherwise assuming they know Java, with the right tutorial, setting up an Apache Tomcat instance and initialize a non-pooling connection and do a query to the MySQL in JSP, that responds using setContentType(“text/xml”), shouldn’t take more than 15 minutes for a novice.

  28. I think MySQL will work. Virtual machines are also worth considering, though. There are good no-cost VM “players” for mac, windows, and linux. There are also sites like http://bitnami.org/ and http://www.turnkeylinux.org/ that provide ready-to-go appliances for developers using a variety of languages / frameworks. Postgres + Python would be my personal preference, and isn’t MIT’s intro CS course taught with Python now?

    Another positive for VMs is that students can take the full setup with them after the course. The lack of public-facing IPs is annoying, but can be overcome pretty simply using a single public server with SSH reverse-tunneling. That allows you to connect a listening port on the public server to a listening point on whatever machine you’ve ssh’d from, and doesn’t require configuring any routers.

    Do you intend to choose a specific language or web framework outside the database? Simple PHP will make the connections between RDMS and web page very transparent, although it’s an atrocious language. The most popular web application frameworks today provide a level of abstraction to SQL, which may be an extra distraction for this course. Of course, you need to understand relational databases in order to use these abstraction layers non-stupidly. You might consider a really light-weight framework like https://github.com/defnull/bottle/wiki or http://webpy.org/ .

    Finally, I question the feasibility of teaching native app development for iOS or Android during this short a course. I’m not an expert on those platforms, but I believe they both support some HTML5 and Javascript features that allow you to build an offline-usable app using your choice of web server backends. Approaching tablet programming from that angle might better connect with the RDBMS portion of the course.

  29. Thanks, Shimon. I think you’re right about limiting this to HTML5/Javascript, maybe giving the students a skeleton to work with so that they don’t get bogged down in material that they can easily study on their own after the intensive course.

  30. Rick: Thanks for the vote of confidence. This course will definitely not be on OpenCourseware because it is nearly all hands-on exercises with assistance from roaming instructors. OpenCourseware was designed for capturing traditional lecture-style classes. Once we’ve got the exercises finalized they will be available from my server.

  31. Sorry to be late in responding Phil. Once you get inside your Apex workspace use the Application Builder menu item. You might already have a sample application but it is easy enough to create a new one. The wizards make it a snap to build a simple application around a set of tables.

    An application is a container for various setting, shared components like navigation and global variables, and pages. A page is mostly composed of regions which can be forms, reports, menus, or even just plain old pl/sql printing out via htp.p.

    If you contact me directly I’d be happy to walk you through via web con. Once you see the forest the trees are a snap. This is definitely easier than PHP and if you want to keep your students out of the non sql stuff you can build the web app for them, export it and then they can easily import it into their own workspace to work on.

    Using the RESTful access stuff this will definitely be able to give you XML or JSON via HTTP.

    Again, please contact me if you are interested in pursuing this. I’d hate to see you go with mySQL or waste any time installing databases.

  32. Philip, I forgot to mention, VoltDB, RDBMS with ACID support, which has a JSON interface, with the advantages of “NoSQL” speed ups.

  33. If you have any concurrency-related exercises that were developed on Oracle, you need to be aware that InnoDB has a very different concurrency control approach. An InnoDB transaction with ISOLATION LEVEL SERIALIZABLE takes traditional shared locks for reading (that is, the concurrency control is strict two-phase locking). In contrast, Oracle uses the “Snapshot isolation” mechanism for concurrency control when a transaction is declared SERIALIZABLE. One important consequence: InnoDB serializable transaction can be blocked when trying to read data (whereas Oracle never delays a read operation). An even more important one: a system running concurrent InnoDB serializable transactions is genuinely serializable, as defined in a database textbook, and it maintains any integrity property of the data, even undeclared properties that are enforced in application code (whereas a system running on Oracle with concurrent “serializable” transactions need not have the property of serializable history as defined in a textbook, and data can be affected to violate undeclared integrity constraints). PostgreSQL is like Oracle in the concurrency control algorithm (though with a different implementation, leading to very different performance profile)

  34. Hi Philip,

    I assume you are teaching this class over IAP? Would it be possible to audit?

    Thanks,
    Nemanja

  35. Nemanja: It will be over MLK weekend, I think, and is already in the EECS offerings. If you show up for all three days to “audit”, I’m not sure how that is different than simply taking the course! We’re not going to fail anyone or otherwise injure their self-esteem.

  36. Also, students could install something like PHP Triad on their laptops and run MySQL, Apache, PHP locally.

  37. mysql…ugh. Does it actually enforce referential integrity constraints yet?

    Setting up a virtual box virtual machine is easy, and it runs just fine under Windows XP and Mac OS X 10.5+. The big bad assumption here is that people have at least 1GB of ram on their machine and a few GB of disk space.

    Any standard linux installation will allow you to install postgres or mysql.

    Alternatively, you could install the native windows/mac os x mysql installation on their machines. I tend to prefer to avoid doing this as it can be awkward sometimes to remove all traces of the application after you are done.

    Good luck.

  38. Slightly off topic but something interesting and (I hope) Greenspun-worthy to consider when it comes to MySQL ….

    Because the data for individual MySQL tables are in individual files, on Linux it should be possibly to use inotify to track changes to these files. So although there might be a second representation of the schema in code, for instance for object-relational middleware, the database schema would remain the canonical source for the data. Then changes to the underlying schema could be listened for, instead of rebuilding the schema whether or not it changes, which is a common approach.

  39. Most students might end up installing Microsoft SQLExpress/eval version of SQL Server on their laptops and then figuring out how to migrate the data with DBartisan once the app is ready. Its painful to learn a database with a help file that is one big html page (Mysql). SQL server has query plans, auto indexes, nice debugger. You can link Excel sheets without having to wrestle with ODBC drivers. Visual query designers warn you when primary keys are not set correctly, do tab completion in sql and write correct join syntax. If you want to replicate there is merge and transactional replication built in. If you want to do machine learning, MS analysis server comes with the SQL server and supports NaiveBayes and other ML algorithms. MIT boxes might not even have syntax highlighting for Mysql. Good thing about DBArtisan is that it automatically translates SQL flavors from Oracle to SQL server and Mysql so you do not have to worry (exept for datetime calcs) and the prof will never know 🙂

  40. I don’t envy your choices, obviously everyone responding has their own preference for one RDBMS over another. There are many NoSQL solutions that are being successfully employed, which of course angers the RDBMS purists.

    http://kb.askmonty.org/v/mariadb-versus-mysql

    I would not dismiss MariaDB at all, if you are using MySQL, you will eventually find yourself migrating to MariaDB in the future as that is where the past MySQL developers and creators have gone. If you look at their license agreement it is obvious they are willing to continue to help MySQL, however based on Oracle’s treatment of Java (and other open source projects) their help will ultimately be ignored, rebuffed or denied and thus MariaDB will quickly replace MySQL as the dominant M in the LAMP stack. You do not need a crystal ball to look at Oracle’s past actions in order to know the future.

    Some helpful links to dispell many myths:

    http://developers.slashdot.org/article.pl?sid=09/11/09/2335214 ~ “Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years”

    http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/ ~ “The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg’s (3 TB for green badges) or Facebook’s (50 TB for inbox search) or eBay’s (2 PB overall), per-server performance, and rigid schema design.”

    http://developers.facebook.com/blog/ ~ Developers at Facebook discuss scaling / development issues. You will have to scroll through the list and of course Google is your friend.

    http://www.slideshare.net/robtweed/gtm-a-tried-and-tested-schemaless-database ~ Focuses on GT.M but mentions 8 other NoSQL solutions.

    Any class that does not at least give links to current implementation of bigtable types of NoSQL solutions that typically run on top of a limited RDBMS implementation, is a bit out of date in my opinion. Again purists might not like it, but there is a reason many very successful companies have scaled their solutions with this very valid combination of RDBMS + NoSQL tools..

    I have nothing against PostgreSQL by the way. Just not the M in my LAMP stack. If I had my way I would run multiple servers with each of the dominant solutions just to have alternatives when one project or another goes down some proprietary dark alley never to see the light of day again. Its smart to have options…at least three.

    Its so late in the game for a class starting next month I sincerely hope you have made your decision and have started building by now.

    Considering that MariaDB’s storage engines are the fastest when you compare MySQL, MariaDB, PostgreSQL, etc… I am not sure why you would not just go on and use it as your base, assuming you must have an RDBMS at all and than look for other solutions to ride on top of that. The important thing is that MariaDB’s file system can work with the other two if it is built first and is on the bottom, however the opposite can not be said to be true. And you can go either MyISAM or InnoDB with MariaDB. Not sure why you are focused only on InnoDB considering the improvements MariaDB has made with MyISAM but sure you have your reasons.

    To be honest considering the need for handling extremely large data sets, I have begun to think of data stored in either POOLS or BLOBS and ways to share parts of that pool/blob with other networked machines say a third or fourth of the pool at a time in order for streaming interactions to work over a network in real time. The biggest issues going forward appear to be your Internet Access Provider and how badly they are throttling your upstream bandwidth…a common and reprehensible practice especially with Cable Internet providers.

    One thing is for sure I will not purchase another house without Fiber To The Home and preferably without a current American Telco between me and the Internet as their zeal for creating a billable event is severely impacting innovation and has for the last two decades. How un-American is that.

    A friend of mine put a map together to let Americans know where they could move to in order to get synchronous bandwidth and the ability to run a cloud server in your home. Here is the URL:

    http://maps.google.com/maps/ms?hl=en&ie=UTF8&msa=0&msid=107636815986001525252.0004925bda8ab3461a9ef&ll=38.891033,-98.129883&spn=22.725937,53.481445&t=h&z=5 ~ Long URL to Google Map.

    http://sn.im/1axal4 Shortened URL for Twitter and other social media sites.

    Here is a link on building MariaDB on CentOS (which many companies have migrated to from Redhat as CentOS and Redhat are pretty much the same). The point is your students could download CentOS and install it at home.

    http://www.rasyid.net/2010/11/12/how-to-install-mariadb-on-centos-5-5/ ~ How to install MariaDB on CentOS 5.5

    I hope some of these links help you if not directly with the class as links in your Resource/Link page for your students.

Comments are closed.