I have an application that creates around 24k records a day into a single table with a current count of 9.12M rows. I have been thinking about Shard'ing with mysql async driver for the single table based on users but am really interested in using a cloud technology such as hadoop locally as a test and then creating a hybrid were I could leverage hadoop on EC2 for scaling. Has anyone had any experience with access times for tables with of this size? The user knows they are grabbing a lot of data but I can't have them wait 5 min for a map reduce function to be performed. I was planning on using the REST api as there is not currently a PHP "driver" although I have seen Rasmus talking about adding one?
I should also point out that the query is commonly the last 6 months of data and each user is adding about 0-4 rows a day.
I do have experience with large tables but not all large tables are the same. Could you tell us about why you've counted out more traditional solutions like cache and what not?