Source: http://whatua.com/category/bigdata/

  1. github: https://github.com/whomm/bigdata-tech-index
  2. Related products in China and abroad
    1. Data analytics and computing platform products in China
      1. Sensors Data (神策)
        1. https://www.sensorsdata.cn
          1. https://www.sensorsdata.cn/blog/technical_implementation_of_sensors_analytics/
          2. https://www.sensorsdata.cn/manual/
      2. growingio
        1. https://www.growingio.com/
      3. Haizhi (海致)
        1. https://www.bdp.cn/home.html
      4. Alibaba Cloud Quick BI
        1. https://data.aliyun.com/product/bi
      5. finebi
        1. http://www.finebi.com/
          1. finereport
          2. http://www.finereport.com/
    2. Data analytics platforms outside China
      1. Tableau (data analytics)
        1. https://www.tableau.com/
      2. http://www.pentaho.com/
        1. ETL
          1. KETTLE
            1. Pentaho Data Integration ( ETL ) a.k.a Kettle
            2. https://github.com/pentaho/pentaho-kettle
            3. https://wiki.pentaho.com/display/COM/Community+Wiki+Home
      3. http://www.spagobi.org/
      4. https://www.bmc.com/
        1. CONTROL-M
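          1. Control-M is a digital business automation solution that simplifies and automates a wide range of batch application workloads. It helps optimize SLAs and speed up application deployment across infrastructure, data, and applications.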
          2. http://www.bmcsoftware.cn/it-solutions/control-m.html
          3. http://www.doc88.com/p-1863463402569.html
      5. https://www.teradata.com.cn
  3. Data visualization
    1. superset
      1. https://superset.incubator.apache.org/
    2. Reporting tool: https://git.oschina.net/max256/morpho
    3. Related technologies
      1. Front-end technologies
        1. echarts
          1. http://echarts.baidu.com/
        2. antv
          1. https://antv.alipay.com/zh-cn/index.html
    4. CBoard: an open-source BI dashboard platform that supports interactive multidimensional report design and data analysis
      1. https://github.com/yzhang921/CBoard
    5. DataV: Alibaba Cloud's data visualization product
  4. Data synchronization
    1. Data transport
      1. kafka: a distributed streaming platform
        1. http://kafka.apache.org/
      2. ActiveMQ
      3. RabbitMQ
    2. Data collection
      1. flume
        1. Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
        2. http://flume.apache.org/
      2. logstash
    3. Distributed database synchronization systems
      1. https://github.com/alibaba/otter
        1. canal (MySQL data synchronization): https://github.com/alibaba/canal
      2. sqoop
        1. Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
        2. http://sqoop.apache.org/
    4. Automated data synchronization flows
      1. http://nifi.apache.org/
    5. Data synchronization tools
      1. MySQL replication protocol, Go implementation: https://github.com/siddontang/go-mysql
      2. MySQL replication protocol, Python implementation: https://github.com/noplay/python-mysql-replication
    6. DataX
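      1. DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), and DRDS.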
      2. https://github.com/alibaba/DataX
  5. ETL
    1. KETTLE
      1. https://community.hds.com/docs/DOC-1009855
  6. Offline job scheduling
    1. Hadoop job scheduling
      1. http://oozie.apache.org/
      2. https://azkaban.github.io/
    2. Zeus (originally from Alibaba)
      1. https://github.com/ctripcorp/dataworks-zeus
    3. Open-source job scheduler maintained by an individual
      1. https://github.com/xuxueli/xxl-job
    4. control-m
      1. https://baike.baidu.com/item/control-m/176677?fr=aladdin
    5. Job scheduling on data platforms, in practice
      1. https://www.jianshu.com/p/bddffdfea00b
      2. https://www.jianshu.com/p/428ae367a38b
    6. autosys
    7. etl-automation
    8. tws (ibm)
    9. TASKCTL
      1. http://www.taskctl.com/Service/Document
    10. JobCtrl
      1. Primeton JobCtrl: a scheduling and monitoring platform for very large job workloads
      2. http://www.primeton.com/
    11. EDB
    12. USE
    13. SMC
    14. JMC
    15. Moia
  7. Compute engines & frameworks
    1. spark
      1. http://spark.apache.org/
    2. tez
    3. hadoop-mapreduce
      1. http://hadoop.apache.org/
    4. bigflow
      1. https://github.com/baidu/bigflow
    5. storm
    6. flink
      1. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.
      2. http://flink.apache.org/
    7. hive
      1. The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
      2. http://hive.apache.org/
    8. impala
  8. Big data storage
    1. OLTP (On-Line Transaction Processing)
    2. OLAP (On-Line Analytical Processing)
      1. PALO
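        1. Baidu Data Warehouse Palo is a PB-scale MPP data warehouse service offered on Baidu Cloud. It provides high-performance analytics and report queries over large datasets at relatively low cost.
        2. Palo is not an OLTP-oriented database product but an OLAP-oriented one. Products with a similar positioning include commercial data warehouse systems such as Greenplum, Vertica, and Exadata, and cloud services such as Amazon Redshift and Google BigQuery. These products are a useful reference for understanding Palo.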
      2. https://cloud.baidu.com/doc/PALO/System.html#.E7.B3.BB.E7.BB.9F.E6.9E.B6.E6.9E.84
      3. Cloud-native MySQL database for unlimited scalability and performance
        1. http://radondb.io/
      4. TiDB: an open-source distributed NewSQL relational database developed in China (MySQL-compatible)
        1. https://pingcap.com/index.html
      5. Kudu: an open-source distributed NoSQL OLAP database
        1. A new addition to the open source Apache Hadoop ecosystem, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
        2. http://kudu.apache.org/
        3. Reference documents:
          1. Xiaomi's Kudu real-time analytics system, and a comparison of Kudu, HBase, and Parquet: https://baijia.baidu.com/s?old_id=581124
      6. kylin
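        1. Apache Kylin™ is an open-source distributed analytics engine that provides a SQL query interface and multidimensional analysis (OLAP) on top of Hadoop/Spark to support extremely large datasets. It was originally developed by eBay Inc. and contributed to the open-source community. It can query huge Hive tables with sub-second latency.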
        2. http://kylin.apache.org/
      7. greenplum
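        1. Greenplum DB bills itself as the world's first open-source massively parallel data warehouse. Originally based on PostgreSQL, it has since added many database innovations. Greenplum provides powerful, fast analytics on PB-scale data, is aimed in particular at big data analytics, and supports high-performance analytical queries over big data.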
        2. https://greenplum.org/
        3. http://www.greenplum.net.cn/
        4. Reference material
          1. "Greenplum Resource Isolation Guide": https://yq.aliyun.com/articles/57763
          2. "Three Diagrams for Understanding How to Use Greenplum Properly in the Enterprise": https://yq.aliyun.com/articles/57736
      8. Vertica
        1. https://www.vertica.com/
      9. Exadata
      10. Amazon Redshift
      11. Google BigQuery
    3. Parquet: columnar storage for data processing frameworks in the Hadoop ecosystem
      1. https://parquet.apache.org/
      2. Applicable scenarios:
      3. Example use cases:
    4. Elasticsearch: a distributed, RESTful search and data analytics engine that can address a growing range of use cases
      1. https://www.elastic.co/
        1. Related ecosystem
          1. logstash
          2. beats
          3. kibana
    5. HBase: distributed column-oriented storage
      1. Apache HBase is the Hadoop database, a distributed, scalable, big data store.
      2. https://hbase.apache.org/
      3. Chinese-language reference: http://abloz.com/hbase/book.html
      4. Related technologies built on HBase
        1. OpenTSDB: a time series database built on HBase
          1. The Scalable Time Series Database. Store and serve massive amounts of time series data without losing granularity.
          2. http://opentsdb.net/
        2. kylin
    6. PrestoDB: an open-source distributed SQL engine for interactive analytic queries
      1. Distributed SQL Query Engine for Big Data
      2. https://prestodb.io/
      3. http://prestodb-china.com/
      4. https://github.com/CHINA-JD/presto/
    7. Distributed file storage
      1. https://github.com/chrislusf/seaweedfs

Posted by Y銘 on PIXNET (痞客邦)