Tensorflow(@CVision)
News on deep learning and artificial intelligence
New papers and findings in deep learning
Machine vision and image processing

TensorFlow, Keras, Deep Learning, Computer Vision

Course website:
http://class.vision

👨‍💻👩‍💻 Course support:
@classvision_support
Deep learning
Yann LeCun, Yoshua Bengio & Geoffrey Hinton
http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html

#Deep_learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object #recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep #convolutional nets have brought about breakthroughs in processing #images, #video, #speech and #audio, whereas #recurrent nets have shone light on sequential data such as #text and speech.
The #smart_home of Mark #Zuckerberg, founder of Facebook, uses modern AI methods such as object recognition, face recognition, speech recognition, and natural language processing.
Zuckerberg writes about his motivation for the project and the steps he took:


https://www.facebook.com/notes/mark-zuckerberg/building-jarvis/10154361492931634/

My personal challenge for 2016 was to build a simple AI for my home, like Jarvis in the Iron Man films...

Building Jarvis:
- Getting Started: Connecting the Home
- #Natural_Language
- #Vision and #Face_Recognition
- Messenger Bot
- Voice and #Speech_Recognition
- Facebook Engineering Environment

-------------
Vision and Face Recognition:
About one-third of the human #brain is dedicated to vision, and there are many important #AI problems related to understanding what is happening in images and videos. These problems include #tracking (eg is Max awake and moving around in her crib?), #object_recognition (eg is that Beast or a rug in that room?), and face recognition (eg who is at the door?).
Face recognition is a particularly difficult version of object recognition because most people look relatively similar compared to telling apart two random objects — for example, a sandwich and a house. But Facebook has gotten very good at face recognition for identifying when your friends are in your photos. That expertise is also useful when your friends are at your door and your AI needs to determine whether to let them in.
To do this, I installed a few cameras at my door that can capture images from all angles. AI systems today cannot identify people from the back of their heads, so having a few angles ensures we see the person's face. I built a simple server that continuously watches the cameras and runs a two step process: first, it runs face detection to see if any person has come into view, and second, if it finds a face, then it runs face recognition to identify who the person is. Once it identifies the person, it checks a list to confirm I'm expecting that person, and if I am then it will let them in and tell me they're here.
This type of visual AI system is useful for a number of things, including knowing when Max is awake so it can start playing music or a Mandarin lesson, or solving the context problem of knowing which room in the house we're in so the AI can correctly respond to context-free requests like "turn the lights on" without providing a location. Like most aspects of this AI, vision is most useful when it informs a broader model of the world, connected with other abilities like knowing who your friends are and how to open the door when they're here. The more context the system has, the smarter it gets overall.
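The two-step door process described above (face detection, then face recognition, then a check against a list of expected guests) can be sketched roughly as follows. This is only an illustration, not Zuckerberg's code: `DoorDecision`, `check_door`, `fake_detect`, and `fake_recognize` are hypothetical names, and the detector and recognizer are passed in as plain functions so that any real model could be substituted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Set


@dataclass
class DoorDecision:
    person: Optional[str]  # admitted guest, else first recognized name, else None
    unlock: bool           # True only for a recognized, expected guest


def check_door(
    frames: Iterable[Any],
    detect_faces: Callable[[Any], List[Any]],
    recognize: Callable[[Any], Optional[str]],
    expected_guests: Set[str],
) -> DoorDecision:
    """Run detection on every camera frame, then recognition on every face."""
    seen: Optional[str] = None
    for frame in frames:
        for face in detect_faces(frame):    # step 1: is there a face in view?
            name = recognize(face)          # step 2: whose face is it?
            if name is None:
                continue                    # face found but not identified
            if name in expected_guests:
                return DoorDecision(person=name, unlock=True)
            if seen is None:
                seen = name                 # remember the first unexpected visitor
    return DoorDecision(person=seen, unlock=False)


# Stand-in models for illustration only (assumptions, not real detectors):
# a "frame" is a list of names and a "face" is simply a name string.
def fake_detect(frame):
    return list(frame)


def fake_recognize(face):
    return None if face == "unknown" else face


decision = check_door([["unknown"], ["alice"]], fake_detect, fake_recognize, {"alice"})
print(decision)
```

Using several camera frames mirrors the point in the post that multiple angles make it more likely that at least one frame shows the visitor's face.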

#mark_zuckerberg #smart_home
Speech audio enhancement with deep convolutional networks and the TensorFlow framework:

WaveMedic: Convolutional Neural Networks for #Speech Audio #Enhancement:
http://cs229.stanford.edu/proj2016/report/FisherScherlis-WaveMedic-project.pdf

-------------
Related (work from DeepMind):
The WaveNet neural network architecture directly generates a raw audio waveform, showing excellent results in text-to-speech and general audio generation (see the DeepMind blog post and paper for details).

A #TensorFlow implementation of DeepMind's WaveNet paper:
https://github.com/ibab/tensorflow-wavenet
#paper
A new and interesting paper from Google Brain, with #TensorFlow code
Training a single neural network on several different tasks at the same time!

One Model To Learn Them All
(Submitted on 16 Jun 2017)
pic: http://deepnn.ir/tensorflow-telegram-files/tensor2tensor.PNG


🔗abstract:
https://arxiv.org/abs/1706.05137

🔗Paper:
https://arxiv.org/pdf/1706.05137.pdf

🔗Code:
https://github.com/tensorflow/tensor2tensor

Deep learning is used in many areas, such as speech recognition, image classification, and translation.
Until now, however, each problem has needed its own deep model with a specific architecture. After tuning the hyperparameters and training the network weights, that model worked well for its problem but could not be used for others.
This paper presents a single model that gives good results across several domains and is trained on multiple tasks. Specifically, this one model is trained concurrently on ImageNet, several translation tasks, image captioning, speech recognition, and an English parsing task.
On many of these problems the model is reported to be comparable to the state-of-the-art models of each domain, which are trained only for that task, and in some domains it performs better than when it is trained on that domain alone.


#Google_Brain #tensor2tensor
#deep_learning
#speech_recognition, #image_classification, #translation
#source_code #paper

In this method, which Facebook open-sourced a few days ago, speech recognition is trained end-to-end.

Open sourcing wav2letter++, the fastest state-of-the-art speech system, and flashlight, an ML library going native

https://code.fb.com/ai-research/wav2letter/

CNN architectures are competitive with #recurrent architectures for tasks in which modeling long-range dependencies is important, such as #language_modeling, machine translation, and #speech_synthesis. In end-to-end #speech_recognition, however, recurrent architectures are still more prevalent for both acoustic and language modeling.
#tutorial

Recognizing Speech Commands Using Recurrent Neural Networks with Attention

https://towardsdatascience.com/recognizing-speech-commands-using-recurrent-neural-networks-with-attention-c2b2ba17c837

Source code:

A Keras implementation of neural attention model for speech command recognition
https://github.com/douglas125/SpeechCmdRecognition
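As a rough illustration of what attention over recurrent outputs means, the sketch below scores each time step of a sequence against a query vector, normalizes the scores with a softmax, and returns the weighted sum. It is a generic dot-product attention in plain Python, not necessarily the mechanism used in the linked Keras repository; `softmax`, `attend`, and the toy vectors are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(outputs, query):
    """Dot-product attention over per-time-step output vectors.

    outputs: list of equal-length vectors, one per time step (e.g. RNN outputs)
    query:   vector of the same length that says what to look for
    Returns (weights, context), where context is the weighted sum of outputs.
    """
    scores = [sum(o_k * q_k for o_k, q_k in zip(o, query)) for o in outputs]
    weights = softmax(scores)
    dim = len(outputs[0])
    context = [sum(w * o[k] for w, o in zip(weights, outputs)) for k in range(dim)]
    return weights, context


# Toy example: three time steps, two features each.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [0.0, 5.0]  # strongly prefers the second feature
weights, context = attend(outputs, query)
print(weights)   # the middle time step receives most of the weight
print(context)
```

In a speech command model the context vector would then be passed to a classifier, and the weights can be plotted to see which part of the audio the model focused on.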

Related:

Source code and paper for wav2letter++, an end-to-end method:
https://t.me/cvision/850
Session on attention in RNNs:
https://www.aparat.com/v/SPZzH
Session on audio processing with RNNs:
https://www.aparat.com/v/cEKal

#attention #rnn #lstm #keras #Speech
#source_code

#Mozilla has released an open source #speech recognition model and dataset. The reported word error rate is 6.5%, which is close to human performance.
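For readers unfamiliar with the metric: word error rate is usually computed as the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal plain-Python version; `word_error_rate` is a name chosen for this example and is not part of DeepSpeech.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing in output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn the lights on", "turn the light on"))  # 0.25
```

A word error rate of 6.5% therefore means roughly 6 to 7 word errors per 100 reference words.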

Project DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques, based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow project to make the implementation easier.

Data: https://voice.mozilla.org/data
400k recordings, 500 hours of speech.

Model: https://github.com/mozilla/DeepSpeech
TensorFlow implementation of Baidu's DeepSpeech architecture.

https://deepspeech.readthedocs.io/en/latest/
DeepSpeech’s code documentation!

Related:
https://t.me/cvision/875
https://t.me/cvision/850

#speech_recognition #Tensorflow
#dataset
A massive audio and speech dataset released by #NASA.
About 19,000 hours of recorded speech from Apollo 11!

Massive Speech Dataset !!! 19,000 hours of Apollo-11 recordings

TASK#1: Speech Activity Detection: SAD

TASK#2: Speaker Diarization: SD

TASK#3: Speaker Identification: SID

TASK#4: Automatic Speech Recognition: ASR

TASK#5: Sentiment Detection: SENTIMENT
http://fearlesssteps.exploreapollo.org/
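To give a feel for the first task in the list, speech activity detection, here is a deliberately simple energy-based sketch: it splits the signal into frames and marks the frames whose RMS energy exceeds a threshold. This is not the challenge's baseline system; `frame_energies`, `detect_speech`, the frame length, and the threshold are all assumptions chosen for the example, and noisy recordings such as the Apollo tapes would need something considerably more robust.

```python
import math


def frame_energies(samples, frame_len):
    """Root-mean-square energy of each non-overlapping, full-length frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_speech(samples, frame_len=160, threshold=0.1):
    """Return (start_sample, end_sample) spans whose frame energy exceeds threshold."""
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        active = energy > threshold
        if active and seg_start is None:
            seg_start = i * frame_len                    # a segment begins
        elif not active and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # the segment ends
            seg_start = None
    if seg_start is not None:                            # signal ended while active
        segments.append((seg_start, len(energies) * frame_len))
    return segments


# Synthetic test signal at an assumed 8 kHz rate: silence, a 440 Hz tone, silence.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(480)]
signal = silence + tone + silence
print(detect_speech(signal))  # [(320, 800)]
```

The later tasks (diarization, speaker identification, recognition) typically operate only on the segments that a detector like this marks as active.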

#NASA #speech #sentiment #dataset
Clone a voice in 5 seconds to generate arbitrary speech in real-time.
Give this software just 5 seconds of your voice and it will generate any text you like in your own voice! Note that I tested it on a CPU; a GPU will probably give even better results.

#AI #voice #realtime #application #CUDA #python #pytorch #torch #artificial #intelligence #speech #neural #network

https://github.com/CorentinJ/Real-Time-Voice-Cloning

🙏Thanks to: @pythony